Guy Harrison - Yet Another Database Blog

Grid control therapy

Friday, January 26, 2007 at 10:03PM

I've had a lot of bad experiences installing and managing the OEM grid control. Today has been the worst day ever and - as an alternative to self-mutilation - I decided that I would document my woes as they occur as a sort of therapy.

Here's the background to my latest woes:

On Monday I realized that the database that holds my test grid control repository was severely corrupt: a bunch of ORA-600s with internal codes that were associated with rollback segment corruption. Since the DB had no "real" data and I was lacking a recent backup, I decided to rebuild the (RAC cluster) database.

The rebuild was no drama, but since this DB had my grid control repository I decided I'd better re-install grid control as well. That's when everything went south.

The installation would proceed normally without errors until it reported errors configuring the agent.

Logs showed that the agent was apparently installed OK, but that the management server was nowhere to be found. opmn status showed every other friggin process but not even an entry for the OMS. The SYSMAN schema had not even been properly installed. Somehow, installing the OMS for the second time was screwing things up.

[oracle@mel601416 ~]$ emctl status oms
Oracle Enterprise Manager 10g Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
Oracle Management Server is not functioning because of the following reason:
Unexpected error occurred. Check error and log files.

There are no trace files indicating that the OMS has ever really started (no emoms.trc, for instance).

The only logs are emctl logs - created when I do a status request - with AgentStatus.pm: Unknown command encountered.

So next I tried tearing everything down and installing grid control with a new database, thinking that surely this would avoid the apparent "I don't need to install a DB" problem. No luck. No database created - no attempt to start OMS.

So I tried renaming the /etc/oraInst.loc file so my installer was unaware of all previous installs. Sigh. Same result.

I spent many hours scouring technet, metalink and googling.... eventually found http://forums.oracle.com/forums/thread.jspa?threadID=337544&start=15&tstart=0 in which a bunch of poor souls who have encountered this problem commiserate. Suggestions from there include:

Remove any 'legacy' listeners (9i)
This mysterious sequence:

echo `find / -name libdb.so.2 -print` >> /etc/ld.so.conf
find . -name libdb.so.2 -print
.......
vi /etc/ld.so.conf
- add result from find
ldconfig -v (run as root)

clean up /etc/hosts (remove ipv6 and check hostname is exactly right)
turn off SELinux
remove all symlinks in the installation directory path

The last one really rang a bell with me, since only a few weeks ago I had in fact symlinked the oracle home to a new filesystem. So....

Hooray! Avoiding the symlinks allowed the OMS configuration to proceed. Still had many hours of trying to install: first I had to remove the SYSMAN and MGMT_VIEW schemas (see metalink note 358627.1). Then I had mysterious communication failures during the final phase of opmn setup. Sigh. Being as it was close to midnight I blew it all away in the hope that a straightforward install would now work and (at about 1am) .... Kapla! (Klingon for Victory: see http://www.khemorex-klinzhai.de/e/Hol/).

All's well that ends well?

This isn't the first time I've struggled with a grid control install. My original attempt to install on a Windows system never succeeded, and while my previous Linux-based grid control worked fine most of the time, I still must have spent many hours fiddling with configuration files to make everything talk together properly.

I'm hardly an unbiased source - our Spotlight and Foglight products are sometimes seen to be competitive, so you might want to take my opinions with a grain of salt. However, I've become convinced that while Grid control and OEM undoubtedly reduce DBA overheads (though I think in some areas - particularly diagnostics and RAC - OEM is way short of acceptable), the overhead of managing and configuring OEM/Grid is just too high. And the reason is that the OEM stack is just way too complex a solution for the task at hand. If you do an opmnctl status you'll see a list of at least eight separate entities that interact to perform OMS services - and I bet that very few DBAs know what each of them is. And that's before you add in the agents and internal web apps that perform the various OEM functions.

Compare the OEM/Grid implementation to MySQL's far simpler LAMP-based solution: the number of moving parts in the MySQL solution is far smaller than in OEM, and yet it appears to provide virtually the same functionality (for the DBA at least).

I can't help thinking that in the future every Oracle DBA is going to need to be an expert in the OEM software stack - and consequently an expert in J2EE, Apache, OC4J as well as OEM specifically - to be able to manage enterprise deployments. That's good news for the Oracle DBA job market - as automation reduces the overhead of managing the database, the overhead of managing the automation itself keeps us all in work :-).

Guy Harrison |

Using EXPLAIN EXTENDED to see view query rewrites

Thursday, January 18, 2007 at 9:22AM

At the MySQL Mini Conference in Sydney this week we discussed how to use EXPLAIN EXTENDED to view the rewrites undertaken by the MySQL optimizer. In particular, to see if MySQL performs a merge of the query into the view definition, or if it creates a temporary table.

It can be tricky to optimize queries using views, since it's often hard to know exactly how the query will be resolved - will MySQL merge the text of the query and the view, or will it use a temporary table containing the view's result set and then apply the query clauses to that?

In general, MySQL merges the query text except when the view definition includes constructs such as GROUP BY or UNION (aggregate functions and DISTINCT have the same effect). But to be sure, we can use EXPLAIN EXTENDED. This also helps when the EXPLAIN output itself is confusing.

For instance if we have a view definition like this:

CREATE VIEW user_table_v AS
   SELECT *
   FROM information_schema.tables ist
      WHERE table_type='BASE TABLE';

and try to explain a query like this:

explain select * from user_table_v WHERE table_schema='mysql'\G

We get output like this, which might be difficult to interpret unless we know the view definition:

*************************** 1. row ***************************
   id: 1
select_type: SIMPLE
      table: ist
      type: ALL
possible_keys: NULL
      key: NULL
   key_len: NULL
      ref: NULL
      rows: 2
   filtered: 100.00
      Extra: Using where

Note the table "ist": only by looking at the view definition can we interpret this. But if we do an EXPLAIN EXTENDED followed by a SHOW WARNINGS we see the exact text:

*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `ist`.`TABLE_NAME` AS `TABLE_NAME` from `information_schema`.`tables` `ist` where ((
`ist`.`TABLE_SCHEMA` = _utf8'mysql') and (`ist`.`TABLE_TYPE` = _utf8'BASE TABLE'))
1 row in set (0.00 sec)

And from this we can see that MySQL did indeed merge the WHERE clauses of both the query and the view definition.
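The statements that produce the note above aren't shown, but they would presumably look something like the following sketch. The only assumption is that SHOW WARNINGS is issued in the same session, straight after the EXPLAIN EXTENDED, so that the note is still available. (On much later MySQL releases the EXTENDED keyword was deprecated and then removed, and a plain EXPLAIN leaves the same note.)

```sql
-- Ask the optimizer for its plan; the rewritten statement is kept as a note
EXPLAIN EXTENDED
SELECT * FROM user_table_v WHERE table_schema = 'mysql'\G

-- The rewritten query text comes back as a Note with code 1003
SHOW WARNINGS\G
```

If the FROM clause of the rewritten statement names the underlying table rather than the view, the view was merged.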

If we look at the output for a view like this:

CREATE VIEW table_types_v AS
   SELECT table_type,count(*)
      FROM information_schema.tables ist
      GROUP BY table_type;

Then we see the following output, in which we can see that MySQL created a temporary table and then applied the WHERE clause from the query:

*************************** 1. row ***************************
   id: 1
select_type: PRIMARY
      table: NULL
      type: NULL
possible_keys: NULL
      key: NULL
   key_len: NULL
      ref: NULL
      rows: NULL
   filtered: NULL
      Extra: Impossible WHERE noticed after reading const tables
*************************** 2. row ***************************
   id: 2
select_type: DERIVED
      table: ist
      type: ALL
possible_keys: NULL
      key: NULL
   key_len: NULL
      ref: NULL
      rows: 2
   filtered: 100.00
      Extra: Using temporary; Using filesort
2 rows in set, 1 warning (0.00 sec)

*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `table_types_v`.`table_type` AS `table_type`,`table_types_v`.`count(*)` AS `count(*)
` from `mysql`.`table_types_v` where (`table_types_v`.`table_type` = _utf8'BASE TABLE')
1 row in set (0.00 sec)
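The query being explained here isn't shown either. Judging by the rewritten text in the warning, it was presumably along these lines (a guess reconstructed from the output, not the original statement):

```sql
-- Presumed query: filter the grouped view on one table type
EXPLAIN EXTENDED
SELECT * FROM table_types_v WHERE table_type = 'BASE TABLE'\G

SHOW WARNINGS\G
```

Note that this time the rewritten statement still selects from table_types_v itself rather than from information_schema.tables, which together with the DERIVED row in the plan is the sign that the view was materialized rather than merged.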

EXPLAIN EXTENDED is an invaluable tool for tuning SQL statements, and even more so when working with views.

Guy Harrison |

D.I.Y. MySQL 5.1 monitoring

Wednesday, January 3, 2007 at 11:32PM

I wrote recently about using events and the new processlist table in MySQL 5.1 to keep track of the number of connected processes. Although having the PROCESSLIST available as an INFORMATION_SCHEMA table is useful, it seemed to me that having SHOW GLOBAL STATUS exposed in a similar fashion would be far more useful. So at the MySQL UC last year, I asked Brian Aker if that would be possible. I know how many enhancement requests MySQL has to deal with, so I was really happy to see that table appear in the latest 5.1 build (5.1.14 beta).

This table, together with the EVENT scheduler, lets us keep track of the values of status variables over time without having to have any external daemon running. This won't come anywhere near to matching what MySQL have made available in Merlin, but still could be fairly useful. So let's build a simple system using events to keep track of "interesting" status variables....

I also thought I’d take the opportunity of working with the MySQL Workbench to design the tables involved. Unfortunately the product is effectively unusable on my system until at least rc7 (due when?) and because it only supports Windows, I’m not up for downloading the source and compiling it. Oh well. I used our own (Quest Software's) Toad Data Modeller instead.

The idea is to schedule some events that trap (snapshot) the contents of the PROCESSLIST and GLOBAL_STATUS variables into tables that can be used to track trends and monitor performance in a simple way. I created three tables to hold the data:

GHSNAP_STATUS_VARIABLES lists the variable names we consider "interesting". If the variable_name is listed here and CAPTURE_FLAG=1, then we will capture the value of the variable. VARIABLE_SOURCE indicates the table from which the data is obtained - I only implemented GLOBAL_VARIABLES for now.
GHSNAP_SNAPSHOTS contains one row for each snapshot of the table we take. The UPTIME column is used to detect server restarts so we don't calculate nonsense deltas or rates.
GHSNAP_SNAPSHOT_VALUES contains one row for each variable for each snapshot. We capture the raw value of the variable, its change since the last snapshot and the rate of change per second.
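To make the descriptions above a bit more concrete, here is a rough sketch of what the three tables might look like. This is not the DDL from the installation script: the data types, keys and the columns variable_value and variable_value_delta are guesses, and only the remaining column names come from the text and the query further down.

```sql
-- Sketch only: types, keys and some column names are assumptions.
CREATE TABLE ghsnap_status_variables (
    variable_name   VARCHAR(64) NOT NULL,
    variable_source VARCHAR(64) NOT NULL,            -- table the value is read from
    capture_flag    TINYINT     NOT NULL DEFAULT 1,  -- 1 = capture this variable
    PRIMARY KEY (variable_name, variable_source)
);

CREATE TABLE ghsnap_snapshots (
    snapshot_id        INT      NOT NULL AUTO_INCREMENT,
    snapshot_timestamp DATETIME NOT NULL,
    uptime             BIGINT   NOT NULL,   -- server uptime (seconds) at snapshot time
    PRIMARY KEY (snapshot_id)
);

CREATE TABLE ghsnap_snapshot_values (
    snapshot_id          INT         NOT NULL,
    variable_name        VARCHAR(64) NOT NULL,
    variable_value       BIGINT,          -- raw value (hypothetical column name)
    variable_value_delta BIGINT,          -- change since last snapshot (hypothetical name)
    variable_value_rate  DECIMAL(20,4),   -- change per second
    PRIMARY KEY (snapshot_id, variable_name)
);
```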

This file installs the tables, stored procedures and events. You may have to run it from the MySQL query browser to avoid errors caused by the command line client not processing DELIMITER statements properly. The script creates a new database GH_SNAPSHOTS. Three stored procedures are created:

ghsnap_populate_variables creates default entries in the GHSNAP_STATUS_VARIABLES table and is run only during installation. By default I set it up to capture all the INNODB%, COM% and QCACHE% variables but you might want to change the CAPTURE_FLAG for those you do/don't want.
ghsnap_take_snapshot takes a snapshot of GLOBAL_STATUS and stores it in the snapshot tables. It also works out if the server has been restarted and if not, calculates rates and deltas.
ghsnap_delete_snapshots deletes all snapshots more than a certain number of days old.
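The core of the snapshot logic could be sketched roughly as follows. This is not the procedure from the installation script, just one way to do it using session variables. It assumes hypothetical columns variable_value and variable_value_delta in GHSNAP_SNAPSHOT_VALUES, an auto-increment snapshot_id, and that only numeric status variables are flagged for capture.

```sql
-- Sketch only; not the author's ghsnap_take_snapshot procedure.

-- 1. Remember the previous snapshot (if any), then record the new one.
SET @prev_id = NULL, @prev_uptime = NULL;

SELECT snapshot_id, uptime
  INTO @prev_id, @prev_uptime
  FROM ghsnap_snapshots
 ORDER BY snapshot_id DESC
 LIMIT 1;

SELECT CAST(variable_value AS UNSIGNED)
  INTO @uptime
  FROM information_schema.global_status
 WHERE variable_name = 'UPTIME';

INSERT INTO ghsnap_snapshots (snapshot_timestamp, uptime)
VALUES (NOW(), @uptime);
SET @new_id = LAST_INSERT_ID();

-- 2. If uptime has not moved forward, assume a restart (or no previous
--    snapshot) and leave @elapsed NULL so no delta or rate is calculated.
SET @elapsed = IF(@uptime > @prev_uptime, @uptime - @prev_uptime, NULL);

-- 3. Store the raw value plus delta and per-second rate for flagged variables.
INSERT INTO ghsnap_snapshot_values
       (snapshot_id, variable_name, variable_value,
        variable_value_delta, variable_value_rate)
SELECT @new_id,
       gs.variable_name,
       gs.variable_value,
       IF(@elapsed IS NULL, NULL, gs.variable_value - pv.variable_value),
       IF(@elapsed IS NULL, NULL,
          (gs.variable_value - pv.variable_value) / @elapsed)
  FROM information_schema.global_status gs
  JOIN ghsnap_status_variables sv
    ON sv.variable_name = gs.variable_name
   AND sv.capture_flag = 1
  LEFT JOIN ghsnap_snapshot_values pv
    ON pv.snapshot_id = @prev_id
   AND pv.variable_name = gs.variable_name;
```

Using the difference in uptime as the elapsed time keeps the restart check and the rate calculation on the same clock.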

There are two events defined:

ghsnap_take_snap_event runs (by default) every five minutes and simply executes ghsnap_take_snapshot.
ghsnap_delete_snap_event runs every 5 hours (by default) and deletes snapshots more than 1 day old (you can edit the installation script to change this default).
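The event definitions would look something like the sketch below. The schedules are the defaults described above, but the procedure signatures are assumptions: I'm guessing here that ghsnap_take_snapshot takes no arguments and that ghsnap_delete_snapshots takes the number of days to keep.

```sql
-- Sketch only: procedure arguments are assumed, not taken from the script.
CREATE EVENT ghsnap_take_snap_event
    ON SCHEDULE EVERY 5 MINUTE
    DO CALL ghsnap_take_snapshot();

CREATE EVENT ghsnap_delete_snap_event
    ON SCHEDULE EVERY 5 HOUR
    DO CALL ghsnap_delete_snapshots(1);   -- assumed argument: days to keep
```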

There are some obvious additional things that we could do with this (optimize deletes via partitioning, calculate ratios, capture other state, etc) but this probably has some real value already. Now I can issue queries such as this:

select snapshot_id, snapshot_timestamp,
       sum( case variable_name when 'COM_COMMIT' then variable_value_rate end ) commit_ps,
       sum( case variable_name when 'COM_SELECT' then variable_value_rate end ) select_ps,
       sum( case variable_name when 'COM_INSERT' then variable_value_rate end ) insert_ps,
       sum( case variable_name when 'COM_DELETE' then variable_value_rate end ) delete_ps,
       sum( case variable_name when 'COM_UPDATE' then variable_value_rate end ) update_ps
  from ghsnap_snapshots join ghsnap_snapshot_values using (snapshot_id)
 where snapshot_id>70
 group by snapshot_id, snapshot_timestamp

This lets me view the activity on my server. Using a reporting or charting tool I can generate useful charts, etc. For instance, using the BIRT module in Eclipse I created a chart showing my SQL execution rates.

Nothing earth-shattering, but still useful and - with the event scheduler, stored procedures and the new INFORMATION_SCHEMA tables - we don't need any external infrastructure to capture this information.

We'll probably implement something like this in the next version of our freeware Spotlight on MySQL as well as better diagnostics on replication and exploiting the fact that in 5.1 you can get SELECT access to logs. Feel free to check it out.

P.S. Don't forget to set EVENT_SCHEDULER=1 to enable events on your 5.1 server. And watch out on Windows, where the event scheduler in 5.1.14 still seems a bit unstable.
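For what it's worth, turning the scheduler on at runtime looks roughly like this. It's a sketch that assumes you have the SUPER privilege; a setting made this way is lost at restart unless it also goes into the server's option file.

```sql
-- Enable the event scheduler for the running server
SET GLOBAL event_scheduler = 1;

-- Confirm the setting and check that the scheduler thread is running
SHOW VARIABLES LIKE 'event_scheduler';
SHOW PROCESSLIST;   -- look for a row with User = 'event_scheduler'
```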

Guy Harrison |

10g tracing quick start

Wednesday, September 27, 2006 at 8:40PM

Oracle’s released a few new facilities to help with tracing in 10g. Here’s a real quick wrap-up of the most significant:

Using the new client identifier

You can tag database sessions with a client identifier that can later be used to identify sessions to trace. You can set the identifier like this:

begin
dbms_session.set_identifier('GUY1');
end;

You can set this from a login trigger if you don’t have access to the source code. To set trace on for a matching client id, you use DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE:

BEGIN

DBMS_MONITOR.client_id_trace_enable (client_id    => 'GUY1',
                                    waits       => TRUE,
                                    binds       => FALSE
                                    );
END;

You can add waits and/or bind variables to the trace file using the flags shown.
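Two related pieces aren't shown above: switching the trace off again, and the login trigger approach. Here is a sketch of both. The trigger name and the APP_USER account are hypothetical, and the trigger simply tags every session of that account with the same identifier.

```sql
-- Turn tracing off again for the same client identifier
BEGIN
   DBMS_MONITOR.client_id_trace_disable (client_id => 'GUY1');
END;
/

-- See which traces are currently enabled
SELECT trace_type, primary_id, waits, binds
FROM dba_enabled_traces;

-- Sketch of a logon trigger that tags sessions of one account.
-- APP_USER and the trigger name are hypothetical.
CREATE OR REPLACE TRIGGER set_client_id_trg
   AFTER LOGON ON DATABASE
BEGIN
   IF SYS_CONTEXT ('USERENV', 'SESSION_USER') = 'APP_USER'
   THEN
      DBMS_SESSION.set_identifier ('GUY1');
   END IF;
END;
/
```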

Tracing by Module and/or action

Many Oracle-aware applications set Module and action properties and you can use these to enable tracing as well. The serv_mod_act_trace_enable method allows you to set the tracing on for sessions matching particular service, module, action and (for clusters) instance identifiers. You can see current values for these using the following query:

SELECT DISTINCT instance_name, service_name, module, action
FROM gv$session JOIN gv$instance USING (inst_id);

INSTANCE_NAME SERVICE_NA MODULE ACTION
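Once you know the values, a call along the following lines switches tracing on. This is a sketch: the service and module names are hypothetical placeholders, so substitute values returned by the query above.

```sql
-- Service and module values are hypothetical examples.
BEGIN
   DBMS_MONITOR.serv_mod_act_trace_enable (service_name => 'ORCL',
                                           module_name  => 'SQL*Plus',
                                           action_name  => DBMS_MONITOR.all_actions,
                                           waits        => TRUE,
                                           binds        => FALSE
                                          );
END;
/

-- And the matching call to switch it off again
BEGIN
   DBMS_MONITOR.serv_mod_act_trace_disable (service_name => 'ORCL',
                                            module_name  => 'SQL*Plus',
                                            action_name  => DBMS_MONITOR.all_actions
                                           );
END;
/
```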

Guy Harrison |

10g time model query

Wednesday, September 27, 2006 at 8:36PM

Joining the 10g time model to the traditional wait interface views and taking advantage of the wait_class data is something most of us have probably done. Here's my standard query that does just that....

COLUMN wait_class format a20
COLUMN name format a30
COLUMN time_secs format 999,999,999,999.99
COLUMN pct format 99.99

SELECT wait_class, NAME, ROUND (time_secs, 2) time_secs,
      ROUND (time_secs * 100 / SUM (time_secs) OVER (), 2) pct
FROM (SELECT n.wait_class, e.event NAME, e.time_waited / 100 time_secs
         FROM v$system_event e, v$event_name n
         WHERE n.NAME = e.event AND n.wait_class <> 'Idle'
               AND time_waited > 0
      UNION
      SELECT 'CPU', 'server CPU', SUM (VALUE / 1000000) time_secs
         FROM v$sys_time_model
         WHERE stat_name IN ('background cpu time', 'DB CPU'))
ORDER BY time_secs DESC;

Which generates CPU and wait event times broken down to the event name:

WAIT_CLASS NAME TIME_SECS PCT

Guy Harrison |