Next Generation
Databases: NoSQL,
NewSQL and Big Data

Buy at Amazon
Buy at Apress

Search

Oracle Performance Survival Guide

Buy It
Read it on Safari
Scripts and Examples
Sample Chapter

                                    

 MySQL Stored procedure programming

Buy It
Read it on Safari
Scripts and Examples 

                                                

Friday
Jan262007

Grid control therapy

I've had a lot of bad experiencing installing and managing the OEM grid control.  Today has been the worst day ever and - as an alternative to self-mutilation - I decided that I would document my woes as they occur as a sort of therapy

<p>oem blog</p>  

Here's the background to my latest woes:

On Monday I realized that the database that holds my test grid control repository was severly corrupt: a bunch of ora-600s with internal codes that were associated with rollback segment corruption.  Since the DB had no "real" data and I was lacking a recent backup I decided to rebuild the (RAC cluster) database.

The rebuild was no drama, but since this DB had my grid control repository I decided I'd better re-install grid control as well.  That's when everything went south.

The installation would proceed normally without errors until it reported errors configuring the agent. 

Logs showed that the agent was installed apparently OK,but that the management server was no-where to be found.  opmn status showed every other friggin process but not even an entry for the OMS.  The SYSMAN schema had not even been properly installed.  Somehow, installing the OMS for the second time was screwing things up.

[oracle@mel601416 ~]$ emctl status oms
Oracle Enterprise Manager 10g Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation.  All rights reserved.
Oracle Management Server is not functioning because of the following reason:
Unexpected error occurred. Check error and log files.

There's no trace files indicating OMS has ever really started (No emoms.trc for instance).

The only logs are emctl logs - created when I do a status request - with AgentStatus.pm: Unknown command encountered.

So next I tried tearing everything down and installing grid control with a new database, thinking that surely this would avoid the apparent "I don't need to install a DB" problem.  No luck.  No database created - no attempt to start OMS.

So I tried renaming the /etc/oraInst.loc file so my installer was unaware of all previous installs. Sigh.  Same result.   

I spent many hours scouring technet, metalink and googling.... eventually found http://forums.oracle.com/forums/thread.jspa?threadID=337544&start=15&tstart=0 in which a bunch of poor souls who have encountered this problem commiserate.  Suggestions from there include:

     
  • Remove any 'legacy' listeners (9i)
  •  
  • This mysterious sequence:

echo `find / -name libdb.so.2 -print` >> /etc/ld.so.conf
find . -name libdb.so.2 -print
.......
vi /etc/ldso.conf
- add result from find
ldconfig -v  (run as root)                   

     
  • clean up /etc/hosts (remove ipv6 and check hostname is exactly right)
  •  
  • turn off seLinux
  •  
  • remove all symlinks in the installation directory path
     

The last one really rang a bell with me, since only a few weeks ago I had in fact symlinked the oracle home to a new filesystem.  So....

Hooray!  Avoiding the symlinks allowed the OMS configuration to proceed.  Still had many hours of trying to install:  first had to remove SYSMAN and MGMT_VIEW schemas (see metalink note 358627.1).  Then I had mysterious communication failures during the final phase of opmn setup.  Sigh.  Being as it was close to midnight I blew it all away in the hope that a straight forward install would now work and (at about 1am) .... Kapla!  (Klingon for Victory: see http://www.khemorex-klinzhai.de/e/Hol/). 

Alls well that ends well?

This isn't the first time I've struggled with grid control install. My original attempt to install on a windows system never succeeded and while my previous linux based control worked fine most of the time, I still must have spend many hours fiddling with configuration files to make everything talk together properly.

I'm hardly an unbiased source - our Spotlight and Foglight products are sometimes seen to be competitive so you might want to take my opinions with a grain of salt.  However, I've become convinced that while Grid control and OEM undoubtedly reduce DBA overheads (though I think in some areas - particularly diagnostics and RAC OEM is way short of acceptable) the overhead of managing and configuration OEM/Grid is just too high.  And the reason is that the OEM stack is just way too complex a solution for the task at hand.  If you do a opmnctl status you'll see a list of at list eight seperate entities that interact to perform OMS services - and I bet that very few DBAs know what each of them are.   And that's before you add in the agents and internal web apps that perform the various OEM functions.

Compare the OEM/Grid implementation to the far simpler MySQLs LAMP based solution: the number of moving parts in the MySQL soluiton are far less than in OEM, and yet it appears to provide virtually the same functionality (for the DBA at least).

I can't help thinking that in the future every Oracle DBA is going to need to be an expert in the OEM software stack - and consequently an expert in J2EE, Apache, OC4J as well as OEM specifically - to be able to manage enterprise deployments. That's good news for the Oracle DBA job market - as automation reduces the overhead of managing the database, the overhead of managing the automation itself keeps us all in work :-).


 

Thursday
Jan182007

Using EXPLAIN EXTENDED to see view query rewrites

At the MySQL Mini Conference in Sydney this week we discussed how to use EXPLAIN EXTENDED to view the rewrites undertaken by the MySQL optimizer.  IN particular, to see if MySQL performs a merge of the query into the view definition, or if it creates a temporary table.

It can be tricky to optimize queries using views, since it's often hard to know exactly how the query will be resovled - will MySQL push merge the text of the query and the view, or will it use a temporary table containing the views result set and then apply the query clauses to that?

In general, MySQL merges query text except when the view definition includes a GROUP BY or UNION.  But to be sure we can use EXPLAIN EXTENDED.  This also helps when we get confusing output in the EXPLAIN output.

For instance if we have a view definition like this:

 

CREATE VIEW user_table_v AS
     SELECT *
       FROM information_schema.tables ist
        WHERE table_type='BASE TABLE';

and try and explain a query like this:

 

explain select * from user_table_v WHERE table_schema='mysql'\G

We get output like this, which might be difficult to interpret unless we know the view defition:

 

*************************** 1. row ***************************
          id: 1
  select_type: SIMPLE
        table: ist
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
     filtered: 100.00
        Extra: Using where

Note the table "ist",  only by looking at the view definition can we interepret this.  But if we do an EXPLAIN EXTENDED followed by a SHOW WARNINGS we see the exact text:

*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `ist`.`TABLE_NAME` AS `TABLE_NAME` from `information_schema`.`tables` `ist` where ((
`ist`.`TABLE_SCHEMA` = _utf8'mysql') and (`ist`.`TABLE_TYPE` = _utf8'BASE TABLE'))
1 row in set (0.00 sec)

And from this we can see that MySQL did indeed merge the WHERE clauses of both the query and the view definition.

If we look at the output for a view like this:

CREATE VIEW table_types_v AS
       SELECT table_type,count(*)
         FROM information_schema.tables ist
        GROUP BY table_type;

The we see the following output, in which we can see that MySQL created a temporary table and then applied the WHERE clause from the query:

*************************** 1. row ***************************
          id: 1
  select_type: PRIMARY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Impossible WHERE noticed after reading const tables
*************************** 2. row ***************************
          id: 2
  select_type: DERIVED
        table: ist
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
     filtered: 100.00
        Extra: Using temporary; Using filesort
2 rows in set, 1 warning (0.00 sec)

*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `table_types_v`.`table_type` AS `table_type`,`table_types_v`.`count(*)` AS `count(*)
` from `mysql`.`table_types_v` where (`table_types_v`.`table_type` = _utf8'BASE TABLE')
1 row in set (0.00 sec)

EXPLAIN EXTENDED is an invaluable tool for tuning SQL statements, and even more so when working with views.

Wednesday
Jan032007

D.I.Y. MySQL 5.1 monitoring

I wrote recently about using events and the new processlist table in MySQL 5.1 to keep track of the number of connected processes.  Although having the PROCESSLIST available as an INFORMATION SCHEMA table is usefull, it seemed to me that having SHOW GLOBAL STATUS exposed in a similar fashion would be far more useful.  So at the MySQL UC last year, I asked Brian Aker if that would be possible.  I know how many enhancement requests MySQL has to deal with, so I was really happy to see that table appear in the latest 5.1 build (5.1.14 beta). 

This table, together with the EVENT scheduler, lets us keep track of the values of status variables over time without having to have any external deamon running.  This won't come anywhere near to matching what MySQL have made avalable in Merlin, but still could be fairly useful.  So lets build a simple system using events to keep track of "interesting" status variables....

I also thought I’d take the opportunity of working with the MySQL Workbench to design the tables involved.   Unfortunately the product is effectively unusable on my system until at least rc7 (due when?) and b/c it only support windows, I’m not up for a download the source and compile.  Oh well.  I used our own (Quest Software's) Toad Data Modeller instead.   

The idea is to schedule some events that trap (snapshot) the contents of the PROCESSLIST and GLOBAL_STATUS variables into tables that can be used to track trends and monitor performance in a simple way.  I created three tables to hold the data:

Toaddatamodeller_2

  1. GHSNAP_STATUS_VARIABLES lists the variables names we consider "interesting".  If the variable_name is listed here and CAPTURE_FLAG=1, then we will capture the value of the variable.  VARIABLE_SOURCE indicates the table from which the data is obtained - I only implemented GLOBAL_VARIABLES for now
  2. GHSNAP_SNAPSHOTS contains one row for each snapshot of the table we take.  The UPTIME column is used to detect server restarts so we don't calculate nonsense deltas or rates
  3. GHSNAP_SNAPSHOT_VALUES contains one row for each variable for each snapshot.  We capture the raw value of the variable, its change since the last snapshot and the rate of change per second.

This file installs the tables, stored procedures and events.  You may have to run it from the MySQL query browser to avoid errors caused by the command line client not processing DELIMITER statements properly.  The script creates a new database GH_SNAPSHOTS.  Three stored procedures are created:

  1. ghsnap_populate_variables creates default entries in the GHSNAP_STATUS_VARIABLES table and is run only during installation.  By default I set it up to capture all the INNODB%, COM% and QCACHE% variables but you might want to change the CAPTURE_FLAG for those you do/don't want.
  2. ghsnap_take_snapshot takes a snapshot of GLOBAL_STATUS and stores it in the snapshot tables.  It also works out if the server has been restarted and if not, calculates rates and deltas.
  3. ghsnap_delete_snapshots deletes all snapshots more than a certain number of days old.

There's two events defined:

  1. ghsnap_take_snap_event runs (by default) every five minutes and simply executes ghsnap_take_snapshot.
  2. ghsnap_delete_snap_event runs every 5 hours (by default) and deletes snapshots more than 1 day old (you can edit the installation script to change this default).

There's some obvious additional things that we could do with this (optimze deletes via partitioning, calculate ratios, capture other state, etc) but this probably has some real value already.  Now I can issue queries such as this:

select snapshot_id,snapshot_timestamp ,
       sum( case variable_name when 'COM_COMMIT' then variable_value_rate end
          ) commit_ps,
       sum( case variable_name when 'COM_SELECT' then variable_value_rate end
          ) select_ps,
       sum( case variable_name when 'COM_INSERT' then variable_value_rate end
          ) insert_ps,
       sum( case variable_name when 'COM_DELETE' then variable_value_rate end
          ) delete_ps,
       sum( case variable_name when 'COM_UPDATE' then variable_value_rate end
          ) update_ps
  from ghsnap_snapshots join ghsnap_snapshot_values using (snapshot_id)
where snapshot_id>70
group by snapshot_id,snapshot_timestamp

To view the activity on my server.  Using a reporting or charting tool I can generate usefull charts, etc.  For instance, using the BIRT module in eclipse I created the following chart showing my SQL execution rates:

Birt

Nothing earth-shattering but still useful and - with the event scheduler, stored procedures and the new INFORMATION_SCHEMA tables we don't need any external infrastructure to capture this information. 

We'll probably implement something like this in the next version of our  freeware Spotlight on MySQL as well as better diagnostics on replication and exploiting the fact that in 5.1 you can get SELECT access to logs.  Feel free to check it out.   

P.S. don't forget that to set EVENT_SCHEDULER=1 to enable events on your 5.1 server.  And watch out on Windows where the event scheduler in 5.1.14 still seems a bit unstable.

Wednesday
Sep272006

10g tracing quick start

Oracle’s released a few new facilities to help with tracing in 10g,  here’s a real quick wrap up of the most significant:

Using the new client identifier

You can tag database sessions with a session identifier that can later be used to identify sessions to trace.  You can set the identifier like this:

begin
    dbms_session.set_identifier('GUY1');
end;


You can set this from a login trigger if you don’t have access to the source code.  To set trace on for a matching client id, you use DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE:

BEGIN

   DBMS_MONITOR.client_id_trace_enable (client_id      => 'GUY1',
                                        waits          => TRUE,
                                        binds          => FALSE
                                       );
END;

You can add waits and or bind variables to the trace file using the flags shown.

Tracing by Module and/or action

Many Oracle-aware applications set Module and action properties and you can use these to enable tracing as well.  The serv_mod_act_trace_enable method allows you to set the tracing on for sessions matching particular service, module, actions and (for clusters) instance identifiers.  You can see current values for these usng the following query:

SELECT DISTINCT instance_name, service_name, module, action
           FROM gv$session JOIN gv$instance USING (inst_id);



INSTANCE_NAME    SERVICE_NA MODULE                         ACTION

Wednesday
Sep272006

10g time model query

Joining the 10g time model to the traditional wait interface views and taking advantage of the wait_class data is something most of us have probably done.  Here's my standard queries that do that thing....

fCOLUMN wait_class format a20
COLUMN name format a30
COLUMN time_secs format 999,999,999,999.99
COLUMN pct format 99.99

SELECT   wait_class, NAME, ROUND (time_secs, 2) time_secs,
         ROUND (time_secs * 100 / SUM (time_secs) OVER (), 2) pct
    FROM (SELECT n.wait_class, e.event NAME, e.time_waited / 100 time_secs
            FROM v$system_event e, v$event_name n
           WHERE n.NAME = e.event AND n.wait_class <> 'Idle'
                 AND time_waited > 0
          UNION
          SELECT 'CPU', 'server CPU', SUM (VALUE / 1000000) time_secs
            FROM v$sys_time_model
           WHERE stat_name IN ('background cpu time', 'DB CPU'))
ORDER BY time_secs DESC;

Which generates CPU and wait event times broken down to the event name:

WAIT_CLASS           NAME                                     TIME_SECS    PCT