Entries in perl (2)

Sunday, July 31, 2011

A perl utility to improve exadata cellcli statistics

I’ve only recently had the opportunity to start experimenting with our relatively new Exadata quarter rack.  Like most Exadata newbies, I found the most immediately useful new tool to be the cellcli utility, which lets us get at storage cell metrics.  Kerry Osborne’s blog post on the top 10 cellcli commands is required reading when getting started.

Among cellcli’s annoyances are:

  1. You can’t join between METRICDEFINITION and METRICCURRENT or other metric tables.  So you can’t easily see the descriptions of the metric values you are viewing.
  2. CELLCLI only reports the values for the current storage cell, so you can’t get a global view across the cells.

I decided a good first project would be to write a utility that could display the metric definitions and the values together, and which would merge values from all of the storage cells.   Since my old friend perl is present on the storage cells, it wasn’t too hard to put together a tool to do this.   The script is here

Setup

 

You’ll want to have password-less SSH connectivity between cells for this to work.   Many of us will be familiar with doing this for RAC, but if you don’t, a quick primer is available at  http://blogs.translucentcode.org/mick/archives/000230.html.   I wrote this script to partially automate the procedure:

#!/bin/bash
# Helper script to set up passwordless SSH to each storage cell
if [ $# -eq 0 ]; then
    echo "Usage: $0 remoteHost [remoteHost ...]"
    exit 1
fi

ssh-keygen -t dsa
while [ $# -gt 0 ]; do
    remoteHost=$1
    scp ~/.ssh/id_dsa.pub ${remoteHost}:
    ssh $remoteHost 'mkdir -p .ssh; cat ~/id_dsa.pub >> ~/.ssh/authorized_keys; chmod 644 ~/.ssh/authorized_keys'
    shift
done

 

From a database node, run the script with the names of all the storage cells on the command line.  I have three cells so I issued the command like this:

[melbourne@exa1db02 ~]$ bash setup_ssh.sh exa1cel01 exa1cel02 exa1cel03

You will then be prompted with various SSH messages about passphrases and so on.  Just hit return or “y” when prompted, except when asked for passwords.  You’ll have to enter the password for each storage cell twice to get everything set up.  After that you should be able to ssh to the storage cells without being prompted for a password.

Usage

 

Running the command with no arguments will generate the following usage message:

 

[melbourne@exa1db02 ~]$ perl cellcli.pl
Usage perl cellcli.pl [options]
Options:
        --hosts -H      comma separated list of cell nodes
        --mask -m       print metrics starting with this string
        --desc -d       print descriptions of metrics
        --cells -c      print individual cell metrics
        --sci -s        Use scientific notation
        --stats -a      Print min/max/avgs
        --help -h       print this

Example:
        perl cellcli -m FC_IO_RQ_R -h exa1cel01,exa1cel02,exa1cel03

NB: utility assumes passwordless SSH from this host to the cell nodes
see
http://guyharrison.squarespace.com/blog/2011/7/31/aggregating-exadata-cellcli-statistics.html

 

The only compulsory argument is “--hosts”, which requires a comma-separated list of the cell nodes.  Most of the time you will want to use the “--mask” argument as well, which applies a filter to the metric names.  With no other arguments you’ll then get a list of each matching statistic and the sum of its values across all the cells:

[melbourne@exa1db02 ~]$ perl cellcli.pl --hosts exa1cel01,exa1cel02,exa1cel03 --mask FC_IO_RQ_R.*

Name                    Sum
----------------------- ------------
FC_IO_RQ_R               8.55164e+06
FC_IO_RQ_R_MISS               675223
FC_IO_RQ_R_MISS_SEC                0
FC_IO_RQ_R_SEC                   0.1
FC_IO_RQ_R_SKIP            1.742e+08
FC_IO_RQ_R_SKIP_SEC             33.6

Note that the mask is matched as a Unix regular expression, not an Oracle-style “%” wildcard - ‘.*’ matches anything.

The --desc argument adds the description (somewhat truncated) for each metric:

[image: output with --desc, showing a description column for each metric]

The --cells argument prints the values from each individual cell:

[image: output with --cells, showing a value column for each storage cell]

 

The --stats argument causes min, max and average values to be printed.  The average is the average of sums across cells, not the average of all values.  Some statistics will have more than one value in each cell (GD_IO_RQ_W_SM, for example), and for these the average is the average sum of values across cells.

[image: output with --stats, showing min/max/average columns]

You can use all the arguments together, but your output might be very wide!

Restrictions and caveats

 

When a metric has multiple object values they are simply summed.  That probably works OK for some statistics but maybe not for others.  For instance, the grid disk IO counters are broken down by the name of the grid disk within the cell:

[image: cellcli output showing grid disk IO counters broken down by grid disk name]

My utility simply sums all of those values:

[image: utility output showing the summed grid disk values]

 

 

Hope you find this of use.  Please let me know if you have any suggestions.

 

Wednesday, October 28, 2009

Performant Oracle programming:  perl

In the last installment, we looked at the fundamentals of efficient Oracle programming with C#.  In particular, we saw how to use bind variables and how to perform array processing.  Now let’s take a look at perl.

I have a strong affection for perl:  I started using perl somewhere in the early nineties and did some simple hacking to allow the Oracle interface of the day - oraperl - to use stored procedures.  My very first true performance monitoring tools were all written using perl.  Perl has always had good support for Oracle and it's possible to write very efficient and powerful Oracle database utilities in perl.

You can get the perl program that implements the examples in this posting here

Thanks for all the perl!

Anyone who uses perl with Oracle today owes a debt of thanks to the many people who have maintained the Oracle perl interface (DBD::Oracle) over the years.

Kevin Stock created the original perl 4 "oraperl" modules, which Tim Bunce used to create the modern DBD::Oracle.  Tim maintained it for many years, but handed it over to John Scoles of the Pythian group in 2006.  Pythian are now sponsoring the ongoing maintenance of DBD::Oracle and they do this without any particular expectation of financial reward. It  demonstrates what great guys and outstanding members of the Oracle community they are, and those of us using DBD::Oracle owe them a debt of gratitude. The Pythian hosted home page for DBD::Oracle is here.

Bind variables

As with most languages, it’s easier in perl to use literals than to use bind variables.  For instance, in the following snippet we issue the same SQL statement in a loop, but because we interpolate the literal value of the variable "$x_value" into the SQL text, Oracle will consider every statement to be unique.
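A minimal sketch of that literal-SQL pattern - the connect string and column name here are illustrative, not taken from the original screenshot:

```perl
# Anti-pattern: interpolating a literal produces a distinct SQL statement
# for every value, forcing Oracle to hard parse each one.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Oracle:orcl', 'scott', 'tiger',
    { RaiseError => 1, AutoCommit => 0 } );

foreach my $x_value ( 1 .. 1000 ) {
    # Each distinct $x_value creates a brand-new statement text
    my $sth = $dbh->prepare(
        "SELECT COUNT(*) FROM perldemo WHERE x_value = $x_value" );
    $sth->execute();
    my ($count) = $sth->fetchrow_array();
}
```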

 


To use bind variables, you simply insert a placeholder into your SQL text - either a "?" or a numbered parameter such as ":1" - and then assign it a value with the bind_param call.  Note that we prepare the statement only once, even though we bind and execute it many times.
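A sketch of the bind-variable version, using the same illustrative names:

```perl
# Bind variables: prepare once, then bind and execute many times.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Oracle:orcl', 'scott', 'tiger',
    { RaiseError => 1, AutoCommit => 0 } );

# The statement is parsed once; ":1" is a placeholder for the value
my $sth = $dbh->prepare(
    'SELECT COUNT(*) FROM perldemo WHERE x_value = :1' );

foreach my $x_value ( 1 .. 1000 ) {
    $sth->bind_param( 1, $x_value );    # assign a value to the placeholder
    $sth->execute();
    my ($count) = $sth->fetchrow_array();
}
```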

 

  

Using bind variables results in significant performance gains, both because of the reduction in parse time CPU and because of the reduced contention for library cache mutexes.  When I ran the script I was the only user on the database, so mutex contention was not an issue.  Even so, the reduction in parse time resulted in almost a 75% reduction in execution time.

 

Array Fetch

Array fetch is handled automatically by DBD::Oracle.   The RowCacheSize property of the database handle sets the size of the fetch array, which by default is set to 500.  

Perl supports a range of fetch methods:  fetchrow_array, fetchrow_arrayref, fetchrow_hashref and  fetchall_arrayref.  DBD::Oracle uses array processing regardless of the DBI method you use to fetch your rows.  

In the following example, we set the array fetch size, then process each row one at a time.  DBD::Oracle fetches rows in batches behind the scenes. 
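A sketch of that row-at-a-time loop (the connect string and column names are illustrative):

```perl
# Row-at-a-time fetch: DBD::Oracle batches the round trips automatically.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Oracle:orcl', 'scott', 'tiger',
    { RaiseError => 1 } );

$dbh->{RowCacheSize} = 1000;    # optional: override the default fetch array size

my $sth = $dbh->prepare('SELECT x_value, y_value FROM perldemo');
$sth->execute();

# Each call returns one row, but DBD::Oracle fetches rows from Oracle
# in RowCacheSize-sized batches behind the scenes
while ( my ( $x, $y ) = $sth->fetchrow_array() ) {
    # process the row
}
```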

 

You don’t really need to set RowCacheSize explicitly unless you think that 500 is an inappropriate array size - perhaps if the row size is very small or very large.

A casual look at the DBI APIs might suggest that you need to use the fetchall_arrayref method to exploit array processing.  For instance, in the following perl code, the contents of the PERLDEMO table are loaded into an array of rows in a single call.
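A sketch of that single-call fetch (connection details are illustrative):

```perl
# fetchall_arrayref loads every row into local memory in one call.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Oracle:orcl', 'scott', 'tiger',
    { RaiseError => 1 } );

my $sth = $dbh->prepare('SELECT x_value, y_value FROM perldemo');
$sth->execute();

# $rows is a reference to an array of rows; each row is an arrayref of columns
my $rows = $sth->fetchall_arrayref();
foreach my $row (@$rows) {
    my ( $x, $y ) = @$row;
    # process the row
}
```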

Fetching the data in this way actually doesn't change the size of the array requests to Oracle:  all it achieves is to load all the rows into local memory.  This is usually not a good idea for massive tables, since you may get a memory allocation error or starve other programs - including Oracle - for memory.   Even if you don't cause a memory shortage, the overhead of allocating the memory will usually result in slower processing than if you process the rows using fetchrow_array or one of the other row-at-a-time fetch calls. 

Array Insert

As with a lot of languages,  it's easy to fetch in arrays in perl, but slightly harder to insert with arrays.  The "natural" way of inserting rows is to bind each row into the INSERT statement and execute.   Here's that simple technique, which does not exploit the array interface:
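A sketch of that non-array insert (table and column names are illustrative):

```perl
# Row-at-a-time insert: one execute - and one round trip - per row.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Oracle:orcl', 'scott', 'tiger',
    { RaiseError => 1, AutoCommit => 0 } );

my $sth = $dbh->prepare(
    'INSERT INTO perldemo (x_value, y_value) VALUES (:1, :2)' );

foreach my $i ( 1 .. 1000 ) {
    $sth->bind_param( 1, $i );         # bind this row's values...
    $sth->bind_param( 2, $i * 10 );
    $sth->execute();                   # ...then a separate execute per row
}
$dbh->commit();
```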

 

The values for each row are bound, then the row is inserted with a separate execute call.  Each row is inserted in a separate call, and so every row requires a network round trip to the database and a distinct interaction with the Oracle server code.

 It's only a little bit more complex to bind an array:

We bind the arrays using the bind_param_array method, and the execute_array method performs the array insert.  The size of each batch can be adjusted by setting the ora_array_chunk_size property of the database handle.

As in all languages, using array insert is a very significant improvement.  In my tests, using the array methods reduced elapsed time by over 80%.

 

Conclusion

 Perl supports a very rich and mature interface to Oracle and it's certainly possible to write high-performance perl programs that interact with Oracle. 

Although it's usually not necessary to explicitly code array fetch,  you do need to explicitly code bind variables and array inserts and you should generally do so if you wish your perl script to interact efficiently with Oracle. 

There's lots of other performance features of the DBD::Oracle driver that I haven't covered in this basic introduction.  You can read up on all the capabilities of the driver in its documentation page on CPAN.