
Sunday, July 31, 2011

A perl utility to aggregate Exadata cellcli statistics

I’ve only recently had the opportunity to start experimenting with our relatively new Exadata 1/4 rack.  Like most Exadata newbies, I’ve found that the most immediately useful new tool is the cellcli utility, which lets us get at storage cell statistics.  Kerry Osborne’s blog post on the top 10 cellcli commands is required reading when getting started.

Among cellcli’s annoyances are:

  1. You can’t join between METRICDEFINITION and METRICCURRENT or the other metric tables, so you can’t easily see the descriptions of the metric values you are viewing.
  2. cellcli only reports values for the current storage cell, so you can’t get a global view across the cells.

I decided a good first project would be to write a utility that could display the metric definitions and values together, and that would merge values from all of the storage cells.   Since my old friend Perl is present on the storage cells, it wasn’t too hard to put together a tool to do this.   The script is here.

Setup

 

You’ll want password-less SSH connectivity from the database node to the storage cells for this to work.   Many of us will be familiar with setting this up for RAC, but if you’re not, a quick primer is available at  http://blogs.translucentcode.org/mick/archives/000230.html.   I wrote this script to partially automate the procedure:

# Helper script to setup passwordless SSH
if [ $# -eq 0 ]
then
    echo "Usage $0 remoteHost"
    exit 1
fi

ssh-keygen -t dsa
while [ $# -gt 0 ]; do
    export remoteHost=$1
    scp ~/.ssh/id_dsa.pub ${remoteHost}:
    ssh $remoteHost 'mkdir .ssh; cat ~/id_dsa.pub >>~/.ssh/authorized_keys;chmod 644 ~/.ssh/authorized_keys '
    shift
done

 

From a database node, run the script with the names of all the storage cells on the command line.  I have three cells so I issued the command like this:

[melbourne@exa1db02 ~]$ bash setup_ssh.sh exa1cel01 exa1cel02 exa1cel03

Then you will be prompted with various SSH messages about passphrases and so on.  Just hit return, or “y” if prompted, except when asked for passwords.  You’ll have to enter the password for each storage cell twice to get everything set up. After that you should be able to ssh to the storage cells without being prompted for a password.
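To confirm, run a trivial command on each cell from the database node; none of these should prompt for a password (cell names as in my example):

[melbourne@exa1db02 ~]$ for h in exa1cel01 exa1cel02 exa1cel03; do ssh $h hostname; done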

Usage

 

Running the command with no arguments will generate the following usage message:

 

[melbourne@exa1db02 ~]$ perl cellcli.pl
Usage perl cellcli.pl [options]
Options:
        --hosts -H      comma seperated lists of cell nodes
        --mask -m       print metrics starting with this string
        --desc -d       print descriptions of metrics
        --cells -c      print individual cell metrics
        --sci -s        Use scientific notation
        --stats -a      Print min/max/avgs
        --help -h       print this

Example:
        perl cellcli -m FC_IO_RQ_R -h exa1cel01,exa1cel02,exa1cel03

NB: utility assumes passwordless SSH from this host to the cell nodes
see
http://guyharrison.squarespace.com/blog/2011/7/31/aggregating-exadata-cellcli-statistics.html

 

The only compulsory argument is “--hosts”, which takes a comma-separated list of the cell nodes.    Most of the time you will want to use the “--mask” argument as well, which applies a filter to the metric names.   With no other arguments you’ll get a list of each matching statistic and the sum of its values across all the cells:

[melbourne@exa1db02 ~]$ perl cellcli.pl --hosts exa1cel01,exa1cel02,exa1cel03 --mask FC_IO_RQ_R.*

Name                    Sum
----------------------- ------------
FC_IO_RQ_R               8.55164e+06
FC_IO_RQ_R_MISS               675223
FC_IO_RQ_R_MISS_SEC                0
FC_IO_RQ_R_SEC                   0.1
FC_IO_RQ_R_SKIP            1.742e+08
FC_IO_RQ_R_SKIP_SEC             33.6

Note that the mask matches Unix regular expressions, not Oracle-style “%” wildcards; '.*' matches anything.
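For example, to aggregate the grid disk write request metrics across the same three cells (quoting the mask stops the shell from expanding it):

[melbourne@exa1db02 ~]$ perl cellcli.pl --hosts exa1cel01,exa1cel02,exa1cel03 --mask 'GD_IO_RQ_W.*'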

The --desc argument adds the description (somewhat truncated) for each metric:

[screenshot: cellcli.pl output with --desc, showing metric descriptions]

The --cells argument prints the values from each individual cell:

[screenshot: cellcli.pl output with --cells, showing per-cell values]

 

The --stats argument causes min, max and average values to be printed.  The average is the average of the per-cell sums, not the average of all individual values.  Some statistics will have more than one value in each cell (GD_IO_RQ_W_SM, for example), and for these the average is the average of the summed values across cells.

[screenshot: cellcli.pl output with --stats]

You can use all the arguments together, but your output might be very wide!

Restrictions and caveats

 

When a metric has multiple object values they are simply summed.  That probably works OK for some statistics, but maybe not for others.  For instance, the grid disk IO counters are broken down by the name of the grid disk within the cell:

[screenshot: per-grid-disk metric values from an individual cell]
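If you want to check the raw per-disk values on an individual cell, a CellCLI command along these lines will list them (GD_IO_RQ_W_SM is just an example metric name):

CellCLI> LIST METRICCURRENT WHERE name = 'GD_IO_RQ_W_SM'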

My utility simply sums all of those values:

[screenshot: cellcli.pl output summing the per-disk values]

 

 

Hope you find this of use.  Please let me know if you’ve got any suggestions…

 

Tuesday, July 5, 2011

MongoDB, Oracle and Toad for Cloud Databases

We recently added support for MongoDB in Toad for Cloud Databases, so I took the opportunity to write my first MongoDB Java program and take the Toad functionality for a test drive.

MongoDB is a non-relational, document-oriented database that is extremely popular with developers (see for instance this Hacker News poll).   Toad for Cloud Databases allows you to work with non-relational data using SQL, by normalizing the data structures and converting the SQL to the non-relational calls.

I wanted to get started by creating some MongoDB collections with familiar data, so I wrote a Java program that takes data out of the Oracle sample schema and loads it into Mongo as documents.  The program is here.

 

The key parts of the code are shown here:

   1: while (custRs.next()) { // For each customer
   2:     String custId = custRs.getString("CUST_ID");
   3:     String custFirstName = custRs.getString("CUST_FIRST_NAME");
   4:     String custLastName = custRs.getString("CUST_LAST_NAME");
   5:  
   6:     //Create the customer document 
   7:     BasicDBObject custDoc = new BasicDBObject();
   8:     custDoc.put("_id", custId);
   9:     custDoc.put("CustomerFirstName", custFirstName);
  10:     custDoc.put("CustomerLastName", custLastName);
  11:     // Create the product sales document 
  12:     BasicDBObject customerProducts = new BasicDBObject();
  13:     custSalesQry.setString(1, custId);
  14:     ResultSet prodRs = custSalesQry.executeQuery();
  15:     Integer prodCount = 0;
  16:     while (prodRs.next()) { //For each product sale 
  17:         String  timeId=prodRs.getString("TIME_ID"); 
  18:         Integer prodId = prodRs.getInt("PROD_ID");
  19:         String prodName = prodRs.getString("PROD_NAME");
  20:         Float Amount = prodRs.getFloat("AMOUNT_SOLD");
  21:         Float Quantity = prodRs.getFloat("QUANTITY_SOLD");
  22:         // Create the line item document 
  23:         BasicDBObject productItem = new BasicDBObject();            
  24:         productItem.put("prodId", prodId);
  25:         productItem.put("prodName", prodName);
  26:         productItem.put("Amount", Amount);
  27:         productItem.put("Quantity", Quantity);
  28:         // Put the line item in the salesforcustomer document 
  29:         customerProducts.put(timeId, productItem);
  30:         if (prodCount++ > 4) { // Just 5 for this demo
  31:             prodCount = 0;
  32:             break;
  33:         }
  34:     }
  35:     // put the salesforcustomer document in the customer document 
  36:     custDoc.put("SalesForCustomer", customerProducts);
  37:  
  38:     System.out.println(custDoc);
  39:     custColl.insert(custDoc);  //insert the customer 
  40:     custCount++;
  41:  
  42: }

Here’s how it works:

Lines   Description
1-4     We loop through each customer, retrieving the key customer details
7-10    We create a basic MongoDB document that contains the customer details
12      We create another MongoDB document that will contain all the product sales for the customer
16-21   We fetch the data for an individual sale for that customer from Oracle
23-27   We create a document for that single sale
29      We add the sale to the document containing all the sales
36      We add all the sales to the customer document
39      We insert the customer document into the collection

 

The MongoDB API is very straightforward; much easier than the equivalent APIs for HBase or Cassandra.
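For reference, the excerpt above assumes a few objects created earlier in the program (custRs, custSalesQry, custColl, custCount).  A minimal sketch of that setup, using the Mongo Java driver of the day and the Oracle SH sample schema, might look like the following; the hostnames, credentials, and database and collection names are my assumptions, not the original code:

// Sketch only: the setup the excerpt assumes.
import java.sql.*;
import com.mongodb.*;

// Oracle side: the SH sample schema
Connection oraConn = DriverManager.getConnection(
    "jdbc:oracle:thin:@orahost:1521:orcl", "sh", "password");
Statement custStmt = oraConn.createStatement();
ResultSet custRs = custStmt.executeQuery(
    "SELECT cust_id, cust_first_name, cust_last_name FROM customers");
PreparedStatement custSalesQry = oraConn.prepareStatement(
    "SELECT s.time_id, s.prod_id, p.prod_name, s.amount_sold, s.quantity_sold"
    + " FROM sales s JOIN products p ON p.prod_id = s.prod_id"
    + " WHERE s.cust_id = ?");

// Mongo side, using the 2011-era driver classes
Mongo mongo = new Mongo("mongohost");                  // Mongo server
DB db = mongo.getDB("sh");                             // target database
DBCollection custColl = db.getCollection("customers"); // target collection
int custCount = 0;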

When we run the program, we create JSON documents in MongoDB that look like this:

{ "_id" : "7" , "CustomerFirstName" : "Linette" , "CustomerLastName" : "Ingram" ,
    "SalesForCustomer" : {
        "2001-05-30 00:00:00" : { "prodId" : 28 , "prodName" : "Unix/Windows 1-user pack" , "Amount" : 205.76 , "Quantity" : 1.0} ,
        "1998-04-18 00:00:00" : { "prodId" : 129 , "prodName" : "Model NM500X High Yield Toner Cartridge" , "Amount" : 205.48 , "Quantity" : 1.0}
    }
}
{ "_id" : "8" , "CustomerFirstName" : "Vida" , "CustomerLastName" : "Puleo" ,
    "SalesForCustomer" : {
        "1999-01-27 00:00:00" : { "prodId" : 18 , "prodName" : "Envoy Ambassador" , "Amount" : 1726.83 , "Quantity" : 1.0} ,
        "1999-01-28 00:00:00" : { "prodId" : 18 , "prodName" : "Envoy Ambassador" , "Amount" : 1726.83 , "Quantity" : 1.0} ,
        "1998-04-26 00:00:00" : { "prodId" : 20 , "prodName" : "Home Theatre Package with DVD-Audio/Video Play" , "Amount" : 608.39 , "Quantity" : 1.0} ,
        "1998-01-19 00:00:00" : { "prodId" : 28 , "prodName" : "Unix/Windows 1-user pack" , "Amount" : 216.99 , "Quantity" : 1.0} ,
        "1998-03-19 00:00:00" : { "prodId" : 28 , "prodName" : "Unix/Windows 1-user pack" , "Amount" : 216.99 , "Quantity" : 1.0}
    }
}

 

Toad for Cloud “renormalizes” the documents so that they resemble something we might use in a more conventional database.  In this case, Toad creates two tables from the Mongo collection: one for customers, and one containing the sales for each customer.   You can rename the auto-generated foreign keys and the sub-table name to make this a bit clearer, as in the example below:

 

[screenshot: the renamed customer and sales tables in Toad for Cloud]

 

We can see the relationships more clearly in the .NET client by using Toad’s visual query builder (or we could have used the database diagram tool):

 

[screenshot: the customer and sales tables in Toad’s visual query builder]

 

MongoDB has a pretty rich query language, but it’s fairly mysterious to those of us who are used to SQL, and it’s certainly not as rich as SQL.  Using Toad for Cloud, you can issue ANSI-standard SQL against your MongoDB tables and quickly browse or perform complex queries.  Later this year, this Mongo support will emerge in some of our commercial data management tools, such as Toad for Data Analysts and our soon-to-be-announced BI tools.
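For example, against the renamed tables shown earlier, a join across the customer and sales tables might look like this (the table and column names follow my renamed example; the auto-generated names in your own environment will differ):

SELECT c.CustomerLastName, s.prodName, s.Amount
  FROM customer c
  JOIN customer_sales s ON s.customer_id = c._id
 WHERE s.Amount > 500;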

 

[screenshot: an SQL query against MongoDB data in Toad]

Wednesday, June 8, 2011

A first look at Oracle on Amazon RDS

It's been possible for some time to run Oracle within the Amazon cloud.  I wrote an introduction to Oracle on EC2 back in 2008, and we use EC2 pretty extensively here at Quest for testing and deployment.   Up until now, Oracle has been made available within EC2 instances, which appear for all purposes to be remote hosts.  Recently, Amazon made Oracle available in a more directly cloud-based fashion, as one of the options in its Relational Database Service (RDS).  Using RDS, Amazon takes complete responsibility for database management, providing a more complete Database as a Service (DBaaS) offering, similar to what Microsoft offers with SQL Azure.   Amazon handles installation, upgrades, backups and so forth.  In theory, you can use the database without needing to do any administrative work.

Indeed, Amazon takes on so much responsibility that long-time DBAs like me are unlikely to be really happy; the very notion of being unable to connect to the operating system, for instance, is disturbing, and RDS is currently missing support for many of our favourite tools.

However, I wanted to give it a test drive, so here are some screenshots and observations to give you a feel for it.

Creating the Oracle RDS database

 

To use any of the Amazon cloud facilities, you need an Amazon AWS account.  Once you have that, you create a new RDS database from the AWS console, where you can launch a new MySQL or Oracle database:

[screenshot: launching a new RDS database from the AWS console]

You then specify the parameters for the new instance.  At the time of writing, some of these screens clearly had not been updated from the older MySQL options.  Here we can specify:

  • Type of license.   If you have a site license you can use that (as I have below).  If not, you can pay Amazon by the hour, and Amazon presumably kicks some of that back to Oracle.
  • Oracle version.  Note also that you can ask to have the version updated for you with patches automatically – quite cool.
  • Instance class.  This is the size of the virtual machine and determines memory and CPU availability.  These match the equivalent EC2 instance types (see http://aws.amazon.com/ec2/#instance) and vary from 1 core with 1.7GB to 8 (faster) cores with 64GB.   Charging varies accordingly, of course.
  • DB instance identifier.  At first you might think this is the database service name, but NO, it’s an Amazon identifier.  You specify the service name on the next page.
  • Username and password of the DBA account you’ll use.

[screenshot: RDS instance options]

On the next page we specify the database (e.g. Service/SID) name and the port.  NOTE that at the moment the default port is 3306, which is the MySQL listener port – you should change it if you want a more familiar listener port like 1521.

[screenshot: database name and port settings]

 

Click OK a couple more times, and your database will start up; it takes 5 minutes or so to get running.

Connecting to the database

 

Before you can connect, however, you’ll need to create a security group, which essentially defines the Amazon firewall for your instance.   Below, I’m configuring it to let anything through (not ideal, but I travel a lot, so I can’t predict my IP address range).  The CIDR/IP is a mask that defines which IP addresses will be allowed to connect to the database instance.

[screenshot: security group configuration]

Now we can connect.  We need to know the hostname, which is shown as the “Endpoint” on the DB Instance description page.    The port and service name are also here:

[screenshot: the DB instance description page showing the Endpoint]

We could create a TNSNAMES entry using those details, or connect via SQL*Plus in the form username/password@host:port/Service.
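For example, with details like those above (the endpoint, credentials and service name here are hypothetical placeholders):

sqlplus admin/secret@mydb.example.us-east-1.rds.amazonaws.com:1521/ORCL

or the equivalent TNSNAMES entry:

ORCLRDS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = mydb.example.us-east-1.rds.amazonaws.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ORCL))
  )

Below, I specify these details directly into TOAD: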

 

[screenshot: connection details entered in TOAD]

Restrictions

 

As easy as it is to get started with RDS, there are a lot of restrictions.  Most significantly:

  • You can’t connect to the OS.
  • Datafiles must be Oracle Managed Files.  No filesystem specifications are allowed, and ASM is not supported.
  • Some commands are not supported, including ALTER SYSTEM.  However, there are PL/SQL packages that provide a lot of the common facilities, such as switching logs, checkpointing, tracing and killing sessions (see the sketch after this list).
  • You can only access the alert log and trace files via an external table interface.
  • No OEM!!!  And for that matter, monitoring products such as Quest’s own Spotlight and Foglight cannot be installed (due to the above restrictions).   Even some parts of TOAD won’t function, since they rely on forbidden commands.
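For example, where you would normally issue ALTER SYSTEM commands, RDS supplies PL/SQL equivalents along these lines (procedure names as I recall them from the RDS manual; check the current documentation before relying on them):

-- Instead of ALTER SYSTEM SWITCH LOGFILE
EXEC rdsadmin.rdsadmin_util.switch_logfile;
-- Instead of ALTER SYSTEM CHECKPOINT
EXEC rdsadmin.rdsadmin_util.checkpoint;
-- Instead of ALTER SYSTEM KILL SESSION
EXEC rdsadmin.rdsadmin_util.kill(sid => 123, serial => 456);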

The idea with RDS and these missing features is that “you ain’t going to need it”: the database is self-managing, and these DBA utilities simply aren’t required.   For more details check out this part of the RDS manual.

 

Changing the database CPU and memory

 

One of the great things about RDS is that you can resize the instance.  If  you want more memory or CPU, you can use the modify tab to change the specification:

[screenshot: the Modify tab, changing the instance class]

This does require a reboot, which happens either during the maintenance window or – if you so specify – immediately.   Still, it’s much easier than achieving the same result on EC2, where it’s almost impossible.

Monitoring

 

Although there’s no OEM and no way to run Spotlight or other third-party monitoring tools, there is a very basic monitoring screen.

[screenshot: the basic RDS monitoring screen]

Parameters

Parameters are managed through parameter groups, which assign values dynamically.  For instance, in the default parameter group, the memory target is set to 75% of total system memory, so if you reboot the instance into a higher class, the memory allocation will change.  You can change the values in a parameter group, and you can create your own groups.
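The dynamic values are written as formulas against instance attributes.  I’d expect the default memory target to be expressed something like this (the formula syntax is from the RDS parameter group documentation; the exact default expression is my guess):

memory_target = {DBInstanceClassMemory*3/4}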

[screenshot: parameter group settings]

Conclusion

 

End to end, it probably takes about 10 minutes to get set up once you have your account details.  Not bad.  And Amazon takes care of backups and the like.

However, it’s probably not something I’ll ever do again, because I can’t really cope with the idea of not being able to go directly to the alert log, trace files, etc.  And I definitely don’t like not being able to use tools like Spotlight or even OEM.   Amazon provides workarounds for a lot of this stuff – see this page for instance – but I’m too used to the command line.  Still, it’s worth knowing how this all works, and – who knows – it might be handy one day when I want to quickly set up a 64GB SGA database.

Friday, May 20, 2011

RJDBC 2.0 performance improvements

Hopefully this will be the last RJDBC posting that I make for a while :-).

Simon Urbanek has made some fixes to RJDBC which resolve the performance issues I referred to in my last post.  As you can see below, these fixes have led to some pretty dramatic performance improvements:

[chart: elapsed times with the RJDBC 2.0 fixes]

Furthermore, RJDBC is substantially easier to install than ROracle, and is of course portable across different databases.

If you’re really concerned with RJDBC performance and have memory to burn, you can improve performance somewhat by selecting all rows in a single internal fetch, like this:

jdata2<-fetch(dbSendQuery(jcon,sqltext),n=rowsToFetch);
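A self-contained version of that fetch might look like this; the driver jar path, connection details and table name are placeholders, not my actual test setup:

library(RJDBC)
drv <- JDBC("oracle.jdbc.OracleDriver", "/path/to/ojdbc6.jar")
jcon <- dbConnect(drv, "jdbc:oracle:thin:@myhost:1521:orcl", "scott", "tiger")
sqltext <- "SELECT * FROM testtable"  # placeholder table name
rowsToFetch <- 1000000                # upper bound on rows; needs memory to match
jdata2 <- fetch(dbSendQuery(jcon, sqltext), n = rowsToFetch)
dbDisconnect(jcon)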

This improves performance, though ROracle still has the advantage:

[chart: single-fetch RJDBC vs ROracle elapsed times]

You can get the new RJDBC here.

Tuesday, May 10, 2011

ROracle vs RJDBC

In this post I looked at hooking up the R statistical system to Oracle, for either analysing database performance data or analysing other data that happened to be stored in Oracle.   A reader asked me if I’d compared the performance of RJDBC – which I used in that post – with the ROracle package.  I hadn’t, but now I have, and I found some pretty significant performance differences.

RJDBC hooks up pretty much any JDBC data source to R, while ROracle uses native libraries to mediate the connection.  RJDBC is probably easier to start with, but ROracle can be installed pretty easily, provided you create the client library entries first.  So for me, after downloading the ROracle package, the install looked something like this (run it as root):

cd $ORACLE_HOME/bin
genclntsh    # generate the Oracle client shared library
genclntst    # generate the Oracle client static library

R CMD INSTALL --configure-args='--enable-static' /root/Desktop/ROracle_0.5-9.tar.gz
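Once it’s installed, usage parallels the RJDBC example.  A sketch (the positional connection arguments follow the old DBI-style ROracle interface, and the table name is a placeholder):

library(ROracle)
drv <- dbDriver("Oracle")
con <- dbConnect(drv, "scott", "tiger", "orcl")  # user, password, TNS alias
rdata <- fetch(dbSendQuery(con, "SELECT * FROM testtable"), n = -1)  # n = -1 fetches all rows
dbDisconnect(con)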

It was pretty obvious right away that RJDBC was much slower than ROracle.  Here’s a plot of elapsed times for variously sized datasets:

[chart: elapsed times for various dataset sizes, RJDBC vs ROracle]

The performance of RJDBC degrades fairly rapidly as the size of the data being retrieved from Oracle increases,  and would probably be untenable for very large data sets. 

The RJDBC committers do note that RJDBC performance will not be optimal:

The current implementation of RJDBC is done entirely in R, no Java code is used. This means that it may not be extremely efficient and could be potentially sped up by using Java native code. However, it was sufficient for most tasks we tested. If you have performance issues with RJDBC, please let us know and tell us more details about your test case.

The default array size used by RJDBC is only 10, while the default for ROracle is 500… could this be the explanation for the differences in performance?

You can’t change the default RJDBC fetch size (at least, I couldn’t work out how to), but you can change ROracle's.  Here’s a breakdown of elapsed times for RJDBC and ROracle using their defaults, and for ROracle using a fetch size of 10:

[chart: elapsed times for RJDBC and ROracle defaults, and ROracle with a fetch size of 10]

As you can see, the issue does not seem to be the array size alone.   I suspect the overhead of building up the data frame from the result set in RJDBC is where the major inefficiency occurs.  Increasing the array fetch size might reduce the impact of this, but the array fetch size alone is not the cause of the slow performance.

Conclusion

The current implementation of RJDBC is easy to install and fairly portable, but doesn’t provide good performance when loading large data sets.   For now, ROracle is the better choice.   

Scripts:

Create test table (SQL)

scalability testing (R script)

Comparison of elapsed times and array size (R script).