Entries in Oracle (38)

Tuesday, September 27, 2011

Configuring Exadata flash as grid disk

The default, or at least a very common, configuration for Exadata is to configure all the flash as Exadata Smart Flash Cache (ESFC). This is a simple and generally performant configuration, but it won't be the best choice in all cases. In particular, if you have a table that is performance critical, and it could fit in the flash storage you have available, you might be better off configuring some of your flash as grid disk, creating an ASM disk group from that, and putting the table there.

Here’s the procedure:

1. Drop the flash cache, create a new flash cache of a smaller size, then create the grid disks from the unallocated space.  These CellCLI commands do that:

CellCLI> drop flashcache
Flash cache exa1cel01_FLASHCACHE successfully dropped
CellCLI> create flashcache all size=288g
Flash cache exa1cel01_FLASHCACHE successfully created
CellCLI> create griddisk all flashdisk prefix=ssddisk

There's 384G of flash on each storage cell, so the above commands create about 96G of SSD grid disk per cell.  Run those commands on each cell node, perhaps by using the dcli command (see this post for an example).
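If you'd rather not log on to each cell individually, the same thing can be scripted from a database node with dcli; a minimal sketch, assuming passwordless SSH to the cells and the cell names used elsewhere on this blog (userid is a placeholder for a suitable cell login):

$ dcli -c exa1cel01,exa1cel02,exa1cel03 -l userid --serial \
    "cellcli -e drop flashcache; \
     cellcli -e create flashcache all size=288g; \
     cellcli -e create griddisk all flashdisk prefix=ssddisk"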

2. The above procedure will create disks named in the format o/cellIpAddress/ssddisk_FD_*_cellnode.  Log into an ASM instance and issue the following command to create a disk group from those disks:

SQL> create diskgroup DATA_SSD normal redundancy disk 'o/*/ssddisk*'
       attribute 'compatible.rdbms'='11.2.0.0.0',
                 'compatible.asm'='11.2.0.0.0',
                 'cell.smart_scan_capable'='TRUE',
                 'au_size'='4M';

Alternatively, you can use Database Control for the ASM instance to create the new disk group.  Your new flash disks should show up as candidate disks.
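To actually make use of the new disk group, you'd create a tablespace on it and move your performance-critical table there. A hypothetical sketch (the tablespace, table and index names are invented for illustration):

SQL> create tablespace flash_ts datafile '+DATA_SSD' size 10g;
SQL> alter table hot_table move tablespace flash_ts;
SQL> alter index hot_table_pk rebuild tablespace flash_ts;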

The relative performance of flash grid disks vs. the flash cache on Exadata is similar to what I've seen using the database flash cache: placing an object directly on flash is faster than using the cache, although the cache is very effective.  Here are the results for 200,000 primary key lookups across 1,000,000 possible primary keys:

[Chart: elapsed times for 200,000 primary key lookups, flash grid disk vs. flash cache]

Tuesday, September 27, 2011

Clearing the Exadata smart flash cache using dcli

I've been doing some performance benchmarks on our Exadata box, focusing specifically on the performance of the smart flash cache.  I found that even if I switched the CELL_FLASH_CACHE storage setting to NONE, the flash cache would still keep cached blocks in flash, and would therefore give me artificially high values for the "cell flash cache read hits" statistic when I set CELL_FLASH_CACHE back to DEFAULT or KEEP.  What I needed was a way to flush the Exadata flash cache.
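For reference, switching the storage setting looks like this (the table name here is hypothetical):

SQL> alter table benchmark_table storage (cell_flash_cache none);
SQL> alter table benchmark_table storage (cell_flash_cache default);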

Unfortunately there doesn’t seem to be a good way to flush the flash cache – no obvious CELLCLI command.   Maybe I’ve missed something, but for now I’m dropping and recreating the flash cache before each run.

Luckily the dcli command lets me drop and recreate the flash cache on each cell directly from the database node, and it even sets up passwordless connections.  Here's how to do it.

Firstly, create a script that will drop and recreate the flash cache for a single cell:

$ cat flushcache.sh
cellcli <<!
drop flashcache;
create flashcache all;
!

Now, use dcli to execute that on each cell node (I have three, named exa1cel01, exa1cel02 and exa1cel03):

$ dcli -c exa1cel01,exa1cel02,exa1cel03 --serial -k -l userid -x flushcache.sh

The "-k" option copies the ssh key to the cell nodes, which means that after the first execution you'll be able to do this without typing in the password for each cell node.  The "--serial" option makes each command happen one after another rather than all at once; you probably don't need this…
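To confirm that the flash cache came back on every cell, you can check from the database node too; a quick sketch, with the same hosts and login as above:

$ dcli -c exa1cel01,exa1cel02,exa1cel03 -l userid "cellcli -e list flashcache detail"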

Anyone know a better way to flush the Exadata flash cache?

Sunday, July 31, 2011

A perl utility to improve Exadata CellCLI statistics

I've only recently had the opportunity to start experimenting with our relatively new Exadata 1/4 rack.  Like most Exadata newbs, I've found the most immediately useful new tool to be the cellcli utility, which lets us get at the storage cells.  Kerry Osborne's blog post on the top 10 cellcli commands is required reading when getting started.

Among the CellCLI annoyances are:

  1. You can't join between METRICDEFINITION and METRICCURRENT or other metric tables, so you can't easily see the descriptions of the metric values you are viewing.
  2. CELLCLI only reports the values for the current storage cell, so you can't get a global view across the cells (illustrated below).
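For example, the best you can do natively is to run separate commands against each individual cell; a sketch:

CellCLI> list metriccurrent where name like 'FC_IO_RQ_R.*'
CellCLI> list metricdefinition where name like 'FC_IO_RQ_R.*' detail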

I decided a good first project would be to write a utility that could display the metric definitions and the values together, and which would merge values from all of the storage cells.  Since my old friend perl is present on the storage cells, it wasn't too hard to put together a tool to do this.  The script is here.

Setup

You'll want to have passwordless SSH connectivity from the host where you run the script to each of the cells for this to work.  Many of us will be familiar with setting this up for RAC, but if you aren't, a quick primer is available at http://blogs.translucentcode.org/mick/archives/000230.html.  I wrote this script to partially automate the procedure:

# Helper script to set up passwordless SSH
if [ $# -eq 0 ]
then
    echo "Usage $0 remoteHost"
    exit 1
fi

ssh-keygen -t dsa
while [ $# -gt 0 ]; do
    export remoteHost=$1
    # copy our public key to the remote host, then append it to authorized_keys
    scp ~/.ssh/id_dsa.pub ${remoteHost}:
    ssh $remoteHost 'mkdir .ssh; cat ~/id_dsa.pub >>~/.ssh/authorized_keys; chmod 644 ~/.ssh/authorized_keys'
    shift
done

From a database node, run the script with the names of all the storage cells on the command line.  I have three cells so I issued the command like this:

[melbourne@exa1db02 ~]$ bash setup_ssh.sh exa1cel01 exa1cel02 exa1cel03

Then you will be prompted with various SSH messages about passphrases and so on.  Just hit return or "y" if prompted, except when asked for passwords.  You'll have to enter the password for each storage cell twice to get everything set up.  After that you should be able to ssh to the storage cells without being prompted for a password.
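A quick way to verify the setup is to run a trivial command on each cell; for example (substitute your own cell names):

[melbourne@exa1db02 ~]$ for cell in exa1cel01 exa1cel02 exa1cel03; do ssh $cell hostname; done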

Usage

Running the command with no arguments will generate the following usage message:

[melbourne@exa1db02 ~]$ perl cellcli.pl
Usage perl cellcli.pl [options]
Options:
        --hosts -H      comma seperated lists of cell nodes
        --mask -m       print metrics starting with this string
        --desc -d       print descriptions of metrics
        --cells -c      print individual cell metrics
        --sci -s        Use scientific notation
        --stats -a      Print min/max/avgs
        --help -h       print this

Example:
        perl cellcli -m FC_IO_RQ_R -h exa1cel01,exa1cel02,exa1cel03

NB: utility assumes passwordless SSH from this host to the cell nodes
see
http://guyharrison.squarespace.com/blog/2011/7/31/aggregating-exadata-cellcli-statistics.html

The only compulsory argument is "--hosts", which requires a comma-separated list of the cell nodes.  Most of the time you will want to use the "--mask" argument as well, which applies a filter to the metric names.  With no other arguments you'll then get a list of each matching statistic and the sum of values across all the cells:

[melbourne@exa1db02 ~]$ perl cellcli.pl --hosts exa1cel01,exa1cel02,exa1cel03 --mask FC_IO_RQ_R.*

Name                    Sum
----------------------- ------------
FC_IO_RQ_R               8.55164e+06
FC_IO_RQ_R_MISS               675223
FC_IO_RQ_R_MISS_SEC                0
FC_IO_RQ_R_SEC                   0.1
FC_IO_RQ_R_SKIP            1.742e+08
FC_IO_RQ_R_SKIP_SEC             33.6

Note that the mask matches Unix regular expressions, not Oracle-style "%" wildcards; '.*' matches anything.
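So, for example, to see all of the grid disk write request metrics, you could use a mask like this (the mask here is just an illustration):

[melbourne@exa1db02 ~]$ perl cellcli.pl --hosts exa1cel01,exa1cel02,exa1cel03 --mask 'GD_IO_RQ_W.*'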

The --desc argument adds the description (somewhat truncated) for each metric:

[Screenshot: cellcli.pl output with --desc, showing metric descriptions]

The --cells argument prints the values from each individual cell:

[Screenshot: cellcli.pl output with --cells, showing values for each individual cell]

The --stats argument causes min, max and average values to be printed.  The average is the average of sums across cells, not the average of all values.  Some statistics will have more than one value in each cell (GD_IO_RQ_W_SM for example), and for those the average is the average SUM of values across cells.

[Screenshot: cellcli.pl output with --stats, showing min/max/average values]

You can use all the arguments together, but your output might be very wide!

Restrictions and caveats

When a metric has multiple object values they are simply summed.  That probably works OK for some statistics, but maybe not for others.  For instance, the grid disk IO counters are broken down by the name of the grid disk within the cell:

[Screenshot: CellCLI output showing grid disk IO counters broken down by grid disk name]

My utility simply sums all of those values:

[Screenshot: cellcli.pl output with the per-grid-disk values summed]

Hope you find this of use.  Please let me know if you've got any suggestions…

Wednesday, June 8, 2011

A first look at Oracle on Amazon RDS

It's been possible for some time to run Oracle within the Amazon cloud.  I wrote an introduction to Oracle on EC2 back in 2008, and we use EC2 pretty extensively here at Quest for testing and deployment.  Up until now, Oracle has been made available within EC2 instances, which appear for all purposes to be remote hosts.  Recently, Amazon made Oracle available in a more directly cloud-based fashion, as one of the options in their Relational Database Service (RDS).  Using RDS, Amazon takes complete responsibility for database management, providing a more complete Database as a Service (DBaaS) offering, similar to what Microsoft offers with SQL Azure.  Amazon handles installation, upgrades, backups and so forth.  In theory, you can use the database without needing to do any administrative work.

Indeed, Amazon takes on so much responsibility that long-time DBAs like me are unlikely to be really happy; the very notion of being unable to connect to the operating system, for instance, is disturbing, and RDS is currently missing support for many of our favourite tools.

However, I wanted to give it a test drive, so here are some screenshots and observations to give you a feel for it.

Creating the Oracle RDS database

To use any of the Amazon cloud facilities, you need an Amazon AWS account.  Once you have that, you create a new RDS database from the AWS console, where you can launch a new MySQL or Oracle database:

[Screenshot: AWS console, launching a new RDS database]

You then specify the parameters for the new instance.  At the time of writing, some of these screens clearly had not been updated from the older MySQL options.  Here we can specify:

  • Type of license.  If you have a site license you can use that (as I have below).  If not, you can pay Amazon by the hour, and Amazon presumably kicks some of that back to Oracle.
  • Oracle version.  Note also that you can ask to have the version updated for you with patches automatically; quite cool.
  • Instance Class.  This is the size of the virtual machine and determines memory and CPU availability.  These match the equivalent EC2 instance types (see http://aws.amazon.com/ec2/#instance) and vary from 1 core with 1.7GB to 8 (faster) cores with 64GB.  Charging varies accordingly, of course.
  • DB instance identifier.  At first you might think that is the database service name, but NO, it’s an Amazon identifier.  You specify the service name on the next page.
  • Username and Password of the DBA account you’ll use.

[Screenshot: RDS instance details page, specifying license, version, instance class and DBA credentials]

On the next page we specify the database (e.g. Service/SID) name and the port.  NOTE that at the moment the default port is 3306, which is the MySQL listener port; you should change it if you want a more familiar listener port like 1521.

[Screenshot: RDS additional configuration page, setting the database name and port]

Click OK a couple more times and your database will start up.  It takes 5 minutes or so to get running.

Connecting to the database

Before you can connect, however, you'll need to create a security group, which essentially defines the Amazon firewall for your instance.  Below, I'm configuring it to let anything through (not ideal, but I travel a lot, so I can't predict my IP address range).  The CIDR/IP is a mask that defines which IP addresses will be allowed to connect to the database instance.

[Screenshot: RDS security group configuration with a permissive CIDR/IP rule]

Now we can connect.  We need to know the hostname, which is shown as the "Endpoint" on the DB Instance description page.  The port and service name are also shown here:

[Screenshot: DB Instance description page showing the Endpoint, port and service name]

We could create a TNSNAMES entry using those details, or connect via SQL*Plus in the form username/password@host:port/Service.  Below, I specify these details directly in TOAD:

[Screenshot: TOAD login window with the RDS endpoint details]
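For example, connecting with SQL*Plus would look like this (everything below is a made-up placeholder; substitute the Endpoint, port, service name and credentials from your own instance):

$ # endpoint, username and password here are placeholders only
$ sqlplus admin/mypassword@myinstance.example.us-east-1.rds.amazonaws.com:1521/ORCL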

Restrictions

As easy as it is to get started with RDS, there are a lot of restrictions.  Most significantly:

  • You can't connect to the OS.
  • Datafiles must be Oracle Managed Files.  No filesystem specifications are allowed, and ASM is not supported.
  • Some commands are not supported, including ALTER SYSTEM.  However, there are PL/SQL packages that provide a lot of the common facilities, such as switching logs, checkpointing, tracing and killing sessions (see the sketch after this list).
  • You can only access the alert log and trace files via an external table interface.
  • No OEM!!!  And for that matter, monitoring products such as Quest's own Spotlight and Foglight cannot be installed (due to the above restrictions).  Even some parts of TOAD won't function, since they rely on forbidden commands.
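As a rough sketch of what those PL/SQL workarounds look like (I believe the package is called rdsadmin; check the RDS manual mentioned below for the exact procedure names before relying on these):

SQL> exec rdsadmin.rdsadmin_util.switch_logfile;
SQL> exec rdsadmin.rdsadmin_util.checkpoint;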

The idea with RDS and these missing features is that "you ain't going to need it"; the database is self-managing and these DBA utilities simply aren't required.  For more details check out this part of the RDS manual.

Changing the database CPU and memory

One of the great things about RDS is that you can resize the instance.  If  you want more memory or CPU, you can use the modify tab to change the specification:

[Screenshot: RDS Modify tab, changing the DB instance class]

This does require a reboot, which either happens during the maintenance window or, if you so specified, immediately.  Still, it's much easier than achieving the same result on EC2, where it's almost impossible.

Monitoring

Although there's no OEM and no way to run Spotlight or other third-party monitoring tools, there is a very basic monitoring screen.

[Screenshot: the basic RDS monitoring screen]

Parameters

Parameters are managed through parameter groups, which can assign values dynamically.  For instance, in the default parameter group, the memory target is set to 75% of total system memory, so if you reboot the instance into a higher class, the memory allocation will change accordingly.  You can change the values in a parameter group, and you can create your own groups.
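As a sketch of how such a dynamic value is expressed, the memory target in the default group is a formula along these lines (illustrative only; the real default formula may differ):

memory_target = {DBInstanceClassMemory*3/4}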

[Screenshot: RDS parameter group settings]

Conclusion

End to end, it probably takes about 10 minutes to get set up once you have your account details.  Not bad.  And Amazon takes care of backups and the like.

However, it's probably not something I'll ever do again, because I can't really cope with the idea of not being able to go directly to the alert log, trace files, etc.  And I definitely don't like not being able to use tools like Spotlight or even OEM.  Amazon provides workarounds for a lot of this stuff (see this page for instance), but I'm too used to the command line :-).  Still, it's worth knowing how this all works, and, who knows, it might be handy one day when I want to quickly set up a 64GB SGA database.

Friday, May 20, 2011

RJDBC 2.0 performance improvements

But hopefully this will be the last RJDBC posting that I make for a while :-).

Simon Urbanek has made some fixes to RJDBC which resolve the performance issues I've referred to in my last post.  As you can see below, these fixes have led to a pretty dramatic performance improvement:

[Chart: RJDBC 2.0 query performance compared with the previous RJDBC release]

Furthermore, RJDBC is substantially easier to install than ROracle, and is of course portable across different databases.

If you're really concerned with RJDBC performance and have memory to burn, you can improve performance somewhat by selecting all rows in a single internal fetch, like this:

# fetch the entire result set in one internal fetch (rowsToFetch = expected number of rows)
jdata2 <- fetch(dbSendQuery(jcon, sqltext), n = rowsToFetch)

This improves performance, though ROracle still has the advantage:

[Chart: RJDBC performance with a single bulk fetch vs. ROracle]

You can get the new RJDBC here.