Entries in Oracle (38)


Oracle tables vs Cassandra SuperColumns


In my last post,  I wrote some Java code to insert Oracle tables into Cassandra column families.  As much fun as this was for me, it was  fairly trivial and not a particularly useful exercise in terms of learning Cassandra. 

In Cassandra,  data modelling is very different from the relational models we are used to and one would rarely convert a complete Oracle schema from tables directly to ColumnFamilies .  Instead, Cassandra data modelling involves the creation of ColumnFamilies with SuperColumns to represent master-detail structures that are commonly referenced together

SuperColumns vs Relational schema


At the Cassandra Summit in August,  Eben Hewitt gave a presentation on Cassandra Data Modelling.   There’s a lot of nuance in that talk and in the topic, but a key point in Cassandra – as in many other NoSQL databases – is that you model data to match the queries you need to satisfy,  rather than to a more theoretically "pure" normalized form.   For relational guys, the process is most similar to radical denormalization in which you introduce redundancy to allow for efficient query processing.

For example, let’s consider the Oracle SH sample schema.  Amongst other things, it includes SALES, PRODUCTS and CUSTOMERS:


9-09-2010 3-35-32 PM Oracle sample schema

We could map each Oracle table to a Cassandra ColumnFamily, but because there are no foreign key indexes or joins,  such a Cassandra data model would not necessarily support the types of queries we want.  For instance, if we want to query sales totals by customer ID, we should create a column family keyed by customer id, which contains SuperColumns named for each product which in turn includes columns for sales totals.  It might look something like this:

ID CustomerDetails Product Name #1 Product Name #2 ………….. Product Name #N
First Name Last Name
Guy Harrison
Quantity Value
3 $100,020
Quantity Value
3 $130,000
First Name Last Name
Greg Cottman
Quantity Value
34 $10,080
Quantity Value
4 $99,000


Each customer “row” has super column for each product that contains the sales for that product.  Not all customers have all the supercolumns - each customer has supercolumns only for each product they have purchased.  The name of the SuperColumn is the name of the product.  

Giving the column the name of the product is a major departure from how we would do things in Oracle.  The name of a column or SuperColumn can be determined by the data, not by the schema - a concept completely alien to relational modelling.

Inserting into SuperColumns with Hector


To try and understand this,  I created a Cassandra columnfamily of the type “Super”.  Here’s my definition in the storage-conf.xml file:

<ColumnFamily Name="SalesByCustomer" 
Comment="Sales summary for each customer "/>

And here is some of my Hector Java program, which reads sales totals for each customer from the Oracle sample schema, and inserts them into the ColumnFamily:

   1: private static void insertSales(Connection oracleConn, Keyspace keyspace,
   2:         String cfName) throws SQLException {
   3:     int rows = 0;
   4:     ColumnPath cf = new ColumnPath(cfName);
   5:     Statement query = oracleConn.createStatement();
   7:     String sqlText = "SELECT cust_id, cust_first_name,  cust_last_name, prod_name, "
   8:             + "           SUM (amount_sold) sum_amount_sold,sum(quantity_sold) sum_quantity_sold "
   9:             + "          FROM sh.sales    "
  10:             + "          JOIN sh.customers USING (cust_id) "
  11:             + "          JOIN sh.products  USING (prod_id)  "
  12:             + "         GROUP BY cust_id, cust_first_name,  cust_last_name,  prod_name "
  13:             + "         ORDER BY cust_id, prod_name ";
  14:     ResultSet results = query.executeQuery(sqlText);
  15:     int rowCount = 0;
  16:     int lastCustId = -1;
  17:     while (results.next()) { // For each customer
  18:         Integer custId = results.getInt("CUST_ID");
  19:         String keyValue = custId.toString();
  21:         if (rowCount++ == 0 || custId != lastCustId) { // New Customer
  22:             String custFirstName = results.getString("CUST_FIRST_NAME");
  23:             String custLastName = results.getString("CUST_LAST_NAME");
  24:             System.out.printf("%s %s\n", custFirstName, custLastName);
  25:             //Create a supercolumn for customer details (first, lastname)     
  26:             cf.setSuper_column(StringUtils.bytes("CustomerDetails"));
  27:             cf.setColumn(StringUtils.bytes("customerFirstName"));
  28:             keyspace.insert(keyValue, cf, StringUtils.bytes(custFirstName));
  29:             cf.setColumn(StringUtils.bytes("customerLastName"));
  30:             keyspace.insert(keyValue, cf, StringUtils.bytes(custLastName));
  31:         }
  32:         //Insert product sales total for that customer 
  33:         String prodName = results.getString("PROD_NAME");
  34:         Float SumAmountSold = results.getFloat("SUM_AMOUNT_SOLD");
  35:         Float SumQuantitySold = results.getFloat("SUM_QUANTITY_SOLD");
  36:         //Supercolumn name is the product name 
  37:         cf.setSuper_column(StringUtils.bytes(prodName));
  38:         cf.setColumn(StringUtils.bytes("AmountSold"));
  39:         keyspace.insert(keyValue, cf, StringUtils.bytes(SumAmountSold.toString()));
  40:         cf.setColumn(StringUtils.bytes("QuantitySold"));
  41:         keyspace.insert(keyValue, cf, StringUtils.bytes(SumQuantitySold.toString()));
  43:         lastCustId = custId;
  44:         rows++;
  45:     }
  46:     System.out.println(rows + " rows loaded into " + cf.getColumn_family());
  47: }

This code is fairly straightforward,  but let’s step through it anyway:

Lines Description
7-14 Execute the Oracle SQL to get product summaries for each customer
17 Loop through each row returned (one row per product per customer)
21 Check to see if this is a completely new customer
26-30 If it is a new customer,  create the CustomerDetails SuperColumn for that customer.  The SuperColumn name is “CustomerDetails” and it contains columns for Firstname and Lastname.

Now we create a SuperColumn for a specfic product, still keyed to the customer.  The SuperColumn name is set to the name of the product (line 37).  Inside the supercolumn are placed columns “AmountSold” (lines 38-39) and “QuantitySold” (lines 40-41)

Querying SuperColumns


Inserting master detail relationships into a supercolumn column family was easy enough.  I had a lot more difficulty writing code to query the data.  The tricky part seems to be when you don’t know the name of the SuperColumn you want to read from.  There's no direct equivalent to the JDBC ResultMetaData object to query the SuperColumn names - instead you create a "SuperSlice" predictate that defines a range of SuperColumns that you want to retrieve.  It's a bit awkward to express the simple case in which you want to return all the SuperColumns. 

Below is a bit of code which retrieves sales totals for a specific customer id.  I suspect I've made a few newbie mistakes :-):

   1: public static void querySuperColumn(Keyspace keyspace, String cfName,
   2:         String keyValue) {
   4:     ColumnPath colFamily = new ColumnPath(cfName);
   5:     System.out.println("Details for customer id " + keyValue);
   7:     /* Get Customer Details */
   8:     colFamily.setSuper_column(StringUtils.bytes("CustomerDetails"));
   9:     SuperColumn custDetailsSc = keyspace
  10:             .getSuperColumn(keyValue, colFamily);
  11:     for (Column col : custDetailsSc.getColumns()) {
  12:         String colName = StringUtils.string(col.getName()); 
  13:         String colValue = StringUtils.string(col.getValue()); 
  14:         System.out.printf("\t%-20s:%-20s\n", colName, colValue);
  15:     }
  16:     /* Get dynamic columns -  */
  17:     ColumnParent colParent = new ColumnParent(cfName);
  18:     SliceRange sliceRange = new SliceRange(StringUtils.bytes(""), StringUtils
  19:             .bytes(""), false, 2 ^ 32); // TODO: what if there are > 2^32 ??                                             
  20:     SlicePredicate slicePredicate = new SlicePredicate();
  21:     slicePredicate.setSlice_range(sliceRange);
  22:     //TODO:  Surely there's an easier way to select all SC than the above??
  23:     List superSlice = keyspace.getSuperSlice(keyValue,
  24:             colParent, slicePredicate);
  25:     for (SuperColumn prodSuperCol : superSlice) {  //For each super column
  26:         String superColName = StringUtils.string(prodSuperCol.getName());
  27:         if (!superColName.equals("CustomerDetails")) { // Already displayed
  29:             System.out.printf("\n%50s:", superColName); // product Name 
  30:             List columns1 = prodSuperCol.getColumns();
  31:             for (Column col : columns1) {               // product data 
  32:                 String colName = StringUtils.string(col.getName()); 
  33:                 String colValue = StringUtils.string(col.getValue()); 
  34:                 System.out.printf("\t%20s:%-20s", colName, colValue);
  36:             }
  37:         }
  38:     }
  40: }
Lines Description
8-9 Set the superColumn to the “CustomerDetails” supercolumn
11-14 Retrieve the column values (firstname, surname) for the CustomerDetails supercolumn
17-21 Set up a “SlicePredicate” that defines the supercolumns to be queried.  I want to get all of the supercolumns (eg every product), so I set up an unbounded range (line 18) and supply that to the slice predicate (line 21)
23 Create a list of supercolumns.  This will include all the SuperColumns in the column family (including, unfortunately,  CustomerDetails)
27 Eliminate CustomerDetails from the result.  Here we only want product names
30-35 Iterate through the columns in each supercolumn.  THis will extract QuantitySold and AmountSold for each Product name


Here’s some output from the Java program.  It prints out customer Details and product sales totals for customer# 10100:

Details for customer id 101000
customerFirstName :Aidan
customerLastName :Wilbur

CD-RW, High Speed Pack of 5: AmountSold:11.99 QuantitySold:1.0
Keyboard Wrist Rest: AmountSold:11.99 QuantitySold:1.0
Multimedia speakers- 3" cones: AmountSold:44.99 QuantitySold:1.0

SuperColumns with Toad for Cloud Databases 


Toad for cloud databases now has Cassandra support, which makes querying SuperColumns s a lot easier.  SuperColumns that have dynamic names but uniform internal column structure (as in my example above) are represented by Toad for Cloud Databases as a detail table.  To put it another way,  Toad for Cloud Databases re-normalizes the data - displaying it in the format that we would typically use in an RDBMS. 

So when we point Toad for Cloud databases at our SalesByCustomer column family, it maps the column family to two tables:  one for CustomerDetails and the other - which by default it will call SalesByCustomersuper_column” – for product sales totals.  We can rename the subtable and subtable key during the mapping phase to make it clearer that it represents product details.

9-09-2010 1-56-19 PM map cassandra super col

Now if we want to extract product details for a particular customer, we can do a SQL join.  Below we build the join in the query builder, but of course we could simply code the SQL by hand as we would for any NoSQL or SQL database supported by Toad for Cloud Databases:

9-09-2010 3-49-36 PM cassandra supercol qry

And just to close the loop, here we can see that the Toad for Cloud databases query returns the same data as the Hector query:

9-09-2010 3-50-48 PM cassabdra supercol results




All NoSQL databases require that we change the way we think about data modelling, and Cassandra is no exception.  SuperColumns are an incredibly powerful construct, but I can’t say that I found them intuitive or easy.  Hopefully APIs and tooling will evolve to make life easier for those of us coming from the relational world.


Playing with Cassandra and Oracle

Cassandra  is one of the hottest of the NoSQL databases.  From a production DBAs perspective it’s not hard to see why:  while some of the other NoSQLs offer more programming bells and whistles for the developer, Cassandra is built from the ground up for total and transparency redundancy and scalability, close to the heart of every DBA.

However,  Cassandra involves some complex data modelling concepts – mainly around the notorious SuperColumn concept, and I don’t think I’ll ever understand it fully until I’ve played directly with some data.  To that end, I thought I’d start by trying to model some familiar Oracle sample schemas in Cassandra.

Toad for Cloud Databases is releasing support for Cassandra early next month (eg September 2010), so I’ve been using that – as well as Java of course – to try to get some initial data loaded.

For other NoSQL databases,  Toad for Cloud lets us create NoSQL tables from relational tables with a couple of clicks.  Unfortunately, we can’t do that with Cassandra, since you can’t create a ColumnFamily on the fly.  So my first Cassandra tasks was to write a simple program to take an Oracle table (or query) and create a matching column family.

Getting started

Getting started with Cassandra was surprisingly easy.  I followed the instructions in http://schabby.de/cassandra-installation-configuration/ to install Cassandra on my laptop, and installed the hector Java interface from http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/.

Terminology in NoSQL can be confusing, with each NoSQL database using terms differently from each other, and all of them using terms differently from RDBMS.  In Cassandra:

  • A Keyspace is like a schema
  • ColumnFamily is roughly like a table

Things get very funky when SuperColumns are introduced, but lets skip that for now.

To create a ColumnFamily in Cassandra 0.6, we have to add its name to the storage-conf.xml file which is in the Conf directory and then restart Cassandra.  In 0.7 there’ll be a way to do this without restarting the server.

Here is where I created a keyspace called “Guy” and created some ColumnFamilies to play with:

   1: "Guy">
   2:   "G_Employees" CompareWith="UTF8Type"/>
   3:   "G_Employees2" CompareWith="UTF8Type"/>
   4:   "G_Employees3" CompareWith="UTF8Type"/>
   5:   org.apache.cassandra.locator.RackUnawareStrategy
   6:   1
   7:   org.apache.cassandra.locator.EndPointSnitch


Loading data


I wrote some Java code that takes a SQL statement, and loads the result set directly into a column family.  Here’s the critical method (the complete java program with command line interface is here):

   1: private static void oracle2Cassandra(Connection oracleConn,
   2:         Keyspace keyspace, String cfName, String sqlText)
   3:         throws SQLException {
   4:     int rows = 0;
   5:     ColumnPath cf = new ColumnPath(cfName);
   6:     Statement oraQuery = oracleConn.createStatement();
   7:     ResultSet result = oraQuery.executeQuery(sqlText);
   8:     ResultSetMetaData rsmd = result.getMetaData();
   9:     while (result.next()) { // For each row in the output
  10:         // The first column in the result set must be the key value
  11:         String keyValue = result.getString(1);
  12:         // Iterate through the other columns in the result set
  13:         for (int colId = 2; colId <= rsmd.getColumnCount(); colId++) {
  14:             String columnName = rsmd.getColumnName(colId);
  15:             String columnValue = result.getString(colId);
  16:             if (!result.wasNull()) {
  17:             cf.setColumn(StringUtils.bytes(columnName));
  18:                 keyspace.insert(keyValue, cf, StringUtils
  19:                         .bytes(columnValue));
  20:             }
  21:         }
  22:         rows++;
  23:     }
  24:     System.out.println(rows + " rows loaded into " + cf.getColumn_family());
  25: }

The method take s a Oracle connection and a SQL statement, and pushes the data from that SQL into the Cassandra column family and keyspace specified.   The first column returned by the query is used on the key to the Cassandra data.

Lines 6-8 execute the statement and retrieve a ResultSet object – which contains the data – and a ResultSetMetaData object which contains the column names.  Lines 9-21 just iterate through the rows and columns and create entries in the Column Family that match.   We use the Hector setColumn methodto set the name of the column and the insert method to apply the column value.  Too easy!

Of course, I’d have no idea as to whether my job had worked if I didn’t have Toad for Cloud databases available.  Using TCD, I can map the Cassandra columnFamily to a TCD “table” and browse the table (eg Cassandra Column Family) to see the resulting data:


I can even use SQL to join the Cassandra data to the Oracle data to make absolutely certain that the data transfer went OK:



It’s surprisingly easy to get started with Cassandra.  Installation of a test cluster is a breeze, and the Hector Java API is straight forward.    Of course,  direct mapping of RDBMS tables to Cassandra ColumnFamilies doesn’t involve the complexities of advanced Cassandra data models using variable columns and SuperColumns.    Next, I’m going to try and map a more complex ColumnFamily which maps to multiple Oracle tables – hopefully won’t make my brain hurt too much!

Toad for Cloud Databases is introducing Cassandra support in the 1.1 release due out within the next two weeks.  Its a free download from toadforcloud.com


“Stolen” CPU on Xen-based virtual machines

I’ve written previously about how VMWare ESX manages CPU and how to measure your“real” CPU consumption if you are running an database in such a VM. 

VMware is currently the most popular virtualization platform for Oracle database virtualization, but Oracle’s own Oracle Virtual Machine uses the open source Xen hypervisor, as does Amazon’s Elastic Compute Cloud (EC2): which runs quite a few Oracle databases.    So Oracle databases – and many other interesting workloads – will often be found virtualized inside a Xen-based VM.

I recently discovered that there is an easy way to view CPU overhead inside a Xen VM, at least if you are running a paravirtulized linux kernel 2.6.11 or higher.   In this case, both vmstat and top support an “St” column, which describes the amount of time “stolen” from the virtual machine by Xen.  This stolen time appears to be exactly analogous to VMWare ESX ready time – it represents time that the VM was ready to run on a physical CPU, but that CPU was being allocated to other tasks – typically to another virtual machine. 

Here we see in top (on an Oracle Enterprise Linux on an EC2 instance) reporting that 13% of the CPU has been unavailable to the VM due to virtualization overhead.  Note that the graphical system monitor doesn’t reflect this – as far as it’s concerned CPU utilization has been at a steady 100%.

 12-07-2010 4-30-49 PM xenCPU

The great thing here is that you can view the overhead from within the virtual machine itself.  This is because in a paravirtualized Operating system – which are the norm in Xen based systems  - the kernel is rewritten to be virtualization aware.  The paravirtualized Linux kernel – from 2.6.11 – includes changes to vmstat and top to show the virtualization overhead.  In ESX you have to connect to the VSphere client or use one of the VMWare APIs to get this information. 

As with ESX, unless you know the virtualization overhead you can’t really interpret CPU utilization correctly.  For instance if your database is CPU bound and you get a sudden spike in response time, you need to know if that spike was caused by “stolen” CPU.  So you should keep track of the ESX ready statistic or the Xen “stolen” statistic whenever you run a database (or any critical  workload for that matter) in a VM.

We just introduced ESX support in the upcoming release of Spotlight on Oracle.  Starting with release 7.5 (which has just been made available with the latest release of Toad DBA suite) we show the virtualization overhead right next to CPU utilization and provide a drilldown giving you details of how the VM is serving your database:

7-07-2010 3-07-48 PM VMWare CPU contention 12-07-2010 4-30-49 PM Spotlight dd

We plan to add support for monitoring Xen-based virtualized databases in an upcoming release. 


The Flash cache KEEP option

Oracle’s database flash cache (AKA “database smart flash cache”) supports selective caching at the table level via the FLASH_CACHE setting in the storage clause.

Settings can be DEFAULT, KEEP or NONE.  Eg:

7-07-2010 1-26-48 PM alter table

The word KEEP is a bit unfortunate, since it perhaps implies behaviour identical to the KEEP pool of the buffer cache.  The KEEP cache is a separately managed area of specific size, so that you can control what proportion of your memory is allocated to “kept” blocks.   The flash cache KEEP setting is more to prioritise caching – there seems to be no limit on how many blocks will be kept.

For the above three tables, if we read all the rows in each more or less simultaneously, we get the following outcomes (shown here in our lovely new Spotlight on Oracle flash cache drilldown):


Not surprisingly,  where FLASH_CACHE is NONE, there is no flash cache storage.  When flash cache is KEEP, all the blocks are kept in the flash cache, and consequently there is not enough room for all the blocks from tables where FLASH_CACHE=DEFAULT.  However, unlike the KEEP buffer pool, there’s no way to limit the amount of blocks in the KEEP pool, and so you can quite easily fill up the cache with blocks form FLASH_CACHE=KEEP tables.  Once that happens, those blocks will stay there forever.

If you have Spotlight (or at least the 7.5 version about to be released with Flash cache diagnostics), you can see this happening.  Once the flash cache fills up, every new block introduced forces an old block out.  This shows up as evictions in the following chart:


Oracle clearly evicts blocks from FLASH_CACHE=DEFAULT tables in preference to FLASH_CACHE=KEEP.  Presumably (I haven't tested this) within each setting blocks are moved out using some form of Least Recently Used (LRU) or Not Recently Used algorithm. 

KEEP should therefore be used only for relatively small tables.  If you set FLASH_CACHE=KEEP for a large table (larger than the flash cache for instance) then you risk rendering the flash cache unusable for other tables.   I hope Oracle chooses to implement a limit on KEEP size in a subsequent release.  I think it would be far more useful if you could allocate multiple flash cache pools of a certain size as for KEEP and RECYCLE in the buffer cache.



Spotlight 7.5 – which includes flash cache diagnostics as well as ESX diagnostics, support for the 11gR2 parallel query queue and lots of other cool stuff - will be available next month, both on it’s own and as part of the TOAD DBA suite. 


Disabling the Intel X-25 E write cache

In my last posting on using SSD with Oracle, I said how impressed I was with the X-25 E SSD write performance.  However, at the OReilly MySQL conference last month, I attended a talk by Vadim Tkachenko and Morgan Tocker from Percona on An Overview of Flash Storage for Databases.  It was a great talk overall, but one important thing I learned is that the X-25 E has a volatile 64MB write cache.  What this means is that the X-25 can report that a block is written to disk when it is still within a RAM buffer within the device.  If the disk failed between the write to RAM and the write to flash then the data could be lost.

We’d normally regard this data loss as an unacceptable risk, so you would think that the best thing to do would be to turn the write cache off.  This can be done with the following command:

hdparm -W 0 /dev/sdb

(assuming that /dev/sdb is the flash SSD).

Turning off the write cache – as you’d expect – reduces the write IO capacity of the device.  Below we see the effect on two identical workloads:

x25e write cache 1

These workloads involve SELECT and UPDATE operations on a table which is stored on a datafile directly on the SSD.  There’s no db flash cache involved in this simulation.

The datafile write rate drops substantially and the work takes longer to complete, as we expect.  But why does the read IO rate drop as well?   The reason is because of free buffer waits

As described in this post,  when a IO subsystem has a higher read bandwidth than write bandwidth, then sessions may be able to add and update blocks in the buffer cache faster than the DBWR can clear them out.  When this happens free buffer waits occur as sessions wait for buffers to be cleared. 

We can see the free buffer waits in Spotlights event wait chart:

x25e write cache2

Disabling the write cache slows down disk performance somewhat, but it’s still a lot faster than a spinning disk.  Furthermore,  most workloads are not as update intensive as in my simulation so a lot of the time you won’t hit this problem.  Nevertheless, it's important to realize that the X-26 has this write cache and that it may be artificially increasing write throughput at the cost of write safety.

One word of caution:  I met a guy from Percona who told me that Intel doesn’t actually support the X-25 with the write cache disabled.  This is a bit disturbing, since it implies that you can choose data safety or vendor support but not both!  

Also, note that the write cache can be left enabled if the SSD is only being used for the 11GR2 database flash cache.  In that configuration failed writes to the cache in the event of a disk failure will cause no harm:  Oracle will detect that the cache has failed and will bypass the cache completely. 

Page 1 ... 3 4 5 6 7 ... 8 Next 5 Entries »