Vector clocks


One of the concepts I initially found difficult when looking at non-relational systems is the vector clock.  Some databases – like Cassandra – use timestamps to work out which is the “latest” transaction. If there are two conflicting modifications to a column value, the one with the highest timestamp will be considered the most recent and the most correct.

Other Dynamo-based systems use a more complex mechanism known as a vector clock. The vector clock has the advantage of not requiring clock synchronization across all nodes, and it helps us identify transactions that might be in conflict.

Despite its name, the vector clock does not include any timestamps. Rather, it is composed of a set of counters. These counters are incremented when operations complete, in a similar way to the traditional System Change Number pattern that we are familiar with from relational systems like Oracle. The set contains one counter for each node in the cluster. Whenever an operation occurs on a node, that node will increment its own counter within its vector clock. Whenever a node transmits an operation to another node, it will include its vector clock within the request. The transmitted vector clock will include the highest counter for the transmitting node as well as the highest counters from other nodes that the transmitting node has ever seen.

When a node receives possibly conflicting updates from other nodes, it can compare the vector clocks to determine the relative sequencing of the requests. There is a defined set of vector clock operations that can tell us whether:

  • The two vector clocks come from nodes that are completely in sync
  • One node is “out of date” with respect to the other node
  • The clocks are “concurrent” in that each node has some information that is more up to date than the other node. In this case we can’t choose which update is truly the more correct.

Vector clocks are notoriously difficult to understand, though the underlying algorithm is really quite simple. The diagram below shows an example of three vector clocks incrementing across three nodes. The algorithm is somewhat simplified to improve clarity.


In the example the vector clocks start out set to 0 for all nodes (1). Updates to nodes from external clients cause the nodes to increment their own element of the vector clock (2). When these changes are propagated to other nodes, the receiving node updates its vector clock and merges the vector clocks from the other nodes (3). Event (H) occurs when node 2 receives the vector clock (F) from node 1 and (G) from node 3 (4). Each of these vector clocks contains elements higher than the other – vector clock F has the higher value for node 1, while vector clock G has the higher value for node 3. There is no way for node 2 to be sure which of the two vector clocks represents the most up-to-date data – each of the sending nodes “knows” something that the other node does not, and consequently it’s not clear which of the two nodes “knows” best.
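
To make the bookkeeping concrete, here is a minimal sketch in Scala of the increment, merge and compare operations (the Map-based representation and node names are my own illustration rather than any particular database's implementation); the final comparison reproduces the F versus G conflict described above:

    object VectorClockSketch {
      type VectorClock = Map[String, Long] // one counter per node

      // A node increments its own counter whenever it processes an operation
      def increment(clock: VectorClock, node: String): VectorClock =
        clock.updated(node, clock.getOrElse(node, 0L) + 1)

      // On receipt, a node keeps the highest counter it has seen for every node
      def merge(a: VectorClock, b: VectorClock): VectorClock =
        (a.keySet ++ b.keySet)
          .map(n => n -> math.max(a.getOrElse(n, 0L), b.getOrElse(n, 0L)))
          .toMap

      // a "descends" b if a has seen at least everything b has seen
      def descends(a: VectorClock, b: VectorClock): Boolean =
        b.forall { case (node, counter) => a.getOrElse(node, 0L) >= counter }

      def compare(a: VectorClock, b: VectorClock): String =
        (descends(a, b), descends(b, a)) match {
          case (true, true)   => "in sync"
          case (true, false)  => "second clock is out of date"
          case (false, true)  => "first clock is out of date"
          case (false, false) => "concurrent - conflict must be resolved"
        }

      def main(args: Array[String]): Unit = {
        // Clock F (from node 1) has the higher counter for node1;
        // clock G (from node 3) has the higher counter for node3
        val f = Map("node1" -> 2L, "node2" -> 1L, "node3" -> 0L)
        val g = Map("node1" -> 1L, "node2" -> 1L, "node3" -> 1L)
        println(compare(f, g)) // prints: concurrent - conflict must be resolved
      }
    }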

For those of us from the world of strictly consistent databases like Oracle, think of the vector clock as a set of System Change Numbers, one from each node.  We examine the SCNs from each node to see if there are nodes that might not have seen all the changes that have been recorded on another node.

The vector clocks in the example above tell us that version G and version F are in conflict – each contains information from unique updates that could both be important. What, then, is the system to do? Here are some of the options:

  • Revert to last write wins: two updates are unlikely to have occurred at the exact same nanosecond, so one will have a higher timestamp value. We could decide that the highest timestamp “wins”.
  • Keep both copies, and require that the application or the user resolve the conflict.
  • Somehow merge the data. This is the approach taken by the original Dynamo, which managed Amazon’s shopping cart. If there are two conflicting shopping carts, they are merged, and the worst that can happen (from Amazon’s point of view) is that you buy some things twice. A similar merge can occur with things like counters: rather than having one counter increment overwrite another, we can deduce that both operations wanted to increment the counter and increment it twice. A special class of data types – Conflict-free Replicated Data Types (CRDTs) – exists that allows these sorts of merges to be predefined (a rough sketch follows this list).
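
As a rough sketch of the merge approach (an illustration only, not Dynamo’s or Riak’s actual implementation): conflicting shopping carts can be merged by set union, while counters can be handled with a grow-only counter CRDT in which each node keeps its own count and a merge takes the per-node maximum:

    object MergeSketch {
      // Conflicting shopping carts: merge by set union
      // (the worst case is that a removed item reappears and you buy it twice)
      def mergeCarts(a: Set[String], b: Set[String]): Set[String] = a union b

      // Grow-only counter CRDT: one count per node; merge takes the per-node maximum
      def mergeCounters(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
        (a.keySet ++ b.keySet)
          .map(n => n -> math.max(a.getOrElse(n, 0L), b.getOrElse(n, 0L)))
          .toMap

      def counterValue(c: Map[String, Long]): Long = c.values.sum

      def main(args: Array[String]): Unit = {
        println(mergeCarts(Set("book", "dvd"), Set("dvd", "toaster")))
        // -> Set(book, dvd, toaster)

        val merged = mergeCounters(Map("node1" -> 2L), Map("node1" -> 1L, "node2" -> 1L))
        println(counterValue(merged)) // -> 3: neither node's increments are lost
      }
    }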

There are advocates for the vector clock – such as the architects of Riak – and advocates for the timestamp system used in Cassandra. Neither party disagrees about the concrete implications of the two approaches: they differ on the desirability of the consequences. Last write wins represents a simpler model for the application developer and administrator; vector clocks allow conflicts to be identified, but those conflicts must then be resolved.  In a later post I’ll give an example of how to programmatically resolve conflicts in Riak.


Exploring Couchbase N1QL

Couchbase recently announced the Non-first Normal Form Query Language (N1QL) – pronounced “Nickel” – a virtually complete SQL language implementation for use with document databases, implemented within Couchbase Server 4.0.

I recently took a quick look. 

Most of the examples use the sample films documents shown below (this is the same sample data we created for MongoDB in this post):

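A film document looks roughly like the following (a sketch only – the attribute names and values are assumptions based on the Sakila film data):

    {
      "Title": "ACADEMY DINOSAUR",
      "Category": "Documentary",
      "Rating": "PG",
      "Actors": [
        { "actorId": 1, "firstName": "PENELOPE", "lastName": "GUINESS" },
        { "actorId": 10, "firstName": "CHRISTIAN", "lastName": "GABLE" }
      ]
    }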

N1QL allows us to perform basic queries to retrieve selected documents or attributes of selected documents:
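
Queries along these lines look something like the following (the films bucket and attribute names are taken from the sample document sketch above):

    SELECT * FROM films USE KEYS "200";

    SELECT Title, Category FROM films WHERE Category = "Documentary";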


N1QL allows us to access nested documents within the JSON structure using array notation. So, for instance, in the example below Actors[0] refers to the first nested document within the Actors array:
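
For example, assuming the nested array is named Actors:

    SELECT Title, Actors[0] FROM films USE KEYS "200";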


We can query for sub-documents that match a search criterion using the WHERE ANY syntax:
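
For example, assuming each nested actor document carries a lastName attribute:

    SELECT Title
    FROM films
    WHERE ANY a IN Actors SATISFIES a.lastName = "GUINESS" END;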


The UNNEST command allows embedded documents to be “joined” back to the parent document. So here we get one result for each Actor who starred in film 200, with the film title included in the results:
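
A query along these lines looks something like this (the document key "200" and the attribute names are assumed):

    SELECT f.Title, a.firstName, a.lastName
    FROM films f USE KEYS "200"
    UNNEST f.Actors AS a;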




The UNNEST command allows us to perform the equivalent of joins between parent and child documents where the child documents are nested within the parent. N1QL also allows us to join across disparate documents, provided that one of the document collections contains a reference to the primary key in the other.

So for instance if we had a bucket of documents that contains the primary keys of “overdue” films in our imaginary (and by now definitely struggling) DVD store, then we can join that to the films collection to return just those films:
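
A sketch of such a lookup join, assuming each overdue document stores the film's document key in a filmId attribute:

    SELECT f.Title, o.dueDate
    FROM overdue o
    JOIN films f ON KEYS o.filmId;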


N1QL also contains DML statements allowing us to manipulate the contents of documents, and DDL statements allowing creation and modification of indexes.
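
For example, a single-document update and a secondary index creation look something like this (names assumed as before):

    UPDATE films USE KEYS "200" SET Rating = "PG";

    CREATE INDEX idx_films_title ON films(Title);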

N1QL is an ambitious attempt to bring SQL into the world of document databases. It’s interesting to consider that at the same time that companies like Couchbase are introducing SQL support into their databases, companies like Oracle are introducing strong JSON support into their SQL-based databases. It would seem that the two worlds are coming together.


On my way to Collaborate 2015

As I write this I’m at 35,000 ft (or so) on my way to yet another Collaborate.  This year I’ll be presenting two sessions and participating in one panel:

963: Writing to Lead
Monday, April 13  |  Banyan E, South Convention Center, Level 2
10:30 a.m. – 11:30 a.m.
Jonathan Gennick will be moderating a panel including myself, Bobby Curtis, Charles Kim, Darl Kuhn and Michael Rosenblum.  We’ll be talking about how to write – books, blogs, articles, Twitter, etc. – to best effect.  The panelists are all authors, and hopefully between us we’ll have a lot of useful advice to offer aspiring or existing book authors.
995: Top 5 Trends in Database Technology 
Tue, Apr 14, 2015 (04:30 PM - 05:30 PM) : Banyan E 
It’s been an amazing 5 years in database technology – we went from RDBMS for everything to Big Data, NoSQL, NewSQL and a lot of innovation within the RDBMS as well.  This talk covers 5 big technologies that are influencing Oracle and other database technologies.  It includes some of my best recycled jokes as well as, hopefully, some useful technical content.  There’s an article based on this presentation here, if you want a preview.

442:  Top 10 Oracle Database Tuning Tips and Techniques
Thu, Apr 16, 2015 (08:30 AM - 09:30 AM) : Palm B
Seasoned presenters know that “top 10” presentations somehow get accepted more readily than more abstract titles.  Something to do with the impression of a concrete amount of information, I suppose.  But once you start giving a top 10 talk you quickly realize you can only budget 5 minutes at most for each topic.  Still, I feel pretty happy about this talk as a high-level summary of the core concepts that I’ve picked up in 25 years of Oracle performance tuning.   The main thing I’ve tried to accomplish in this talk is to organize the core concepts of performance tuning around a sensible tuning philosophy.   I’m not assuming a great deal of Oracle tuning experience.



Sakila sample schema in MongoDB

I wanted to do some experimenting with MongoDB, but I wasn’t really happy with any of the sample data I could find on the web.  So I decided to translate the MySQL “Sakila” schema into MongoDB collections as part of the learning process.

For those that don’t know, Sakila is a MySQL sample schema that was published about 8 years ago.  It’s based on a DVD rental system.   OK, not the most modern data ever, but DVDs are still a thing, aren’t they?

You can get the MongoDB version of Sakila here.  To load it, unpack using tar zxvf sakilia.tgz, then use mongoimport to load the resulting JSON documents.  On Windows you should be able to double-click the file to get to the JSON.
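
For example, loading the films collection might look something like this (the database, collection and file names here are assumptions – adjust them to match the unpacked files):

    mongoimport --db sakila --collection films --file films.json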

The Sakila database schema is shown below.  There are 16 tables representing a fairly easy to understand inventory of films, staff, customers and stores.

Database diagram

When modelling MongoDB schemas, we partially ignore our relational modelling experience – “normalization” is not the desired end state.   Instead of basing our decisions on the nature of the data, we base them on the nature of the operations.  The biggest decision is which “entities” get embedded within documents and which get linked.  I’m not the best person to articulate these principles – the O’Reilly book “MongoDB Applied Design Patterns” does a pretty good job, and this presentation is also useful.

My first shot at mapping the data – which may prove to be flawed as I play with MongoDB queries – collapsed the 16 tables into just 3 collections:  FILMS, STORES and CUSTOMERS.   ACTORS became a nested document in FILMS, STAFF and INVENTORY were nested into STORES, while RENTALS and PAYMENTS were nested into CUSTOMERS.   Whether these nestings turn out to be good design decisions will depend somewhat on the application.  Some operations are going to be awkward while others will be expedited.

Here’s a look at the FILMS collection:


Here is STORES:


And here is CUSTOMERS:


Looks like I have to fix some float rounding issues.

The code that generates the schema is here.   It’s pretty slow, mainly because of the very high number of lookups on rentals and payments.  It would be better to bulk collect everything and scan through it, but that would make the code pretty ugly.   If this were Oracle I’m pretty sure I could make it run faster, but with MySQL, SQL tuning is much harder.

The code is pretty straightforward.  To insert a MongoDB document we get the DBCollection, then create BasicDBObjects which we insert into the DBCollection.  To nest a document we create a BasicDBList and insert BasicDBObjects into it.  Then we add the BasicDBList to the parent BasicDBObject.  The following snippet illustrates that sequence.  It's mostly boilerplate code, with the only human decision being the nesting structure.

   DBCollection filmCollection = mongoDb.getCollection(mongoCollection);
   while (fileRs.next()) { // For each film in the MySQL result set
           // Create the film document
           BasicDBObject filmDoc = new BasicDBObject();
           Integer filmId = fileRs.getInt("FILM_ID");
           filmDoc.put("_id", filmId);
           filmDoc.put("Title", fileRs.getString("TITLE"));
           // Other attributes ...
           // Build the nested list of actor documents for this film
           BasicDBList actorList = getActors(mysqlConn, filmId);
           // Put the actor list into the film document
           filmDoc.put("Actors", actorList);
           filmCollection.insert(filmDoc); // insert the film
       }

Anyway, hopefully this might be of some use to those moving from MySQL to MongoDB.  Comments welcome!


Best practices for accessing Oracle from scala using JDBC

I’ve been looking for an excuse to muck about with scala for a while now.  So I thought I’d do a post similar to those I’ve done in the past for .NET, python, perl and R.  Best practices for Java were included in my book Oracle Performance Survival Guide (but I’d be more than happy to post them if anyone asks).

One of the great things about scala is that it runs in the JVM, so we can use the Oracle JDBC drivers to access Oracle.  These drivers are very mature and support all the best programming practices. 

Best practices for programming Oracle in any language require at least the following:

  1. Use bind variables appropriately.
  2. Ensure you are performing array fetch when retrieving more than one row
  3. Ensure that you use array insert when doing bulk inserts

You can get the scala program which contains the code snippets below here.


If you’ve ever used Oracle with JDBC, you’ll find things very familiar.   Here’s a snippet that connects to an Oracle database with username, password, host and service specified on the command line (it assumes the default 1521 port, but of course this could be parameterized as well):

  import java.sql.Connection
  import java.sql.ResultSet

  import oracle.jdbc.pool.OracleDataSource

  object guyscala2 {
    def main(args: Array[String]) {
      if (args.length != 4) {
        println("Arguments username password hostname serviceName")
        System.exit(1)
      }

      val ods = new OracleDataSource()
      ods.setUser(args(0))
      ods.setPassword(args(1))
      ods.setURL("jdbc:oracle:thin:@" + args(2) + ":1521/" + args(3))
      val con = ods.getConnection()
      println("Connected")


Using Bind variables

As in most languages, it's all too easy to omit bind variables in scala.  Here's an example where the variable value is simply concatenated into a SQL string:

   1:     for (cust_id <- 1 to rows) {
   2:        val s1 = con.createStatement()
   3:        s1.execute("UPDATE customers SET cust_valid = 'Y'"
   4:          + " WHERE cust_id = " + cust_id)
   6:        s1.close()
   7:      }


On line 3 we build up a SQL statement, concatenating the value we want into the string, and immediately execute it.  Each execution is a unique SQL statement which requires parsing, optimization and caching in the shared pool.

Here’s an example using bind variables and a prepared Statement:

   1:    val s2 = con.prepareStatement(
   2:        "UPDATE customers SET cust_valid = 'Y'"
   3:          + " WHERE cust_id = :custId")
   5:      for (cust_id <- 1 to rows) {
   6:        s2.setInt(1, cust_id)
   7:        s2.execute()
   8:      }
  10:      s2.close()


Slightly more complex:  we prepare a statement on line 1, associate the bind variable on line 6, then execute on line 7.   It might get tedious if there are a lot of bind variables, but it is still definitely worthwhile.   Below we see the difference in execution time when using bind variables compared with concatenating the variables into a string.  Bind variables definitely reduce execution time.



As well as the reduction in execution time for the individual application, using bind variables reduces the chance of latch and/or mutex contention for SQL statements in the shared pool – where Oracle caches SQL statements to avoid re-parsing. If many sessions are concurrently trying to add new SQL statements to the shared pool, then some may have to wait on the library cache mutex.   Historically, this sort of contention has been one of the most common causes of poor application scalability – applications which did not use bind variables risked strangling on the library cache latch or mutex as the SQL execution rate increased.

Exploiting the array interface

Oracle can retrieve rows either from the database one at a time, or can retrieve rows in “batches” sometimes called “arrays”. Array fetch refers to the mechanism by which Oracle can retrieve multiple rows in a single fetch operation. Fetching rows in batches reduces the number of calls issued to the database server, and can also reduce network traffic and logical IO overhead.   Fetching rows one at a time is like moving thousands of people from one side of a river to another in a boat with all but one of the seats empty -  it’s incredibly inefficient.

Fetching rows using the array interface is as simple as can be and is in fact enabled by default – though with a small default batch size of 10.  The setFetchSize method of the statement (or result set) object sets the number of rows to be batched.  Unfortunately, the default setting of 10 is often far too small – especially since there is typically no degradation even when the fetch size is set very large: you get diminishing, but never negative, returns as you increase the fetch size beyond the point at which every SQL*Net packet is full.

Here’s a bit of code that sets the fetch size to 1000 before executing the SQL:

  val s1 = con.createStatement()
  s1.setFetchSize(1000)
  val rs = s1.executeQuery("Select /*fetchsize=" + s1.getFetchSize() + " */ * " +
          "from customers where rownum <= " + rows)
  while (rs.next()) {
          val c1 = rs.getString(1)
          val c2 = rs.getString(2)
  }
  rs.close()


Here’s the elapsed times for various fetchsizes for the above query:


While the default setting of 10 is clearly better than any lower value, it’s still more than 6 times worse than a setting of 100 or 200. 

Inserting data is another situation in which we normally want to consider the array interface.  In this case we need to change our code structure a bit more noticeably.   Here’s the code we probably would write if we didn’t know about array processing:

   1:    val insSQL = "INSERT into arrayinsertTest" +
   2:        " (cust_id,cust_first_name,cust_last_name,cust_street_address) " +
   3:        " VALUES(:1,:2,:3,:4)"
   4:      val insStmt = con.prepareStatement(insSQL)
   5:      val startMs = System.currentTimeMillis
   6:      var rowCount = 0
   7:      while (rs.next()) {
   8:        insStmt.setInt(1, rs.getInt(1))
   9:        insStmt.setString(2, rs.getString(2))
  10:        insStmt.setString(3, rs.getString(3))
  11:        insStmt.setString(4, rs.getString(4))
  12:        rowCount += insStmt.executeUpdate()
  13:      }
  14:      val elapsedMs = System.currentTimeMillis - startMs
  15:      println(rowCount + " rows inserted - " + elapsedMs + " ms")
  16:      con.commit()


We prepare the statement on line 4, bind the values to be inserted on lines 8-11, then execute the insert on line 12. 

With a few minor changes, this code can perform array inserts:


   1:      val insSQL = "INSERT into arrayinsertTest" +
   2:        " (cust_id,cust_first_name,cust_last_name,cust_street_address) " +
   3:        " VALUES(:1,:2,:3,:4)"
   4:      val insStmt = con.prepareStatement(insSQL)
   5:      val startMs = System.currentTimeMillis
   6:      var rowCount = 0
   7:      while (rs.next()) {
   8:        insStmt.setInt(1, rs.getInt(1))
   9:        insStmt.setString(2, rs.getString(2))
  10:        insStmt.setString(3, rs.getString(3))
  11:        insStmt.setString(4, rs.getString(4))
  12:        insStmt.addBatch()
  13:        rowCount += 1
  14:        if (rowCount % batchSize == 0) {
  15:          insStmt.executeBatch()
  16:        }
  17:      }
  18:      insStmt.executeBatch() // insert any remaining rows in a final, partial batch
  19:      val elapsedMs = System.currentTimeMillis - startMs
  20:      println(rowCount + " rows inserted - " + elapsedMs + " ms")
  21:      con.commit()


On line 12, we now call the addBatch() method instead of executeUpdate().  Once we’ve added enough rows to our batch (defined by the batchSize constant in the above code) we can call executeBatch() to insert the batch, and on line 18 a final executeBatch() picks up any rows left over in a partial batch.

Array insert gives about the same performance improvements as array fetch.  For the above example I got the performance improvement below:


To be fair, the examples above are a best case scenario for array processing – the Oracle database was running in an Amazon EC2 instance in the US, while I was running the scala code from my home in Australia!   So the round trip time was as bad as you are ever likely to see.  Nevertheless, you see pretty impressive performance enhancements from simply increasing array size in the real world all the time.


If you use JDBC to get data from the Oracle RDBMS within a scala project, then the principles for optimization are the same as for Java JDBC – use prepared statements, bind variables and array processing.  Of course there’s a lot more involved in optimizing database queries (SQL tuning, indexing, etc.), but these are the three techniques that vary significantly from language to language.  The performance delta from these simple techniques is very significant, and they should represent the default pattern for a professional database programmer.