Blockchain and databases of the future

You would have to have been living under a rock for the past few years not to have heard of Bitcoin.  Bitcoin is an electronic cryptocurrency which can be used like cash in many web transactions.  At the time of writing there are about 15 million Bitcoins in circulation, trading at approximately USD $360 each for a total value of about USD $5.3 billion.

Bitcoin combines peer-to-peer technology and public key cryptography.  The owner of a bitcoin can use a private key to assert ownership and authorize transactions – others can use the public key to validate the transaction.  As in other peer-to-peer systems such as BitTorrent, there is no central server which maintains Bitcoin transactions – rather, there is a distributed public ledger called the Blockchain.  I wrote a short article on blockchain here.
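
To make the sign-and-validate idea concrete, here is a minimal sketch using the JDK’s standard java.security API.  This shows only the general pattern – Bitcoin itself uses ECDSA over the secp256k1 curve and its own transaction format, and the message below is obviously a stand-in:

    import java.nio.charset.StandardCharsets;
    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    public class SignDemo {
        public static void main(String[] args) throws Exception {
            // Generate an elliptic-curve key pair; the private half stays
            // with the owner, the public half can be freely published
            KeyPairGenerator gen = KeyPairGenerator.getInstance("EC");
            gen.initialize(256);
            KeyPair pair = gen.generateKeyPair();

            byte[] transaction = "pay 1 coin to Alice"
                    .getBytes(StandardCharsets.UTF_8);

            // The owner signs the transaction with the private key
            Signature signer = Signature.getInstance("SHA256withECDSA");
            signer.initSign(pair.getPrivate());
            signer.update(transaction);
            byte[] signature = signer.sign();

            // Anyone holding the public key can validate the signature
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(pair.getPublic());
            verifier.update(transaction);
            System.out.println("Transaction valid: " + verifier.verify(signature));
        }
    }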

The implications of cryptocurrencies are way beyond our scope here, but there are definite implications for database technologies in the Blockchain concept.  Blockchain replaces the trusted third party that must normally mediate any transfer of funds.  Rather than a centralized database recording transactions and authenticating each party, Blockchain allows transactions and identity to be validated by consensus within the Blockchain network – each transaction is confirmed by public-key based authentication from multiple nodes before being concluded.  Blockchain could be as disruptive to our traditional notions of banking and non-monetary transactions as peer-to-peer systems like Napster were to the music business.

The Blockchain underlying Bitcoin is public, but there can be private (or permissioned) Blockchains which are “invitation only”.  Whether private or public, Blockchains arguably represent a new sort of shared distributed database.  Like systems based on the Dynamo model, the data in the Blockchain is distributed redundantly across a large number of hosts.  However, the Blockchain represents a complete paradigm shift in how permissions are managed within the database.  In an existing database system, the database owner has absolute control over the data held in the database.  In a Blockchain system, however, ownership is maintained by the creator of the data.

Consider a database that maintains a social network like Facebook: although the application is programmed to allow only you to modify your own posts or personal details, the reality is that the Facebook company has total control over your online data.  It can – if it wishes – remove, censor or even modify your posts.  In a Blockchain-based database you would retain total ownership of your posts and it would be impossible for any other entity to modify them.

Applications based on Blockchain have the potential to disrupt a very wide range of social and economic activities.  Transfers of money, property, management of global identity (passports, birth certificates), voting, permits, wills, health data, and a multitude of other transactional data could be regulated in the future by Blockchains.   The databases that currently maintain records of these types of transactions may become obsolete.

Most database owners will probably want to maintain control of the data in their databases, so it’s unlikely that Blockchain will completely transform database technology in the short term.  However, it does seem likely that database systems will implement Blockchain-based authentication and authorization protocols for specific application scenarios, and that formal database systems built upon a Blockchain foundation will soon emerge.


Vector clocks


One of the concepts I found difficult when I first looked at non-relational systems is the vector clock.  Some databases – like Cassandra – use timestamps to work out which is the “latest” transaction: if there are two conflicting modifications to a column value, the one with the highest timestamp will be considered the most recent and the most correct.

Other Dynamo systems use a more complex mechanism known as a vector clock. The vector clock has the advantage of not requiring clock synchronization across all nodes, and helps us identify transactions that might be in conflict.

Despite its name, the vector clock does not include any timestamps.  Rather, it is composed of a set of counters.  These counters are incremented when operations complete, in a similar way to the traditional System Change Number pattern that we are familiar with from relational systems like Oracle.  The set contains one counter for each node in the cluster.  Whenever an operation occurs on a node, that node will increment its own counter within its vector clock.  Whenever a node transmits an operation to another node it will include its vector clock within the request.  The transmitted vector clock will include the highest counter for the transmitting node as well as the highest counters from other nodes that the transmitting node has ever seen.

When a node receives possibly conflicting updates from other nodes, it can compare the vector clocks to determine the relative sequencing of the requests. There is a defined set of vector clock operations that can tell if:

  • The two vector clocks come from nodes that are completely in sync
  • One node is “out of date” with respect to the other node
  • The clocks are “concurrent”, in that each node has some information that is more up to date than the other’s.  In this case we can’t choose which update is truly the more correct (a minimal sketch of these comparisons appears below).
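
Here is a minimal sketch of those operations in Java – a hypothetical class for illustration, not any particular database’s implementation:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical vector clock: one event counter per node name
    public class VectorClock {
        private final Map<String, Long> counters = new HashMap<>();

        // A node increments its own counter whenever it performs an operation
        public void increment(String node) {
            counters.merge(node, 1L, Long::sum);
        }

        // On receiving another node's clock, keep the highest counter
        // seen for every node
        public void merge(VectorClock other) {
            other.counters.forEach((node, count) ->
                    counters.merge(node, count, Math::max));
        }

        // true if this clock has seen everything 'other' has -- i.e. 'other'
        // is either identical or "out of date" with respect to this clock
        public boolean dominates(VectorClock other) {
            return other.counters.entrySet().stream().allMatch(e ->
                    counters.getOrDefault(e.getKey(), 0L) >= e.getValue());
        }

        // "Concurrent" clocks: each side knows something the other does not,
        // so neither update can safely be declared the most recent
        public boolean concurrentWith(VectorClock other) {
            return !this.dominates(other) && !other.dominates(this);
        }
    }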

Vector clocks are notoriously difficult to understand, though the underlying algorithm is really quite simple.  The diagram below shows an example of three vector clocks incrementing across three nodes.  The algorithm is somewhat simplified to improve clarity.


In the example the vector clocks start out set to 0 for all nodes (1).  Updates to nodes from external clients cause the nodes to increment their own element of the vector clock (2).  When these changes are propagated to other nodes, the receiving node updates its vector clock and merges the vector clocks from the other nodes (3).  Event (H) occurs when node 2 receives the vector clock (F) from node 1 and (G) from node 3 (4).  Each of these vector clocks contains elements higher than the other – vector clock F has the higher value for node 1, while vector clock G has the higher value for node 3.  There is no way for node 2 to be sure which of the two vector clocks represents the most up-to-date data – each of the sending nodes “knows” something that the other node does not, and consequently it’s not clear which of the two nodes “knows” best.

For those of us from the world of strictly consistent databases like Oracle, think of the vector clock as a set of System Change Numbers from each system.  We examine the SCNs from each node to see if there are nodes that might not have seen all the changes that have been recorded on another node.

The vector clocks above tell us that version G and version F are conflicting – each contains information from unique updates that could both contain important information.  What, then, is the system to do?  Here are some of the options:

  • Revert to last write wins: two updates are unlikely to have occurred at the exact same nanosecond, so one will have a higher timestamp value. We could decide that the highest timestamp “wins”.
  • Keep both copies, and require that the application or the user resolve the conflict.
  • Somehow merge the data.  This is the approach taken by the original Dynamo, which managed Amazon’s shopping cart.  If there are two conflicting shopping carts they are merged, and the worst that can happen (from Amazon’s point of view) is that you buy some things twice.  Another merge can occur with things like counters: rather than having one counter increment overwrite another, we can deduce that both operations wanted to increment the counter and increment it twice.  A special class of data types – Conflict-Free Replicated Data Types (CRDTs) – exists that allows these sorts of merges to be predefined (see the sketch below).
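
For example, a grow-only counter CRDT keeps one slot per replica, so conflicting copies can always be merged without losing an increment.  A minimal sketch – a hypothetical class, for illustration only:

    import java.util.HashMap;
    import java.util.Map;

    // Grow-only counter CRDT sketch: each replica increments only its own
    // slot; the counter's true value is the sum of all slots
    public class GCounter {
        private final Map<String, Long> slots = new HashMap<>();

        public void increment(String replicaId) {
            slots.merge(replicaId, 1L, Long::sum);
        }

        // Merging two conflicting replicas takes the maximum of each slot,
        // so no replica's increments are ever lost or double-counted
        public void merge(GCounter other) {
            other.slots.forEach((replica, count) ->
                    slots.merge(replica, count, Math::max));
        }

        public long value() {
            return slots.values().stream().mapToLong(Long::longValue).sum();
        }
    }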

There are advocates for the vector clock – such as the architects of Riak – and advocates for the timestamp system used in Cassandra.  Neither party disagrees about the concrete implications of the two approaches: they differ on the desirability of the consequences.  Last Write Wins represents a simpler model for the application developer and administrator; vector clocks allow conflicts to be identified, but those conflicts must then be resolved.  In a later post I’ll give an example of how to programmatically resolve conflicts in Riak.


Exploring Couchbase N1QL

Couchbase recently announced the Non-first Normal Form Query Language (N1QL) – pronounced “Nickel” – a virtually complete SQL language implementation for use with document databases, implemented within Couchbase Server 4.0.

I recently took a quick look. 

Most of the examples use the sample film documents shown below (this is the same sample data we created for MongoDB in this post):
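
A representative film document looks something like this – a sketch only; the _id, Title and Actors field names match the Sakila conversion described later on this page, while the remaining names and values are illustrative assumptions:

    {
      "_id": 1,
      "Title": "ACADEMY DINOSAUR",
      "Rating": "G",
      "Actors": [
        { "actorId": 1, "firstName": "PENELOPE", "lastName": "GUINESS" },
        { "actorId": 10, "firstName": "CHRISTIAN", "lastName": "GABLE" }
      ]
    }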


N1QL allows us to perform basic queries to retrieve selected documents or attributes of selected documents:
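
For example – a sketch assuming the documents are held in a bucket named films, per the sample above:

    SELECT Title, Rating
    FROM films
    WHERE Rating = 'G'
    LIMIT 5;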


N1QL allows us to access nested documents within the JSON structure using array notation.  So, for instance, in the example below Actors[0] refers to the first nested document within the Actors array:
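
Something like this, again against the hypothetical films bucket:

    SELECT Title, Actors[0] AS firstActor
    FROM films;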


We can query for subdocuments which match search criteria using the WHERE ANY syntax:
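
Along these lines – the lastName attribute is an assumption carried over from the sample document above:

    SELECT Title
    FROM films
    WHERE ANY actor IN films.Actors SATISFIES actor.lastName = 'HARRIS' END;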


The UNNEST command allows embedded documents to be “joined” back to the parent document. So here we get one result for each Actor who starred in film 200, with the film title included in the results:
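
Roughly like so, assuming the films were loaded with the film id as the document key:

    SELECT f.Title, a.firstName, a.lastName
    FROM films f USE KEYS '200'
    UNNEST f.Actors a;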




The UNNEST command allows us to perform the equivalent of joins between parent and child documents where the child documents are nested within the parent.  N1QL also allows us to join across disparate documents, providing that one of the document collections contains a reference to the primary key of the other.

So for instance if we had a bucket of documents that contains the primary keys of “overdue” films in our imaginary (and by now definitely struggling) DVD store, then we can join that to the films collection to return just those films:
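
A sketch, assuming an overdue bucket whose documents carry a filmId attribute holding the film’s document key:

    SELECT f.Title
    FROM overdue o
    JOIN films f ON KEYS o.filmId;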


N1QL also contains DML statements allowing us to manipulate the contents of documents, and DDL statements allowing creation and modification of indexes.

N1QL is an ambitious attempt to bring SQL into the world of document databases.  It’s interesting to consider that at the same time that companies like Couchbase are introducing SQL support into their databases, companies like Oracle are introducing strong JSON support into their SQL-based databases.  It would seem that the two worlds are coming together.


On my way to Collaborate 2015

As I write this I’m at 35,000 ft (or so) on my way to yet another Collaborate.  This year I’ll be presenting two sessions and participating in one panel:

963: Writing to Lead
Monday, April 13  |  Banyan E, South Convention Center, Level 2
10:30 a.m. – 11:30 a.m.
Jonathan Gennick will be moderating a panel including myself, Bobby Curtis, Charles Kim, Darl Kuhn and Michael Rosenblum.  We’ll be talking about how to use writing – books, blogs, articles, Twitter, etc. – to best effect.  The panellists are all authors, and hopefully between us we’ll have a lot of useful advice to offer aspiring or existing book authors.
995: Top 5 Trends in Database Technology 
Tue, Apr 14, 2015 (04:30 PM - 05:30 PM) : Banyan E 
It’s been an amazing 5 years in database technology – we went from RDBMS for everything to Big Data, NoSQL, NewSQL and a lot of innovation within the RDBMS as well.  This talk covers 5 big technologies that are influencing Oracle and other database technologies.  It includes some of my best recycled jokes, as well as, hopefully, some useful technical content.  There’s an article based on this presentation here if you want a preview.

442:  Top 10 Oracle Database Tuning Tips and Techniques
Thu, Apr 16, 2015 (08:30 AM - 09:30 AM) : Palm B
Seasoned presenters know that “top 10” presentations somehow get accepted more readily than more abstract titles – something to do with the impression of a concrete amount of information, I suppose.  But once you start giving a top 10 talk you quickly realize you can budget at most 5 minutes for each topic.  Even so, I feel pretty happy about this talk as a high-level summary of the core concepts that I’ve picked up in 25 years of Oracle performance tuning.  The main thing I’ve tried to accomplish is to organize those concepts around a sensible tuning philosophy.  I’m not assuming a great deal of Oracle tuning experience.



Sakila sample schema in MongoDB

I wanted to do some experimenting with MongoDB, but I wasn’t really happy with any of the sample data I could find on the web.  So I decided to translate the MySQL “Sakila” schema into MongoDB collections as part of the learning process.

For those that don’t know, Sakila is a MySQL sample schema that was published about 8 years ago.  It’s based on a DVD rental system.  OK, not the most modern data ever, but DVDs are still a thing, aren’t they?

You can get the MongoDB version of Sakila here.  To load it, unpack using tar zxvf sakilia.tgz, then use mongoimport to load the resulting JSON documents.  On Windows you should be able to double-click on the file to get to the JSON.

The Sakila database schema is shown below.  There are 16 tables representing a fairly easy to understand inventory of films, staff, customers and stores.

Database diagram

When modelling MongoDB schemas, we partially ignore our relational modelling experience – “normalization” is not the desired end state.  Instead of basing our decisions on the nature of the data, we base them on the nature of our operations.  The biggest decision is which “entities” get embedded within documents, and which get linked.  I’m not the best person to articulate these principles – the O’Reilly book “MongoDB Applied Design Patterns” does a pretty good job, and this presentation is also useful.

My first shot at mapping the data – which may prove to be flawed as I play with MongoDB queries – collapsed the 16 tables into just three collections: FILMS, STORES and CUSTOMERS.  ACTORS became a nested document within FILMS; STAFF and INVENTORY were nested into STORES, while RENTALS and PAYMENTS were nested into CUSTOMERS.  Whether these nestings turn out to be good design decisions will depend somewhat on the application: some operations will be awkward, while others will be expedited.

Here’s a look at the FILMS collection:
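
In outline it is shaped like this – a sketch, since only _id, Title and Actors appear in the load code below; the other names and values are illustrative:

    {
      "_id": 1,
      "Title": "ACADEMY DINOSAUR",
      "Actors": [
        { "actorId": 1, "firstName": "PENELOPE", "lastName": "GUINESS" }
      ]
    }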


Here is STORES:
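
Again in outline – the nesting follows the mapping described above, but all field names here are illustrative assumptions:

    {
      "_id": 1,
      "Staff": [
        { "staffId": 1, "firstName": "Mike", "lastName": "Hillyer" }
      ],
      "Inventory": [
        { "inventoryId": 1, "filmId": 1 }
      ]
    }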


And here is CUSTOMERS:
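
In outline – the rentals.payments.amount path comes from the rounding note below; the rest is an illustrative sketch:

    {
      "_id": 1,
      "firstName": "MARY",
      "lastName": "SMITH",
      "rentals": [
        {
          "rentalId": 76,
          "filmId": 1,
          "payments": [
            { "paymentId": 3, "amount": 2.99 }
          ]
        }
      ]
    }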


Looks like I have to fix some float rounding issues on customers.rentals.payments.amount.

The code that generates the schema is here.  It’s pretty slow, mainly because of the very high number of lookups on rentals and payments.  It would be better to bulk collect everything and scan through it, but that would make the code pretty ugly.  If this were Oracle I’m pretty sure I could make it run faster, but with MySQL, SQL tuning is much harder.

The code is pretty straightforward.  To insert a MongoDB document we get the DBCollection, then create BasicDBObjects which we insert into the DBCollection.  To nest a document we create a BasicDBList and insert BasicDBObjects into it, then add the BasicDBList to the parent BasicDBObject.  The following snippet illustrates that sequence.  It’s mostly boilerplate code, with the only human decision being the nesting structure.

    // Uses the legacy MongoDB Java driver API
    import com.mongodb.BasicDBList;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;

    DBCollection filmCollection = mongoDb.getCollection(mongoCollection);
    while (fileRs.next()) { // For each film row in the MySQL result set
        // Create the film document
        BasicDBObject filmDoc = new BasicDBObject();
        Integer filmId = fileRs.getInt("FILM_ID");
        filmDoc.put("_id", filmId);
        filmDoc.put("Title", fileRs.getString("TITLE"));
        // Other attributes ...
        // Fetch this film's actors as a BasicDBList of nested documents
        BasicDBList actorList = getActors(mysqlConn, filmId);
        // Put the actor list into the film document
        filmDoc.put("Actors", actorList);
        filmCollection.insert(filmDoc); // insert the film
    }

Anyway, hopefully this might be of some use to those moving from MySQL to MongoDB.  Comments welcome!