Monday, Nov 07, 2016

Bulk inserts in MongoDB

Like most database systems, MongoDB provides API calls that allow multiple documents to be inserted in a single operation. I’ve written about similar interfaces in Oracle in the past, for instance in this post.

Array/bulk interfaces improve database performance markedly by dramatically reducing the number of round trips between the client and the database. To realize how fundamental an optimization this is, imagine that you have to take a group of people across a river. You have a boat that can carry 100 people at a time, but for some reason you take only one person across on each trip. Not smart, right? Failing to take advantage of array inserts is very similar: you are essentially sending network packets that could carry hundreds of documents with only a single document in each.

Coding bulk inserts in MongoDB is a little more work, but far from rocket science.  The exact syntax varies depending on the language.  Here we’ll look at a little bit of JavaScript code. 

 

   1: if (orderedFlag==1) 
   2:   bulk=db.bulkTest.initializeOrderedBulkOp();
   3: else 
   4:   bulk=db.bulkTest.initializeUnorderedBulkOp(); 
   5:  
   6: for (i=1;i<=NumberOfDocuments;i++) {
   7:   //Insert a row into the bulk batch
   8:   var doc={_id:i,i:i,zz:zz};
   9:   bulk.insert(doc);
  10:   // Execute the batch if batchsize reached
  11:   if (i%batchSize==0) {
  12:     bulk.execute();
  13:     if (orderedFlag==1)
  14:       bulk=db.bulkTest.initializeOrderedBulkOp();
  15:     else
  16:       bulk=db.bulkTest.initializeUnorderedBulkOp();
  17:   }
  18: }
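  19: if (NumberOfDocuments%batchSize!=0) bulk.execute(); // flush the final partial batch; an empty bulk object raises an error on execute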

On line 2 or 4 we initialize a bulk object for the “bulkTest” collection. There are two ways to do this: we can create it ordered or unordered. An ordered bulk operation guarantees that the documents are inserted in the order they are presented to the bulk object, and it stops at the first error. Otherwise, MongoDB can optimize the inserts into multiple streams, which may not insert in order.

On line 9 we add documents to the “bulk” object.  When we hit an appropriate batch size (line 11), we execute the batch (line 12) and reinitialize the bulk object (lines 14 or 16).  We do a further execute at the end (line 19) to make sure all documents are inserted. 
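The same loop can be wrapped in a reusable function. The following is only a sketch, not the code used for the tests in this post: the name bulkInsert and its parameters are made up for illustration, and it assumes it is handed a collection object that exposes the shell's initializeOrderedBulkOp() and initializeUnorderedBulkOp() methods. It counts pending documents so that the final execute() is skipped when the last batch has already been flushed.

```javascript
// Hypothetical helper (not from the original test script): inserts docs in
// batches through the Bulk API and returns the number of execute() calls made.
function bulkInsert(collection, docs, batchSize, ordered) {
  // Create a fresh bulk object of the requested kind.
  function newBulk() {
    return ordered ? collection.initializeOrderedBulkOp()
                   : collection.initializeUnorderedBulkOp();
  }

  var bulk = newBulk();
  var pending = 0;   // documents queued since the last execute()
  var executes = 0;  // number of batches sent so far

  for (var i = 0; i < docs.length; i++) {
    bulk.insert(docs[i]);
    pending++;
    // Send the batch once it reaches the requested size.
    if (pending === batchSize) {
      bulk.execute();
      executes++;
      bulk = newBulk();
      pending = 0;
    }
  }

  // Flush the final partial batch, but never execute an empty bulk object.
  if (pending > 0) {
    bulk.execute();
    executes++;
  }
  return executes;
}
```

In the shell this might be called as bulkInsert(db.bulkTest, docs, 1000, false), where docs is an array of documents built beforehand.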

I inserted 100,000 documents into a collection on my laptop, using various “batch” sizes (i.e., the number of documents inserted between execute() calls). I tried both ordered and unordered bulk operations. The results are charted below:

[Image: chart of results by batch size for ordered and unordered bulk operations]

The results are pretty clear: inserting in batches improves performance dramatically. Initially, every increase in batch size improves performance, but eventually the improvement levels off. I believe MongoDB transparently limits batches to 1000 per operation anyway, but even before then, the chances are your network packets will be filled up and you won’t see any reduction in elapsed time by increasing the batch size. To use the analogy above: the boat is full!
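A little arithmetic shows why the gains flatten out. The sketch below is illustrative only: roundTrips is a made-up helper, and it assumes each execute() call costs exactly one round trip, ignoring any splitting the driver or server may do behind the scenes.

```javascript
// Hypothetical helper: number of execute() calls needed to insert
// totalDocs documents when batchSize documents are sent per call.
function roundTrips(totalDocs, batchSize) {
  return Math.ceil(totalDocs / batchSize);
}

// Round trips for the 100,000-document test at a few batch sizes.
var batchSizes = [1, 10, 100, 1000];
var trips = batchSizes.map(function (size) {
  return { batchSize: size, roundTrips: roundTrips(100000, size) };
});
// trips holds 100000, 10000, 1000 and 100 round trips respectively.
```

Going from a batch size of 1 to 10 removes 90,000 round trips, while going from 100 to 1000 removes only 900, so each further increase has less left to save.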

For my example, there was no real difference between ordered and unordered bulk operations, but this might reflect a limitation of my laptop. Something to play with next time…

When inserting multiple documents into a MongoDB collection you should generally take advantage of the massive performance advantages offered by the bulk operations interface.
