Specialized graph databases such as Neo4J specialize in traversing graphs of relationships – such as those you might find in a social network. Many non-graph databases have been incorporating Graph Compute Engines to perform similar tasks. In the MongoDB 3.3 release, we now have the ability to perform simple graph traversal using the $graphLookup aggregation framework function. This will become a production feature in the 3.4 release.
The new feature is documented in MongoDB Jira SERVER-23725. The basic syntax is shown here:
I started playing with this capability originally using the POKEC dataset which represents data from a real social network in Slovakia. The relationship file soc-pokec-relationships.txt.gz contains the social network for about 1.2 million people. I loaded it into Mongo using this perl script. The following pipeline did the trick:
Now we have a collection with records like this:
What we are doing here is starting with person 1476767, then following the elements of the friends array out to two levels – i.e.: to “friends of friends”.
Increasing the maxdepth exponentially increases the amount of data we have to cope with. This is the notorious “seven degrees of separation” effect – most people in a social network are linked by 6-7 hops, so once we get past that we are effectively traversing the entire set. Unfortunately, this meant that traversing more than 3 deep caused me to run out of memory:
The graph lookup can only consume at most 100MB of memory, and currently doesn't spill to disk, even if theallowDiskUse : true clause is specified within the aggregation arguments. SERVER-23980 is open to correct this but it doesn't appear to have been scheduled yet.
For shallow networks, having an index on the connectToField makes an enormous difference. But as the depth increases, the index performance advantage decreases and eventually performance matches that of the unindexed case. In this example data that just happens to be at the “7 degrees of separation” but it will clearly depend on the nature of the data.
The $graphLookup operator is a very powerful addition to the MongoDB aggregation framework and continues the trend of providing richer query capabilities within the server. Mastering the aggregation framework is clearly a high priority for anyone wanting to exploit the full power of MongoDB