In an earlier post, I experimented with inserting data from Oracle into Cassandra column families using Hector. Unfortunately, that code isn’t compatible with the latest Cassandra 0.7 release, so I had to rework it. The new version uses the addInsertion method of the Mutator object and, while not totally intuitive, it didn’t take long to get working. Here are the key changes:
I wanted to do this so that I could play with Cassandra using our (relatively) new Toad for Cloud Databases Eclipse client. Toad for Cloud Databases lets you work with non-relational data sources such as Cassandra, HBase, and SimpleDB using SQL.
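As a rough illustration of the addInsertion pattern (not the actual loader code), here is a minimal sketch using the Hector API as it looked around the Cassandra 0.7 release. The cluster name, host and port, the `Demo` keyspace, the `Customers` column family, and the row key and column values are all placeholders. It assumes the Hector jar is on the classpath and a Cassandra node with that keyspace and column family is reachable.

```java
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class AddInsertionSketch {
    public static void main(String[] args) {
        // Placeholder cluster name and host:port; adjust for your environment.
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

        // "Demo" is a hypothetical keyspace that must already exist on the server.
        Keyspace keyspace = HFactory.createKeyspace("Demo", cluster);

        // The serializer tells Hector how to turn row keys into bytes.
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());

        // Each addInsertion call only queues a column for the given row key
        // and column family ("Customers" is a made-up name).
        mutator.addInsertion("1001", "Customers",
                HFactory.createStringColumn("name", "Example Customer"));
        mutator.addInsertion("1001", "Customers",
                HFactory.createStringColumn("state", "CA"));

        // Nothing is written until execute() sends the queued batch.
        mutator.execute();
    }
}
```

The part that is easy to miss is that addInsertion does not write anything by itself. The queued columns are only sent when execute() is called, so a loader can add many columns and rows before making a single round trip.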
Here’s how it works. We select the column family we want to map from the Cassandra server:
That column family contains data loaded from both the Oracle CUSTOMER and SALES tables. Toad recognizes that the data in that single column family is best represented by two normalized tables, and gives us the opportunity to specify the names for the primary and foreign keys. We can also rename the “tables” (more like views really) that Toad will create:
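To make the idea of splitting one column family into two normalized tables concrete, here is a small self-contained sketch. It is not how Toad is implemented. It assumes a hypothetical in-memory layout in which each row is keyed by customer ID and holds the customer's attributes plus one nested group of columns per sale. The class names, the `SALE_ID` column and the sample values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NormalizeSketch {

    /** One wide row: scalar customer attributes plus a nested group of columns per sale. */
    static class WideRow {
        final Map<String, String> attributes = new LinkedHashMap<>();
        final Map<String, Map<String, String>> sales = new LinkedHashMap<>();
    }

    /** The two relational views derived from the wide rows. */
    static class Normalized {
        final List<Map<String, String>> customers = new ArrayList<>();
        final List<Map<String, String>> sales = new ArrayList<>();
    }

    static Normalized normalize(Map<String, WideRow> columnFamily,
                                String primaryKey, String foreignKey) {
        Normalized out = new Normalized();
        for (Map.Entry<String, WideRow> row : columnFamily.entrySet()) {
            // Parent table: the row key becomes the primary key column.
            Map<String, String> customer = new LinkedHashMap<>();
            customer.put(primaryKey, row.getKey());
            customer.putAll(row.getValue().attributes);
            out.customers.add(customer);

            // Child table: one row per nested group, tagged with the parent's key.
            for (Map.Entry<String, Map<String, String>> sale
                    : row.getValue().sales.entrySet()) {
                Map<String, String> child = new LinkedHashMap<>();
                child.put("SALE_ID", sale.getKey());
                child.put(foreignKey, row.getKey());
                child.putAll(sale.getValue());
                out.sales.add(child);
            }
        }
        return out;
    }

    /** Builds a tiny sample column family with made-up values. */
    static Map<String, WideRow> sample() {
        Map<String, WideRow> cf = new LinkedHashMap<>();

        WideRow first = new WideRow();
        first.attributes.put("NAME", "Example Customer A");
        Map<String, String> saleOne = new LinkedHashMap<>();
        saleOne.put("AMOUNT", "10.00");
        Map<String, String> saleTwo = new LinkedHashMap<>();
        saleTwo.put("AMOUNT", "25.50");
        first.sales.put("S1", saleOne);
        first.sales.put("S2", saleTwo);
        cf.put("1001", first);

        // A customer with no sales still produces a parent row.
        WideRow second = new WideRow();
        second.attributes.put("NAME", "Example Customer B");
        cf.put("1002", second);
        return cf;
    }

    public static void main(String[] args) {
        Normalized n = normalize(sample(), "CUST_ID", "CUST_ID");
        n.customers.forEach(System.out::println);
        n.sales.forEach(System.out::println);
    }
}
```

The key names passed to `normalize` play the same role as the primary and foreign key names chosen in the mapping dialog: the row key appears once in the parent table and is repeated on every child row so the two can be joined.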
The resulting tables look similar to the tables that we originally loaded from Oracle, and we can issue SQL queries against them just as we could have with Oracle. The queries get translated from SQL to Thrift calls against the underlying Cassandra server:
I definitely find it easier to issue SQL than write a 200-line Java program to do the same thing! Of course, I'm not much of a Java programmer, but at a minimum, having Toad to query the Cassandra data is invaluable when checking that your program did what it was intended to do.