Over the past 6 months, we've seen 3 significant cloud computing database announcements:
- SimpleDB from Amazon
- BigTable inside Google App Engine
- Microsoft SQL Server Data Services (SSDS)
These cloud databases provide simple but scalable non-relational data services to applications that want to live in a public cloud. While they currently exist in a rapidly expanding hype-bubble, it may well be that cloud databases represent the biggest paradigm shift in database management systems since the relational revolution of the 1980s.
Anyway, I was lucky enough to get a Google App Engine account and have been playing with the data access API. I wanted to load up some data I had in Oracle tables, and after a bit of mucking around decided I'd write a general-purpose loader which others might find useful.
The loader program is a Python script that generates the code needed to use the GAE bulk loader against an Oracle table. You can download the script here:
Download GAEOraLoad.py (11.3K).
To use it, you’ll need Python 2.5 with the cx_Oracle module installed and – of course – a Google App Engine account, or at least the GAE SDK.
The program displays a usage message if you run it without arguments. As a simple example, you could issue the following command to create a loader for the SCOTT.EMP table:
C:\> python gaeoraload.py emploader scott/tiger EMP
Processing table EMP
Created EmpLoader.py to load EMP
Created EmpQuery.py to query EMP
Created Emp.csv with data from EMP
Issue the following command to load table EMP
bulkload_client.py --filename=Emp.csv --kind=Emp --url=http://localhost:8080/EmpLoad
Created loadAll.bat with commands to load all tables on Windows
Created loadAll.sh with commands to load all tables on *nix
Created loaderclasses.py with python definitions for all Oracle tables
Created app.yaml for application empLoader
The command creates an app.yaml file which defines a new application, “emploader”. The application has two entry points: “EmpLoad”, used by the bulkload_client (see http://code.google.com/appengine/articles/bulkload.html) to load the data, and “EmpQry”, which displays the contents of the table as a convenience. The loadAll.bat and loadAll.sh scripts contain the commands to load the CSV file which holds the data from the EMP table. If you provide a SQL wildcard (e.g. “%”) instead of a table name, CSV files and entry points are created for all matching tables.
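For a sense of what the generated EmpLoader.py does, the real file would subclass google.appengine.ext.bulkload.Loader, which maps each CSV column to a datastore property via a converter function. The sketch below uses a minimal stand-in for that base class so it runs outside the SDK; the EMP column names and converters are assumptions based on SCOTT.EMP, not the script's exact output.

```python
# Minimal stand-in for google.appengine.ext.bulkload.Loader so this
# sketch runs without the GAE SDK; the generated EmpLoader.py would
# subclass the real SDK class instead.
class Loader(object):
    def __init__(self, kind, properties):
        self.kind = kind              # datastore kind, e.g. 'Emp'
        self.properties = properties  # list of (column_name, converter)

    def handle_row(self, row):
        """Convert one CSV row into a dict of typed property values."""
        return dict((name, conv(value))
                    for (name, conv), value in zip(self.properties, row))

# Hypothetical loader for SCOTT.EMP; the column list is an assumption.
class EmpLoader(Loader):
    def __init__(self):
        Loader.__init__(self, 'Emp', [
            ('empno', int),
            ('ename', str),
            ('job', str),
            ('sal', float),
        ])

loader = EmpLoader()
entity = loader.handle_row(['7369', 'SMITH', 'CLERK', '800'])
```

Each row of Emp.csv is run through the converters, so '7369' becomes the integer 7369 and '800' the float 800.0 before the entity is stored.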
So having generated all that, I can start the application as usual:
C:\> python \tools\google_appengine\dev_appserver.py .
INFO 2008-05-09 14:26:08,125 appcfg.py] Checking for updates to the SDK.
INFO 2008-05-09 14:26:08,845 dev_appserver_main.py] Running application emploader on port 8080: http://localhost:8080
And then I can upload my EMP data:
C:\>
python bulkload_client.py --filename=Emp.csv --kind=Emp
--url=http://localhost:8080/EmpLoad
INFO 2008-05-09
14:27:47,703 bulkload_client.py] Starting import; maximum 10 entities
per post
INFO 2008-05-09
14:27:47,703 bulkload_client.py] Importing 10 entities in 564 bytes
INFO 2008-05-09
14:27:51,414 bulkload_client.py] Importing 4 entities in 217 bytes
INFO 2008-05-09
14:27:52,621 bulkload_client.py] Import successful
Visiting the EmpQry entry point displays the first 100 rows of the data I just loaded (though there are only 14 rows in EMP).
This works as you'd expect when you upload the application, though note that there is no security on the application, so in theory someone could upload data into your Google App Engine account. Also, there are a couple of limitations that I know about:
1. Numeric nulls are not supported by the bulk loader, so for now I'm setting null values to -1.
2. Unicode characters (I think) are causing the bulk loader to crash.
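Both issues can be handled in the per-column converter functions. Here's a hedged sketch: the -1 sentinel matches what the script currently does for numeric nulls, while the unicode handling is my own guess at a workaround, not GAEOraLoad's actual behavior.

```python
def safe_int(value, null_sentinel=-1):
    """Convert a CSV field to int, mapping NULL (empty field) to a
    sentinel, since the bulk loader rejects numeric nulls."""
    if value is None or value == '':
        return null_sentinel
    return int(value)

def safe_text(value):
    """Decode raw bytes as UTF-8, replacing any malformed characters
    that might otherwise make the bulk loader crash (an assumed fix)."""
    if isinstance(value, bytes):
        return value.decode('utf-8', 'replace')
    return value
```

A cleaner long-term fix than the -1 sentinel would be a dedicated "this was NULL" marker column, since -1 can collide with legitimate data.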
Now that I can upload Oracle data to the Google App Engine cloud I’m
planning to muck about with various query idioms such as joins,
aggregates and so on. Should be fun!
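Since the datastore offers no server-side joins, an EMP-to-DEPT join will have to be stitched together in application code. A minimal sketch of that idiom, with illustrative rows and field names standing in for real datastore fetch results:

```python
# Rows as they might come back from two datastore kinds
# (illustrative data, not actual query output).
emps = [{'ename': 'SMITH', 'deptno': 20},
        {'ename': 'ALLEN', 'deptno': 30}]
depts = [{'deptno': 20, 'dname': 'RESEARCH'},
         {'deptno': 30, 'dname': 'SALES'}]

def hash_join(left, right, key):
    """Client-side equi-join: index one side by key, then probe it."""
    index = {}
    for row in right:
        index[row[key]] = row
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            merged = dict(row)
            merged.update(match)
            joined.append(merged)
    return joined

result = hash_join(emps, depts, 'deptno')
```

This is the classic hash-join strategy relational databases use internally; in the cloud-database world the application has to do it by hand, which is part of what makes these query idioms interesting to explore.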