
Tuesday, Jul 22, 2008

Cloud computing and database articles

This has been a very exciting year for cloud computing - especially as it relates to database technologies.  Amazon announced SimpleDB, Microsoft announced SQL Server Data Services, and Google unveiled its BigTable-based Google App Engine.

For the first time in a long time, it looks like database technologies are a hot topic again. 

I've written some articles on cloud computing for Database Trends and Applications: one on Amazon AWS, one on Google App Engine, and a two-part article on next generation database technologies.  The first part discusses the rise of the cloud database and the second part looks at Michael Stonebraker et al.'s proposals for a next generation DBMS.

Friday, May 09, 2008

Uploading Oracle data to Google App Engine

Over the past 6 months, we've seen 3 significant cloud computing database announcements:

  • SimpleDB from Amazon
  • BigTable inside  Google App Engine
  • Microsoft SQL Server Data Services (SSDS)

These cloud databases provide simple but scalable non-relational data services to applications that want to live in a public cloud.  While they currently exist in a rapidly expanding hype bubble, it may well be that cloud databases represent the biggest paradigm shift in DBMS technology since the relational revolution of the 80s.

Anyway, I was lucky enough to get a Google App Engine account and have been playing with the data access API.  I wanted to load up some data I had in Oracle tables, and after a bit of mucking around decided I'd write a general purpose loader which others might find useful.

The loader program is a Python script that generates the code necessary to use the bulk loader to load up an Oracle table.  You can download the Python program here:
    Download GAEOraLoad.py (11.3K).

To use it, you'll need Python 2.5 with the cx_Oracle module installed and - of course - you'll need a Google App Engine account or at least the GAE SDK.
The program will display a usage message if you run it without arguments.   However, as a simple example, you could issue the following command to create a loader for the SCOTT.EMP table:

C:\> python gaeoraload.py emploader scott/tiger EMP
Processing table EMP
       Created EmpLoader.py to load EMP
       Created EmpQuery.py to query EMP
       Created Emp.csv with data from EMP
       Issue the following command to load table EMP
                bulkload_client.py --filename=Emp.csv --kind=Emp --url=http://localhost:8080/EmpLoad
Created loadAll.bat with commands to load all tables on Windows
Created loadAll.sh with commands to load all tables on *nix
Created loaderclasses.py with python definitions for all Oracle tables
Created app.yaml for application empLoader

The command creates an app.yaml file which defines a new application, "emploader".  The application has two entry points: "EmpLoad", used by the bulkload_client (see http://code.google.com/appengine/articles/bulkload.html) to load the data, and "EmpQry", which as a convenience displays the contents of the table.  The loadAll.bat and loadAll.sh scripts contain the commands to load the CSV file containing the data from the EMP table.  If you provide a SQL wildcard (e.g. "%"), CSV files and entry points are created for all matching tables.
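To give a feel for what the script produces, here is a minimal sketch of the kind of loader handler EmpLoader.py might contain, based on the bulkload API described in the article linked above.  The column names and converters are assumptions for illustration, not the exact code GAEOraLoad.py emits:

# Hypothetical sketch of a generated EmpLoader.py (illustrative only).
# Assumes the google.appengine.ext.bulkload API that bulkload_client.py posts to.
import datetime

from google.appengine.ext import bulkload

class EmpLoader(bulkload.Loader):
    def __init__(self):
        # Kind name 'Emp', plus one (property, converter) pair per CSV column.
        bulkload.Loader.__init__(self, 'Emp', [
            ('empno', int),
            ('ename', str),
            ('hiredate', lambda v: datetime.datetime.strptime(v, '%Y-%m-%d')),
            ('sal', float),
        ])

if __name__ == '__main__':
    # Registers the handler (mapped to /EmpLoad in app.yaml) that receives the CSV rows.
    bulkload.main(EmpLoader())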

So having generated all that, I can start the application as usual:

C:\> python \tools\google_appengine\dev_appserver.py .
INFO     2008-05-09 14:26:08,125 appcfg.py] Checking for updates to the SDK.
INFO     2008-05-09 14:26:08,845 dev_appserver_main.py] Running application emploader on port 8080: http://localhost:8080

And then I can upload my EMP data:

C:\> python bulkload_client.py --filename=Emp.csv --kind=Emp --url=http://localhost:8080/EmpLoad
INFO     2008-05-09 14:27:47,703 bulkload_client.py] Starting import; maximum 10 entities per post
INFO     2008-05-09 14:27:47,703 bulkload_client.py] Importing 10 entities in 564 bytes
INFO     2008-05-09 14:27:51,414 bulkload_client.py] Importing 4 entities in 217 bytes
INFO     2008-05-09 14:27:52,621 bulkload_client.py] Import successful

Visiting the EmpQry entry point displays the first 100 rows of data which I just loaded (there are only 14 rows in EMP, though):

[Screenshot: output of the EmpQry entry point showing the loaded EMP rows]
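The generated EmpQuery.py is presumably a small webapp handler along these lines; the following is a sketch based on assumed property names, not the code GAEOraLoad.py actually generates:

# Hypothetical sketch of what a generated EmpQuery.py might amount to (illustrative only).
# Assumes the 2008-era webapp framework and db API; property names are assumptions.
from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class Emp(db.Model):
    # Presumably generated into loaderclasses.py from the Oracle table definition.
    empno = db.IntegerProperty()
    ename = db.StringProperty()

class EmpQry(webapp.RequestHandler):
    def get(self):
        # Fetch up to 100 Emp entities and write them out as plain text.
        self.response.headers['Content-Type'] = 'text/plain'
        for emp in Emp.all().fetch(100):
            self.response.out.write('%s %s\n' % (emp.empno, emp.ename))

def main():
    run_wsgi_app(webapp.WSGIApplication([('/EmpQry', EmpQry)]))

if __name__ == '__main__':
    main()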
This works as you'd expect when you upload the application, though note that there is no security on the application, so in theory anyone could upload data into your Google App Engine account.  Also, there are a couple of limitations that I know about:

1.    Numeric nulls are not supported by the bulkloader, so for now I'm setting null values to -1 (see the sketch below).
2.    Unicode characters (I think) are causing the bulk loader to crash.
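To illustrate the first workaround, a converter along the following lines (hypothetical; not necessarily what GAEOraLoad.py generates) could substitute -1 for empty numeric fields:

# Hypothetical converter illustrating the numeric-null workaround described above.
def int_or_default(value, default=-1):
    # Empty CSV fields (Oracle NULLs) fall back to the default value.
    return int(value) if value.strip() else default

# For example, it could be plugged into the loader column list as ('sal', int_or_default).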

Now that I can upload Oracle data to the Google App Engine cloud I’m planning to muck about with various query idioms such as joins, aggregates and so on.  Should be fun!

Saturday, Mar 01, 2008

Using _spin_count to reduce latch contention in 11g

Back in 2003, I published the results of experimenting with various values of _spin_count to improve throughput on latch-congested systems. You can find the original paper here.

The study involved varying spin count on an Oracle 8.1.6 database suffering from heavy cache buffers chains latch contention.  The database was running on a Windows 2000 server.  The results of this study confirmed that _spin_count can be very effective in improving the throughput of latch contention-bound systems.  In the test, throughput almost doubled when spin_count was raised from the default of 2000 to 10000.

I recently repeated these tests for Oracle 11g.  This time, the database was experiencing shared pool and library cache latch contention and was running on RHEL 5.0.  The results will be published soon in my contention series at ToadWorld.  To summarize, however, increasing _spin_count in 11g was equally effective in reducing latch contention.  As _spin_count increased, latch waits reduced and throughput increased, up until the CPU saturated, at which point no further improvements were achieved.

This chart summarizes typical results:

[Chart: spin count versus latch contention]

In this case, the optimal value for spin_count was probably about 8000.  I did these tests on a variety of systems and workloads, and almost always the default spin_count of 2000 was lower than optimal.

In the 2003 tests, increasing spin count beyond the point at which the CPU saturated resulted in latch waits continuing to decrease, but waits for the CPU rendered those reductions useless.  In these tests, once the CPU saturated, latch free waits stopped decreasing.  I presume that this is because the session was pre-empted by the OS - effectively stopping its spin - so that when the CPU is stressed, excessive _spin_count values are effectively dishonored.  In the 2003 trials it looked like the sessions kept spinning, creating large run queues.  The difference may be due to scheduling differences between the Windows and Linux kernels, or it might be a difference between 8i and 11g; I'm guessing the former, but I need to do some more research when I get the chance.

I should point out that Spotlight on Oracle, the diagnostic tool I developed at Quest Software, has a latch tuning module that will try to discover the optimal value for spin_count.  You might like to try it out.

Tuesday, Jan 29, 2008

Accessing Oracle from Powershell

I've used Perl extensively - and other scripting languages from time to time - to write little utilities or simulations for Oracle.  Perl is available on Windows, but it doesn't provide easy access to all the Windows API and .NET facilities (such as WMI counters).  For that, PowerShell is probably the best choice.

It's not entirely obvious how to connect to Oracle from PowerShell.  Here's how I did it.

You need to install the Oracle Data Provider for .NET (ODP.NET).  Once that is done, you load the assembly in PowerShell using this command:

 

# Load the ODP assembly
[Reflection.Assembly]::LoadFile("C:\oracle\10g\ODP.NET\bin\2.x\Oracle.DataAccess.dll")

Of course, "c:\oracle\10g" represents my Oracle home.  It's non-trivial to get this from the registry, so make sure that this path is valid for your installation.

Now we can use the ODP.NET variants of the standard ADO.NET calls to connect to Oracle and extract results.  I set up my connection to Oracle as follows:

#connect to Oracle
$constr = "User Id=system;Password=manager;Data Source=gh10gb"
$conn= New-Object Oracle.DataAccess.Client.OracleConnection($constr)
$conn.Open()


My TNS alias was "gh10gb".  Now that I have a connection handle $conn, I can create a data reader object for a SQL query:

# Create a datareader for a SQL statement
$sql="select * from all_users"
$command = New-Object Oracle.DataAccess.Client.OracleCommand( $sql,$conn)
$reader=$command.ExecuteReader()


From this point on, everything is standard ADO.NET.  You can look at the structure of the result set like so:

# Write out the result set structure
for ($i=0;$i -lt $reader.FieldCount;$i++) {
    Write-Host  $reader.GetName($i) $reader.GetDataTypeName($i)
}


And write out the results like this:

# Write out the results
while ($reader.read()) {
    $username=$reader.GetString(0) 
    $userid=$reader.GetDecimal(1)
    $createDate=$reader.GetDateTime(2)
   
    Write-Host "$userid $username $createDate "
}


The ADO.NET constructs are fairly clumsy compared to Perl DBI, but they get the job done.  And if you want to write code that integrates Windows APIs with Oracle database access, PowerShell is probably a good choice (a C# console application would be the alternative, I suppose).  I have to perform some experiments shortly on latch contention in which I'll be tracking WMI counters and v$latch statistics concurrently, and PowerShell should provide a useful means of doing that.

Quest sponsors one of the PowerShell GUIs out there.  If you need a GUI to dabble in PowerShell, check out www.powergui.org.





Friday, Aug 17, 2007

Systematic Oracle Performance Tuning

When faced with an obviously IO-bound database, it's tempting to deal immediately with the most obvious symptom - the IO subsystem.  Unfortunately, this usually treats the symptom rather than the cause, is often expensive, and is usually futile.  Here's my article on a more systematic approach:

Systematic Oracle tuning (pdf)

and here's the PowerPoint from the recent NOCOUG conference.