Next Generation
Databases: NoSQL,
NewSQL and Big Data

Buy at Amazon
Buy at Apress

Search

Oracle Performance Survival Guide

Buy It
Read it on Safari
Scripts and Examples
Sample Chapter

                                    

 MySQL Stored procedure programming

Buy It
Read it on Safari
Scripts and Examples 

                                                

Entries in Oracle (35)

Wednesday
Apr302014

Best practices for accessing Oracle from scala using JDBC

I’ve been looking for an excuse to muck about with scala for a while now.  So I thought i’d do a post similar to those I’ve done the past for .NET, python, perl and R.  Best practices for Java were included in my book Oracle Performance Survival Guide (but I’d be more than happy to post them if anyone asks).

One of the great things about scala is that it runs in the JVM, so we can use the Oracle JDBC drivers to access Oracle.  These drivers are very mature and support all the best programming practices. 

Best practices for programming Oracle in any language require at least the following:

  1. Use bind variables appropriately.
  2. Ensure you are performing array fetch when retrieving more than one row
  3. Ensure that you use array insert when doing bulk inserts

You can get the scala program which contains the code snippets below here.

Connecting

If you’ve ever used Oracle with JDBC, you’ll find things very familiar.   Here’s a snippet that connects to an oracle database with username,password, host and service specified on the command line (assumes the default 1521 port, but of course this could be parameterized as well):

   1:  import java.sql.Connection
   2:  import java.sql.ResultSet
   3:   
   4:  import oracle.jdbc.pool.OracleDataSource
   5:   
   6:  object guyscala2 {
   7:    def main(args: Array[String]) {
   8:      if (args.length != 4) {
   9:        println("Arguments username password hostname serviceName")
  10:        System.exit(1)
  11:      }
  12:   
  13:      val ods = new OracleDataSource()
  14:      ods.setUser(args(0))
  15:      ods.setPassword(args(1))
  16:      ods.setURL("jdbc:oracle:thin:@" + args(2)+":1521/"+args(3))
  17:      val con = ods.getConnection()
  18:      println("Connected")

 

Using Bind variables

As in most languages, it's all to easy to omit bind variables in scala.  Here's an example where the variable value is simply concatenated into a SQL string

   1:     for (cust_id <- 1 to rows) {
   2:        val s1 = con.createStatement()
   3:        s1.execute("UPDATE customers SET cust_valid = 'Y'"
   4:          + " WHERE cust_id = " + cust_id)
   5:   
   6:        s1.close()
   7:      }

 

On line 3 we build up a SQL statement concatenating the value we want into the string and immediately exeucte it.  Each execution is a unique SQL statement which requires parsing, optimization and caching in the shared pool. 

Here’s an example using bind variables and a prepared Statement:

   1:    val s2 = con.prepareStatement(
   2:        "UPDATE customers SET cust_valid = 'Y'"
   3:          + " WHERE cust_id = :custId")
   4:   
   5:      for (cust_id <- 1 to rows) {
   6:        s2.setInt(1, cust_id)
   7:        s2.execute()
   8:      }
   9:   
  10:      s2.close()

 

Slightly more complex:  we prepare a statement on line 1, associate the bind variable on line 6, then execute on line 7.   It might get tedious if there are a lot of bind variables, but still definitely worthwhile.   Below we see the difference in execution time when using bind variables compared with concanating the variables into a string.  Bind variables definitely increase execution time.

image

 

As well as the reduction in execution time for the individual application, using bind variables reduces the chance of latch and/or mutex contention for SQL statements in the shared pool – where Oracle caches SQL statements to avoid re-parsing. If many sessions are concurrently trying to add new SQL statements to the shared pool, then some may have to wait on the library cache mutex.   Historically, this sort of contention has been one of the most common causes of poor application scalability – applications which did not use bind variables risked strangling on library cache latch or mutex as the SQL exectuion rate increased. 

Exploiting the array interface

Oracle can retrieve rows either from the database one at a time, or can retrieve rows in “batches” sometimes called “arrays”. Array fetch refers to the mechanism by which Oracle can retrieve multiple rows in a single fetch operation. Fetching rows in batches reduces the number of calls issued to the database server, and can also reduce network traffic and logical IO overhead.   Fetching rows one at a time is like moving thousands of people from one side of a river to another in a boat with all but one of the seats empty -  it’s incredibly inefficient.

Fetching rows using the array interface is simple as can be and in fact enabled by default - though with a small default batch size of 10.  The setFetchSize method of the connection and statement objects sets the number of rows to be batched.  Unfortunately, the default setting of 10 is often far too small – especially since there is typically no degradation even when the fetch size is set very large – you get diminishing, but never negative, returns as you increase the fetch size beyond the point at which every SQL*NET packet is full  

Here’s a bit of code that sets the fetch size to 1000 before executing the SQL:

   1:  val s1 = con.createStatement()
   2:   
   3:  s1.setFetchSize(1000)
   4:  val rs = s1.executeQuery("Select /*fetchsize=" + s1.getFetchSize() + " */ * " +
   5:          "from customers where rownum<= " + rows)
   6:  while (rs.next()) {
   7:          val c1 = rs.getString(1)
   8:          val c2 = rs.getString(2)
   9:   }
  10:   rs.close()

 

Here’s the elapsed times for various fetchsizes for the above query:

image

While the default setting of 10 is clearly better than any lower value, it’s still more than 6 times worse than a setting of 100 or 200. 

Inserting data is another situation in which we normally want to consider the array interface.  In this case we need to change our code structure a bit more noticeably.   Here’s the code we probably would write if we didn’t know about array processing:

   1:    val insSQL = "INSERT into arrayinsertTest" +
   2:        " (cust_id,cust_first_name,cust_last_name,cust_street_address) " +
   3:        " VALUES(:1,:2,:3,:4)"
   4:      val insStmt = con.prepareStatement(insSQL)
   5:      val startMs = System.currentTimeMillis
   6:      var rowCount = 0
   7:      while (rs.next()) {
   8:        insStmt.setInt(1, rs.getInt(1))
   9:        insStmt.setString(2, rs.getString(2))
  10:        insStmt.setString(3, rs.getString(3))
  11:        insStmt.setString(4, rs.getString(4))
  12:        rowCount += insStmt.executeUpdate()
  13:      }
  14:      val elapsedMs = System.currentTimeMillis - startMs
  15:      println(rowCount + " rows inserted - " + elapsedMs + " ms")
  16:      con.commit()

 

We prepare the statement on line 4, bind the values to be inserted on lines 8-11, then execute the insert on line 12. 

WIth a few minor changes, this code can perform array inserts:

 

   1:      val insSQL = "INSERT into arrayinsertTest" +
   2:        " (cust_id,cust_first_name,cust_last_name,cust_street_address) " +
   3:        " VALUES(:1,:2,:3,:4)"
   4:      val insStmt = con.prepareStatement(insSQL)
   5:      val startMs = System.currentTimeMillis
   6:      var rowCount = 0
   7:      while (rs.next()) {
   8:        insStmt.setInt(1, rs.getInt(1))
   9:        insStmt.setString(2, rs.getString(2))
  10:        insStmt.setString(3, rs.getString(3))
  11:        insStmt.setString(4, rs.getString(4))
  12:        insStmt.addBatch()
  13:        rowCount += 1
  14:        if (rowCount % batchSize == 0) {
  15:          insStmt.executeBatch()
  16:        }
  17:      }
  18:   
  19:      val elapsedMs = System.currentTimeMillis - startMs
  20:      println(rowCount + " rows inserted - " + elapsedMs + " ms")
  21:      con.commit()

 

On line 12, we now call the addBatch() method instead of executeUpdate().  Once we’ve added enough rows to our batch (defined by the batchsize constant in the above code) we can call executeBatch() to insert the batch. 

Array insert gives about the same performance improvements as array fetch.  For the above example I got the performance improvement below:

 Capture

To be fair, the examples above are a best case scenario for array processing – the Oracle database was running in an Amazon EC2 instance in the US, while I was running the scala code from my home in Australia!   So the round trip time was as bad as you are ever likely to see.  Nevertheless, you see pretty impressive performance enhancements from simply increasing array size in the real world all the time.

Conclusion

If you use JDBC to get data from Oracle RDBMS within a scala project, then the principles for optimization are the same as for Java JDBC – use preparedStatements, bind variables and array processing.  Of course there’s a lot more involved in optimizing database queries (SQL Tuning, indexing, etc), but these are the three techniques that vary significantly from language to language.  The performance delta from these simple techniques are very significant and should represent the default pattern for a professional database programmer. 

Thursday
Jan022014

Using SSD for a temp tablespace on Exadata

I seem to be getting a lot of surprising performance results lately on our X-2 quarter rack Exadata system, which is good – the result you don’t expect is the one that teaches you something new.

This time, I was looking at using a temporary tablespace based on flash disks rather than spinning disks.  In the past – using Fusion IO PCI cards, I found that using flash for temp tablespace was very effective in reducing the overhead of multi-pass sorts:

image

See (http://guyharrison.squarespace.com/ssdguide/04-evaluating-the-options-for-exploiting-ssd.html)

However, when I repeated these tests for Exadata, I got very disappointing results.  SSD based temp tablespace actually lead to marginally worse performance:

image

Looking in depth at a particular point (the 500K SORT_AREA_SIZE point), we can see that although the SSD based temp tablespace has marginally better read times, it involves a significantly higher write overhead:

image

I can understand the higher read overhead (at least partially).  It’s Yet Another time when sequential write operations to an SSD device have provided disappointing performance.  However, it’s strange to see such poor read performance.  How can a spinning disk serve blocks up at effectively the same latency an SSD?

So I dumped all the direct path read waits from a 10046 trace and plotted them logarithmically:

image

We can see in this chart, that the SDD based tablespace suffers from a small “spike” of high latencies between 600-1000 us (eg .6-1 ms).  These are extremely high latencies for an SSD !  What could be causing them?  Garbage collection being caused by the almost writes to the temp tablespaces?  There was negliglbe concurrent activity on the system and the table concerned had flash cache disabled so for now that is my #1 theory. 

For that matter, why are the HDD reads times so low?  An average disk read latency of 500 us for a spinning disk is unreasonably low, is the storage cell somehow buffering temporary tablespace IO?  

As always I’m wondering if there’s someone with more expertise in Exadata internals who could shed some light on all of this!

Monday
Dec302013

Can the Exadata Smart Flash Cache slow smart scans?

I’ve been doing some work on the Exadata Smart Flash Cache recently and came across a situation in which setting CELL_FLASH_CACHE to KEEP will significantly slow down smart scans on a table.

If we create a table with default settings, then the Exadata Smart Flash Cache (ESFC) will not be involved in smart scans, since by default only small IOs get cached.  If we want the ESFC to be involved, we need to set the CELL_FLASH_CACHE to KEEP.  Of course, we don’t expect immediate improvements, since we expect that the next smart scan will need to populate the cache before subsequent scans can benefit. 

HOWEVER, what I’m seeing in practice is that the next smart scan following an ALTER TABLE … STORAGE(CELL_FLASH_CACHE KEEP) is significantly degraded, while subsequent scans get a performance boost.  Here’s an example of what I observe:

image

 

The big increase in CELL IO time is in an increase in both the number and latency of cell smart table scans.  The wait stats for the first scan with a default setting looked like this:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  gc cr disk read                                 1        0.00          0.00
  cell single block physical read                 2        0.01          0.01
  row cache lock                                  2        0.00          0.00
  gc cr grant 2-way                               1        0.00          0.00
  SQL*Net message to client                    1021        0.00          0.00
  reliable message                                1        0.00          0.00
  enq: KO - fast object checkpoint                2        0.00          0.00
  cell smart table scan                        9322        0.14          7.60
  SQL*Net message from client                  1021        0.00          0.02

For the first scan with KEEP cache it looked like this:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    1021        0.00          0.00
  reliable message                                1        0.00          0.00
  enq: KO - fast object checkpoint                2        0.00          0.00
  cell smart table scan                       14904        1.21         33.37
  SQL*Net message from client                  1021        0.00          0.02

Looking at the raw trace file didn’t help – it just shows a bunch of lines like this, with only a small number (3 in this case) of unique cellhash values… I couldn’t see a pattern:

WAIT #… : nam='cell smart table scan' ela= 678 cellhash#=398250101 p2=0 p3=0 obj#=139207 tim= …

I’m at a loss to understand why there would be such a high penalty for the initial smart scan with CELL_FLASH_CACHE KEEP setting.  You expect some overhead from constructing and storing the result set blocks in the cache, but an IO penalty of 200=300% seems way too high.   Anybody seen anything like this or have a clear explanation?

Test script is here, and formatted tkprof here

Monday
Sep172012

Exadata Smart Flash Logging–Outliers

In my last post, I looked at the effect of the Exadata smart flash logging.  Overall,  there seemed to be a slight negative effect on median redo log sync times.  This chart (slightly different from the last post because of different load and configuration of the system), shows how there’s a “hump” of redo log syncs that take slightly longer when the flash logging is enabled:

image

But of course, the flash logging feature was designed to improve performance not of the “average” redo log sync, but of the “outliers”. 

In my tests, I had 40 concurrent processes writing redo as fast as they could.  Occasionally this would result in some really long wait times.  For instance, in this trace you see an outlier of 291,780 microseconds (the biggest outlier in my tests BTW) within an otherwise unremarkable set of waits:

WAIT #47124064145648: nam='log file sync' ela= 1043 buffer#=101808 sync scn=1266588527 p3=0 obj#=-1 tim=1347583167588250
WAIT #47124064145648: nam='log file sync' ela= 2394 buffer#=130714 sync scn=1266588560 p3=0 obj#=-1 tim=1347583167590888
WAIT #47124064145648: nam='log file sync' ela= 932 buffer#=101989 sync scn=1266588598 p3=0 obj#=-1 tim=1347583167592057
WAIT #47124064145648: nam='log file sync' ela= 291780 buffer#=102074 sync scn=1266588637 p3=0 obj#=-1 tim=1347583167884090
WAIT #47124064145648: nam='log file sync' ela= 671 buffer#=102196 sync scn=1266588697 p3=0 obj#=-1 tim=1347583167885294
WAIT #47124064145648: nam='log file sync' ela= 957 buffer#=102294 sync scn=1266588730 p3=0 obj#=-1 tim=1347583167886575

To see if the flash logging feature was successful in removing these outliers, I extracted the top 10,000 waits from each of the roughly 8,000,000 waits I recorded in each category.  Here’s a plot (non-logarithmic) of those waits:

image

So – the flash log feature was effective in eliminating or at least reducing very extreme outlying redo log sync times.    Most redo log sync operations will experience no improvement or maybe even a slight degradation. But for the small number of log syncs that would have experienced a really excessive delay, the feature works as advertised – it reduces the chance of really excessive log file syncs. 

In my opinion, this effect doesn't imply that the flash can process a redo log write faster than the magnetic disks - in fact probably the opposite is true.  But given two desitinations to choose from, we avoid really long delays that occur when one of the destinations only is overloaded. 

Thursday
Aug092012

Exadata smart flash logging

Exadata storage software 11.2.2.4 introduced the Smart flash logging feature.  The intent of this is to reduce overall redo log sync times - especially outliers - by allowing the exadata flash storage to serve as a secondary destination for redo log writes.  During a redo log sync, Oracle will write to the disk and flash simultaneously and allow the redo log sync operation to complete when the first device completes. 

Jason Arneil reports some initial observations here, and Luis Moreno Campos summarized it here.

I’ve reported in the past on using SSD for redo including on Exadata and generally I’ve found that SSD is a poor fit for redo log style sequential write IO.  But this architecture should at least do now harm and on the assumption that the SSD will at least occasionally complete faster than a spinning disk I tried it out. 

My approach involved the same workload I’ve used in similar tests.  I ran 20 concurrent processes each of which performed 200,000 updates and commits – a total of 4,000,000 redo log sync operations.  I captured every redo log sync wait from 10046 traces and loaded them in R for analysis.

I turned flash logging on or off by using an ALTER IORMPLAN command like this (my DB is called SPOT):

ALTER IORMPLAN dbplan=((name='SPOT', flashLog=$1),(name=other,flashlog=on))'

And I ran “list metriccurrent where objectType='FLASHLOG'” before and after each run so I could be sure that flash logging was on or off.

When flash logging was on, I saw data like this:

Before:

     FL_DISK_FIRST                     FLASHLOG     32,669,310 IO requests
     FL_FLASH_FIRST                    FLASHLOG     7,318,741 IO requests
     FL_PREVENTED_OUTLIERS             FLASHLOG     774,146 IO requests

After:

      FL_DISK_FIRST                     FLASHLOG     33,201,462 IO requests
     FL_FLASH_FIRST                    FLASHLOG     7,337,931 IO requests
     FL_PREVENTED_OUTLIERS             FLASHLOG     774,146 IO requests

 

So for this particular cell the flash disk “won” only 3.8% of times (7,337,931-7,318,741)*100/(7,337,931-7,318,741+33,201,462-32,669,310) and prevented no “outliers”.  Outliers are defined as being redo log syncs that would have taken longer than 500 ms to complete. 

Looking at my 4 million redo log sync times,  I saw that the average and median times where statistically significantly higher when the smart flash logging was involved:

> summary(flashon.data$synctime_us) #Smart flash logging ON
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0   452.0   500.0   542.4   567.0  3999.0
> summary(flashoff.data$synctime_us) #Smart flash logging OFF
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   29.0   435.0   481.0   508.7   535.0  3998.0
> t.test(flashon.data$synctime_us,flashoff.data$synctime_us,paired=FALSE)

    Welch Two Sample t-test

data:  flashon.data$synctime_us and flashoff.data$synctime_us
t = 263.2139, df = 7977922, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
33.43124 33.93285
sample estimates:
mean of x mean of y
542.3583  508.6763

Plotting the distribution of redo log sync times we can pretty easily see that there’s actually a small “hump” in times when flash logging is on (note logarithmic scale):

image

This is of course the exact opposite of what we expect, and I checked my data very carefully to make sure that I had not somehow switched samples.  And I repeated the test many times and always saw the same pattern.  

It may be that there is a slight overhead to running the race between disk and flash, and that that overhead makes redo log sync times slightly higher.  That overhead may become more negligible on a busy system.  But for now I personally can’t confirm that smart flash logging provides the intended optimization and in fact I observed a small but statistically significant and noticeable degradation in redo log sync times when it is enabled.