Guy Harrison

Exadata Smart Flash Logging–Outliers

Monday, September 17, 2012 at 10:37AM

In my last post, I looked at the effect of Exadata Smart Flash Logging. Overall, there seemed to be a slight negative effect on median redo log sync times. This chart (slightly different from the one in the last post because of a different load and system configuration) shows how there’s a “hump” of redo log syncs that take slightly longer when flash logging is enabled:

But of course, the flash logging feature was designed to improve performance not of the “average” redo log sync, but of the “outliers”.

In my tests, I had 40 concurrent processes writing redo as fast as they could. Occasionally this would result in some really long wait times. For instance, in this trace you see an outlier of 291,780 microseconds (the biggest outlier in my tests BTW) within an otherwise unremarkable set of waits:

WAIT #47124064145648: nam='log file sync' ela= 1043 buffer#=101808 sync scn=1266588527 p3=0 obj#=-1 tim=1347583167588250
WAIT #47124064145648: nam='log file sync' ela= 2394 buffer#=130714 sync scn=1266588560 p3=0 obj#=-1 tim=1347583167590888
WAIT #47124064145648: nam='log file sync' ela= 932 buffer#=101989 sync scn=1266588598 p3=0 obj#=-1 tim=1347583167592057
WAIT #47124064145648: nam='log file sync' ela= 291780 buffer#=102074 sync scn=1266588637 p3=0 obj#=-1 tim=1347583167884090
WAIT #47124064145648: nam='log file sync' ela= 671 buffer#=102196 sync scn=1266588697 p3=0 obj#=-1 tim=1347583167885294
WAIT #47124064145648: nam='log file sync' ela= 957 buffer#=102294 sync scn=1266588730 p3=0 obj#=-1 tim=1347583167886575
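
As a rough sketch of how waits like these could be pulled out of a trace file (this is not necessarily how I processed them; the regular expression and the `log_sync_waits` helper are assumptions written against the line format shown above), a few lines of Python are enough:

```python
import re

# Assumed pattern for the WAIT lines shown above; 'ela' is the elapsed
# time of the wait in microseconds.
WAIT_RE = re.compile(r"^WAIT #\d+: nam='(?P<name>[^']+)' ela= *(?P<ela>\d+) ")

def log_sync_waits(lines):
    """Yield the elapsed microseconds of every 'log file sync' wait line."""
    for line in lines:
        m = WAIT_RE.match(line)
        if m and m.group("name") == "log file sync":
            yield int(m.group("ela"))

# The six trace lines from above, used as sample input.
SAMPLE = """\
WAIT #47124064145648: nam='log file sync' ela= 1043 buffer#=101808 sync scn=1266588527 p3=0 obj#=-1 tim=1347583167588250
WAIT #47124064145648: nam='log file sync' ela= 2394 buffer#=130714 sync scn=1266588560 p3=0 obj#=-1 tim=1347583167590888
WAIT #47124064145648: nam='log file sync' ela= 932 buffer#=101989 sync scn=1266588598 p3=0 obj#=-1 tim=1347583167592057
WAIT #47124064145648: nam='log file sync' ela= 291780 buffer#=102074 sync scn=1266588637 p3=0 obj#=-1 tim=1347583167884090
WAIT #47124064145648: nam='log file sync' ela= 671 buffer#=102196 sync scn=1266588697 p3=0 obj#=-1 tim=1347583167885294
WAIT #47124064145648: nam='log file sync' ela= 957 buffer#=102294 sync scn=1266588730 p3=0 obj#=-1 tim=1347583167886575
"""

waits = list(log_sync_waits(SAMPLE.splitlines()))
print(max(waits))  # the outlier in this sample
```

For a real trace you would pass an open file object to `log_sync_waits` instead of the sample string.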

To see if the flash logging feature was successful in removing these outliers, I extracted the 10,000 longest waits from the roughly 8,000,000 waits I recorded in each category (flash logging enabled and disabled). Here’s a plot (non-logarithmic) of those waits:
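
The extraction step itself (as opposed to the plot) does not need a full sort of millions of values. A minimal sketch, assuming the wait times are already available as an iterable of microsecond values; `longest_waits` is a hypothetical helper name, not something from my test scripts:

```python
import heapq

def longest_waits(waits, n=10_000):
    """Return the n largest wait times, longest first.

    heapq.nlargest keeps only n values in memory at a time, so it works
    on a generator that streams waits out of a large trace file.
    """
    return heapq.nlargest(n, waits)

# Tiny illustration using the six 'ela' values from the trace excerpt.
sample = [1043, 2394, 932, 291780, 671, 957]
top3 = longest_waits(sample, n=3)
print(top3)
```

Running this once per category gives two lists of equal length that can be plotted side by side.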

So – the flash log feature was effective in eliminating or at least reducing very extreme outlying redo log sync times. Most redo log sync operations will experience no improvement or maybe even a slight degradation. But for the small number of log syncs that would have experienced a really excessive delay, the feature works as advertised – it reduces the chance of really excessive log file syncs.

In my opinion, this effect doesn't imply that the flash can process a redo log write faster than the magnetic disks - in fact, probably the opposite is true. But given two destinations to choose from, we avoid the really long delays that occur when only one of the destinations is overloaded.
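
A toy model makes the argument concrete. It assumes that a redo write is sent to both destinations and is complete as soon as either one acknowledges it; the latency figures are invented for illustration and are not measurements from my tests:

```python
def sync_time(disk_us, flash_us):
    """Completion time when the write goes to both destinations and the
    first acknowledgement wins (an assumption of this toy model)."""
    return min(disk_us, flash_us)

# Hypothetical per-write latencies in microseconds. Disk is usually a
# little faster here, but stalls badly on the third write.
disk_us = [900, 1000, 250_000, 950]
flash_us = [1200, 1100, 1300, 1150]

disk_only = disk_us
with_flash = [sync_time(d, f) for d, f in zip(disk_us, flash_us)]

print(max(disk_only))   # worst case with a single destination
print(max(with_flash))  # worst case when either destination can finish
```

In this sketch the typical write is no faster with flash available, but the worst case is bounded by whichever destination is not stalled. The model ignores any overhead from issuing two writes, which may be what shows up as the slight shift in the median.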

Guy Harrison | 2 Comments | tagged Exadata, Oracle, ssd in Oracle

Reader Comments (2)

Great post. I'd be happy to see more details on the test you ran, especially what you did to generate redo and cause the long sync times.

And yeah, it would be great if Oracle added the "alternative redo destinations" option to any Oracle database, not just Exadata and not just flash.

September 18, 2012 | Gwen Shapira

G'day Guy, it sounds like your benchmark here (from the previous post at least) is for a system that is solidly limited by redo throughput. Does that result in queued-up I/Os to the redo devices? I wonder whether flash delivers a better comparative result when the system is limited by read I/Os on the rest of the database, and redo is under-saturated (say about 50% of capacity). I'd think that could give flash the edge of being always ready, without disk positioning latencies.

October 12, 2012 | David Penington
