Mike Dhani

6 November, 2008

Picking the right stripe size

Filed under: Computer

Posted in Storage Interconnects & RAID, Advisor - Tom by Tom Treadway

Question to the Storage Advisors, from Anonymous: Been reading your blogs most of the day (just stumbled on this site today)…WOW! Tons of excellent information/suggestions for best practice! I’ll be adding this site to my daily tech read file… One quick question: Is there a rule of thumb concerning RAID-5 Block Stripe Size to file size? Is there any direct performance correlation between Block Stripe Size and NTFS cluster size? Thanks!

Thanks, man. I feel the love.

Regarding RAID-5 stripe size, in general the larger the better. In rare cases a smaller stripe size might help, but if you make it too small then performance can plummet. And of course the answer depends on the pattern of access – large or small, sequential or random, read or write, OS queue depth, …

Let’s look at an example of 4KB random reads to an 8-drive RAID-5 with an OS queue depth of 64 concurrent commands. And let’s say that each drive was capable of 100 IOs Per Second (IOPS).

Now let’s say that the stripe size on each drive was “very large” – large enough that each host command fell entirely within a stripe. That would mean that each drive would be servicing host requests at 100 IOPS, for a total of 800 host IOPS for the array. That’s as good as it gets.

But let’s say that the stripe size was 8KB. And let’s say that the 4KB requests from the host are randomly aligned, meaning that they could start on any block boundary. The following picture shows two adjacent 8KB stripes (therefore two adjacent drives) with a 4KB host request placed in all the possible random positions.

[Figure: RAID-5 Stripe Alignment]

Notice that the host command can be placed at 16 different offsets (one per 512-byte block in an 8KB stripe), but 7 of those offsets cause the command to fall on two drives. If every host IO tied up two drives then the IOPS rate would drop from 800 to 400. But since on average only 7 out of 16 host IOs tie up two drives while the other 9 tie up just one drive, the resultant rate, taken as a weighted average of the two cases, would be 625 IOPS. (Simple math left as an exercise to the user.)
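For anyone who would rather not do the exercise by hand, here is a rough Python sketch of the arithmetic. It assumes 512-byte blocks (which is what 16 offsets in an 8KB stripe implies), and the function names are made up for illustration. It also shows a second, stricter estimate that counts drive operations per host IO; that one is not from the post and comes out somewhat lower.

```python
BLOCK_BYTES = 512  # assumed block size: 8KB stripe / 16 offsets


def crossing_offsets(stripe_bytes, request_bytes, block=BLOCK_BYTES):
    """Count start offsets within one stripe where the request spills onto the next drive."""
    stripe_blocks = stripe_bytes // block
    request_blocks = request_bytes // block
    crossing = sum(
        1 for start in range(stripe_blocks)
        if start + request_blocks > stripe_blocks
    )
    return crossing, stripe_blocks


def weighted_rate(crossing, total, drives=8, drive_iops=100):
    """Weighted average of the one-drive rate and the two-drive rate."""
    best = drives * drive_iops   # every IO touches one drive
    worst = best / 2             # every IO touches two drives
    return ((total - crossing) * best + crossing * worst) / total


def drive_work_rate(crossing, total, drives=8, drive_iops=100):
    """Alternative estimate: total drive IOPS divided by average drive IOs per host IO."""
    drive_ios_per_host_io = (total + crossing) / total
    return drives * drive_iops / drive_ios_per_host_io


crossing, total = crossing_offsets(8 * 1024, 4 * 1024)
print(crossing, total)                            # 7 16
print(weighted_rate(crossing, total))             # 625.0
print(round(drive_work_rate(crossing, total)))    # 557
```

Either way the conclusion is the same: a stripe that small costs a big chunk of the 800 IOPS the array could otherwise deliver.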

At this point someone might wonder why, in the two-drive case, we saw no benefit from the two drives loading the data in parallel. Good question. It’s true that two drives will be able to transfer that data twice as fast as one drive. But remember that this is a random IO access pattern. That means that both drives have to seek and rotate to get to that data. That access time is about 15ms on average, depending on the drive. In contrast, it will take significantly less than 1ms to actually transfer the data. The time saved in transferring the data from two drives is lost in the noise of how long it takes to get to that data.
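To put rough numbers on that, here is a small sketch. The 15ms access time comes from the paragraph above; the 100 MB/s sustained transfer rate is a hypothetical figure picked only for illustration, and the model is simplified in that it gives both drives the same average access time.

```python
ACCESS_MS = 15.0            # average seek + rotate, from the text
TRANSFER_MB_PER_S = 100.0   # hypothetical sustained transfer rate


def service_time_ms(request_bytes, drives=1):
    """Access time plus transfer time when the request is split evenly across drives."""
    per_drive_bytes = request_bytes / drives
    transfer_ms = per_drive_bytes / (TRANSFER_MB_PER_S * 1e6) * 1000
    return ACCESS_MS + transfer_ms


one_drive = service_time_ms(4096, drives=1)
two_drives = service_time_ms(4096, drives=2)
saved_ms = one_drive - two_drives

print(f"one drive:  {one_drive:.3f} ms")
print(f"two drives: {two_drives:.3f} ms")
print(f"saved:      {saved_ms:.3f} ms")
```

Under these assumptions the parallel transfer saves a couple of hundredths of a millisecond on a roughly 15ms operation, while occupying a second drive that could have been serving another host command.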

So back to the comment about a “very large” stripe size: What I meant was that the stripe was large enough to cause very few of the IOs to fall across two drives. For example, if the stripe size was 256KB, then only 7 out of 512 host commands would degrade performance – a negligible amount.
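The same counting can be repeated for a few stripe sizes to see how quickly the problem fades. This is a sketch under the same assumptions as the example (4KB requests, 512-byte blocks, uniformly random start offsets); the function name is hypothetical.

```python
def crossing_fraction(stripe_kb, request_kb=4, block_bytes=512):
    """Fraction of randomly aligned requests that span two drives."""
    stripe_blocks = stripe_kb * 1024 // block_bytes
    request_blocks = request_kb * 1024 // block_bytes
    crossing = sum(
        1 for start in range(stripe_blocks)
        if start + request_blocks > stripe_blocks
    )
    return crossing / stripe_blocks


for stripe_kb in (8, 16, 64, 256):
    share = crossing_fraction(stripe_kb)
    print(f"{stripe_kb:>3}KB stripe: {share:.2%} of requests span two drives")
```

The number of bad offsets stays at 7 for a 4KB request, so each doubling of the stripe size halves the share of requests that tie up two drives.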

Now let’s look at writes. If the writes are short and random, then pretty much everything said about reads will apply to writes – with the exception that the IO rate is cut to ¼, or 200 IOPS. I won’t get into the details in this post, but it has to do with a RAID-5 technique called Read/Modify/Write (RMW) where each host command is converted to two reads and two writes, or four IOs total. Trust me; it’s a RAID-5 thing. But the concept of trying to avoid having host commands cross drive boundaries still applies.
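For the curious, here is a toy sketch of where the four IOs come from. It uses the textbook XOR parity update (new parity = old parity XOR old data XOR new data), which is an assumption about how a given controller works, not something spelled out in this post. Drives are modeled as plain Python dicts and all names are made up for illustration.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def read_modify_write(data_drive: dict, parity_drive: dict, lba: int, new_data: bytes) -> int:
    """Update one data block and its parity; return the number of drive IOs used."""
    ios = 0
    old_data = data_drive[lba]        # read 1: old data
    ios += 1
    old_parity = parity_drive[lba]    # read 2: old parity
    ios += 1
    # Remove the old data's contribution from parity, then add the new data's.
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    data_drive[lba] = new_data        # write 1: new data
    ios += 1
    parity_drive[lba] = new_parity    # write 2: new parity
    ios += 1
    return ios


# Toy stripe: three data drives plus one parity drive, one 4-byte "block" each.
drives = [{0: b"\x01\x02\x03\x04"}, {0: b"\x10\x20\x30\x40"}, {0: b"\xaa\xbb\xcc\xdd"}]
parity = {0: xor_blocks(xor_blocks(drives[0][0], drives[1][0]), drives[2][0])}

ios_used = read_modify_write(drives[1], parity, lba=0, new_data=b"\xff\x00\xff\x00")
print(ios_used)  # 4
```

Note that the other data drives are never touched, which is the point of the technique: the parity stays consistent without reading the whole stripe, at the cost of four drive IOs per short host write.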

But what if those writes were longer and sequential? In that case, the RMW technique would be replaced with a Full Stripe Write technique, and the extra IOs would be eliminated. (Again, just trust me that that is a good thing.) And how do you make long, sequential writes? An obvious way is to have the host write long, sequential commands. An alternative, which is common with RAID controllers, is to use the controller cache and write-back, or lazy writes, to permit short IOs to hopefully coalesce into longer IOs.
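A quick sketch of why that is a good thing, again assuming plain XOR parity and the 8-drive array from the earlier example (7 data blocks plus 1 parity block per stripe). The names are hypothetical and the IO counts are simple tallies, not measurements.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_write(new_blocks):
    """Return (parity, drive_ios) for writing every data block of one stripe."""
    parity = reduce(xor_blocks, new_blocks)   # computed from the new data only, no reads
    drive_ios = len(new_blocks) + 1           # one write per data drive plus one parity write
    return parity, drive_ios


def rmw_ios(block_count):
    """Drive IOs if the same blocks were written one at a time with Read/Modify/Write."""
    return 4 * block_count


data_drives = 7  # 8-drive RAID-5: 7 data blocks + 1 parity block per stripe
blocks = [bytes([i]) * 4 for i in range(1, data_drives + 1)]

parity, fsw_ios = full_stripe_write(blocks)
print(f"full stripe write: {fsw_ios} drive IOs")
print(f"read/modify/write: {rmw_ios(data_drives)} drive IOs")
```

In this tally the full stripe costs 8 drive writes and no reads, against 28 drive IOs for seven separate RMW operations.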

So if you have a RAID cache then you should use a stripe size that is no larger than the cache’s typical write-burst size. And how do you know what the burst size is? You have absolutely no way of knowing. All you can do is hope that your RAID card is tuned to have the cache and stripe size coordinated. That “should” be a good assumption.

I suppose after all that long-winded prose I should get back to your original questions.

Is there a rule of thumb concerning RAID-5 Block Stripe Size to file size? Is there any direct performance correlation between Block Stripe Size and NTFS cluster size?

Nope. Sorry, I guess I could have said that first, but it would have made this post less interesting. :-)

How an OS accesses a file is somewhat unrelated to the file size. For example, a multi-GB database file is still accessed in small, e.g., 4KB, chunks. In this case the NTFS filesystem isn’t even used. And when the NTFS filesystem is used, the NTFS cluster size tends to define the minimum access size but not the maximum.

Anyway, I hope this helps explain how stripe size affects performance. The bottom line is typically that the default stripe size is best, and the default is usually a big number, such as 256KB. If you want to play around with reducing stripe size, make sure you do plenty of real-world performance testing.

Enjoy,

TT
