December 30, 2009
Thoughts on Iomega IX4-200d performance tests

There’s been an excellent blog post overnight on the performance of the Iomega IX4-200d disk array, one of the cheapest (if not the cheapest) VMware certified iSCSI capable disk arrays available.

I’m a big fan of the Iomega IX4-200d and I’ve seen them used to good effect in various situations, so I was interested to see what happens when you push it to the edge of performance with the iSCSI functionality.

Executive Summary - The IX4-200d is still an excellent NAS device for SMBs, but these tests suggest that when the workloads are highly random and the box is pushed to the limit, rather than handling the situation gracefully it seems to slow down to a crawl. The problem may be configuration, iSCSI, RAID5 or firmware related; we won’t be able to tell without more tests.

After reading through the post, I had a few questions about how close the IX4-200d was running to the limit of a 4 disk SATA array, so I went off to figure them out, using the figures from Gabe’s Virtual World post and this recent Yellow Bricks post on the RAID impact on disk IOs, which saved me from any hard maths.

Gabe helpfully listed out the disks used (Seagate Barracuda 7200.11 1TB 7200RPM drives ST3100520AS), that write cache was enabled, the server is connected via iSCSI, and all 4 disks were in a RAID5 array.

I’ve taken a quick look on the Seagate site, and while they don’t list that model number, the Barracuda 7200.11 is listed in general, and I’d expect around 75 IOPS per disk based on their own specifications, which is fairly typical for a 7200 RPM SATA drive. (Update - Gabe has let me know that the model number was wrong; the correct one is ST3100520AS, which is a 5400 RPM drive, so 50 IOPS is more likely.)

I had 2 questions about the IX4-200d performance - is the caching working, and is RAID5 impacting performance of the box to such an extent that you’d only want to run in RAID10?

Gabe ran 4 initial IOmeter tests, which gave me the bulk of the information I wanted.

Test 001a covers 100% sequential read access of the drives, in theory telling us how fast the array can possibly run. The result of 55MB/sec isn’t great, but the figure of 1761 IOPS is extremely high - given that the drives themselves can only deliver around 75-100 IOPS each, 1761 is obviously a sign that the read cache is doing its job. As I say though, 55MB/sec isn’t great: a single Seagate Barracuda 7200.11 would be expected to return more than that when connected directly to a server, indicating there’s some kind of limiting factor outside the disks, either the iSCSI implementation or something else, possibly network related like a slow switch being used.
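As a quick sanity check on these results, IOmeter’s throughput and IOPS figures together give the average transfer size, since throughput = IOPS × transfer size. Here’s a minimal shell sketch using awk for the division; io_size_kb is a hypothetical helper name, and it assumes IOmeter’s MB are 1024KB.

```shell
#!/usr/bin/env bash
# Average transfer size per IO in KB: (MB/sec * 1024) / IOPS.
# Assumes binary megabytes (1MB = 1024KB).
io_size_kb() {
  local mb_per_sec=$1 iops=$2
  awk -v mb="$mb_per_sec" -v iops="$iops" 'BEGIN { printf "%.1f\n", mb * 1024 / iops }'
}

io_size_kb 55 1761   # test 001a, prints 32.0
io_size_kb 22 705    # test 001c, prints 32.0
```

Both sequential tests (001a and 001c) work out at roughly 32KB per IO, while the random tests (0.69MB/sec at 89 IOPS, and 0.5MB/sec at 64 IOPS) work out at roughly 8KB per IO, so the random tests are moving much smaller blocks as well as fewer of them.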

Test 001b is 65% read, 35% write, some sequential and some random, or the “real-life” test. The MB/sec result falls through the floor here, down to just 0.69MB/sec, indicating something is up - either the write cache isn’t turned on, isn’t working, or the sheer load of IOs being generated by IOmeter is causing the box to essentially collapse. I’d be interested to see this test re-run with the volume of IOPS ramping up slowly over time so we can see whether this is the case. Using the figures from Yellow Bricks’ RAID overhead post, 89 IOPS at 35% write turns into around 60 physical read IOPS on the disks, and around 125 write IOPS because of the RAID5 write penalty of 4. Around 31 writes per second per disk isn’t too bad for a SATA drive, but it’s not good either. This result definitely suggests something isn’t working right on the IX4-200d for some workloads.
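The RAID overhead arithmetic is easy to script. This minimal bash sketch counts each front-end read as 1 disk IO and each front-end write as penalty disk IOs, assuming the commonly quoted write penalties of 4 for RAID5 (read data, read parity, write data, write parity) and 2 for RAID10; raid_backend_iops is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Back-end (physical) IOPS generated by a front-end workload on a RAID set.
# Usage: raid_backend_iops <frontend_iops> <write_percent> <write_penalty> <disks>
raid_backend_iops() {
  local frontend_iops=$1 write_pct=$2 penalty=$3 disks=$4
  awk -v f="$frontend_iops" -v w="$write_pct" -v p="$penalty" -v d="$disks" 'BEGIN {
    reads  = f * (100 - w) / 100   # each read hits one disk once
    writes = f * w / 100 * p       # each write costs "penalty" disk IOs
    printf "reads=%.0f writes=%.0f per_disk=%.0f\n", reads, writes, (reads + writes) / d
  }'
}

raid_backend_iops 89 35 4 4   # test 001b on RAID5: reads=58 writes=125 per_disk=46
raid_backend_iops 89 35 2 4   # same workload on RAID10: reads=58 writes=62 per_disk=30
```

On RAID5 that’s roughly 46 IOs per disk per second, which is close to the ~50 IOPS a 5400 RPM drive can manage, while the same workload on RAID10 would need about 30 per disk.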

Test 001c is 50% read, 50% write, but is all sequential unlike test 001b, so this should clarify whether the issue is write performance, overloading of IOPS, or random vs sequential workloads causing the slowdown. The result of 22MB/sec and 705 IOPS is massively improved over test 001b, which does suggest it’s the “random” workload that causes the IX4-200d to slow right down. The caching obviously works much better for sequential access, which isn’t unexpected. 705 IOPS is again definitely higher than I’d expect the 4 SATA drives to return, so the caching is working well. 22MB/sec for test 001c compared to 55MB/sec for test 001a does imply that sequential writes happen at a much lower speed than reads (which Gabe does cover in a later test, the “Super ATTO Clone pattern”).

Test 001d is the final IOmeter test, this time 70% reads, 30% writes, 100% random. Given my earlier comments on test 001b, I’d expect these results to be even worse, and so it seems - 0.5MB/sec and 64 IOPS does suggest that with random workloads the IX4-200d simply isn’t working: the average IO response time rises to 913ms and the maximum IO response time hits 12127ms. These figures simply aren’t workable, and suggest there’s something up with the IX4-200d under high volume random workloads - high volume sequential loads like test 001c have produced maximum response times of only 252ms, despite much higher write throughput.
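One way to put the 913ms average response time in context is Little’s law: the average number of IOs in flight equals throughput multiplied by average response time. Here’s a minimal bash sketch; outstanding_ios is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Little's law: IOs in flight = IOPS * average response time (in seconds).
outstanding_ios() {
  local iops=$1 avg_latency_ms=$2
  awk -v iops="$iops" -v ms="$avg_latency_ms" 'BEGIN { printf "%.1f\n", iops * ms / 1000 }'
}

outstanding_ios 64 913   # test 001d, prints 58.4
```

For test 001d that works out at roughly 58 IOs queued at any one time, which would be consistent with the box being handed far more random IO than 4 SATA drives can service, with most of the response time spent waiting in the queue.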

Skipping a couple of Gabe’s tests, we finally come to the “Super ATTO Clone pattern”, which attempts to discover the maximum performance achievable by a disk by varying block sizes while performing reads and writes. The optimal figures produced are 41MB/sec read and 9.7MB/sec write at large block sizes (64K and above), but the 8K block size results of 34MB/sec read and 9.2MB/sec write are very respectable, and what I’d expect the IX4-200d to be delivering.

In conclusion, to me it seems that there’s something broken with the IX4-200d in iSCSI mode with RAID5 and highly random workloads. Gabe is going to re-run his tests in NFS mode and see what difference that makes, but I’d also like to see the same tests run in RAID10 mode to see if it’s RAID5 that’s causing the issue - with 2TB drives available, RAID10 would still give you 4TB of usable disk space on the IX4-200d.
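For reference, the usable capacity trade-off behind that suggestion can be sketched in a few lines of bash; usable_tb is a hypothetical helper, and it ignores filesystem and formatting overhead, counting capacity in the drive makers’ decimal TB.

```shell
#!/usr/bin/env bash
# Usable capacity in TB for a set of identical drives.
# Usage: usable_tb <raid5|raid10> <disks> <tb_per_disk>
usable_tb() {
  local level=$1 disks=$2 disk_tb=$3
  case "$level" in
    raid5)  echo $(( (disks - 1) * disk_tb )) ;;  # one drive's worth of parity
    raid10) echo $(( disks / 2 * disk_tb )) ;;    # every block is mirrored
    *) echo "unsupported RAID level: $level" >&2; return 1 ;;
  esac
}

usable_tb raid10 4 2   # prints 4
usable_tb raid5 4 2    # prints 6
```

With 4 x 2TB drives, the question is whether RAID10’s lower write penalty is worth giving up a third of the usable space compared to RAID5.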

The Iomega IX4-200d is still an excellent NAS device, but these tests have made me reconsider where it could be used. It might be that NFS or RAID10 works much better, but otherwise it suggests you’re probably best not using the IX4-200d for highly random workloads.

Update 31/12/2009 - over at blog.storming there’s a follow-up post running similar benchmarks with SSDs instead of SATA drives, with more interesting results.

December 29, 2009
Red Hat Virtualization (RHEV-H) price and feature comparison

I’ve been putting together a very rough and ready comparison of the price and listed functionality of Red Hat’s new RHEV-H virtualization platform, based on KVM with a small-footprint version of Red Hat Enterprise Linux, all wrapped up with a Windows-based management client.

I say “listed functionality” because Red Hat are the only x86 virtualization platform developers that I can think of that don’t even let you quickly download a version of their software, slightly ironic given that they’re an open-source developer and their competitors VMware, Microsoft and Citrix are all historically closed-source companies, though Citrix have open-sourced their base XenServer virtualization system.

Assuming I can get a trial version of RHEV-H and its management client, I’ll write a new post giving you my experiences with it in comparison to VMware vSphere.

On paper, RHEV-H is a pretty functional product, supporting:

• High availability - failover between physical servers
• Live migration - online movement of VM’s between physical hosts without interruption
• System scheduler - dynamic live migration between physical hosts based on physical resource availability
• Maintenance manager
• Image management
• Monitoring and reporting

These are the major components of a virtualization platform; indeed, live migration and the system scheduler are high-end features on the other virtualization platforms, so for Red Hat to include them in its “one-size-fits-all” package is a nice addition.

The major player in the virtualization arena is without a doubt VMware, and their vSphere Advanced product will deliver the functionality that pretty much any company would want, though they have an “Enterprise Plus” option which adds even more for larger corporations.

VMware vSphere Advanced includes:

  • VMware ESXi or VMware ESX (deployment-time choice)
  • VMware vStorage APIs / VMware Consolidated Backup (VCB)
  • VMware Update Manager
  • VMware High Availability (HA)
  • VMware vStorage Thin Provisioning
  • VMware VMotion™
  • VMware Hot Add
  • VMware Fault Tolerance
  • VMware Data Recovery
  • VMware vShield Zones

A lot of that functionality, especially Fault Tolerance, vShield Zones and the vStorage APIs, simply isn’t matched in any other virtualisation platform right now, whatever the price. However, the vSphere Standard product misses out the VMotion and Fault Tolerance functionality along with the thin-provisioning and data recovery features, so while it’s still an excellent product, it does mean more management overhead when you need to arrange physical server downtime.

Now to the prices, I’ve put together the list prices of RHEV-H and VMware vSphere Standard and Advanced, and put them below in a table and also a sample configuration based on 1 management server and 5 physical hosts, each with 2 sockets.

Because Tumblr doesn’t seem to let you embed a table, I’ve had to put the table as an image, sorry about that.

As you can see, RHEV-H is the cheapest software option of the 3, though the 3 year savings compared to vSphere Standard aren’t huge, especially when 24x7 support is included. vSphere Advanced costs significantly more, but delivers a lot more too, though it could be more than your own company needs.
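To make the arithmetic behind a comparison like this explicit, here’s a minimal bash sketch of a 3 year cost model for the sample configuration of 5 hosts with 2 sockets each plus 1 management server. It’s hypothetical: the list prices aren’t reproduced here, so the per-socket licence, yearly per-socket support and management server figures are parameters you’d fill in from the vendor price lists, and for a subscription-only product you’d set the licence price to 0. total_cost is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical cost model: one-off licence + yearly support per socket,
# plus a management server licence with its own yearly support.
# Usage: total_cost <hosts> <sockets_per_host> <years> \
#          <licence_per_socket> <support_per_socket_per_year> \
#          <mgmt_licence> <mgmt_support_per_year>
total_cost() {
  local hosts=$1 sockets_per_host=$2 years=$3
  local lic_per_socket=$4 support_per_socket_year=$5
  local mgmt_lic=$6 mgmt_support_year=$7
  local sockets=$(( hosts * sockets_per_host ))
  echo $(( sockets * (lic_per_socket + years * support_per_socket_year) \
           + mgmt_lic + years * mgmt_support_year ))
}
```

Running total_cost 5 2 3 with each product’s prices gives a like-for-like 3 year total for the 10 socket configuration.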

Below are the full costs I’ve used to calculate the above results, please let me know if you think I’ve got anything wrong or missed anything out.

The prices above were taken from the VMware online store on 29th December 2009, and the Red Hat Virtualization Cost PDF, again on the 29th December 2009.

Overall, it looks like the pricing of Red Hat’s RHEV-H system makes it worth the effort of acquiring it and giving it a solid shakedown, but it’s not going to force VMware into radically changing their own pricing structure.

vSphere Advanced is streets ahead in terms of functionality, and the widespread adoption of VMware products in general means vSphere Standard may lack some of the functionality of RHEV-H but makes up for it in other areas, especially around the management and backup+restore side of virtualisation, where RHEV-H has a long way to go to catch up.

December 11, 2009
Getting Gluster working with 2-node replication on CentOS

Gluster is a fantastic open-source clustering filesystem, allowing you to convert low-cost Linux servers into a single highly available storage array.

The project has recently launched the “Gluster Storage Platform”, which integrates the Gluster filesystem with an operating system and management layer, but if you want to add Gluster functionality to your existing servers without turning them into dedicated storage appliances, the documentation is a bit lacking.

In an attempt to help anyone else out there, here’s how to get Gluster up and running, replicating a directory between 2 servers in Gluster’s “RAID 1” mode.

First of all, download the latest version of Gluster 3 from their FTP site (I downloaded 3.0.0-1). Assuming you’re running CentOS, you’ll need the following files:

glusterfs-client-3.0.0-1.x86_64.rpm

glusterfs-common-3.0.0-1.x86_64.rpm

glusterfs-server-3.0.0-1.x86_64.rpm

Once you’ve downloaded the 3 files to somewhere on your first node, run:

yum install libibverbs
rpm -ivh glusterfs-client-3.0.0-1.x86_64.rpm glusterfs-common-3.0.0-1.x86_64.rpm glusterfs-server-3.0.0-1.x86_64.rpm

To install the Gluster software itself. Then copy the RPM files to your second node, and repeat the rpm installation.

You’ll need to decide on a directory on each server to act as the datastore, so either pick an existing directory or, more likely, create a new one - in this case I’ve used “/home/export”. If it doesn’t already exist, run

mkdir /home/export

Assuming you’re using 2 nodes, next run this command on the first node to produce the Gluster configuration files, replacing the words node1ip and node2ip with the IP addresses or hostnames of the 2 nodes, and /home/export with your directory.

glusterfs-volgen --name store1 --raid 1 node1ip:/home/export node2ip:/home/export

This will create 4 files

booster.fstab

store1-tcp.vol

node1ip-store1-export.vol

node2ip-store1-export.vol

Of these, booster.fstab is used for auto-mounting filesystems after reboots, so isn’t needed yet. Copy the store1-tcp.vol and node2ip-store1-export.vol files to the second node.

On the first node, run

cp node1ip-store1-export.vol /etc/glusterfs/glusterfsd.vol

cp store1-tcp.vol /etc/glusterfs/glusterfs.vol

On the second node, run

cp node2ip-store1-export.vol /etc/glusterfs/glusterfsd.vol

cp store1-tcp.vol /etc/glusterfs/glusterfs.vol
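The per-node copying can also be wrapped in a small function, which saves mixing up the node1ip and node2ip files. This is a minimal bash sketch: install_volfiles is a hypothetical name, and the file naming pattern is taken from the volgen output listed above, so it assumes the export directory is called export (as in /home/export).

```shell
#!/usr/bin/env bash
# Copy the glusterfs-volgen output for one node into place.
# Usage: install_volfiles <node> <volume> <src_dir> [dest_dir]
# Assumes the names <node>-<volume>-export.vol and <volume>-tcp.vol.
install_volfiles() {
  local node=$1 volume=$2 src=$3 dest=${4:-/etc/glusterfs}
  local server_vol="$src/${node}-${volume}-export.vol"
  local client_vol="$src/${volume}-tcp.vol"
  local f
  # Refuse to copy anything unless both files exist for this node.
  for f in "$server_vol" "$client_vol"; do
    if [ ! -f "$f" ]; then
      echo "missing $f" >&2
      return 1
    fi
  done
  # Server-side config for glusterfsd, client-side config for the mount.
  cp "$server_vol" "$dest/glusterfsd.vol" && cp "$client_vol" "$dest/glusterfs.vol"
}
```

From the directory holding the .vol files, you’d run install_volfiles node1ip store1 . on the first node and install_volfiles node2ip store1 . on the second.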

At this point, you should be ready to start the gluster services on both nodes, and mount the filesystems.

You need somewhere on each node to mount the replicated filesystem; in this case we’re using “/mnt/export”.

On each node, run

mkdir -p /mnt/export
service glusterfsd start
mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/export/

You should now have a working Gluster replication service between the 2 nodes, you can test this by running on the first node

echo "Gluster is working" > /mnt/export/fileA

and on the second node run

cat /mnt/export/fileA

Assuming everything is working OK, you’ll see the message “Gluster is working” on your screen.

If you don’t get that, then take a look in the /var/log/glusterfs/ directory on both nodes to see what’s happening.
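If you’d rather script that check, here’s a minimal bash sketch. It assumes the volume is mounted at the same path on both nodes and that you can ssh from the first node to the second without a password; check_replication and the .gluster-check file name are hypothetical choices of mine.

```shell
#!/usr/bin/env bash
# Write a random token through the Gluster mount on this node, then read it
# back either locally or from another node over ssh.
# Usage: check_replication <mount_point> [remote_host]
check_replication() {
  local mount=$1 remote=${2:-}
  local file="$mount/.gluster-check"
  local token="check-$$-$RANDOM"
  local seen
  echo "$token" > "$file" || return 1
  if [ -n "$remote" ]; then
    seen=$(ssh "$remote" cat "$file")
  else
    seen=$(cat "$file")
  fi
  rm -f "$file"
  # Succeed only if the reader saw exactly what we wrote.
  [ "$seen" = "$token" ]
}

# On the first node:
# check_replication /mnt/export node2ip && echo "Gluster is working"
```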

One thing I’ve noticed is that gluster’s output logs in /var/log/glusterfs often start with a - at the front, confusing a lot of Unix command line tools - if you refer to them using their full path including /var/log/glusterfs, you’ll have an easier time manipulating them.
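As an illustration of that tip, this minimal bash sketch always builds the full path and also passes -- so tail stops looking for options; show_log is a hypothetical helper, and the log name in the comment is only an example, not necessarily what Gluster will call yours.

```shell
#!/usr/bin/env bash
# Show the end of a Gluster log whose name may start with "-".
# Something like "tail -mnt-export.log" would be parsed as options, so we use
# the full directory path plus "--" to end option parsing.
show_log() {
  local name=$1 dir=${2:-/var/log/glusterfs}
  tail -n 20 -- "$dir/$name"
}

# show_log -mnt-export.log   # hypothetical log file name
```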

Hopefully this will help people out there with the slightly confusing but comprehensive Gluster documentation where I recommend you go for any more in-depth configuration help.

July 15, 2009
Cloud computing price comparison stupidity

Microsoft have announced the pricing for their new Azure cloud-computing platform, and there have been quite a few articles comparing the pricing to that of Amazon’s AWS cloud computing platform, the largest existing cloud provider.

Most have focussed on Microsoft charging 0.5 cents less per hour for a basic Windows instance than Amazon, 12 cents vs 12.5 cents, and whether they’ve done this to start a price war or simply to appear in-line with the existing suppliers out there.

However, these comparisons are just plain stupid, for one reason alone.

Each one provides a completely different definition of a CPU!

Amazon use the “EC2 Compute Unit”, they say that’s based on:

We use several benchmarks and tests to manage the consistency and predictability of the performance of an EC2 Compute Unit. One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
Microsoft haven’t published an equivalent CPU definition, and since Amazon haven’t published their exact benchmarks, whatever Microsoft uses is bound to be different.

Again, Google have their App Engine service, where they define their CPU usage as:

CPU time is reported in “seconds,” which is equivalent to the number of CPU cycles that can be performed by a 1.2 GHz Intel x86 processor in that amount of time. The actual number of CPU cycles spent varies greatly depending on conditions internal to App Engine, so this number is adjusted for reporting purposes using this processor as a reference measurement.
The Google measurement is obviously fairly close to the vague Amazon definition of a Compute Unit, but neither of them clearly specify how they actually measure the usage, so any initial comparison is at best vague and at worst completely misleading.
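To show how much the unit definition alone matters, here’s a minimal bash sketch that normalises an hourly price to a price per GHz-hour of the reference processor. It assumes the 12.5 cent Amazon instance corresponds to a single EC2 Compute Unit, and price_per_ghz_hour is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Normalise an hourly instance price to dollars per reference GHz-hour.
# Usage: price_per_ghz_hour <dollars_per_hour> <ghz_per_unit>
price_per_ghz_hour() {
  local dollars_per_hour=$1 ghz=$2
  awk -v p="$dollars_per_hour" -v g="$ghz" 'BEGIN { printf "%.3f\n", p / g }'
}

price_per_ghz_hour 0.125 1.0   # prints 0.125
price_per_ghz_hour 0.125 1.2   # prints 0.104
```

Taking the 1.0-1.2 GHz range in Amazon’s own definition at face value, the same 12.5 cent price works out anywhere between about 10.4 and 12.5 cents per GHz-hour, a spread of around 20%, while the headline Azure vs Amazon difference is 0.5 cents, or 4%.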

The same is true of Rackspace’s Mosso cloud, and all the other cloud providers out there.

Until a standard CPU unit is defined publicly and agreed between the major suppliers (if that’s even possible), any comparisons between clouds based on a simple “CPU Time” measurement are simply stupid.

June 30, 2009
Freeing up ESS disks when they are unavailable

Sometimes when an ESS vpath disk has previously been assigned to AIX and is now assigned to Windows, or vice versa, you aren’t able to access the disk even when it all looks correct. The first thing to check is whether the vpath still has a persistent reservation on it.

In AIX this is easy: run ‘lquerypr -vh /dev/vpathXX’ to see if a vpath has a persistent reservation, then ‘lquerypr -ch /dev/vpathXX’ to clear the reservation.
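If you have a lot of vpaths to check, a small loop saves some typing. This is a minimal sketch that only uses the lquerypr -vh query shown above and deliberately leaves clearing reservations (-ch) as a manual step. check_vpath_reservations is a hypothetical function name, and it’s written without bash-specific features so it should work in a plain POSIX shell.

```shell
#!/bin/sh
# Query persistent reservations on every vpath matching a glob.
# Usage: check_vpath_reservations ['/dev/vpath*']
check_vpath_reservations() {
  # The glob is left unexpanded here and expanded by the for loop below.
  pattern=${1:-/dev/vpath*}
  for dev in $pattern; do
    # Skip the literal pattern if nothing matched.
    [ -e "$dev" ] || continue
    echo "== $dev"
    lquerypr -vh "$dev"
  done
}
```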

June 2, 2009
Wave - the replacement for email?

At the end of last week, Google announced an internally developed project called “Wave”, an attempt to build a new communication platform combining the best of email, instant messaging, and document collaboration.

Wave is a pretty radical project, developed in secrecy at Google’s Australian outpost by some of the same team as first developed Google Maps. Consisting of 3 separate pieces of work, Wave is a protocol, a piece of software, and a platform for running applications.

Starting at the bottom, the protocol for Wave is based on XMPP, allowing for clients and servers to talk to each other, and crucially, for servers to talk to other servers. Without this ability for inter-server communication, this would just be another (albeit shiny) Sharepoint or Wiki implementation with an instant messaging client bolted on. The Wave protocol is openly available in its current form on the Google Wave site.

You then have the Wave software, which Google says will largely be open-sourced, allowing companies to run their own local implementations of Wave and control the security and data retention of Wave.

Finally, you have Wave as a platform, consisting of the Wave software and protocol, along with a set of application program interfaces (APIs), which together let 3rd-party applications run on top of a Wave, interacting directly with clients.

When you combine the 3 components of Wave together, you get an extremely powerful proposition, which really could go anywhere.

As a document collaboration and group discussion product, it has an impressive set of features, and with the open protocol and source code for the server, the possibilities for companies to add Wave functionality to their existing applications like IBM’s Lotus Notes and Microsoft Exchange are huge. Whether they will or not is another question, but if Google integrate Wave into Gmail, then people will start to see the power of collaboration and demand the same functionality in their business applications.

Will it replace email? I’m not so sure. At first glance it has everything in place, but the network effects of email are huge - I can email any one of billions of people right now, and if I’ve got something interesting to say, they can read it and reply.

Maybe the big question for Wave is, how do you make a billion people switch from their existing email client to a new service, when most of them consider email “good enough”?

February 9, 2009
Re: Auto-responders in social media are anti-social

Yeah that’s a good point. Maybe I’ll do that…

Re: Auto-responders in social media are anti-social

Since you’re only posting blog posts once a day (if that), why not take the time to manually write a twitter message about what you’ve blogged, it’ll probably look better in the 140 character limit…

January 13, 2009

Microsoft Songsmith Ad (via Duker91)

I can’t quite believe this is real, surely it’s a spoof?

January 12, 2009
Using the Video Filter module with drupal

If you’re installing the very useful Video Filter module for Drupal, you might get confused like I did by its non-functional nature.

It turns out that to enable the module, after you have done the normal step to enable a module in the Administer/Modules page, you then have to go to the Administer/Input Formats page at “yoursite.com/admin/settings/filters” and configure the “Filtered HTML” and “Full HTML” input formats, where you’ll see a new “Video Filter” option. Check this box, and suddenly you’ll be able to embed a video from YouTube or another supported website just by putting

[video:url]

in your normal drupal page. Magic!

Why this isn’t included in the minimalistic installation and usage instructions I’ve got no idea.