Thursday, April 21, 2011

Introduction Time

I am an IT team member at a small software company. I have been working in the computer field in one manner or another for approximately 18 years, with experience ranging from small mom-and-pop shops to international corporations. Pertinent to this particular blog, I have around 5 years of total experience with SANs. Most of these have been Fibre Channel SANs, simply because of Fibre Channel's historical bandwidth advantage over Ethernet.

So, that's my background. This is my environment. On the production side of the house, I have a blade center with six Windows 2008 R2 servers. Two of them are domain controllers in a sub-domain of my HQ environment, and the remaining four servers run SQL Server 2008 Standard. These four servers are grouped into two Windows clusters using the MS Clustering services built into Windows 2008 R2 and SQL Server 2008 Standard. Cluster one contains four SQL instances: Instance 1, Instance 2, Instance 5, and Instance 6. These instances run two-up on the two physical servers, which act as an active-active failover pair. The second cluster runs three instances: Instance 3, Instance 4, and a general purpose instance. One physical server in cluster two runs two instances and the other runs one, and these servers also act as active-active failover partners.
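
As a rough illustration of the cluster layout described above, here is a minimal Python sketch that models the two failover pairs and shows how many instances the surviving server ends up carrying when its partner fails. The node names and the exact placement of instances on each server are assumptions made for the example; the post only gives the per-cluster instance lists and the two-up and two-plus-one splits.

```python
# Hypothetical model of the two failover clusters described above.
# Node names and the instance-to-node placement are assumptions for illustration.
clusters = {
    "cluster1": {
        "node1a": ["Instance 1", "Instance 2"],
        "node1b": ["Instance 5", "Instance 6"],
    },
    "cluster2": {
        "node2a": ["Instance 3", "Instance 4"],
        "node2b": ["General purpose"],
    },
}


def fail_over(cluster, failed_node):
    """Return the placement after failed_node's instances move to its partner."""
    survivors = [node for node in cluster if node != failed_node]
    if len(survivors) != 1:
        raise ValueError("expected failed_node to be one of exactly two nodes")
    partner = survivors[0]
    # The partner keeps its own instances and picks up the failed node's.
    return {partner: cluster[partner] + cluster[failed_node]}


if __name__ == "__main__":
    for name, cluster in clusters.items():
        for node in cluster:
            for survivor, instances in fail_over(cluster, node).items():
                print(f"{name}: {node} fails -> {survivor} runs "
                      f"{len(instances)} instances")
```

The point of the sketch is simply that in an active-active pair, either server has to be able to carry the whole cluster's load on its own: four instances in cluster one, three in cluster two.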

My individual SQL instances run anywhere from six to 100+ databases, one for each of our customers. We group these databases by customer size and activity, so our six largest customers are grouped together, and our 100 smallest customers are grouped together. But that doesn't matter much in this context.

Our SAN connects to these physical machines through a Dell blade center interconnect using a pair of Brocade FC switches. Each server has two FC ports for redundancy, one connected to each of the switches. This gives us path resiliency to our SAN from the server.

Finally, there is the SAN itself. In our production environment we have a NetApp FAS2050 with two filer heads, 20 SAS disks, and two FC ports per head. The SAN's performance is difficult to comment on, because NetApp does not publish IOPS numbers. They claim that this is because of their improved technology using the WAFL filesystem, which effectively makes all writes sequential. From what I've found during internal benchmarking, it's extremely easy to overwhelm the NVRAM backing the WAFL operations, and even easier to overwhelm the RAID-DP protection scheme the filer uses. When compared to a RAID-10 SAN, RAID-DP falls flat on its face performance-wise, but it does offer a cost benefit.

In future posts on this blog, I will break down the technology behind the NetApp SANs and describe my horrible experience with NetApp as a company. I will keep things honest and as unbiased as I possibly can. But I will not be gentle. NetApp is, in my opinion, the WORST option for SAN technology on the market.