The company where I work has been using virtualization in development, QA, testing, etc., for many years now. We have not even attempted virtualization in production. Things change. But, before we did it, I thought I’d bounce the question out to the people who know, What are some things to watch for in using virtual servers in a production environment.
Flat out, I asked people over Twitter. The overall message was extremely positive. No one suggested it was a bad idea. A few thought that some high volume servers are better served by physical boxes. Volume in this case could be read as high amounts of reads or high amount of writes, both were suggested as reasons to avoid virtualization. The cause for this is clear, from each individual; the number one problem was around I/O.
I put out a general call for information and received a pretty good set of responses. I’m not expert enough in all this technology to put together a single summary, so I’ll more or less present the information as it was presented to me. When all the information was bounding past, a few people asked me to aggregate it, so here it is.
What to watch for
According to Allen Kinsel (blog|twitter), a DBA for a multi-line insurance company who manages Oracle, Sybase and SQL Server and is very experienced with virtualization in support of all his products, I/O can be more than double from a virtual machine. With this in mind, Mr. Kinsel and Brent Ozar (blog|twitter), the second non-Microsoft employee to be certified as a SQL Server Certified Master (MCM) in North America (I think that’s the claim), both recommend avoiding the use of raw LUN’s unless you experience severe I/O binding issues. Brent and Allen both recommend you get limited VSphere access in order to closely monitor I/O through the VMWare interface because the normal I/O measurements would not be accurate (Allen suggests buying the VM admin some beer, probably excellent advice). Brent goes on to outline a few other specific points. Set the minimum memory reservation on the VM to avoid the balloon driver telling SQL it’s out of memory. Brent suggests that you use as few virtual CPUs as possible. If the physical box averages 25% CPU load across four physical cores, then only add 2 virtual cores. Brent also suggests using IOMeter or SQLIO to test the configuration. If you’re maxing at 100-125mb/sec then you’re looking at an I/O bottleneck and will need to make adjustments to the system. Brent says that with an fiber channel SAN such as we use that we should be fine with I/O as long as we’re not using physical boxes in active/active multipathing and that we have more than 2 HBAs per physical box.
- Disk I/O
- CPU Queuing on the host
- Memory over commit
- Disk I/O over commit
- CPU over commit
Several people, rather than send me stories, or in addition to the stories, simply pointed out documentation of which you should take advantage. First, and most important of these from the stand point of SQL Server and virtualization is that the Professionall Association of SQL Server users (PASS) has a special interest group dedicated to virtualization: http://virtualization.sqlpass.org
Disk Performance Information:
San Multipathing (introductory): http://www.brentozar.com/archive/2009/05/san-multipathing-part-1-waht-are-paths/
SAN Performance Tuning with SQLIO
Virtualization and SAN Basics
How Microsoft IT Reduces Operating Expenses using Virtualization http://blogs.msdn.com/b/nickmac/archive/2010/05/28/how-microsoft-it-reduces-operating-expenses-using-virtualization/
SQL Server Performance on Vmware vSphere: http://blogs.vmware.com/performance/2009/06/sql-server-performance-on-vspehere.html
That’s it. I won’t pretend that I understood everything. I’m passing it along in a semi-structured format as a public service. I also received some information from the MVP back channels, but I can’t publish that without permission. Suffice to say, while extremely useful, none of it changed the overall picture as represented by the information listed above.