by Kevin Fogarty

Unappreciated New VMware Feature Designed to Reduce Hardware Headaches

Aug 20, 2008

3 mins

Dumbing down a group of servers doesn't sound like a good idea, but hiding low-priority new instruction sets on the processor can convert hardware from 'almost compatible' to 'working fine.'

One of the best things about virtual infrastructures is their ability to minimize the inevitable differences among the servers in a typical server farm. Rather than forcing data-center managers to buy X number of servers every quarter with identical configurations, components and firmware versions, virtualization allows them a little more freedom and a little more confidence that almost-identical hardware will perform almost identically.

There’s the same amount of failure in “almost” compatible as there is in “not even close,” though, so Intel, AMD and VMware have all been working on closing the almost gap.

VMware’s most recent contribution is in the deservedly maligned ESX 3.5.0 Update 2, which managed to annoy a huge chunk of the VMware user base by mistakenly deciding its licenses were out of date.

(VMProfessional posted a surprisingly unbitter LOLcat on the bug that makes me think virtualization has really arrived, on the assumption that you’re not really well known in your field until people can use tired Internet memes to make fun of you without having to explain to the reader about either you or the meme.)

Update 2 also contained Enhanced VMotion Compatibility (EVC), which takes advantage of complementary features in recent Intel and AMD server chips that mask the differences between different versions of the same processor, making it possible to move VMs much more easily among the physical hosts in a server farm. (Here’s a link to a VMware white paper that describes EVC; it’s a PDF, so you’ll have to scroll to page 6.)

VMware announced EVC at last year’s VMworld, with what appears to have been insufficient fanfare. It got little attention from the press or in VMware user blogs at the time, and has been discussed relatively little ever since. VMware slipped it into Update 2 with little or no additional notice, though plenty of users have been looking for it.

EVC uses Intel’s FlexMigration and AMD’s AMD-V Extended Migration to hide more advanced features of the newest chips and dumb down all the processors in a cluster to a single, lowest-common-denominator level. It does that by modifying the semantics of the CPUID instruction, which software uses to ask a processor what it can do, so that neither the virtualization software nor the OS nor the applications will trip over a feature that’s present on one physical server but not another.
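To make the lowest-common-denominator idea concrete, here is a minimal sketch of the arithmetic involved. It is not VMware's implementation: the feature bits, host names and function names are all hypothetical, and real CPUID output spans several registers and leaves. It only shows that a cluster baseline can be thought of as the bitwise AND of every host's feature flags, and that masking means a guest sees only the bits in that baseline.

```python
from functools import reduce

# Hypothetical feature bits for illustration only; real CPUID leaves
# define many more flags, at different bit positions.
SSE3 = 1 << 0
SSSE3 = 1 << 1
SSE4_1 = 1 << 2


def cluster_baseline(host_features):
    """AND all hosts' feature bitmasks: only features every host has survive."""
    return reduce(lambda a, b: a & b, host_features)


def masked_cpuid(raw_features, baseline):
    """Feature bits a guest would be shown: the host's bits limited to the baseline."""
    return raw_features & baseline


# Hypothetical two-host cluster: one older chip, one newer chip.
hosts = {
    "older-host": SSE3 | SSSE3,
    "newer-host": SSE3 | SSSE3 | SSE4_1,
}

baseline = cluster_baseline(hosts.values())
visible_on_newer = masked_cpuid(hosts["newer-host"], baseline)
```

In this toy cluster the newer host ends up advertising exactly what the older host does, which is the property that lets a VM move between them without finding a feature missing after the move.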

The feature isn’t eliminated, so there’s no damage to the firmware or the servers; the VM software and apps just think the advanced feature isn’t present, so they don’t ask for it. Apps that are written specifically to take advantage of a particular feature can still get to it, if you set things up right, and put the app on the right server.

Masking doesn’t fix the potential for complications from almost-compatible chips, and doesn’t eliminate the need to do a close comparison among the chips in the servers you are buying.

In fact, it emphasizes the need to compare the minor feature enhancements in different versions of the processors and chipsets in your servers, as well as the BIOS and other firmware.

So in that respect it’s actually swapping one headache for another: it reduces the potential consequences of having two groups of servers that are almost compatible, without really eliminating the due diligence you’d need to avoid the problem in the first place.
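That due diligence amounts to a feature-by-feature comparison across hosts. The sketch below shows one way such an audit could look, assuming you have already collected each host's feature names by some other means; the inventory, host names and the `hidden_features` helper are hypothetical and not part of any VMware tool.

```python
def hidden_features(hosts):
    """For each host, list the features that are not shared by every host."""
    common = set.intersection(*hosts.values())
    return {name: sorted(feats - common) for name, feats in hosts.items()}


# Hypothetical inventory: feature names as you might record them per host.
inventory = {
    "host-a": {"sse3", "ssse3"},
    "host-b": {"sse3", "ssse3", "sse4.1"},
}

report = hidden_features(inventory)
```

A host with an empty list already sits at the common level; anything listed for a host is a feature that would have to be hidden for the hosts to look identical.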

Assuming Update 2 believes your licenses are up to date, though, EVC can further narrow the distance between “almost” compatible and “really” compatible.

It doesn’t eliminate that particular headache. But used correctly, it should reduce the number of Motrins involved in getting it fixed.