Friday, May 27, 2016

Windows Convenience Rollup for Windows Server 2008 R2 and vmxnet3 on VMware ESXi = big trouble, lost network configuration, revert to DHCP

I'm a little hesitant to write this blog post as I don't yet have a complete resolution.  But this caused me big headaches and I found no one with the issue, so I thought posting could help others.

Microsoft releases patches on the second Tuesday of each month.  I run a schedule where I patch test systems and low-criticality servers on the third Tuesday of each month, then more critical systems on the fourth Tuesday of each month and the first Tuesday of the following month.  This routine has worked well for me and, in my mind, gives me one to two weeks to avoid headaches if a bad patch is released.  Granted, bad patches aren't very common, but they do happen.
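
For anyone who would rather compute those dates than eyeball a calendar, here is a minimal Python sketch of the schedule described above.  The function names and dictionary keys are made up for illustration; only the Tuesday rules come from my routine.

```python
import calendar
from datetime import date


def nth_tuesday(year, month, n):
    """Return the date of the n-th Tuesday (1-based) of the given month."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first Tuesday (0 if the 1st is a Tuesday).
    offset = (calendar.TUESDAY - first.weekday()) % 7
    return date(year, month, 1 + offset + 7 * (n - 1))


def patch_schedule(year, month):
    """Map each stage of the routine to its date (keys are hypothetical labels)."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    return {
        "microsoft_release": nth_tuesday(year, month, 2),
        "test_and_low_criticality": nth_tuesday(year, month, 3),
        "more_critical_wave_1": nth_tuesday(year, month, 4),
        "more_critical_wave_2": nth_tuesday(next_year, next_month, 1),
    }


if __name__ == "__main__":
    for stage, day in patch_schedule(2016, 5).items():
        print(stage, day.isoformat())
```

For May 2016 this gives May 10 for the Microsoft release, May 17 and May 24 for my first two rounds, and June 7 for the last one.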

In May 2016, my patching on the third Tuesday went without issue.  Unfortunately, that trend did not continue.

As I patched on the fourth Tuesday of the month (about 30 servers), I was horrified to discover that the TCP/IP configuration was disappearing from my VMware virtual Windows Server 2008 R2 servers upon reboot.  They would revert to DHCP, with their static configuration missing.  I had to use the console to log on and reconfigure TCP/IP.  As I did so, Windows would prompt me: "The IP address <ipaddress> you have entered for this network adapter is already assigned to another adapter (<nic>) which is no longer present in the computer.  If the same address is assigned to both adapters and they both become active, only one of them will use this address.  This may result in incorrect system configuration."  This led me to presume that, somehow, my vmxnet3 virtual network adapters had been deleted and recreated.  How did this happen?

As I began investigating, I was stymied by the fact that I hadn't had this issue a week earlier.  When I reviewed the servers I had patched a week prior, I noticed they were missing a few patches.  Testing quickly helped me to identify which patch was causing me this headache: 

Convenience rollup update for Windows 7 SP1 and Windows Server 2008 R2 SP1 KB3125574.
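
The comparison I did by hand amounts to a set difference between the patch lists of an affected server and a healthy one.  Below is a minimal Python sketch of that idea.  The function name is my own, and the KB lists are placeholders (everything except KB3125574 is made up); in practice the lists would come from something like PowerShell's Get-HotFix or "wmic qfe get HotFixID".

```python
def extra_patches(affected_kbs, healthy_kbs):
    """Return KB IDs present on the affected server but not on the healthy one."""
    return sorted(set(affected_kbs) - set(healthy_kbs))


if __name__ == "__main__":
    # Placeholder data: KB0000001 and KB0000002 are invented IDs for illustration.
    healthy = ["KB0000001", "KB0000002"]
    affected = ["KB0000001", "KB0000002", "KB3125574"]
    print(extra_patches(affected, healthy))
```

Whatever the difference turns up is the list of suspects to test one at a time.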


My ESXi servers are at ESXi 5.5 update 3a.  My guests are running the latest version of tools to come with that version of ESXi.  

At this point, I wanted to test the situation by building a fresh Windows Server 2008 R2 guest.  I first had to apply the prerequisite for KB3125574, which is the April 2015 servicing stack update for Windows 7 and Windows Server 2008 R2 (KB3020369).  Then I could apply KB3125574.  I was a bit disappointed to find this did not reproduce my loss of TCP/IP information.  So the bug seems to be tied to my Windows Server 2008 R2 template, which has been used to build 80 servers.

I started by opening a ticket with VMware support.  I got a capable support agent.  He asked if the Windows guest event logs yielded any relevant information.  They did not.  He had me collect logs from the ESXi host, and he determined the MAC address of my vmxnet3 adapter did not change throughout this situation.  This led him to conclude that the network adapter was not deleted and recreated, but that the TCP/IP stack was being created anew.  VMware advised me to take the issue to Microsoft (offering to help explain and share their findings).

So, I'm currently working through the process via a ticket with Microsoft.  We're collecting log data from an impacted guest virtual server at the moment.  I'll be sure to update this post once I have more.

Update 5/31/2016
My open ticket with Microsoft has progressed.  KB3125574 has been updated as follows.  So, basically be cautious with this rollup on vmxnet3 adapters!


Known issue in this convenience rollup
·       Known issue 1
Symptoms
After you install this rollup on VMWare virtual machines (VM), a new Ethernet vNIC that has default settings may replace the existing vNIC and, therefore, cause network issues. Any custom settings that were set on the previous vNIC are persisted in the registry but are not applied to the new vNIC. 


Resolution
To resolve this issue, uninstall the convenience rollup.

Status
Microsoft is researching this issue, and will work together with VMWare to determine the appropriate resolution. We will post more information in this article when the information becomes available.


Update 6/2/2016
VMware released a blog post yesterday on the matter:
http://blogs.vmware.com/apps/2016/06/rush-post-microsoft-convenience-update-and-vmware-vmxnet3-incompatibilities.html

VMware is aware of this issue and we are actively investigating the root causes and possible fixes. While this effort progresses, VMware is advising customers to delay applying the Microsoft “Convenience Update” to any virtual machine that uses the VMXNet3 vNIC type.
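
If the rollup has to go onto a vmxnet3 guest anyway, a before-and-after comparison of adapter settings would flag the symptom right after the reboot.  Here is a minimal Python sketch, assuming you have already captured each adapter's settings into a dictionary keyed by MAC address (VMware found the MAC did not change in my case).  The snapshot format, the function name, and the sample values are all hypothetical; how you collect the data (for example from WMI's Win32_NetworkAdapterConfiguration) is up to you.

```python
def reverted_to_dhcp(before, after):
    """Return MACs that were statically configured before and are DHCP (or missing) after."""
    reverted = []
    for mac, old in before.items():
        if old["dhcp"]:
            continue  # Already on DHCP before patching, so nothing static to lose.
        new = after.get(mac)
        if new is None or new["dhcp"]:
            reverted.append(mac)
    return sorted(reverted)


if __name__ == "__main__":
    # Made-up sample snapshots; 192.0.2.10 is a documentation-only address.
    before = {"00:50:56:00:00:01": {"dhcp": False, "ip": "192.0.2.10"}}
    after = {"00:50:56:00:00:01": {"dhcp": True, "ip": None}}
    print(reverted_to_dhcp(before, after))
```

Any MAC this returns is an adapter that needs its static settings put back, or a guest that needs the rollup uninstalled.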