Tuesday, March 17, 2009

A thousand and one reasons for a generic error message.

I've spent a couple of hours trying to troubleshoot a clustered SQL Server 2008 installation. All I know is that it throws this error message at the end of the installation process and gives me no clue at all about the cause:

The cluster resource ‘SQL Server (MSSQLSERVER)’ could not be brought online. Error: The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F)

Now, this might look like a dependency that isn't working correctly, but when I checked the Failover Cluster Management console on Windows Server 2008, all the dependencies were online and working as expected. As always, I started searching the Internet for related errors and couldn't find anything really specific except for the same thing - a dependency issue.

Now, here's what I found out. Since all of the dependencies - disks, MSDTC, IP and virtual server name - were online, maybe the problem didn't have anything to do with them after all. So the first thing I did was a PING test against the virtual server name of my clustered SQL Server instance, and guess what I found - there was another IP address registered on the DNS server with the same FQDN (maybe from a previous installation that wasn't cleaned up properly). I logged in to the DNS server and updated the record to point to the IP address of my clustered SQL Server instance, ran ipconfig /flushdns on the node I was logged in to and started the service in Failover Cluster Management. It worked!

It just tells you that you should think outside the box every now and then. It really pays to have that background in network and systems infrastructure.
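The PING test above boils down to asking "what does this name actually resolve to, and is it the IP I assigned to the cluster resource?" Here is a minimal Python sketch of that check, not something I ran as part of the original troubleshooting. The function names are made up for illustration, and it only looks at IPv4 addresses. The resolver is passed in as a parameter so the logic can be tried without touching a real DNS server.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses a name resolves to."""
    # getaddrinfo returns 5-tuples; the last item is (address, port) for IPv4.
    infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_virtual_name(hostname, expected_ip, resolver=socket.getaddrinfo):
    """Compare the resolved addresses with the IP given to the cluster resource."""
    addresses = resolve_ipv4(hostname, resolver)
    if addresses == [expected_ip]:
        return "ok: %s resolves only to %s" % (hostname, expected_ip)
    return "mismatch: %s resolves to %s, expected %s" % (
        hostname, ", ".join(addresses), expected_ip)
```

Calling check_virtual_name with the FQDN of the virtual server name and the IP address shown in Failover Cluster Management should report a mismatch in a situation like mine, where a stale record was still registered. Keep in mind that the local resolver cache can hide a fix on the DNS server until it is flushed.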

Sunday, March 15, 2009

Want to free up disk space on your Windows Server 2008 system partition?

Here's one idea I got from this blog post. It came from KB 920730: How to disable and re-enable hibernation on a computer that is running Windows Vista. My server certainly isn't running Windows Vista, but who would want their servers to hibernate? Besides, Windows Server 2008 is listed in the Applies To section of the KB article, so I decided to give it a shot. Run this command from a command prompt with administrative privileges:

powercfg.exe /hibernate off

Now, if you take a look at your system partition and show all the hidden files, you will see hiberfil.sys, a hidden system file located in the root folder of the system partition. The Windows Kernel Power Manager reserves this file when you install Microsoft Windows, and its size is approximately equal to the amount of RAM installed on the computer. Since we don't need hibernation on servers, you can get rid of this file by turning hibernation off with the command above. Sometimes I wonder why everything ends up on the system partition when you don't even need it.
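If you want to see how much space the file takes before and after running the command, here is a small Python sketch. The function name is my own, and the default of C:\ as the system partition root is an assumption, so pass the right root if yours differs. On older Python versions, reading the size of a locked system file may raise a permission error instead of returning a number.

```python
import os


def hibernation_file_size(system_root="C:\\"):
    """Return the size of hiberfil.sys in bytes, or None if the file is absent."""
    # Assumption: hiberfil.sys sits in the root folder of the system partition.
    path = os.path.join(system_root, "hiberfil.sys")
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return None
```

A result of None after running powercfg.exe /hibernate off means the file is gone and the space is back.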

"The current SKU is invalid." error? Didn't I pay for my license?

I've done quite a few SQL Server 2008 cluster installations before, which is why this one came as a surprise. I was rebuilding one of my test machines when I hit this error message while adding the second node to a two-node SQL Server 2008 cluster on Windows Server 2008:

The current SKU is invalid.

After a quick search on the Internet, I found out that this is a bug (my first time getting bitten by it). There is currently a Microsoft Connect item about it, which mentions that applying Cumulative Update 1 should fix the issue, but I didn't want to apply a hotfix for something like this, as I normally do that after the entire installation is complete. A few more hits on Google directed me to a forum post that suggests deleting the DefaultSetup.ini file from the installation media. Just make sure you copy the installation key from this file before deleting it, or simply do what I did - move the file someplace else. (I usually copy everything to a local disk first so the installation runs a lot faster, which also makes the file easy to move.) After going through the Add node to a SQL Server failover cluster option, I manually entered the installation key and the installation completed successfully. I wonder why I never got this error during my previous installations.
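Copying the key and moving the file aside can be done in one step. The Python sketch below is only an illustration under a few assumptions: that DefaultSetup.ini is a regular INI file with a section header and a PID entry holding the key in quotes, that it is UTF-8 or plain ASCII, and that you are working on a writable local copy of the media. The function names are hypothetical, and you pass in the full path to the file yourself.

```python
import configparser
import shutil
from pathlib import Path


def read_product_key(ini_path, encoding="utf-8-sig"):
    """Return the PID value from the INI file, or None if no section has one."""
    # Assumption: the key is stored as PID="..." under some section header.
    parser = configparser.ConfigParser(interpolation=None)
    with open(ini_path, encoding=encoding) as handle:
        parser.read_file(handle)
    for section in parser.sections():
        # configparser lowercases option names by default, so look up "pid".
        if parser.has_option(section, "pid"):
            return parser.get(section, "pid").strip().strip('"')
    return None


def move_setup_file_aside(ini_path, backup_dir):
    """Read the key, then move the file out of the installation folder."""
    key = read_product_key(ini_path)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    target = shutil.move(str(ini_path), str(backup / Path(ini_path).name))
    return key, target
```

move_setup_file_aside returns the key, so you can type it into setup later, along with the new location of the file in case you want to put it back.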