Friday, December 2, 2011

From Blogger to WordPress

If you're wondering why there haven't been many updates on this blog, it's because I started moving the content to WordPress a couple of months back. It was just an experiment on my part, primarily because I wanted to try something new. If you're looking for my technical blog, it's been enjoying its new home since July of this year. The same goes for my non-technical blog on leadership, management, personal and professional development, etc. Make sure you update your RSS feed aggregators to point to the new locations. I'll see you there.

Sunday, June 19, 2011

Watch Out For SSIS Package Migration/Upgrade Errors

After successfully upgrading a multi-instance SQL Server 2005 failover cluster to SQL Server 2008 R2, I got woken up today by a page from a SQL Server Agent job failure. Since the job was created by one of my colleagues, only runs once a week, and isn't really that critical (it only performs tasks for reporting purposes), I ignored it for a while. However, I couldn't get it out of my head that what seemed to be a successful upgrade had caused issues with the SQL Server Integration Services (SSIS) components. I checked the SQL Server Agent job history for more information but only found this bit of information in the error log:

The package failed to load due to error 0xC0010014

After a quick search on Google, I found a blog post on the MSDN Blogs site that explains a possible cause for the error. While the post described a different error number from the one I had, I was sure this was it. Since the instance had been a SQL Server 2005 failover cluster before the upgrade, I had a feeling the old DTExec.exe executable file was still hanging around. True enough, the EXE file was still there, and the SQL Server Agent job that failed was pointing to that old version of DTExec.exe. I immediately changed the path within the SQL Server Agent job step to point to the new version. The command pointing to the old path looked like this:

C:\Program Files (x86)\Microsoft SQL Server\90\DTS\Binn\DTExec.exe

After modifying the path, I now have this:

C:\Program Files\Microsoft SQL Server\100\DTS\Binn\DTExec.exe

One thing to check after upgrading a SQL Server 2005 instance to SQL Server 2008/R2 (whether it is a failover cluster instance or not) is that your SSIS packages, and the SQL Server Agent jobs that call them, have been reconfigured to point to the new path of the DTExec.exe executable, and then test them accordingly. A quick way to spot the offending job steps is to search the job step commands stored in msdb, as in the sketch below.
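
Here's a minimal T-SQL sketch of that check. It simply searches the SQL Server Agent job step commands stored in msdb for the old SQL Server 2005 path; the path in the LIKE pattern is just the example from above, so adjust it to match your environment.

-- Find SQL Server Agent job steps that still reference the old DTExec.exe path
SELECT j.name AS job_name,
       s.step_id,
       s.step_name,
       s.command
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.sysjobsteps AS s
        ON s.job_id = j.job_id
WHERE s.command LIKE '%\90\DTS\Binn\DTExec.exe%';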

On Disaster Recovery and my SQL Rally 2011 Presentation

Yesterday, I saw a Twitter post regarding the speaker evaluation results from SQL Rally 2011 in Orlando, FL last May. I was surprised to see that my session was among the top three sessions of the conference. I dug up the Excel spreadsheet containing my session evaluation results and began to read. I found one comment very fascinating (it was the only evaluation where I got very low scores), as it pertained to the speaker’s knowledge of the subject. The comment was: “copy and paste coder.” I’ve been doing this particular presentation for almost five years now, with a few tweaks every once in a while based on feedback from attendees. Yes, I live and breathe disaster recovery as part of my day-to-day job. However, there are several reasons why I do not type or write code during my presentations. Here are a few of them:
  1. A presentation is a performance: Many will disagree with me on this, especially experts who believe that to demonstrate their expertise, they should be writing code and doing live demos during a presentation. Whenever I go up on stage to deliver a presentation, I always think about the attendees. My goal is not to display my expertise nor to brag about what I can do that the audience cannot. I always remember that my presentations are not about me, but about the audience. That is why I do a lot of preparation prior to delivery – research, writing an appropriate storyline (you got it right – storyline), selecting the right demos, building test environments, writing demo scripts, rehearsing my presentation, etc. Yes, I rehearse my presentations, and I say them out loud. I do the best that I can to make sure that the audience will be entertained, engaged, enlightened, educated and encouraged. If I’m doing a presentation on disaster recovery, I even plan out what type of disaster I will be simulating. Doing this helps me make sure that I don’t go beyond the time allotted for my session while covering everything I intend to. I’d be very happy if the audience walks out of my presentation with something they will act on when they get back to their regular routine. I keep in mind what Dr. Nick Morgan, one of America’s top communication theorists and coaches, always says: “The only reason to give a speech is to change the world.” So, if you’ll be attending a presentation I’m delivering in the future, I assure you that you won’t be disappointed.
  2. Presentation time is limited: I hear presenters and speakers apologize for not covering the full content of their presentation. In some cases, you see them breeze through their slides as they race to the summary slide. If the presentation had been rehearsed and scripted, they would know how long it takes to cover everything in their slides and could add or remove content as necessary. Copying and pasting code is my way of saying, “I value your time so much that I would rather copy and paste code so that I can move on to more important stuff than let you suffer through every typographical error I would make while typing.” As I said, many won’t agree with me on this, but I need to focus on the more important content of the presentation.
  3. Focus on the important: Same as the previous point. Enough said.
But what about disaster recovery? Yes, this is more than just a blog post about improving your presentation skills. The main reason why I copy and paste code, especially when doing a disaster recovery presentation, is to prove a point: you want to accomplish your task with the least amount of time and the least amount of effort, because every minute you waste is a minute against your recovery point objective (RPO) and recovery time objective (RTO). Imagine having to recover a SQL Server database by applying the latest FULL database backup and a series of LOG backups. The more LOG backups you need to restore, the longer it will take. Plus, if somebody is behind your back watching every move you make and asking when the database will be back online, you wouldn’t want the outage to last any longer than it has to. Remember, in a disaster recovery incident, every second matters. For highly transactional databases that support the main line-of-business applications, every minute lost is revenue lost. With these in mind, you would do everything you can possibly think of to recover the database as fast as you possibly can – even copy and paste code. In fact, I keep a dozen or so scripts in my repository that work as code generators – scripts that generate scripts. One of them reads through my backup history stored in the msdb database and creates a series of RESTORE DATABASE/LOG statements that I then execute, so I don’t have to figure out when the last LOG backup ran or work out the restore sequence by hand (a simplified sketch is shown below). Would you call this cheating because I copy and paste code? I don’t know about you, but I’d call it being creative when the rubber meets the road.
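
To give you an idea, here is a minimal sketch of that kind of code generator, assuming a simple FULL-plus-LOG backup strategy and a hypothetical database name. It reads the backup history in msdb and prints the RESTORE statements in sequence. It does not handle differential backups, tail-log backups, striped media or file relocation, so treat it as a starting point rather than the finished script, and always read the generated statements before running them.

DECLARE @db sysname;
SET @db = N'YourDatabase';   -- hypothetical database name

-- Find the most recent FULL backup
DECLARE @lastFull datetime;
SELECT @lastFull = MAX(bs.backup_finish_date)
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = @db
  AND bs.type = 'D';

-- Generate the RESTORE DATABASE statement followed by every LOG restore taken after it
SELECT cmd
FROM (
    SELECT 1 AS seq, bs.backup_finish_date,
           'RESTORE DATABASE [' + @db + '] FROM DISK = N''' + mf.physical_device_name
           + ''' WITH NORECOVERY, REPLACE;' AS cmd
    FROM msdb.dbo.backupset AS bs
    INNER JOIN msdb.dbo.backupmediafamily AS mf ON mf.media_set_id = bs.media_set_id
    WHERE bs.database_name = @db
      AND bs.type = 'D'
      AND bs.backup_finish_date = @lastFull
    UNION ALL
    SELECT 2, bs.backup_finish_date,
           'RESTORE LOG [' + @db + '] FROM DISK = N''' + mf.physical_device_name
           + ''' WITH NORECOVERY;'
    FROM msdb.dbo.backupset AS bs
    INNER JOIN msdb.dbo.backupmediafamily AS mf ON mf.media_set_id = bs.media_set_id
    WHERE bs.database_name = @db
      AND bs.type = 'L'
      AND bs.backup_finish_date > @lastFull
) AS r
ORDER BY r.seq, r.backup_finish_date;

-- Once everything is restored, bring the database online:
-- RESTORE DATABASE [YourDatabase] WITH RECOVERY;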

And one more thing: I will be delivering this presentation, with a bit more focus on the non-technical side of things, at the upcoming PASS Community Summit 2011 in Seattle, WA on 11-14 October 2011. If you intend to attend, drop by my session so we can talk about it some more.

Let me know your thoughts. Do you copy and paste code when recovering a database?

Sunday, May 22, 2011

Ever wonder what DUMP_LOG_COORDINATOR_QUEUE wait is?

If you've used wait statistics to do performance tuning on your SQL Server systems, I doubt that you have ever seen this one among your top waits (or that it even shows up at all). You will rarely find anything on the Internet about this wait type, although I did find a placeholder on SQLServerpedia for future information. However, if you have been playing around heavily with the undocumented SQL Server function fn_dump_dblog(), you may have seen it: the wait occurs when you run this function to read your log backups. Janice Lee (a schoolmate of mine from way back home) blogged about using this function as an alternative to restoring to a point in time. While I did not use the function to do a point-in-time restore of a database, I did find it interesting to see the DUMP_LOG_COORDINATOR_QUEUE wait type. I actually used the function to estimate the number of INSERT, UPDATE and DELETE commands executed against my database. While it is not as straightforward as gathering performance counters, it's sometimes the only option, especially if you were brought in for a one-time consulting engagement and need to work with what you have.

Note that I did not run this on a production server, so I had no issues running the undocumented function. Use it at your own risk.
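
If you're curious whether this wait type has ever registered on an instance, a quick (and safe) way to check is to look at the cumulative wait statistics. This is just a sketch; keep in mind that the numbers are cumulative since the instance was last restarted (or since the wait statistics were cleared).

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'DUMP_LOG_COORDINATOR_QUEUE';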

Monday, April 4, 2011

bass_player on Microsoft Talk TechNet

It's a privilege for me to be a part of Microsoft Talk TechNet - the Talk radio for IT Pros. This coming Friday, 8-Apr-2011, at 9 AM Pacific Time, join me as we talk about anything related to managing and administering SQL Server. If you know anybody who's into administering SQL Server, pass this information around.

Event ID: 1032483824

Language(s): English.
Product(s): Microsoft SQL Server 2008 R2.
Audience(s): IT Generalist.

For those of you who missed the event, the recording has been made available for download.

Wednesday, March 30, 2011

Building a Scale-Out SQL Server 2008 R2 Reporting Services Farm using Windows Network Load Balancing Part 2

Part 2 of the series is out. The rest will probably take quite some time as I'm still wrapping up a few items on the ASP.NET aspect of SQL Server 2008 R2 Reporting Services. Stay tuned.

Friday, March 25, 2011

Building a Scale-Out SQL Server 2008 R2 Reporting Services Farm using Windows Network Load Balancing Part 1

For the past couple of weeks, I've been working on a series of articles on building a scale-out SQL Server 2008 R2 Reporting Services farm. Part 1 of the series is already available on MSSQLTips.com. It was inspired by the SQL Server 2008 Failover Cluster installation series; I had been searching for available guides online and was frustrated at not finding a lot of information (or maybe I just wasn't looking hard enough).

Enjoy!

Friday, March 18, 2011

Configuring Alternate Access Mapping in SharePoint 2010

It took me quite a while to get back to writing articles for both MSSQLTips and MSSharePointTips. Now I'm back, and the first article to be published this year is all about configuring alternate access mappings in SharePoint 2010. Check out the article here. There's a corresponding PowerShell command to do the same task, so watch out for that article as well.

Friday, February 11, 2011

Disaster Recovery Is More Than Just Technology Part 3: The Lion, The Switch and The Wardrobe

You're in your favourite bar one Saturday night when, suddenly, you hear your mobile phone ring. You pick up the phone and hear a screaming voice on the other end (no, it’s not your wife telling you to go home and take out the trash). The background noise is preventing you from understanding what is actually being said. You check the number that registered on the phone – it’s your manager. You step out of the bar to hear what is being said and barely catch the last phrase, “the production database has been in a recovering state for more than an hour now…” And then your battery goes dead. Sounds familiar?

In a previous blog post, I talked about the different acronyms that come with the term disaster recovery. In this blog post, I’ll talk about key items that we sometimes tend to ignore when creating a disaster recovery strategy – the lion, the “switch” and the wardrobe (I’ve been a fan of the Narnia movie series, from which I got the idea). And, yes, I did once get a phone call like that while I was driving with my family; I had to pull over and guide the person on the other end of the line as they tried to recover the database.

The Lion

I call them the lions because they represent people with authority over and responsibility for the infrastructure that you are preparing the disaster recovery strategy for. You definitely need to include them. If you’re the DBA and you’re designing a disaster recovery strategy for the database server, then that lion is you. However, there are cases where the server administrator is not necessarily the DBA; that other lion is the server administrator. Oh, and isn’t your database server connected to the network? Then you have another lion in the pack – the network administrator. And wasn’t it a faulty hard drive that caused the disaster? Do you know who the supplier was for that hardware? Yes, that lion belongs to your pack as well. I can go on and on and include a ton of people in this list – the service provider for your network link, the company that stores your tapes offsite, the junior staff who need to know what to do in case you’re on vacation, your IT manager who needs to make the tough calls should the need arise. Make sure you know who the lions are in your pack and how to get in touch with them. Document who is responsible for what, because a missing lion in the pack will definitely affect your service level agreement.

The “Switch”

This is intentional. The switch (not the witch) represents the other types of “hardware” that affect your service reliability. I’ve had some discussions in the past with one of my former customers who happened to have high availability built into their database servers. They had their SQL Server instances running on top of a Windows failover cluster, which they designed after upgrading to Windows Server 2008. However, one of their past outages clearly showed that Windows failover clustering is totally meaningless if you do not consider the other components of the hardware stack. While the multiple nodes provided high availability for the cluster, the main culprit for the outage was the shared storage. Their SAN happened to be on a dedicated network that was having a bit of an issue with routing. To make matters worse, the “switch” to which the SAN was connected was shutting down unexpectedly due to power outages that might have been caused by improper wiring on the UPS. They focused so much on the availability of the cluster that they didn’t consider the storage and the network as potential causes of outages.

Also, in my previous life as a data center engineer, we had an incident where a production server suddenly experienced performance issues. We couldn’t figure out why, because even our remote access sessions wouldn’t go through to allow us to troubleshoot – until one of the heat sensors in the data center went off. The high CPU utilization was caused by overheating: one of the air conditioners had shut down, causing a drastic temperature increase inside the data center. Those who spend time in data centers know that you need to wear a thick enough coat to keep yourself warm while working; air conditioners control the temperature and humidity to help prevent equipment overheating and, potentially, disaster. While fixing a faulty air conditioner won’t happen in less than a day, designing the data center to allow for such incidents should be part of the disaster recovery strategy (we had to bring in electric fans and portable air conditioners to temporarily keep the temperature from rising while the air conditioners were being fixed). You also need to know which lion in the pack is responsible for data center management in case your servers are co-located somewhere. The bottom line is that you need to consider the other types of hardware that affect your service reliability and include them in your disaster recovery strategy.

The Wardrobe


The wardrobe represents storage of stuff. And stuff could be anything that affects your service reliability. One of my favourite wardrobes as far as disaster recovery is concerned is the runbook. It stores the information for a particular system that can be used by anybody should the need arise. Not too many DBAs or IT professionals like the idea of documentation but, as I’ve heard from a few, it’s a necessary evil. If you need to rebuild the server because of hardware upgrades or, worse, disaster recovery, the runbook will be your guide to rebuilding the server just as it was before. If you don’t have one, chances are that you won’t be able to rebuild your server with the exact same configuration as before. With a runbook, you can have junior staff go through the process themselves by simply following it in written form. You can even include procedures for recovering databases based on your backup strategy. A common rule of thumb for runbooks is simply this – write it so that even the most junior staff can figure it out. The challenge is keeping the runbook updated with the changes made on the system. However, runbooks are definitely a must for disaster recovery strategies.

What about backups? Where are you keeping them? Do you have access to your backups? Are the backup tapes labelled intuitively? Are they safe? Are they stored offsite? This type of wardrobe should be documented as well so you will know where to look for your backups when you need them.


How about storage media? Yes, the media for installing your operating system, your database server, your patches, service packs, application software, etc. Have you heard of a legacy application that is only supported on Windows 2000 Server, only for someone to find out that the installation media is missing after the server crashed? Create backups of your storage media and document them accordingly so that you can be sure they’re there when you need them.

And have you ever considered yourself as a wardrobe? Yes, you’ve got a ton of information in your head that needs to be shared within your team so that you don’t get as many of those emergency phone calls. You can set up a mentoring session with the junior staff, write documentation (or a blog post like this), do an internal presentation on how to perform a test restore of a backup – anything to make sure you’re not the only person on your team who can do the job. Most people don’t like this idea for fear of losing their job, but this is one thing that will make you even more valuable to the organization. For now, I’ll leave this topic for a professional development blog post.

I’m tempted to dive into the technical details of SQL Server disaster recovery and high availability, but I realize that in order to really appreciate the technology aspects of disaster recovery, we need to understand the other aspects that affect and influence it. And just to give you a heads-up, I’ll be in Orlando, FL in May for the first ever PASS SQLRally, where I’ll be talking about Disaster Recovery Techniques for the SQL Server DBA. If you happen to be in the area or are planning to attend, come see me at my presentation.

Sunday, January 30, 2011

Check for Free Space on your Drives - LazyAdmin PowerShell version

When asked to automate a task, I rely on my scripting background to get the job done. Back then, it was all VBScript. Now that more and more customers are moving to Windows Server 2008, I try to convince them to use PowerShell for all their scripting needs.

A few weeks back, a friend of mine asked me to write a script that would check the disks on all of their servers and send a report via email. They have a hundred or so servers in their data center, and it would be a bit cumbersome to log in to each one just to check for free space on its disks. While their network operations center has access to reports generated by their monitoring tools, she didn't, which is why she opted to do it on her own. I tried to demonstrate why PowerShell is my scripting language of choice for this task by highlighting how it can be done in a single line of code. Not that it can't be done otherwise, but this is an effective way to show people how easy it is to use Windows PowerShell. Below is the script to do the task (by the way, this can be written on a single line; the blog engine just wrapped it due to space constraints):

Get-WmiObject Win32_Volume -computername "localhost" |
Select-Object __SERVER, Name, @{Name="Size(GB)";Expression={"{0:N1}" -f($_.Capacity/1gb)}},@{Name="FreeSpace(GB)";Expression={"{0:N1}" -f($_.FreeSpace/1gb)}},@{Name="FreeSpacePerCent";Expression={[math]::Round(($_.FreeSpace/$_.Capacity)*100,1)}} |
Where-Object -FilterScript {$_.FreeSpacePerCent -lt 15} |
Sort-Object -Property "FreeSpacePerCent" |
Format-Table

Now this may seem intimidating at first, but let's dissect the script to understand what it is doing. The Get-WmiObject cmdlet queries the Win32_Volume class to scan through the list of disks (I'm using the -computername parameter to highlight an important concept later on). This includes mountpoints, local drives and USB drives. If you're only concerned about local disks and mountpoints, you can exclude USB drives by filtering on the DriveType attribute of the Win32_Volume class (local disks have a DriveType value of 3).

Get-WmiObject Win32_Volume -computername "localhost"

Since we are interested in the size of the disk and the free space, both in GB and as a percentage, we will use the Capacity and FreeSpace attributes of the Win32_Volume class. We do need to perform some calculations to get the values we are accustomed to: capacity and free space are reported in bytes, so we convert them to GB and compute the percentage of free space. The Select-Object cmdlet simply creates a new object from the original Win32_Volume properties - server hostname, drive or mountpoint name, capacity and free space - plus the calculated FreeSpacePerCent property. This gives you all the disks on a server with the defined attributes.

Select-Object __SERVER, Name, @{Name="Size(GB)";Expression={"{0:N1}" -f($_.Capacity/1gb)}},@{Name="FreeSpace(GB)";Expression={"{0:N1}" -f($_.FreeSpace/1gb)}},@{Name="FreeSpacePerCent";Expression={[math]::Round(($_.FreeSpace/$_.Capacity)*100,1)}}

But we don't want all of the drives. We only want those whose FreeSpacePerCent value is less than your allowable threshold - in this example, less than 15 percent. So, we use the Where-Object cmdlet to filter the results.

Where-Object -FilterScript {$_.FreeSpacePerCent -lt 15}

As administrators, we need to address the most critical issues first. This is where the Sort-Object cmdlet comes in. We sort the results in increasing order of FreeSpacePerCent so we can immediately address the disks with the least free space left.

Sort-Object -property "FreeSpacePerCent"

Finally, the Format-Table cmdlet is just for aesthetics, since I will be sending the results of this script as an email attachment.

Now, that wasn't so hard, was it? Understanding what each cmdlet is doing and how you can pipe the results to another cmdlet is the key to maximizing the use of PowerShell. But if you have a hundred or more servers, you wouldn't want to copy the script to all of your servers and run it from there, would you? There are a few ways to accomplish this, one of which is to use PowerShell Remoting. That will be a topic for another blog post, as there is more to it than simply writing and executing a PowerShell script remotely. What I opted to do here is simpler, since I'm assuming that neither my friend nor I have access to Active Directory to create Group Policies that enable PowerShell Remoting on all of the servers. Since the Get-WmiObject cmdlet has a -computername parameter, I can use it to execute queries against remote computers. I can simply read through a list - perhaps a list of computers from Active Directory, or something as simple as a text file. I opted for the text file for the same reason I didn't go for PowerShell Remoting. So what I did was convert the script above into a function that I call while passing each server name to the -computername parameter of Get-WmiObject:


function getDiskFreeSpace
{
Get-WmiObject Win32_Volume -computername $args |
Select-Object __SERVER, Name, @{Name="Size(GB)";Expression={"{0:N1}" -f($_.Capacity/1gb)}},@{Name="FreeSpace(GB)";Expression={"{0:N1}" -f($_.FreeSpace/1gb)}},@{Name="FreeSpacePerCent";Expression={[math]::Round(($_.FreeSpace/$_.Capacity)*100,1)}} |
Where-Object -FilterScript {$_.FreeSpacePerCent -lt 15}  |
Sort-Object -property "FreeSpacePerCent"  |
Format-Table
}

I will call this function as I read through a list of servers in a text file, passing the computer names as parameters. And since I want the results of the query in a single output file, I use the Out-File cmdlet to save the results to a text file:


ForEach($s in Get-Content C:\serverlist.txt)
{
getDiskFreeSpace $s  | Out-File C:\diskFreeSpaceResults.txt -append
}

Sending emails


In the past, I used a simple script to send emails via Windows PowerShell. This was prior to PowerShell v2.0:

$SmtpClient = new-object system.net.mail.smtpClient
$SmtpServer = "smtp.yourmailserver.local"
$SmtpClient.host = $SmtpServer

$From = "Friendly Reminder "
$To = "recepient@yourmailserver.net"
$Title = "Subject Matter"
$Body = "Body Text"


$SmtpClient.Send($from,$to,$title,$Body)

With PowerShell v2.0, the Send-MailMessage cmdlet was made available to send an email message. This made it easier to integrate email functionality into PowerShell scripts. Adding one more line to the script above, I've included sending the report as an email attachment:

Send-MailMessage -to "recipient@mail.com" -from "sender@mail.com" -subject "Servers Disk Free Space Report" -Attachments "C:\diskFreeSpaceResults.txt" -SmtpServer "yourSMTPserver.mail.com"

Now, you can start using this script with a text file that lists all of your servers. Make sure that the server on which you will be running this script has Windows PowerShell v2.0 installed and that you can communicate with those servers. I've seen servers on different VLANs that are isolated from each other but are in the same Active Directory domain. This causes the script to fail due to connectivity issues. Check with your network administrator to be sure.