Friday, October 12, 2007

Change what? - Change Data Capture in SQL Server 2008

It has always been tough to capture INSERT, UPDATE and DELETE activity in a SQL Server database. In previous versions, developers usually resorted to creating DML triggers to solve this problem, but that approach introduces unnecessary overhead in the database engine. Another approach is to read the transaction log, either with the undocumented DBCC LOG and fn_dblog or with third-party tools like Lumigent Log Explorer, ApexSQL Log, or Red Gate SQL Log Rescue. SQL Server 2005 introduced DDL triggers, making it easier to audit DDL activity. SQL Server 2008 adds a new feature called Change Data Capture (CDC): you use system stored procedures to mark which objects you want to audit, and those stored procedures take care of how the auditing occurs. How it works is that it captures DML statements asynchronously (this is quite interesting) by reading the transaction log (so much for third-party tools), populates CDC tables with the data of the rows that changed, and exposes the changes through a relational interface which can be consumed easily. The downside is that the feature is only available in the Enterprise and Developer Editions of SQL Server 2008. To use it, you first have to enable it at the database level. Again, I'll use the very familiar Northwind database for this sample.

USE Northwind
EXEC sys.sp_cdc_enable_db

Once CDC is enabled, the corresponding objects are created: the cdc schema, the cdc user, metadata tables and other system objects. The catch is that you then have to enable it on a per-table basis (I don't know why Microsoft chose to do it this way, but if you decide to do this on all user-defined tables, you definitely need to write a script to accomplish it).
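Such a script might look something like the sketch below; it walks sys.tables and calls sys.sp_cdc_enable_table for each user table (the cdc_role name is just an example):

```sql
-- Sketch: enable CDC for every user table in the current database.
-- Assumes CDC is already enabled at the database level and that
-- SQL Server Agent is running.
DECLARE @schema sysname, @table sysname;

DECLARE table_cursor CURSOR FOR
    SELECT SCHEMA_NAME(schema_id), name
    FROM sys.tables
    WHERE is_ms_shipped = 0;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @schema, @table;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sys.sp_cdc_enable_table
        @source_schema = @schema,
        @source_name   = @table,
        @role_name     = 'cdc_role';
    FETCH NEXT FROM table_cursor INTO @schema, @table;
END

CLOSE table_cursor;
DEALLOCATE table_cursor;
```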

EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'EmployeeTerritories', @role_name = 'cdc_role';

Just to explain the parameters passed to the stored procedure: dbo is the schema to which the table belongs, EmployeeTerritories is the name of the table to be enabled for CDC, and the @role_name parameter is the name of the database role used to gate access to the change data. If the role already exists, permissions are granted to it; if it doesn't, it is created with the specified name. Now, let's simulate changes in the dbo.EmployeeTerritories table (don't worry about referential integrity as this has been taken care of).

INSERT dbo.EmployeeTerritories VALUES (4, '98004'), (4, '98052');
WAITFOR DELAY '00:00:05';
UPDATE dbo.EmployeeTerritories SET TerritoryID = '80202'
WHERE EmployeeID = 4 AND TerritoryID = '98052';

This information is stored in a new table named schema_table_CT in the cdc schema: in this case, the cdc.dbo_EmployeeTerritories_CT table. Querying this table directly is a bit like executing the DBCC LOG command in previous versions of SQL Server: it just doesn't make much sense on its own. The good thing is that Microsoft came up with table-valued functions which make querying this table a lot easier (though not entirely straightforward). One function we can look at is cdc.fn_cdc_get_all_changes_<capture_instance>. This function returns one row for each change applied to the source table within the specified log sequence number (LSN) range. We don't really care about LSN values themselves, but this is how SQL Server tracks all changes in the transaction log (it's also part of the information stored in our database backups). Since we have this data in our cdc.dbo_EmployeeTerritories_CT table, we can pass an LSN range as parameters to this function. Below is a sample script that uses this function to query the change table but returns a more meaningful resultset.

DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);

-- Obtain the beginning of the time interval.
SET @begin_time = GETDATE()-1

-- Obtain the end of the time interval.
SET @end_time = GETDATE();

-- Map the time interval to a change data capture query range.
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

-- Return the changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_EmployeeTerritories(@from_lsn, @to_lsn, 'all');

You might notice the new function introduced here: sys.fn_cdc_map_time_to_lsn. It simply returns the log sequence number (which we don't really care about) corresponding to a specified time (which we are more familiar with). The resultset now becomes something we can actually understand. What we really need to look at is the __$operation column, which returns code values for the operations performed on the specified table: 1 = delete, 2 = insert, 3 = update (before image), and 4 = update (after image). Think of an UPDATE statement as a DELETE followed by an INSERT; that's why there are before and after images. Another way to translate the contents of the cdc.dbo_EmployeeTerritories_CT table is to use the query below.

SELECT *,
    operation = CASE __$operation
        WHEN 1 THEN 'DELETE'
        WHEN 2 THEN 'INSERT'
        WHEN 3 THEN 'UPDATE (before image)'
        WHEN 4 THEN 'UPDATE (after image)'
    END
FROM cdc.dbo_EmployeeTerritories_CT;

Bear in mind, though, that the data in the change tables is only retained for three days by default (the retention period is expressed in minutes, so 4320). You can clean it up when necessary using the sys.sp_cdc_cleanup_change_table stored procedure.
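A manual cleanup call might look like the sketch below; the capture instance name follows the schema_table naming convention shown above, and the low water mark is an LSN mapped from a point in time (the one-day cutoff is just an example):

```sql
DECLARE @lsn binary(10);

-- Keep only the changes from the last day (example cutoff).
SELECT @lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE() - 1);

EXEC sys.sp_cdc_cleanup_change_table
    @capture_instance = 'dbo_EmployeeTerritories',
    @low_water_mark   = @lsn,
    @threshold        = 5000;
```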

This gives you an overview of change data capture and how you can use it in SQL Server 2008. Download the RC0 and start playing around with it. Check out my SQL Server 2008 videos at

Thursday, October 11, 2007

What is ready to boost? - Windows Vista Ready Boost

While I rarely use my Windows Vista laptop at home, I happen to use it to convert my PowerPoint 2007 files to the 2003 version and sometimes to copy slides from 2007 to 2003. As I was copying my slides from my USB thumb drive to the laptop, I noticed an unusual (for me, at least) prompt aside from the typical do-you-want-to-open-this-in-Windows-Explorer type of question.
It gives users the option to use the USB thumb drive for ReadyBoost, a feature in Windows Vista which allows you to use non-volatile flash memory, such as that on a USB flash drive, to improve performance without having to add physical memory. Now, this is cool, especially when you don't have the "ca$h" to buy additional RAM, or if your motherboard does not support additional physical RAM (maybe it's limited to a maximum of 2GB). What I did notice is that there was some increase in performance using a 1GB USB thumb drive. The reason I didn't see this with my other USB thumb drives/SD cards is that a drive has to be at least 3.5MB/s before it can be used for ReadyBoost. But it's still worth having a cheap USB thumb drive to use as additional memory for your machine, although performance is not as good as real physical RAM.

Wednesday, October 10, 2007

MERGE statement in SQL Server 2008

The MERGE statement in SQL Server 2008 is a new DML statement that combines multiple DML operations. It performs INSERT, UPDATE, or DELETE operations on a target table based on the results of a join with a source table, enabling you to merge two tables based on a set of criteria. In previous versions of SQL Server, you had to write separate statements if you needed to insert, update, or delete data in one table based on certain conditions in another table or query. With MERGE, you can include the logic for these data modifications in one statement: you specify a source record set (which could be a table, a join, or a view), a target table, and the join condition between the two, and then you specify the DML actions to execute when the records between the two are matched or not matched. One typical use for this is bulk loading data warehouse tables, or maybe executing DML statements during off-peak hours. To demonstrate how the MERGE statement works, here's a very simple example. I'll create two tables and populate them with values.

USE Northwind
CREATE TABLE Table1 (ID int, [Value] varchar(20), Comment varchar(50))
CREATE TABLE Table2 (ID int, [Value] varchar(20), Comment varchar(50))
INSERT INTO Table1 VALUES (1, 'A', 'Inserted'), (2, 'B', 'Inserted')
INSERT INTO Table2 VALUES (1, 'C', 'Inserted'), (3, 'D', 'Inserted')

Here is where I'll be using the MERGE statement. I'll use Table1 as my source data and Table2 as my target table, matching the two on the ID column. If a match is found, I'll update the Comment column in Table2 with the value Match Found from Source. If a row from Table1 does not match any row in Table2, I'll insert a new row with the Comment column set to No Match Found on Target. If a row from Table2 does not match any row in Table1, I'll update its Comment column value to No Match Found from Source. The MERGE statement for this scenario looks like this

MERGE Table2 AS targetTable
USING (SELECT ID, [Value] FROM Table1) AS sourceTable
ON (targetTable.ID = sourceTable.ID)
WHEN MATCHED THEN
    UPDATE SET Comment = 'Match Found from Source'
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, [Value], Comment)
    VALUES (sourceTable.ID, sourceTable.[Value], 'No Match Found on Target')
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET Comment = 'No Match Found from Source';

If you analyze the query: the record in Table2 with ID = 1 gets its Comment set to Match Found from Source, since both tables have a row with that ID. The record in Table1 with an ID value of 2 has no match in Table2, so a new row is inserted into Table2 with the Comment value No Match Found on Target. And since Table2 has a record (ID = 3) with no matching ID value in Table1, its Comment is updated to No Match Found from Source.
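Based on that logic, a quick SELECT against Table2 after the MERGE should show three rows (the comments reflect the behavior described above, assuming the inserted row carries over the source's Value column):

```sql
SELECT ID, [Value], Comment FROM Table2;
-- ID = 1: Value 'C', Comment 'Match Found from Source'
-- ID = 2: Value 'B', Comment 'No Match Found on Target' (newly inserted)
-- ID = 3: Value 'D', Comment 'No Match Found from Source'
```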

Having explained how the MERGE statement works conceptually, let's look at a practical application. Using the Northwind database, I have created a script which updates the Products table based on the sales made for the day. It subtracts the total number of units sold for each product (from the Order Details table) from the UnitsInStock column of the Products table. You can run this query at the end of the day, when there aren't many queries running. Normally, we would solve this with either a trigger or a stored procedure wrapped in a transaction.

USE Northwind

MERGE Products AS P
USING (SELECT OD.ProductID, SUM(OD.Quantity)
       FROM [Order Details] OD
       JOIN Orders O
           ON OD.OrderID = O.OrderID
           AND CONVERT(date, O.OrderDate) = CONVERT(date, GETDATE())
       GROUP BY OD.ProductID) AS src (ProductID, OrderQty)
ON (P.ProductID = src.ProductID)
WHEN MATCHED AND P.UnitsInStock - src.OrderQty <> 0 THEN
    UPDATE SET P.UnitsInStock = P.UnitsInStock - src.OrderQty
WHEN MATCHED AND P.UnitsInStock - src.OrderQty = 0 THEN
    -- stock for this product is depleted
    DELETE;


So you get an Error 0x80244019 in your WSUS client

If you happen to check your WindowsUpdate.log file and find this error, you might want to check your WSUS server's IIS settings, especially if you have modified your URLScan.ini. If you install WSUS after you have configured URLScan.ini, setup doesn't change those settings at all. You need to add ".exe" under the following section in URLScan.ini:

[AllowExtensions]
.exe

And remove ".exe" from the [DenyExtensions] section. It's also worth reviewing the [AllowVerbs] section while you're there.

You can find out more about WSUS troubleshooting at this site. There may be more reasons than this why your WSUS just wouldn't cooperate when you deploy your patches.

Monday, October 8, 2007

Spot the difference - SQL Server 2008 TSQL enhancements

If you've had your hands on T-SQL for quite some time, it wouldn't be hard for you to identify the new enhancements available in SQL Server 2008. Now just for fun, try to identify what's unusual in this script, from a SQL Server 2000/2005 point-of-view:


DECLARE @variable int = 1;
SET @variable += 1;

CREATE TABLE table1 (col1 int);

INSERT INTO table1
VALUES (@variable), (@variable + 1);

If you try to execute this script in SQL Server 2005, you'll get an error similar to the one below

Msg 139, Level 15, State 1, Line 0
Cannot assign a default value to a local variable.
Msg 137, Level 15, State 2, Line 8
Must declare the scalar variable "@variable".
Now you start thinking: what could be wrong with the script? The answer: T-SQL enhancements in SQL Server 2008. First, the DECLARE statement in the first line can now accommodate initialization of variables, unlike before, where it only handled variable declaration and you needed either a SET or SELECT statement to assign values. This is similar to what you do in .NET, particularly VB.NET (the language-enhancement part, in this case), where you can declare and assign values to variables in a single line. You can still use the SET statement to assign values to variables. Next is the += operator (-= is also included). I first encountered this in C++ (Java, C# and VB.NET also have it), where the expression is used to increment (or decrement) variable values. In this case, the value of @variable is 2 after the second line because of the += operator. The third item is the presence of multiple row value constructors in the VALUES clause (a VALUES row constructor can also be used as a derived table in a FROM clause). This means you can now make your INSERT statements shorter. Not that you couldn't do these things in previous versions, but this simply means writing less code and getting more work done (although you might be in trouble as your boss might give you more work to do).
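For comparison, here's roughly what the same script would look like in SQL Server 2000/2005: separate declaration and assignment, no compound operator, and one INSERT statement per row.

```sql
DECLARE @variable int;
SET @variable = 1;                 -- assignment needs a separate SET (or SELECT)
SET @variable = @variable + 1;     -- no += operator in earlier versions

CREATE TABLE table1 (col1 int);

INSERT INTO table1 VALUES (@variable);      -- one row per INSERT
INSERT INTO table1 VALUES (@variable + 1);
```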

Speed up your SQL Server 2005 Books Online

Microsoft did a very literal translation of the term Books Online with SQL Server 2005. Books Online happens to be the help file for SQL Server. In SQL Server 2005 (and now with SQL Server "Katmai"), Books Online has the capability to go "online" and look for related results on the Internet, MSDN Online, community websites like CodeZone, etc. A lot of users find it annoying that it takes ages to load a search result in Books Online. This is because, in the default configuration, it searches the local copy first and then goes online to grab results from the configured online sources. To make Books Online respond a bit faster to searches, configure it not to go online at all (Google still offers faster search results, after all). Open SQL Server Books Online, go to the Tools menu, then select Options. When the Options window appears, select the Online option under the Help node. In the When Loading Help content section, select the Try local only, not online option. This will return search results from the local copy of Books Online (and make the searches a bit faster). Try it out and see what happens.

The pitfalls of ROBOCOPY

I have been using ROBOCOPY for a lot of reasons: home folder replication for disaster recovery purposes, a file-copy utility for custom transaction log shipping for SQL Server, etc. It has been a very useful tool for me, especially when you talk about disaster recovery. But what a lot of people do not realize is that it can become a very big headache if not used properly. One scenario I have seen is a database log file being replicated over to another server. Database log files are locked by the service or application responsible for running the database engine. In this case, depending on how it is configured, ROBOCOPY will try to execute the copy command until such time as it can do so. Now what happens if it can never copy the file because of a lock held by the service? It enters an effectively infinite retry loop. And if you're like me and like to log every activity done by any utility or script, you will have configured ROBOCOPY to do so as well. When this happens, the log file generated becomes so huge that it is very difficult to open with Notepad. Just a word of caution when using ROBOCOPY: DO NOT use the utility on files that are locked or in use by services or applications.
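One way to avoid the runaway retries (and the log bloat that comes with them) is to cap ROBOCOPY's retry count and wait interval, and exclude locked file types outright. A sketch with hypothetical paths (/XF, /R, /W and /LOG+ are standard ROBOCOPY switches):

```
robocopy D:\SQLData \\DRServer\Backup /XF *.mdf *.ldf /R:3 /W:10 /LOG+:C:\Logs\robocopy.log
```

With /R:3 and /W:10, a locked file is abandoned after three attempts instead of the default of about a million retries.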

Sunday, October 7, 2007

The Pains of Collation Settings

Collation settings define the physical storage of character strings in SQL Server. They specify the bit patterns that represent each character and the rules by which characters are sorted and compared. Collation settings have a direct impact on your databases if you configure them to use a collation other than the default. When you configure a different collation setting during installation, the rest of the databases (system and user-defined) will have this setting as their default when you create them, unless you specifically define something different. One case I encountered was a tempdb database with a different collation setting from the user databases. This happened because SQL Server was reinstalled and the master and msdb databases were restored from backup together with the user-defined databases. This brought the collation settings of the entire database server back to their original configuration: well, almost. Since the model database was not restored (and, obviously, the tempdb database does not require a backup), the collation settings for both were the ones defined during re-installation. A script executed in one of the user databases would require the use of tempdb, and since they did not have the same collation settings, queries would fail unless the queries themselves took the collation differences into account. One option is to rebuild the master database, change the collation settings, and go through the grueling task of restoring the system databases, provided the user-defined databases are still intact. Another approach is to simply restore the model database if you have a backup (or look for the old MDF and LDF files if they are still working fine). This works because the tempdb database takes the settings of the model database every time the SQL Server service starts. This will resolve collation problems should you encounter a similar case.
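A quick way to spot this kind of mismatch is to compare the collation of every database on the instance; tempdb should report the same collation as model:

```sql
-- List the collation of each database on the instance (SQL Server 2005 and later).
-- A tempdb collation that differs from the user databases signals the
-- problem described above.
SELECT name, collation_name
FROM sys.databases
ORDER BY name;
```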

What? I lost my port on a named instance? Dynamic port detection?

One thing I realized is that SQL Server port numbers will not always stay the same, for reasons that weren't obvious to me at first. I found this out while rebuilding a SQL Server named instance. We had standard settings for setting up SQL Server, but since I was rebuilding my test environment, I just used the default port number settings (which is 1433) during the process. To my surprise, it was using port number 3080 when the service started. Apparently, a named instance of SQL Server dynamically determines the port it listens on. It will try to listen on the port it used previously; if for some reason it cannot bind to that port, the named instance may bind to a different port, which is obviously what happened in my setup. Now this can cause a lot of problems if your SQL Server interacts with a lot of servers, services and applications, as you need either to change the port number back to its original value (which I think is more appropriate) or change the clients to use the new port number. I have yet to dig up more details on why SQL Server behaves this way. At least I know another place to look when troubleshooting connectivity problems.

I was doing more research on SQL Server port usage when I chanced upon this Microsoft KB. SQL Server clients use DBNETLIB to perform port detection; either the ODBC or the SQLOLEDB component loads DBNETLIB. SQL Server clients can use the Client Network Utility for dynamic port detection. This ensures that if SQL Server uses a different port number during service startup, or is configured to use dynamic port assignments, the client can still map to the port number that a specific instance uses. Now, what if the client is a third-party application running on a non-Microsoft platform (like a JSP application running on top of Apache on a Linux box)? How does this port resolution work then?
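From the server side, one quick way to confirm which TCP port an instance actually ended up listening on (SQL Server 2005 and later) is to ask the connections DMV about your own session:

```sql
-- Returns the TCP port of the current connection; connect over TCP
-- (not shared memory or named pipes) for a port to be reported.
SELECT local_tcp_port
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```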