Search This Blog

Tuesday, October 11, 2011

Best Sql Server Interview Questions 2000/2005/2008 : Part 3

Part 1
Part 2

1.      Why can there be only one Clustered Index and not more than one?
Cluster Index physically stores data, or arranges data in one order (depends on which column(s) you have defined Clustered index and in which order).
As a fact, we all know that a set of data can be only stored in only one order; that is why only one clustered index is possible

2.      How to Create Primary Key with Specific Name while Creating a Table?
CREATE TABLE [dbo].[TestTable](
[FirstName] [varchar](100)NULL,
([ID] ASC))

3.      How to Enable/Disable Indexes?

--Disable Index
--Enable Index

4.      What is T-SQL Script to Take Database Offline – Take Database Online?

-- Take the Database Offline
-- Take the Database Online

5.      Can we Insert Data if Clustered Index is Disabled?

No, we cannot insert data if Clustered Index is disabled because Clustered Indexes are in fact original tables which are physically ordered according to one or more keys (Columns).

6.      How to Recompile Stored Procedure at Run Time?

We can Recompile Stored Procedure in two ways.
Option 1:
CREATE PROCEDURE dbo.PersonAge(@MinAge INT, @MaxAge INT)
FROM dbo.tblPerson
WHERE Age <= @MinAge AND Age >= @MaxAge
Option 2:
EXEC dbo.PersonAge65, 70 WITH RECOMPILE
We can use RECOMPILE hint with a query and recompile only that particular query. However, if the parameters are used in many statements in the stored procedure and we want to recompile all the statements, then instead of using the RECOMPILE option with all the queries, we have one better option that uses WITH RECOMPILE during stored procedure creation or execution.
This method is not recommended for large stored procedures because the recompilation of so many statements may outweigh the benefit of a better execution plan. 

7.      What is the Maximum Number of Index per Table?
For SQL Server 2005:
1 Clustered Index + 249 Nonclustered Index = 250 Index.
For SQL Server 2008:
1 Clustered Index + 999 Nonclustered Index = 1000 Index.

8.      Where are SQL server Usernames and Passwords Stored in the SQL server?

They get stored in System Catalog Views, sys.server_principals and sys.sql_logins. However, you will not find password stored in plain text.

9.      What does TOP Operator Do?

The TOP operator is used to specify the number of rows to be returned by a query. The TOP operator has new addition in SQL SERVER 2008 that it accepts variables as well as literal values and can be used with INSERT, UPDATE, and DELETES statements.

10.  What is CTE?

CTE is the abbreviation for Common Table Expression. A CTE is an expression that can be thought of as a temporary result set which is defined within the execution of a single SQL statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query.

11.  What are the Advantages of Using CTE?
  • Using CTE improves the readability and enables easy maintenance of complex queries.
  • The query can be divided into separate, simple, and logical building blocks, which can be then used to build more complex CTEs until the final result set is generated.
  • CTE can be defined in functions, stored procedures, triggers or even views.
  • After a CTE is defined, it can be used as a Table or a View and can SELECT, INSERT, UPDATE or DELETE Data.

12.  What is MERGE Statement?

MERGE is a new feature that provides an efficient way to perform multiple DML operations. In previous versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions, but now, using MERGE statement, we can include the logic of such data modifications in one statement that even checks when the data is matched, then just update it, and when unmatched, insert it.
One of the most important advantages of MERGE statement is all the data is read and processed only once.
In previous versions three different statement has to be written to process three different activity (INSERT, UPDATE or DELETE), however using MERGE statement all update activity can be done in one pass of database table. This is quite an improvement in performance of database query.
Syntax of MERGE statement is as following:
[ TOP ( expression ) [ PERCENT ] ]
[ INTO ] target_table [ WITH ( <merge_hint> ) ] [ [ AS ] table_alias]
USING <table_source>
ON <merge_search_condition>
[ WHEN MATCHED [ AND <clause_search_condition> ]
THEN <merge_matched> ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ]
THEN <merge_not_matched> ]
[ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ]
THEN <merge_matched> ]
[ <output_clause> ]
[ OPTION ( <query_hint> [ ,...n ] ) ]

13.  What is Filtered Index?

Filtered Index is used to index a portion of rows in a table that means it applies filter on INDEX which improves query performance, reduces index maintenance costs, and reduces index storage costs when compared with full-table indexes. When we see an Index created with a WHERE clause, then that is actually a FILTERED INDEX.

14.  How can we Rewrite Sub-Queries into Simple Select Statements or with Joins?

Yes. We can rewrite sub-queries using the Common Table Expression (CTE). A Common Table Expression (CTE) is an expression that can be thought of as a temporary result set which is defined within the execution of a single SQL statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query.
WITH EmployeeDepartment_CTE AS (
SELECT EmployeeID,DepartmentID,ShiftID
FROM HumanResources.EmployeeDepartmentHistory
SELECT ecte.EmployeeId,ed.DepartmentID, ed.Name,ecte.ShiftID
FROM HumanResources.Department ed
INNER JOIN EmployeeDepartment_CTE ecte ON ecte.DepartmentID = ed.DepartmentID

15.  What is CLR?
In SQL Server 2008, SQL Server objects such as user-defined functions can be created using such CLR languages. This CLR language support extends not only to user-defined functions, but also to stored procedures and triggers. You can develop such CLR add-ons to SQL Server using Visual Studio 2008. 

16.  What are Synonyms?
Synonyms give you the ability to provide alternate names for database objects. You can alias object names; for example, using the Employee table as Emp. You can also shorten names. This is especially useful when dealing with three and four part names; for example, shortening server.database.owner.object to object. 

17.  What is LINQ?
Language Integrated Query (LINQ) adds the ability to query objects using .NET languages. The LINQ to SQL object/relational mapping (O/RM) framework provides the following basic features:
Tools to create classes (usually called entities) mapped to database tables
Compatibility with LINQ’s standard query operations
The DataContext class with features such as entity record monitoring, automatic SQL statement generation, record concurrency detection, and much more

18.  What is Use of EXCEPT Clause?

EXCEPT clause is similar to MINUS operation in Oracle. The EXCEPT query and MINUS query return all rows in the first query that are not returned in the second query. Each SQL statement within the EXCEPT query and MINUS query must have the same number of fields in the result sets with similar data types.

19.  What is XPath?

XPath uses a set of expressions to select nodes to be processed. The most common expression that you’ll use is the location path expression, which returns back a set of nodes called a node set. XPath can use both an unabbreviated and abbreviated syntax. The following is the unabbreviated syntax for a location path:

20.  What is NOLOCK?

Using the NOLOCK query optimizer hint is generally considered a good practice in order to improve concurrency on a busy system. When the NOLOCK hint is included in a SELECT statement, no locks are taken on data when data is read. The result is a Dirty Read, which means that another process could be updating the data at the exact time you are reading it. There are no guarantees that your query will retrieve the most recent data. The advantage to performance is that your reading of data will not block updates from taking place, and updates will not block your reading of data. SELECT statements take Shared (Read) locks. This means that multiple SELECT statements are allowed simultaneous access, but other processes are blocked from modifying the data. The updates will queue until all the reads have completed, and reads requested after the update will wait for the updates to complete. The result to your system is delay (blocking).

21.  What is the Difference between Update Lock and Exclusive Lock?

When Exclusive Lock is on any process, no other lock can be placed on that row or table. Every other process has to wait till Exclusive Lock completes its tasks.
Update Lock is a type of Exclusive Lock, except that it can be placed on the row which already has Shared Lock on it. Update Lock reads the data of the row which has the Shared Lock as soon as the Update Lock is ready to change the data it converts itself to the Exclusive Lock.

22.  How will you Handle Error in SQL SERVER 2008?

SQL Server now supports the use of TRY…CATCH constructs for providing rich error handling. TRY…CATCH lets us build error handling at the level we need, in the way we need to by setting a region where if any error occurs, it will break out of the region and head to an error handler. The basic structure is as follows:
So if any error occurs in the TRY block, then execution is diverted to the CATCH block, and the error can be resolved.

23.  What is the XML Datatype?
The xml data type lets you store XML documents and fragments in a SQL Server database. An XML fragment is an XML instance that has a missing single top-level element. You can create columns and variables of the xml type and store XML instances in them. The xml data type and associated methods help integrate XML into the relational framework of SQL Server.

24.  How can I Track the Changes or Identify the Latest Insert-Update-Delete from a Table?
In SQL Server 2005 and earlier versions, there is no inbuilt functionality to know which row was recently changed and what the changes were. However, in SQL Server 2008, a new feature known as Change Data Capture (CDC) has been introduced to capture the changed data.

25.  How to Find Tables without Indexes?

Run the following query in the Query Editor.
USE <database_name>;
SELECT SCHEMA_NAME(schema_id) AS schema_name
,name AS table_name
FROM sys.tables
ORDER BY schema_name, table_name;

1.      What is Difference between GETDATE and SYSDATETIME in SQL Server 2008?

In case of GETDATE, the precision is till milliseconds, and in case of SYSDATETIME, the precision is till nanoseconds

2.      What is the ‘FILLFACTOR’?

A “FILLFACTOR” is one of the important arguments that can be used while creating an index.
According to MSDN, FILLFACTOR specifies a percentage that indicates how much the Database Engine should fill each index page during index creation or rebuild. Fill-factor is always an integer valued from 1 to 100. The fill-factor option is designed for improving index performance and data storage. By setting the fill-factor value, you specify the percentage of space on each page to be filled with data, reserving free space on each page for future table growth.
Specifying a fill-factor value of 70 would imply that 30 percent of each page will be left empty, providing space for index expansion as data is added to the underlying table. The fill-factor setting applies only when the index is created or rebuilt.

3.      What are Various Limitations of the Views?
  • ORDER BY clause does not work in View. (works with Top keyword)
  • Regular queries or Stored Procedures give us flexibility when we need another column; we can add a column to regular queries right away. If we want to do the same with Views, then we will have to modify them first.
  • Index created on view not used often.
  • Once the view is created and if the basic table has any column added or removed, it is not usually reflected in the view till it is refreshed.
  • One of the most prominent limitations of the View it is that it does not support COUNT (*); however, it can support COUNT_BIG (*).
  • UNION Operation is now allowed in Indexed View.
  • We cannot create an Index on a nested View situation means we cannot create index on a view which is built from another view.
  • Outer Join, SELF JOIN Not Allowed in Indexed View.
  • Cross Database Queries Not Allowed in Indexed View.

4.      What is a Covered index?

It is an index that can satisfy a query just by its index keys without having needed to touch the data pages.
It means that when a query is fired, SQL Server doesn’t need to go to the table to retrieve the rows, but can produce the results directly from the index as the index covers all the columns used in query.

5.      When I delete any Data from a Table, does the SQL Server reduce the size of that table?

When data are deleted from any table, the SQL Server does not reduce the size of the table right away; however, it marks those pages as free pages, showing that they belong to the table. When new data are inserted, they are put into those pages first. Once those pages are filled up, SQL Server will allocate new pages. If you wait for sometime, the background process de-allocates the pages, finally reducing the page size.

6.      What are different transaction levels in SQL SERVER?

Transaction Isolation level decides how is one process isolated from other process.
Using transaction levels, you can implement locking in SQL SERVER.

There are four transaction levels in SQL SERVER:-

The shared lock is held for the duration of the transaction, meaning that no other transactions can change the data at the same time. Other transactions can insert and modify data in the same table, however, as long as it is not locked by the first transaction.

No shared locks and no exclusive locks are honored. This is the least restrictive isolation level resulting in the best concurrency but the least data integrity.

This setting disallows dirty and non-repeatable reads. However, even though the locks are held on read data, new rows can still be inserted in the table, and will subsequently be interpreted by the transaction.

This is the most restrictive setting holding shared locks on the range of data. This setting does not allow the insertion of new rows in the range that is locked; therefore, no phantoms are allowed.

Following is the syntax for setting transaction level in SQL SERVER.

7.      What is LOCK escalation?
Lock escalation is the process of converting of low-level locks (like rowlocks, page locks) into higher-level locks (like table locks). Every lock is a memory structure too many locks would mean, more memory being occupied by locks. To prevent this from happening, SQL Server escalates the many fine-grain locks to fewer coarse-grain locks. Lock escalation threshold was definable in SQL Server 6.5, but from SQL Server 7.0 onwards SQL Server dynamically manages it.

8.      What are the different ways of moving data between databases in SQL Server?
There are lots of options available; you have to choose your option depending upon your requirements. Some of the options you have are BACKUP/RESTORE, detaching and attaching databases, replication, DTS, BCP, logshipping, INSERT...SELECT, SELECT...INTO, creating INSERT scripts to generate data.

9.      What is the purpose of Replication?
Replication is way of keeping data synchronized in multiple databases. SQL server replication has two important aspects publisher and subscriber.
Database server that makes data available for replication is known as Publisher.
Database Servers that get data from the publishers is called as Subscribers.

10.  What are the different types of replication supported by SQL SERVER?

There are three types of replication supported by SQL SERVER:-

Snapshot Replication:
Snapshot Replication takes snapshot of one database and moves it to the other database. After initial load data can be refreshed periodically. The only disadvantage of this type of replication is that all data has to be copied each time the table is refreshed.

Transactional Replication:
In transactional replication, data is copied first time as in snapshot replication, but later only the transactions are synchronized rather than replicating the whole database. You can either specify to run continuously or on periodic basis.

Merge Replication:
Merge replication combines data from multiple sources into a single central database. Again as usual, the initial load is like snapshot but later it allows change of data both on subscriber and publisher, later when they come on-line it detects and combines them and updates accordingly.

11.  What are Ranking Functions?

Ranking functions return a ranking value for each row in a partition. All the ranking functions are non-deterministic. The different Ranking functions are as follows:
ROW_NUMBER () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
RANK () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the rank of each row within the partition of a result set.
DENSE_RANK () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the rank of rows within the partition of a result set, without any gaps in the ranking. 

Best Sql Server Interview Questions 2000/2005/2008 : Part 2

Part 1
Part 3

1.       What’s difference between “Optimistic” and “Pessimistic”   locking?

In pessimistic locking when user wants to update data it locks the record and till then no one can update data. Other user’s can only view the data when there is pessimistic locking.

In optimistic locking multiple users can open the same record for updating, thus increase maximum concurrency. Record is only locked when updating the record. This is the most preferred way of locking practically. Now a days in browser based application it is very common and having pessimistic locking is not a practical solution.

2.      When is the use of update_statistics command?

This command is basically used when a large amount of data is processed. If a large amount of deletions, modifications or Bulk Copy into the tables has occurred, it has to update the indexes to take these changes into account. UPDATE_STATISTICS updates the indexes on these tables accordingly.

3.       What is the difference between a HAVING CLAUSE and a WHERE
You can use Having Clause with the GROUP BY function in a query and WHERE Clause is applied to each row before, they are part of the GROUP BY function in a query.

4.      What is Connection Pooling and why it is Used?

To minimize the cost of opening and closing connections, ADO.NET uses an optimization technique called connection pooling.
The pooler maintains ownership of the physical connection. It manages connections by keeping alive a set of active connections for each given connection configuration. Whenever a user calls Open on a connection, the pooler looks for an available connection in the pool. If a pooled connection is available, it returns it to the caller instead of opening a new connection. When the application calls Close on the connection, the pooler returns it to the pooled set of active connections instead of closing it. Once the connection is returned to the pool, it is ready to be reused on the next Open call.

5.      Types of Sub-query

·         Single-row sub-query, where the sub-query returns only one row.
·         Multiple-row sub-query, where the sub-query returns multiple rows, and
·         Multiple column sub-query, where the sub-query returns multiple columns
Note: A sub-query can’t contain an ORDER BY clause; however sub-query can use                           ORDER BY when used with TOP clause.

6.      What is SQL Profiler?

SQL Profiler is a graphical tool that allows system administrators to monitor events in an instance of Microsoft SQL Server. You can capture and save data about each event to a file or SQL Server table to analyze later. For example, you can monitor a production environment to see which stored procedures are hampering performances by executing very slowly.

7.      What is an SQL Server Agent?

The SQL Server agent plays an important role in the day-to-day tasks of a database administrator (DBA). It is often overlooked as one of the main tools for SQL Server management. Its purpose is to ease the implementation of tasks for the DBA, with its full-function scheduling engine, which allows you to schedule your own jobs and scripts.

8.      Can a Stored Procedure call itself or a Recursive Stored Procedure? How many levels of SP nesting are possible?
Yes. You can nest stored procedures and managed code references up to 32 levels.
Stored procedures are nested when one stored procedure calls another or executes managed code by referencing a CLR routine, type, or aggregate. You can nest stored procedures and managed code references up to 32 levels. The nesting level increases by one when the called stored procedure or managed code reference begins execution and decreases by one when the called stored procedure or managed code reference completes execution. Attempting to exceed the maximum of 32 levels of nesting causes the whole calling chain to fail. The current nesting level for the stored procedures in execution is stored in the @@NESTLEVEL function.

9.      What is the Difference between a Local and a Global Temporary Table?
A local temporary table exists only for the duration of a connection, or if defined inside a compound statement, for the duration of the compound statement.
A global temporary table remains in the database accessible across the connections. Once the connection where original global table is declared dropped this becomes unavailable.

10.  What is the STUFF Function and How Does it Differ from the REPLACE Function?
STUFF function is used to overwrite existing characters using this syntax:        STUFF (string_expression, start, length, replacement_characters), where string_expression is the string that will have characters substituted, start is the starting position, length is the number of characters in the string that are substituted, and replacement_characters are the new characters interjected into the string. REPLACE function is used to replace existing characters of all occurrences. Using the syntax REPLACE (string_expression, search_string, replacement_string), every incidence of search_string found in the string_expression will be replaced with replacement_string.

11.  What is the difference between UNION and UNION ALL?
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between UNION and UNION ALL is that UNION ALL will not eliminate duplicate rows; instead it just pulls all rows from all the tables fitting your query specifics and combines them into a table.

12.  What is B-Tree?
The database server uses a B-tree structure to organize index information. B-Tree generally has following types of index pages or nodes:
·         Root node: A root node contains node pointers to only one branch node.
·         Branch nodes: A branch node contains pointers to leaf nodes or other branch nodes, which can be two or more.
·         Leaf nodes: A leaf node contains index items and horizontal pointers to other leaf nodes, which can be many. 

13.  What are the Advantages of Using Stored Procedures?
·         Stored procedure can reduced network traffic and latency, boosting application performance.
·         Stored procedure execution plans can be reused; they staying cached in SQL Server’s memory, reducing server overhead.
·         Stored procedures help promote code reuse.
·         Stored procedures can encapsulate logic. You can change stored procedure code without affecting clients.
·         Stored procedures provide better security to your data. 

14.  What is a Table Called, if it has neither Cluster nor Non-cluster Index? What is it used for?
Unindexed table or Heap. Microsoft Press Books and Book on Line (BOL) refers it as Heap. A heap is a table that does not have a clustered index and therefore, the pages are not linked by pointers. The IAM pages are the only structures that link the pages in a table together. Unindexed tables are good for fast storing of data. Many times, it is better to drop all the indexes from table and then do bulk of Inserts and restore those indexes after that.

15.  Can SQL Servers Linked to other Servers like Oracle?
SQL Server can be linked to any server provided it has OLE-DB provider from Microsoft to allow a link, e.g. Oracle has an OLE-DB provider that Microsoft provides to add it as a linked server to the SQL Server group.

16.  What is BCP? When is it used?
BCP or BulkCopy is a tool used to copy huge amounts of data from tables and views. BCP does not copy the complete structures from source to destination. BULK INSERT command helps to import a data file into a database table or view in a user-specified format.

17.  What Command do we Use to Rename a db, a Table and a Column?
To Rename db
sp_renamedb ‘oldname’ , ‘newname
If someone is using db it will not accept sp_renamedb. In that case, first bring db to single user mode using sp_dboptions. Use sp_renamedb to rename the database. Use sp_dboptions to bring the database to multi-user mode.
EXEC sp_dboption YourDbName, 'Single User', True
EXEC sp_renamedb 'YourDbName',' YourDbName_New'
EXEC sp_dboption YourDbName, 'Single User', False
To Rename Table
We can change the table name using sp_rename as follows:
sp_rename 'oldTableName' 'newTableName'
To rename Column
The script for renaming any column is as follows:
sp_rename 'TableName.[OldcolumnName]', 'NewColumnName', 'Column'

18.  What is an Execution Plan? When would you Use it? How would you View the Execution Plan?
An execution plan is basically a road map that graphically or textually shows the data retrieval methods chosen by the SQL Server query optimizer for a stored procedure or ad-hoc query, and it is a very useful tool for a developer to understand the performance characteristics of a query or stored procedure since the plan is the one that SQL Server will place in its cache and use to execute the stored procedure or query.
Within the Query Analyzer, there is an option called “Show Execution Plan” (in the Query drop-down menu). If this option is turned on, it will display query execution plan in a separate window when the query is ran again.

19.  What is the difference between CHAR and VARCHAR Datatypes?
VARCHARS are variable length strings with a specified maximum length. If a string is less than the maximum length, then it is stored verbatim without any extra characters, e.g. names and emails. CHARS are fixed-length strings with a specified set length. If a string is less than the set length, then it is padded with extra characters, e.g. phone number and zip codes. For instance, for a column which is declared as VARCHAR(30) and populated with the word ‘SQL Server,’ only 10 bytes will be stored in it. However, if we have declared the column as CHAR(30) and populated with the word ‘SQL Server,’ it will still occupy 30 bytes in database.

20.  What is the Difference between VARCHAR and VARCHAR (MAX) Data types?
VARCHAR stores variable-length character data whose range varies up to 8000 bytes; varchar(MAX) stores variable-length character data whose range may vary beyond 8000 bytes and till 2 GB. TEXT datatype is going to be deprecated in future versions, and the usage of VARCHAR (MAX) is strongly recommended instead of TEXT datatypes.
If you know the max length of characters to be entered go for varchar (length) otherwise go for VARCHAR (MAX).

21.  What is the Difference between VARCHAR and NVARCHAR data types?
In principle, they are the same and are handled in the same way by your application. The only difference is that NVARCHAR can handle Unicode characters, allowing you to use multiple languages in the database (Arabian, Chinese, etc.). NVARCHAR takes twice as much space when compared to VARCHAR.
Use NVARCHAR only if you are using foreign languages.

22.  Which are the Important Points to Note when Multilanguage Data is Stored in a Table?
There are two things to keep in mind while storing unicode data. First, the column must be of unicode data type (nchar, nvarchar, ntext). Second, the value must be prefixed with N while insertion.
For example: INSERT INTO table (Hindi_col) values (N’hindi data’)

23.  How to Optimize Stored Procedure Optimization?
·         Include SET NOCOUNT ON statement.
·         Use schema name with object name.
·         Do not use the prefix “sp_” in the stored procedure name.
·         Use IF EXISTS (SELECT 1) instead of (SELECT *).
·         Use the sp_executesql stored procedure instead of the EXECUTE statement.
·         Try to avoid using SQL Server cursors whenever possible.
·         Keep the Transaction as short as possible.
·         Use TRY-Catch for error handling. 
24.      How to Find Out the List Schema Name and Table Name for the Database?
We can use following script:
SELECT '['+ SCHEMA_NAME (schema_id) +'].['+name+']' AS SchemaTable
FROM sys.tables

25.      What is CHECKPOINT Process in the SQL Server?
CHECKPOINT process writes all dirty pages for the current database to disk. Dirty pages are data pages that have been entered into the buffer cache and modified, but not yet written to disk.

Introduction to Change Data Capture (CDC) in SQL Server 2008

      How can I Track the Changes or Identify the Latest Insert-Update-Delete from a Table?

In SQL Server 2005 and earlier versions, there is no inbuilt functionality to know which row was recently changed and what the changes were. However, in SQL Server 2008, a new feature known as Change Data Capture (CDC) has been introduced to capture the changed data.


Often, you’ll be told that the specification of an application requires that  the value of  data in the database of an application must be recorded before it is changed. In other words, we are required to save all the history of the changes to the data. This feature is usually implemented for data security purposes. To implement this, I have seen a variety of solutions from triggers, timestamps and complicated queries (stored procedures) to audit data.
SQL Server 2005 introduced the new features of ‘after update’, ‘after insert’ and ‘after delete’ triggers that  almost solved the problem of tracking changes in data.  A better solution was introduced in SQL Server 2008 and is called Change Data Capture (CDC). CDC has allowed SQL Server developers to deliver SQL Server data archiving and capturing without any additional programming.
CDC is one of the new data tracking and capturing features of SQL Server 2008. It only tracks changes in user-created tables. Because captured data is then stored in relational tables, it can be easily accessed and retrieved subsequently, using regular T-SQL.
When you apply Change Data Capture features on a database table, a mirror of the tracked table is created with the same column structure of the original table, but with additional columns that include the metadata used to summarize the nature of the change in the database table row.  The SQL Server DBA can then easily monitor the activity for the logged table using these new audit tables .


Enabling Change Data Capture on a Database

CDC first has to be enabled for the database. Because CDC is a table-level feature, it then has to be enabled for each table to be tracked. You can run following query and check whether it is enabled for any database. 

USE master
SELECT [name]database_idis_cdc_enabled 
FROM sys.databases      

This query will return the entire database name along with a column that shows whether  CDC is enabled.

You can run this stored procedure in the context of each database to enable CDC at database level. (The following script will enable CDC in AdventureWorks database. )

USE AdventureWorks
EXEC sys.sp_cdc_enable_db

As soon as CDC is enabled, it will show this result in SSMS.

Additionally, in the database AdventureWorks, you will see that a schema with the name ‘cdc’ has now been  created.

Some System Tables will have been created within the  AdventureWorks database as part of the cdc schema.

The table which have been created are listed here.
·         cdc.captured_columns – This table returns result for list of captured column.
·         cdc.change_tables – This table returns list of all the tables which are enabled for capture.
·         cdc.ddl_history – This table contains history of all the DDL changes since capture data enabled.
·         cdc.index_columns – This table contains indexes associated with change table.
·         cdc.lsn_time_mapping – This table maps LSN number (for which we will learn later) and time.

 Enabling Change Data Capture on one or more Database Tables

The CDC feature can be applied at the table-level  to any database for which CDC is enabled.  It has to be enabled for any table which needs to be tracked. First run following query to show which tables of database have already been enabled for CDC.

USE AdventureWorks
SELECT [name]is_tracked_by_cdc 
FROM sys.tables

The above query will return a result that includes a column with the  table name, along with a column which displays if CDC is enabled or not.

You can run the following stored procedure to enable each table. Before enabling CDC at the table level, make sure that you have  enabled SQL Server Agent. When CDC is enabled on a table, it creates two CDC-related jobs that are specific to the database,  and executed using SQL Server Agent. Without SQL Server Agent enabled, these jobs will not execute.
Additionally, it is very important to understand the role of the required parameter @role_name. If there is any restriction of how data should be extracted from database, this option is used to specify any role which is following restrictions and gating access to data to this option if there is one.  If you do not specify any role and, instead, pass a NULL value, data access to this changed table will not be tracked and will be available to access by everybody. 

Following script will enable CDC on HumanResources.Shift table.

USE AdventureWorks
EXEC sys.sp_cdc_enable_table
@source_schema N'HumanResources',
@source_name   N'Shift',
@role_name     NULL

As we are using AdventureWorks database, it creates the jobs with following names. 

1.       cdc.AdventureWorks_capture – When this job is executed it runs the system stored procedure sys.sp_MScdc_capture_job.  The procedure sys.sp_cdc_scan  is called internally by sys.sp_MScdc_capture_job. This procedure cannot be executed explicitly when a change data capture log scan operation is already active or when the database is enabled for transactional replication. This system SP enables SQL Server Agent, which in facts enable Change Data Capture feature.
2.       cdc.AdventureWorks_cleanup – When this job is executed it runs the system stored procedure sys.sp_MScdc_cleanup_job. This system SP cleans up database changes tables. 

The Stored Procedure sys.sp_cdc_enable_table enables CDC. There are several options available with this SP but we will only mention the required options for this SP. CDC is very powerful and versatile tool. By understanding the Stored Procedure  sys.sp_cdc_enable_table you will gain the true potential of the CDC feature. One more thing to notice is that when these jobs are created they are automatically enabled as well.

By default, all the columns of the specified table  is taken into consideration of this operation. If you want to only few columns of this table to be tracked in that case you can specify the columns as one of the parameters of above mentioned SP.
When everything is successfully completed,  check  the system tables again and you will find a new table  called cdc.HumanResources_Shift_CT. This table will contain all the changes in the table HumanResources.Shift. If you expand this table, you will find five additional columns as well.  

As you will see there are five additional columnsto the mirrored original table
·         __$start_lsn
·         __$end_lsn
·         __$seqval
·         __$operation
·         __$update_mask

There are two values which are very important to us is __$operation and __$update_mask.
Column _$operation contains value which corresponds to DML Operations. Following is quick list of value and its corresponding meaning.

·         Delete Statement = 1
·         Insert Statement = 2
·         Value before Update Statement = 3
·         Value after Update Statement = 4 

The column _$update_mask shows, via a bitmap,   which columns were updated in the DML operation that was specified by _$operation.  If this was  a DELETE or INSERT operation,   all columns are updated and so the mask contains value which has all 1’s in it. This mask is contains value which is formed with Bit values.

Example of Change Data Capture

We will test this feature by doing  DML operations such as INSERT, UPDATE and DELETE on the table HumanResources.Shift which we have set up for CDC. We will observe the effects on the CDC table cdc.HumanResources_Shift_CT.
Before we start let’s first SELECT from both tables and see what is in them. 

USE AdventureWorks
FROM HumanResources.Shift
USE AdventureWorks
FROM cdc.HumanResources_Shift_CT

The result  of the query is as displayed here.

The original table HumanResources.Shift has three rows in it, whereas the  table cdc.HumanResources_Shift_CT is totally empty. This table will have entries after an operation on the tracked table.

Insert Operation

Lets run an INSERT operation on the table HumanResources.Shift.

USE AdventureWorks
INSERT INTO [HumanResources].[Shift]

Once the script is run, we will check the content of two of our tables HumanResources.Shift and cdc.HumanResources_Shift_CT.

Because of the INSERT operation, we have a newly inserted fourth row in the tracked table HumanResources.Shift . The tracking table also has the same row visible. The value of _operation is 2 which means that this is an INSERT operation.

Update Operation

To illustrate the effects of an UPDATE we will update a newly inserted row. 

USE AdventureWorks
UPDATE [HumanResources].[Shift]
SET Name = 'New Name',
      ModifiedDate = GETDATE()
WHERE ShiftID = 4

Once more, we check our tables HumanResources.Shift and cdc.HumanResources_Shift_CT.

UPDATE operations always result in two different entries in the tracking table. One entry contains the previous values before the UPDATE is executed.  The second entry is for new data after the UPDATE is executed. In our case we have only changed two columns of the table but we are tracking the complete table so all the entries in the tableare logged before, and after, the update happens. The Change Data Capture mechanism always captures all the columns of the table unless, when CDC is set up on a table, it is restricted to track only a few columns. We will see how this can be done later on this article.

Delete Operation

To verify this option we will be running a DELETE operation on a newly inserted row. 

USE AdventureWorks
FROM [HumanResources].[Shift]

Once this script is run, we can see the contents of  our tables HumanResources.Shift and cdc.HumanResources_Shift_CT.

Due to the DELETE operation, we now have only three rows in the tracked table HumanResources.Shift We can see the deleted row visible in the tracking table as new entry. The value of _operation is 4 , meaning that this is a delete operation.

Change Data Capture and Operations

We have now verified that, by using CDC, we are able to capture all the data  affected by DML operation. In the tracked table we have four values of the operation. We can see this operation’s value in the following image.

Understanding Update mask

It is important to understand the Update mask column in the tracking table. It is named as _$update_mask. The value displayed in the field is hexadecimal but is stored as binary.
In our example we have three different operations. INSERT and DELETE operations are done on the complete row and not on individual columns. These operations are listed marked masked with 0x1F is translated in binary as 0b11111, which means all the five columns of the table.
In our example, we had an UPDATE on only two columns – the second and fifth column. This is represented with 0x12 in hexadecimal value ( 0b10010 in binary).  Here, this value stands for second and fifth value if you look at it from the right, as a bitmap. This is a useful way of finding out which columns are being updated or changed.
The tracking table shows  two columns which contains the suffix lsn in them i.e. _$start_lsn and _$end_lsn. These two values correspond to the  Log Sequential Number. This number is associated with committed transaction of the DML operation on the tracked table.

Disabling Change Data Capture on a table

Disabling this feature is very simple. As we have seen earlier, if we have to enable CDC we have to do this in two steps – at table level and at database level,: In the same way, when we have to disable this feature, we can do this at same two levels. Let us see both of them one after one.
For dropping any tracking of any table we need three values the Source Schema, the Source Table name, and the Capture Instance. It is very easy to get schema and table name. In our case, the schema is HumanResource and table name is Shift, however we do not know the name of the Capture Instance. We can retrieve it very easily by running following T-SQL Query.

USE AdventureWorks;
EXEC sys.sp_cdc_help_change_data_capture

this will return a result which contains all the three required information for disabling CDC ona table.

The Stored Procedure  sys.sp_cdc_help_change_data_capture provides lots of other useful information as well. Once we have name of the capture instance, we can disable tracking of the table by running this T-SQL query.

USE AdventureWorks;
EXECUTE sys.sp_cdc_disable_table
    @source_schema = N'HumanResources',
    @source_name = N'Shift',
    @capture_instance = N'HumanResources_Shift';

Once Change Data Capture is disabled on any table, it drops the change data capture table as well all the functions which were associated with them. It also deletes all the rows and data associated with this feature from all the system tables and changes relevant data in catalog views.

In our example, we can clearly see that capture table cdc.HumanResources_Shift_CT is dropped.

Disable Change Data Capture Feature on Database

This is the easiest task out of all process. Running following T-SQL query will disable CDC on whole database. 

USE AdventureWorks
EXEC sys.sp_cdc_disable_db
This Stored Procedure will delete all the data, functions, tables related to CDC. If this data is needed for any reason, you must take a  backup  before dropping CDC from any database

Capture Selected Column

When CDC is enabled on any table, it usually captures the data of all the columns. During INSERT or DELETE operations, it is necessary to capture all the data but in UPDATE operations  only the data of the updated columns are required. CDC is not yet advanced enough to provide this kind of dynamic column selection but CDC can let you select the columns from which changes to data should be captured from the beginning.
This stored procedure should be run in the context of each database to enable it at database level. Following script will enable CDC in AdventureWorks database. 

USE AdventureWorks
EXEC sys.sp_cdc_enable_db

Now we will enable this feature at table level but for selected columns of ShiftID and Name only. This script will enable table-level change data capture for only two columns. 

USE AdventureWorks
EXEC sys.sp_cdc_enable_table
@source_schema N'HumanResources',
@source_name   N'Shift',
@role_name     NULL,
@captured_column_list '[ShiftID],[Name]'

So what’s in the system table which will be created by data capturing purpose in AdventureWorks Database?

So you can see that there are now only two rows which are tracked.
We will change the data of one of the columns that weren’t specified so as to see  the value in cdc.HumanResources_Shift_CT table.
Before we start let us first select from both of the table and observe its content. 

USE AdventureWorks
FROM HumanResources.Shift
USE AdventureWorks
FROM cdc.HumanResources_Shift_CT

Here is the result.

The original table HumanResources.Shift now has three rows in it; whereas  table cdc.HumanResources_Shift_CT is totally empty. Lets update ModifiedDate for ShiftID =1 and see if that record creates an entry in the tracking table. 

USE AdventureWorks
UPDATE [HumanResources].[Shift]
SET        ModifiedDate GETDATE()
WHERE  ShiftID 3

Now to check the contents of the tracking table  table cdc.HumanResources_Shift_CT and see whether that change is captured.

The tracking table is empty because it only tracks the changes which it contains, and it ignores any changes in other columns.
Retrieve Captured Data of Specific Time Frame
Quite often, one is asked for data to be tracked over a  time interval. If you look at the tracking data there is apparently no time captured at all. It always provides all the information. However, there are few fields which can definitely help us out i.e. _$start_lsn . LSN stands for Last Sequence Number. Every record in transaction log is uniquely identified by a LSN. They are always incrementing numbers.
LSN numbers are always associated with time and their mapping can be found after querying system table  cdc.lsn_time_mapping. This table is one of the tables which was created when AdventureWorks database was enabled for CDC. You can run this query to get all the data in the table  cdc.lsn_time_mapping.

USE AdventureWorks
FROM cdc.lsn_time_mapping

When  this query is run it will give us all the rows of table. It is a little difficult to find the  necessary information from all the data. The usual case is when we need to inspect a change that occurred in a particular  time period.

We can find the time that corresponds to the LSN by using the system function sys.fn_cdc_map_time_to_lsn. If we want all the changes done yesterday, we can run this function as described below and it will return all the rows from yesterday.
Before we run this query let us explore two table valued functions (TVF) in AdventureWorks database. You can see that there are two new TVF are created with schema cfc. These functions are created when table level CDC was enabled.

The function cdc.fn_cdc_get_all_changes_HumanResources_Shift can be used to get events that occurred over a particular time period. You can run this T-SQL script to get event happened during any specific time period. In our case, we will be retrieving this data for the past 24 hours.
Following query should do retrieve data which was modified in the past 24 hours..

USE AdventureWorks
DECLARE @begin_time DATETIME@end_time DATETIME@begin_lsn BINARY(10), @end_lsn BINARY(10);
SELECT @begin_time GETDATE()-1@end_time GETDATE();
SELECT @begin_lsn sys.fn_cdc_map_time_to_lsn('smallest greater than'@begin_time);
SELECT @end_lsn sys.fn_cdc_map_time_to_lsn('largest less than or equal'@end_time);
FROM cdc.fn_cdc_get_all_changes_HumanResources_Shift(@begin_lsn,@end_lsn,'all')

we have used relational operations in the function sys.fn_cdc_map_time_to_lsn. There can be total of four different relational operations available to use in that function:
·         largest less than
·         largest less than or equal
·         smallest greater than
·         smallest greater than or equal
This way captured data can be queried very easily and query based on time interval.

Automatic Clean Up Process

If we track every change of all the  data in our database, there is very good chance that we will outgrow the hard drive of main server. This will also lead to issues with maintenance and input/output buffer issues.
In CDC this there is automatic cleanup process that runs at regular intervals. By default the interval is of 3 days but it can be configured. We have observed that, when we enable CDC on the database, there is one additional system stored procedure created with the  name sys.sp_cdc_cleanup_change_table which cleans up all the tracked data at interval.


For years, programmers have tried to create systems that record all the changes made to the data in a database application.  At last, with SQL Server 2008, we have a robust way, CDC, that comes ‘out of the box’ to deliver this functionality in a standard way. This should be useful for auditing databases and for tracking obscure problems that require you to know exactly when and where a change to a base table was made.

Reference : Above article originally written by Pinal Dave(

Popular Posts