Monday, December 5, 2011

Concept behind SQL Query execution

Summary :

When you submit a query to a SQL Server database, a number of processes on the server go to work on that query.
- Some processes occur in the Relational Engine.
- Some processes occur in the Storage Engine.

In the relational engine the query is parsed and then processed by the query optimizer, which generates an execution plan. The plan is sent (in binary format) to the storage engine, which then uses it to retrieve or update the underlying data.

Processes such as locking, index maintenance and transaction management occur in the storage engine.

Note : The execution plan is created in the relational engine.

Details :

When you pass a T-SQL statement to SQL Server, the first place it goes is the relational engine. As the T-SQL arrives, it passes through a process that checks the T-SQL syntax. This process is known as Query Parsing, and its output is a Parse Tree.

The parse tree is then passed to a process called the Algebrizer. The algebrizer resolves the names of the various objects, tables and columns referred to within the query string. It identifies, at the individual column level, all the types involved (varchar(50) vs nvarchar(25), and so on). It also determines the location of aggregate functions within the query, a process called Aggregate Binding. The algebrizer step is important because the query may contain aliases or synonyms that do not exist as objects in the database and need to be resolved. The output of the algebrizer is a binary structure called the Query Processor Tree, which is then passed to the Query Optimizer.

Using the query processor tree (the output of the algebrizer), the statistics it has about the data, and its costing model, the Query Optimizer generates an execution plan for the query. The optimizer generates and evaluates many candidate plans and chooses the lowest-cost one.

Once the optimizer arrives at an execution plan, the actual plan is created and stored in a memory area known as the Plan Cache - unless an identical plan already exists in the cache. As the optimizer generates potential plans, it compares them to previously generated plans in the cache; if it finds a match, it uses that plan.

After the plan is finalized, it is passed to the storage engine, where processes such as locking, index maintenance and transaction management are performed. Finally the query is executed according to the plan and the result is returned.

Reuse of Execution Plans

Generating an execution plan is expensive for the server, so SQL Server keeps and reuses plans whenever possible.

When a query is submitted to the server, the optimizer creates an estimated plan. Once that plan is created, and before it is passed to the storage engine, the optimizer compares this estimated plan to the actual plans that already exist in the plan cache. If a matching plan is found, the optimizer reuses the existing plan, since it has already been used by the query engine. This reuse avoids the overhead of building a full plan for large and complex queries.

Each plan is stored once, in a memory area called the Plan Cache.

The lazywriter process, an internal process that works to free all types of cache (including the plan cache, using an age formula), periodically scans the objects in the cache and decreases each plan's age value on every pass; plans that age out can be dropped from the cache.
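One way to see plan reuse in action is to query the plan-cache dynamic management views (available since SQL Server 2005); the `usecounts` column shows how many times each cached plan has been reused. A minimal sketch:

```sql
-- Inspect cached plans and how often each has been reused.
-- sys.dm_exec_cached_plans and sys.dm_exec_sql_text are documented DMVs/DMFs.
SELECT TOP (20)
       cp.usecounts,      -- number of times the plan has been reused
       cp.cacheobjtype,   -- e.g. 'Compiled Plan'
       st.text            -- the query text the plan was compiled for
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```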

SQL Injection

SQL injection is an attack in which malicious code is inserted into strings that are later passed to SQL Server for parsing and execution.

EX :
var ShipCity = Request.Form("ShipCity");
var SQL = "select * from [Order] where ShipCity = '" + ShipCity + "'";

If the user is prompted to enter a city name (like Delhi), then the query will be

[ select * from [Order] where ShipCity = 'Delhi' ]

Now assume that a malicious user enters the city name as
(Delhi';drop table [Order]--)

In this case your query will be:

[ select * from [Order] where ShipCity = 'Delhi';drop table [Order]-- ]

Note : -- starts a comment in SQL Server.

When this query is executed, it will first select the rows based on the passed city name and then drop your table from the database.

This type of attack is called SQL Injection.

Take care of it by:
1 - Removing or escaping all special characters in user input.
2 - Always using stored procedures (or parameterized queries) instead of concatenated dynamic SQL.
3 - Avoiding free-text textboxes for filter conditions; use a dropdown where possible.
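The most direct defense is never to concatenate user input into the SQL string at all. A sketch using sp_executesql with a proper parameter (the [Order] table and ShipCity column follow the example above):

```sql
DECLARE @ShipCity NVARCHAR(50) = N'Delhi''; drop table [Order]--';  -- hostile input

-- The value is bound as a parameter, so it is treated as data, never as SQL.
EXEC sp_executesql
     N'SELECT * FROM [Order] WHERE ShipCity = @City',
     N'@City NVARCHAR(50)',
     @City = @ShipCity;
```

No table is dropped; the query simply finds no city with that strange name.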

Differences between commonly confused SQL Server features

[ Where & Having ]

1 - The Where clause can be used with select, update and delete statements, but Having can be used only with select statements.

2 - Having can be used only with the Group By clause, whereas the Where clause is applied to each row before the rows are grouped by the Group By clause.

3 - Aggregate functions can be used in the Having clause but not in the Where clause.
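The distinction shows up in a single query against a hypothetical Orders table: Where trims rows before grouping, Having trims groups after aggregation.

```sql
SELECT CustomerID, COUNT(*) AS OrderCount
FROM dbo.Orders                       -- hypothetical table
WHERE OrderDate >= '2011-01-01'       -- row filter: applied before GROUP BY
GROUP BY CustomerID
HAVING COUNT(*) > 5;                  -- group filter: aggregates allowed here
```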

[ Truncate & Delete ]

1 - Delete is a DML command and Truncate is a DDL command.

2 - A Where clause can be used with the Delete command but cannot be used with Truncate.

3 - The identity seed is reset by Truncate but not by Delete.

4 - Delete logs each row individually, whereas Truncate logs only the data-page deallocations.

5 - Truncate is faster than Delete because Delete maintains the log on a row-by-row basis.

6 - Truncate does not return the number of rows deleted, but Delete does.

[Note : Both Delete and Truncate can be rolled back when run inside a transaction.]
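A small sketch of points 2 and the note, using a hypothetical staging table:

```sql
-- WHERE works with DELETE, not with TRUNCATE
DELETE FROM dbo.OrderStaging WHERE OrderDate < '2011-01-01';

-- TRUNCATE removes all rows, deallocating pages and resetting the identity seed
TRUNCATE TABLE dbo.OrderStaging;

-- Both can be rolled back inside an explicit transaction
BEGIN TRANSACTION;
TRUNCATE TABLE dbo.OrderStaging;
ROLLBACK TRANSACTION;   -- the truncated rows come back
```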

[ SP & Function(UDF) ]

1 - A UDF works like a parameterized view and can be used anywhere a view can (in a select, a join, etc.), but an SP cannot be used inside a query.

2 - You cannot implement exception handling in a UDF, but you can in an SP.

3 - You cannot use transactions in a UDF, but you can in an SP.

4 - You cannot call an SP from within a UDF, but you can call a UDF from within an SP.

5 - A UDF cannot modify database state (no insert, update or delete on permanent tables), whereas an SP can.

6 - A function must always return a value, but an SP need not.

[ Normalization & De-Normalization ]

Normalization says that you should remove redundancy from your data to better organize it in the database, so that it takes less storage space. De-normalization says that you should deliberately keep redundant data for faster searching - fewer relationships between tables - because joins between related tables slow down query performance.

[Note : Normalization is used in OLTP systems (RDBMS) and de-normalization is used in OLAP systems (DWH). OLTP designs move up the normal forms 1 > 2 > 3, while OLAP designs move back down 3 > 2 > 1.]

[ Clustered & Non-Clustered Index ]

Indexes are created on a table for faster searching when the table holds a large amount of data. Index data is stored in a B-Tree (balanced tree) format.

1 - With a clustered index the data is physically sorted; with a non-clustered index the data is only logically sorted.

2 - With a clustered index the leaf nodes of the index contain the actual data, but with a non-clustered index the leaf nodes contain pointers to the data.

3 - A table can have only one clustered index but many non-clustered indexes. (Reason: the leaf level of a clustered index is the data pages themselves, which are physically sorted and exist only once, so a table can have only one clustered index; a non-clustered index holds references to the data, and there can be any number of references.)
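The two kinds are created with almost identical syntax; the table name here is hypothetical.

```sql
-- One clustered index: it defines the physical order of the data pages
CREATE CLUSTERED INDEX IX_Orders_OrderNumber
    ON dbo.Orders (OrderNumber);

-- Many non-clustered indexes are allowed: their leaf rows reference the data
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);
```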

[ varchar & nvarchar ]

1 - The nvarchar datatype is used to store Unicode data (multilanguage data: Japanese, Chinese, etc.), while varchar is used to store non-Unicode data.

2 - varchar uses 1 byte per character and nvarchar uses 2 bytes per character (nvarchar consumes double the space of varchar).

[ charindex & patindex ]

Both are used to find the starting position of a given string within an expression.

1 - Wildcard characters cannot be used with charindex, but they can be used with patindex.

2 - A % sign is needed at the front and back of the pattern in the case of patindex (unless the pattern is anchored to the start or end of the string).

3 - You cannot define a pattern with charindex, but you can with patindex.
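A quick side-by-side (I have checked these positions by hand against the string 'MS SQL Server'):

```sql
SELECT CHARINDEX('SQL', 'MS SQL Server');    -- literal search: returns 4
SELECT PATINDEX('%S_L%', 'MS SQL Server');   -- pattern search (_ = any one char): returns 4
SELECT PATINDEX('%Server', 'MS SQL Server'); -- pattern anchored at the end: returns 8
```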

[ Cast & Convert ]

The goal of both is the same, but you should prefer Cast() because it is ANSI-92 syntax; Convert() is used when you need a specific style for your data. So Convert() is mostly used for datetime conversion, where you need a specific style for your datetime output.
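For example, CONVERT's third argument selects a documented datetime style that CAST cannot express:

```sql
SELECT CAST(GETDATE() AS VARCHAR(30));         -- ANSI syntax; no control over format
SELECT CONVERT(VARCHAR(10), GETDATE(), 103);   -- style 103: dd/mm/yyyy
SELECT CONVERT(VARCHAR(8),  GETDATE(), 112);   -- style 112: yyyymmdd (ISO)
```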

[ Replace & Stuff ]

The Replace function is used to replace all occurrences of a string within an expression with a given new string.

EX : select Replace('MS SQL Server','S','M')
EX : select Replace('MS SQL Server','SQL','NewSQL')

If you want to replace only the first occurrence of a string, Replace will not work, since it always replaces all occurrences. But Stuff will, since it replaces only the characters at the starting location we give it, for the number of characters we tell it to replace.

EX : select stuff('Hi Manish',4, 6,'Mukesh')

[ Inline Table-Valued Function & Multi-Statement Table-Valued Function ]

Both are user-defined functions that return a table.

1 - In an inline table-valued function you do not need to define the returned table's definition, but in a multi-statement table-valued function you must define it.

2 - In an inline table-valued function you do not write Begin and End statements, but in a multi-statement table-valued function you must.

3 - An inline table-valued function can contain only a single select statement and nothing else, but a multi-statement table-valued function can contain business logic and any number of statements.
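The three differences are easiest to see side by side. This sketch assumes a hypothetical dbo.Orders table with a ShipCity column; the function names are made up.

```sql
-- Inline TVF: a single SELECT, no BEGIN/END, return shape inferred
CREATE FUNCTION dbo.ufn_OrdersByCity (@City VARCHAR(50))
RETURNS TABLE
AS
RETURN
(
    SELECT OrderNumber, OrderDate
    FROM dbo.Orders
    WHERE ShipCity = @City
);
GO

-- Multi-statement TVF: return table defined explicitly, BEGIN/END required
CREATE FUNCTION dbo.ufn_OrdersByCity2 (@City VARCHAR(50))
RETURNS @Result TABLE (OrderNumber VARCHAR(20), OrderDate DATETIME)
AS
BEGIN
    INSERT INTO @Result (OrderNumber, OrderDate)
    SELECT OrderNumber, OrderDate
    FROM dbo.Orders
    WHERE ShipCity = @City;
    -- further business logic could go here
    RETURN;
END;
```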

[ Temp Table & Table Variable ]

1 - Both temp tables and table variables are allocated in TempDB; neither is a purely in-memory structure, although small objects of either kind may never actually be written to disk.

2 - Transaction logs are not recorded for table variables but are maintained for temp tables.

3 - A procedure that uses a temp table cannot be precompiled, while an execution plan for a procedure that uses a table variable can be compiled in advance.

4 - A table variable exists only in the same scope as any other variable.

5 - A table variable cannot have non-clustered indexes added to it, but a temp table can.

6 - You cannot add constraints to a table variable after it is declared (only inline PRIMARY KEY or UNIQUE constraints at declaration time), while you can alter a temp table to add constraints.

7 - Similarly, default values on a table variable's columns must be declared inline, while defaults can be added to a temp table's columns later.
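Points 5 to 7 in code form (a sketch; the names are hypothetical):

```sql
-- Temp table: indexes and constraints can be added after creation
CREATE TABLE #Customers (CustID INT, CusName VARCHAR(50));
CREATE NONCLUSTERED INDEX IX_Customers_Name ON #Customers (CusName);

-- Table variable: everything must be declared inline, up front
DECLARE @Customers TABLE
(
    CustID  INT PRIMARY KEY,            -- inline constraint is allowed
    CusName VARCHAR(50) DEFAULT ('n/a') -- inline default is allowed
);
```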

[ Table Scan & Index Scan & Index Seek ]

When you pass a SQL query to SQL Server, an execution plan is created for the query, and the data is located by either a Table Scan, an Index Scan or an Index Seek.

A Table Scan is performed when your table does not have any indexes (in particular, no clustered index - it is a heap).

An Index Scan is performed when your table has indexes but the whole index is read rather than being used to navigate directly to the data.

An Index Seek is performed when the index is used to navigate directly to the matching rows.

Concepts of Open Query, Covering Index & Indexed View

Open Query

OPENQUERY is a mechanism in SQL Server. Using OPENQUERY you can run any SQL query against a linked-server database object.

Linked Server : a technique by which you can register another database server inside your SQL Server instance.

Suppose you are working on a SQL Server (S1) which is in IST and you have a SQL Server (S2) which is in PST, and you need to write a query on S1 based on S2's server date. How can that be done?

It is possible with a Linked Server and OPENQUERY.

First add S2 as a linked server (LS_S2) on S1; then you can get S2's date as follows (note that the linked server name is passed as an identifier, not a quoted string):

select S2Date from OPENQUERY(LS_S2, 'select getdate() as S2Date')

Covering Index

A covering index is not a separate kind of index; it is a technique used to improve performance. It is a non-clustered index that contains all the columns referenced in your select statement.

A covering index is best used (and should only be used) when you have a simple query that you need to run again and again. Because all the required columns are stored at the index level, the query can be answered from the index pages alone, without a lookup back to the base table, so a covering index returns the result fast.
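Since SQL Server 2005 the INCLUDE clause makes this explicit: the key column drives the seek, and the included columns ride along at the leaf level. The table and column names here are hypothetical.

```sql
-- Covering index: CustomerID is the key; INCLUDE stores the extra columns
-- at the leaf level so the query below never touches the base table
CREATE NONCLUSTERED INDEX IX_Orders_Customer_Covering
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate, Rate);

SELECT CustomerID, OrderDate, Rate
FROM dbo.Orders
WHERE CustomerID = 42;   -- answered entirely from the index
```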

Indexed View

If your view has a clustered index, it is an indexed view.

An indexed view is a performance optimization technique.

Suppose you are not satisfied with a query's response time; you can create a view for that query and then create an index on that view, just as you would on a regular table.
Indexed views consume disk space and involve some maintenance overhead (every time the underlying tables change, the indexed view also has to change), but they usually provide a good boost in performance.
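A minimal sketch of the two-step recipe. It assumes a hypothetical dbo.Orders table whose Amount column is NOT NULL (indexed views require SCHEMABINDING, COUNT_BIG(*) alongside aggregates, and disallow SUM over a nullable expression).

```sql
CREATE VIEW dbo.vCustomerTotals
WITH SCHEMABINDING              -- required before a view can be indexed
AS
SELECT CustomerID,
       COUNT_BIG(*) AS OrderCount,   -- COUNT_BIG(*) is mandatory with GROUP BY
       SUM(Amount)  AS TotalAmount
FROM dbo.Orders
GROUP BY CustomerID;
GO

-- The unique clustered index is what materializes the view on disk
CREATE UNIQUE CLUSTERED INDEX IX_vCustomerTotals
    ON dbo.vCustomerTotals (CustomerID);
```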

Performance tuning of stored procedures : tips to write better code

You can optimize your SP as follows:

O - First make sure that the statistics on all tables in your query plan are up to date; if not, update the statistics of all the tables used in the SP first (using the UPDATE STATISTICS command). Then look at the query plan. It shows you the optimal current execution plan from the query engine's point of view. Find the most expensive part of the plan and start optimizing from there.

O - If you see a table scan in that part, the table involved has no indexes - in particular no clustered index - so first create a clustered index on that table, run the SP again, and check whether its timing has improved.

O - If you see a clustered index scan in that part, the table has indexes but the execution plan is not using them to seek to the data: the indexes are not created on the right columns. Once you create the indexes on the right columns, an index seek is performed to search the table.

So, finally, you want to create indexes in such a way that the query execution plan always performs an index seek to locate the data.

Things to remember while writing SQL code:

1 - Always use SET NOCOUNT ON.

2 - Avoid the DISTINCT keyword; use GROUP BY.

3 - Avoid the ORDER BY clause when the order is not needed.

4 - Avoid the IN keyword; use EXISTS or an inner join.

5 - Avoid the NOT IN keyword; use NOT EXISTS.

6 - Don't use a cursor; use a WHILE loop with the help of an identity column.

7 - Select only the required fields in a query; do not use SELECT *.

8 - For dynamic queries, always use sp_executesql instead of EXECUTE.

9 - Avoid temp tables; use a table variable or a CTE.

10 - Do not use the sp_ prefix for stored procedure names.

11 - Avoid operations on the fields themselves where possible. Such operations can prevent the use of an index on the field even if one exists.
EX: where cast(dtfield as varchar(20)) = @DTval
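Point 11 can be fixed by moving the work to the other side of the comparison. A sketch against a hypothetical dbo.Orders table with a datetime column dtfield:

```sql
DECLARE @DTval DATETIME = '2011-12-05';

-- Non-sargable: the cast on the column blocks an index seek
SELECT * FROM dbo.Orders
WHERE CAST(dtfield AS VARCHAR(20)) = CAST(@DTval AS VARCHAR(20));

-- Sargable alternative: leave the column alone and compare against a range
SELECT * FROM dbo.Orders
WHERE dtfield >= @DTval
  AND dtfield <  DATEADD(DAY, 1, @DTval);
```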

Paging in SQL Server (2008)

declare @customers table(custid int, cusname varchar(50))

-- sample data (the original values were not shown; any rows will do)
insert into @customers(custid, cusname)
values (1,'C1'),(2,'C2'),(3,'C3'),(4,'C4'),(5,'C5'),
       (6,'C6'),(7,'C7'),(8,'C8'),(9,'C9'),(10,'C10')

declare @PageSize int, @PageNumber int

set @PageSize = 5
--set @PageNumber = 1
set @PageNumber = 2

declare @RowStart int
declare @RowEnd int

if @PageNumber > 0
    set @PageNumber = @PageNumber - 1

set @RowStart = @PageSize * @PageNumber + 1
set @RowEnd = @RowStart + @PageSize - 1

;with cust as
(
    select custid, cusname,
           row_number() over (order by custid) as RowNumber
    from @customers
)
select custid, cusname
from cust
where RowNumber >= @RowStart and RowNumber <= @RowEnd

Using the New THROW Keyword in SQL Server.

Introduction :
When SQL Server 2005 introduced TRY...CATCH blocks, SQL Server developers were finally able to use structured exception handling in their code. However, passing error information back to the application from the CATCH block using a RAISERROR statement is not as simple as it seems. To ensure that the correct information is passed back to the caller, you have to add your error information to the sys.messages table. That's easy enough to do, but if your application moves to another server, there's a risk that you'll forget to add the error to sys.messages on the new server and your application may not function correctly.

SQL Server "Denali" introduces the THROW keyword, which extends the exception handling functionality of SQL Server 2008. It eliminates the issue of having to define errors in sys.messages and also enables you to handle an error and then rethrow that error back to the calling code.

So let's look at some examples of how you can use THROW in your Transact-SQL code. If you are implementing search functionality, you might want to customize the information returned when a SELECT statement finds no records. In this situation, you can simply use the THROW statement to return your own error message to the calling application.

SELECT * FROM Person.Person

WHERE LastName = 'Smyth'

IF @@ROWCOUNT = 0

THROW 50001, 'No results found.', 1

You can see in this example that I've used an error number greater than 50000. This is a requirement of THROW to avoid replicating any of the system error numbers. I've specified a meaningful message to be returned to the calling application that can be used directly in the calling code and the state parameter, which you can use to locate where the error originated.

The output when you run this code will be:

(0 row(s) affected)

Msg 50001, Level 16, State 1, Line 7

No results found.

You can also use THROW in a TRY ... CATCH block. This enables you to execute Transact-SQL code such as rolling back a transaction or logging information before passing the error back to the calling application.


BEGIN TRY

DECLARE @BusEntityID int;

SET @BusEntityID = (SELECT MAX(BusinessEntityID) FROM Person.BusinessEntity)

INSERT Person.Person(BusinessEntityID, PersonType, NameStyle, FirstName, LastName)

VALUES(@BusEntityID, 'EM', 0, 'Lin', 'Joyner')

END TRY
BEGIN CATCH

-- Handle the error.

-- Log the error in the SQL Server application log.

THROW;

END CATCH

By using the THROW statement on its own, you will simply rethrow the existing error that has occurred. This is useful when the SQL Server error message is meaningful to the user or application. If you run this code with a non-unique value for the BusinessEntityID field, you will see the following error:

(0 row(s) affected)

Msg 2627, Level 14, State 1, Line 5

Violation of PRIMARY KEY constraint 'PK_Person_BusinessEntityID'. Cannot insert duplicate key in object 'Person.Person'. The duplicate key value is (20780).

You can use the parameters of THROW to make the error information more meaningful to the calling application or user as shown in the following example.


BEGIN TRY

DECLARE @BusEntityID int;

SET @BusEntityID = (SELECT MAX(BusinessEntityID) FROM Person.BusinessEntity)

INSERT Person.Person(BusinessEntityID, PersonType, NameStyle, FirstName, LastName)

VALUES(@BusEntityID, 'EM', 0, 'Lin', 'Joyner')

END TRY
BEGIN CATCH

-- Handle the error.

-- Log the error in the SQL Server application log.

THROW 51111, 'Invalid business entity id.', 2

END CATCH


In this case, the output when you violate the primary key will be as follows:

(0 row(s) affected)

Msg 51111, Level 16, State 2, Line 9

Invalid business entity id.

You'll notice that the severity level when you throw your own error defaults to 16. You cannot define a severity level for custom errors; however, when you rethrow a SQL Server error, the original severity level is retained. You can only rethrow an error (THROW with no parameters) from inside a CATCH block; outside a CATCH block, you must specify the parameters.

So, as you can see, the new THROW statement in SQL Server "Denali" extends the structured exception handling capabilities of earlier versions of SQL Server.

SQL Server 2008 - The Power of Merge

In Part IV of my Sales Order Workshop, I presented a stored procedure which saves sales order information into Order Header and Order Detail tables. If you have ever worked on an order processing application, you will know that saving a modified sales order is a little tricky. There may be new rows added to the sales order, rows which are updated, and rows that should be deleted. If you have the freedom to delete all the rows from the order details table and re-insert everything, you are lucky. But often you cannot simply clear the order details table, because there may be additional information, like Quantity-picked, which is updated from other parts of the application. In those scenarios, you need to perform a DELETE-UPDATE-INSERT operation to save the information correctly. The following is the pseudo code that I used in my stored procedure.
/*
Pseudo code used for saving sales order information with SQL Server 2005

-- save order header information
If OrderNumber found in OrderHeader
    Update the information
Else
    Insert the information
End

-- save order detail information
Delete from Order detail table all items not in the order info
Update Order detail for all items present in the order info
Insert into order details all new items in the order info
*/
SQL Server 2008 introduces MERGE, a new keyword which performs INSERT, UPDATE and DELETE operations in one go. With SQL Server 2008, you can perform the above operation with pseudo code as simple as the following.
/*
Pseudo code for saving the same order with the MERGE statement of SQL Server 2008

-- save order header information
MERGE order info to Order Header table

-- save order detail information
MERGE order info to order detail table
*/
No, I did not miss anything. You can write the code in just 2 lines. The rest of this article presents a SQL Server 2008 stored procedure which demonstrates this.
The Data
Here is the structure of the order data that we have. This XML has the order header and detail information. Our stored procedure needs to store the information in both tables. Some rows need to be updated, some inserted and some deleted.
<OrderInfo>
  <OrderHeader OrderNumber="20070101" CustomerNumber="J0001" OrderDate="2007-07-08" />
  <ItemInfo>
    <Item ItemNumber="A001" Qty="10" Rate="100" />
    <Item ItemNumber="A002" Qty="11" Rate="200" />
    <Item ItemNumber="A003" Qty="12" Rate="300" />
    <Item ItemNumber="A004" Qty="13" Rate="400" />
    <Item ItemNumber="A005" Qty="14" Rate="500" />
  </ItemInfo>
</OrderInfo>
We need two tables to store the order information. Here is the script to create the tables.
CREATE TABLE [dbo].[OrderHeader](
    [OrderNumber] [varchar](20) NULL,
    [CustomerNumber] [varchar](20) NULL,
    [OrderDate] [datetime] NULL
)
GO

CREATE TABLE [dbo].[OrderDetails](
    [OrderNumber] [varchar](20) NULL,
    [ItemNumber] [varchar](20) NULL,
    [Qty] [int] NULL,
    [Rate] [money] NULL
)
GO
Enter the Dragon
Let us see the Stored Procedure which uses the MERGE keyword.
CREATE PROCEDURE [dbo].[MergeSalesOrder]
(
    @OrderInfo XML
)
AS

/*
I am not using a TRY..CATCH or BEGIN TRAN to simplify the code.
*/

/*
Code to save order header. I am creating a CTE over the XML data to simplify
the code.
*/
;WITH OrderInfo AS (
    SELECT
        x.h.value('@OrderNumber', 'VARCHAR(20)') AS OrderNumber,
        x.h.value('@CustomerNumber', 'VARCHAR(20)') AS CustomerNumber,
        x.h.value('@OrderDate', 'VARCHAR(20)') AS OrderDate
    FROM @OrderInfo.nodes('/OrderInfo/OrderHeader') AS x(h)
)
MERGE OrderHeader AS h
USING OrderInfo AS o
    ON (h.OrderNumber = o.OrderNumber)
WHEN MATCHED THEN
    UPDATE SET h.CustomerNumber = o.CustomerNumber, h.OrderDate = o.OrderDate
WHEN NOT MATCHED THEN
    INSERT (OrderNumber, CustomerNumber, OrderDate)
    VALUES (o.OrderNumber, o.CustomerNumber, o.OrderDate)
;

/*
Save Order Detail Information
*/
;WITH OrderInfo AS (
    SELECT
        x.h.value('(../../OrderHeader/@OrderNumber)[1]', 'VARCHAR(20)') AS OrderNumber,
        x.h.value('@ItemNumber', 'VARCHAR(20)') AS ItemNumber,
        x.h.value('@Qty', 'INT') AS Qty,
        x.h.value('@Rate', 'MONEY') AS Rate
    FROM @OrderInfo.nodes('/OrderInfo/ItemInfo/Item') AS x(h)
)
MERGE OrderDetails AS d
USING OrderInfo AS o
    ON (d.OrderNumber = o.OrderNumber AND d.ItemNumber = o.ItemNumber)
WHEN MATCHED THEN
    UPDATE SET
        d.Qty = o.Qty,
        d.Rate = o.Rate
WHEN NOT MATCHED THEN
    INSERT (OrderNumber, ItemNumber, Qty, Rate)
    VALUES (o.OrderNumber, o.ItemNumber, o.Qty, o.Rate)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE    -- (in a table holding many orders, restrict this clause to the current order)
;

/*
Points to note:
1. A MERGE statement must be terminated with a semicolon.
2. The join (USING ... ON) should not produce duplicate records.
3. When a record exists in both SOURCE and TARGET, WHEN MATCHED applies.
4. When a record is not in the TARGET, WHEN NOT MATCHED applies.
5. When a record is not in the SOURCE, WHEN NOT MATCHED BY SOURCE applies.
*/
Execute the code
It is time to execute the code. Use the following code to execute the stored procedure.
EXECUTE [MergeSalesOrder] '
<OrderInfo>
  <OrderHeader OrderNumber="20070101" CustomerNumber="J0001" OrderDate="2007-07-08" />
  <ItemInfo>
    <Item ItemNumber="A001" Qty="10" Rate="100" />
    <Item ItemNumber="A002" Qty="11" Rate="200" />
    <Item ItemNumber="A003" Qty="12" Rate="300" />
    <Item ItemNumber="A004" Qty="13" Rate="400" />
    <Item ItemNumber="A005" Qty="14" Rate="500" />
  </ItemInfo>
</OrderInfo>
'

/*
Let us check the results
*/
SELECT * FROM OrderHeader
SELECT * FROM OrderDetails

/*
OrderNumber          CustomerNumber       OrderDate
-------------------- -------------------- -----------------------
20070101             J0001                2007-07-08 00:00:00.000

(1 row(s) affected)

OrderNumber          ItemNumber           Qty         Rate
-------------------- -------------------- ----------- ---------------------
20070101             A001                 10          100.00
20070101             A002                 11          200.00
20070101             A003                 12          300.00
20070101             A004                 13          400.00
20070101             A005                 14          500.00

(5 row(s) affected)
*/
The above output shows that the order was saved correctly. Now let us modify the order info: delete a row, add a new row and modify an existing row. In the code below, item A005 is deleted, item A006 is added and item A001 is modified. Let us execute the code and see the results.
EXECUTE [MergeSalesOrder] '
<OrderInfo>
  <OrderHeader OrderNumber="20070101" CustomerNumber="J0001" OrderDate="2007-07-08" />
  <ItemInfo>
    <Item ItemNumber="A001" Qty="15" Rate="150" />
    <Item ItemNumber="A002" Qty="11" Rate="200" />
    <Item ItemNumber="A003" Qty="12" Rate="300" />
    <Item ItemNumber="A004" Qty="13" Rate="400" />
    <Item ItemNumber="A006" Qty="16" Rate="600" />
  </ItemInfo>
</OrderInfo>
'

SELECT * FROM OrderHeader
SELECT * FROM OrderDetails

/*
OrderNumber          CustomerNumber       OrderDate
-------------------- -------------------- -----------------------
20070101             J0001                2007-07-08 00:00:00.000

(1 row(s) affected)

OrderNumber          ItemNumber           Qty         Rate
-------------------- -------------------- ----------- ---------------------
20070101             A001                 15          150.00
20070101             A002                 11          200.00
20070101             A003                 12          300.00
20070101             A004                 13          400.00
20070101             A006                 16          600.00

(5 row(s) affected)
*/
I found the MERGE keyword very powerful and friendly. It reduces the complexity of the code and provides a simple interface for performing a complex operation. I like it, and I suppose many of you out there will like it too.
