Wednesday, December 14, 2011

Making Sense of the XML DataType in SQL Server

In SQL Server 2005, XML becomes a first-class data type. Developers can make minor remote modifications to stored XML documents easily, taking advantage of new support for XML schema-based strong typing, and server-based XML data validation. by Shawn Wildermuth

As database developers, many of us have had to dip our feet into the wide ocean of XML.

It should come as good news that in SQL Server 2005, you can store XML in the database with a new XML datatype. Although this is good news, many developers have been storing XML in the database for some time now. Even without implicit support for XML, developers have been shoving XML documents into text fields since XML's inception.

SQL Server 2000 included some XML features out of the box. Key among these features was the ability to return results as XML using the FOR XML clause. SQL Server 2005's functionality is markedly different. In SQL Server 2005, XML is a genuine data type, which means that you can use XML as a column in tables and views, in T-SQL statements, or as parameters of stored procedures. You can now store, query, and manage XML documents directly in the database.

More importantly, you can also now specify the schema to which your XML must conform. Aside from providing a mechanism to validate your XML in the database, this also allows you to describe complex types of data to be stored and to have an engine that enforces those rules.

Using the XML Datatype
The XML datatype is not substantially different than any other datatype in SQL Server. It can be used in any place you would ordinarily use any SQL datatype. For example, the following creates an XML variable and fills it with a XML:


DECLARE @doc xml
SELECT @doc = '<Team name="Braves" />'

Although literal XML is useful, you can also fill an XML variable using a query and the SQL Server's FOR XML syntax:


SELECT @doc =
(SELECT * FROM Person.Contact FOR XML AUTO)

The XML datatype is not limited to use as a variable. You can also use the XML data type in table columns. You can assign default values and the NOT NULL constraint is supported:


CREATE TABLE Team
(
TeamID int identity not null,
TeamDoc xml DEFAULT '<Team />' NOT NULL
)

SQL Server 2005's XML functionality is markedly different from that in SQL Server 2000.

Inserting XML data into tables is just a matter of specifying the XML to add in the form of a string:


-- Insert a couple of records
INSERT INTO Team (TeamDoc)
VALUES ('
<Team name="Braves">
<Players>
<Pitcher name="John Smoltz" role="Closer"/>
</Players>
</Team>');
INSERT INTO Team (TeamDoc)
VALUES ('<Team name="Red Sox">
<Players>
<Pitcher name="Petro Martinez" role="Starter"/>
</Players>
</Team>');

When creating instances of XML in SQL Server 2005, the only conversion is from a string to XML. Similarly, going in the reverse direction, you can only convert to a string. Converting to and from text and ntext types is not allowed.

Limitations of the XML Data Type:
Although the XML datatype is treated like many other datatypes in SQL Server 2005, there are specific limitations to how it is used. These limitations are:


• XML types cannot convert to text or ntext data types.
• No data type other than one of the string types can be cast to XML.
• XML columns cannot be used in GROUP BY statements.
• Distributed partitioned views or materialized views cannot contain XML data types.
• Use of the sql_variant instances cannot include XML as a subtype.
• XML columns cannot be part of a primary or foreign key.
• XML columns cannot be designated as unique.
• Collation (COLLATE clause) cannot be used on XML columns.
• XML columns cannot participate in rules.
• The only built-in scalar functions that apply to XML columns are ISNULL and COALESCE. No other scalar built-in functions are supported for use against XML types.
• Tables can have only 32 XML columns.
• Tables with XML columns cannot have a primary key with more than 15 columns.
• Tables with XML columns cannot have a timestamp data type as part of their primary key.
• Only 128 levels of hierarchy are supported within XML stored in the database.

XML Type Methods:
Up to this point, the examples have shown the XML datatype being used as just a blob of data, but this is where the real power of the XML data type shows itself. The XML data type supports several methods that can be called using the UDT dot syntax (myXml.operation()) syntax. The supported operations are listed in Table 1.

Table 1: XML Datatype methods.

Method        Name Description


query           Performs a query against the XML and returns the results of the query

exists          Performs a query against the XML and returns the value of 1 if there was a result

value           Evaluates a query to return a simple value from the XML

modify         Executes a change operation on the XML document in place

nodes         Allows you to break apart XML into a table structure



For the following sections, you will use a table called Team that contains a row for every team name. In each row, there is a TeamDoc row that contains XML about the team:


CREATE TABLE Team
(
TeamID int identity not null,
TeamDoc xml DEFAULT '<Team />' NOT NULL
)
In the examples, assume that the following XML document exists in the Braves row of the table:


<Team name="Braves">
<Players>
<Pitcher name="John Smoltz" role="Closer"/>
<Pitcher name="Russ Ortiz" role="Starter" />
<ThirdBase name="Chipper Jones" role="Starter" bats="switch"/>
</Players>
</Team>

Query Method:
You can use XML as a column in tables and views, in T-SQL statements, or as parameters of stored procedures.

The query method allows you to specify an XQuery or XPath expression to evaluate. The result of the query method is an XML data type object. The specific syntax of the query method is:


query(XQuery)
The first parameter is always an XQuery expression. The following example uses a query to return an XML document with information about each team's pitcher:


SELECT TeamDoc.query('/Team/Players/Pitcher')
FROM Team
This produces the following results:


----------------------------------------------
<Pitcher name="John Smoltz" role="Closer" />
<Pitcher name="Russ Ortiz" role="Starter" />
(1 row(s) affected)

The query method allows you to find and return nodes lists that match the XQuery expression you specify. The real power of the query method comes from the XQuery syntax, which is covered in detail later in this article.

Exist Method:
The exist method is similar to the query method except that it is used to determine whether a query yields any results. The syntax for the exist method is:



exist(XQuery)


When you use the exist method, it evaluates the query and returns the value of 1 if the query yields any results. For example, this query finds the rows in the team table where the TeamDoc field has starting pitchers:

-- Simple Exist clause

SELECT Count(*)

FROM Team

WHERE TeamDoc.exist(

'/Team/Players/Pitcher[@role="Starter"]') = 1



Value Method:
value(XQuery, datatype)

Use the value method when you want to get a single scalar value from the XML. You must specify the XQuery statement and the datatype you want it to return and you can return any datatype except the XML datatype. For example, if you want to get the name of the first pitcher on every team, you can write the query like this:


-- Do a Query to get an individual value
SELECT TeamDoc.value(
'(/Team/Players/Pitcher/@name)[1]',
'nvarchar(max)')
AS FirstPitcher
FROM Team
This query results in the scalar value of the first pitcher for each team returned in the result:


FirstPitcher
------------------------------
John Smoltz
(1 row(s) affected)

The difference between the query and value methods is that the query method returns an XML datatype that contains the results of the query, and the value method returns a non-XML datatype with the results of the query. The value method can return only a single (or scalar) value. You will get an error if you try to create an XQuery expression that returns more than one value using the value method.

Modify Method:
Although the XQuery standard does not provide a mechanism for updating XML, SQL Server 2005 supports a way of modifying parts of an XML object in place. This means that you do not have to retrieve an entire XML document just to make changes. To modify a document in place, you use a combination of the modify method and SQL Server 2005's new XML Data Modification Language (XML DML).

The syntax for the Modify method is:

modify(<XMLDML>)
The Modify method takes only a single parameter, the XML DML statement. XML DML is similar, but not identical, to SQL's insert, update and delete syntax. For example, you can modify the XML by using the insert DML statement:


SET @doc.modify('
insert <Pitcher name="Jaret Wright"/> as last
into (/Team/Players)[1]
')
You can do the same thing to modify an XML column by calling modify in an UPDATE statement:


-- Modify an XML doc without replacing
-- it completely!
UPDATE Team
SET TeamDoc.modify('
insert <Pitcher name="Jaret Wright"/> as last
into (/Team/Players)[1]
')
WHERE TeamDoc.exist('/Team[@name="Braves"]') = 1
Notice that the SET clause in this UPDATE statement does not follow the SET x = y pattern that you may be used to from writing SQL in the past. That syntax assumes that you will provide a complete new value to replace the old, which, in the case of XML, means a completely new document to replace the old. When using the XML type, the modify method changes the original document in place. There's no need to generate a completely new and separate document, or for SQL Server to attempt to replace an entire document with every change. The SET syntax in the example reflects the more efficient approach of updating a document in place.

There are three XML DML statements: insert, update, and delete. Not surprisingly, they are used to insert, update, and delete parts of an XML object. The syntax for each is similar to SQL, but with some definite differences.

Here is the syntax for the insert statement:


insert
InsertExpression (
{{as first | as last}
into | after | before} LocationExpression
)
Immediately following the insert statement is the XML that you want to insert (InsertExpression). Next you specify how you want the XML inserted. Your choices are into, after, or before. The before and after clauses instruct the database to insert the InsertExpression as a sibling to the LocationExpression. The use of before or after specifies whether to insert it before or after the LocationExpression:


SET @doc.modify('
insert <Pitcher role="Starter"
name="Jaret Wright"/>
before (/Team/Players/Pitcher)[1]
')
The into clause inserts the InsertExpression as a child of the LocationExpression. The optional clauses of as start and as last are used to specify the position of the insertion within the children:


-- Insertion within Team
SET @doc.modify('
insert <Pitcher role="Starter"
name="Jaret Wright"/>
into (/Team/Players)[1]
')
-- Insertion within Team, specifying it should
-- be inserted as the last element
SET @doc.modify('
insert <Pitcher role="Starter"
name="Jaret Wright"/>
as last into (/Team/Players)[1]
')
The syntax for the delete statement is very straightforward:
delete LocationExpression

The LocationExpression specifies what to delete from the XML data. For example, to delete all the Pitchers:


SET @doc.modify('delete /Team/Player/Pitcher')
Because the query specifies all pitcher elements, they will all be deleted. If you want to delete just a single element, you can specify identifying attributes. To delete just the pitcher named John Smoltz, you write the delete statement like so:


SET @doc.modify('
delete /Team/Players/Pitcher[@name="John Smoltz"]
')
You can also tell the delete statement to remove an individual attribute. For example, to delete the role attribute for the pitcher named John Smoltz, the XML DML looks like this:


SET @doc.modify('
delete /Team/Players/Pitcher[
@name="John Smoltz"]/@role')
Lastly, the replace value statement describes changes to make to the XML data. The syntax of the replace value statement is:


replace value of
OriginalExpression
with
ReplacementValue | if
The replace value statement is used to change discrete values in the XML. The only discrete values possible are the literal contents of a tag or the value of an attribute. The OriginalExpression must resolve to a single node or attribute. The ReplacementValue is usually a literal value to replace. Replacing the literal contents of a node requires the XQuery expression using the text() function to specify that you want to replace the text of a node. For example, to replace the inner text for a pitcher, you write the modify statement like this:


DECLARE @doc xml
SELECT @doc = '
<Team name="Braves">
<Players>
<Pitcher name="John Smoltz" role="Closer">
With team since 1989
</Pitcher>
</Players>
</Team>'
SET @doc.modify('
replace value of (/Team/Players/Pitcher[
@name="John Smoltz"]/text())[1]
with "May start in 2005"
')
Modifying an attribute is straightforward: you just need the XQuery expression to resolve to a single attribute. For example, to replace the value of the role attribute for the pitcher named John Smoltz with "Starter," do this:


SET @doc.modify('
replace value of (/Team/Players/Pitcher[
@name="John Smoltz"]/@role)[1]
with "Starter"
')
The replace value syntax also supports conditional replacement by using the if…then…else syntax within the with clause of the replace value statement. For example, to replace John Smoltz's role to Starter if he is a Closer, but change it to a Closer if he is not a Starter, you could write the code:


SET @doc.modify('
replace value of (/Team/Players/Pitcher[
@name="John Smoltz"]/@role)[1]
with (
if (/Team/Players/Pitcher[
@name="John Smoltz"]/@role = "Closer") then
"Starter"
else
"Closer"
)
')

Nodes Method:
The purpose of the nodes method is to allow normalizing of a set of nodes returned by a query into a set of rows in a table-like result set. The syntax of the nodes method is:

nodes (XQuery) Table(Column)

The XQuery is the expression that picks the nodes to be exposed as a result set. The Table and Column are used to specify names in the result set. Note that you can only have one column and that it is automatically of type XML. For example, to query to get each of the pitchers, write the code like this:


DECLARE @doc xml
SELECT @doc = '
<Team name="Braves">
<Players>
<Pitcher name="John Smoltz" role="Closer">
With team since 1989
</Pitcher>
</Players>
</Team>'
SELECT Team.player.query('.') as Pitcher
FROM @doc.nodes('/Team/Players/Pitcher')
Team(player)
This results in a single result set containing rows for each of the Pitchers' elements:


Pitcher
--------------------------------------------
<Pitcher name="John Smoltz" role="Closer" />
<Pitcher name="Russ Ortiz" role="Starter" />
(2 row(s) affected)
Notice that you used the query method to return these nodes in the result. The reason for this is the results of a nodes method may only be referred to by the XML methods (query, modify, delete, and update) or IS NULL and IS NOT NULL statements.

Gone are the days of needing to pull the entire XML document out of the database as a string, parsing it, making changes, and replacing the entire document.

More ordinarily, you may use the nodes method to break apart XML into a more useful result. For instance, you could get the players' nodes by using the nodes method, and then retrieve them with the value method to get the individual values as scalar data:


SELECT Team.player.value(
'./@name', 'nvarchar(10)') as Name,
Team.player.value('./@role', 'nvarchar(10)') as
PlayerRole
FROM @doc.nodes('/Team/Players/Pitcher')
Team(player)
This results in the following result set:


Name PlayerRole
--------------- ---------------
John Smoltz Closer
Russ Ortiz Starter
(2 row(s) affected)

Count Of Nodes:
SELECT @xmlTarget.value('count(/Team/Players/Pitcher)', 'int')
Cheers
Happy Querying

Popular Posts