Overblog Follow this blog
Administration Create my blog
Balavardhan Reddy Narani

FOR XML - SQL Server

June 9 2010 , Written by Balavardhan Reddy Published on #SQL Server2005

Introduction

In Microsoft SQL Server 2000, we introduced the FOR XML clause to the SELECT statement. This clause provides the ability to aggregate the relational rowset returned by the SELECT statement into XML. FOR XML on the server supports three modes—RAW, AUTO, and EXPLICIT—that provide different transformation semantics.


RAW
mode generates single elements, which are named row, for each row returned.


The AUTO mode employs a heuristic to infer a simple, one element name-per-level hierarchy based on the lineage information and the order of the data in a SELECT statement.


Finally, the EXPLICIT mode requires a specific rowset format that can be mapped into almost any XML shape, while still being formulated by a single SQL query.

The EXPLICIT mode format is highly successful in achieving its goals. There are only a few things it cannot do (arbitrary recursive part-list trees are one of them). The preferred SQL expression to generate the rowset format is a formidable "query from hell," however.

Unfortunately, FOR XML results in SQL Server 2000 can only be consumed on the client side, and authoring complex XML structures using FOR XML EXPLICIT is a complex exercise.

 

In SQL Server 2005, we now have alternatives to the "query from hell." In the following, we will take a look at the main features added to FOR XML in SQL Server 2005: Integration with the XML data type, assignability and nesting of expressions, and the new PATH mode.

Integration with the XML Data Type

With the introduction of the XML data type, we wanted to also give FOR XML the ability to generate an instance of XML directly (more precisely, it generates a single row, single column rowset where the cell contains the XML data type instance).

 
SELECT * FROM Employee FOR XML AUTO, TYPE


Returns the Employee elements as an XML data type instance, instead of the nvarchar(max) instance that would have been the case without the TYPE directive.

This result is guaranteed to conform to the well-formedness constraints provided by the XML data type. Since the result is an XML data type instance, you can also use XQuery expressions to query and reshape the result. For example, the following expression retrieves the Employee Name into a new Person element.

 
SELECT (SELECT * FROM Employee FOR XML AUTO, TYPE).query(
 '<doc>{ 
  for $c in /Employee 
  return  
  <Person name="{data($c/@employeename)}"/> 
  }</doc>') 

returns (only first elements shown),

 <doc> 
  <Person name="Maria Anders" /> 
  <Person name="Ana Trujillo" /> 
  <Person name="Antonio Moreno" /> 
  ... 
 </doc> 

Assigning FOR XML Results

The result of a FOR XML query can be assigned to a variable, or inserted into a column.

 
DECLARE @cust XML;
 SET @cust = (SELECT * FROM Employee FOR XML AUTO, TYPE) 
 CREATE TABLE T(i int, x XML) 
 INSERT INTO T SELECT 1, (SELECT * FROM Employee FOR XML AUTO, TYPE) 
   

Nesting of FOR XML Expressions

FOR XML, in SQL Server 2005, recognizes XML data type columns, and will inline them as sub-elements. Thus, we can nest FOR XML queries to generate hierarchies, instead of having to rely on the AUTO mode heuristic, or writing an EXPLICIT mode query.

Let's look at the example.

 
SELECT 1 as TAG,
  NULL as Parent, 
  CustomerID as "Customer!1!CustomerID", 
  NULL as "Order!2!OrderID", 
  NULL as "Employee!3!LastName" 
 FROM Customers 
 UNION ALL 
 SELECT 2, 
  1, 
  Customers.CustomerID, 
  Orders.OrderID, 
  NULL 
 FROM Orders  
 JOIN Customers ON Orders.CustomerID = Customers.CustomerID 
 UNION ALL 
 SELECT DISTINCT 3, 
  1, 
  Customers.CustomerID, 
  NULL, 
  Employees.LastName 
 FROM Customers  
 JOIN Orders ON Customers.CustomerID = Orders.CustomerID 
 JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID 
 ORDER BY "Customer!1!CustomerID","Employee!3!LastName","Order!2!OrderID" 
 FOR XML EXPLICIT 
   

returns (only first customer shown),

 
<Customer CustomerID="ALFKI">
  <Order OrderID="10643" /> 
  <Order OrderID="10692" /> 
  <Order OrderID="10702" /> 
  <Order OrderID="10835" /> 
  <Order OrderID="10952" /> 
  <Order OrderID="11011" /> 
  <Employee LastName="Davolio" /> 
  <Employee LastName="Leverling" /> 
  <Employee LastName="Peacock" /> 
  <Employee LastName="Suyama" /> 
 </Customer> 
 ... 

As you can see, we need a select statement for each element. We also repeat the parent's identification with every child, so that the order by groups the children with their parents. The streaming serialization from the rowset to XML depends on this grouping to get the correct nesting.

Now let's look at how we can rewrite this by using nesting FOR XML expressions. We can leverage the new TYPE directive to generate XML data type instances (otherwise, you will get a textual result that will be entitized if it is embedded in another FOR XML query) and nest sub selections to define the hierarchy. We now rewrite the previous EXPLICIT mode query using the AUTO mode and nesting:

 
SELECT CustomerID as "CustomerID",
  (SELECT OrderID as "OrderID" 
  FROM Orders "Order" 
  WHERE "Order".CustomerID = Customer.CustomerID 
  FOR XML AUTO, TYPE), 
  (SELECT DISTINCT LastName as "LastName" 
  FROM Employees Employee 
  JOIN Orders "Order" ON "Order".EmployeeID = Employee.EmployeeID 
  WHERE Customer.CustomerID = "Order".CustomerID 
  FOR XML AUTO, TYPE) 
 FROM Customers Customer 
 FOR XML AUTO, TYPE 


This returns the same result as the EXPLICIT mode query. Obviously, this query is easier to write, understand, and maintain, even though it contains basically the same number of select statements and join conditions.

The New PATH Mode

The PATH mode allows you to use an XPath-like syntax as a column name, which then is mapped into an attribute (e.g., "@a"), element (e.g., "e"), sub element structure ("e1/e2"), element content ("*"), text node ("text()"), or data value ("data()"). As with the RAW mode, the default name for the row element is row and can be overwritten with an NCName (a name without a prefix).

Let us give the PATH mode formulation of the above EXPLICIT mode query, now we know why the EXPLICIT mode is sometimes called the "query from hell."

 
SELECT CustomerID as "@CustomerID",
  (SELECT OrderID as "@OrderID" 
  FROM Orders 
  WHERE Orders.CustomerID = Customers.CustomerID 
  FOR XML PATH('Order'), TYPE), 
  (SELECT DISTINCT LastName as "@LastName" 
  FROM Employees 
  JOIN Orders ON Orders.EmployeeID = Employees.EmployeeID 
  WHERE Customers.CustomerID = Orders.CustomerID 
  FOR XML PATH('Employee'), TYPE) 
 FROM Customers 
 FOR XML PATH('Customer') 


This example is similar to the AUTO mode version and returns the same result.

 
SELECT CustomerID as "@CustomerID",
  CompanyName, 
  Address as "address/street", 
  City as "address/city", 
  Region as "address/region", 
  PostalCode as "address/zip", 
  Country as "address/country", 
  ContactName as "contact/name", 
  ContactTitle as "contact/title", 
  Phone as "contact/phone",  
  Fax as "contact/fax" 
 FROM Customers 
 FOR XML PATH('Customer'), ROOT('doc') 

This query results in the following document

 <doc> 
  <Customer CustomerID="ALFKI"> 
  <CompanyName>Alfreds Futterkiste</CompanyName> 
  <address> 
  <street>Obere Str. 57</street> 
  <city>Berlin</city> 
  <zip>12209</zip> 
  <country>Germany</country> 
  </address> 
  <contact> 
  <name>Maria Anders</name> 
  <title>Sales Representative</title> 
  <phone>030-0074321</phone> 
  <fax>030-0076545</fax> 
  </contact> 
  </Customer> 
  ... 
 </doc> 


How would this query look like if it used the EXPLICIT mode? We need four select clauses—one for each non-leaf element—instead of only one selection.

Last but not least, the following gives us an example of generating a value list, and shows the use of a text node.

 
SELECT CustomerID as "@ID",
  (SELECT OrderID as "data()" 
  FROM Orders 
  WHERE Customers.CustomerID=Orders.CustomerID 
  FOR XML PATH('') 
  ) as "@OrderIDs", 
  CompanyName, 
  ContactTitle as "ContactName/@ContactTitle", 
  ContactName as "ContactName/text()", 
  PostalCode as "Address/@ZIP", 
  Address as "Address/Street", 
  City as "Address/City" 
 FROM Customers 
 FOR XML PATH('Customer') 
   

This creates a result of the form (showing one example customer),

 <Customer ID="HUNGC" OrderIDs="10375 10394 10415 10600 10660"> 
  <CompanyName>Hungry Coyote Import Store</CompanyName> 
  <ContactName  
  ContactTitle="Sales Representative">Yoshi Latimer</ContactName> 
  <Address ZIP="97827"> 
  <Street>City Center Plaza 516 Main St.</Street> 
  <City>Elgin</City> 
  </Address> 
 </Customer> 


The sub query, which generates the OrderIDs attribute list, maps the OrderID column values as atomic values (using the path data()). These values will then be serialized as a text node by adding a space between sibling atomic values, which are provided in the next cell in the rowset. We then avoid generating a name for the row by using the zero-length string as the PATH mode argument, so that we get a single string (note, there is no TYPE directive!) as the result of the FOR XML PATH expression. That string gets mapped to the OrderIDs attribute by the containing FOR XML expression.

The CompanyName gets mapped to a sub element of the same name.

The ContactTitle produces the ContactTitle attribute of the ContactName element, while the ContactName column value is mapped into the text node of the same element. Note that in this case, the same result would have been achieved by mapping ContactName to the ContactName element directly.

Finally, the properties for the Address element parts are put together.

More FOR XML Extensions

Besides the new features listed above, SQL Server 2005 provides the following new features:

RAW mode can be combined with the ELEMENTS directive, and take a parameter to overwrite the row element name. For example,

 
SELECT * FROM Customers FOR XML RAW('Customer'), ELEMENTS

returns (only showing the first customer),

 
<Customer>
  <CustomerID>ALFKI</CustomerID> 
  <CompanyName>Alfreds Futterkiste</CompanyName> 
  <ContactName>Maria Anders</ContactName> 
  <ContactTitle>Sales Representative</ContactTitle> 
  <Address>Obere Str. 57</Address> 
  <City>Berlin</City> 
  <PostalCode>12209</PostalCode> 
  <Country>Germany</Country> 
  <Phone>030-0074321</Phone> 
  <Fax>030-0076545</Fax> 
 </Customer> 
   

The ELEMENTS directive provides an XSINIL option to map NULL values to an element with an attribute xsi:nil="true". For example,

 
SELECT * FROM Customers WHERE Region is null
 FOR XML PATH('Customer'), ELEMENTS XSINIL 

returns (only showing the first customer),

 
<Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <CustomerID>ALFKI</CustomerID> 
  <CompanyName>Alfreds Futterkiste</CompanyName> 
  <ContactName>Maria Anders</ContactName> 
  <ContactTitle>Sales Representative</ContactTitle> 
  <Address>Obere Str. 57</Address> 
  <City>Berlin</City> 
  <Region xsi:nil="true" /> 
  <PostalCode>12209</PostalCode> 
  <Country>Germany</Country> 
  <Phone>030-0074321</Phone> 
  <Fax>030-0076545</Fax> 
 </Customer> 
   

A new in-line schema inference directive XMLSCHEMA that takes a target namespace URI as an optional argument has been added for the RAW and AUTO modes. For example,

 
SELECT * FROM Customers FOR XML RAW('Customer'), XMLSCHEMA('urn:example.com')

returns (only showing parts of the schema and data),

 
<xsd:schema targetNamespace="urn:example.com"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"  
  xmlns:sqltypes= 
  "http://schemas.microsoft.com/sqlserver/2004/sqltypes"  
  elementFormDefault="qualified"> 
  <xsd:import namespace= 
  "http://schemas.microsoft.com/sqlserver/2004/sqltypes" /> 
  <xsd:element name="Customer"> 
  ... 
  </xsd:element> 
 </xsd:schema> 
 <Customer xmlns="urn:example.com" CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" ContactName="Maria Anders" ContactTitle="Sales Representative" Address="Obere Str. 57" City="Berlin" PostalCode="12209" Country="Germany" Phone="030-0074321" Fax="030-0076545" /> … 
Read more