XML Data into a Relation Table Using OPENXML


Suppose we have an xml and we want to extract data from this xml and insert into a table. Or we can say that we want to get the xml data into tabular format. There are several ways of achieving this. In this article, I am discussing a method using OPENXML.

Suppose the xml we are going to convert into table is as follows:

declare @xml_data as xml=

'<ProductList>

  <Product>

    <intl_product_id>1</intl_product_id>

    <intl_product_name>A D T</intl_product_name>

    <entered_on>2008-05-20T00:00:00</entered_on>

  </Product>

  <Product>

    <intl_product_id>2</intl_product_id>

    <intl_product_name>A FERIN HOT</intl_product_name>

    <entered_on>2008-05-20T00:00:00</entered_on>

  </Product>

  <Product>

    <intl_product_id>3</intl_product_id>

    <intl_product_name>A NOX</intl_product_name>

    <entered_on>2008-05-20T00:00:00</entered_on>

  </Product>

  <Product>

    <intl_product_id>4</intl_product_id>

    <intl_product_name>A T 10</intl_product_name>

    <entered_on>2008-05-20T00:00:00</entered_on>

  </Product>

  <Product>

    <intl_product_id>5</intl_product_id>

    <intl_product_name>A.S.A</intl_product_name>

    <entered_on>2008-05-20T00:00:00</entered_on>

  </Product>

</ProductList>'

Now we need to open a xml document handler. This can be done using the built in stored procedure named sp_xml_preparedocument. sp_xml_preparedocument returns an integer handler which will be used to extract data from the xml.

DECLARE @iDoc int

EXEC sp_xml_preparedocument @iDoc OUTPUT, @xml_data

 

First we declare an integer type variable and then we call sp_xml_preparedocument

Then we can use OPENXML to convert this xml into a tabular form.

SELECT *

FROM OPENXML(@iDoc, '//Product', 2)

WITH (

            [intl_product_id] int,

            [intl_product_name] varchar(max),

            [entered_on] datetime

       )

In this case we are pointing to the Product node directly by using //Product

Inside WITH (), we can specify whatever data we want to be converted into tabular format. Here, we are taking all the three informations for each of the product. If we don’t need all the data, we can just specify the node names we need to take. So the final output from the query looks like:

 

intl_product_id

intl_product_name

entered_on

1

A D T

2008-05-20 00:00:00.000

2

A FERIN HOT

2008-05-20 00:00:00.000

3

A NOX

2008-05-20 00:00:00.000

4

A T 10

2008-05-20 00:00:00.000

5

A.S.A

2008-05-20 00:00:00.000

After getting the data from the xml we will need to remove the xml document handler. SQL Server does not remove this handler automatically. So we have to ensure that the handler is removed. There is a built in stored procedure again, named sp_xml_removedocument. So we will need to call this with the integer hanlder we have.

EXEC sp_xml_removedocument @iDoc  

So the full code looks like this:

 

DECLARE @iDoc int

 

EXEC sp_xml_preparedocument @iDoc OUTPUT, @xml_data

 

SELECT *

FROM OPENXML(@iDoc, '//Product', 2)

WITH (

            [intl_product_id] int,

            [intl_product_name] varchar(max),

            [entered_on] datetime

       )

 

EXEC sp_xml_removedocument @iDoc

 

WORKING WITH SOME VARIATIONS:

Now suppose we don’t want the column names to be exactly same as the xml nodes. We want a different column name. We can achieve this as follows:

SELECT *

FROM OPENXML(@iDoc, '//Product', 2)

WITH (

            [product_id] int 'intl_product_id',

            [product_name] varchar(max) 'intl_product_name',

            [entered_on] datetime

       )

Here we are specifying a different column name and we are specifying the corresponding xml node from where the data will come after the data type.

 

Now we can play with different kind of xml. Suppose we have some data in an attribute of a xml node, not simply as a node. Below we show only one product node to describe this.

<Product entered_on="2008-05-20T00:00:00">

    <intl_product_id>5</intl_product_id>

    <intl_product_name>A.S.A</intl_product_name>

</Product>

How can we get the entered_on data which is not in a node but is an attribute of the Product node? We can just mention that entered_on is an attribute, not a node using the following query.

SELECT *

FROM OPENXML(@iDoc, '//Product', 2)

WITH (

            [product_id] int 'intl_product_id',

            [product_name] varchar(max) 'intl_product_name',

            [entered_on] datetime '@entered_on'

       )

So, by giving “@” we can mention that this is not coming from a node, it is an attribute data.