Advantages and disadvantages of XML shredding

Learn about the advantages and disadvantages of XML shredding and different shredding methods, plus find out when shredding is and isn't a good option for your company.

DB2 Pure XML cover In this section of DB2 pureXML Cookbook, you'll learn about the advantages and disadvantages of XML shredding or decomposing and different shredding methods. You'll find out when shredding is and isn't a good option for your company and find definitions of partial shredding and hybrid XML storage.

TABLE OF CONTENTS

  • Advantages and disadvantages of XML shredding
  • How to shred XML with the XMLTABLE function
  • Shredding XML docs into relational tables with annotated XML schemas
  • Examples of single and bulk XML shredding of XML documents

Chapter 11 Converting XML to Relational Data

This chapter describes methods to convert XML documents to rows in relational tables. This conversion is commonly known as shredding or decomposing of XML documents. Given the rich support for XML columns in DB2 you might wonder in which cases it can still be useful or necessary to convert XML data to relational format. One common reason for shredding is that existing SQL applications might still require access to the data in relational format. For example, legacy applications, packaged business applications, or reporting software do not always understand XML and have fixed relational interfaces. Therefore you might sometimes find it useful to shred all or some of the data values of an incoming XML document into rows and columns of relational tables.

In this chapter you learn:

  • The advantages and disadvantages of shredding and of different shredding methods (section 11.1)
  • How to shred XML data to relational tables using INSERT statements that contain the XMLTABLE function (section 11.2)
  • How to use XML Schema annotations that map and shred XML documents to relational tables (section 11.3)

11.1 ADVANTAGES AND DISADVANTAGES OF SHREDDING

The concept of XML shredding is illustrated in Figure 11.1. In this example, XML documents with customer name, address, and phone information are mapped to two relational tables. The documents can contain multiple phone elements because there is a one-to-many relationship between customers and phones. Hence, phone numbers are shredded into a separate table. Each repeating element, such as phone, leads to an additional table in the relational target schema. Suppose the customer information can also contain multiple email addresses, multiple accounts, a list of most recent orders, multiple products per order, and other repeating items. The number of tables required in the relational target schema can increase very quickly. Shredding XML into a large number of tables can lead to a complex and unnatural fragmentation of your logical business objects that makes application development difficult and error-prone. Querying the shredded data or reassembling the original documents may require complex multiway joins.

Figure 11.1 Shredding of an XML document
Shredding an XML document

Depending on the complexity, variability, and purpose of your XML documents, shredding may or may not be a good option. Table 11.1 summarizes the pros and cons of shredding XML data to relational tables.

Table 11.1 When Shredding Is and Isn't a Good Option
 

Shredding Can Be Useful When… Shredding Is Not A Good Option When…
  • Incoming XML data is just feeding an existing relational database.
  • The XML documents do not represent logical business objects that should be preserved.
  • Your primary goal is to enable existing relational applications to access XML data.
  • You are happy with your relational schema and would like to use it as much as possible.
  • The structure of your XML data is such that it can easily be mapped to relational tables.
  • Your XML format is relatively stable and changes to it are rare.
  • You rarely need to reconstruct the shredded documents.
  • Querying or updating the data with SQL is more important than insert performance.
  • Your XML data is complex and nested, and difficult to map to a relational schema.
  • Mapping your XML format to a relational schema leads to a large number of tables.
  • Your XML Schema is highly variable or tends to change over time.
  • Your primary goal is to manage XML documents as intact business objects.
  • You frequently need to reconstruct the shredded documents or parts of them.
  • Ingesting XML data into the database at a high rate is important for your application.

In many XML application scenarios the structure and usage of the XML data does not lend itself to easy and efficient shredding. This is the reason why DB2 supports XML columns that allow you to index and query XML data without conversion. Sometimes you will find that your application requirements can be best met with partial shredding or hybrid XML storage.

  • Partial shredding means that only a subset of the elements or attributes from each incoming XML document are shredded into relational tables. This is useful if a relational application does not require all data values from each XML document. In cases where shredding each document entirely is difficult and requires a complex relational target schema, partial shredding can simplify the mapping to the relational schema significantly.
  • Hybrid XML storage means that upon insert of an XML document into an XML column, selected element or attribute values are extracted and redundantly stored in relational columns.

More on enterprise data architecture

Read about Onefinestay's move to a graph database

Learn about the big data architecture integration options

Hear William McKnight discuss relational vs. NoSQL databases

If you choose to shred XML documents, entirely or partially, DB2 provides you with a rich set of capabilities to do some or all of the following:

  • Perform custom transformations of the data values before insertion into relational columns.
  • Shred the same element or attribute value into multiple columns of the same table or different tables.
  • Shred multiple different elements or attributes into the same column of a table.
  • Specify conditions that govern when certain elements are or are not shredded. For example, shred the address of a customer document only if the country is Canada.
  • Validate XML documents with an XML Schema during shredding.
  • Store the full XML document along with the shredded data.

DB2 9 for z/OS and DB2 9.x for Linux, UNIX, and Windows support two shredding methods:

  • SQL INSERT statements that use the XMLTABLE function. This function navigates into an input document and produces one or multiple relational rows for insert into a relational table.
  • Decomposition with an annotated XML Schema. Since an XML Schema defines the structure of XML documents, annotations can be added to the schema to define how elements and attributes are mapped to relational tables.

Table 11.2 and Table 11.3 discuss the advantages and disadvantages of the XMLTABLE method and the annotated schema method.

Table 11.2 Considerations for the XMLTABLE Method
 

Advantages of the XMLTABLE Method Disadvantages of the XMLTABLE Method
  • It allows you to shred data even if you do not have an XML Schema.
  • It does not require you to understand the XML Schema language or to understand schema annotations for decomposition.
  • It is generally easier to use than annotated schemas because it is based on SQL and XPath.
  • You can use familiar XPath, XQuery, or SQL functions and expressions to extract and optionally modify the data values.
  • It often requires no or little work during XML Schema evolution.
  • The shredding process can consume data from multiple XML and relational sources, if needed, such as values from DB2 sequences or look-up data from other relational tables.
  • It can often provide better performance than annotated schema decompositions.
  • For each target table that you want to shred into you need one INSERT statement.
  • You might have to combine multiple INSERT statements in a stored procedure.
  • There is no GUI support for implementing the INSERT statements and the required XMLTABLE functions. You need to be familiar with XPath and SQL/XML.

Table 11.3 Considerations for Annotated Schema Decomposition
 

Advantages of the Annotated Schema Method Disadvantages of the Annotated Schema Method
  • The mapping from XML to relational tables can be defined using a GUI in IBM Data Studio Developer.
  • If you shred complex XML data into a large number of tables, the coding effort can be lower than with the XMLTABLE approach.
  • It offers a bulk mode with detailed diagnostics if some documents fail to shred.
  • It does not allow shredding without an XML Schema.
  • You might have to manually copy annotations when you start using a new version of your XML Schema.
  • Despite the GUI support, you need to be familiar with the XML Schema language for all but simple shredding scenarios.
  • Annotating an XML Schema can be complex, if the schema itself is complex.

Copyright info

This chapter is an excerpt from the new book DB2 pureXML Cookbook: Master the Power of the IBM Hybrid Data Server, by Matthias Nicola and Pav Kumar-Chatterjee.
Published by IBM Press. isbn:0138150478
Copyright 2010 by International Business Machines Corporation.
More information is available on the publisher's site as well as Safari Books Online.

More on converting XML to relational data:

Dig Deeper on Data integration