Aspose.Words

How-to: Mail Merge from XML using IMailMergeDataSource

You can download the complete source code of the XmlMailMerge sample here.

Given the widespread use and support of the XML markup language, the ability to run a mail merge from an XML file to a Word template document has become a common requirement.

This article provides a simple example of how, using Aspose.Words, you can execute mail merge from XML using a custom data source which implements the IMailMergeDataSource interface.

Solution

To achieve this, we will implement our own custom data source which reads the parsed XML stored in memory. When mail merge is executed, the mail merge engine asks our class for the value of each merge field in the document. The class reads that value from the XML and passes it back to the mail merge engine, which merges it into the document.

We’ll use this simple XML file which contains the customer information we want to use in the mail merge.

[XML]


<?xml version="1.0" encoding="utf-8"?>
<customers>
  <customer Name="John Ben Jan" ID="1" Domain="History" City="Boston"/>
  <customer Name="Lisa Lane" ID="2" Domain="Chemistry" City="LA"/>
  <customer Name="Dagomir Zits" ID="3" Domain="Heraldry" City="Milwaukee"/>
  <customer Name="Sara Careira Santy" ID="4" Domain="IT" City="Miami"/>
</customers>

Note that the structure of the XML document can vary and the data will still be read correctly: for each record, the data source looks for a field first as a child element and then as an attribute of the record element. This allows different types of XML documents to be merged easily. For example, the XML can be changed so that each record of a table is represented as an element, each field of the record is a child element, and the field value is the text content of that child element.
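The lookup order described above can be checked without Aspose.Words. The following is a minimal sketch, assuming only the JDK; the class name `XmlLayoutSketch` and its helper methods are hypothetical, and it simply repeats the child-element-then-attribute lookup that `XmlMailMergeDataTable.getValue` performs, so that both XML layouts return the same value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper class: illustrates the lookup order only, not part of the sample.
public class XmlLayoutSketch {
    // Same order as XmlMailMergeDataTable.getValue: a child element named fieldName first,
    // then an attribute named fieldName. Returns null if the record has neither.
    public static String lookup(Element record, String fieldName) {
        try {
            Node child = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(fieldName, record, XPathConstants.NODE);
            if (child != null) {
                return child.getTextContent();
            }
            return record.hasAttribute(fieldName) ? record.getAttribute(fieldName) : null;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid field name: " + fieldName, e);
        }
    }

    // Parses the XML text and looks up fieldName in the first <customer> element.
    public static String firstCustomerField(String xml, String fieldName) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Element first = (Element) root.getElementsByTagName("customer").item(0);
            return lookup(first, fieldName);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String attributes = "<customers><customer Name=\"Lisa Lane\" City=\"LA\"/></customers>";
        String children = "<customers><customer><Name>Lisa Lane</Name><City>LA</City></customer></customers>";
        System.out.println(firstCustomerField(attributes, "City")); // LA
        System.out.println(firstCustomerField(children, "City"));   // LA
    }
}
```

A field that exists in neither form (for example `Domain` in the shortened XML above) yields null, which corresponds to `getValue` returning false and the merge field being left in the document.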

Here’s our sample Word template document. The Name, ID, Domain and City fields have been set up as merge fields and correspond to the attributes of the customer elements in the XML file.

[Image: the sample template document, TestFile.doc]

To execute mail merge with data from an XML data source we will:

1. Load the XML into memory.

2. Pass the data to a new instance of the XmlMailMergeDataTable class, which is included with this sample.

3. Call the Aspose.Words MailMerge.execute method.

It’s really pretty simple. Using Aspose.Words, the mail merge operation will replace the merge fields in the document with the values from the XML file.

The Code

Make sure that your Word template contains a merge field wherever you want the data inserted.

First, we load the XML file from disk into memory by parsing it into an org.w3c.dom.Document object.

This object, which represents the XML, is passed to the XmlMailMergeDataTable class. This class acts as the middle-man between the data source and the mail merge engine: it passes data from the in-memory XML to the mail merge engine so that it can be merged into the document.

Then we open the template document and run the mail merge by calling MailMerge.execute with the XmlMailMergeDataTable instance as the data source.
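Before involving Aspose.Words, it can help to confirm what the parsed DOM actually contains. The sketch below is a minimal, JDK-only check; the class name `LoadCustomersSketch` is hypothetical, and it parses the Customers.xml content from an inline string (an assumption made so it runs without the sample's Data folder) instead of from disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper class for inspecting the parsed customer data.
public class LoadCustomersSketch {
    // The sample Customers.xml content, inlined for this sketch.
    public static final String CUSTOMERS_XML =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<customers>\n"
            + "  <customer Name=\"John Ben Jan\" ID=\"1\" Domain=\"History\" City=\"Boston\"/>\n"
            + "  <customer Name=\"Lisa Lane\" ID=\"2\" Domain=\"Chemistry\" City=\"LA\"/>\n"
            + "  <customer Name=\"Dagomir Zits\" ID=\"3\" Domain=\"Heraldry\" City=\"Milwaukee\"/>\n"
            + "  <customer Name=\"Sara Careira Santy\" ID=\"4\" Domain=\"IT\" City=\"Miami\"/>\n"
            + "</customers>";

    // Parses XML text into an org.w3c.dom.Document, the type the sample passes to XmlMailMergeDataTable.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the Name attribute of every <customer> element, in document order.
    public static List<String> customerNames(Document doc) {
        NodeList customers = doc.getElementsByTagName("customer");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < customers.getLength(); i++) {
            names.add(((Element) customers.item(i)).getAttribute("Name"));
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse(CUSTOMERS_XML);
        System.out.println(doc.getDocumentElement().getNodeName()); // customers
        System.out.println(customerNames(doc)); // [John Ben Jan, Lisa Lane, Dagomir Zits, Sara Careira Santy]
    }
}
```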

Example XMLMailMerge

Shows how to execute mail merge using an XML data source by implementing IMailMergeDataSource.

[Java]

package XMLMailMerge;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.net.URI;

import com.aspose.words.Document;

/**
 * This sample demonstrates how to execute mail merge with data from an XML data source. The XML file is read into memory,
 * stored in a DOM and passed to a custom data source implementing IMailMergeDataSource, which returns each value from
 * the XML when the mail merge engine asks for it.
 */
class Program
{
    public static void main(String[] args) throws Exception
    {
        // Sample infrastructure: resolve the Data folder relative to the compiled class.
        URI exeDir = Program.class.getResource("").toURI();
        String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

        // Use DocumentBuilder from the javax.xml.parsers package to read the XML data file and
        // store it in memory as an org.w3c.dom.Document.
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Parse the XML data. Passing a File avoids treating a file system path as a URI string.
        org.w3c.dom.Document xmlData = db.parse(new File(dataDir + "Customers.xml"));

        // Open a template document.
        Document doc = new Document(dataDir + "TestFile.doc");

        // Execute a simple mail merge over the <customer> records.
        // Note that this class also works with a single repeatable region (and any nested regions).
        // To merge multiple regions at the same time from a single XML data source, use the XmlMailMergeDataSet class,
        // e.g. doc.getMailMerge().executeWithRegions(new XmlMailMergeDataSet(xmlData));
        doc.getMailMerge().execute(new XmlMailMergeDataTable(xmlData, "customer"));

        // Save the output document.
        doc.save(dataDir + "TestFile Out.doc");
    }
}


The XmlMailMergeDataTable class is a custom data source implementing IMailMergeDataSource. The code for this class is provided below. The IMailMergeDataSource interface allows you to define where the data used for mail merge comes from; in this case the data is read from the XML file loaded into memory. How classes implementing this interface work is not explained in full here, but the details can be found in the API documentation for the IMailMergeDataSource interface.
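To make the calling pattern more concrete, here is a simplified, hypothetical driver loop. It is not the Aspose.Words mail merge engine and does not use the real IMailMergeDataSource type; the `RecordSource` interface, `ListSource` class and `<<Field>>` placeholder are assumptions made for illustration. The sketch only mirrors the idea that `moveNext` is called before each record (including the first) and that `getValue` fills a one-element `Object[]` and returns false for unknown fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MergeDriverSketch {
    // Hypothetical stand-in for the two methods a simple (non-region) merge relies on.
    // This is NOT the Aspose.Words interface.
    interface RecordSource {
        boolean moveNext();
        boolean getValue(String fieldName, Object[] fieldValue);
    }

    // List-backed source. Like XmlMailMergeDataTable, it starts before the first record,
    // so the first moveNext() call selects record 0.
    static class ListSource implements RecordSource {
        private final List<Map<String, String>> rows;
        private int index = -1;

        ListSource(List<Map<String, String>> rows) { this.rows = rows; }

        public boolean moveNext() { index++; return index < rows.size(); }

        public boolean getValue(String fieldName, Object[] fieldValue) {
            String v = rows.get(index).get(fieldName);
            if (v == null) return false; // unknown field: leave it unmerged
            fieldValue[0] = v;
            return true;
        }
    }

    // Simplified driver: for each record, ask the source for each field. Fields the source
    // does not know keep a <<FieldName>> placeholder, mimicking an unmerged field.
    public static List<String> merge(RecordSource source, List<String> fieldNames) {
        List<String> out = new ArrayList<>();
        while (source.moveNext()) {
            StringBuilder sb = new StringBuilder();
            for (String field : fieldNames) {
                Object[] value = new Object[1];
                if (sb.length() > 0) sb.append(", ");
                sb.append(source.getValue(field, value) ? value[0] : "<<" + field + ">>");
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static List<String> demo() {
        List<Map<String, String>> rows = List.of(
                Map.of("Name", "John Ben Jan", "City", "Boston"),
                Map.of("Name", "Lisa Lane", "City", "LA"));
        return merge(new ListSource(rows), List.of("Name", "City", "Phone"));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
        // John Ben Jan, Boston, <<Phone>>
        // Lisa Lane, LA, <<Phone>>
    }
}
```

This "call moveNext before reading" contract is the reason XmlMailMergeDataTable keeps an mIsFirstRecord flag: its constructor already positions the cursor on the first record, so the first moveNext call must not advance.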

The general process that the XmlMailMergeDataTable class follows when providing data to the mail merge engine is to iterate over the nodes in the DOM and extract the appropriate values for each record to be merged. The DOM represents the XML as a tree of nodes (elements, attributes and text), and when the mail merge engine requests the value of a field, the value is extracted from the current node and returned.

When the mail merge engine has finished with a record, it calls moveNext, and the current node is moved forward to the next sibling element whose name matches the table name.
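The sibling walk can be tried in isolation. The following JDK-only sketch uses a hypothetical `SiblingCursorSketch` class with the same rules as `moveNext` (skip text nodes and elements with a different name, and do not advance on the first call). One deliberate difference: it finds the first record by scanning child nodes directly instead of evaluating the `./tableName` XPath expression used in the sample.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical cursor over sibling record elements, modelled on XmlMailMergeDataTable.moveNext.
public class SiblingCursorSketch {
    private final String tableName;
    private Node current;          // current record; null means end of data
    private boolean first = true;  // the first moveNext() call must not advance

    // Starts the cursor on the first child element of parent named tableName.
    public SiblingCursorSketch(Node parent, String tableName) {
        this.tableName = tableName;
        this.current = parent.getFirstChild();
        skipToMatch();
    }

    // Advances past text nodes, comments and elements with a different name.
    private void skipToMatch() {
        while (current != null
                && !(current.getNodeType() == Node.ELEMENT_NODE && current.getNodeName().equals(tableName))) {
            current = current.getNextSibling();
        }
    }

    public boolean moveNext() {
        if (current == null) return false;
        if (first) {
            first = false;
        } else {
            current = current.getNextSibling();
            skipToMatch();
        }
        return current != null;
    }

    public Element current() { return (Element) current; }

    // Parses the XML and returns the ID attribute of every record the cursor visits.
    public static List<String> visitedIds(String xml, String tableName) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            SiblingCursorSketch cursor = new SiblingCursorSketch(root, tableName);
            List<String> ids = new ArrayList<>();
            while (cursor.moveNext()) ids.add(cursor.current().getAttribute("ID"));
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Whitespace text nodes and the unrelated <note/> element are skipped.
        String xml = "<customers>\n  <customer ID=\"1\"/>\n  <note/>\n  <customer ID=\"2\"/>\n</customers>";
        System.out.println(visitedIds(xml, "customer")); // [1, 2]
    }
}
```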

If mail merge with regions is used along with nested regions, the IMailMergeDataSource.getChildDataSource method is called. A new instance of XmlMailMergeDataTable is created whose root node is the element of the current record, and it starts on the first child element whose name matches the requested table name.
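Because the child data source is rooted at the current record, each parent only sees its own nested rows. The JDK-only sketch below illustrates that scoping with the ParentTable/ChildTable layout shown in the getChildDataSource comment of the class. The class name `NestedRegionSketch` is hypothetical, and it evaluates `./ChildTable` as a node set for brevity, whereas the sample evaluates it as a single node and then walks siblings.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical illustration of relative (per-record) child lookups.
public class NestedRegionSketch {
    // For each ParentTable record, returns "Name: text1, text2", where the texts come only
    // from the ChildTable records nested inside that parent.
    public static List<String> describe(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> result = new ArrayList<>();
            NodeList parents = (NodeList) xpath.evaluate("./ParentTable", root, XPathConstants.NODESET);
            for (int i = 0; i < parents.getLength(); i++) {
                Node parent = parents.item(i);
                // Relative to the current parent record, as getChildDataSource passes mCurrentNode as the root.
                NodeList children = (NodeList) xpath.evaluate("./ChildTable", parent, XPathConstants.NODESET);
                List<String> texts = new ArrayList<>();
                for (int j = 0; j < children.getLength(); j++) {
                    texts.add(xpath.evaluate("Text", children.item(j)));
                }
                result.add(xpath.evaluate("Name", parent) + ": " + String.join(", ", texts));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not read nested XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Tables>"
                + "<ParentTable><Name>P1</Name>"
                + "<ChildTable><Text>a</Text></ChildTable><ChildTable><Text>b</Text></ChildTable></ParentTable>"
                + "<ParentTable><Name>P2</Name><ChildTable><Text>c</Text></ChildTable></ParentTable>"
                + "</Tables>";
        describe(xml).forEach(System.out::println); // P1: a, b  then  P2: c
    }
}
```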

Example XMLMailMergeDataTable

Shows how to create a class implementing IMailMergeDataSource which allows data to be mail merged from an XML document.

[Java]

package XMLMailMerge;

import com.aspose.words.IMailMergeDataSource;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import java.util.HashMap;

/**
 * A custom mail merge data source that allows you to merge data from an XML document into Word templates.
 * This class demonstrates how data can be read from a custom data source (XML parsed and loaded into a DOM) and merged
 * into a document using the IMailMergeDataSource interface.
 *
 * An instance of this class represents a single table in the data source and in the template.
 * Note: We are using the Document and Node classes from the org.w3c.dom package here and not from Aspose.Words.
 */
public class XmlMailMergeDataTable implements IMailMergeDataSource
{
    /**
     * Creates a new XmlMailMergeDataTable for the specified XML document and table name.
     *
     * @param xmlDoc The DOM object which contains the parsed XML data.
     * @param tableName The name of the element in the data source where the data of the region is extracted from.
     */
    public XmlMailMergeDataTable(org.w3c.dom.Document xmlDoc, String tableName) throws Exception
    {
        this(xmlDoc.getDocumentElement(), tableName);
    }

    /**
     * Private constructor that is also called by getChildDataSource.
     */
    private XmlMailMergeDataTable(Node rootNode, String tableName) throws Exception
    {
        mTableName = tableName;

        // Get the first element on this level matching the table name (null if there is none).
        mCurrentNode = (Node)retrieveExpression("./" + tableName).evaluate(rootNode, XPathConstants.NODE);
    }

    /**
     * The name of the data source. Used by Aspose.Words only when executing mail merge with repeatable regions.
     */
    public String getTableName()
    {
        return mTableName;
    }

    /**
     * Aspose.Words calls this method to get a value for every data field.
     */
    public boolean getValue(String fieldName, Object[] fieldValue) throws Exception
    {
        // Attempt to retrieve the child node matching the field name by using XPath.
        Node value = (Node)retrieveExpression(fieldName).evaluate(mCurrentNode, XPathConstants.NODE);
        // We also look for the field name in the attributes of the element node.
        Element nodeAsElement = (Element)mCurrentNode;

        if (value != null)
        {
            // Field exists in the data source as a child node, pass the value and return true.
            // This merges the data into the document.
            fieldValue[0] = value.getTextContent();
            return true;
        }
        else if (nodeAsElement.hasAttribute(fieldName))
        {
            // Field exists in the data source as an attribute of the current node, pass the value and return true.
            // This merges the data into the document.
            fieldValue[0] = nodeAsElement.getAttribute(fieldName);
            return true;
        }
        else
        {
            // Field does not exist in the data source, return false.
            // No value will be merged for this field and it is left over in the document.
            return false;
        }
    }

    /**
     * Moves to the next record in a collection. This method is a little different than the regular implementation,
     * as we are walking over an XML document stored in a DOM.
     */
    public boolean moveNext()
    {
        if (!isEof())
        {
            // Don't move to the next node if this is the first record to be merged.
            if (!mIsFirstRecord)
            {
                // Find the next node which is an element and matches the table name represented by this class.
                // This skips any text nodes and any elements which belong to a different table.
                do
                {
                    mCurrentNode = mCurrentNode.getNextSibling();
                }
                while ((mCurrentNode != null) && !(mCurrentNode.getNodeName().equals(mTableName) && (mCurrentNode.getNodeType() == Node.ELEMENT_NODE)));
            }
            else
            {
                mIsFirstRecord = false;
            }
        }

        return (!isEof());
    }

    /**
     * If the data source contains nested data, this method is called to retrieve the data for
     * the child table. In the XML data source, nested data should look like this:
     *
     * <pre>{@code
     * <Tables>
     *    <ParentTable>
     *       <Name>ParentName</Name>
     *       <ChildTable>
     *          <Text>Content</Text>
     *       </ChildTable>
     *    </ParentTable>
     * </Tables>
     * }</pre>
     */
    public IMailMergeDataSource getChildDataSource(String tableName) throws Exception
    {
        return new XmlMailMergeDataTable(mCurrentNode, tableName);
    }

    private boolean isEof()
    {
        return (mCurrentNode == null);
    }

    /**
     * Returns a cached version of a compiled XPathExpression if available, otherwise creates a new expression.
     */
    private XPathExpression retrieveExpression(String path) throws Exception
    {
        XPathExpression expression = mExpressionSet.get(path);

        if (expression == null)
        {
            expression = mXPath.compile(path);
            mExpressionSet.put(path, expression);
        }

        return expression;
    }

    /**
     * Instance variables.
     */
    private Node mCurrentNode;
    private boolean mIsFirstRecord = true;
    private final String mTableName;
    private final HashMap<String, XPathExpression> mExpressionSet = new HashMap<String, XPathExpression>();
    private final XPath mXPath = XPathFactory.newInstance().newXPath();
}


End Result

And here’s the result: the output file contains four pages, one for each of the four customers in the XML file. The merge fields in the template have been replaced by the customer details from the XML file.