Aspose.Words

Document Tree Navigation

Tree Overview

Aspose.Words represents a document as a tree of nodes. An integral feature of the tree is the ability to navigate between the nodes. This section shows how to explore and navigate the document tree in Aspose.Words.

When the sample fax document presented earlier is opened in DocumentExplorer (a demo project shipped with Aspose.Words, found in the <install dir>\Demos\DocumentExplorer directory), DocumentExplorer shows the tree of nodes exactly as it is represented in Aspose.Words:

The nodes in the tree are said to have relationships between them. A node that contains another node is a parent and the contained node is a child. Children of the same parent are sibling nodes. The Document node is always the root node.

The nodes that can contain other nodes derive from the CompositeNode class and all nodes ultimately derive from the Node class. The two base classes provide common methods and properties to navigate and modify the tree structure.
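
To make the parent, child and sibling relationships concrete, here is a minimal sketch of such a tree in plain Java. The classes SketchNode and SketchComposite are hypothetical stand-ins invented for this illustration. They are not the Aspose.Words Node and CompositeNode classes and model only the links described above.

```java
import java.util.ArrayList;
import java.util.List;

class TreeRelationsSketch {
    // Hypothetical stand-in for any node in the tree; not the Aspose.Words Node class.
    static class SketchNode {
        final String name;
        SketchComposite parent; // null until the node is appended to a composite

        SketchNode(String name) { this.name = name; }

        // The sibling that follows this node, or null if there is none.
        SketchNode nextSibling() {
            if (parent == null) return null;
            int next = parent.children.indexOf(this) + 1;
            return next < parent.children.size() ? parent.children.get(next) : null;
        }
    }

    // Hypothetical stand-in for a node that can contain child nodes.
    static class SketchComposite extends SketchNode {
        final List<SketchNode> children = new ArrayList<>();

        SketchComposite(String name) { super(name); }

        // Appends a child and records this node as its parent.
        SketchComposite append(SketchNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    public static void main(String[] args) {
        SketchComposite document = new SketchComposite("Document");
        SketchComposite body = new SketchComposite("Body");
        SketchNode first = new SketchNode("Paragraph 1");
        SketchNode second = new SketchNode("Paragraph 2");
        document.append(body);
        body.append(first).append(second);

        System.out.println(first.parent.name);        // Body
        System.out.println(first.nextSibling().name); // Paragraph 2
        System.out.println(second.nextSibling());     // null (last child)
        System.out.println(document.parent);          // null (root node)
    }
}
```

In this sketch the two paragraphs are siblings because they share the same parent, and the root is the only attached node whose parent is null.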

The following UML class diagram shows the classes and methods we are going to explore in the remainder of this topic:

The UML object diagram below shows several nodes of the fax sample document and how they are connected to each other via the parent, child and sibling properties:

Parent Node

The parent of a node is returned by the Node.ParentNode property. A node has no parent (Node.ParentNode is null) when it has just been created and not yet added to the tree, or when it has been removed from the tree. You can remove a node from its parent by calling Node.Remove.

The parent node of the root Document node is always null.

Example AccessParentNode

Shows how to access the parent node.

[Java]

 

// Create a new empty document. It has one section.
Document doc = new Document();

// The section is the first child node of the document.
Node section = doc.getFirstChild();

// The section's parent node is the document.
System.out.println("Section parent is the document: " + (doc == section.getParentNode()));

 

 

Owner Document

A node always belongs to a particular document, even if it has just been created or has been removed from the tree. The document to which the node belongs is returned by the Node.Document property.

A node always belongs to a document, because some vital document-wide structures such as styles and lists are stored in the Document node. For example, it is not possible to have a Paragraph without a Document because each paragraph has a style assigned to it and the style is defined globally for the document.

This rule is enforced when creating any new node. For instance, a new Paragraph to be added directly to the DOM requires a document object passed to the constructor. This is the document to which the paragraph belongs.

When creating a new paragraph using DocumentBuilder, the builder always has a Document object linked to it through the DocumentBuilder.Document property.
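
The difference between the owner document (fixed at creation) and the parent (which changes as the node is added to or removed from the tree) can be sketched in plain Java. SketchDocument and SketchParagraph are hypothetical classes made up for this illustration, and the sketch simplifies the tree by attaching paragraphs directly to the document. It is not how Aspose.Words is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OwnerDocumentSketch {
    // Hypothetical document that owns document-wide data such as style names.
    static class SketchDocument {
        final List<String> styles = new ArrayList<>(Arrays.asList("Normal", "Heading 1"));
        final List<SketchParagraph> paragraphs = new ArrayList<>();

        void append(SketchParagraph para) {
            para.parent = this;
            paragraphs.add(para);
        }
    }

    // Hypothetical paragraph: the owner is required up front, the parent is optional.
    static class SketchParagraph {
        final SketchDocument document; // owner document, never changes
        SketchDocument parent;         // null while detached from the tree
        String styleName = "Normal";

        SketchParagraph(SketchDocument document) {
            if (document == null) throw new IllegalArgumentException("document is required");
            this.document = document;
        }

        // Styles are resolved against the owner document, attached or not.
        void setStyleName(String name) {
            if (!document.styles.contains(name))
                throw new IllegalArgumentException("Unknown style: " + name);
            styleName = name;
        }

        // Detaches the paragraph from the tree; the owner document is kept.
        void remove() {
            if (parent != null) {
                parent.paragraphs.remove(this);
                parent = null;
            }
        }
    }

    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument();
        SketchParagraph para = new SketchParagraph(doc);
        para.setStyleName("Heading 1");           // works before the node is in the tree
        System.out.println(para.parent == null);  // true
        doc.append(para);
        System.out.println(para.parent == doc);   // true
        para.remove();
        System.out.println(para.parent == null);  // true
        System.out.println(para.document == doc); // true
    }
}
```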

Example CreatingNodeRequiresDocument

Shows that when you create any node, it requires a document that will own the node.

[Java]

 

// Create a new empty document.
Document doc = new Document();

// Creating a new node of any type requires a document passed into the constructor.
Paragraph para = new Paragraph(doc);

// The new paragraph node does not yet have a parent.
System.out.println("Paragraph has no parent node: " + (para.getParentNode() == null));

// But the paragraph node knows its document.
System.out.println("Both nodes' documents are the same: " + (para.getDocument() == doc));

// The fact that a node always belongs to a document allows us to access and modify
// properties that reference the document-wide data such as styles or lists.
para.getParagraphFormat().setStyleName("Heading 1");

// Now add the paragraph to the main text of the first section.
doc.getFirstSection().getBody().appendChild(para);

// The paragraph node is now a child of the Body node.
System.out.println("Paragraph has a parent node: " + (para.getParentNode() != null));

 

 

Child Nodes

The most efficient way to access child nodes of a CompositeNode is via the CompositeNode.FirstChild and CompositeNode.LastChild properties, which return the first and last child nodes respectively. If there are no child nodes, null is returned.

CompositeNode also provides the CompositeNode.ChildNodes collection that allows indexed or enumerated access to the children. The CompositeNode.ChildNodes property is a live collection of nodes. This means that whenever the document is changed (nodes removed or inserted), the CompositeNode.ChildNodes collection is automatically updated. Node collections are discussed in detail in later topics.
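
What "live" means can be illustrated with the standard Java collections alone. The sketch below does not use Aspose.Words. It only contrasts a view that is backed by the underlying list with a copy taken at one point in time, under the assumption that a live node collection behaves like the former.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LiveCollectionSketch {
    // Returns { live view size, snapshot size } after one more child is added.
    static int[] sizesAfterInsert() {
        List<String> children = new ArrayList<>();
        children.add("Run 1");
        children.add("Run 2");

        // A view backed by 'children': it reflects later changes.
        List<String> liveView = Collections.unmodifiableList(children);
        // An independent copy: it keeps the state at the time of copying.
        List<String> snapshot = new ArrayList<>(children);

        children.add("Run 3"); // the underlying data changes

        return new int[] { liveView.size(), snapshot.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterInsert();
        System.out.println("live view: " + sizes[0]); // live view: 3
        System.out.println("snapshot: " + sizes[1]);  // snapshot: 2
    }
}
```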

If a node has no children, then CompositeNode.ChildNodes returns an empty collection. You can check if a CompositeNode contains any child nodes using the CompositeNode.HasChildNodes property.

Example ChildNodesForEach

Shows how to enumerate immediate children of a CompositeNode using the enumerator provided by the ChildNodes collection.

[Java]

 

NodeCollection children = paragraph.getChildNodes();
for (Node child : (Iterable<Node>) children)
{
    // Paragraph may contain children of various types such as runs, shapes and so on.
    if (child.getNodeType() == NodeType.RUN)
    {
        // Say we found the node that we want, do something useful.
        Run run = (Run)child;
        System.out.println(run.getText());
    }
}

 

 

Example ChildNodesIndexer

Shows how to enumerate immediate children of a CompositeNode using indexed access.

[Java]

 

NodeCollection children = paragraph.getChildNodes();
for (int i = 0; i < children.getCount(); i++)
{
    Node child = children.get(i);

    // Paragraph may contain children of various types such as runs, shapes and so on.
    if (child.getNodeType() == NodeType.RUN)
    {
        // Say we found the node that we want, do something useful.
        Run run = (Run)child;
        System.out.println(run.getText());
    }
}

 

 

Sibling Nodes

You can obtain the node immediately preceding or following a certain node using Node.PreviousSibling and Node.NextSibling, respectively. If a node is the last child of its parent, then the Node.NextSibling property is null. Conversely, if the node is the first child of its parent, the Node.PreviousSibling property is null.

Note that because the child nodes are internally stored in a singly linked list in Aspose.Words, Node.NextSibling is more efficient than Node.PreviousSibling.
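
Why a forward-only linked list makes the previous sibling more expensive can be sketched as follows. The text above states only how the children are stored. The classes SketchParent and SketchNode, and the scan shown in previousSibling, are assumptions made for illustration and not the actual Aspose.Words code.

```java
class SiblingCostSketch {
    // Hypothetical parent that keeps its children in a singly linked list.
    static class SketchParent {
        SketchNode firstChild;
        SketchNode lastChild;

        void append(SketchNode node) {
            node.parent = this;
            if (firstChild == null) firstChild = node;
            else lastChild.next = node;
            lastChild = node;
        }
    }

    // Hypothetical child node: only the forward link is stored.
    static class SketchNode {
        final String name;
        SketchParent parent;
        SketchNode next;
        int lastScanSteps; // nodes visited by the most recent previousSibling() call

        SketchNode(String name) { this.name = name; }

        // Constant time: just follow the stored link.
        SketchNode nextSibling() { return next; }

        // Linear time: walk forward from the first child until this node is reached.
        SketchNode previousSibling() {
            lastScanSteps = 0;
            if (parent == null) return null;
            SketchNode previous = null;
            for (SketchNode n = parent.firstChild; n != null && n != this; n = n.next) {
                previous = n;
                lastScanSteps++;
            }
            return previous;
        }
    }

    public static void main(String[] args) {
        SketchParent paragraph = new SketchParent();
        SketchNode[] runs = { new SketchNode("A"), new SketchNode("B"),
                              new SketchNode("C"), new SketchNode("D") };
        for (SketchNode run : runs) paragraph.append(run);

        System.out.println(runs[0].nextSibling().name);     // B
        System.out.println(runs[3].previousSibling().name); // C
        System.out.println(runs[3].lastScanSteps);          // 3
        System.out.println(runs[0].previousSibling());      // null
    }
}
```

Under this assumption, a backward loop over n children that calls the previous-sibling lookup at every step would revisit earlier nodes repeatedly, which is why forward iteration is the preferred pattern.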

Example RecurseAllNodes

Shows how to efficiently visit all direct and indirect children of a composite node.

[Java]

 

public void recurseAllNodes() throws Exception
{
    // Open a document.
    Document doc = new Document(getMyDir() + "Node.RecurseAllNodes.doc");

    // Invoke the recursive function that will walk the tree.
    traverseAllNodes(doc);
}

/**
 * A simple function that will walk through all children of a specified node recursively
 * and print the type of each node to the screen.
 */
public void traverseAllNodes(CompositeNode parentNode) throws Exception
{
    // This is the most efficient way to loop through immediate children of a node.
    for (Node childNode = parentNode.getFirstChild(); childNode != null; childNode = childNode.getNextSibling())
    {
        // Do some useful work.
        System.out.println(Node.nodeTypeToString(childNode.getNodeType()));

        // Recurse into the node if it is a composite node.
        if (childNode.isComposite())
            traverseAllNodes((CompositeNode)childNode);
    }
}

 

 

Typed Access to Children and Parent

So far, we have discussed the properties that return one of the base types Node or CompositeNode. You will have noticed that you might have to cast the values to the concrete class of the node, such as Run or Paragraph.

Frequent casting (explicit conversion between types) is often considered a code smell in object-oriented code. However, casting is not always bad; sometimes a bit of casting is necessary. In our experience, it cannot be avoided completely when working with an object model that follows the Composite pattern, like the Aspose.Words DOM.

To reduce the need for casting, most of the Aspose.Words classes provide properties and collections that allow strictly typed access. There are three basic patterns for typed access:

· A parent node exposes typed FirstXXX and LastXXX properties. For example, Document has Document.FirstSection and Document.LastSection properties. Similarly, Table has Table.FirstRow and Table.LastRow properties, and so on.

· A parent node exposes a typed collection of child nodes, for example Document.Sections, Body.Paragraphs, and so on.

· A child node provides typed access to its parent, for example Run.ParentParagraph, Paragraph.ParentSection, and so on.

Typed properties are merely useful shortcuts that sometimes allow easier access than the generic Node.ParentNode and CompositeNode.FirstChild properties inherited from the base classes.
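
The idea that a typed property is a shortcut around a cast can be sketched in plain Java. The classes below are hypothetical and exist only for this illustration. The parentParagraph method shows one plausible way such an accessor could work and is not taken from the Aspose.Words source.

```java
class TypedAccessSketch {
    // Hypothetical base node with an untyped parent link.
    static class SketchNode {
        SketchNode parent;
    }

    // Hypothetical paragraph carrying one typed piece of data.
    static class SketchParagraph extends SketchNode {
        final String styleName;

        SketchParagraph(String styleName) { this.styleName = styleName; }
    }

    // Hypothetical run that offers a typed view of its parent.
    static class SketchRun extends SketchNode {
        // Typed shortcut: the type check and cast live in one place.
        // Returns null if there is no parent or the parent is not a paragraph.
        SketchParagraph parentParagraph() {
            return parent instanceof SketchParagraph ? (SketchParagraph) parent : null;
        }
    }

    public static void main(String[] args) {
        SketchParagraph para = new SketchParagraph("Heading 1");
        SketchRun run = new SketchRun();
        run.parent = para;

        // Generic access: the caller has to cast.
        String viaCast = ((SketchParagraph) run.parent).styleName;

        // Typed access: the cast is hidden inside the accessor.
        String viaTyped = run.parentParagraph().styleName;

        System.out.println(viaCast.equals(viaTyped)); // true
    }
}
```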

Example TypedPropertiesAccess

Demonstrates how to use typed properties to access nodes of the document tree.

[Java]

 

// Quick typed access to the first child Section node of the Document.
Section section = doc.getFirstSection();

// Quick typed access to the Body child node of the Section.
Body body = section.getBody();

// Quick typed access to all Table child nodes contained in the Body.
TableCollection tables = body.getTables();

for (Table table : tables)
{
    // Quick typed access to the first row of the table.
    if (table.getFirstRow() != null)
        table.getFirstRow().remove();

    // Quick typed access to the last row of the table.
    if (table.getLastRow() != null)
        table.getLastRow().remove();
}