Aspose.Words can be used not only to create Microsoft Word documents, by building them dynamically or merging templates with data, but also to parse documents and extract individual elements such as headers, footers, paragraphs, tables, and images. Another common task is finding all text that has a specific formatting or style.
Use the DocumentVisitor class to implement this usage scenario. This class corresponds to the well-known Visitor design pattern. With DocumentVisitor, you can define and execute custom operations that require enumeration over the document tree.
DocumentVisitor provides a set of VisitXXX methods that are invoked when a particular document element (node) is encountered. For example, DocumentVisitor.VisitParagraphStart is called when the beginning of a text paragraph is found, and DocumentVisitor.VisitParagraphEnd is called when the end of a text paragraph is found. Each DocumentVisitor.VisitXXX method accepts the object that was encountered, so you can use it as needed, for example to retrieve its formatting. Both DocumentVisitor.VisitParagraphStart and DocumentVisitor.VisitParagraphEnd, for instance, accept a Paragraph object.
Each DocumentVisitor.VisitXXX method returns a VisitorAction value that controls the enumeration of nodes. The method can request to continue the enumeration, to skip the current node (and continue with the nodes that follow it), or to stop the enumeration altogether.
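The effect of the three outcomes can be illustrated with a small self-contained model. The sketch below does not use Aspose.Words: ToyNode, ToyVisitor, and Action are hypothetical stand-ins that only mimic the protocol described above. The rule that a skipped node receives no end callback is an assumption of this model, not documented library behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the three possible return values (not the Aspose.Words class).
final class Action {
    static final int CONTINUE = 0;
    static final int SKIP_THIS_NODE = 1;
    static final int STOP = 2;

    private Action() {}
}

// Hypothetical base visitor: the default implementations simply continue.
abstract class ToyVisitor {
    int visitStart(ToyNode node) { return Action.CONTINUE; }
    int visitEnd(ToyNode node) { return Action.CONTINUE; }
}

// Hypothetical tree node that enumerates itself and its children.
final class ToyNode {
    final String name;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String name, ToyNode... kids) {
        this.name = name;
        for (ToyNode kid : kids) {
            children.add(kid);
        }
    }

    // Returns false if the visitor stopped the enumeration before all nodes were visited.
    boolean accept(ToyVisitor visitor) {
        int action = visitor.visitStart(this);
        if (action == Action.STOP) {
            return false;
        }
        if (action == Action.SKIP_THIS_NODE) {
            return true; // children are not enumerated; siblings still are
        }
        for (ToyNode child : children) {
            if (!child.accept(visitor)) {
                return false;
            }
        }
        return visitor.visitEnd(this) != Action.STOP;
    }
}

final class VisitorActionDemo {
    // Enumerates a small tree, skipping the node named skipName and stopping at stopName.
    // Pass null for either argument to disable that behavior.
    static List<String> visited(String skipName, String stopName) {
        ToyNode tree = new ToyNode("Document",
                new ToyNode("Header", new ToyNode("HeaderText")),
                new ToyNode("Body", new ToyNode("Para1"), new ToyNode("Para2")));
        List<String> names = new ArrayList<>();
        tree.accept(new ToyVisitor() {
            @Override
            int visitStart(ToyNode node) {
                if (node.name.equals(stopName)) {
                    return Action.STOP;
                }
                if (node.name.equals(skipName)) {
                    return Action.SKIP_THIS_NODE;
                }
                names.add(node.name);
                return Action.CONTINUE;
            }
        });
        return names;
    }

    public static void main(String[] args) {
        System.out.println(visited(null, null));     // every node is visited
        System.out.println(visited("Header", null)); // Header and its child are left out
        System.out.println(visited(null, "Para2"));  // enumeration ends before Para2
    }
}
```

In this model, skipping Header drops both Header and HeaderText from the result while Body is still enumerated, whereas stopping at Para2 ends the whole enumeration after Para1.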
These are the steps you should follow to programmatically determine and extract various parts of a document:
· Create a class derived from DocumentVisitor.
· Override and provide implementations for some or all of the DocumentVisitor.VisitXXX methods to perform some custom operations.
· Call Node.Accept on the node from which you want to start the enumeration. For example, if you want to enumerate the whole document, use Document.Accept(DocumentVisitor).
DocumentVisitor provides default implementations for all of the DocumentVisitor.VisitXXX methods. This makes it easier to create new document visitors as only the methods required for the particular visitor need to be overridden. It is not necessary to override all of the visitor methods.
This example shows how to use the Visitor pattern to add new operations to the Aspose.Words object model. In this case, we create a simple converter that turns a document into plain text.
Example
[Java]
public void toText() throws Exception
{
    // Open the document we want to convert.
    Document doc = new Document(getMyDir() + "Visitor.ToText.doc");

    // Create an object that inherits from the DocumentVisitor class.
    MyDocToTxtWriter myConverter = new MyDocToTxtWriter();

    // This is the well-known Visitor pattern. Get the model to accept a visitor.
    // The model will iterate through itself by calling the corresponding methods
    // on the visitor object (this is called visiting).
    //
    // Note that every node in the object model has the accept method, so visiting
    // can be started not only on the whole document, but on any node in the document.
    doc.accept(myConverter);

    // Once visiting is complete, we can retrieve the result of the operation,
    // which in this example has accumulated in the visitor.
    System.out.println(myConverter.getText());
}
/**
 * Simple implementation of saving a document in the plain text format. Implemented as a Visitor.
 */
public class MyDocToTxtWriter extends DocumentVisitor
{
    public MyDocToTxtWriter() throws Exception
    {
        mIsSkipText = false;
        mBuilder = new StringBuilder();
    }

    /**
     * Gets the plain text of the document that was accumulated by the visitor.
     */
    public String getText() throws Exception
    {
        return mBuilder.toString();
    }

    /**
     * Called when a Run node is encountered in the document.
     */
    public int visitRun(Run run) throws Exception
    {
        appendText(run.getText());

        // Let the visitor continue visiting other nodes.
        return VisitorAction.CONTINUE;
    }

    /**
     * Called when a FieldStart node is encountered in the document.
     */
    public int visitFieldStart(FieldStart fieldStart) throws Exception
    {
        // In Microsoft Word, a field code (such as "MERGEFIELD FieldName") follows
        // a field start character. We want to skip field codes and output the field
        // result only, so we use a flag to suspend the output while inside a field code.
        //
        // Note that this is a very simplistic implementation and will not work well
        // if the document contains nested fields.
        mIsSkipText = true;

        return VisitorAction.CONTINUE;
    }

    /**
     * Called when a FieldSeparator node is encountered in the document.
     */
    public int visitFieldSeparator(FieldSeparator fieldSeparator) throws Exception
    {
        // Once we reach a field separator node, we enable the output because we are
        // now entering the field result nodes.
        mIsSkipText = false;

        return VisitorAction.CONTINUE;
    }

    /**
     * Called when a FieldEnd node is encountered in the document.
     */
    public int visitFieldEnd(FieldEnd fieldEnd) throws Exception
    {
        // Make sure we enable the output when we reach a field end, because some fields
        // have neither a field separator nor a field result.
        mIsSkipText = false;

        return VisitorAction.CONTINUE;
    }

    /**
     * Called when visiting of a Paragraph node is ended in the document.
     */
    public int visitParagraphEnd(Paragraph paragraph) throws Exception
    {
        // In the plain text output, each paragraph ends with a CR+LF pair.
        appendText(ControlChar.CR_LF);

        return VisitorAction.CONTINUE;
    }

    /**
     * Called when visiting of a Body node is started in the document.
     */
    public int visitBodyStart(Body body) throws Exception
    {
        // We can detect the beginning and end of all composite nodes such as Section, Body,
        // Table, Paragraph, etc. and provide custom handling for them.
        mBuilder.append("*** Body Started ***\r\n");

        return VisitorAction.CONTINUE;
    }

    /**
     * Called when visiting of a Body node is ended in the document.
     */
    public int visitBodyEnd(Body body) throws Exception
    {
        mBuilder.append("*** Body Ended ***\r\n");

        return VisitorAction.CONTINUE;
    }

    /**
     * Called when a HeaderFooter node is encountered in the document.
     */
    public int visitHeaderFooterStart(HeaderFooter headerFooter) throws Exception
    {
        // Returning this value from a visitor method causes visiting of this
        // node to stop and move on to visiting the next sibling node.
        // The net effect in this example is that the text of headers and footers
        // is not included in the resulting output.
        return VisitorAction.SKIP_THIS_NODE;
    }

    /**
     * Adds text to the current output. Honours the enabled/disabled output flag.
     */
    private void appendText(String text) throws Exception
    {
        if (!mIsSkipText)
            mBuilder.append(text);
    }

    private final StringBuilder mBuilder;
    private boolean mIsSkipText;
}