The following diagrams define the "schema" of the Aspose.Words document tree. Together with the descriptions, they show which node types can contain which other node types.
In the diagram above:
· Document has one or more Section nodes.
· Section has one Body and zero or more HeaderFooter nodes.
· Both Body and HeaderFooter contain zero or more block-level nodes.
· A Document can have a GlossaryDocument.
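The rules above can be expressed as a small containment table. The following is a minimal Python sketch, not the Aspose.Words API. Node kinds are plain strings, and `Node`, `ALLOWED_CHILDREN` and `check` are hypothetical names. Only the four block-level kinds mentioned in this article (Paragraph, Table, StructuredDocumentTag, CustomXmlMarkup) are treated as block-level; the real library may allow more.

```python
from dataclasses import dataclass, field

# Hypothetical model of the top-level rules above; node kinds are plain strings,
# not Aspose.Words classes. Only block-level kinds named in this article are listed.
BLOCK_LEVEL = {"Paragraph", "Table", "StructuredDocumentTag", "CustomXmlMarkup"}

# Allowed child kinds per container (only the top of the tree is modeled).
ALLOWED_CHILDREN = {
    "Document": {"Section", "GlossaryDocument"},
    "Section": {"Body", "HeaderFooter"},
    "Body": BLOCK_LEVEL,
    "HeaderFooter": BLOCK_LEVEL,
}

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)

def check(node: Node) -> list:
    """Return every rule violation found in the subtree rooted at `node`."""
    errors = []
    kinds = [child.kind for child in node.children]
    allowed = ALLOWED_CHILDREN.get(node.kind)  # None: container not modeled here
    if allowed is not None:
        errors += [f"{node.kind} cannot contain {k}" for k in kinds if k not in allowed]
    if node.kind == "Document" and "Section" not in kinds:
        errors.append("Document needs at least one Section")
    if node.kind == "Section" and kinds.count("Body") != 1:
        errors.append("Section must have exactly one Body")
    for child in node.children:
        errors += check(child)
    return errors
```

Keeping the rules in a lookup table rather than in branching code means a new container kind can be modeled by adding one dictionary entry.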
A Microsoft Word document consists of one or more sections. A section can define its own page size, margins, orientation and number of text columns, as well as its own headers and footers. Sections are separated by section breaks. The Section class represents a section of a document.
A section contains main text as well as headers and footers for the first, even and odd pages. These different “flows” of text are called stories. In Aspose.Words, the Section node contains the story nodes Body and HeaderFooter. The main text is stored inside the Body object. The text of each header and footer is stored in HeaderFooter objects.
The text of any story consists of paragraphs and tables, represented by the Paragraph and Table objects respectively.
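To illustrate how one section holds several stories, the following hypothetical Python model (not Aspose.Words classes) gives a section one Body story plus any number of header/footer stories, each holding Paragraph and Table blocks. The class names, the `stories` method and the simplified table layout (rows of cell strings) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for Aspose.Words nodes; names and fields are illustrative.
@dataclass
class Paragraph:
    text: str

@dataclass
class Table:
    rows: list  # each row is a list of cell strings, simplified for the sketch

@dataclass
class Story:
    name: str  # "Body" or an illustrative header/footer label such as "HeaderFirst"
    blocks: list = field(default_factory=list)  # Paragraph and Table objects

@dataclass
class Section:
    body: Story
    headers_footers: list = field(default_factory=list)  # zero or more Story objects

    def stories(self):
        """Yield the main text story first, then each header/footer story."""
        yield self.body
        yield from self.headers_footers

def story_text(story: Story) -> str:
    """Join paragraph text and table cell text in document order, one item per line."""
    parts = []
    for block in story.blocks:
        if isinstance(block, Paragraph):
            parts.append(block.text)
        elif isinstance(block, Table):
            parts.extend(cell for row in block.rows for cell in row)
    return "\n".join(parts)
```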
Additionally, each Word document can contain a glossary document, which stores building blocks, AutoText and AutoCorrect entries. In Aspose.Words, the glossary document is represented by the GlossaryDocument node, which in turn contains BuildingBlock nodes that represent the different types of glossary document entries. Each BuildingBlock contains sections that can be inserted, removed and copied in documents.
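The following minimal sketch, assuming a simplified model in which a glossary is a list of named building blocks and each building block holds a list of sections, shows one way to insert a building block into a document. All names here (`GlossaryDocument.find`, `insert_building_block`) are hypothetical and do not reproduce the Aspose.Words API. The sections are deep-copied so that later edits to the document do not alter the glossary entry.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field

# Hypothetical model: a section is reduced to a list of paragraph strings.
@dataclass
class Section:
    paragraphs: list = field(default_factory=list)

@dataclass
class BuildingBlock:
    name: str
    sections: list = field(default_factory=list)

@dataclass
class GlossaryDocument:
    blocks: list = field(default_factory=list)

    def find(self, name: str) -> BuildingBlock | None:
        """Return the first building block with this name, or None."""
        return next((b for b in self.blocks if b.name == name), None)

@dataclass
class Document:
    sections: list = field(default_factory=list)
    glossary: GlossaryDocument | None = None

def insert_building_block(doc: Document, name: str) -> int:
    """Append deep copies of the named block's sections; return how many were added."""
    if doc.glossary is None:
        raise ValueError("document has no glossary")
    block = doc.glossary.find(name)
    if block is None:
        raise KeyError(name)
    copies = [copy.deepcopy(s) for s in block.sections]
    doc.sections.extend(copies)
    return len(copies)
```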
In the diagram above:
· Block-level elements can occur in a number of places in the document tree (e.g. as children of Body, Footnote, Comment, Cell and other nodes).
· The most important block-level nodes are Table and Paragraph.
· Table contains zero or more rows.
· Paragraph contains zero or more inline elements.
· CustomXmlMarkup and StructuredDocumentTag can wrap other block-level nodes.
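Because StructuredDocumentTag and CustomXmlMarkup can wrap block-level nodes, code that looks for the paragraphs and tables of a story often has to look through these wrappers. A minimal sketch, assuming a hypothetical string-keyed `Node` model rather than Aspose.Words classes (`content_blocks` is an illustrative name):

```python
from dataclasses import dataclass, field

# Markup kinds that can wrap block-level nodes, per the list above.
WRAPPERS = {"StructuredDocumentTag", "CustomXmlMarkup"}

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    text: str = ""

def content_blocks(container: Node):
    """Yield the block-level content nodes of a container (Body, Cell, Footnote, ...),
    looking through block-level markup wrappers however deeply they are nested.
    Tables are yielded as-is; their rows and cells are not entered."""
    for child in container.children:
        if child.kind in WRAPPERS:
            yield from content_blocks(child)
        else:
            yield child
```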
In the diagram above:
· Paragraph is the most frequently encountered container of inline-level nodes.
· Paragraph can contain runs of text, each with its own formatting, represented by Run nodes.
· Paragraph can contain bookmarks, represented by BookmarkStart and BookmarkEnd nodes.
· Paragraph can contain annotations, represented by CommentRangeStart, CommentRangeEnd, Comment and Footnote nodes.
· Paragraph can contain Word fields, represented by FieldStart, FieldSeparator and FieldEnd nodes (the field characters), as well as FormField nodes.
· Paragraph can contain shapes, drawings and images, represented by Shape, GroupShape and DrawingML nodes.
· Paragraph can contain custom markup in the form of SmartTag, CustomXmlMarkup and StructuredDocumentTag nodes, which can in turn contain nested inline nodes.
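Bookmarks are marked by a pair of BookmarkStart and BookmarkEnd nodes placed around the bookmarked runs rather than by a container node. The sketch below assumes a hypothetical representation of a paragraph's inline children as `(kind, value)` tuples; it collects the Run text between the two markers and skips any other inline nodes. It only searches a single paragraph; a real bookmark may span several paragraphs, which this sketch does not handle.

```python
def bookmark_text(inlines, name):
    """Return the concatenated Run text between the BookmarkStart and BookmarkEnd
    named `name` in one paragraph's inline nodes, given as (kind, value) tuples.
    Raises KeyError if the bookmark is not both opened and closed here."""
    inside = False
    parts = []
    for kind, value in inlines:
        if kind == "BookmarkStart" and value == name:
            inside = True
        elif kind == "BookmarkEnd" and value == name and inside:
            return "".join(parts)
        elif inside and kind == "Run":
            parts.append(value)  # other inline nodes (comments, fields, ...) are skipped
    raise KeyError(f"bookmark {name!r} is not opened and closed in this paragraph")
```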
Shapes in Microsoft Word include Office Art AutoShapes, text boxes, images, OLE objects and ActiveX controls, all of which are represented by the Shape class. Some shapes can contain text. Shapes can be grouped using GroupShape nodes, and groups can themselves be nested inside other groups.
Even though a shape in a Microsoft Word document can be positioned inline with text or floating at any position on the page, a shape always has an “anchor” position in text and the Shape or GroupShape object in Aspose.Words represents that anchor position.
Documents in DOCX format can contain a special type of graphics called DrawingML. These are represented by the DrawingML node.
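Since a GroupShape can contain both shapes and other groups, finding every individual shape in a group is naturally recursive. A minimal sketch with hypothetical `Shape` and `GroupShape` dataclasses; the `shape_type` labels are illustrative strings, not Aspose.Words enumeration values.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins, not Aspose.Words classes.
@dataclass
class Shape:
    shape_type: str  # illustrative label, e.g. "TextBox" or "Image"
    text: str = ""

@dataclass
class GroupShape:
    children: list = field(default_factory=list)  # Shape or nested GroupShape objects

def leaf_shapes(node):
    """Yield every Shape inside a (possibly nested) group, depth-first, in order."""
    if isinstance(node, Shape):
        yield node
    else:
        for child in node.children:
            yield from leaf_shapes(child)
```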
Footnote and Comment nodes represent the anchor position of a footnote, endnote or comment in the document. Because footnotes and comments contain their own text, Footnote and Comment nodes in Aspose.Words can contain block-level nodes.
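Footnote and Comment nodes sit at the inline level but hold block-level content, so reading their text means stepping back down into paragraphs. The following hypothetical sketch (string-keyed `Node` objects, not the Aspose.Words API; only Run text is read) separates a paragraph's own text from the text of the annotations anchored in it.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    text: str = ""

ANNOTATIONS = {"Footnote", "Comment"}

def main_text(para: Node) -> str:
    """Text of the paragraph's own Run children, skipping annotation content."""
    return "".join(c.text for c in para.children if c.kind == "Run")

def annotation_texts(para: Node) -> list:
    """Text of each Footnote/Comment anchored in the paragraph. Their children are
    block-level, so each annotation's Paragraph children are read and joined."""
    return [
        "\n".join(main_text(p) for p in note.children if p.kind == "Paragraph")
        for note in para.children
        if note.kind in ANNOTATIONS
    ]
```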
In the diagram above:
· Table can have many rows.
· Row can have many cells.
· Cell can contain block-level nodes (e.g. Paragraph and Table).
· Rows, cells and block-level elements can be wrapped inside CustomXmlMarkup and StructuredDocumentTag nodes.
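The sketch below reads a table into a grid of cell text while looking through row-level and cell-level markup wrappers. It assumes a hypothetical string-keyed `Node` model, not Aspose.Words classes; `look_through` and `table_grid` are illustrative names. Each cell's text is taken from its Paragraph nodes only, so nested tables are ignored.

```python
from dataclasses import dataclass, field

WRAPPERS = {"StructuredDocumentTag", "CustomXmlMarkup"}

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    text: str = ""

def look_through(nodes, kind):
    """Yield nodes of `kind`, descending into markup wrappers at the same level."""
    for n in nodes:
        if n.kind in WRAPPERS:
            yield from look_through(n.children, kind)
        elif n.kind == kind:
            yield n

def table_grid(table: Node) -> list:
    """Return the table as a list of rows, each a list of cell strings
    (a cell's paragraph texts joined with spaces)."""
    return [
        [" ".join(p.text for p in look_through(cell.children, "Paragraph"))
         for cell in look_through(row.children, "Cell")]
        for row in look_through(table.children, "Row")
    ]
```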
OOXML documents allow users to embed their own custom semantics in the form of Smart Tags, Structured Document Tags (content controls) and Custom XML Markup.
In Aspose.Words a Smart Tag is represented by the SmartTag class. A Structured Document Tag is represented by the StructuredDocumentTag class and Custom XML Markup is represented by the CustomXmlMarkup class. Each class exposes properties which allow you to access the custom data of these markup nodes.
One way to think about markup nodes in Aspose.Words is that SmartTag, StructuredDocumentTag and CustomXmlMarkup nodes “wrap” content on the same level of the document hierarchy. The wrapped content can then be found as children of the markup node.
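Because a markup node simply wraps content at its own level, removing the markup without losing the content amounts to replacing the wrapper with its children. The following minimal sketch, assuming a hypothetical `Node` model rather than the Aspose.Words API (`strip_markup` is an illustrative name), does this in place. Any custom data stored on the markup nodes is discarded.

```python
from dataclasses import dataclass, field

MARKUP = {"SmartTag", "StructuredDocumentTag", "CustomXmlMarkup"}

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    text: str = ""

def strip_markup(node: Node) -> None:
    """Replace every markup node under `node` with its children, in place,
    so the wrapped content moves up to the wrapper's own level."""
    new_children = []
    for child in node.children:
        strip_markup(child)  # unwrap nested markup first
        if child.kind in MARKUP:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children
```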
Markup nodes can occur at different levels of the document. SmartTag nodes can only occur at the inline level. StructuredDocumentTag and CustomXmlMarkup nodes are more flexible and can occur at several different levels of the document tree.
The StructuredDocumentTag.Level and CustomXmlMarkup.Level properties return the MarkupLevel value that specifies the level of the markup node in the document tree.
A markup node can occur at the following levels of the document tree:
· Block – The markup node appears at the block level, for example as a child of Body. Such markup nodes can contain block-level nodes.
· Row – The markup node appears as a child of Table and can contain Row nodes.
· Cell – The markup node appears as a child of Row and can contain Cell nodes.
· Inline – The markup node appears at the inline level, for example as a child of Paragraph, and can contain inline-level nodes.
At each level, markup nodes of the same level can be nested. For example, a block-level StructuredDocumentTag can contain nested block-level StructuredDocumentTag and CustomXmlMarkup nodes.
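The level rules above can be summarized as a mapping from the kind of the nearest non-markup ancestor to a level. The following Python sketch illustrates this under stated assumptions. `Level` is a stand-in for the MarkupLevel enumeration, not the real type. The block containers are limited to those named in this article (Body, HeaderFooter, Cell, Footnote, Comment). Markup ancestors are skipped because nested markup shares the level of the markup that wraps it.

```python
from enum import Enum

class Level(Enum):
    # Mirrors the level names listed above; not the Aspose.Words MarkupLevel type.
    BLOCK = "Block"
    ROW = "Row"
    CELL = "Cell"
    INLINE = "Inline"

MARKUP = {"SmartTag", "StructuredDocumentTag", "CustomXmlMarkup"}

# Kind of the nearest non-markup ancestor -> level of a markup node under it.
PARENT_LEVEL = {
    "Body": Level.BLOCK, "HeaderFooter": Level.BLOCK, "Cell": Level.BLOCK,
    "Footnote": Level.BLOCK, "Comment": Level.BLOCK,
    "Table": Level.ROW, "Row": Level.CELL, "Paragraph": Level.INLINE,
}

def markup_level(ancestors: list) -> Level:
    """Infer a markup node's level from its ancestor kinds (root first, direct
    parent last), skipping enclosing markup nodes."""
    for kind in reversed(ancestors):
        if kind not in MARKUP:
            return PARENT_LEVEL[kind]
    raise ValueError("no non-markup ancestor")

def is_allowed(markup_kind: str, ancestors: list) -> bool:
    """SmartTag is inline-only; the other markup kinds may appear at any level."""
    return markup_kind != "SmartTag" or markup_level(ancestors) is Level.INLINE
```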