The following tables provide implementation details about how Aspose.Words loads a document in HTML-based formats: HTML, XHTML, and MHTML.
Aspose.Words supports importing and exporting HTML-based documents. You can load such documents into the Document Object Model, edit them, add new content, and convert them to any supported format such as DOCX, PDF, or an image.
The Aspose.Words HTML engine is resilient and can properly import both simple and complex HTML even if there are problems with it, resolving any malformed structure or parts and ignoring any unsupported tags. Most common native HTML tags and CSS formatting are supported during import. The input HTML can skip tags and still be imported well, e.g. you can leave out <p> or <span> tags and the text content is still imported properly.
Note that Aspose.Words works with Word documents, therefore not all HTML features are supported during import and export. Some HTML attributes may not be imported because they do not have Microsoft Word equivalents. Likewise, during export some document features may not be included because they cannot be represented properly in HTML. For these reasons there may be many "N/A" values in this list; however, Aspose.Words strives to support all HTML features possible.
Normally, elements or attributes that do not have an equivalent feature in a Microsoft Word document are ignored during import.
You can set the BaseUri path of the document being loaded so relative resources can be correctly imported.
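To illustrate why a base URI matters, the following minimal sketch shows how relative resource references resolve against a base location using Python's standard library. It is not Aspose.Words code; the function name `resolve_resource` and the example.com URLs are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin


def resolve_resource(base_uri: str, href: str) -> str:
    """Resolve a possibly relative resource reference against a base URI."""
    return urljoin(base_uri, href)


# Hypothetical base location of the HTML document being loaded.
base = "https://example.com/docs/report/index.html"

# Relative references are resolved against the base URI.
print(resolve_resource(base, "images/logo.png"))
print(resolve_resource(base, "../styles/main.css"))
# Absolute references are left unchanged.
print(resolve_resource(base, "https://cdn.example.com/font.woff"))
```

Without a base URI, a reference such as `images/logo.png` has no location to be resolved against, which is why relative resources can fail to load.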
Aspose.Words supports most CSS 1 and CSS 2 properties that have an equivalent use in Word documents. Note that multiple classes on a single class attribute are currently not supported during import.
The HTML produced by Aspose.Words conforms to the HTML 4.0 or XHTML 1.0 Transitional specifications. Multipart/mixed content is supported in HTML during load. You can choose the encoding used during import from and export to HTML-based formats. During load you can choose to auto-detect the encoding.
Aspose.Words does not process JavaScript, and no JavaScript is read or written during open and save. If you are dealing with a page that is partly generated dynamically using JavaScript, you can still achieve the same results by first rendering the page in a browser and then importing the resulting page source. This can be automated. Adding JavaScript to an output HTML document can be done with some simple post-processing.
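As an illustration of such post-processing, here is a minimal sketch that inserts a script reference just before the closing </body> tag of an already exported HTML string. It uses only the Python standard library; the function name `add_script` and the file name `app.js` are hypothetical.

```python
import html as html_lib
import re

# Matches a closing body tag regardless of letter case.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def add_script(html: str, script_src: str) -> str:
    """Insert a <script> reference before the last </body> tag.

    If the markup has no </body> tag, the script tag is appended instead.
    """
    tag = '<script src="%s"></script>' % html_lib.escape(script_src, quote=True)
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        return html + tag
    pos = matches[-1].start()
    return html[:pos] + tag + html[pos:]


exported = "<html><body><p>Hello</p></body></html>"
print(add_script(exported, "app.js"))
```

A string-based approach like this is enough for simple cases; a real HTML parser would be a safer choice for markup that contains "</body>" inside comments or scripts.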
Currently, the special Microsoft "mso" attributes are not imported or exported, with the exception of "mso-break-type", which is supported in both import and export. These properties help with round-tripping HTML back to a document format but significantly bloat the HTML, which is why most users want to avoid such extra markup. However, since they are a useful tool for Word-HTML round-tripping, we will support these attributes in both import and export in a future version.
See the following links in the documentation for further information:
· Loading, Saving and Converting
· Aspose.Words Document Object Model
· Document
Feature |
Supported |
Comment |
See Also |
Attached Template |
N/A |
|
|
Built-In Properties |
Yes |
All Built-in Document Properties can be accessed and modified in Aspose.Words API. There are methods to update the "count" properties such as character, word and page count. All such properties are supported with the exception of the "line" count which is currently not updated. Title, Keywords, Description properties are imported from meta tags in HTML. Other built-in properties stored in custom tags are currently not imported. |
· Document.BuiltInDocumentProperties · Document.UpdatePageLayout · Document.UpdateWordCount |
Custom Properties |
Planned |
Custom Document Properties can be created, accessed and modified through the API. Currently, Custom Document Properties and Built-in properties other than Title, Keywords or Description are not imported from HTML. |
· Document.CustomDocumentProperties |
Custom Payload Part |
N/A |
|
|
Custom XML Data Storage |
N/A |
|
|
Digital Signature |
N/A |
Digital signatures cannot be added to HTML format. |
|
Embedded Package |
N/A |
|
|
Encryption |
N/A |
|
|
Font Table |
Yes |
|
|
Glossary Document/Quick Parts/Auto Text |
N/A |
|
|
Hyphenation |
Planned |
There is currently no API to access and modify hyphenation settings in a document. |
· ParagraphFormat.SuppressAutoHyphens |
Key Map Customizations |
N/A |
|
|
Mail Merge Recipient Data |
N/A |
|
|
Office Math |
N/A |
|
|
Themes |
N/A |
Only OOXML documents have native support for themes. During export, theme formatting is applied as direct formatting to HTML. During round-trip back to DOCX this formatting is retained but the theme information is lost. |
|
Toolbar Customizations |
N/A |
|
|
Variables |
N/A |
|
|
VBA Project (Macro) |
N/A |
|
|
VBA Project Digital Signature |
N/A |
|
|
Background |
Yes |
A background of a Word document can be a solid color or an image. Only solid background is imported. Imported from style="background:xxx" on <body> tag. There are plans to support image background through the style-background attribute. |
· Document.BackgroundShape |
Thumbnail |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Embed Fonts |
Planned |
Currently embedding new fonts into a document is unsupported. |
|
Access and Use Embedded Fonts |
Planned |
There is an option to subset and export font resources to EPUB, MHTML and HTML. Fonts that are embedded in the original DOCX document can be optionally exported. Embedded fonts linked in HTML are currently not read during import. |
· FontInfo · FontInfo.GetEmbeddedFont |
Feature |
Supported |
Comment |
See Also |
Bibliography |
N/A |
There is no tag in HTML that corresponds to a Microsoft Word bibliography. However, a bibliography is saved to HTML as regular text and is therefore loaded back into Aspose.Words as plain text as well. |
|
Sources/Citations |
N/A |
|
|
Citation Style |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Allow Only Comments |
N/A |
|
|
Allow Only Form Fields |
N/A |
|
|
Allow Only Revisions |
N/A |
|
|
Limit Formatting to Selection of Styles |
N/A |
|
|
Protection Password (Legacy) |
N/A |
|
|
Protection Password (OOXML) |
N/A |
|
|
Protected Sections |
N/A |
|
|
Protection Ranges |
N/A |
|
|
Read Only |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Asian Typography Settings |
N/A |
|
|
Compatibility Options |
Planned |
|
· Document.CompatibilityOptions |
Endnote Options |
N/A |
|
|
Footnote Options |
N/A |
|
|
Mail Merge Settings |
N/A |
|
|
Print Settings |
N/A |
|
|
Show/Hide Settings |
N/A |
|
|
View Settings |
N/A |
|
|
Web Settings |
N/A |
|
|
XML Settings |
N/A |
|
|
Each paragraph in a document is represented in Aspose.Words as a Paragraph node. A paragraph represents a block of text in a document and has a variety of properties and styles.
Using Aspose.Words you can access and change virtually all properties of a paragraph. Nearly all paragraph attributes are supported. You can also easily insert and remove paragraphs.
Paragraph formatting is contained within the ParagraphFormat class which is linked to the paragraph.
Paragraphs are imported from HTML from <p> and <h1> - <h6> tags.
Most common native HTML tags and CSS formatting are supported during import. Note that Aspose.Words works with Word documents, therefore not all CSS can be imported, as some features do not have a useful equivalent in Word document formats. Such attributes are ignored during import.
Aspose.Words supports most CSS 1 and CSS 2 properties that have an equivalent use in Word documents.
There is a load option to skip loading any embedded or linked style sheet.
See the following links in the documentation for further information:
· Paragraph
· Paragraph.ParagraphFormat
· LoadOptions.ResourceLoadingCallback
Paragraph style and formatting can be imported from HTML in the form of tags such as <h1> to <h6> or from <p> tags that have CSS styles.
<h1> to <h6> tags are imported into the Aspose.Words DOM as the built-in Heading styles: Heading 1 - Heading 6.
Inline CSS (through use of the style attribute) is imported as direct formatting on the paragraph (stored in the ParagraphFormat of the Paragraph node).
An Embedded or Linked CSS style (through use of the class attribute) is imported as a Style and applied to the Paragraph node in the document. This style formatting can be accessed using the ParagraphFormat.Style property. A linked CSS sheet can also be downloaded automatically from an external address on the internet.
When inline and embedded/external CSS contain conflicting formatting, the conflict is resolved as in CSS: inline styles take precedence, then embedded styles, and finally external styles.
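The precedence order described above can be sketched as follows. This is an illustrative model only, not the Aspose.Words implementation; the function name `resolve_formatting` and the use of plain dictionaries for CSS declarations are assumptions made for the example.

```python
from typing import Dict


def resolve_formatting(
    inline: Dict[str, str],
    embedded: Dict[str, str],
    external: Dict[str, str],
) -> Dict[str, str]:
    """Merge CSS declarations so that inline beats embedded beats external."""
    resolved = dict(external)   # lowest priority
    resolved.update(embedded)   # overrides external
    resolved.update(inline)     # highest priority, overrides both
    return resolved


result = resolve_formatting(
    inline={"color": "red"},
    embedded={"color": "blue", "text-align": "center"},
    external={"color": "green", "text-align": "left", "margin-left": "10pt"},
)
print(result)
```

Here "color" comes from the inline style, "text-align" from the embedded style sheet, and "margin-left" from the external one, because no higher-priority source defines it.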
Feature |
Supported |
Comment |
See Also |
Paragraph Style |
Yes |
Styles are imported from embedded or external style sheets. If there is no linked style sheet of either of these kinds then the document is imported with no styles (apart from default Normal style). To make sure styles are imported use a style sheet of any kind. There is a load option to control whether embedded or external style sheets are read or skipped during HTML import. There is also an option to supply your own CSS style sheet instead. |
· ParagraphFormat · ParagraphFormat.Style |
Alignment |
Yes |
Imported from the "text-align" paragraph style attribute. |
· ParagraphFormat.Alignment |
Right to Left Paragraph |
Planned |
|
· ParagraphFormat.Bidi |
Bullets and Numbers |
Yes |
Imported from <ol>, <ul>, <li> tags. Simulated lists using <p> and <span> look correct but will not be imported as proper lists in the DOM. |
· ParagraphFormat.ListFormat · ParagraphFormat.ListLabel |
Outline Level |
Planned |
|
· ParagraphFormat.OutlineLevel |
Run Properties for the Paragraph Mark |
Planned |
Can be implemented with Microsoft Office specific techniques. During import the formatting from the last span from <p> becomes the font properties for the paragraph. |
· ParagraphFormat.ParagraphBreakFont |
Suppress Line Numbers |
Planned |
|
· ParagraphFormat.SuppressLineNumbers |
Suppress Hyphenation |
Planned |
|
· ParagraphFormat.SuppressAutoHyphens |
Feature |
Supported |
Comment |
See Also |
Left Indent |
Yes |
Imported from margin-left on style attribute. |
· ParagraphFormat.LeftIndent |
Right Indent |
Yes |
Imported from margin-right on style attribute. |
· ParagraphFormat.RightIndent |
First Line Indent |
Yes |
Imported from text-indent on style attribute. |
· ParagraphFormat.FirstLineIndent |
Hanging Indent |
Yes |
Imported from a combination of margin-left and text-indent style attribute. |
· ParagraphFormat.FirstLineIndent |
Mirror Indents |
N/A |
|
|
Automatically Adjust Right Indent |
N/A |
|
|
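The indent mapping in the table above can be modeled with a short sketch. It assumes all lengths are given in points and that a negative text-indent denotes a hanging indent; the `Indents` class and the `indents_from_style` function are hypothetical names, not part of the Aspose.Words API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Indents:
    left: float        # from margin-left
    right: float       # from margin-right
    first_line: float  # from text-indent; negative means hanging (assumption)

    @property
    def is_hanging(self) -> bool:
        return self.first_line < 0


def _points(value: str) -> float:
    """Parse a CSS length given in points, e.g. '36pt'."""
    value = value.strip().lower()
    if not value.endswith("pt"):
        raise ValueError("only 'pt' lengths are handled in this sketch: %r" % value)
    return float(value[:-2])


def indents_from_style(style: Dict[str, str]) -> Indents:
    """Map CSS margin and text-indent declarations to paragraph indents."""
    return Indents(
        left=_points(style.get("margin-left", "0pt")),
        right=_points(style.get("margin-right", "0pt")),
        first_line=_points(style.get("text-indent", "0pt")),
    )


hanging = indents_from_style({"margin-left": "36pt", "text-indent": "-18pt"})
print(hanging, hanging.is_hanging)
```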
Feature |
Supported |
Comment |
See Also |
Space Before |
Yes |
Imported from "margin-top" style attribute. If this attribute is missing from a paragraph during import from HTML then Space Before is set to Auto. |
· ParagraphFormat.SpaceBefore |
Space After |
Yes |
Imported from "margin-bottom" style attribute. If this attribute is missing from a paragraph during import from HTML then Space After is set to Auto. |
· ParagraphFormat.SpaceAfter |
Space Auto |
Yes |
Paragraphs imported from HTML without margin-top or margin-bottom style attributes are imported as Auto spacing by default. |
· ParagraphFormat.SpaceBeforeAuto · ParagraphFormat.SpaceAfterAuto |
Line Spacing |
Yes |
Imported from "line-height" style attribute. |
· ParagraphFormat.LineSpacing · ParagraphFormat.LineSpacingRule |
No Space between Conforming Paragraphs |
Planned |
|
· ParagraphFormat.NoSpaceBetweenParagraphsOfSameStyle |
Snap To Grid |
Planned |
|
|
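The spacing rules in the table above (an explicit margin gives a fixed spacing value, a missing margin gives Auto spacing) can be sketched as follows. This is a simplified model that handles only lengths in points; `Spacing` and `spacing_from_style` are hypothetical names used for illustration.

```python
from typing import Dict, NamedTuple, Optional


class Spacing(NamedTuple):
    space_before: Optional[float]  # points; None when spacing is automatic
    space_before_auto: bool
    space_after: Optional[float]   # points; None when spacing is automatic
    space_after_auto: bool


def _points(value: str) -> float:
    """Parse a CSS length given in points, e.g. '12pt'."""
    value = value.strip().lower()
    if not value.endswith("pt"):
        raise ValueError("only 'pt' lengths are handled in this sketch: %r" % value)
    return float(value[:-2])


def spacing_from_style(style: Dict[str, str]) -> Spacing:
    """A missing margin-top or margin-bottom maps to Auto spacing."""
    top = style.get("margin-top")
    bottom = style.get("margin-bottom")
    return Spacing(
        space_before=None if top is None else _points(top),
        space_before_auto=top is None,
        space_after=None if bottom is None else _points(bottom),
        space_after_auto=bottom is None,
    )


print(spacing_from_style({"margin-top": "12pt"}))
print(spacing_from_style({}))
```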
Feature |
Supported |
Comment |
See Also |
Widow/Orphan Control |
Yes |
Imported from the "widows" CSS attribute. A value of 0 is imported as Widow/Orphan control disabled. A value of 1 or greater is imported as enabled. A paragraph without this attribute is automatically given Widow/Orphan control in the model. |
· ParagraphFormat.WidowControl |
Keep With Next |
Yes |
Imported from style attribute with "page-break-after:avoid". |
· ParagraphFormat.KeepWithNext |
Keep Lines Together |
Yes |
Imported from style attribute with "page-break-inside:avoid". |
· ParagraphFormat.KeepTogether |
Page Break Before |
Yes |
Imported from "page-break-before" on style attribute. |
· ParagraphFormat.PageBreakBefore |
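The pagination rules in the table above can be summarized in a small sketch. It covers only the values the table states explicitly ("widows", "page-break-after:avoid" and "page-break-inside:avoid"); the function name `pagination_from_style` and the result keys are hypothetical.

```python
from typing import Dict


def pagination_from_style(style: Dict[str, str]) -> Dict[str, bool]:
    """Derive pagination flags from CSS declarations on a paragraph."""
    widows = style.get("widows")
    return {
        # Missing attribute: enabled by default; 0: disabled; 1 or more: enabled.
        "widow_control": True if widows is None else int(widows.strip()) >= 1,
        "keep_with_next": style.get("page-break-after", "").strip().lower() == "avoid",
        "keep_together": style.get("page-break-inside", "").strip().lower() == "avoid",
    }


print(pagination_from_style({}))
print(pagination_from_style({"widows": "0", "page-break-after": "avoid"}))
```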
These are the legacy text frames from Word 97, not to be confused with the AutoShape text box, which is discussed under Drawing Objects.
Text frames are preserved in the model but there is no API or node to modify or access information about frames.
Frames are exported to HTML as paragraphs surrounded by a border.
These are round-tripped back to a document with similar formatting but not as actual text frames.
Feature |
Supported |
Comment |
See Also |
Text Frames |
Planned |
|
|
All features of tab stops are supported in Aspose.Words except for relative tab stops.
Using Aspose.Words you can find tab stops by position or index. You can change tab stop features such as position and alignment, or remove tab stops completely.
Tab stops are not natively available in HTML, so Aspose.Words exports the spacing as a set of non-breaking spaces. These cannot be imported back as tab stops.
In future improvements, Aspose.Words will convert tab stops to a fixed space, which should allow a proper round-trip. In the same way, we will also provide support for the Microsoft Office mso-tab-count attribute.
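Until then, text that was round-tripped through HTML can be post-processed to turn runs of non-breaking spaces back into tab characters. The sketch below shows one possible approach on a plain string. The number of non-breaking spaces that stands for one tab is not stated here, so `spaces_per_tab` is an assumed, adjustable parameter, and `restore_tabs` is a hypothetical helper name.

```python
import re

NBSP = "\u00a0"  # non-breaking space


def restore_tabs(text: str, spaces_per_tab: int = 4) -> str:
    """Replace runs of non-breaking spaces with tab characters.

    Every full group of `spaces_per_tab` non-breaking spaces becomes one tab;
    leftover non-breaking spaces in the run are kept. Shorter runs are untouched.
    """
    pattern = re.compile(NBSP + "{%d,}" % spaces_per_tab)

    def to_tabs(match):
        tabs, rest = divmod(len(match.group(0)), spaces_per_tab)
        return "\t" * tabs + NBSP * rest

    return pattern.sub(to_tabs, text)


line = "Name" + NBSP * 4 + "Value"
print(repr(restore_tabs(line)))
```

The same replacement could be applied to the text of each run after loading the document, at the cost of also converting any genuine sequences of non-breaking spaces.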
See the following link in the documentation for further information:
· ParagraphFormat.TabStops
Feature |
Supported |
Comment |
See Also |
Absolute Position |
Planned |
|
· TabStop.Position |
Relative Position |
Planned |
A relative position tab can be inserted in Microsoft Word using the "Insert Alignment Tab" button. This type of tab is relative to either the page margin or the indent of the paragraph. This allows tab stops to appear in the same relative place even when the position of the paragraph or page is modified. Currently Aspose.Words supports these types of tab stops in OOXML and WordML formats only. There is currently no API to retrieve the properties of this tab, e.g. RelativeTo, Alignment, Leader etc. Further support is planned. This feature might be supported during HTML import if a proper analog can be found. |
· AbsolutePositionTab |
Alignment: Left, Center, Right, Decimal, Bar |
Planned |
|
· TabStop.Alignment |
Leader |
Planned |
|
· TabStop.Leader |
Drop Caps are partially supported and preserved during document conversion. A drop cap is a text frame which is imported as a separate paragraph (from the rest of the paragraph as seen in the source document).
You can modify drop cap properties and position, however the new settings are not applied to the drop cap. You cannot yet create new drop caps (although you can easily simulate them through the use of a textbox).
This will be improved in a future version of Aspose.Words.
A drop cap is a frame. During import the appearance of a drop cap is round-tripped correctly; however, it is not imported as a proper drop cap and therefore its options cannot be modified.
See the following links in the documentation for further information:
· ParagraphFormat.DropCapPosition
· ParagraphFormat.LinesToDrop
Feature |
Supported |
Comment |
See Also |
Drop Caps |
Yes |
|
|
Borders are imported from border-style, border-width etc. on the style attribute, or from individual borders using the border-xxx-style, border-xxx-width etc. style attributes.
A div with embedded or linked CSS containing a border style has all of the paragraphs and spans inside the div imported with full borders. This will be improved in a future version.
Feature |
Supported |
Comment |
See Also |
Border Sides |
Yes |
|
· ParagraphFormat.Borders · LineStyle |
Shadow |
Planned |
|
· Border.Shadow |
3D Frame |
Planned |
|
· Border.LineStyle |
Style |
Yes |
|
· Border.LineStyle |
Color |
Yes |
|
· Border.Color |
Width |
Yes |
|
· Border.LineWidth |
Distance from Text |
Yes |
Imported from "padding-xxx" settings. |
· Border.DistanceFromText |
Fill color imported from "background-color" on style attribute.
Currently cell background is imported as paragraph shading. This will be improved in a future version of Aspose.Words.
See the following link in the documentation for further information:
· ParagraphFormat.Shading
Feature |
Supported |
Comment |
See Also |
Shading |
Yes |
|
|
Asian Typography settings are fully supported during conversion. However, there is currently no API to access or modify these settings.
Feature |
Supported |
Comment |
See Also |
Use Asian Rules for Controlling First and Last Characters |
Planned |
|
|
Allow Latin Text to Wrap in the Middle of a Word |
Planned |
|
|
Allow Hanging Punctuation |
Planned |
|
|
Allow Punctuation at Start of a Line to Compress |
Planned |
|
|
Automatically Adjust Space between Asian and Latin Text |
Planned |
|
|
Automatically Adjust Space between Asian Text and Numbers |
Planned |
|
|
Text Vertical Alignment |
Planned |
|
|
In the Aspose.Words DOM all text is represented in the form of Run nodes. A single Run contains not only the string of text but also complex properties which describe how the text appears and behaves in the document. All characters in a Run have identical formatting.
Using Aspose.Words you can insert, move, and remove runs. You can also access and modify all properties of a run.
All formatting of a run is contained within a linked class called Font.
Text content is imported from any text area found in the HTML document. The formatting of text elements is imported from <span> elements.
Aspose.Words supports reading the text content even if the input HTML is not properly formed.
See the following links in the documentation for further information:
· Run
· Run.Font
· Run.Text
Feature |
Supported |
Comment |
See Also |
Western Languages |
Yes |
|
|
East European Languages |
Yes |
|
|
East Asian Languages |
Yes |
|
|
Right to Left Languages |
Yes |
Imported from dir attribute on span. |
· Font.Bidi · Font.BoldBi · Font.LocaleIdBi |
Carriage Return (not a Paragraph Break) |
Yes |
|
|
Non Breaking Space |
Yes |
Imported from the "&nbsp;" entity code. |
· ControlChar.NonBreakingSpace |
Non Breaking Hyphen |
Planned |
|
· ControlChar.NonBreakingHyphen |
Soft Hyphen |
Planned |
This type of hyphen is referred to as an "Optional Hyphen" in Microsoft Word documents. |
· ControlChar.OptionalHyphen |
Symbol |
Yes |
|
|
Tab |
Planned |
There is no equivalent of a tab in HTML documents. Tabs are currently exported as a series of non-breaking spaces. These are imported back as a series of non-breaking spaces, but there is a workaround to replace these with proper tabs. It is planned to import the special Microsoft Word mso attribute to properly import tab sequences. |
· ControlChar.Tab |
Feature |
Supported |
Comment |
See Also |
Line Break |
Yes |
Imported from <br> element. |
· ControlChar.LineBreak |
Line Break Clear Type |
Yes |
|
|
Page Break |
Yes |
Imported from <br style="page-break-before:always; clear:both"> |
· ControlChar.PageBreak |
Column Break |
Yes |
Imported using the Microsoft Office attribute on break: style="mso-column-break-before:always" |
· ControlChar.ColumnBreak |
Feature |
Supported |
Comment |
See Also |
Character Style |
Yes |
Character style is imported from either inline CSS (the style attribute) or an embedded or linked CSS style sheet (the class attribute) on span elements. Inline CSS is imported as direct formatting on the text (stored in the Font of the Run node). An embedded or linked CSS style is imported as a Style and applied to the Run node in the document. This formatting can be accessed using the Run.Font.Style property. A linked CSS sheet can also be downloaded from an external address on the internet. When inline and embedded/external CSS contain conflicting formatting, the conflict is resolved as in CSS: inline styles take precedence, then embedded styles, and finally external styles. Styles are imported from embedded or external style sheets. If there is no style sheet of either kind, the document is imported with no styles (apart from the default Normal style). To make sure styles are imported, use a CSS style sheet of any kind. There is a load option to control whether embedded or external style sheets are read or skipped during HTML import. There is also an option to supply your own CSS style sheet instead. |
· Font.Style · LoadOptions.ResourceLoadingCallback |
Color |
Yes |
Imported from color on style attribute. |
· Font.Color |
East Asian Typography |
Planned |
|
|
Highlight Color |
Planned |
Highlight is imported as a solid pattern. Can be made to round-trip with some research. |
· Font.HighlightColor |
Language |
Yes |
Imported from lang attribute on <span>. If this attribute is missing then the default language for the document is used. |
· Font.LocaleId · Font.LocaleIdBi |
Do not Check Spelling or Grammar |
Planned |
|
· Font.NoProofing |
Border |
Yes |
Imported from border-style, border-width, border-color on <span>. |
· Font.Border |
Shading |
Yes |
Imported from background-color on <span>. Imported into the model as solid pattern on Run. |
· Font.Shading |
Bold and italic are imported from font-weight:bold and font-style:italic on the style attribute.
This formatting can also be imported from simple tags, e.g. <b></b>, <i></i>.
All other font formatting is imported from standard CSS properties on the "style" attribute.
The <pre> tag is imported as text formatted with the "Courier New" font.
There is an option to control how font size is exported. Font size can be exported in points or in em units. The latter allows browsers to resize fonts automatically by increasing or decreasing the font size.
See the following links in the documentation for further information:
· Font.Bold
· Font.Italic
· Font.Name
· Font.NameFarEast
Feature |
Supported |
Comment |
See Also |
Font |
Yes |
|
|
Imported from style attribute "text-decoration:underline" or from <u></u> tags.
Import of underline color is currently not supported.
See the following link in the documentation for further information:
· Font.Underline
Feature |
Supported |
Comment |
See Also |
Underline Type |
N/A |
|
|
Underline Color |
Planned |
Can be imported from a bottom border with different color from text. |
· Font.UnderlineColor |
See the following link in the documentation for further information:
· Font
Feature |
Supported |
Comment |
See Also |
Animated Effect |
N/A |
|
|
Double Strikethrough |
N/A |
|
|
Strikethrough |
Yes |
Imported from text-decoration:line-through on <span> style. |
· Font.StrikeThrough |
Subscript/Superscript |
Yes |
Imported from vertical-align:sub and vertical-align:super on <span> style. |
· Font.Subscript · Font.Superscript |
Shadow |
N/A |
|
|
Outline |
N/A |
|
|
Emboss |
N/A |
|
|
Imprint (Engrave) |
N/A |
|
|
Small Caps |
Yes |
Imported from style="font-variant:small-caps". |
· Font.SmallCaps |
All Caps |
Yes |
Imported from style="text-transform:uppercase". |
· Font.AllCaps |
Hidden Text |
Yes |
Imported from style="display:none". |
· Font.Hidden |
Special Hidden |
Planned |
|
|
Web Hidden |
Planned |
|
|
Feature |
Supported |
Comment |
See Also |
Scale |
N/A |
|
|
Expanded/Compressed |
Yes |
Imported from style="letter-spacing:XXXpt". |
· Font.Spacing |
Vertical Position |
Yes |
Imported from "vertical-align:XXXpt". |
· Font.Position |
A table is comprised of rows and cells and is used to display data in a grid-like layout.
Aspose.Words supports imports of tables from all loaded formats including Microsoft Word, Open Office and HTML documents.
A table is represented in Aspose.Words by a Table node. Each row of the table is represented by a separate Row node. Likewise, each cell of the row is represented by a Cell node. Each node type has its own formatting properties which control the table's appearance and behavior.
· Table contains the properties for controlling the formatting of a table as a whole.
· Each Row provides a RowFormat object which contains the properties that control formatting for that particular row.
· Each Cell has a CellFormat object which provides properties to control the formatting of each cell.
Using Aspose.Words you can access and modify all features and formatting of a table along with creating new tables and removing existing ones from the document.
Note that some elements of a table may be wrapped with Markup nodes such as CustomXmlMarkup or StructuredDocumentTag nodes.
A table is imported from source HTML from <table> and other applicable tags.
Currently CSS styles are not imported from table, tr and td elements on import. These features are planned. In the meantime, you can define inline styles on td elements.
See the following links in the documentation for further information:
· Table
Feature |
Supported |
Comment |
See Also |
Nested Tables |
Yes |
|
|
Right To Left Tables |
Yes |
|
· Table.Bidi |
Table Style |
Planned |
Table styles are supported in model and during conversion. A table style can be applied or removed from tables. Only in-built or table styles already in the document can be applied - there is currently no support for creating new table styles. There are plans to import CSS style on table as a Table Style. |
· Table.Style · Table.StyleIdentifier |
Conditional Formatting Style |
N/A |
|
|
Table Alignment |
Yes |
Imported as a table wrapped inside a <div> formatted with text-align. |
· Table.Alignment |
Table Indent |
Planned |
Will be imported from margin-left:XXX on table. Currently this property is skipped. |
· Table.LeftIndent |
Allow AutoFit |
Planned |
Can be imported from "table-layout:fixed" attribute. |
· Table.AllowAutoFit |
Default Cell Margins |
Planned |
Can be imported from "padding-left", "padding-right" etc. style attributes on table. |
· Table.LeftPadding · Table.RightPadding · Table.BottomPadding · Table.TopPadding |
Default Cell Spacing |
Planned |
Can be imported from "spacing" style attribute on table. |
· Table.CellSpacing |
Preferred Table Width |
Yes |
Preferred width on table can be set to absolute (points), relative (percent) or auto setting. Imported from width as relative (percent) or absolute (point) width from <table>. |
· Table.PreferredWidth |
Table Shading |
Yes |
Imported from background-color style attribute on all cells in the table. |
· Table.SetShading |
Hidden |
N/A |
|
|
Floating tables are supported during import and export. However there is currently no API to access or modify the floating position of a table.
Floating tables are imported as inline.
Feature |
Supported |
Comment |
See Also |
Floating Tables |
Planned |
|
|
Table borders are stored in the rows of the table. This mimics the structure of an OOXML document.
If you try to set borders or shading on a table without any rows then an exception will be thrown. Add at least one row first.
Borders are imported from each cell from style attribute border-XXX-style, border-XXX-color etc.
See the following links in the documentation for further information:
· Table.SetBorders
· Table.ClearBorders
· RowFormat.Borders
Feature |
Supported |
Comment |
See Also |
Table Borders |
Planned |
|
|
Feature |
Supported |
Comment |
See Also |
Allow Break Across Pages |
Planned |
|
· Keeping Tables and Rows from Breaking across Pages · RowFormat.AllowBreakAcrossPages |
Repeat as Header Row |
Planned |
Will be imported from <thead> and <th> elements. Currently content from such elements is still imported properly but not as header rows. |
· Specifying Rows to Repeat on Subsequent Pages as Header Rows · RowFormat.HeadingFormat |
Height |
Yes |
Imported from "height" in the style attribute on <tr>. Row height is read only from <tr> and not from <td> cells. |
· RowFormat.Height |
Height Rule |
Planned |
A row without any height is imported as "Auto" height rule. A row with height defined is imported as "At Least". |
· RowFormat.HeightRule |
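The height rule described above (no height gives "Auto", a defined height gives "At Least") can be expressed as a small sketch. It handles only lengths in points; `row_height_rule` is a hypothetical helper, and the strings "Auto" and "AtLeast" stand in for the corresponding height rule values.

```python
from typing import Dict, Optional, Tuple


def _points(value: str) -> float:
    """Parse a CSS length given in points, e.g. '20pt'."""
    value = value.strip().lower()
    if not value.endswith("pt"):
        raise ValueError("only 'pt' lengths are handled in this sketch: %r" % value)
    return float(value[:-2])


def row_height_rule(tr_style: Dict[str, str]) -> Tuple[str, Optional[float]]:
    """Return (height rule, height in points) for a <tr> style."""
    height = tr_style.get("height")
    if height is None:
        return ("Auto", None)
    return ("AtLeast", _points(height))


print(row_height_rule({}))
print(row_height_rule({"height": "20pt"}))
```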
Feature |
Supported |
Comment |
See Also |
Cell Margins |
Yes |
Imported from padding-XXX on <td> elements. |
· CellFormat.TopPadding · CellFormat.BottomPadding · CellFormat.LeftPadding · CellFormat.RightPadding |
Borders |
Yes |
Imported from <td> style attribute border-XXX-style, border-XXX-width etc. |
· CellFormat.Borders |
Shading |
Yes |
Imported from "background-color" style attribute on <td>. Note that the background-image attribute is not supported, as a Cell in a Microsoft Word document does not have a corresponding feature. Instead, consider applying the background image to the paragraph inside the cell in the HTML document. |
· CellFormat.Shading |
Wrap Text |
Planned |
|
· CellFormat.WrapText |
Fit Text |
Planned |
|
· CellFormat.FitText |
Preferred Width |
Yes |
Imported from style attribute width from cells as either relative (percent) or fixed (points). |
· CellFormat.PreferredWidth |
Merged Horizontally |
Yes |
Imported from "colspan" attribute on <td>. |
· CellFormat.HorizontalMerge |
Merged Vertically |
Yes |
Imported from "rowspan" attribute on <td>. |
· CellFormat.VerticalMerge |
Vertical Alignment |
Yes |
Imported from vertical-align attribute on cell. |
· CellFormat.VerticalAlignment |
Text Direction |
Yes |
Imported from "writing-mode" style attribute. |
· CellFormat.Orientation |
Custom markup consists of elements added to parts of the document that allow extra information to be embedded within that particular document feature.
For example, CustomXML markup can be wrapped around a paragraph in the document and user-defined data added to it. This data can then be retrieved from that paragraph when required.
It is planned to import custom tags from HTML as CustomXML around document elements in the document.
Represented in Aspose.Words DOM as a CustomXmlMarkup node.
You can create and remove CustomXmlMarkup in a document. You can also access the properties of the XML markup node.
See the following link in the documentation for further information:
· CustomXmlMarkup
Feature |
Supported |
Comment |
See Also |
CustomXML |
Planned |
|
|
Feature |
Supported |
Comment |
See Also |
Content Controls (Structured Document Tags) |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Smart Tag Properties |
N/A |
|
|
Sections allow you to divide a document into parts so that page formatting, headers and footers apply only to that part of the document. This allows, for example, different parts of the document to have completely different page sizes or page orientations.
A section is represented as a Section node in the Aspose.Words model.
Aspose.Words supports the creation and deletion of sections in a document, along with accessing and modifying all section properties.
Sections are imported from <div> elements. Section-wide formatting is imported through linked CSS on <div>.
See the following links in the documentation for further information:
· Section
· Document.Sections
Each Header and Footer in a document is stored per section. Each header or footer is imported into Aspose.Words as a HeaderFooter node. This node is always a child of a Section.
Most documents have header or footer content represented by the primary header or footer. This displays content on all pages of the section. There are also different types of headers and footers to display different content on the first page or on even/odd pages of the section.
There can be up to three different types of headers and three different types of footers per section. You can have only one header or footer of each type per section.
In Aspose.Words this is represented by HeaderFooter nodes of different types. The different types are:
· HeaderFirst
· HeaderPrimary
· HeaderEven
· FooterFirst
· FooterPrimary
· FooterEven
There is a save option that controls how headers and footers are output.
Header and footer content is not round-tripped; instead, after importing from HTML it appears in the document body. There are plans to properly support this in the future.
If an embedded or external style sheet is used when saving the HTML, regular paragraphs in the header or footer are exported with the "Header" or "Footer" style. These can be used to reconstruct a proper header or footer in the document.
Import of external headers and footers (stored in a separate file), as Microsoft Word exports them, is currently unsupported.
See the following links in the documentation for further information:
· Section.HeadersFooters
· PageSetup.DifferentFirstPageHeaderFooter
· PageSetup.OddAndEvenPageHeaderFooter
· HeaderFooterCollection.LinkToPrevious
Feature |
Supported |
Comment |
See Also |
Different First Page |
Planned |
|
|
Different Even and Odd Pages |
Planned |
Note that setting a Microsoft Word document to display different even and odd headers and footers applies to the entire document. If you set this option in Microsoft Word then all sections follow this rule. Even though this is a document-wide setting, in Aspose.Words this property appears per section as a PageSetup property. Changing this property affects all sections in the document. |
|
Continue from Previous Section |
Planned |
In a Microsoft Word document a header or footer can be linked to the previous section. This means the headers and footers from the section before are displayed for this section as well. In some cases you can check this by using the HeaderFooter.LinkedToPrevious property. In Aspose.Words, the different situations are represented in the model as follows: · If a document has no headers or footers of a certain type, then no Section node contains a child HeaderFooter of that type. · If a header or footer is not linked to the previous section (the header or footer is different from the previous section), then the Section node has its own HeaderFooter node of that type. This holds for each type of header or footer that is not linked in the Section. · If a header or footer is linked to the previous section, then there is no header or footer of that type in the current section. This means that a section that appears to have no header or footer nodes can still display headers and footers, because they come from previous sections. Check the HeaderFooter.LinkedToPrevious property. · If a header or footer is not linked to the previous section but is simply blank, then there is a header or footer in that section, but it contains no content (no runs). You can link or unlink headers and footers from previous sections by using the HeaderFooter.LinkToPrevious method. If you unlink a header or footer from the previous section using Microsoft Word, the content from the previous header or footer is copied over. In Aspose.Words, however, the header or footer is unlinked but left blank. You can copy the content from the previous section if required. Note that you can choose to unlink all headers and footers of all types or just a particular type. For example, the primary header can be different while the primary footer is still linked to the previous section. |
|
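The linking rules in the "Continue from Previous Section" row can be illustrated with a small stand-alone model. This is only a sketch in plain Python and does not use the Aspose.Words API: `SectionModel` and `effective_header_footer` are hypothetical names, and a missing entry is assumed to mean "linked to the previous section".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

HEADER_FOOTER_TYPES = (
    "HeaderFirst", "HeaderPrimary", "HeaderEven",
    "FooterFirst", "FooterPrimary", "FooterEven",
)


@dataclass
class SectionModel:
    """Hypothetical stand-in for a Section node.

    Maps a header/footer type to its text. A missing key models a header or
    footer that is linked to the previous section (no node of that type).
    """
    headers_footers: Dict[str, str] = field(default_factory=dict)


def effective_header_footer(
    sections: List[SectionModel], index: int, hf_type: str
) -> Optional[str]:
    """Return the text actually displayed for hf_type in sections[index].

    Walks backwards until a section owns a header/footer of that type.
    Returns None when no section up to index defines one.
    """
    if hf_type not in HEADER_FOOTER_TYPES:
        raise ValueError(f"unknown header/footer type: {hf_type}")
    for i in range(index, -1, -1):
        owned = sections[i].headers_footers
        if hf_type in owned:
            return owned[hf_type]  # may be "" for a blank, unlinked header
    return None
```

In this model a section that owns an empty string displays a blank header, while a section with no entry inherits whatever the nearest earlier section defines.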
The different section break types are imported from a <br> tag that contains the special Microsoft Office attribute mso-break-type:section-break.
See the following links in the documentation for further information:
· PageSetup.SectionStart
· DocumentBuilder.InsertBreak
Feature |
Supported |
Comment |
See Also |
Continuous |
Yes |
Imported as <br> with page-break-before:auto. |
|
Even Page |
Yes |
Imported as <br> with page-break-before:left. |
|
Odd Page |
Yes |
Imported as <br> with page-break-before:right. |
|
Next Column |
Yes |
Imported as <br> with mso-column-break-before:always. |
|
Next Page |
Yes |
Imported as <br> with page-break-before:always. |
|
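The mapping in the table above can be sketched as a small lookup. This is an illustration in plain Python, not the importer itself: the function names are hypothetical, and the way the properties combine on a single <br> (the marker property plus one break property) is an assumption based on the table.

```python
from typing import Dict, Optional

# CSS page-break-before value -> section break type, as listed in the table.
PAGE_BREAK_TO_SECTION = {
    "auto": "Continuous",
    "left": "Even Page",
    "right": "Odd Page",
    "always": "Next Page",
}


def parse_style(style: str) -> Dict[str, str]:
    """Split an inline style string into a {property: value} dictionary."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def section_break_type(style: str) -> Optional[str]:
    """Return the section break type for a <br> style, or None.

    None is returned when the style does not mark a section break or
    carries no recognized break property.
    """
    props = parse_style(style)
    if props.get("mso-break-type") != "section-break":
        return None
    if props.get("mso-column-break-before") == "always":
        return "Next Column"
    return PAGE_BREAK_TO_SECTION.get(props.get("page-break-before"))
```

For example, a style of `page-break-before:left; mso-break-type:section-break` resolves to an Even Page section break in this sketch.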
HTML and EPUB have no native support for text columns.
Support for this feature may be possible in a future version using CSS3 features for HTML and EPUB 3.0 features for EPUB.
Feature |
Supported |
Comment |
See Also |
Text Columns |
N/A |
|
|
Paper size and margins are imported from size and margin attributes on each section (imported from div elements).
See the following links in the documentation for further information:
· PageSetup
· PageSetup.LeftMargin
· PageSetup.FooterDistance
· PageSetup.Gutter
Feature |
Supported |
Comment |
See Also |
Page Margins |
Yes |
|
|
Feature |
Supported |
Comment |
See Also |
Page Numbering |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Right to Left Section |
Planned |
|
· PageSetup.Bidi |
Line Numbering |
Planned |
|
· PageSetup.LineNumberCountBy · PageSetup.LineNumberDistanceFromText · PageSetup.LineNumberRestartMode · PageSetup.LineStartingNumber |
Paper Source |
Planned |
|
· PageSetup.FirstPageTray · PageSetup.OtherPageTray |
Paper Size |
Yes |
|
· PageSetup.PaperSize |
Orientation |
Yes |
Currently the imported paper size depends on the orientation, because width and height are switched. |
· PageSetup.Orientation |
Protection |
N/A |
|
|
Text Direction |
Planned |
|
|
Vertical Alignment |
N/A |
|
|
Asian Document Grid |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Chapter Numbering |
N/A |
|
|
HTML does not have any "page" concept so no page border is imported.
Feature |
Supported |
Comment |
See Also |
Page Border |
N/A |
|
|
A style allows you to define a set of formatting that can be reused on many elements in a document. This saves time and allows for more consistent formatting throughout your document.
A style loaded into a document is represented in the Aspose.Words DOM by the Style class. You can access or modify any type of style (both built-in and custom) in a document.
You can also create a new style from scratch (with the exception of table styles, which currently cannot be created). You can apply any style you want to document elements.
You currently cannot rename a style or remove an existing style from a document. Copying styles from one document to another is also unsupported; however, for the time being you can achieve this by copying a node that uses the style to another document. This copies the source style along with it.
Styles are imported from embedded or external CSS style sheets. Each selector is imported as a new Style in the Aspose.Words model. All styles are imported even if they are not actually used within the HTML body.
Style type is calculated based on the elements that the style is applied to. The appropriate style type is created from this.
The logic used when an external style sheet is encountered on document load can be controlled using IResourceLoadingCallback. Using this callback you can choose to download the external style sheet, skip loading it and avoid applying its styles, or specify your own style sheet to use instead.
See the following links in the documentation for further information:
· Document.Styles
· Style
· Style.Name
· IResourceLoadingCallback
Feature |
Supported |
Comment |
See Also |
Paragraph Style |
Yes |
Imported from "class" attribute on HTML paragraph elements. |
· StyleType.Paragraph |
Character Style |
Yes |
Imported from "class" attribute on span elements. |
· StyleType.Character |
List Style |
Planned |
|
· StyleType.List |
Table Style |
Planned |
|
· Table.Style · TableStyle · StyleType.Table |
Feature |
Supported |
Comment |
See Also |
Aliases |
Yes |
Aliases are exported as ordinary CSS classes. On subsequent import they produce independent styles. |
|
Based On |
Planned |
|
· Style.BaseStyleName |
Built-in Styles |
Yes |
Some built-in styles are imported from specific elements. For instance, the general <p> element maps to Normal, <h1> maps to Heading 1, and so on. |
· Style.BuiltIn · Style.StyleIdentifier |
Custom Styles |
Yes |
A new style is created for all other CSS styles in the HTML document. |
|
Linked Styles |
Planned |
|
|
Style Name |
Yes |
|
· Style.Name |
Next Style |
N/A |
|
|
Paragraph Properties |
Yes |
|
· Style.ParagraphFormat |
Run Properties |
Yes |
|
· Style.Font |
Bullets and Numbering |
Yes |
|
· Style.List · Style.ListFormat |
Feature |
Supported |
Comment |
See Also |
Paragraph Properties |
N/A |
|
|
Run Properties |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Apply Formatting to |
Planned |
|
· Table.StyleOptions |
Table Properties |
Planned |
|
· TableStyle |
Banding |
Planned |
|
· Table.StyleOptions |
Paragraph Properties |
Planned |
|
· TableStyle.ParagraphFormat |
Run Properties |
Planned |
|
· TableStyle.Font |
A list used in a document is actually made up of many complex parts. Lists and their properties are fully supported by Aspose.Words.
There are two main types of lists:
· Numbered (Ordered)
· Bullet (Unordered)
Most properties of lists are supported by Aspose.Words. You can create new lists, access and modify properties of existing lists. You currently cannot remove an existing list from a document.
In all import formats the list value is not stored with the document; it is calculated dynamically. Aspose.Words automatically calculates the values for all list paragraphs in the document, even for complex lists. You can retrieve this value through a property of the ListLabel class.
You can find which paragraphs a list is applied to and work with them manually. There are plans to allow retrieving a list in the document body as an object. You can remove list formatting from a paragraph; however, you cannot remove a list reference from a document.
Lists are imported in HTML from <ul> and <ol> tags. Nested lists of different types are also supported during import. A <li> tag must be wrapped in a <ul> or <ol> tag to be imported as a proper list item; otherwise it is imported as a regular paragraph. We will look into importing such elements on their own as list items in a future version.
Lists can also appear in HTML as ordinary paragraphs. These are imported correctly in terms of appearance, but are not read as proper List objects.
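Because an unwrapped <li> is imported as a regular paragraph, it can be useful to check the input before loading it. The following is a minimal sketch using only Python's standard `html.parser`; the class and function names are hypothetical and the check is independent of Aspose.Words.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ListItemChecker(HTMLParser):
    """Collects the positions of <li> tags that are not inside <ul> or <ol>."""

    def __init__(self) -> None:
        super().__init__()
        self.open_lists = 0                       # depth of open <ul>/<ol> elements
        self.orphans: List[Tuple[int, int]] = []  # (line, column) of orphan <li> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.open_lists += 1
        elif tag == "li" and self.open_lists == 0:
            self.orphans.append(self.getpos())

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.open_lists > 0:
            self.open_lists -= 1


def find_orphan_list_items(html: str) -> List[Tuple[int, int]]:
    """Return the source positions of <li> tags outside any list container."""
    checker = ListItemChecker()
    checker.feed(html)
    checker.close()
    return checker.orphans
```

An empty result suggests every <li> sits inside a list container; any reported position points at an item that would be imported as a plain paragraph according to the description above.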
See the following links in the documentation for further information:
· Paragraph.IsListItem
· Paragraph.ListFormat
· Paragraph.ListLabel
· List.ListLevels
Feature |
Supported |
Comment |
See Also |
Single Level |
Yes |
|
|
Multi Level |
Yes |
Some parts of multi-level lists will be imported as separate List objects. This can cause some formatting differences during conversion. This will be improved in a future version of Aspose.Words so that multi-level lists are imported as a single List object. |
· List.IsMultiLevel |
Name |
Planned |
|
|
Feature |
Supported |
Comment |
See Also |
Label Alignment |
Yes |
|
· ListLevel.Alignment |
Picture Bullet |
Planned |
Picture bullets are supported in the model; however, there is currently no way to set a new picture bullet for a list item. Consider creating a list with the picture bullet in the document using Microsoft Word first and then applying this list to the required paragraphs. Import is planned through the list-style-image attribute and other related attributes. |
|
Restart Level |
Yes |
|
· ListLevel.RestartAfterLevel |
Bullet Character |
Yes |
Imported from type attribute on <ul> tag. |
|
Label/Format String |
Yes |
|
· ListLabel.LabelString |
Number Format |
Yes |
Imported from "type" attribute on <ol>. |
· ListLevel.NumberFormat |
Paragraph Properties |
Planned |
|
|
Font Properties |
Planned |
|
· ListLevel.Font |
Linked Paragraph Style |
Planned |
|
· ListLevel.LinkedStyle |
Starting Value |
Yes |
Imported from start attribute on <ul> or <ol>. |
· ListLevel.StartAt |
Text After |
Planned |
|
· ListLevel.TrailingCharacter |
A footnote is a note that appears at the bottom of a page, and an endnote is a note that appears at the end of a section or document. Both are commonly used by writers to cite other authors' publications.
Using Aspose.Words you can interact with footnotes and endnotes and access or modify footnote-related properties, such as the location of the footnotes and when their numbering restarts.
Footnote and endnote markers are imported as hyperlinks. The content of these is separated at the bottom of the section with a horizontal rule.
There is a sample project which demonstrates how to convert this type of footnote import into proper footnotes again.
See the following links in the documentation for further information:
· Footnote
· Document.FootnoteOptions
Feature |
Supported |
Comment |
See Also |
Reference Mark |
N/A |
|
|
Custom Reference Mark |
N/A |
|
|
Custom Separator |
N/A |
|
|
Continuation Separator Mark |
N/A |
|
|
Document Wide Properties |
N/A |
|
|
Section Wide Properties |
N/A |
|
|
Number Format |
Planned |
|
|
Restart Location |
Planned |
|
|
Starting Value |
Planned |
|
|
Placement |
Planned |
|
· FootnoteOptions.Location |
Feature |
Supported |
Comment |
See Also |
Reference Mark |
N/A |
|
|
Custom Reference Mark |
N/A |
|
|
Custom Separator |
N/A |
|
|
Continuation Separator Mark |
N/A |
|
|
Document Wide Properties |
Planned |
|
|
Section Wide Properties |
N/A |
|
|
Number Format |
Yes |
|
· FootnoteOptions.NumberStyle |
Restart Location |
Planned |
|
· FootnoteOptions.RestartRule |
Starting Value |
Planned |
|
· FootnoteOptions.StartNumber |
Placement |
Planned |
|
|
Annotations allow the user to add extra information to the document, normally for use in review or collaboration.
These features are supported by Aspose.Words.
Bookmarks are imported as BookmarkStart and BookmarkEnd nodes. In Microsoft Word document formats a bookmark range can span large amounts of content, including different paragraphs and even tables.
In Aspose.Words the BookmarkStart node designates where the bookmarked region begins in the document. Likewise, the BookmarkEnd node designates where the bookmarked region ends.
You can access the bookmark as a "single entity" by using the Bookmark façade. You can add and remove bookmarks from a document and also set and get the text of the bookmark content.
Bookmark nodes are represented as inline nodes (children of a paragraph). Some bookmark markers in Word documents sit at levels of the document hierarchy other than inline. When they are imported into Aspose.Words they are moved to the closest inline position.
This normally causes no problems but some bookmarks on tables can appear differently when imported.
The Aspose.Words model is based on Word document formats. In these formats bookmark names must be unique. The model will allow bookmarks with the same name; however, all duplicates are removed automatically during export. Note that duplicate bookmarks can occur when you accidentally create a bookmark with the same name, or when documents that contain the same bookmark are joined together using the AppendDocument or InsertDocument methods.
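Since duplicates are dropped on export, it may help to detect name collisions before joining documents. The sketch below works on plain lists of bookmark names and is not tied to the Aspose.Words API; `duplicate_bookmark_names` is a hypothetical helper, and comparing names as exact, case-sensitive strings is an assumption.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_bookmark_names(*documents: Iterable[str]) -> List[str]:
    """Return the bookmark names that occur more than once.

    Each argument is the collection of bookmark names of one document.
    Names are compared as exact strings (an assumption of this sketch).
    """
    counts = Counter(name for names in documents for name in names)
    return sorted(name for name, count in counts.items() if count > 1)
```

A non-empty result indicates bookmarks that would collide after the documents are joined, so they can be renamed beforehand if both need to survive export.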
A bookmark is imported from an <a> element. The bookmark start and end appear in the same position. Nesting and overlapping of bookmarks are not allowed.
See the following links in the documentation for further information:
· Range.Bookmarks
· Bookmark
Feature |
Supported |
Comment |
See Also |
Bookmark Start |
Yes |
|
· BookmarkStart |
Bookmark End |
Planned |
|
· BookmarkEnd |
Bookmark Name |
Yes |
|
· Bookmark.Name |
Bookmark Table Columns |
N/A |
|
|
A comment in a document is imported as a Comment node in the Aspose.Words DOM.
The range of a comment can span over various parts of the document text, including over many paragraphs and tables.
In Aspose.Words this range is represented by the following nodes:
· Comment
· CommentRangeStart
· CommentRangeEnd
The CommentRangeStart and CommentRangeEnd nodes define the area of the document that the comment is applied to. The Comment node defines the actual content of the comment and provides members to access the comment properties such as Author and Time.
All three comment nodes are related through the use of the ID properties on each node.
There are plans to import comments from the HTML footnote element. This is how Microsoft Word exports comments so this will allow import of comments in documents saved using Microsoft Word.
See the following links in the documentation for further information:
· How to Extract or Remove Comments
· Comment
· Comment.Id
Feature |
Supported |
Comment |
See Also |
Comment |
Planned |
|
· Comment |
Comment Range |
Planned |
|
· CommentRangeStart · CommentRangeEnd |
Author |
Planned |
|
· Comment.Author |
Date |
Planned |
|
· Comment.Date |
Initial |
Planned |
|
· Comment.Initial |
Tracked changes are imported into the model as regular nodes. Paragraphs, Runs and Shapes all provide special properties to specify if they are insert or delete revisions.
You can work with each of these revisions manually or choose to accept all revisions at once. There is currently no API to reject changes.
Using Aspose.Words you can set tracked changes to be on or off. Note however that any changes made in the DOM using Aspose.Words are not recorded as tracked changes.
You may need to accept tracked changes before saving to different formats or else the deleted revisions will still show up in the output document.
Most revision types are properly round-tripped to the appropriate formats. Currently only Insert and Delete revisions are made available in the public API. Move revisions, some Table revisions, and formatting changes are unsupported.
These additional features will be included in a future version, as well as an API to easily retrieve revisions by author, date, etc.
Imported from <ins> and <del> elements.
See the following links in the documentation for further information:
· Document.HasRevisions
· Document.TrackRevisions
· Document.AcceptAllRevisions
Feature |
Supported |
Comment |
See Also |
On/Off State |
N/A |
|
|
Table Cell Deletion |
N/A |
|
|
Table Cell Insertion |
N/A |
|
|
Cell Merge or Split |
N/A |
|
|
Run Deletion |
Planned |
|
· Run.IsDeleteRevision |
Run Insertion |
Planned |
|
· Run.IsInsertRevision |
Paragraph Deletion |
Planned |
|
· Paragraph.IsDeleteRevision |
Paragraph Insertion |
Planned |
|
· Paragraph.IsInsertRevision |
Table Row Deletion |
N/A |
|
|
Table Row Insertion |
N/A |
|
|
Numbering Insertion |
N/A |
|
|
Numbering Change |
N/A |
|
|
Moves |
Planned |
Currently is imported as a pair of deletion and insertion revisions. |
|
Paragraph Properties Change |
N/A |
|
|
Run Properties Change |
N/A |
|
|
Section Properties Change |
N/A |
|
|
Table Properties Change |
N/A |
|
|
Cell Properties Change |
N/A |
|
|
Row Properties Change |
N/A |
|
|
RSIDs Session Identifiers |
N/A |
|
|
Fields are placeholders in the document that can be dynamically updated to display new information. The most common types of fields are MergeField and Page fields. The first allows you to merge data into a document; the latter displays the number of the page on which the field appears.
Aspose.Words supports almost all common field types and can perform field update on most field types, even ones with complex content. This includes the TOC (Table of Contents) field. With one call to Document.UpdateFields the TOC field or any other supported field is fully updated. New or existing fields are fully updated by the Aspose.Words field engine. There is a document option to control the culture/locale used during field update. This can be the language setting of the field in the document or the current culture/locale used by the application.
A field is represented in the document model as:
· FieldStart node.
· Run node(s) (represents the field code).
· FieldSeparator node.
· Other nodes (represents the field result) such as runs, shapes. A field can span across many different types of content. A field result can consist of other block level nodes such as Table or Paragraph.
· FieldEnd node.
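The node sequence above can be read back into a field code and a field result with a simple scan. This is a minimal sketch over plain tuples rather than real Aspose.Words nodes: the `(node type, text)` representation and `split_field` are hypothetical, and nested fields and non-run result content are deliberately not handled.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (node type, text); text is "" for marker nodes


def split_field(nodes: List[Node]) -> Tuple[str, str]:
    """Return (field code, field result) for one flat FieldStart..FieldEnd run.

    Runs before the FieldSeparator form the field code, runs after it form
    the field result. Nested fields are not supported in this sketch.
    """
    if len(nodes) < 2 or nodes[0][0] != "FieldStart" or nodes[-1][0] != "FieldEnd":
        raise ValueError("expected a sequence starting with FieldStart and ending with FieldEnd")
    code_parts: List[str] = []
    result_parts: List[str] = []
    target = code_parts
    for node_type, text in nodes[1:-1]:
        if node_type == "FieldSeparator":
            target = result_parts  # everything after the separator is the result
        elif node_type == "Run":
            target.append(text)
    return "".join(code_parts).strip(), "".join(result_parts)
```

A field without a separator simply yields an empty result, which matches a field that has a code but has not produced any displayed content.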
We provide the Field facade for working with this structure more easily. This allows you to easily find the field code and field result of a field. Currently you can only retrieve this facade while inserting a new field into the document. There are plans to introduce a new field API which allows you to get this facade from any field in the document.
Using Aspose.Words you can insert new fields, find and modify existing fields, and remove fields. You can also find the field code and field result of any field.
Currently, to work with a field you need to iterate through the different field nodes above. We plan to release the Field API soon, which will make such operations much easier.
Fields with custom field codes or field results (modified manually in the document to appear different) are retained during import and export. However if you invoke field update, these might be replaced with the proper field content.
Only form fields and hyperlinks are imported from HTML as dynamic fields.
Other fields are imported from HTML as plain text.
There are plans to make some fields round-trip capable back to Word document formats by adding extra markup to the output HTML.
There are also plans to support import of fields from HTML by allowing the user to define a custom syntax that is imported into the model as a working field.
See the following links in the documentation for further information:
· DocumentBuilder.InsertField
· Document.UpdateFields
· FieldType
Feature |
Supported |
Comment |
See Also |
Field Codes |
Yes |
|
|
Feature |
Supported |
Comment |
See Also |
CreateDate |
N/A |
|
|
Date |
N/A |
|
|
EditTime |
N/A |
|
|
PrintDate |
N/A |
|
|
SaveDate |
N/A |
|
|
Time |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Compare |
N/A |
|
|
DocVariable |
N/A |
|
|
GoToButton |
N/A |
|
|
If |
N/A |
|
|
MacroButton |
N/A |
|
|
|
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Author |
N/A |
|
|
Comments |
N/A |
|
|
DocProperty |
N/A |
|
|
FileName |
N/A |
|
|
FileSize |
N/A |
|
|
Info |
N/A |
|
|
Keywords |
N/A |
|
|
LastSavedBy |
N/A |
|
|
NumChars |
N/A |
|
|
NumPages |
N/A |
|
|
NumWords |
N/A |
|
|
Subject |
N/A |
|
|
Template |
N/A |
|
|
Title |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Formula |
N/A |
|
|
Advance |
N/A |
|
|
Eq |
N/A |
|
|
Symbol |
N/A |
|
|
Form fields are fully supported by Aspose.Words.
There is an option to export form fields as dynamic fields in HTML as <input> and <select> tags or to export them as plain text.
Only <input> and <select> tags are imported back as fields. The input types that have direct Microsoft Word analogs are imported as working form fields.
Radio and image input elements are imported as image shapes and are non-clickable.
Input elements marked as hidden or disabled are not imported.
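The import rules for form elements described above can be summarized as a small classifier. This is a sketch in plain Python, not the importer's actual logic: `classify_form_element` is a hypothetical helper, treating a missing type attribute as "text" follows the HTML default and is an assumption about the import, and input types not mentioned in this section are reported as unspecified.

```python
from typing import Dict, Optional


def classify_form_element(tag: str, attrs: Dict[str, Optional[str]]) -> str:
    """Describe how an HTML form element is imported, per the rules above."""
    tag = tag.lower()
    attrs = {name.lower(): value for name, value in attrs.items()}
    # HTML treats a missing or empty type attribute as "text".
    input_type = (attrs.get("type") or "text").lower()

    if tag == "select":
        return "DropDown"
    if tag != "input":
        return "not a form field"
    if "disabled" in attrs or input_type == "hidden":
        return "not imported"
    if input_type in ("text", "password", "file"):
        return "TextInput"
    if input_type == "checkbox":
        return "CheckBox"
    if input_type in ("radio", "image"):
        return "image shape (non-clickable)"
    return "unspecified"
```

The attribute dictionary mirrors what an HTML parser would hand over, so a valueless attribute such as `disabled` can be passed with a value of `None`.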
See the following links in the documentation for further information:
· FormField
· FormField.Type
· FormField.Result
Feature |
Supported |
Comment |
See Also |
TextInput |
Yes |
Imported from <input type="text | password | file" name="XXX" />. |
· FormField.TextInputDefault · FormField.TextInputFormat |
CheckBox |
Yes |
Imported from <input type="checkbox" name="XXX" /> |
· FormField.Type |
DropDown |
Yes |
Imported from <select name="XXX" />. Each item in the list is imported from <option> child elements. <optgroup> tag is not supported. Multiselect list attribute is ignored as there is no analog in Microsoft Word drop down lists. |
· FormField.DropDownItems · FormField.DropDownSelectedIndex |
Calc On Exit |
N/A |
|
|
Checked |
Planned |
Will be imported from the checked="checked" attribute on <input>. |
· FormField.Checked |
Default Value |
Yes |
With text form fields this is imported from the value="XXX" attribute on the <input> tag. With a drop-down list, this is imported from the <option> element that has the selected="selected" attribute. |
· FormField.TextInputDefault |
Enabled |
Planned |
The "disabled" attribute can be used here. |
· FormField.Enabled |
Entry and Exit Macro |
N/A |
|
|
Name |
Yes |
Imported from the name attribute on <input> or <select> element. |
· FormField.Name |
Help Text |
Planned |
The "alt" attribute can be used. |
· FormField.HelpText |
Status Text |
Planned |
|
· FormField.StatusText |
Max Length |
Yes |
Exported as maxlength attribute. |
· FormField.MaxLength |
Check Box Size |
Planned |
There are plans to use width and height CSS attributes to increase size of checkboxes exported to HTML. |
· FormField.CheckboxSize · FormField.IsCheckBoxExactSize |
Text Input Type |
Planned |
|
· FormField.TextInputType |
Feature |
Supported |
Comment |
See Also |
Index |
N/A |
|
|
RD |
N/A |
|
|
TA |
N/A |
|
|
TC |
N/A |
|
|
TOA (Table of Authorities) |
N/A |
|
|
TOC (Table of Contents) |
N/A |
Hyperlinked entries are imported as working hyperlinks but the entire content is not imported as a TOC field. |
|
XE |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
AutoText |
N/A |
|
|
AutoTextList |
N/A |
|
|
Bibliography |
N/A |
|
|
Citation |
N/A |
|
|
Hyperlink |
Yes |
This field is fully supported. No update of this field is required. |
|
IncludePicture |
N/A |
Imported as a regular image. |
|
IncludeText |
N/A |
|
|
Link |
N/A |
|
|
NoteRef |
N/A |
|
|
PageRef |
N/A |
|
|
Quote |
N/A |
|
|
Ref |
N/A |
|
|
StyleRef |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
AddressBlock |
N/A |
|
|
Ask |
N/A |
|
|
Compare |
N/A |
|
|
Database |
N/A |
|
|
Fill-in |
N/A |
|
|
GreetingLine |
N/A |
|
|
If |
N/A |
|
|
MergeField |
N/A |
|
|
MergeRec |
N/A |
|
|
MergeSeq |
N/A |
|
|
Next |
N/A |
|
|
NextIf |
N/A |
|
|
Set |
N/A |
|
|
SkipIf |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
AutoNum |
N/A |
|
|
AutoNumLgl |
N/A |
|
|
AutoNumOut |
N/A |
|
|
BarCode |
N/A |
|
|
ListNum |
N/A |
|
|
Page |
N/A |
|
|
RevNum |
N/A |
|
|
Section |
N/A |
|
|
SectionPages |
N/A |
|
|
Seq |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
UserAddress |
N/A |
|
|
UserInitials |
N/A |
|
|
UserName |
N/A |
|
|
Aspose.Words fully supports all features of hyperlink fields.
You can create new hyperlinks by using the DocumentBuilder class. You can also find and edit hyperlinks inside the DOM and change the address of an existing hyperlink.
Imported from the <a> element. Several different objects can have hyperlinks imported from this element. The most common is plain text, which is imported as a regular hyperlink.
If the <a> element has an image as a child then the hyperlink is imported on the Shape node.
See the following links in the documentation for further information:
· DocumentBuilder.InsertHyperlink
· How to Replace or Modify Hyperlinks
Feature |
Supported |
Comment |
See Also |
Text |
Yes |
|
|
Hyperlinked Shape or Image |
Yes |
|
|
Hyperlink across Multiple Paragraphs |
N/A |
Hyperlinks across multiple paragraphs are exported as separate hyperlinks. On round-trip these are imported as several separate hyperlinks. |
|
Hyperlink to a Local Bookmark |
Yes |
|
|
Hyperlink to an External Resource |
Yes |
|
|
Screen Tip |
Planned |
|
|
Target Frame |
Yes |
Imported from target="_XXX" attribute. |
|
Feature |
Supported |
Comment |
See Also |
Date and Time Formatting |
N/A |
|
|
Numbering Formatting |
N/A |
|
|
General Formatting |
N/A |
|
|
Aspose.Words supports many types of drawing entities on document load.
Graphic objects in any document format loaded into Aspose.Words are represented in the model by Shape nodes. If you are loading an OOXML document such as the DOCX format then you may have such content imported as a DrawingML node. Both node types provide similar members which allow you to access and modify both the image data and also the properties of the object such as positioning and behavior.
Using Aspose.Words you can create and modify different types of graphic objects.
Almost all properties that deal with object positioning use points as the unit of measurement. There is a class to help work with points by converting different types of units to and from points, e.g. pixel to point or point to inch.
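The arithmetic behind these conversions is straightforward: a point is 1/72 of an inch, and this section states that "px" is imported as if the resolution were 96 dpi. The sketch below shows that arithmetic in plain Python; the function names are hypothetical and are not the library's conversion class.

```python
POINTS_PER_INCH = 72.0
ASSUMED_DPI = 96.0  # resolution this section says is used for "px" on import


def pixels_to_points(pixels: float, dpi: float = ASSUMED_DPI) -> float:
    """Convert a pixel length to points at the given resolution."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels * POINTS_PER_INCH / dpi


def points_to_inches(points: float) -> float:
    """Convert a length in points to inches."""
    return points / POINTS_PER_INCH
```

At 96 dpi one pixel corresponds to 0.75 points, so an image that is 200 px wide in the HTML would be 150 points wide under this assumption.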
Images can be imported from a link (src) or from embedded base64 image data.
There is a load option available to control whether external images are downloaded, left as a link only, or supplied as image bytes by the user through IResourceLoadingCallback.
You can also set the BaseUri path of the document being loaded so relative resources can be correctly imported.
The "px" measurement unit is currently imported as if the resolution is 96 dpi. There will be a load option to control this in the future.
You can insert new images of any type into a document by using the DocumentBuilder.InsertImage method or by setting the image of an existing shape using the Shape.ImageData property.
All of the image types listed in the table below this overview are supported. When a document contains multiple references to the same image from an external address (e.g. the internet), the image is downloaded only once.
It is useful to know how images are stored in the model when you insert a new image using Aspose.Words. There are three classes of image from the Aspose.Words point of view.
1. Microsoft Word Native (which can be stored directly in model without any changes). These are the JPEG, PNG, and PICT formats and are left untouched during insertion.
2. Windows Metafiles (can also be stored directly in the model). These are the EMF and WMF vector formats and are left untouched during insertion.
3. Microsoft Word Non-Native. These are not supported and have to be converted (to PNG) before being stored in the model. These are the GIF, TIFF and BMP formats.
Aspose.Words automatically converts the formats found in the third item if such a format is inserted into a document.
The reason why the formats found in the third item must be converted to PNG is because Microsoft Word formats don't support the GIF or TIFF formats. It makes sense to store these in memory in a format that is supported by Microsoft Word. Note that when you insert an image of these types in Microsoft Word it also converts them to PNG in the same way behind the scenes.
BMP is the exception and is supported by Microsoft Word. However, since a BMP stored in memory is often very large it too is converted to PNG to save memory.
Note that PNG is a lossless compression format, so there is no degradation of image quality using the above techniques.
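The three image classes described above amount to a simple rule: some formats are stored unchanged and the rest are converted to PNG. The sketch below restates that rule in plain Python; `stored_format` and the alias table are hypothetical and do not reflect how the library detects image types.

```python
# Formats stored in the model without changes (classes 1 and 2 above).
STORED_AS_IS = {"jpeg", "png", "pict", "emf", "wmf"}
# Formats converted to PNG before being stored (class 3 above).
CONVERTED_TO_PNG = {"gif", "tiff", "bmp"}
# Common alternative spellings, normalized for lookup (an addition of this sketch).
ALIASES = {"jpg": "jpeg", "tif": "tiff"}


def stored_format(source_format: str) -> str:
    """Return the format an inserted image ends up in, per the rules above."""
    fmt = source_format.strip().lower()
    fmt = ALIASES.get(fmt, fmt)
    if fmt in STORED_AS_IS:
        return fmt
    if fmt in CONVERTED_TO_PNG:
        return "png"
    raise ValueError(f"format not covered by this overview: {source_format}")
```

Formats outside the lists raise an error here because this overview does not say how they are treated.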
If you are using Aspose.Words for Java you may need to ensure that you have the appropriate JAI image libraries installed in order for Aspose.Words to convert GIF, TIFF and BMP formats to PNG. If the required functionality is missing you may receive an "Image type not supported" exception.
See the following links in the documentation for further information:
· Shape.IsImage
· LoadOptions.BaseUri
· Shape.ImageData
· ImageData.ImageType
· ConvertUtil
· IResourceLoadingCallback
Feature |
Supported |
Comment |
See Also |
PNG |
Yes |
|
|
JPG |
Yes |
|
|
WMF |
Yes |
|
|
EMF |
Yes |
|
|
EMF+ |
Yes |
|
|
BMP |
Yes |
|
|
GIF |
Yes |
|
|
TIFF |
Yes |
|
|
Borders |
Planned |
Native borders will be imported from style attributes such as border-style and border-color. Some complex borders may already have been exported in raster form; these are imported back correctly, but such borders cannot be modified or removed. |
· ImageData.Borders |
Cropping |
Yes |
During export images are cropped permanently, and the cropping cannot be removed when the document is round-tripped back into Word document formats. |
· ImageData.CropLeft · ImageData.CropRight · ImageData.CropTop · ImageData.CropBottom |
Alternative text |
Yes |
Imported from alt=xxx. |
· Shape.AltText |
Feature |
Supported |
Comment |
See Also |
Brightness |
Yes |
The brightness modifier is applied to the image during export. The image brightness is preserved on round trip, but it cannot be modified. |
· ImageData.Brightness |
Contrast |
Yes |
The contrast modifier is applied to the image during export. The image contrast is preserved on round trip, but it cannot be modified. |
· ImageData.Contrast |
Recolor |
Planned |
|
|
Textboxes are rasterized to an image during export to HTML to improve fidelity.
Upon subsequent import this content appears correctly (the same as the original textbox with its settings) but is imported as an image and not as a working textbox. The text is not editable and the textbox settings cannot be changed.
There is no tag that directly imports a new textbox from HTML.
See the following link in the documentation for further information:
· Shape.TextBox
Feature |
Supported |
Comment |
See Also |
Text Direction |
Yes |
|
· TextBox.LayoutFlow |
Linked Textboxes |
Planned |
Linked text boxes are supported in the Aspose.Words model; however, there is currently no API to access or modify these values. |
|
Internal Margins |
Yes |
|
· TextBox.InternalMarginLeft · TextBox.InternalMarginRight · TextBox.InternalMarginTop · TextBox.InternalMarginBottom |
Vertical Alignment |
Yes |
|
|
Resize To Fit Text |
Yes |
|
· TextBox.FitShapeToText |
Text in Other Shapes |
Yes |
|
|
OLE objects are exported as images and are therefore imported back as regular images, not as OLE objects.
Feature |
Supported |
Comment |
See Also |
Linked |
N/A |
|
|
Embedded |
N/A |
|
|
Draw Aspect |
N/A |
|
|
Auto Update |
N/A |
|
|
Lock |
N/A |
|
|
Ole Object Data |
N/A |
|
|
Ole Object Picture |
N/A |
|
|
Source Range |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Persistent Properties Storage |
N/A |
|
|
Aspose.Words supports almost all Shape and Image elements. References to external images such as ones on the internet are automatically downloaded as well. All of these elements are imported into Aspose.Words as Shape nodes.
Using Aspose.Words you can create any type of new shape, including images, AutoShapes etc. You can also access, modify and remove such elements from a document.
Most common properties such as borders or position can be modified through the API. There is currently no API for modifying advanced shape properties, e.g. ArcSize of a RoundRectangle.
There is also no API for creating or modifying advanced features such as Diagrams, Ink Annotations or Charts. These elements are retained fully during conversion.
Shapes which are linked to external resources such as images on the internet can be automatically downloaded when required.
During export most shapes are rendered to HTML as regular images. On import a Shape is loaded as a regular image and not as a working AutoShape, Diagram or SmartArt object.
There is no tag that directly imports a new shape object from HTML.
See the following links in the documentation for further information:
· Shape
· Shape.ShapeType
· Shape.IsTopLevel
· LoadOptions.ResourceLoadingCallback
Feature |
Supported |
Comment |
See Also |
Lines |
Yes |
All rasterized elements are imported as regular images. |
|
Basic Shapes |
Yes |
|
|
Block Arrows |
Yes |
|
|
Flowcharts |
Yes |
|
|
Callouts |
Yes |
|
|
Stars and Banners |
Yes |
|
|
Group Shape |
Yes |
|
· GroupShape · Shape.IsGroup |
Drawing Canvas |
Yes |
|
|
Signature Line |
N/A |
|
|
Ink Annotation |
N/A |
|
|
Clip Art |
Yes |
|
|
Diagrams (VML) |
Planned |
VML graphics format is normally used in pre-OOXML formats such as DOC or RTF. VML in comments as exported by Word is not imported. |
|
SmartArt (VML) |
Planned |
Represented as a GroupShape with child shapes representing the different elements. You can add, modify or remove parts of the SmartArt. You can also extract the plain text content. |
|
Charts (VML) |
Planned |
Currently there is no API for accessing or modifying the content of a chart. You cannot retrieve the text of a chart. |
|
Shape Customizations |
N/A |
|
|
Hyperlink on Shape |
Yes |
Imported from parent <a> element of the <img> element. |
· Shape.HRef |
Watermark |
N/A |
|
|
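The mapping of the alt attribute to alternative text and of a parent <a> element to a shape hyperlink can be pictured with the following standalone sketch. It uses only Python's standard html.parser module, not Aspose.Words; the class and function names are hypothetical, and the dictionary keys merely stand in for the corresponding shape properties.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect <img> elements with their alt text and enclosing <a href>."""

    def __init__(self):
        super().__init__()
        self._href_stack = []  # hrefs of the currently open <a> elements
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href_stack.append(attrs.get("href"))
        elif tag == "img":
            self.images.append({
                "src": attrs.get("src"),
                "alt": attrs.get("alt"),
                # The innermost enclosing <a>, if any, supplies the hyperlink.
                "href": self._href_stack[-1] if self._href_stack else None,
            })

    def handle_endtag(self, tag):
        if tag == "a" and self._href_stack:
            self._href_stack.pop()


def collect_images(html: str):
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.images
```

An image outside any <a> element gets no hyperlink, which matches the table row above: the link comes from the parent element, not from the <img> itself.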
Feature |
Supported |
Comment |
See Also |
Images/Shapes |
N/A |
|
|
Diagrams |
N/A |
|
|
SmartArt |
N/A |
|
|
Charts |
N/A |
|
|
WordArt is imported as a Shape object in Aspose.Words. This class provides properties to extract and modify properties of a WordArt object.
Using Aspose.Words you can create new WordArt graphics. Note that not all WordArt features are available through the API.
During export WordArt is exported to HTML as a regular image. On import this feature is loaded as a regular image and not as a working WordArt object.
There is no tag that directly imports a new WordArt object from HTML.
See the following links in the documentation for further information:
· Shape.IsWordArt
· Shape.TextPath
Feature |
Supported |
Comment |
See Also |
Styles |
Yes |
|
|
Outline |
Yes |
|
|
Fill |
Yes |
|
|
3D Properties |
Planned |
|
|
Text Spacing |
Planned |
|
· TextPath.Spacing |
Vertical Text |
Planned |
|
· TextPath.TextPathAlignment |
Even Height |
Planned |
|
· TextPath.SameLetterHeights |
Align and Justify Text |
Planned |
|
· TextPath.TextPathAlignment |
WordArt Shape |
Planned |
|
|
Horizontal Line objects are represented as a Shape node in Aspose.Words. Since a Shape can also represent an image, there is a property which returns whether the shape is a Horizontal Line object.
Using Aspose.Words you can create new or modify existing Horizontal Rule objects.
Imported from the <hr> element.
See the following link in the documentation for further information:
· Shape.IsHorizontalRule
Feature |
Supported |
Comment |
See Also |
Width |
Yes |
Width appears in the API only as absolute points and not as the percentage in which Horizontal Line widths are normally calculated. The percent value can be calculated by using the width of the page. Imported from width:XXX% on style attribute. |
· Shape.Width |
Height |
Yes |
Imported from height:XXpt on style attribute. |
· Shape.Height |
Color |
Yes |
Imported from color on style attribute. Note that border:none must be present on the style attribute for the color to be imported correctly. |
· Shape.FillColor |
Alignment |
Yes |
Imported from "text-align:XXX" on style attribute. |
· Shape.HorizontalAlignment |
Hyperlink |
Yes |
An <hr> tag wrapped in an <a> hyperlink element is imported as a working hyperlink. However, this link is not stored in the hyperlink property of the Horizontal Rule object; instead, the object is wrapped in a Hyperlink field. |
· Shape.HRef |
Image |
Planned |
There are plans to import a horizontal line with an image from <hr> element with style="background: url(xxx.png)". |
· Shape.HRef |
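The rows above describe values read from the style attribute of an <hr> element and note that a percent width can be recovered from the width in points. The standalone sketch below (plain Python, hypothetical function names, no Aspose.Words calls) splits an inline style into declarations and computes the percentage. Which reference width to divide by is an assumption left to the caller; the text above suggests the width of the page.

```python
def parse_inline_style(style: str) -> dict:
    """Split an inline style string into a {property: value} dictionary."""
    declarations = {}
    for part in style.split(";"):
        name, sep, value = part.partition(":")
        if sep:  # skip empty or malformed fragments without a colon
            declarations[name.strip().lower()] = value.strip()
    return declarations


def width_percent(width_points: float, reference_width_points: float) -> float:
    """Express a width in points as a percentage of a reference width."""
    return 100.0 * width_points / reference_width_points
```

For example, a horizontal rule that is 234 points wide on a 468 point reference width corresponds to 50 percent.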
Aspose.Words supports creating objects with a variety of different positioning settings. Almost all possible settings are supported in the Aspose.Words model.
You can also access and modify the positioning of existing shapes.
Currently all drawing objects are imported as inline.
This will be improved in a future version.
See the following links in the documentation for further information:
· Shape.Top
· Shape.Width
Feature |
Supported |
Comment |
See Also |
Inline |
Yes |
|
· Shape.IsInline |
Floating |
Planned |
In a Word document floating content is anchored to a paragraph. When a document is loaded into Aspose.Words this anchor is represented by the position of the Shape node in relation to Paragraph and the Runs of text. |
|
Wrap Type |
Planned |
|
· Shape.WrapType |
Wrap Sides |
Planned |
|
· Shape.WrapSide |
Distance from Text |
Planned |
|
· Shape.DistanceFromTextTop · Shape.DistanceFromTextBottom · Shape.DistanceFromTextLeft · Shape.DistanceFromTextRight |
Z-Order |
Planned |
|
· Shape.ZOrder |
Polygon Wrap Points |
N/A |
|
|
Rotation |
Yes |
Using Aspose.Words, rotation is exported by converting the shape to an image and including the rotation in the process. The imported shape will appear rotated, but the rotation will not be truly editable. |
· Shape.Rotation |
Flip |
Yes |
Using Aspose.Words, flip is exported by converting the shape to an image and flipping the shape in the process. The imported shape will appear flipped, but it will not be a true "flip". |
· Shape.FlipOrientation |
Horizontal Alignment |
Planned |
|
· Shape.HorizontalAlignment |
Horizontal Position Relative To |
Planned |
|
· Shape.RelativeHorizontalPosition |
Vertical Alignment |
Planned |
|
· Shape.VerticalAlignment |
Vertical Position Relative To |
Planned |
|
· Shape.RelativeVerticalPosition |
Anchor Lock |
N/A |
|
|
Allow Overlap |
N/A |
|
|
Layout in Table Cell |
N/A |
|
|
Feature |
Supported |
Comment |
See Also |
Width and Height |
Yes |
Imported from height and width attributes. It is planned to import these attributes using the style attribute on <img>. |
· Shape.Width · Shape.Height |
Scale |
N/A |
Only the absolute size of the input image is taken as the shape size. |
|
Relative Size |
N/A |
|
|
Lock Aspect Ratio |
N/A |
Imported as enabled on shapes by default. |
|
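Since shape sizes in the API are expressed in points while HTML sizes are commonly given in pixels, the 96 dpi rule mentioned earlier for the "px" unit amounts to a simple conversion. The sketch below is plain Python with a hypothetical function name; it only illustrates the arithmetic (72 points per inch, 96 pixels per inch) and is not Aspose.Words code.

```python
POINTS_PER_INCH = 72.0
ASSUMED_DPI = 96.0  # resolution assumed for "px" values, as stated in the text


def px_to_points(px: float, dpi: float = ASSUMED_DPI) -> float:
    """Convert a pixel length to points at the given resolution."""
    return px * POINTS_PER_INCH / dpi
```

At 96 dpi one pixel is 0.75 points, so an image 200 pixels wide would be 150 points wide.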
Using Aspose.Words you can access, modify and remove most fill properties of a shape.
During export the fill is rendered into the image. During round-trip this is imported as an image shape which looks visually the same, but the fill properties cannot be edited.
See the following link in the documentation for further information:
· Shape.Fill
Feature |
Supported |
Comment |
See Also |
No Fill |
Yes |
|
· Shape.Filled |
Solid Fill |
Yes |
|
· Shape.FillColor |
Gradient Fill |
Yes |
There is currently no API for accessing or modifying the gradient fill of a shape. |
|
Pattern Fill |
Yes |
Only the raw bytes of the pattern fill can be extracted. A new pattern cannot be set. |
· Fill.ImageBytes |
Picture or Texture Fill |
Yes |
Only the raw bytes of the texture fill can be extracted. A new texture or image cannot be set. |
· Fill.ImageBytes |
Line styles are visually imported correctly. However, these lines are rasterized and are imported as image shapes, so line style properties cannot be edited.
See the following links in the documentation for further information:
· Shape.Stroke
· Shape.Stroked
Feature |
Supported |
Comment |
See Also |
Line Color |
Yes |
|
· Stroke.Color · Stroke.Color2 |
Line Fill |
Yes |
|
· Stroke.ImageBytes |
Line Width |
Yes |
|
· Stroke.Weight |
Compound Type |
Yes |
|
· Stroke.LineStyle |
Dash Type |
Yes |
|
· Stroke.DashStyle |
Cap Type |
Yes |
|
· Stroke.Cap |
Join Type |
Yes |
|
· Stroke.JoinStyle |
Arrow Settings |
Yes |
|
· Stroke.StartArrowLength · Stroke.StartArrowType · Stroke.EndArrowLength · Stroke.EndArrowType |
Shadow properties are currently not supported during HTML import.
Feature |
Supported |
Comment |
See Also |
Shadow |
Planned |
|
|
3D properties are currently unsupported during HTML import. It is planned to rasterize 3D effects onto the shape image during export to HTML.
During import this will allow 3D objects to appear similar. The 3D properties of the imported shape will not be editable.
Feature |
Supported |
Comment |
See Also |
3D Properties |
Planned |
|
|