com.aspose.words
Class HtmlLoadOptions

java.lang.Object
  extended by LoadOptions
      extended by com.aspose.words.HtmlLoadOptions

public class HtmlLoadOptions 
extends LoadOptions

Specifies additional options for loading an HTML document into a Document object.

Constructor Summary
HtmlLoadOptions()
           Initializes a new instance of this class with default values.
HtmlLoadOptions(java.lang.String password)
           A shortcut to initialize a new instance of this class with the specified password to load an encrypted document.
HtmlLoadOptions(int loadFormat, java.lang.String password, java.lang.String baseUri)
           A shortcut to initialize a new instance of this class with properties set to the specified values.
 
Property Getters/Setters Summary
boolean getAllowTrailingWhitespaceForListItems() → inherited from LoadOptions
void setAllowTrailingWhitespaceForListItems(boolean value)
           Specifies how numbered list items are recognized when a document is imported from plain text format. The default value is true.
java.lang.String getBaseUri() → inherited from LoadOptions
void setBaseUri(java.lang.String value)
           Gets or sets the string that will be used to resolve relative URIs found in the document into absolute URIs when required. Can be null or empty string. Default is null.
java.nio.charset.Charset getEncoding() → inherited from LoadOptions
void setEncoding(java.nio.charset.Charset value)
           Gets or sets the encoding that will be used to load an HTML or TXT document if the encoding is not specified in HTML/TXT. Can be null. Default is null.
FontSettings getFontSettings() → inherited from LoadOptions
void setFontSettings(FontSettings value)
           Specifies the document font settings.
LanguagePreferences getLanguagePreferences() → inherited from LoadOptions
           Gets the language preferences that will be used when the document is loaded.
int getLoadFormat() → inherited from LoadOptions
void setLoadFormat(int value)
           Specifies the format of the document to be loaded. Default is LoadFormat.AUTO. The value of the property is a LoadFormat integer constant.
int getMswVersion() → inherited from LoadOptions
void setMswVersion(int value)
           Specifies that the document loading process should match a specific MS Word version. The default value is MsWordVersion.WORD_2007. The value of the property is an MsWordVersion integer constant.
java.lang.String getPassword() → inherited from LoadOptions
void setPassword(java.lang.String value)
           Gets or sets the password for opening an encrypted document. Can be null or empty string. Default is null.
int getPreferredControlType()
void setPreferredControlType(int value)
           Gets or sets the preferred type of document nodes that will represent imported <input> and <select> elements. The default value is HtmlControlType.FORM_FIELD. The value of the property is an HtmlControlType integer constant.
boolean getPreserveIncludePictureField() → inherited from LoadOptions
void setPreserveIncludePictureField(boolean value)
           Gets or sets whether to preserve the INCLUDEPICTURE field when reading Microsoft Word formats. The default value is false.
IResourceLoadingCallback getResourceLoadingCallback() → inherited from LoadOptions
void setResourceLoadingCallback(IResourceLoadingCallback value)
           Controls how external resources (images, style sheets) are loaded when a document is imported from HTML or MHTML.
boolean getSupportVml()
void setSupportVml(boolean value)
           Specifies whether the HTML parser should parse conditional comments of the form <!--[if gte vml 1]> and skip conditional comments of the form <![if !vml]>.
boolean getUpdateDirtyFields() → inherited from LoadOptions
void setUpdateDirtyFields(boolean value)
           Specifies whether to update the fields with the dirty attribute.
IWarningCallback getWarningCallback() → inherited from LoadOptions
void setWarningCallback(IWarningCallback value)
           Called during a load operation, when an issue is detected that might result in data or formatting fidelity loss.
int getWebRequestTimeout()
void setWebRequestTimeout(int value)
           The number of milliseconds to wait before the web request times out. The default value is 100000 milliseconds (100 seconds).
 

Constructor Detail

HtmlLoadOptions

public HtmlLoadOptions()
Initializes a new instance of this class with default values.

HtmlLoadOptions

public HtmlLoadOptions(java.lang.String password)
A shortcut to initialize a new instance of this class with the specified password to load an encrypted document.
Parameters:
password - The password to open an encrypted document. Can be null or empty string.

HtmlLoadOptions

public HtmlLoadOptions(int loadFormat, java.lang.String password, java.lang.String baseUri)
A shortcut to initialize a new instance of this class with properties set to the specified values.
Parameters:
loadFormat - A LoadFormat value. The format of the document to be loaded.
password - The password to open an encrypted document. Can be null or empty string.
baseUri - The string that will be used to resolve relative URIs to absolute. Can be null or empty string.

Property Getters/Setters Detail

getAllowTrailingWhitespaceForListItems/setAllowTrailingWhitespaceForListItems

→ inherited from LoadOptions
public boolean getAllowTrailingWhitespaceForListItems() / public void setAllowTrailingWhitespaceForListItems(boolean value)
Specifies how numbered list items are recognized when a document is imported from plain text format. The default value is true.

This property is used only when loading plain text documents.

If set to true, the list recognition algorithm allows list numbers to end with either a dot or a whitespace character.

If set to false, a list item is recognized as such only if its leading number ends with a dot (".").
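
The difference between the two settings can be illustrated with a pair of regular expressions. This is only a sketch of the rule as described above, not the recognition algorithm Aspose.Words actually uses; the class ListItemRecognitionSketch, its method, and both patterns are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not part of the Aspose.Words API.
class ListItemRecognitionSketch {
    // Strict rule: the leading number must end with a dot, e.g. "1. Item".
    private static final Pattern DOT_ONLY = Pattern.compile("^\\d+\\.\\s*\\S.*");

    // Relaxed rule: the leading number may end with a dot or a whitespace character, e.g. "1 Item".
    private static final Pattern DOT_OR_WHITESPACE = Pattern.compile("^\\d+(\\.|\\s)\\s*\\S.*");

    // Returns true if the line would count as a numbered list item under the chosen rule.
    static boolean looksLikeListItem(String line, boolean allowTrailingWhitespace) {
        Pattern rule = allowTrailingWhitespace ? DOT_OR_WHITESPACE : DOT_ONLY;
        return rule.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = { "1. First item", "2 Second item", "Plain paragraph" };
        for (String line : lines) {
            System.out.println(line
                    + " | relaxed: " + looksLikeListItem(line, true)
                    + " | strict: " + looksLikeListItem(line, false));
        }
    }
}
```

Under the relaxed rule a line such as "2 Second item" is treated as a list item; under the strict rule it is not.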


getBaseUri/setBaseUri

→ inherited from LoadOptions
public java.lang.String getBaseUri() / public void setBaseUri(java.lang.String value)
Gets or sets the string that will be used to resolve relative URIs found in the document into absolute URIs when required. Can be null or empty string. Default is null.

This property is used to resolve relative URIs into absolute in the following cases:

  1. When loading an HTML document from a stream and the document contains images with relative URIs and does not have a base URI specified in the BASE HTML element.
  2. When saving a document to PDF and other formats, to retrieve images linked using relative URIs so the images can be saved into the output document.

Example:

Opens an HTML document with images from a stream using a base URI.
// We are opening this HTML file:
//    <html>
//    <body>
//    <p>Simple file.</p>
//    <p><img src="Aspose.Words.gif" width="80" height="60"></p>
//    </body>
//    </html>
String fileName = getMyDir() + "Document.OpenFromStreamWithBaseUri.html";

// Pass the URI of the base folder so any images with relative URIs in the HTML document can be found.
LoadOptions loadOptions = new LoadOptions();
loadOptions.setBaseUri(getMyDir());

// Open the document from a stream. Note the Document constructor detects HTML format automatically.
// try-with-resources closes the stream even if loading fails; the stream is no longer needed
// once the document is in memory.
Document doc;
try (InputStream stream = new FileInputStream(fileName)) {
    doc = new Document(stream, loadOptions);
}

// Save in the DOC format.
doc.save(getMyDir() + "Document.OpenFromStreamWithBaseUri Out.doc");
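
What resolving a relative URI against a base URI means can be shown with the standard java.net.URI class. This is a sketch of the general idea only; it does not reproduce how Aspose.Words resolves URIs internally, and the class BaseUriResolutionSketch, its method, and the example.com addresses are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration; not part of the Aspose.Words API.
class BaseUriResolutionSketch {
    // Resolves a reference against a base URI. Absolute references are returned unchanged,
    // and so is any reference when no base URI is available (null or empty string).
    static String resolve(String baseUri, String reference) {
        URI ref = URI.create(reference);
        if (baseUri == null || baseUri.isEmpty() || ref.isAbsolute()) {
            return ref.toString();
        }
        return URI.create(baseUri).resolve(ref).toString();
    }

    public static void main(String[] args) {
        // Assumed base folder; note the trailing slash.
        String base = "https://example.com/docs/";
        System.out.println(resolve(base, "Aspose.Words.gif"));
        System.out.println(resolve(base, "https://cdn.example.com/logo.png"));
        System.out.println(resolve(null, "Aspose.Words.gif"));
    }
}
```

With java.net.URI, a base that lacks a trailing slash has its last path segment replaced during resolution, so a base URI that names a folder is usually written with the trailing slash.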

getEncoding/setEncoding

→ inherited from LoadOptions
public java.nio.charset.Charset getEncoding() / public void setEncoding(java.nio.charset.Charset value)
Gets or sets the encoding that will be used to load an HTML or TXT document if the encoding is not specified in HTML/TXT. Can be null. Default is null.

This property is used only when loading HTML or TXT documents.

If encoding is not specified in HTML/TXT and this property is null, then the system will try to automatically detect the encoding.
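
The order of precedence described above (encoding declared in the document, then this property, then automatic detection) can be sketched as follows. The class EncodingFallbackSketch and its methods are hypothetical, and the byte-order-mark check is only a stand-in for automatic detection; the detection Aspose.Words performs is not documented here and is likely more elaborate.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the Aspose.Words API.
class EncodingFallbackSketch {
    // An encoding declared in the document wins, then the caller-supplied option,
    // and only if both are null does detection run.
    static Charset chooseCharset(Charset declaredInDocument, Charset encodingOption, byte[] content) {
        if (declaredInDocument != null) {
            return declaredInDocument;
        }
        if (encodingOption != null) {
            return encodingOption;
        }
        return detectFromBom(content);
    }

    // Rough stand-in for automatic detection: checks UTF-16 byte order marks, otherwise assumes UTF-8.
    static Charset detectFromBom(byte[] b) {
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
        // Nothing declared in the document, so the supplied option is used.
        System.out.println(chooseCharset(null, StandardCharsets.ISO_8859_1, html));
        // Nothing declared and no option, so the detection stand-in decides.
        System.out.println(chooseCharset(null, null, html));
    }
}
```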


getFontSettings/setFontSettings

→ inherited from LoadOptions
public FontSettings getFontSettings() / public void setFontSettings(FontSettings value)
Specifies the document font settings.

When loading some formats, Aspose.Words may need to resolve fonts. For example, when loading HTML documents, Aspose.Words may resolve fonts to perform font fallback.

If set to null, the default static font settings (FontSettings.DefaultInstance) are used.

The default value is null.


getLanguagePreferences

→ inherited from LoadOptions
public LanguagePreferences getLanguagePreferences()
Gets the language preferences that will be used when the document is loaded.

getLoadFormat/setLoadFormat

→ inherited from LoadOptions
public int getLoadFormat() / public void setLoadFormat(int value)
Specifies the format of the document to be loaded. Default is LoadFormat.AUTO. The value of the property is a LoadFormat integer constant.

It is recommended that you specify the LoadFormat.AUTO value and let Aspose.Words detect the file format automatically. If you know the format of the document you are about to load, you can specify it explicitly; this slightly reduces the loading time by avoiding the overhead of auto-detection. If the explicit load format turns out to be wrong, auto-detection is invoked and a second attempt to load the file is made.


getMswVersion/setMswVersion

→ inherited from LoadOptions
public int getMswVersion() / public void setMswVersion(int value)
Specifies that the document loading process should match a specific MS Word version. The default value is MsWordVersion.WORD_2007. The value of the property is an MsWordVersion integer constant.

Different Word versions may handle certain aspects of document content and formatting slightly differently during the loading process, which may result in minor differences in the Document Object Model.

getPassword/setPassword

→ inherited from LoadOptions
public java.lang.String getPassword() / public void setPassword(java.lang.String value)
Gets or sets the password for opening an encrypted document. Can be null or empty string. Default is null.

You need to know the password to open an encrypted document. If the document is not encrypted, set this to null or empty string.


getPreferredControlType/setPreferredControlType

public int getPreferredControlType() / public void setPreferredControlType(int value)
Gets or sets the preferred type of document nodes that will represent imported <input> and <select> elements. The default value is HtmlControlType.FORM_FIELD. The value of the property is an HtmlControlType integer constant.

Note that setting this property does not guarantee that all imported controls will be of the specified type. If an HTML control cannot be represented with document nodes of the preferred type, Aspose.Words will use a compatible HtmlControlType for that control.

getPreserveIncludePictureField/setPreserveIncludePictureField

→ inherited from LoadOptions
public boolean getPreserveIncludePictureField() / public void setPreserveIncludePictureField(boolean value)
Gets or sets whether to preserve the INCLUDEPICTURE field when reading Microsoft Word formats. The default value is false.

By default, the INCLUDEPICTURE field is converted into a shape object. You can override that if you need the field to be preserved, for example, if you wish to update it programmatically. Note, however, that this approach is not common for Aspose.Words. Use it at your own risk.

One of the possible use cases may be using a MERGEFIELD as a child field to dynamically change the source path of the picture. In this case you need the INCLUDEPICTURE to be preserved in the model.


getResourceLoadingCallback/setResourceLoadingCallback

→ inherited from LoadOptions
public IResourceLoadingCallback getResourceLoadingCallback() / public void setResourceLoadingCallback(IResourceLoadingCallback value)
Controls how external resources (images, style sheets) are loaded when a document is imported from HTML or MHTML.

getSupportVml/setSupportVml

public boolean getSupportVml() / public void setSupportVml(boolean value)
Specifies whether the HTML parser should parse conditional comments of the form <!--[if gte vml 1]> and skip conditional comments of the form <![if !vml]>.

getUpdateDirtyFields/setUpdateDirtyFields

→ inherited from LoadOptions
public boolean getUpdateDirtyFields() / public void setUpdateDirtyFields(boolean value)
Specifies whether to update the fields with the dirty attribute.

getWarningCallback/setWarningCallback

→ inherited from LoadOptions
public IWarningCallback getWarningCallback() / public void setWarningCallback(IWarningCallback value)
Called during a load operation, when an issue is detected that might result in data or formatting fidelity loss.

getWebRequestTimeout/setWebRequestTimeout

public int getWebRequestTimeout() / public void setWebRequestTimeout(int value)
The number of milliseconds to wait before the web request times out. The default value is 100000 milliseconds (100 seconds).

This is the time that Aspose.Words waits for a response when loading external resources (images, style sheets) linked in HTML and MHTML documents.
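
For a sense of what a request timeout of this kind governs, the sketch below applies the same 100000 ms figure to a plain java.net.HttpURLConnection. It illustrates the general concept with the standard library; it is not how Aspose.Words issues its requests, and the class RequestTimeoutSketch, its method, and the example.com URL are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical illustration; not part of the Aspose.Words API.
class RequestTimeoutSketch {
    // Same figure as the documented default: 100000 ms (100 seconds).
    static final int DEFAULT_TIMEOUT_MS = 100_000;

    // Prepares, but does not send, a request whose connect and read phases both use the timeout.
    static HttpURLConnection prepare(String url, int timeoutMs) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(timeoutMs); // limit on establishing the connection
            connection.setReadTimeout(timeoutMs);    // limit on waiting for response data
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // openConnection() does not touch the network; nothing is sent until connect() or a read.
        HttpURLConnection connection = prepare("https://example.com/styles.css", DEFAULT_TIMEOUT_MS);
        System.out.println("Connect timeout (ms): " + connection.getConnectTimeout());
        System.out.println("Read timeout (ms): " + connection.getReadTimeout());
    }
}
```

A shorter timeout makes loading fail faster when a linked image or style sheet is unreachable, at the cost of giving slow servers less time to respond.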

See Also:
          Aspose.Words Documentation - the home page for the Aspose.Words Product Documentation.
          Aspose.Words Support Forum - our preferred method of support.