An EPUB document (short for electronic publication) is an HTML-based format commonly used for electronic book distribution. Aspose.Words fully supports this format for exporting electronic books compatible with the majority of reading devices. This article shows how to convert a simple MS Word document to EPUB with a few lines of code. It also demonstrates what a sample document looks like after being converted to EPUB using Aspose.Words.
Example
Converts a document to EPUB using default save options.
[Java]
// Open an existing document from disk.
Document doc = new Document(getMyDir() + "Document.EpubConversion.doc");
// Save the document in EPUB format.
doc.save(getMyDir() + "Document.EpubConversion Out.epub");
You can specify a number of options by passing an instance of HtmlSaveOptions to the Document.save method. The code snippet below shows a few of them in action.
Example
Converts a document to EPUB with save options specified.
[Java]
// Open an existing document from disk.
Document doc = new Document(getMyDir() + "Document.EpubConversion.doc");
// Create a new instance of HtmlSaveOptions. This object lets us set options that control
// how the output document is saved.
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
// Specify the desired encoding.
saveOptions.setEncoding(Charset.forName("UTF-8"));
// Specify the elements at which to split the internal HTML. Each split starts a new HTML part
// within the EPUB, which limits the size of each part. This is useful for readers that cannot
// open HTML files larger than a certain size, e.g. 300 KB.
saveOptions.setDocumentSplitCriteria(DocumentSplitCriteria.HEADING_PARAGRAPH);
// Specify that we want to export document properties.
saveOptions.setExportDocumentProperties(true);
// Specify that we want to save in EPUB format.
saveOptions.setSaveFormat(SaveFormat.EPUB);
// Export the document as an EPUB file.
doc.save(getMyDir() + "Document.EpubConversion Out.epub", saveOptions);
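To see the effect of the split criteria, it can help to look inside the saved file. An EPUB is a ZIP container, so the standard java.util.zip classes can list the HTML parts it holds. The following is a minimal sketch that does not use Aspose.Words; the class name EpubPartLister is hypothetical, and the file extensions it looks for are an assumption about how the parts are named.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for inspecting an EPUB; not part of Aspose.Words.
class EpubPartLister {

    // Returns the HTML/XHTML entries of the EPUB mapped to their uncompressed size in bytes.
    static Map<String, Long> listHtmlParts(Path epub) throws IOException {
        Map<String, Long> parts = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(epub.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName().toLowerCase(Locale.ROOT);
                // Assumption: content parts use one of these common extensions.
                if (name.endsWith(".html") || name.endsWith(".xhtml") || name.endsWith(".htm")) {
                    parts.put(entry.getName(), entry.getSize());
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the EPUB to inspect as the first argument.
        for (Map.Entry<String, Long> part : listHtmlParts(Paths.get(args[0])).entrySet()) {
            System.out.println(part.getKey() + ": " + part.getValue() + " bytes");
        }
    }
}
```

Running this against the output file should make it easy to check whether any single part is larger than the limit of the target reader.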
In the next few paragraphs we’ll review the results of a sample document converted to EPUB format. The screenshots below show the key features.
Since EPUB is a publishing format for electronic books, the most important features involve text. At a glance, we can see that the text and all key features in the EPUB output look identical to the source document.
The picture below shows the key text formatting features after conversion to EPUB.
The wide range of paragraph formatting settings used in the following example is rendered the same as in the source document.
The following picture shows how well tables are rendered despite their complexity.
Even complex lists from the source document are exported well to EPUB.
Images are essential for most publications and can be aligned differently on the screen.
This picture shows a table of contents generated from the source document and exported as inline text with working hyperlinks. The same headings that make up the TOC in the source document are exported to the navigation pane in the EPUB for easy navigation.
The EPUB documents produced by Aspose.Words pass validation, which means that they adhere to EPUB standards and contain no errors.
Passing validation doesn’t guarantee that every device or EPUB viewer will display your document in exactly the same way, but it does give the best chance that your document will be displayed as close as possible to what was originally intended.
The picture below shows the report produced by one of the validation services for the document we just converted.
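A full validator checks many rules, but one of the basic container rules is simple enough to check by hand: the first entry of the ZIP archive must be named mimetype, must be stored without compression, and must contain exactly the text application/epub+zip. The sketch below checks only that single rule with standard Java classes; it is not a replacement for a validation service, and the class name EpubMimetypeCheck is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; checks one EPUB container rule only, not full conformance.
class EpubMimetypeCheck {

    static final String EXPECTED_MIMETYPE = "application/epub+zip";

    // Returns true if the first ZIP entry is an uncompressed "mimetype" file with the expected content.
    static boolean hasValidMimetypeEntry(Path epub) throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(epub))) {
            // ZipInputStream returns entries in the order they are physically stored.
            ZipEntry first = in.getNextEntry();
            if (first == null || !"mimetype".equals(first.getName())) {
                return false;
            }
            if (first.getMethod() != ZipEntry.STORED) {
                return false;
            }
            // Read the content of the current entry only.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[64];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            String content = new String(buffer.toByteArray(), StandardCharsets.US_ASCII);
            return EXPECTED_MIMETYPE.equals(content);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the EPUB to check as the first argument.
        System.out.println(hasValidMimetypeEntry(Paths.get(args[0])));
    }
}
```

A quick check like this can catch a file that was repackaged incorrectly, for example after unzipping and re-zipping it with a general-purpose archive tool.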
Metadata is additional information, such as Author Name, Title, Comments, etc., that is added to a file but is not visible in the content of the file itself.
Word document formats have special properties dedicated to such metadata, and these can be exported to EPUB files as well. Metadata fields are often required by distributors and e-book stores, which use them as keywords for their search engines and to provide information about books to customers.
The picture below shows the metadata after the conversion.
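The exported metadata can also be read programmatically. In a standard EPUB, the file META-INF/container.xml points to a package (OPF) file, and that file holds the metadata as Dublin Core elements such as dc:title and dc:creator. The sketch below reads those elements with standard Java XML classes, assuming the EPUB follows that layout. The class name EpubMetadataReader is hypothetical, and which Word property ends up in which element is not something this sketch asserts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for reading EPUB metadata; not part of Aspose.Words.
class EpubMetadataReader {

    private static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Returns Dublin Core metadata as local element name -> text (first occurrence of each name wins).
    static Map<String, String> readMetadata(Path epub) throws Exception {
        Map<String, String> metadata = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(epub.toFile())) {
            // Step 1: container.xml declares where the package (OPF) file lives.
            Document container = parse(zip, "META-INF/container.xml");
            NodeList rootfiles = container.getElementsByTagNameNS("*", "rootfile");
            if (rootfiles.getLength() == 0) {
                throw new IOException("No rootfile declared in container.xml");
            }
            String opfPath = ((Element) rootfiles.item(0)).getAttribute("full-path");

            // Step 2: collect every element in the Dublin Core namespace from the OPF file.
            Document opf = parse(zip, opfPath);
            NodeList dcElements = opf.getElementsByTagNameNS(DC_NS, "*");
            for (int i = 0; i < dcElements.getLength(); i++) {
                Node node = dcElements.item(i);
                metadata.putIfAbsent(node.getLocalName(), node.getTextContent().trim());
            }
        }
        return metadata;
    }

    private static Document parse(ZipFile zip, String entryName) throws Exception {
        ZipEntry entry = zip.getEntry(entryName);
        if (entry == null) {
            throw new IOException("Missing entry: " + entryName);
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        try (InputStream in = zip.getInputStream(entry)) {
            return factory.newDocumentBuilder().parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of the EPUB to read as the first argument.
        readMetadata(Paths.get(args[0])).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Because only the first value per element name is kept, a book with several dc:creator entries would show just one of them; a fuller reader would collect all values.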