Aspose.Words

Supported Features on Loading Plain Text (TXT) Files

You can download the complete source code of the LoadTxt sample here.

Aspose.Words allows you to import plain text data the same way as other document formats, by using the Document constructor.

Example LoadTxt

Loads a plain text file into an Aspose.Words.Document object.

[Java]

package LoadTxt;

import java.io.File;
import java.net.URI;

import com.aspose.words.Document;

class Program
{
    public static void main(String[] args) throws Exception
    {
        // Sample infrastructure.
        URI exeDir = Program.class.getResource("").toURI();
        String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

        // The encoding of the text file is automatically detected.
        Document doc = new Document(dataDir + "LoadTxt.txt");

        // Save as any Aspose.Words supported format, such as DOCX.
        doc.save(dataDir + "LoadTxt Out.docx");
    }
}
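
The sample above relies on automatic encoding detection. How Aspose.Words performs that detection is not described here; one common technique is to inspect a leading byte order mark (BOM). The following is a minimal sketch of that technique only, using the standard Java library. The class name BomSniffer and the fallback parameter are hypothetical and are not part of the Aspose.Words API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT part of Aspose.Words: guesses a charset from a byte order mark.
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the charset indicated by a leading BOM, or the given fallback if none is found. */
    static Charset detect(byte[] data, Charset fallback) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE; // "BigEndianUnicode"
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return fallback; // no BOM: the caller decides, e.g. Latin1
    }

    /** Compares the first bytes of data with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(utf8WithBom, StandardCharsets.ISO_8859_1));
    }
}
```

A file without a BOM cannot be classified this way, which is why the sketch takes an explicit fallback instead of guessing.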

 

 

Plain text is a basic format that does not require an advanced text processor to be viewed or edited. However, some plain text files attempt to represent more complex formatting, such as lists and indentation. For example, a list might be represented as a series of lines, each starting with the same character.

Aspose.Words attempts to detect such features and load them into a new document as their equivalent Microsoft Word features instead of as plain text.
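
To make the idea of detecting a list from line prefixes concrete, here is a minimal sketch that classifies a single line by its leading marker. It is an illustration only and not the Aspose.Words implementation. The class name ListMarker, the regular expressions, and the choice of '*' and 'o' as bullet characters are assumptions made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical classifier, NOT the Aspose.Words implementation.
final class ListMarker {
    enum Kind { ORDERED, UNORDERED, NONE }

    // Arabic number or a single Latin letter, then '.' or ')', then whitespace and text.
    private static final Pattern ORDERED =
            Pattern.compile("\\s*(\\d+|[A-Za-z])[.)]\\s+\\S.*");

    // Assumed bullet characters: '*' or 'o', then whitespace and text.
    private static final Pattern UNORDERED =
            Pattern.compile("\\s*[*o]\\s+\\S.*");

    private ListMarker() {}

    /** Classifies one line of text by the marker it starts with. */
    static Kind classify(String line) {
        if (ORDERED.matcher(line).matches()) return Kind.ORDERED;
        if (UNORDERED.matcher(line).matches()) return Kind.UNORDERED;
        return Kind.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("1. First item"));
        System.out.println(classify("* Bullet item"));
        System.out.println(classify("Plain sentence."));
    }
}
```

Requiring whitespace after the marker keeps ordinary words that begin with a letter, such as "on", from being mistaken for list items.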

Text Import Features

The table below shows the key features of the text import engine:

Feature

Details

Text encoding

The following encodings are supported:

·          Latin1.

·          BigEndianUnicode.

·          UTF-16.

·          UTF-7.

·          UTF-8.

 

Import of ordered lists

·          Arabic number with a dot or right parenthesis, e.g. 1. or 2). Multilevel lists are supported only when using a dot.

·          Uppercase or lowercase Latin letter with a dot or right parenthesis, e.g. a. or b).

Import of unordered lists

Unordered lists are imported from consecutive lines which start with any of the following characters: *, o, .

Paragraph indentation

Left indent and first line indent are detected and imported for paragraphs based on the number of space characters at the beginning of the paragraph.

Paragraph detection

Rules for detecting a new paragraph start:

·         The left indent of the next line differs from the left indent of the current paragraph.

·         An empty line starts a new paragraph.

·         Any list detected starts a new paragraph.
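
The paragraph detection rules above can be sketched as a small line-grouping routine. This is a simplified illustration under stated assumptions, not the Aspose.Words implementation: the class ParagraphSplitter and its methods are hypothetical, a paragraph's left indent is taken from its first line (so first line indents are ignored), empty lines are treated as separators and not kept, and only the number, letter, and '*' markers are recognized as list starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the three paragraph-start rules; NOT the Aspose.Words implementation.
final class ParagraphSplitter {
    private ParagraphSplitter() {}

    /** Counts leading space characters, used here as the line's left indent. */
    static int leftIndent(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') i++;
        return i;
    }

    /** Assumed list test: "1." / "2)" / "a." / "b)" or a '*' bullet, followed by a space. */
    static boolean startsList(String line) {
        return line.trim().matches("(\\d+|[A-Za-z])[.)] .*|\\* .*");
    }

    /** Groups lines into paragraphs and joins each paragraph's lines with a single space. */
    static List<String> split(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = null;
        int currentIndent = -1;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // Rule: an empty line ends the current paragraph.
                if (current != null) paragraphs.add(current.toString());
                current = null;
                continue;
            }
            int indent = leftIndent(line);
            // Rules: a changed left indent or a detected list item starts a new paragraph.
            boolean startNew = current == null || indent != currentIndent || startsList(line);
            if (startNew) {
                if (current != null) paragraphs.add(current.toString());
                current = new StringBuilder(line.trim());
                currentIndent = indent;
            } else {
                current.append(' ').append(line.trim());
            }
        }
        if (current != null) paragraphs.add(current.toString());
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "First line", "continues here", "",
                "    Indented paragraph", "1. Item one", "2. Item two");
        for (String paragraph : split(lines)) {
            System.out.println("[" + paragraph + "]");
        }
    }
}
```

In this sketch the indent is stored per paragraph so that a caller could later map it to a left indent value; converting space counts to actual measurements is left out.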

 

Sample Conversion

The sample input plain text file:

sample_input.jpg

The result of loading the text file into Aspose.Words and saving it as a DOCX document is shown below.

Notice that the leading spaces are interpreted as indentation, and the lists are loaded as proper list features.

sample_output.jpg