You can download the complete source code of the CheckFormat sample here.
When you are dealing with multiple documents in various file formats, you may need to separate out those files that can be processed by Aspose.Words from those that cannot. You may also want to know why some of the documents cannot be processed.
If you attempt to load a file into a Document object and Aspose.Words cannot recognize the file format or does not support it, Aspose.Words throws an exception. You can catch and analyze those exceptions, but Aspose.Words also provides a specialized method that quickly determines the file format without loading the document and risking an exception.
This article describes how you can check the format compatibility of all files in the selected folder and sort them by file format into appropriate subfolders.
To do this, we will work through the following steps in the code:
1. Get the collection of all files in the selected folder.
2. Loop through the collection.
3. For each file:
a. Check the file format.
b. Display the check results.
c. Copy the file to the appropriate folder.
The following files are used in this sample. The file name is on the left and its description is on the right.
Input Document | Type
Test File (docx).docx | Office Open XML WordprocessingML document without macros.
Test File (docm).docm | Office Open XML WordprocessingML document with macros.
Test File (doc).doc | Microsoft Word 97 - 2003 document.
Test File (rtf).rtf | Rich Text Format document.
Test File (dot).dot | Microsoft Word 97 - 2003 template.
Test File (dotx).dotx | Office Open XML WordprocessingML template.
Test File (HTML).html | HTML document.
Test File (MHTML).mhtml | MHTML (Web archive) document.
Test File (WordML).xml | Microsoft Word 2003 WordprocessingML document.
Test File (odt).odt | OpenDocument Text format (OpenOffice Writer).
Test File (XML).xml | FlatOPC OOXML Document.

Input Document | Type
Test File (enc).doc | Encrypted Microsoft Word 97 - 2003 document.
Test File (enc).docx | Encrypted Office Open XML WordprocessingML document.

Input Document | Type
Test File (pre97).doc | Microsoft Word 95 document.
Test File (JPG).jpg | JPEG image file.
As we’re dealing with the content of a folder, the first thing we need to do is get the collection of all files in this folder using the listFiles method of the java.io.File class:
Example
Get the list of all files in the dataDir folder.
[Java]
File[] fileList = new java.io.File(dataDir).listFiles();
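Note that listFiles returns null when the path is not a directory or an I/O error occurs, and that the order of the returned array is not guaranteed. The following is a minimal sketch of a more defensive listing that uses only the standard library; the class name FolderListing and the method listRegularFiles are hypothetical and are not part of the CheckFormat sample.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the CheckFormat sample.
class FolderListing
{
    // Returns the regular files directly inside dir, sorted by path.
    // Returns an empty list when dir is missing or is not a directory.
    static List<File> listRegularFiles(File dir)
    {
        // listFiles() returns null if dir is not a directory or an I/O error occurs.
        File[] entries = dir.listFiles();
        List<File> files = new ArrayList<>();
        if (entries == null)
            return files;

        // The order returned by listFiles() is unspecified, so sort for stable output.
        Arrays.sort(entries);
        for (File entry : entries)
        {
            // Skip subfolders here so the processing loop does not have to.
            if (entry.isFile())
                files.add(entry);
        }
        return files;
    }

    public static void main(String[] args)
    {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File file : listRegularFiles(dir))
            System.out.println(file.getName());
    }
}
```

With a helper like this, the processing loop can iterate over the returned list directly and does not need its own null check or directory check.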
When all the files are collected, the rest of the work is done by a single method of the Aspose.Words component: FileFormatUtil.detectFileFormat. Note that this method only detects the file format; it does not validate the file. It reads just enough of the file to identify the format, which is not enough for complete validation. As a result, there is no guarantee that a file can be opened even if FileFormatUtil.detectFileFormat reports one of the supported formats.
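To illustrate why detection based on partial data cannot guarantee that a file will open, here is a minimal sketch of signature sniffing written with the standard library only. It is not how Aspose.Words implements detection; the class SignatureSniffer and its Kind enum are hypothetical. It recognizes three well-known leading byte sequences: the OLE2 compound file header (used by DOC and DOT), the ZIP local file header (used by DOCX, ODT and any other ZIP archive) and the RTF prefix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; Aspose.Words does not expose this class.
class SignatureSniffer
{
    enum Kind { OLE2_CONTAINER, ZIP_CONTAINER, RTF, UNKNOWN }

    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    private static final byte[] RTF = {'{', '\\', 'r', 't', 'f'};

    // Reads at most the first 8 bytes of the file and classifies them.
    static Kind sniff(Path file) throws IOException
    {
        byte[] head = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file))
        {
            int n;
            while (filled < head.length && (n = in.read(head, filled, head.length - filled)) > 0)
                filled += n;
        }
        return sniff(java.util.Arrays.copyOf(head, filled));
    }

    // Classifies a buffer holding the leading bytes of a file.
    static Kind sniff(byte[] head)
    {
        if (startsWith(head, OLE2)) return Kind.OLE2_CONTAINER;
        if (startsWith(head, ZIP)) return Kind.ZIP_CONTAINER;
        if (startsWith(head, RTF)) return Kind.RTF;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix)
    {
        if (data.length < prefix.length)
            return false;
        for (int i = 0; i < prefix.length; i++)
        {
            if (data[i] != prefix[i])
                return false;
        }
        return true;
    }
}
```

A file that consists of nothing but the four bytes of the ZIP signature is still classified as ZIP_CONTAINER by this sketch, even though it is not a valid archive. The same principle explains why a positive detection result does not guarantee that the document will load.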
The following code loops through the collected list of files, checks the file format of each file, displays the result in the console and copies each file into the appropriate folder:
Example
Check each file in the folder and copy it to the appropriate subfolder.
[Java]
// Loop through all found files.
for (File file : fileList)
{
    // Skip subfolders; only files are checked.
    if (file.isDirectory())
        continue;

    // Extract and display the file name without the path.
    String nameOnly = file.getName();
    System.out.print(nameOnly);

    // Check the file format. Only part of the file is read; the document is not loaded.
    String fileName = file.getPath();
    FileFormatInfo info = FileFormatUtil.detectFileFormat(fileName);

    // Display the document type.
    switch (info.getLoadFormat())
    {
        case LoadFormat.DOC:
            System.out.println("\tMicrosoft Word 97-2003 document.");
            break;
        case LoadFormat.DOT:
            System.out.println("\tMicrosoft Word 97-2003 template.");
            break;
        case LoadFormat.DOCX:
            System.out.println("\tOffice Open XML WordprocessingML Macro-Free Document.");
            break;
        case LoadFormat.DOCM:
            System.out.println("\tOffice Open XML WordprocessingML Macro-Enabled Document.");
            break;
        case LoadFormat.DOTX:
            System.out.println("\tOffice Open XML WordprocessingML Macro-Free Template.");
            break;
        case LoadFormat.DOTM:
            System.out.println("\tOffice Open XML WordprocessingML Macro-Enabled Template.");
            break;
        case LoadFormat.FLAT_OPC:
            System.out.println("\tFlat OPC document.");
            break;
        case LoadFormat.RTF:
            System.out.println("\tRTF format.");
            break;
        case LoadFormat.WORD_ML:
            System.out.println("\tMicrosoft Word 2003 WordprocessingML format.");
            break;
        case LoadFormat.HTML:
            System.out.println("\tHTML format.");
            break;
        case LoadFormat.MHTML:
            System.out.println("\tMHTML (Web archive) format.");
            break;
        case LoadFormat.ODT:
            System.out.println("\tOpenDocument Text.");
            break;
        case LoadFormat.OTT:
            System.out.println("\tOpenDocument Text Template.");
            break;
        case LoadFormat.DOC_PRE_WORD_97:
            System.out.println("\tMS Word 6 or Word 95 format.");
            break;
        case LoadFormat.UNKNOWN:
        default:
            System.out.println("\tUnknown format.");
            break;
    }

    // Now copy the document into the appropriate folder.
    // Encrypted documents go to their own folder regardless of format.
    if (info.isEncrypted())
    {
        System.out.println("\tAn encrypted document.");
        fileCopy(fileName, new File(encryptedDir, nameOnly).getPath());
    }
    else
    {
        // Build each target path with the two-argument File constructor so that
        // a folder path without a trailing separator still produces a valid path.
        switch (info.getLoadFormat())
        {
            case LoadFormat.DOC_PRE_WORD_97:
                fileCopy(fileName, new File(pre97Dir, nameOnly).getPath());
                break;
            case LoadFormat.UNKNOWN:
                fileCopy(fileName, new File(unknownDir, nameOnly).getPath());
                break;
            default:
                fileCopy(fileName, new File(supportedDir, nameOnly).getPath());
                break;
        }
    }
}
The files are copied into the appropriate subfolders using the helper method fileCopy, which copies a file from the source location to the new location.
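The body of fileCopy is not shown in this article. The following is a minimal sketch of what such a helper could look like, assuming it takes the source and destination paths as strings (as the calls above suggest), creates the destination folder if needed and overwrites an existing file. The wrapper class FileCopySketch is hypothetical, and the actual helper in the downloadable sample may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; the sample's own fileCopy helper may be implemented differently.
class FileCopySketch
{
    // Copies sourcePath to destPath and leaves the source file in place.
    static void fileCopy(String sourcePath, String destPath) throws IOException
    {
        Path source = Paths.get(sourcePath);
        Path dest = Paths.get(destPath);

        // Create the target subfolder on first use.
        Path parent = dest.toAbsolutePath().getParent();
        if (parent != null)
            Files.createDirectories(parent);

        // Overwrite a file left over from a previous run.
        Files.copy(source, dest, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the file is copied and not moved, the input folder is left unchanged and the sample can be run repeatedly on the same test files.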
The sample copies all the files to the subfolders and displays the following log: