Aspose.Words

How-to: Extract Images from a Document

Images are stored inside Shape nodes in a Document. Newer Microsoft Word documents (such as DOCX) may also hold them in DrawingML nodes.

To extract all images, or only images of a specific type, from the document, follow these steps:

· Use the Document.GetChildNodes method to select all Shape nodes.

· Iterate through the resulting node collection.

· Check the Shape.HasImage boolean property to find the shapes that contain an image.

· Extract the image data using the Shape.ImageData property.

· Save the image data to a file.
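
The control flow of these steps can be sketched without the library. The following is a minimal model, not Aspose.Words code: FakeShape, its fields, and the "image" file-name prefix are hypothetical stand-ins for a shape node and its image data, and the optional extension filter illustrates one way to keep only images of a specific type.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ImageExtractionSketch {

    /** Hypothetical stand-in for a shape node; this is not an Aspose.Words class. */
    static final class FakeShape {
        final byte[] imageBytes; // null when the shape holds no image
        final String extension;  // for example ".png"; null when there is no image

        FakeShape(byte[] imageBytes, String extension) {
            this.imageBytes = imageBytes;
            this.extension = extension;
        }

        boolean hasImage() {
            return imageBytes != null;
        }
    }

    /**
     * Saves the image of every shape that has one. When wantedExtension is
     * non-null, only images with that extension are saved.
     * Returns the paths of the files that were written.
     */
    static List<Path> extractImages(List<FakeShape> shapes, String wantedExtension, Path outDir)
            throws IOException {
        List<Path> written = new ArrayList<>();
        int imageIndex = 0;
        for (FakeShape shape : shapes) {
            // Skip shapes without an image (text boxes, lines, and so on).
            if (!shape.hasImage()) {
                continue;
            }
            // Optional filter for a specific image type.
            if (wantedExtension != null && !wantedExtension.equals(shape.extension)) {
                continue;
            }
            Path target = outDir.resolve("image" + imageIndex + shape.extension);
            Files.write(target, shape.imageBytes);
            written.add(target);
            imageIndex++;
        }
        return written;
    }
}
```

Passing null as wantedExtension saves every image; passing a value such as ".png" saves only the matching ones. The index is incremented only for images that are actually written, so the output files are numbered without gaps.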

Example ExtractImagesToFiles

Shows how to extract images from a document and save them as files.

[Java]

 

public void extractImagesToFiles() throws Exception
{
    Document doc = new Document(getMyDir() + "Image.SampleImages.doc");

    // Collect every Shape node in the document, including nested ones.
    NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
    int imageIndex = 0;
    for (Shape shape : (Iterable<Shape>) shapes)
    {
        // Not every shape holds an image, so check before accessing the image data.
        if (shape.hasImage())
        {
            // The file extension is derived from the type of the stored image.
            String imageFileName = java.text.MessageFormat.format(
                    "Image.ExportImages.{0} Out{1}", imageIndex, FileFormatUtil.imageTypeToExtension(shape.getImageData().getImageType()));
            shape.getImageData().save(getMyDir() + imageFileName);
            imageIndex++;
        }
    }

    // Newer Microsoft Word documents (such as DOCX) may contain a different type of image container called DrawingML.
    // Repeat the process to extract these if they are present in the loaded document.
    NodeCollection dmlShapes = doc.getChildNodes(NodeType.DRAWING_ML, true);
    for (DrawingML dml : (Iterable<DrawingML>) dmlShapes)
    {
        if (dml.hasImage())
        {
            // imageIndex continues from the first loop, so file names do not collide.
            String imageFileName = java.text.MessageFormat.format(
                    "Image.ExportImages.{0} Out{1}", imageIndex, FileFormatUtil.imageTypeToExtension(dml.getImageData().getImageType()));
            dml.getImageData().save(getMyDir() + imageFileName);
            imageIndex++;
        }
    }
}
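
One detail of the file-name step is worth a note. MessageFormat formats a numeric argument with the locale's number format, so a plain {0} placeholder inserts a grouping separator once the index reaches 1000. The sketch below uses only the standard library to compare the pattern from the example with a variant that avoids grouping. The class and method names are hypothetical, and the extension is assumed to include its leading dot, as the "Out{1}" pattern suggests.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ImageFileNames {

    /** Same pattern as the example above, with a fixed locale so the result is reproducible. */
    static String messageFormatName(int imageIndex, String extension) {
        MessageFormat format = new MessageFormat("Image.ExportImages.{0} Out{1}", Locale.US);
        return format.format(new Object[] { imageIndex, extension });
    }

    /** Variant that prints the index as plain digits, without grouping separators. */
    static String plainName(int imageIndex, String extension) {
        return String.format(Locale.ROOT, "Image.ExportImages.%d Out%s", imageIndex, extension);
    }
}
```

With an index of 1000 and the extension ".png", messageFormatName returns "Image.ExportImages.1,000 Out.png", while plainName returns "Image.ExportImages.1000 Out.png". For documents with fewer than 1000 images the two produce the same names.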