You can download the complete source code of the ProcessComments sample here.
Using Comments in a Word document (in addition to Track Changes) is common practice when reviewing documents, particularly when there are multiple reviewers. Sometimes the comments are the only thing you need from a document: for example, you may want to generate a list of review findings. In other cases you have already collected all the useful information from the document and simply want to remove the comments that are no longer needed. You may also want to view or remove only the comments of a particular reviewer.
In this sample we are going to look at some simple methods for both gathering information from the comments within a document and for removing comments from a document. Specifically we’ll cover how to:
· Extract all the comments from a document or only the ones made by a particular author.
· Remove all the comments from a document or only from a particular author.
To illustrate how to extract and remove comments from a document, we will go through the following steps:
1. Open a Word document using the Aspose.Words.Document class.
2. Get all comments from the document into a collection.
3. To extract comments:
a. Go through the collection using the foreach operator.
b. Extract and list the author name, date & time and text of all comments.
c. Extract and list the author name, date & time and text of comments written by a specific author, in this case the author ‘ks’.
4. To remove comments:
a. Go backwards through the collection using the for operator.
b. Remove comments.
5. Save the changes.
We’re going to use the following Word document for this exercise:
As you can see, it contains several Comments from two authors with the initials “pm” and “ks”.
The code in this sample is quite simple, and all methods are based on the same approach. A comment in a Word document is represented by a Comment object in the Aspose.Words document object model. To collect all the comments in a document, call the Document.getChildNodes method with the first parameter set to NodeType.COMMENT. Make sure that the second parameter is set to true: this makes Document.getChildNodes select from all child nodes recursively, rather than collecting only the immediate children.
The Document.getChildNodes method is very useful: you can use it whenever you need a list of document nodes of a given type. The resulting collection does not create an immediate overhead, because nodes are selected into it only when you enumerate or access its items.
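To make the idea of selecting items only on enumeration more concrete, here is a minimal plain-Java sketch. It is an analogy only and does not show how Aspose.Words implements its collections: the LazyView class, its examined counter, and the list of node type names are all hypothetical and exist purely for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical lazy view over a list; an analogy only, not Aspose.Words code.
class LazyView<T> implements Iterable<T> {
    private final List<T> source;
    private final Predicate<T> filter;
    int examined = 0; // how many source items have been tested so far

    LazyView(List<T> source, Predicate<T> filter) {
        this.source = source;
        this.filter = filter;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int pos = 0;
            private T pending;
            private boolean ready = false;

            @Override
            public boolean hasNext() {
                // Scan forward only as far as needed to find the next match.
                while (!ready && pos < source.size()) {
                    T candidate = source.get(pos++);
                    examined++;
                    if (filter.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                return pending;
            }
        };
    }

    public static void main(String[] args) {
        // Strings stand in for document nodes of different types.
        List<String> nodeTypes = Arrays.asList("Paragraph", "Comment", "Run", "Comment");
        LazyView<String> comments = new LazyView<>(nodeTypes, t -> t.equals("Comment"));
        System.out.println(comments.examined); // prints 0: nothing has been selected yet
        int count = 0;
        for (String c : comments) {
            count++;
        }
        System.out.println(count + " of " + comments.examined); // prints 2 of 4
    }
}
```

Creating the view costs nothing up front; the filtering work happens only while the for loop enumerates it.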
Example
Extracts the author name, date and time, and text of all comments in the document.
[Java]
static ArrayList<String> extractComments(Document doc) throws Exception
{
    ArrayList<String> collectedComments = new ArrayList<String>();
    // Collect all comments in the document (true = search the whole tree recursively).
    NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
    // Look through all comments and gather information about them.
    for (Comment comment : (Iterable<Comment>) comments)
    {
        collectedComments.add(comment.getAuthor() + " " + comment.getDateTime() + " " + comment.toString(SaveFormat.TEXT));
    }
    return collectedComments;
}
After you have selected Comment nodes into a collection, all you have to do is extract the information you need. In this sample, the author name, date, time and plain text of the comment are combined into one string; you could choose to store them in some other way instead.
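As one possible alternative to a concatenated string, the details of each comment could be kept in separate fields. The sketch below is plain Java and does not call Aspose.Words; the CommentInfo and CommentInfoDemo classes, the byAuthor helper, the field types, and the sample values are all hypothetical and shown only to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class CommentInfoDemo {
    // Returns only the entries written by the given author.
    static List<CommentInfo> byAuthor(List<CommentInfo> all, String authorName) {
        List<CommentInfo> result = new ArrayList<>();
        for (CommentInfo info : all) {
            if (info.author.equals(authorName)) {
                result.add(info);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values invented for illustration.
        List<CommentInfo> all = new ArrayList<>();
        all.add(new CommentInfo("pm", new Date(0L), "Please check this figure."));
        all.add(new CommentInfo("ks", new Date(0L), "Agreed."));
        for (CommentInfo info : byAuthor(all, "ks")) {
            System.out.println(info.author + ": " + info.text);
        }
    }
}

// Hypothetical holder for one comment's details; not part of Aspose.Words.
class CommentInfo {
    final String author;
    final Date dateTime;
    final String text;

    CommentInfo(String author, Date dateTime, String text) {
        this.author = author;
        this.dateTime = dateTime;
        this.text = text;
    }
}
```

Keeping the fields separate means later steps, such as filtering by author or sorting by date, do not have to parse a combined string.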
The overloaded method that extracts the comments of a particular author is almost the same; it just checks the author’s name before adding the information to the list.
Example
Extracts the author name, date and time, and text of the comments by the specified author.
[Java]
static ArrayList<String> extractComments(Document doc, String authorName) throws Exception
{
    ArrayList<String> collectedComments = new ArrayList<String>();
    // Collect all comments in the document (true = search the whole tree recursively).
    NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
    // Look through all comments and gather information about those written by the authorName author.
    for (Comment comment : (Iterable<Comment>) comments)
    {
        if (comment.getAuthor().equals(authorName))
            collectedComments.add(comment.getAuthor() + " " + comment.getDateTime() + " " + comment.toString(SaveFormat.TEXT));
    }
    return collectedComments;
}
If you are removing all comments, there is no need to move through the collection deleting comments one by one; you can remove them all by calling NodeCollection.clear on the comments collection.
Example
Removes all comments in the document.
[Java]
static void removeComments(Document doc) throws Exception
{
    // Collect all comments in the document
    NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
    // Remove all comments.
    comments.clear();
}
When you need to selectively remove comments, the process becomes more similar to the code we used for comment extraction.
Example
Removes comments by the specified author.
[Java]
static void removeComments(Document doc, String authorName) throws Exception
{
    // Collect all comments in the document
    NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
    // Look through all comments, last to first, and remove those written by the authorName author.
    for (int i = comments.getCount() - 1; i >= 0; i--)
    {
        Comment comment = (Comment)comments.get(i);
        if (comment.getAuthor().equals(authorName))
            comment.remove();
    }
}
The main point to highlight here is the use of the for operator. Unlike simple extraction, here you delete comments while walking the collection. Removing an item shifts the index of every item after it, so a forward loop can skip items. A suitable trick is to iterate the collection backwards, from the last Comment to the first. When you start from the end and move backwards, the indexes of the preceding items remain unchanged, so you can safely work your way back to the first item in the collection.
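The effect of loop direction can be seen in plain Java, without Aspose.Words. The following is a minimal sketch in which an ArrayList of author names stands in for the comment collection; the BackwardRemovalDemo class and its methods are hypothetical, and the behaviour shown is that of java.util.ArrayList, used here only as an analogy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration; a list of author names stands in for the comment collection.
class BackwardRemovalDemo {

    // Forward loop: removing index i shifts later items left, so the item
    // that moves into position i is skipped when i is incremented.
    static List<String> removeForward(List<String> authors, String target) {
        List<String> result = new ArrayList<>(authors);
        for (int i = 0; i < result.size(); i++) {
            if (result.get(i).equals(target)) {
                result.remove(i);
            }
        }
        return result;
    }

    // Backward loop: removing index i only shifts items after i,
    // and those have already been visited.
    static List<String> removeBackward(List<String> authors, String target) {
        List<String> result = new ArrayList<>(authors);
        for (int i = result.size() - 1; i >= 0; i--) {
            if (result.get(i).equals(target)) {
                result.remove(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("pm", "pm", "ks", "pm");
        System.out.println(removeForward(authors, "pm"));  // prints [pm, ks]
        System.out.println(removeBackward(authors, "pm")); // prints [ks]
    }
}
```

The forward loop leaves one "pm" entry behind because two matching items sit next to each other; the backward loop removes all three.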
Example
Demo code that illustrates the comment extraction and removal methods.
[Java]
// Extract the information about the comments of all the authors.
for (String comment : (Iterable<String>) extractComments(doc))
    System.out.print(comment);
// Remove comments by the "pm" author.
removeComments(doc, "pm");
System.out.println("Comments from \"pm\" are removed!");
// Extract the information about the comments of the "ks" author.
for (String comment : (Iterable<String>) extractComments(doc, "ks"))
    System.out.print(comment);
// Remove all comments.
removeComments(doc);
System.out.println("All comments are removed!");
// Save the document.
doc.save(dataDir + "Test File Out.doc");
When launched, the sample displays the following results. First it lists all comments by all authors, then it lists comments by the selected author only. Finally, the code removes all comments.
The output Word document now has all comments removed: