Question
I have my students submit their Microsoft Word assignments to a ColdFusion 10 server. I'd like to write an error checker to check for common mistakes like not having a page number in the header, the name of the school on the title page, their name on the title page, etc. I specify a lot of APA rules. Example: The phrase "Running head:" must be in the header section of page 1 but not the rest of the paper. I assign a point value to each rule.
Ideally, this error checker would run when they submit the assignment and tell them immediately. That might require using
parser.parseFromString(str, "text/xml");
As an alternative, if I could write a program that I run myself to check for errors, that would still help automate my grading, for example using Microsoft Access or Visual Studio. But I don't want to do that, because then I'd need Visual Studio on the server and I don't think that's going to be feasible.
The last option would be to download all the papers off the server and run a program locally, which is one step better than grading everything manually.
Answer 1:
I did this a few years back using VBA; refer to this article. Here is an excerpt that loops over each paragraph of a document:
Public Sub ParseLines()
    Dim singleLine As Paragraph
    Dim lineText As String
    For Each singleLine In ActiveDocument.Paragraphs
        lineText = singleLine.Range.Text
        '// parse the text here...
    Next singleLine
End Sub
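The "parse the text here" step could be organized as a table of rules, each with a point value, as the question describes. This is only a rough sketch, written in Python rather than VBA. It assumes the title-page paragraphs have already been extracted into a list of strings, and the `Rule` class, the `contains` and `check_title_page` helpers, the point values, and the school and student names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    description: str                      # shown to the student when the rule fails
    points: int                           # points lost on failure (hypothetical values)
    passes: Callable[[List[str]], bool]   # receives the title-page paragraphs


def contains(phrase: str) -> Callable[[List[str]], bool]:
    """Build a check that passes when any paragraph contains `phrase` (case-insensitive)."""
    needle = phrase.lower()
    return lambda paragraphs: any(needle in p.lower() for p in paragraphs)


def check_title_page(paragraphs: List[str], rules: List[Rule]) -> Tuple[int, List[str]]:
    """Return (points lost, descriptions of the rules that failed)."""
    failed = [rule for rule in rules if not rule.passes(paragraphs)]
    return sum(rule.points for rule in failed), [rule.description for rule in failed]


# Hypothetical example data: the school and student names are placeholders.
rules = [
    Rule("School name missing from title page", 2, contains("Example University")),
    Rule("Student name missing from title page", 3, contains("Jane Doe")),
]
title_page = ["A Study of Something", "Jane Doe", "Some Other College"]

lost, messages = check_title_page(title_page, rules)
print(lost, messages)
```

Keeping each rule as data makes it easy to add or re-weight rules without touching the checking loop, and the list of failed descriptions can be shown to the student directly.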
Answer 2:
I know you already found an answer, but I thought I'd throw in Apache POI as a way to extract the data from the Word document. I know you can get the page headers like so:
// Open the .docx and load it with Apache POI's XWPF (Word 2007+) model
fis = createObject("java","java.io.FileInputStream").init(ExpandPath('./mydoc.docx'));
document = createObject("java","org.apache.poi.xwpf.usermodel.XWPFDocument").init(fis);
fis.close();
// The header/footer policy exposes the first-page and default headers
policy = document.getHeaderFooterPolicy();
firstHeader = policy.getFirstPageHeader().getText();
defaultHeader = policy.getDefaultHeader().getText();
I know this covers only the header portion of your request. According to the Apache POI documentation, there is also a way to get the even- and odd-page headers.
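If POI isn't available, the same header check can be approximated without it, because a .docx file is a ZIP archive of XML parts. Below is a minimal sketch in Python using only the standard library. The function names `read_headers` and `running_head_ok` are made up for illustration. It assumes the relationship targets are relative to the `word/` folder (the usual layout Word writes), it only looks at the first section that declares each header type, and it does not check whether the "different first page" option is actually switched on.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by WordprocessingML and the package relationship parts.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_headers(docx):
    """Return {header type: text} for a .docx path or file-like object.

    Header types are the w:type values on w:headerReference,
    such as "first", "default", and "even".
    """
    with zipfile.ZipFile(docx) as archive:
        # Map relationship ids (e.g. "rId7") to part names (e.g. "header1.xml").
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
        targets = {rel.get("Id"): rel.get("Target")
                   for rel in rels.iter(f"{{{PKG_REL}}}Relationship")}

        body = ET.fromstring(archive.read("word/document.xml"))
        headers = {}
        for ref in body.iter(f"{{{W}}}headerReference"):
            # Assumption: targets are relative to the word/ folder.
            part = "word/" + targets[ref.get(f"{{{R}}}id")]
            root = ET.fromstring(archive.read(part))
            text = "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))
            # Keep the first section's header for each type.
            headers.setdefault(ref.get(f"{{{W}}}type"), text)
        return headers


def running_head_ok(headers):
    """The rule from the question: the phrase appears on page 1 only."""
    phrase = "Running head:"
    return (phrase in headers.get("first", "")
            and phrase not in headers.get("default", ""))
```

Calling `running_head_ok(read_headers("mydoc.docx"))` would then give a pass/fail result for that one rule; the title-page checks would need the body paragraphs from `word/document.xml` in a similar way.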
Answer 3:
Try out:
http://docxextractor.riaforge.org/
It extracts all the clear text and some of the formatting.
Disclaimer: I wrote it.
Source: https://stackoverflow.com/questions/14655315/programmatically-reading-a-microsoft-word-document