need a document to extract text from image using onenote Interop?

你说的曾经没有我的故事 提交于 2019-12-01 22:43:21

问题


I need to do the simple Program whcih need to extract text from image using Onenote Interop? Could any one suggest me the appropriate document for my concept please?


回答1:


Text recognized by OneNote's OCR is stored in the one:OCRText element in the XML file structure in OneNote. e.g.

<one:Page ...>
    ...
    <one:Image ...>
        ...
        <one:OCRData lang="en-US">
            <one:OCRText><![CDATA[This is some sampletext]]></one:OCRText>
        </one:OCRData>
    </one:Image>
</one:Page>

You can see this XML using a program called OMSPY (it shows you the XML behind OneNote pages) - http://blogs.msdn.com/b/johnguin/archive/2011/07/28/onenote-spy-omspy-for-onenote-2010.aspx

To extract the text you would use the OneNote COM interop (as you pointed out). e.g.

//Instantialize OneNote
ApplicationClass onApp = new ApplicationClass();

//Get the XMl from the selected page
string xml = "";
onApp.GetPageContent("put the page id here", out xml);

//Put it into an XML document (from System.XML.Linq)
XDocument xDoc = XDocument.Parse(xml);

//OneNote's Namespace - for OneNote 2010
XNamespace one = "http://schemas.microsoft.com/office/onenote/2010/onenote";

//Get all the OCRText from the page
string[] OCRText = xDoc.Descendants(one + "OCRText").Select(x => x.Value).ToArray();

See the "Application Interface" docs on MSDN for more info - http://msdn.microsoft.com/en-us/library/gg649853.aspx



来源:https://stackoverflow.com/questions/7212787/need-a-document-to-extract-text-from-image-using-onenote-interop

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!