Question
I have started learning about NoSQL, but I cannot find any good examples for RavenDB. Can anybody tell me how to add a binary document (Word, PDF, Excel, ...) as an attachment in RavenDB and search the content of that document? Is there an example of that? Is it even possible? How can I build an MVC application around it?
Answer 1:
First, understand that when we talk about "document databases" in NoSQL, we aren't talking about Word, PDF, Excel documents. We are usually talking about a document in JSON format that represents some specific data, usually serialized from domain entities. The vast majority of RavenDB is focused on working with this sort of data.
However, you can indeed work with the sort of documents you are talking about. It's done with an add-on "bundle", not something that is built in. It's called the "Indexed Attachments Bundle", and I wrote it. You'll find the source code here. There are also unit tests that show how it can be used. For example, see this test. If you are interested in highlighting the search results, see this test also.
The bundle uses Windows IFilters to extract text from the binary document. You will need appropriate IFilters for the document types you plan to work with installed on your local system. If you plan to do a lot with PDF files, I highly recommend the Foxit PDF IFilter. It is much better and faster than Adobe's. If you are just working with Word and Excel documents, you may need the Office IFilters from Microsoft: download either the x86 or the x64 package, plus the Service Pack.
With the appropriate IFilter installed, simply upload an attachment to RavenDB. The bundle will intercept the upload, extract its contents with the IFilter, save the contents to a JSON document, and index that document for easy searching.
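To make that flow concrete, here is a toy in-memory model of the same steps (store the attachment, extract text, save a JSON document, index it). It is only a conceptual sketch in Python, not the bundle's code or RavenDB's API: `ToyAttachmentStore`, `extract_text`, and the JSON field names are all hypothetical, and the "extractor" just decodes UTF-8 bytes where the real bundle would call an IFilter.

```python
import json
import re
from collections import defaultdict


def extract_text(data: bytes) -> str:
    """Hypothetical stand-in for an IFilter: just decode the bytes as UTF-8."""
    return data.decode("utf-8", errors="ignore")


class ToyAttachmentStore:
    """Conceptual model only; none of these names exist in RavenDB."""

    def __init__(self, extractor=extract_text):
        self.attachments = {}          # key -> raw bytes
        self.documents = {}            # key -> JSON string with extracted text
        self.index = defaultdict(set)  # term -> keys of attachments containing it
        self.extractor = extractor

    def put_attachment(self, key: str, data: bytes) -> None:
        # 1. Store the raw attachment.
        self.attachments[key] = data
        # 2. "Intercept" the upload and extract its text.
        text = self.extractor(data)
        # 3. Save the extracted contents as a JSON document (field names are assumed).
        self.documents[key] = json.dumps({"AttachmentKey": key, "Text": text})
        # 4. Index the document so it can be searched by term.
        for term in re.findall(r"\w+", text.lower()):
            self.index[term].add(key)

    def search(self, term: str) -> list:
        return sorted(self.index.get(term.lower(), set()))


store = ToyAttachmentStore()
store.put_attachment("docs/report.txt", b"Quarterly revenue report")
store.put_attachment("docs/notes.txt", b"Meeting notes about hiring")
print(store.search("revenue"))
```

With the real bundle the indexing happens server-side, so client code only uploads the attachment and then queries the index; the unit tests mentioned above show the actual calls.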
You can also get a compiled version of the bundle from NuGet here. The DLL needs to go in the plugins directory on your RavenDB server.
I do not currently have a full end-to-end sample of an application or website that uses this bundle. I also do not have any documentation for it, so be sure to read through the unit tests.
If you just need information about attachments in general, not about indexing or searching them, then you should read the RavenDB documentation.
Source: https://stackoverflow.com/questions/16774036/search-inside-an-attachment-in-ravendb