Using Lucene like a relational database

故里飘歌 2021-01-01 03:10

I am wondering whether we could achieve some RDBMS capabilities in Lucene.

Example: 1) I have 10,000 project documents (pdf files) which have to be indexed with thei

5 answers
  • 2021-01-01 03:14

    If I understand you correctly, you have two questions:

    1. Can I store a project id in Lucene and use it for further searches? Yes, you can. This is a common practice.
    2. Can I use this project id to search Lucene for project metadata? Yes, you can, but I do not know if it is a good idea. It depends on how often your metadata is updated and on your access pattern. If the metadata is relatively static and you only access it by id, Lucene may be a good place to store it. Otherwise, you can use the project id as a primary key into a database table, which could be a better fit.
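
A minimal sketch of the second option, assuming a toy in-memory inverted index standing in for Lucene and an SQLite table standing in for the project database. The names `TinyIndex`, `projects`, `search_with_metadata`, and the sample data are hypothetical and are not Lucene API; the point is only that each indexed document carries a project id that is then used as a primary key:

```python
import sqlite3
from collections import defaultdict


class TinyIndex:
    """Toy stand-in for a full-text index (NOT Lucene's API)."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of doc ids
        self.stored = {}                   # doc id -> stored project id

    def add(self, doc_id, text, project_id):
        self.stored[doc_id] = project_id
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        """Return (doc_id, project_id) pairs for documents containing term."""
        hits = sorted(self.postings.get(term.lower(), ()))
        return [(d, self.stored[d]) for d in hits]


# Project metadata lives in a relational table keyed by project id.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, owner TEXT)")
db.executemany("INSERT INTO projects VALUES (?, ?, ?)",
               [(1, "Bridge", "alice"), (2, "Tunnel", "bob")])

index = TinyIndex()
index.add("a.pdf", "load test report", project_id=1)
index.add("b.pdf", "ventilation report", project_id=2)
index.add("c.pdf", "load schedule", project_id=1)


def search_with_metadata(term):
    """Full-text search first, then fetch metadata by primary key."""
    results = []
    for doc_id, project_id in index.search(term):
        row = db.execute("SELECT name, owner FROM projects WHERE id = ?",
                         (project_id,)).fetchone()
        results.append((doc_id, row[0], row[1]))
    return results
```

With this split, a metadata update touches only the database row; the index needs no change as long as the project id stays the same.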
  • 2021-01-01 03:14

    Sounds like a perfectly good thing to do. The only limitation you'll have (by storing a reference to the project in Lucene rather than the project data itself) is that you won't be able to query both the document text and the project metadata in a single query, for example "documentText:foo OR projectName:bar". If you have no such requirement, then storing an ID in Lucene that refers to a database row is a fine thing to do.
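
To illustrate the limitation, here is a rough sketch of what `documentText:foo OR projectName:bar` turns into once the two fields live in different stores: two separate lookups whose results are merged in application code. The dictionaries and the `or_query` helper are hypothetical stand-ins, not Lucene or database API:

```python
# Stand-in for the full-text index: term -> documents whose text contains it.
text_index = {"foo": {"a.pdf"}, "baz": {"b.pdf"}}

# Stand-in for the reference stored with each document: document -> project id.
doc_project = {"a.pdf": 1, "b.pdf": 2, "c.pdf": 2}

# Stand-in for the database table: project id -> project name.
project_names = {1: "alpha", 2: "bar"}


def or_query(text_term, project_name):
    """Emulate documentText:<text_term> OR projectName:<project_name>."""
    # Step 1: ask the index for documents matching the text clause.
    by_text = set(text_index.get(text_term, set()))
    # Step 2: ask the database which projects match the metadata clause.
    project_ids = {pid for pid, name in project_names.items() if name == project_name}
    # Step 3: map those projects back to documents and union the two sets.
    by_project = {doc for doc, pid in doc_project.items() if pid in project_ids}
    return sorted(by_text | by_project)
```

In this sketch the merged result is a plain set union with no relevance score across the two clauses, which is part of what is given up by keeping the project name out of the index.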

  • 2021-01-01 03:20

    I am not sure about your overall setup, but maybe Hibernate Search is for you. It would allow you to combine the benefits of a relational database with the power of a full-text search engine like Lucene. The metadata could live in the database, maybe together with the original PDF documents, while the Lucene documents contain only the searchable data.

  • 2021-01-01 03:26

    This is definitely possible. But always be aware that you're using Lucene for something it was not intended for. In general, Lucene is designed for full-text search, not for mapping relational content. So the more complex your relational content becomes, the more you'll see a decrease in performance.

    In particular, there are a few areas to keep a close eye on:

    • Storing the value of each field in your index will decrease performance. If you are not overly concerned with sub-second search results, or if your index is relatively small, then this may not be a problem.
    • Also, be aware that if you are not using the default ranking algorithm, and your custom algorithm needs information about the project to calculate the score for each document, this will have a dramatic impact on search performance as well.

    If you need a more powerful index that was designed for relational content, there are hierarchical indexing tools out there (one developed by Apache, called Jackrabbit) that are worth looking into.

    As your project continues to grow, you might also check out Solr, also developed by Apache, which provides some added functionality, such as faceted search.

  • 2021-01-01 03:29

    You can use Lucene that way.

    Pros:

    Full-text search is easy to implement, which is not the case in an RDBMS.

    Cons:

    Referential integrity: you get it for free in an RDBMS, but in Lucene, you must implement it yourself.
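
As a hedged illustration of that difference, the sketch below uses SQLite for the RDBMS side and a plain dictionary as a stand-in for project ids stored in an index. The table layout and the helpers `insert_document_row` and `find_orphans` are assumptions made for this example, not part of Lucene:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE documents (path TEXT PRIMARY KEY, "
           "project_id INTEGER NOT NULL REFERENCES projects(id))")
db.execute("INSERT INTO projects VALUES (1, 'Bridge')")


def insert_document_row(path, project_id):
    """The database rejects a row that points at a missing project."""
    try:
        db.execute("INSERT INTO documents VALUES (?, ?)", (path, project_id))
        return True
    except sqlite3.IntegrityError:
        return False


# Stand-in for stored fields in an index: document -> project id.
# Nothing stops a dangling id from being written here.
indexed = {"a.pdf": 1, "b.pdf": 99}


def find_orphans(indexed_docs):
    """Manual integrity check: indexed documents whose project does not exist."""
    known = {row[0] for row in db.execute("SELECT id FROM projects")}
    return sorted(doc for doc, pid in indexed_docs.items() if pid not in known)
```

The database refuses the bad row at write time, whereas the index side only finds the dangling reference when a check like `find_orphans` is run explicitly.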
