Solr / RDBMS: where to store additional data

Posted by 北慕城南 on 2019-12-24 11:51:43

Question


What would be considered best practice when you need additional data about facet results?

For example, when faceting on product categories, I need a friendly name, an image, meta keywords, a description and more for each category. The options I see:

  • include it in every product document (can lead to a lot of duplication)
  • introduce category as a new index in Solr (or fake one with a doctype=category field in the same index)
  • use an RDBMS to look up the additional data with a SELECT ... WHERE id IN (..category facet result ids..)

Thanks,

Remco


Answer 1:


I would consider two alternatives:

1.) Store the information with every document without indexing it (to keep the index as small as possible). Note that I would not store the image itself inside Lucene/Solr, only a file pointer to it.

2.) Store the additional data in an RDBMS or a NoSQL store (like MongoDB) and look it up there, as you wrote.

My favorite is the second one, because a database is the traditional and most optimized way of storing data. In the end it depends on your system, though: keep in mind that you need time to connect to the database, search through the data and send the additional information back to the application. So it could turn out to be faster to store everything in Lucene.

A small performance test would probably be useful.
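To make the second alternative concrete, here is a minimal sketch of the lookup step, assuming the facet values coming back from Solr are category ids. The table name `category`, its columns and the function `fetch_category_details` are hypothetical names chosen for illustration, and an in-memory SQLite database stands in for whatever RDBMS you actually use.

```python
import sqlite3


def fetch_category_details(conn, category_ids):
    """Return {category_id: details} for the ids found in the facet result."""
    ids = list(category_ids)
    if not ids:
        return {}
    # One placeholder per id keeps the query parameterized (no string-built values).
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT id, friendly_name, image_path, description "
        f"FROM category WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    return {
        row[0]: {"friendly_name": row[1], "image_path": row[2], "description": row[3]}
        for row in rows
    }


def build_demo_db():
    """Create a throwaway database with a hypothetical category table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE category ("
        "id INTEGER PRIMARY KEY, friendly_name TEXT, image_path TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO category VALUES (?, ?, ?, ?)",
        [
            (1, "Shoes", "/img/shoes.png", "All footwear"),
            (2, "Hats", "/img/hats.png", "Headwear"),
            (3, "Bags", "/img/bags.png", "Bags and backpacks"),
        ],
    )
    return conn


# Facet counts as they might come back from the search engine: category id -> count.
facet_counts = {1: 42, 3: 7}
details = fetch_category_details(build_demo_db(), facet_counts.keys())
labelled = [(details[cid]["friendly_name"], count) for cid, count in facet_counts.items()]
```

The whole facet result is resolved in a single round trip, which is the part worth measuring in the performance test mentioned above.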




Answer 2:


  • use a fast NoSQL database that fits your data

BTW, Lucene, the layer underlying Solr, is in fact also a NoSQL-type storage facility.

If I were you, I'd use MongoDB. It is the first database that came to mind, since you need binary data and they practically invented BSON, which is now a widespread means of transferring binary data in a JSON-like fashion.

If your data is more graph-shaped (like a social network), check out Neo4j, which has blindingly fast graph traversal algorithms.




Answer 3:


A relational DB can reliably enforce the "category is a first-class entity" requirement. You need referential integrity: a product may not belong to a category that doesn't exist, and a deleted category must not leave its child categories lying around. A normalized RDB can enforce referential integrity through its schema. With a NoSQL DB, you have to write client-side code to enforce it.


Let's see how "a product's category must exist" and "a subcategory's parent must exist" are done:

RDB: The table that assigns categories to products (an m:n relation) must reference both the product and the category with foreign keys declared ON DELETE CASCADE. If a category is deleted, its assignment rows go with it, so a product simply cannot have such a category. For a category that links to another category as its child, the relevant parent field also has ON DELETE CASCADE, which means that if a parent is deleted, its children cannot exist. This entire method is declarative ("it is declared thus"): all the complexity lives in the data, and we don't need no stinking code to do it for us. You can model the database as naturally as you understand its real-world implications.
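The declarative approach can be sketched as follows, assuming a hypothetical three-table schema (`category`, `product`, `product_category`) and using in-memory SQLite purely so the example runs on its own. Other databases spell the same constraints similarly, but check your own RDBMS's documentation.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative only.
SCHEMA = """
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES category(id) ON DELETE CASCADE
);
CREATE TABLE product (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE product_category (
    product_id  INTEGER NOT NULL REFERENCES product(id)  ON DELETE CASCADE,
    category_id INTEGER NOT NULL REFERENCES category(id) ON DELETE CASCADE,
    PRIMARY KEY (product_id, category_id)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.execute("INSERT INTO category VALUES (1, 'Shoes', NULL)")
conn.execute("INSERT INTO category VALUES (2, 'Sneakers', 1)")
conn.execute("INSERT INTO product VALUES (10, 'Trail runner')")
conn.execute("INSERT INTO product_category VALUES (10, 2)")

# A product cannot be assigned to a category that does not exist.
try:
    conn.execute("INSERT INTO product_category VALUES (10, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the parent removes the child category and its assignment rows.
conn.execute("DELETE FROM category WHERE id = 1")
remaining_categories = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
remaining_links = conn.execute("SELECT COUNT(*) FROM product_category").fetchone()[0]
remaining_products = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

Note that no application code decides what happens on delete: the product row survives, while the subcategory and the assignment disappear because the schema says so.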

Document store-type NoSQL: You need to write code to do everything. "A category is deleted" is a use case: you need to find the products that have that category and update each one. You have to write code for each use case, and the same goes for managing subcategories. The data model may be incredibly stupid, but its real-world implications must be modeled in the code. And it's tougher to reason about code and control flow than about data structures.
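For contrast, here is a rough sketch of the "category is deleted" use case written by hand. Plain Python dicts stand in for two document collections; the field names `parent_id` and `category_ids` and the function `delete_category` are assumptions made for illustration, not any particular database's API.

```python
def delete_category(categories, products, category_id):
    """Delete a category, all of its descendants, and every reference to them."""
    # Step 1: collect the category and every subcategory below it.
    doomed = {category_id}
    changed = True
    while changed:
        changed = False
        for cid, cat in categories.items():
            if cid not in doomed and cat.get("parent_id") in doomed:
                doomed.add(cid)
                changed = True

    # Step 2: remove the category documents themselves.
    for cid in doomed:
        categories.pop(cid, None)

    # Step 3: visit every product and strip the dangling references.
    for product in products.values():
        product["category_ids"] = [
            cid for cid in product["category_ids"] if cid not in doomed
        ]
    return doomed


categories = {
    1: {"name": "Shoes", "parent_id": None},
    2: {"name": "Sneakers", "parent_id": 1},
    3: {"name": "Hats", "parent_id": None},
}
products = {
    10: {"name": "Trail runner", "category_ids": [2, 3]},
    11: {"name": "Beanie", "category_ids": [3]},
}
removed = delete_category(categories, products, 1)
```

Every rule the relational schema expressed in one clause becomes a loop here, and each further use case (moving a category, merging two) needs its own code.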

Do you really have performance needs that require NoSQL databases?

So use an RDBMS to manage your data. Then use Solr's DataImportHandler or client-side code to insert and update denormalized entities for searching. If most requests to your site can be expressed as Solr queries, great!
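The client-side variant might look roughly like this: read the normalized rows and flatten them into one document per product. The schema, the document field names (`category_ids`, `category_names`) and the function `build_search_documents` are hypothetical, and the step that actually sends the documents to Solr is deliberately left out.

```python
import json
import sqlite3


def build_search_documents(conn):
    """Flatten product/category rows into one denormalized dict per product."""
    docs = {}
    rows = conn.execute(
        "SELECT p.id, p.name, c.id, c.name "
        "FROM product p "
        "LEFT JOIN product_category pc ON pc.product_id = p.id "
        "LEFT JOIN category c ON c.id = pc.category_id "
        "ORDER BY p.id, c.id"
    )
    for product_id, product_name, category_id, category_name in rows:
        doc = docs.setdefault(
            product_id,
            {"id": str(product_id), "name": product_name,
             "category_ids": [], "category_names": []},
        )
        if category_id is not None:  # products without categories still get a doc
            doc["category_ids"].append(category_id)
            doc["category_names"].append(category_name)
    return list(docs.values())


def build_demo_db():
    """Throwaway normalized database used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product_category (product_id INTEGER, category_id INTEGER);
        INSERT INTO category VALUES (1, 'Shoes'), (2, 'Outdoor');
        INSERT INTO product VALUES (10, 'Trail runner'), (11, 'Gift card');
        INSERT INTO product_category VALUES (10, 1), (10, 2);
        """
    )
    return conn


documents = build_search_documents(build_demo_db())
payload = json.dumps(documents)  # e.g. the body of an update request
```

The database stays the source of truth; the search index only ever holds a copy that can be rebuilt from it.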


As for expressing hierarchical faceting in Solr, see 'Ways to do hierarchial faceting in Solr?'.




Answer 4:


Maybe I am wrong, but if you are on Solr trunk you could benefit from Solr's join support. It would allow you to index several entities with relations among them while enforcing conditions on both.
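As a rough sketch of what such a request could look like, the snippet below only builds the query parameters and does not contact a server. It assumes categories and products live in the same index and are told apart by the doctype field from the question; the field names `id`, `category_id` and `friendlyname` are hypothetical, so verify the join syntax against the Solr version you run.

```python
from urllib.parse import urlencode


def category_join_params(category_query):
    """Build parameters that select products whose category matches a query."""
    return {
        # Match category documents, then follow their id into products' category_id.
        "q": "{!join from=id to=category_id}" + category_query,
        # Keep only product documents in the final result.
        "fq": "doctype:product",
        "wt": "json",
    }


params = category_join_params("doctype:category AND friendlyname:Shoes")
query_string = urlencode(params)  # append to your core's /select URL
```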



Source: https://stackoverflow.com/questions/9004901/solr-rdbms-where-to-store-additonal-data
