Question
We are searching disparate data sources in our company. We have information in multiple databases that need to be searched from our Intranet. Initial experiments with Full Text Search (FTS) proved disappointing. We've implemented a custom search engine that works very well for our purposes. However, we want to make sure we are doing "the right thing" and aren't missing any great tools that would make our job easier.
What we need:
- Column search
  - ability to search by column
  - we flag which columns in a table are searchable
- Keep some relation between DB column and data
  - we provide advanced filtering on the results
  - facilitates (Amazon-style) filtering
    - filters are provided by grouping the results and letting the user filter them via checkboxes
    - this is a great feature; users like it very much
- Partial Word Match
  - we have a lot of unique identifiers (product ID, etc.)
    - the unique IDs can have subparts with meaning (location, etc.)
    - or only a portion may be available (when the user is searching)
    - or (by a decidedly poor design decision) there may be whitespace in the ID
  - this is a major feature that we've implemented via CHARINDEX (MSSQL) and INSTR (Oracle)
    - on MSSQL, the char index functions performed about the same (+/-) as full text search
      - didn't test on Oracle
    - however, searches against both types of DB are very fast
    - we take advantage of indexed (MSSQL) and materialized (Oracle) views to increase speed
      - this is a huge win; Oracle materialized views are better than MSSQL indexed views
      - both provide speedups in read-only join situations (like a search combining company and product)
- A search that matches user expectations of the paradigm Ctrl-F -> enter text -> find matches
  - Full Text Search is not the best in this area (slow and inconsistent matching)
  - partial matching (see "Partial Word Match")
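The requirements above combine three ideas: only some columns are searchable, matching is a whitespace-insensitive substring test, and results are grouped into checkbox facets. The following is a minimal in-memory sketch of how these fit together in Python. It is not the team's implementation. The rows, column names, `SEARCHABLE`/`FACETS` sets and helper names are all hypothetical. The substring test stands in for what `CHARINDEX(term, col) > 0` (MSSQL) or `INSTR(col, term) > 0` (Oracle) do in SQL.

```python
from collections import Counter, defaultdict

# Hypothetical rows, e.g. the output of a read-only company/product join view.
ROWS = [
    {"product_id": "NY-100 42", "company": "Acme", "category": "Tools"},
    {"product_id": "LA-200-17", "company": "Acme", "category": "Parts"},
    {"product_id": "NY-300-42", "company": "Globex", "category": "Tools"},
]

SEARCHABLE = {"product_id", "company"}  # hypothetical "searchable" column flags
FACETS = ("company", "category")        # hypothetical checkbox-facet columns


def normalize(text):
    """Lower-case and drop all whitespace so 'NY-100 42' matches 'ny-10042'."""
    return "".join(text.split()).lower()


def search(term, rows, columns=None):
    """Return (row, matched_columns) for rows where term is a substring of a
    flagged column, after normalization (like CHARINDEX/INSTR > 0)."""
    needle = normalize(term)
    cols = SEARCHABLE if columns is None else SEARCHABLE & set(columns)
    hits = []
    for row in rows:
        matched = [c for c in sorted(cols) if needle in normalize(str(row[c]))]
        if matched:
            hits.append((row, matched))  # keep the column/data relation
    return hits


def facet_counts(hits):
    """Group results by facet column -> value -> count (Amazon-style sidebar)."""
    counts = defaultdict(Counter)
    for row, _ in hits:
        for field in FACETS:
            counts[field][row[field]] += 1
    return counts


def apply_filters(hits, checked):
    """Keep hits whose facet values are among the user's checked boxes."""
    return [(row, cols) for row, cols in hits
            if all(row[f] in values for f, values in checked.items())]
```

For example, `search("42", ROWS)` matches both "NY" products through `product_id`. `facet_counts` on that result yields one Acme and one Globex hit under `company`. Passing `columns=["category"]` returns nothing, because that column is not flagged searchable.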
Nice to have:
- Search database in real time
  - skip the indexing step; this is not a hard requirement
- Spelling suggestion
  - Xapian has this http://xapian.org/docs/spelling.html
  - similar to Google's "Did you mean:"
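A "Did you mean:" suggestion can be approximated without a search engine. The idea is to build a vocabulary from the searchable column values and replace unknown query words with their closest known word. This is a sketch that swaps in Python's standard `difflib.get_close_matches` (a `SequenceMatcher` similarity ratio), not Xapian's spelling algorithm. The helper names and the 0.75 cutoff are assumptions.

```python
import difflib
import re


def build_vocabulary(values):
    """Collect lower-cased alphanumeric words from searchable column values."""
    vocab = set()
    for value in values:
        vocab.update(re.findall(r"[a-z0-9]+", value.lower()))
    return vocab


def did_you_mean(query, vocab, cutoff=0.75):
    """Return a corrected query, or None if every word is already known
    or no close match exists. cutoff is an assumed similarity threshold."""
    candidates = sorted(vocab)  # sorted for deterministic tie-breaking
    suggested, changed = [], False
    for word in query.lower().split():
        if word in vocab:
            suggested.append(word)
            continue
        close = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        if close:
            suggested.append(close[0])
            changed = True
        else:
            suggested.append(word)
    return " ".join(suggested) if changed else None
```

With a vocabulary built from "Acme Hammer", "Globex Wrench" and "Acme Screwdriver", the query "acme hamer" would yield the suggestion "acme hammer". A query whose words are all known yields no suggestion.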
What we don't need:
- We don't need to index documents
  - at this point, searching our data sources is the most important thing
  - even when we do search documents, we will be looking for partial word matching, etc.
- Ranking
  - Our own simple ranking algorithm has proven much better than an FTS equivalent.
  - Users understand it, we understand it, and it's almost always relevant.
- Stemming
  - we simply don't need [run|ran|running] to match each other
- Advanced search operators
  - phrase matching, or/and, etc.
  - according to Jakob Nielsen http://www.useit.com/alertbox/20010513.html
    - most users use simple search phrases
    - very few use advanced searches (when they are available)
  - also in Information Architecture 3rd edition, page 185
    - "few users take advantage of them [advanced search functions]"
    - http://oreilly.com/catalog/9780596000356
  - our Amazon-like filtering works better anyway (as shown in user testing)
- Full Text Search
  - We've found that results don't always "make sense" to the user
  - Searching with FTS is hard to tune (which set of operators matches the users' expectations?)
  - Advanced search operators are a no-go
    - we don't need them because
    - users don't understand them
  - Performance has been very close (+/-) to the char index functions
    - but the results are sometimes just "weird"
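The question does not describe the in-house ranking algorithm. The following is only a hypothetical illustration of the kind of rule users can understand. It ranks an exact match first, then a prefix match, then any other substring match. Ties are broken by earlier match position and then by shorter value. Whitespace is ignored, as in the partial-match requirement. The function names are made up for this example.

```python
def rank_key(term, value):
    """Hypothetical sort key: (tier, match position, length), or None if no match.
    Tier 0 = exact, 1 = prefix, 2 = substring elsewhere."""
    t = "".join(term.split()).lower()
    v = "".join(value.split()).lower()
    pos = v.find(t)
    if pos < 0:
        return None
    if v == t:
        tier = 0
    elif pos == 0:
        tier = 1
    else:
        tier = 2
    return (tier, pos, len(v))


def rank(term, values):
    """Return matching values ordered by the rule above; non-matches are dropped."""
    scored = []
    for value in values:
        key = rank_key(term, value)
        if key is not None:
            scored.append((key, value))
    scored.sort()
    return [value for _, value in scored]
```

For "NY-100", the values "NY-100", "NY-1005" and "XNY-100" come back in that order, and "LA-200" is dropped. Each position is easy to explain to a user, which is the property the question values over FTS-style scoring.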
The question: Is there a solution that allows us to keep the key value pair "filtering feature", offers the column specific matching, partial word matching and the rest of the features, without the pain of full text search?
I'm open to any suggestions. I've wondered whether a document/hash-table NoSQL data store (MongoDB, et al.) might be of use ( http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo ). Any experience with these is appreciated.
Again, just making sure we aren't missing something with our in-house customized version. If there is something "off the shelf" I would be interested in it. Or if you've built something from some components, what components (search engines, data stores, etc) did you use and why?
You can also make your point for FTS. Just make sure it meets the requirements above before you say "just use Full Text Search because that's the only tool we have."
Answer 1:
I ended up coding my own.
The results are fantastic. Users like it, it works well with our existing technologies.
It really wasn't that hard. Just took some time.
Features:
- Faceted search (Amazon, Walmart, etc.)
- Partial word search (real substring matching, not full text)
- Search databases (Oracle, SQL Server, etc.) and non-database sources
- Integrates well with our existing environment
- Maintains relations, so I can have an n-to-n search and display
  - this means I can display child records of a master record in search results
  - I can also search any child field and return the master record
It's really amazing what you can do with dictionaries and a lot of memory.
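The answer doesn't show its code. The sketch below is one hypothetical way to get partial-word search and master/child relations out of "dictionaries and a lot of memory". It indexes character trigrams, which is an assumption: the answer does not say which technique it uses. Child-record fields are posted under their master's ID, so a hit on any child field returns the master together with its children. The class, its methods and `N = 3` are illustrative names and choices. The sketch also assumes each master is added before its children.

```python
from collections import defaultdict

N = 3  # assumed n-gram length


def grams(text):
    """Whitespace-insensitive, lower-cased character n-grams of text."""
    t = "".join(str(text).split()).lower()
    return {t[i:i + N] for i in range(len(t) - N + 1)}


class MasterDetailIndex:
    """Hypothetical in-memory index: n-gram -> set of master IDs."""

    def __init__(self):
        self.postings = defaultdict(set)   # n-gram -> master IDs
        self.texts = defaultdict(list)     # master ID -> normalized field values
        self.masters = {}                  # master ID -> master record
        self.children = defaultdict(list)  # master ID -> child records

    def add_master(self, mid, record, fields):
        self.masters[mid] = record
        self._index(mid, [record[f] for f in fields])

    def add_child(self, mid, record, fields):
        # Child fields are indexed under the parent's ID.
        self.children[mid].append(record)
        self._index(mid, [record[f] for f in fields])

    def _index(self, mid, values):
        for value in values:
            self.texts[mid].append("".join(str(value).split()).lower())
            for g in grams(value):
                self.postings[g].add(mid)

    def search(self, term):
        """Return [(master, children)] whose fields contain term as a substring."""
        needle = "".join(term.split()).lower()
        query_grams = grams(term)
        if query_grams:
            # Masters sharing every n-gram of the term are candidates.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in query_grams))
        else:
            candidates = set(self.masters)  # term shorter than N: scan all
        # Confirm with a real substring test to drop false positives.
        hits = [mid for mid in sorted(candidates)
                if any(needle in t for t in self.texts[mid])]
        return [(self.masters[m], self.children[m]) for m in hits]
```

The trigram dictionary only narrows the candidates. The final substring check makes results agree with what a CHARINDEX/INSTR scan would return. Memory grows roughly with the total amount of indexed text.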
Answer 2:
I recommend looking into Solr; I believe it will meet your needs:
http://lucene.apache.org/solr/
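To see how Solr maps onto the requirements, the sketch below builds a select URL using standard Solr request parameters. `facet=true` plus repeated `facet.field` parameters produce the checkbox groups, and `fq` filter queries apply the boxes the user ticked. A `*term*` wildcard gives substring matching. Several parts are assumptions: the core name `products`, a string field holding IDs with whitespace removed, and the helper names. Solr's standard parser accepts leading wildcards, but they can be slow on large indexes, and an n-gram-analyzed field is the usual alternative.

```python
from urllib.parse import urlencode

# Hypothetical core "products"; 8983 is Solr's default port.
SOLR_SELECT = "http://localhost:8983/solr/products/select"

# Characters with special meaning in Solr's standard (Lucene) query syntax.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term):
    """Drop whitespace (IDs may contain stray spaces) and escape syntax chars."""
    compact = "".join(term.split())
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in compact)


def solr_query_url(term, search_field, facet_fields, filters=None, rows=20):
    """Build a faceted, substring-matching select URL.

    search_field: hypothetical string field storing the ID without whitespace.
    filters: {field: value} from the user's checked boxes, sent as fq clauses.
    """
    params = [
        ("q", f"{search_field}:*{escape_term(term)}*"),  # substring via wildcards
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    params += [("facet.field", f) for f in facet_fields]  # one group per field
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return SOLR_SELECT + "?" + urlencode(params)
```

Filter queries (`fq`) restrict results without affecting relevance scoring, which fits the question's preference for its own ranking over engine scoring.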
Answer 3:
For an off-the-shelf solution: have you checked out the Google Search Appliance?
Quote from the Google Mini/GSA site:
... If direct database indexing is a requirement for you, we encourage you to consider the Google Search Appliance, which has direct database connectivity.
And of course it indexes everything else in the Googly manner you'd expect it to.
Answer 4:
Apache Solr is a good way to start your project, and it is open source. You can also try Elasticsearch, and there are many off-the-shelf products that offer good customization and search features, such as Coveo, SharePoint Fast, Google ...
Source: https://stackoverflow.com/questions/3198901/how-to-implement-an-enterprise-search