full-text-search | 易学教程

SQL Server 2012 - Fulltext search on top of a filetable - PDF not being searched

Question: I'm getting my feet wet handling a load of Office and PDF documents with SQL Server 2012's FILETABLE feature, with full-text search on top of it. I've configured my SQL Server to support full-text search and FILESTREAM, created a FILETABLE, and dumped 800+ documents of all sorts into the folder, and that all works nicely. To be able to full-text index MS Office documents, I've installed the MS Filter Pack 2.0, and to handle the PDF files, I've downloaded Adobe's

Postgresql fulltext search for words with apostrophes

Question: I'm working on building a customized ispell dictionary configuration for PostgreSQL 8.4 and am having some problems getting words with apostrophes in them to parse correctly. The ispell dictionaries included with PostgreSQL include an .affix file which contains an "M" SFX rule that specifies an expanded form of its word. Here is an example, assuming that I have dictionary/SM in my dictionary file: SELECT to_tsvector('english_ispell', 'dictionary''s dictionaries'); Expected output:

How to setup Lucene/Solr for a B2B web app?

Question: Given: 1 database per client (business customer); 5000 clients; clients have between 2 and 2000 users (avg is ~100 users/client); 100k to 10 million records per database; users need to search those records often (it's the best way to navigate their data). Possibly relevant info: several new clients each week (any time during business hours); multiple web servers and database servers (users can log in via any web server). Let's stay agnostic of language or SQL brand, since Lucene (and Solr) have a

Does PostgreSQL use tf-idf?

Question: I would like to know whether full-text search in PostgreSQL 9.3 with a GIN/GiST index uses tf-idf (term frequency-inverse document frequency). In particular, in my columns of phrases, some words are more common, whereas others are quite unique (e.g., names). I want to index these columns so that matches on the unique words are weighted higher than matches on common words.

Answer 1: No. Within the ts_rank function, there is no native method to rank results using their global (corpus) frequency. The
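To make the weighting the asker wants concrete, here is a minimal Python sketch of classic tf-idf on a tiny made-up corpus. It illustrates the general technique only and is not something PostgreSQL computes; the corpus, the `tf_idf` function name, and the whitespace tokenization are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical corpus: each "document" is one phrase from a column.
corpus = [
    "red car for sale",
    "blue car for sale",
    "red bike for sale",
    "zanzibar car for sale",
]

def tf_idf(term, doc, docs):
    """Term frequency in doc times log(N / number of docs containing term)."""
    words = doc.split()
    tf = Counter(words)[term] / len(words)
    df = sum(1 for d in docs if term in d.split())
    if df == 0:
        return 0.0  # term never occurs in the corpus
    return tf * math.log(len(docs) / df)

doc = corpus[3]
common = tf_idf("sale", doc, corpus)      # occurs in every document
rare = tf_idf("zanzibar", doc, corpus)    # occurs in a single document
print(common, rare)
```

Because "sale" occurs in every phrase, its inverse document frequency is log(1) = 0, so only the rare word contributes to the score.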

MySQL Match Fulltext

Question: I'm trying to do a full-text search with MySQL to match a string. The problem is that it's returning an odd result in first place. For example, for the string 'passat 2.0 tdi': AND MATCH ( records_veiculos.titulo, records_veiculos.descricao ) AGAINST ( 'passat 2.0 tdi' WITH QUERY EXPANSION ) is returning this as the first result (the others are fine): Volkswagen Passat Variant 1.9 TDI- ANO 2003, which is incorrect, since there's no "2.0" in this example. What could it be? Edit: Also, since

Is it possible to combine NEAR and FORMSOF together in a fulltext search?

Question: I have this: SELECT * FROM AwesomePeople WHERE CONTAINS(Name, 'NEAR(("Nathan", "Fillion"), MAX, TRUE)') But I want to combine it so that it uses my thesaurus to look up alternatives for Nathan and Fillion. I can do this: SELECT * FROM AwesomePeople WHERE CONTAINS(Name, 'FORMSOF (THESAURUS, "Nathan")') But I don't know how to search for 2 words, or how to make it do FORMSOF and NEAR together in a single query. I have tried a few combinations but am out of luck. Any ideas?

Answer 1: It looks like you

PostgreSQL: Find sentences closest to a given sentence

Question: I have a table of images with sentence captions. Given a new sentence, I want to find the images that best match it, based on how close the new sentence is to the stored sentences. I know that I can use the @@ operator with to_tsquery, but a tsquery accepts specific words as queries. One problem is that I don't know how to convert the given sentence into a meaningful query. The sentence may contain punctuation and numbers. However, I also feel that some kind of cosine similarity thing is what I need
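The cosine-similarity idea the asker mentions can be sketched in a few lines of Python, outside the database. This is a minimal illustration using plain word counts; the helper names (`bag_of_words`, `cosine_similarity`, `closest`), the tokenization rule, and the sample captions are all hypothetical and not part of the original question.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    # Lowercase and keep only runs of letters/digits, so punctuation is dropped.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two sentences."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def closest(query, captions):
    # Return the stored caption most similar to the query sentence.
    return max(captions, key=lambda c: cosine_similarity(query, c))

# Made-up captions for illustration.
captions = [
    "a cat sleeps on a sofa",
    "the dog runs through a park",
    "two birds on a wire",
]
best = closest("A dog runs in the park!", captions)
print(best)
```

Raw counts treat every word equally; weighting the vectors (for example by inverse document frequency) would be a natural refinement, at the cost of needing corpus-wide statistics.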

fulltext index returning no results from pdf filestream

Question: I have a FILESTREAM table running on SQL Server 2012 on a Windows 8.1 x64 machine, which already has a few PDF and TXT files stored, so I decided to create a full-text index to search through these files by using the following commands: CREATE FULLTEXT CATALOG FileStreamFTSCatalog AS DEFAULT; CREATE FULLTEXT INDEX ON storage (FileName Language 1046, File TYPE COLUMN FileExtension Language 1046) KEY INDEX PK__storage__3214EC077DADCE3C ON FileStreamFTSCatalog WITH CHANGE_TRACKING AUTO; Then I

Performance difference between sunspot and thinking sphinx

Question: I read an article comparing the performance of Sunspot and Thinking Sphinx ( http://www.vijedi.net/2010/ruby-full-text-search-performance-thinking-sphinx-vs-sunspot-solr/ ). According to the article, Sunspot lags far behind Thinking Sphinx because it uses XML to interact with the Java layer. These are the results reported there:

Runs     Thinking Sphinx    Sunspot
5000     38.49              1611.60
10000    38.54              1648.51
15000    39.06              1614.52
20000    38.86              1583.53
25000    39.78              1613.79
30000    38.83              1595.60
35000    38.34              1571.96
40000    38.06

fuzzy search with lucene

Question: I implemented a fuzzy search with Lucene 4.3.1, but I'm not satisfied with the result. I would like to specify the number of results it should return. For example, if I want 10 results, it should return the 10 best matches, no matter how bad they are. Most of the time it returns nothing if the word I search for is very different from anything in the index. How can I get more/fuzzier results? Here is the code I have: public String[] luceneQuery(String query, int numberOfHits, String path)