full-text-search

Python - Fast count words in text from list of strings and that start with

本秂侑毒 submitted on 2021-02-08 02:40:13

Question: I know that similar questions have been asked several times, but my problem is a bit different and I am looking for a time-efficient solution in Python. I have a set of words, some of which end with "*" and some of which don't: words = set(["apple", "cat*", "dog"]). I have to count their total occurrences in a text, considering that anything can go after an asterisk ("cat*" matches all words that start with "cat"). The search has to be case-insensitive. Consider this example: text = "My cat
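One common approach, sketched below (this is an illustrative solution, not the asker's final code): tokenize the text once, split the pattern set into exact words and prefixes, and count matching tokens. Passing a tuple of prefixes to `str.startswith` keeps the inner check compact; for very large prefix sets, a compiled alternation regex or a trie would scale better.

```python
import re

def count_words(words, text):
    """Count total occurrences in `text` of the given words.

    A trailing '*' marks a prefix pattern ("cat*" matches any word
    starting with "cat"); matching is case-insensitive.
    """
    # Split the patterns into exact words and prefixes (lower-cased once).
    exact = {w.lower() for w in words if not w.endswith("*")}
    prefixes = tuple(w[:-1].lower() for w in words if w.endswith("*"))

    total = 0
    for token in re.findall(r"[a-z']+", text.lower()):
        # str.startswith accepts a tuple, so one call tests every prefix.
        if token in exact or token.startswith(prefixes):
            total += 1
    return total

print(count_words({"apple", "cat*", "dog"},
                  "My cat Catherine likes apples and a dog"))  # 3
```

Here "cat" and "Catherine" match the prefix pattern and "dog" matches exactly; "apples" does not match the exact word "apple".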

Elasticsearch : Strip HTML tags before indexing docs with html_strip filter not working

瘦欲@ submitted on 2021-02-07 11:19:19

Question: Given I have specified my html_strip char filter in my custom analyser, when I index a document with html content, then I expect the html to be stripped out of the indexed content, and on retrieval the returned doc from the index should not contain html. ACTUAL: The indexed doc contained html. The retrieved doc contained html. I have tried specifying the analyzer as index_analyzer as one would expect, and a few others out of desperation: search_analyzer and analyzer. None seem to have any effect on the
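Part of the retrieval symptom is expected behaviour: char filters only change what goes into the inverted index, while the stored `_source` is returned verbatim, so a fetched document will always still contain the original HTML. Also note that the `index_analyzer` setting was removed around Elasticsearch 2.0; on modern versions the field-level `analyzer` (plus an optional `search_analyzer`) is the setting to use. A typical wiring looks like the sketch below (index and field names are illustrative, assuming Elasticsearch 5+):

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_text": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": { "type": "text", "analyzer": "html_text" }
    }
  }
}
```

To confirm the tags really are stripped from the indexed tokens (rather than inspecting `_source`), run the `_analyze` API against the analyzer with a sample HTML string and check the emitted tokens.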

Postgresql fulltext search for Czech language (no default language config)

╄→尐↘猪︶ㄣ submitted on 2021-02-07 03:57:45

Question: I am trying to set up fulltext search for the Czech language. I am a little bit confused, because I see some cs_cz.affix and cs_cz.dict files inside the tsearch_data folder, but there is no Czech language configuration (it's probably not shipped with Postgres). So should I create one? Which dictionaries do I have to create/configure? Is there any support for the Czech language at all? Should I use all possible dictionaries? (Synonym Dictionary, Thesaurus Dictionary, Ispell Dictionary, Snowball Dictionary) I am able to
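The cs_cz.affix / cs_cz.dict pair is the file layout of an Ispell dictionary, so one plausible setup is to build an Ispell dictionary from those files and wire it into a new configuration. The sketch below is a guess based on the file names in the question, not a verified Czech setup; the stop-word line in particular only works if a matching `.stop` file exists in tsearch_data:

```sql
-- Build an Ispell dictionary from tsearch_data/cs_cz.dict + cs_cz.affix.
CREATE TEXT SEARCH DICTIONARY czech_ispell (
    TEMPLATE  = ispell,
    DictFile  = cs_cz,
    AffFile   = cs_cz
    -- , StopWords = czech  -- only if tsearch_data/czech.stop exists
);

-- Start from the 'simple' configuration and route word tokens through
-- the Ispell dictionary, falling back to 'simple' for unknown words.
CREATE TEXT SEARCH CONFIGURATION czech (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION czech
    ALTER MAPPING FOR word, asciiword, hword, hword_part,
                      asciihword, hword_asciipart
    WITH czech_ispell, simple;

-- Usage:
SELECT to_tsvector('czech', 'nějaký český text');
```

The other dictionary types are optional: a Snowball stemmer for Czech is not shipped with Postgres, and synonym/thesaurus dictionaries are only needed for domain-specific term mappings.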

WinSCP: Text search on remote files

被刻印的时光 ゝ submitted on 2021-02-06 12:42:28

Question: I use WinSCP to get access to the remote files of our project. How can I search for some text/words in all remote files/directories using WinSCP? Answer 1: WinSCP does not support text searching in its primary GUI. But there's a built-in extension to Search recursively for text in a remote directory. This is a universal solution that works with SFTP, even if the server does not allow shell access, and even for FTP or WebDAV sessions. Alternatively, you may be able to make use of the WinSCP console window

Solr schema for prefix search, howto?

醉酒当歌 submitted on 2021-02-04 21:07:17

Question: I have read many questions on Stack Overflow, but didn't find an answer on how to do a Solr prefix search. For example, I have the text "solr documentation is unreadable", and I need to match queries like "solr docu*", "documentation unread*", "unreadable is so*", but not "un* so*". I made something like this: <fieldType name="prefix_search" class="solr.TextField"> <analyzer> <tokenizer class="solr.LowerCaseTokenizerFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
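The quoted schema snippet is cut off mid-attribute. A typical complete version of this pattern applies edge n-grams only at index time, so that query terms are matched against whole prefixes rather than being n-grammed themselves (attribute values below are illustrative, not the asker's exact configuration):

```xml
<fieldType name="prefix_search" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: emit every prefix of each token as an edge n-gram. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <!-- Query time: no n-grams, so "docu" matches only as a stored prefix. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With separate index/query analyzers, "solr docu" matches because "docu" was indexed as an edge n-gram of "documentation", while a bare "un" query against min-length constraints can be kept from matching by raising minGramSize.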

Solr wildcards and escaped characters together

梦想的初衷 submitted on 2021-01-29 14:49:05

Question: I am trying to search in Solr but have a problem. For example, I have this phrase stored in Solr: [Karina K[arina ? ! & ?!a& m.malina m:malina 0sal0 0 AND . Now I want to search any request with wildcards * . For example, I write *[* or *?* and expect Solr to return this phrase, but it doesn't work. What I tried: I can use escaped characters like K\[arina, but in that case I need to enter the whole phrase. But if I write K\[arin* , I will have no results
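Solr's query parsers treat characters such as [ ] { } : & | as query syntax, so they must be backslash-escaped, while * and ? should be left alone to act as wildcards. A hypothetical Python helper (the name and the exact character set are mine, not from the question) might look like:

```python
import re

# Characters Solr's query parsers treat as syntax. '*' and '?' are
# deliberately NOT escaped so they keep working as wildcards.
_SOLR_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~:\\/])')

def escape_solr_term(term: str) -> str:
    """Escape Solr query syntax in `term`, preserving * and ? wildcards."""
    return _SOLR_SPECIAL.sub(r'\\\1', term)

print(escape_solr_term("K[arin*"))  # K\[arin*
```

Note that wildcard queries are not analyzed, so whether K\[arin* matches depends on the tokens actually stored in the index: a standard tokenizer may well have split "[Karina" at the bracket into a plain "karina" token, which would explain why the escaped-prefix query returns nothing. Inspecting the field's analysis (e.g. in the Solr admin Analysis screen) would confirm this.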