text-search

MongoDB Index optimization when using text-search in the aggregation framework

让人想犯罪 __ submitted on 2019-12-06 05:14:58
Question: We are building a simplified version of a search engine on top of MongoDB. Sample data set:
{ "_id" : 1, "dept" : "tech", "updDate": ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer" }
{ "_id" : 2, "dept" : "tech", "updDate": ISODate("2014-07-27T09:45:35Z"), "description" : "wireless red mouse" }
{ "_id" : 3, "dept" : "kitchen", "updDate": ISODate("2014-04-27T09:45:35Z"), "description" : "green placemat" }
{ "_id" : 4, "dept" : "kitchen", "updDate": ISODate("2014-05-27T09

Flexible sliding window (in Python)

北战南征 submitted on 2019-12-06 03:42:07
Problem description: I'm interested in looking at terms in the text window of, say, 3 words to the left and 3 to the right. The base case has the form of w-3 w-2 w-1 term w+1 w+2 w+3. I want to implement a sliding window over my text with which I will be able to record the context words of each term. So, every word is once treated as a term, but when the window moves, it becomes a context word, etc. However, when the term is the 1st word in line, there are no context words on the left (t w+1 w+2 w+3), when it's the 2nd word in line, there's only one context word on the left, and so on. So, I
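The window described above can be sketched with list slicing, which shortens the context automatically at the start and end of a line. This is only a minimal sketch: the function name `context_windows`, the `size` parameter, and whitespace tokenization are assumptions for illustration, not part of the original question.

```python
from typing import List, Tuple


def context_windows(tokens: List[str], size: int = 3) -> List[Tuple[str, List[str], List[str]]]:
    """Return (term, left_context, right_context) for every token.

    `context_windows` and `size` are hypothetical names used for illustration.
    """
    windows = []
    for i, term in enumerate(tokens):
        # max(0, ...) keeps the left slice from wrapping around near the start
        left = tokens[max(0, i - size):i]
        # slicing past the end of a list simply yields fewer items
        right = tokens[i + 1:i + 1 + size]
        windows.append((term, left, right))
    return windows


if __name__ == "__main__":
    # Assumption: words are separated by whitespace
    line = "the quick brown fox jumps over the lazy dog"
    for term, left, right in context_windows(line.split()):
        print(left, term, right)
```

Each word is treated as the term exactly once, and the first word of a line gets an empty left context, the second word gets one context word on the left, and so on.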

Fast Dynamic Fuzzy search over 100k+ strings in C#

旧街凉风 submitted on 2019-12-05 21:04:05
Question: Let's say they are pre-loaded stock symbols, typed into a text box. I am looking for code that I can copy, not a library to install. This was inspired by this question: Are there any Fuzzy Search or String Similarity Functions libraries written for C#? The Levenshtein distance algorithm seems to work well, but it takes time to compute. Are there any optimizations around the fact that the query will need to re-run as the user types in an extra letter? I am interested in showing at most the top

Javascript find index of word in string (not part of word)

我的梦境 submitted on 2019-12-01 03:23:09
I am currently using str.indexOf("word") to find a word in a string. The problem is that it also matches parts of other words. Example: "I went to the foobar and ordered foo." I want the first index of the single word "foo", not the foo within foobar. I cannot search for "foo " because sometimes it might be followed by a full stop or comma (any non-alphanumeric character).
You'll have to use regex for this:
> 'I went to the foobar and ordered foo.'.indexOf('foo')
14
> 'I went to the foobar and ordered foo.'.search(/\bfoo\b/)
33
/\bfoo\b/ matches foo that is surrounded by word

Javascript find index of word in string (not part of word)

匆匆过客 submitted on 2019-11-30 23:23:59
Question: I am currently using str.indexOf("word") to find a word in a string. The problem is that it also matches parts of other words. Example: "I went to the foobar and ordered foo." I want the first index of the single word "foo", not the foo within foobar. I cannot search for "foo " because sometimes it might be followed by a full stop or comma (any non-alphanumeric character).
Answer 1: You'll have to use regex for this:
> 'I went to the foobar and ordered foo.'.indexOf('foo')
14
> 'I

How can I create an index with pymongo [duplicate]

拜拜、爱过 submitted on 2019-11-30 00:37:45
Question: This question already has answers here: Recommended way/place to create index on MongoDB collection for a web application (3 answers). Closed last year. I want to enable text search on a specific field in my MongoDB database. I want to implement this search in Python (with pymongo). When I follow the instructions found on the internet:
db.foo.ensure_index(('field_i_want_to_index', 'text'), name="search_index")
I get the following error message: Traceback (most recent call last): File "CVE_search.py",

How to list all my TODO messages in the current git managed code base

南楼画角 submitted on 2019-11-29 21:49:42
I want to see all TODO comments that only I wrote and that exist in the current code base that is git managed. What I've got so far prints all TODO comments that I've ever created or modified during the complete git history:
git log -p --author="My name" -S TODO | grep "\+.*TODO"
But this tool chain lists all TODO comments ever written, even those that I've already resolved and thus removed again from the code. Is there a tool that can search the current code base line by line, check if a line contains "TODO" and was authored by me, and then print those lines? You can combine git

Mongoose text-search with partial string

你离开我真会死。 submitted on 2019-11-29 12:12:18
Question: Hi, I'm using mongoose to search for persons in my collection.
/* Person model */
{ name: { first: String, last: String } }
Now I want to search for persons with a query:
let regex = new RegExp(QUERY,'i');
Person.find({ $or: [ {'name.first': regex}, {'name.last': regex} ] }).exec(function(err,persons){ console.log(persons); });
If I search for John I get results (even if I search for Jo). But if I search for John Doe I obviously do not get any results. If I change QUERY to John|Doe I get

MongoDB diacriticInSensitive search not showing all accented (words with diacritic mark) rows as expected and vice-versa

时光毁灭记忆、已成空白 submitted on 2019-11-29 11:28:22
I have a document collection with the following structure: uid, name. It has an index:
db.Collection.createIndex({name: "text"})
It contains the following data:
1, iphone
2, iphóne
3, iphonë
4, iphónë
When I do a text search for iphone I get only two records, which is unexpected. Actual output:
1, iphone
2, iphóne
If I search for iphonë:
db.Collection.find( { $text: { $search: "iphonë"} } );
I get:
3, iphonë
4, iphónë
But I actually expect the following output: db.Collection.find( { $text: { $search: "iphone"} } ); db.Collection.find( { $text: { $search:

How to list all my TODO messages in the current git managed code base

匆匆过客 submitted on 2019-11-28 17:42:46
Question: I want to see all TODO comments that only I wrote and that exist in the current code base that is git managed. What I've got so far prints all TODO comments that I've ever created or modified during the complete git history:
git log -p --author="My name" -S TODO | grep "\+.*TODO"
But this tool chain lists all TODO comments ever written, even those that I've already resolved and thus removed again from the code. Is there a tool that can search the current code base line-by-line, check if it