textmatching

Postgresql - converting text to ts_vector

你说的曾经没有我的故事 submitted on 2020-01-03 09:32:18
Question: Sorry for the basic question. I have a table with the following columns:

 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 doc_id | bigint  |
 text   | text    |

I am trying to do text matching on the 'text' column (the 3rd column), but I receive an error message saying that the string is too long for ts_vector. I only want observations which contain the words "other events". SELECT * FROM eightks\d WHERE to_tsvector(text) @@ to_tsquery('other_events

Postgresql - converting text to ts_vector

孤者浪人 submitted on 2020-01-03 09:31:53
Question: Sorry for the basic question. I have a table with the following columns:

 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 doc_id | bigint  |
 text   | text    |

I am trying to do text matching on the 'text' column (the 3rd column), but I receive an error message saying that the string is too long for ts_vector. I only want observations which contain the words "other events". SELECT * FROM eightks\d WHERE to_tsvector(text) @@ to_tsquery('other_events

How do I count varchar in a varchar using TSQL

99封情书 submitted on 2019-12-24 03:16:20
Question: What is the best way to count the occurrences of a varchar within a varchar? I would rather not loop through the text in order to find certain combinations. This select only finds the first match: SELECT CASE WHEN CHARINDEX('!','HOW MANY TIMES IS ! IN THIS TEXT ? THIS IS MY QUESTION !' ) > 0 THEN 1 ELSE 0 END It returns 1. I need a method to find the total number of matches. TABLE DATA SEARCHTEXT LONGTEXT ! HOW MANY TIMES IS ! IN THIS TEXT ? THIS IS MY QUESTION ! HELLO HELLO HELLO HELLO HELLO HELLO HELLO L HELLO
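A common loop-free way to count matches is the length-difference trick: remove every occurrence of the search string and divide the number of characters lost by the length of the search string. The sketch below illustrates the idea in Python rather than T-SQL; `count_occurrences` is a hypothetical helper name, not something from the question. The same arithmetic can usually be written in T-SQL with `LEN` and `REPLACE`, keeping in mind that `LEN` ignores trailing spaces.

```python
def count_occurrences(haystack: str, needle: str) -> int:
    """Count non-overlapping occurrences of needle without an explicit loop."""
    if not needle:
        raise ValueError("needle must be non-empty")
    # Removing every match shortens the string by len(needle) per occurrence,
    # so the length difference divided by len(needle) is the match count.
    removed = len(haystack) - len(haystack.replace(needle, ""))
    return removed // len(needle)


text = "HOW MANY TIMES IS ! IN THIS TEXT ? THIS IS MY QUESTION !"
print(count_occurrences(text, "!"))
```

In Python itself, `str.count` gives the same non-overlapping count directly; the explicit arithmetic is shown only because it carries over to SQL.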

how to determine if a record in every source, represents the same person

流过昼夜 submitted on 2019-12-06 10:29:28
Question: I have several sources of tables with personal data, like this:

SOURCE 1
ID, FIRST_NAME, LAST_NAME, FIELD1, ...
1, jhon, gates ...

SOURCE 2
ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ...
1, jon, gate ...

SOURCE 3
ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ...
2, jhon, ballmer ...

So, assuming that the records with ID 1 from sources 1 and 2 are the same person, my problem is how to determine whether a record in each source represents the same person. Additionally, not every record exists in

how to determine if a record in every source, represents the same person

与世无争的帅哥 submitted on 2019-12-04 13:18:13
I have several sources of tables with personal data, like this:

SOURCE 1
ID, FIRST_NAME, LAST_NAME, FIELD1, ...
1, jhon, gates ...

SOURCE 2
ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ...
1, jon, gate ...

SOURCE 3
ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ...
2, jhon, ballmer ...

So, assuming that the records with ID 1 from sources 1 and 2 are the same person, my problem is how to determine whether a record in each source represents the same person. Additionally, not every record exists in all sources. The names are written mainly in Spanish. In this case, the exact matching needs to
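One possible starting point for this kind of record matching is approximate string comparison on normalized names. The sketch below uses Python's standard `difflib.SequenceMatcher`; the helper names, the dictionary keys, and the 0.8 threshold are all assumptions chosen for illustration, and a real deduplication job would likely need more fields and tuning.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace (useful for Spanish names)."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_person(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Treat two records as a match only if both name parts are similar enough.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    first = similarity(rec_a["first_name"], rec_b["first_name"])
    last = similarity(rec_a["last_name"], rec_b["last_name"])
    return min(first, last) >= threshold


source1 = {"first_name": "jhon", "last_name": "gates"}
source2 = {"first_name": "jon", "last_name": "gate"}
source3 = {"first_name": "jhon", "last_name": "ballmer"}

print(same_person(source1, source2))
print(same_person(source1, source3))
```

Requiring the weaker of the two name scores to pass the threshold keeps a shared first name (jhon gates vs. jhon ballmer) from counting as a match on its own.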

Search with various combinations of space, hyphen, casing and punctuations

[亡魂溺海] submitted on 2019-11-30 18:49:34
My schema: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/> <

Search with various combinations of space, hyphen, casing and punctuations

Deadly submitted on 2019-11-30 02:11:41
Question: My schema: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"/> <filter class="solr.LowerCaseFilterFactory

How to match URIs in text?

六眼飞鱼酱① submitted on 2019-11-27 06:28:55
Question: How would one go about spotting URIs in a block of text? The idea is to turn such runs of text into links. This is pretty simple to do if one only considers the http(s) and ftp(s) schemes; however, I am guessing the general problem (considering tel, mailto and other URI schemes) is much more complicated (if it is even possible). I would prefer a solution in C# if possible. Thank you. Answer 1: Regexes may prove a good starting point for this, though URIs and URLs are notoriously difficult to
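As a rough illustration of the regex starting point, the sketch below looks for a small whitelist of schemes followed by a run of non-whitespace characters, then trims trailing punctuation. It is written in Python rather than C#, the scheme list and the `find_uris` name are assumptions made for this example, and it is deliberately far from a full URI parser.

```python
import re

# Illustrative whitelist; extend it with whatever schemes matter to you.
SCHEMES = ("https", "http", "ftps", "ftp", "mailto", "tel")

# A scheme, a colon, then everything up to whitespace, quotes, or angle brackets.
URI_PATTERN = re.compile(
    r"\b(?:%s):[^\s<>\"']+" % "|".join(SCHEMES),
    re.IGNORECASE,
)


def find_uris(text: str) -> list:
    """Return candidate URIs, with trailing sentence punctuation stripped."""
    return [m.group(0).rstrip(".,;:!?)") for m in URI_PATTERN.finditer(text)]


sample = (
    "See https://example.com/docs, call tel:+1-555-0100 "
    "or write mailto:someone@example.com."
)
print(find_uris(sample))
```

Whitelisting schemes keeps false positives down (for example, "note: this" is not treated as a URI), at the cost of missing schemes that are not listed.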

Regexp recognition of email address hard?

拟墨画扇 submitted on 2019-11-26 00:28:45
Question: I recently read somewhere that writing a regexp to match an email address, taking into account all the variations and possibilities of the standard, is extremely hard and significantly more complicated than one would initially assume. Can anyone provide some insight as to why that is? Are there any known and proven regexps that actually do this fully? What are some good alternatives to using regexps for matching email addresses? Answer 1: For the formal e-mail spec, yes, it is technically
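One pragmatic alternative to a fully standard-compliant regexp is a deliberately loose shape check ("something@something.something") followed by confirmation through an actual message. The sketch below shows only the loose check; the pattern and the `looks_like_email` name are assumptions for illustration. It does not implement the formal address grammar and will both accept some invalid addresses and reject some unusual valid ones.

```python
import re

# Loose shape check: one "@", no whitespace, and at least one dot in the domain part.
SIMPLE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(candidate: str) -> bool:
    """Return True if the string has the rough shape of an email address."""
    return SIMPLE_EMAIL.fullmatch(candidate) is not None


print(looks_like_email("user@example.com"))
print(looks_like_email("not-an-address"))
```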