string-matching

Looking up multiple dictionary keys in a Pandas Dataframe & return multiple values for matches

可紊 submitted on 2020-01-01 06:40:08
Question: First time posting, so apologies in advance if my formatting is off. Here's my issue: I've created a Pandas dataframe which contains multiple rows of text:

    d = {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
    keywords = pd.DataFrame(d, columns=['keywords'])

    In [7]: keywords
    Out[7]:
                 keywords
    0         cheap shoes
    1        luxury shoes
    2  cheap hiking shoes

Now I have a dictionary that contains the following keys / values: labels = {'cheap' : 'budget', 'luxury' : 'expensive', 'hiking' : 'sport
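A minimal sketch of one way to approach the lookup described above, assuming each dictionary key should match a whole whitespace-separated word in the keyword text. Only the two dictionary entries that are fully visible in the excerpt are used (the 'hiking' entry is cut off), and `match_labels` and the `labels` column name are hypothetical names introduced for illustration.

```python
import pandas as pd

# Sample data copied from the question.
keywords = pd.DataFrame(
    {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
)

# Only the complete entries from the excerpt; the 'hiking' value is truncated there.
labels = {'cheap': 'budget', 'luxury': 'expensive'}


def match_labels(text, mapping):
    """Return the mapped value for every word of `text` that is a key in `mapping`."""
    return [mapping[word] for word in text.split() if word in mapping]


# One list of matched labels per row (hypothetical column name).
keywords['labels'] = keywords['keywords'].apply(lambda t: match_labels(t, labels))
print(keywords)
```

Returning a list per row keeps rows with several matching keys intact; whether substrings rather than whole words should count as matches is not stated in the excerpt.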

r dplyr ends_with multiple string matches

…衆ロ難τιáo~ submitted on 2020-01-01 06:34:08
Question: Can I use dplyr::select(ends_with) to select column names that fit any of multiple conditions? Considering my column names, I want to use ends_with instead of contains or matches, because the strings I want to select are relevant at the end of the column name, but may also appear in the middle in others. For instance, df <- data.frame(a10 = 1:4, a11 = 5:8, a20 = 1:4, a12 = 5:8) I want to select columns that end with 1 or 2, to have only columns a11 and a12. Is select(ends_with) the best way

Negative Lookaround Regex - Only one occurrence - Java

微笑、不失礼 submitted on 2020-01-01 06:28:31
Question: I am trying to find out whether a string contains only one occurrence of a word, e.g.

    String: `jjdhfoobarfoo`, Regex: `foo` --> false
    String: `wewwfobarfoo`, Regex: `foo` --> true
    String: `jjfffoobarfo`, Regex: `foo` --> true

Multiple foo's may occur anywhere in the string, so they can be non-consecutive. I tested the following regex matching in Java with the string foobarfoo, but it doesn't work and it returns true: static boolean testRegEx(String str){ return str.matches(".*(foo)(?!.*foo).*
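As far as the excerpt shows, the posted pattern can return true for foobarfoo because the leading `.*` is free to consume the first foo, which leaves `(foo)` to match the last one, after which no further foo follows. The sketch below illustrates one alternative in Python rather than Java: a pattern that forbids the word both before and after the single permitted occurrence. `contains_exactly_once` is a hypothetical helper name, not something from the question.

```python
import re


def contains_exactly_once(text, word):
    """Return True if `word` occurs exactly once in `text`."""
    w = re.escape(word)
    # (?:(?!w).)* consumes characters only while `word` does not start there,
    # so the literal in the middle is the only place the word may appear.
    pattern = rf'(?:(?!{w}).)*{w}(?:(?!{w}).)*'
    return re.fullmatch(pattern, text, flags=re.DOTALL) is not None


for sample in ['jjdhfoobarfoo', 'wewwfobarfoo', 'jjfffoobarfo', 'foobarfoo']:
    print(sample, contains_exactly_once(sample, 'foo'))
```

The same negative-lookahead construct is available in Java's `java.util.regex`, so the pattern should carry over, although that has not been checked against the truncated code in the question.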

Negative Lookaround Regex - Only one occurrence - Java

∥☆過路亽.° submitted on 2020-01-01 06:28:10
Question: I am trying to find out whether a string contains only one occurrence of a word, e.g.

    String: `jjdhfoobarfoo`, Regex: `foo` --> false
    String: `wewwfobarfoo`, Regex: `foo` --> true
    String: `jjfffoobarfo`, Regex: `foo` --> true

Multiple foo's may occur anywhere in the string, so they can be non-consecutive. I tested the following regex matching in Java with the string foobarfoo, but it doesn't work and it returns true: static boolean testRegEx(String str){ return str.matches(".*(foo)(?!.*foo).*

How to subset data with advance string matching

与世无争的帅哥 submitted on 2019-12-31 10:43:37
Question: I have the following data frame from which I would like to extract rows based on matching strings.

    > GEMA_EO5
    gene_symbol  fold_EO   p_value   RefSeq_ID                            BH_p_value
    KNG1         3.433049  8.56e-28  NM_000893,NM_001102416               1.234245e-24
    REXO4        3.245317  1.78e-27  NM_020385                            2.281367e-24
    VPS29        3.827665  2.22e-25  NM_057180,NM_016226                  2.560770e-22
    CYP51A1      3.363149  5.95e-25  NM_000786,NM_001146152               6.239386e-22
    TNPO2        4.707600  1.60e-23  NM_001136195,NM_001136196,NM_013433  1.538000e-20
    NSDHL        2.703922  6.74e-23  NM_001129765,NM

Remove ends of string entries in pandas DataFrame column

旧街凉风 submitted on 2019-12-30 06:40:28
Question: I have a pandas DataFrame in which one column contains a list of files:

    import pandas as pd
    df = pd.read_csv('fname.csv')
    df.head()

    filename  A  B  C
    fn1.txt   2  4  5
    fn2.txt   1  2  1
    fn3.txt   .... ....

I would like to delete the file extension .txt from each entry in filename. How do I accomplish this? I tried:

    df['filename'] = df['filename'].map(lambda x: str(x)[:-4])

but when I look at the column entries afterwards with df.head(), nothing has changed. How does one do this?

Answer 1: I think you can use str.replace
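A minimal sketch of the `str.replace` approach that the answer begins to describe, assuming a reasonably recent pandas in which `Series.str.replace` accepts `regex=True`. The DataFrame is built inline from the two complete sample rows instead of reading `fname.csv`, which is not available here.

```python
import pandas as pd

# The two complete rows from the question; the third row is elided there.
df = pd.DataFrame(
    {'filename': ['fn1.txt', 'fn2.txt'], 'A': [2, 1], 'B': [4, 2], 'C': [5, 1]}
)

# Anchor the pattern at the end so only a trailing ".txt" is removed;
# the dot is escaped so it matches a literal period.
df['filename'] = df['filename'].str.replace(r'\.txt$', '', regex=True)
print(df.head())
```

Anchoring with `$` leaves names such as `notes.txt.bak` untouched, which a fixed slice like `[:-4]` would not.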

Find numbers after specific text in a string with RegEx

不羁岁月 submitted on 2019-12-30 06:04:47
Question: I have a multiline string like the following:

    2012-15-08 07:04 Bla bla bla blup
    2012-15-08 07:05 *** Error importing row no. 5: The import of this line failed because bla bla
    2012-15-08 07:05 Another text that I don't want to search...
    2012-15-08 07:06 Another text that I don't want to search...
    2012-15-08 07:06 *** Error importing row no. 5: The import of this line failed because bla bla
    2012-15-08 07:07 Import has finished bla bla

What I want is to extract all row numbers that have errors
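A minimal sketch of one way to pull out the row numbers, assuming every error line contains the literal text `*** Error importing row no. ` followed by digits and a colon, as in the sample. The question does not say which regex flavor is being used, so Python's `re` module is an assumption here, and `ERROR_ROW` is a hypothetical name.

```python
import re

# Sample log copied from the question.
log = """2012-15-08 07:04 Bla bla bla blup
2012-15-08 07:05 *** Error importing row no. 5: The import of this line failed because bla bla
2012-15-08 07:05 Another text that I don't want to search...
2012-15-08 07:06 Another text that I don't want to search...
2012-15-08 07:06 *** Error importing row no. 5: The import of this line failed because bla bla
2012-15-08 07:07 Import has finished bla bla"""

# Escape the literal asterisks and the dot; capture only the digits.
ERROR_ROW = re.compile(r'\*\*\* Error importing row no\. (\d+):')

# findall returns the captured group for every match in the text.
row_numbers = [int(n) for n in ERROR_ROW.findall(log)]
print(row_numbers)
```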

Pandas - check if a string column in one dataframe contains a pair of strings from another dataframe

不问归期 submitted on 2019-12-30 03:37:06
Question: This question is based on another question I asked, where I didn't cover the problem entirely: Pandas - check if a string column contains a pair of strings. This is a modified version of the question. I have two dataframes: df1 = pd.DataFrame({'consumption':['squirrel ate apple', 'monkey likes apple', 'monkey banana gets', 'badger gets banana', 'giraffe eats grass', 'badger apple loves', 'elephant is huge', 'elephant eats banana tree', 'squirrel digs in grass']}) df2 = pd.DataFrame({'food':[

Returning the lowest index for the first non whitespace character in a string in Python

橙三吉。 submitted on 2019-12-29 07:37:24
Question: What's the shortest way to do this in Python? string = "   xyz" must return index = 3

Answer 1:

    >>> s = "   xyz"
    >>> len(s) - len(s.lstrip())
    3

Answer 2:

    >>> next(i for i, j in enumerate('   xyz') if j.strip())
    3

or

    >>> next(i for i, j in enumerate('   xyz') if j not in string.whitespace)
    3

In versions of Python < 2.5 you'll have to do: (...).next()

Answer 3: Looks like the "regexes can do anything" brigade have taken the day off, so I'll fill in:

    >>> tests = [u'foo', u' foo', u'\xA0foo']
    >>> import re
    >>> for test
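The `lstrip` approach from the first answer can be wrapped in a small function, sketched here for Python 3. The answers above do not say what should happen when the string has no non-whitespace character at all; returning -1 in that case is an assumption made for this sketch, and `first_non_whitespace_index` is a hypothetical name.

```python
def first_non_whitespace_index(s):
    """Return the index of the first non-whitespace character in `s`, or -1."""
    stripped = s.lstrip()
    if not stripped:
        # Empty or all-whitespace input: nothing to point at (assumed convention).
        return -1
    # The number of characters removed from the left is the index we want.
    return len(s) - len(stripped)


print(first_non_whitespace_index("   xyz"))
```

Without the guard, the plain `len(s) - len(s.lstrip())` expression returns `len(s)` for an all-whitespace string, while the `next(...)` variant raises `StopIteration`.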

Check whether a string contains a substring

耗尽温柔 submitted on 2019-12-28 03:26:25
Question: How can I check whether a given string contains a certain substring, using Perl? More specifically, I want to see whether s1.domain.com is present in the given string variable.

Answer 1: To find out if a string contains a substring, you can use the index function:

    if (index($str, $substr) != -1) {
        print "$str contains $substr\n";
    }

It will return the position of the first occurrence of $substr in $str, or -1 if the substring is not found.

Answer 2: Another possibility is to use regular expressions which
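For comparison only, the same check can be sketched in Python, which is not the language the question asks about. `str.find` behaves much like Perl's `index` in that it returns -1 when the substring is absent. The `haystack` value and the `contains_substring` name are hypothetical and chosen for illustration.

```python
def contains_substring(text, needle):
    """Literal substring test; str.find returns -1 when `needle` is absent."""
    return text.find(needle) != -1


# Hypothetical input string for illustration.
haystack = "connecting to s1.domain.com on port 80"

print(contains_substring(haystack, "s1.domain.com"))
# The `in` operator performs the same literal test more idiomatically.
print("s1.domain.com" in haystack)
```

Both forms compare literally, so the dots in s1.domain.com are not treated as wildcards.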