regex

Matching a word with pound (#) symbol in a regex

眉间皱痕 posted on 2021-01-27 14:10:20
Question: I have a regexp that checks whether some text contains a given word (delimited by word boundaries): String regexp = ".*\\bSOME_WORD_HERE\\b.*"; However, this regexp returns false when "SOME_WORD" starts with # (hashtag). Example without #: String text = "some text and test word"; String matchingWord = "test"; boolean contains = text.matches(".*\\b" + matchingWord + "\\b.*"); // now contains == true But with a hashtag, `contains` is false. Example: text = "some text and #test word"; matchingWord = "#test"; contains = text
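The behavior described above follows from how \b is defined: it only matches between a word character and a non-word character, and both the space and the # are non-word characters, so there is no boundary in front of #test. A minimal sketch of one common workaround, written in Python rather than the question's Java, replaces \b with lookarounds that only require "no word character on either side". The helper name contains_word is hypothetical.

```python
import re


def contains_word(text: str, word: str) -> bool:
    """Return True if `word` occurs in `text` with no word character on either side."""
    # re.escape guards against regex metacharacters inside the search word.
    # (?<!\w) and (?!\w) stand in for \b but also work when the word
    # starts or ends with a non-word character such as '#'.
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, text) is not None


# The plain \b version fails for the hashtag case, as in the question.
plain_boundary = re.search(r"\b#test\b", "some text and #test word")

print(contains_word("some text and test word", "test"))
print(contains_word("some text and #test word", "#test"))
print(plain_boundary)
```

The same lookaround constructs exist in java.util.regex, so the pattern should carry over to the original Java code, with Pattern.quote playing the role of re.escape.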

Regular expression to extract text between braces

人盡茶涼 posted on 2021-01-27 14:01:25
Question: I am trying to extract text between curly braces in PHP, e.g.: Welcome {$user.first_name} to the {$site} version 1.5. Your username is {$user.username}. Your reputation at present is {$user.reputation.name} I have used \{\$(.*?)\} which works fine in some cases. This matches: {$user.first_name} {$site} {$user.username} {$user.reputation.name} But now I want to match only text that has one or more . (dot) within the braces, i.e. for the above string I should be able to match only: {$user
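One way to require at least one dot inside the braces is to spell out the structure explicitly: a name, followed by one or more ".name" segments. The sketch below uses Python for illustration, and it assumes that placeholder names consist only of word characters, which the question does not state. The pattern uses no Python-specific syntax, so it should also work with PHP's preg_match_all.

```python
import re

template = (
    "Welcome {$user.first_name} to the {$site} version 1.5. "
    "Your username is {$user.username}. "
    "Your reputation at present is {$user.reputation.name}"
)

# \w+           first segment, e.g. "user"
# (?:\.\w+)+    one or more ".segment" parts, so at least one dot is required
dotted_placeholder = re.compile(r"\{\$(\w+(?:\.\w+)+)\}")

dotted_names = dotted_placeholder.findall(template)
print(dotted_names)
# ['user.first_name', 'user.username', 'user.reputation.name']
```

{$site} is skipped because it has no dot, and the "1.5" in the running text is ignored because it is not inside braces.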

Getting img url from RSS feed swift

微笑、不失礼 posted on 2021-01-27 13:59:51
Question: I want to be able to retrieve the img URL from a string. Here's a sample of the img URL that I'm trying to retrieve: <p><img width="357" height="500" src="http://images.sgcafe.net/2015/05/OVA1-357x500.jpg" class="attachment- medium wp-post-image" alt="OVA1" /> My current implementation crashes at textCheck, which is reported as nil. I looked at the Objective-C solution on Stack Overflow and implemented it in Swift, but it doesn't seem to work. var elementString = item.summary var
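The extraction step itself can be sketched independently of Swift. The Python example below captures the src attribute of the first img tag and returns None when nothing matches, so the caller can handle the missing case instead of crashing. The function name first_img_src is hypothetical, and the pattern assumes double-quoted attributes as in the sample; for arbitrary HTML a real parser is usually the safer choice.

```python
import re
from typing import Optional

IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc="([^"]+)"', re.IGNORECASE)


def first_img_src(html: str) -> Optional[str]:
    """Return the src of the first <img> tag, or None if there is none."""
    match = IMG_SRC.search(html)
    return match.group(1) if match else None


summary = (
    '<p><img width="357" height="500" '
    'src="http://images.sgcafe.net/2015/05/OVA1-357x500.jpg" '
    'class="attachment- medium wp-post-image" alt="OVA1" />'
)

print(first_img_src(summary))
print(first_img_src("<p>no image here</p>"))
```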

How to remove everything except words and emoji from text?

泄露秘密 posted on 2021-01-27 13:53:11
Question: As part of a text classification problem I am trying to clean a text dataset. So far I was removing everything except text: punctuation, numbers, emoji, everything was removed. Now I am trying to use emoji as features, hence I want to retain words as well as emoji. First I search for the emoji in the text and separate them from other words/emoji. This is because each emoji should be treated individually/separately. So I search for an emoji and pad it with spaces at both ends. But I am at

Regex Recursion: Nth Subpatterns

邮差的信 posted on 2021-01-27 13:44:07
Question: I'm trying to learn about recursion in regular expressions, and have a basic understanding of the concepts in the PCRE flavour. I want to break a string: Geese (Flock) Dogs (Pack) into: Full Match: Geese (Flock) Dogs (Pack) Group 1: Geese (Flock) Group 2: Geese Group 3: (Flock) Group 4: Dogs (Pack) Group 5: Dogs Group 6: (Pack) I know neither regex quite does this, but I was more curious as to the reason why the first pattern works, but the second one doesn't. Pattern 1: ((.*?)(\(\w{1,}\)

Combining regular expressions in Python - \W and \S

≯℡__Kan透↙ posted on 2021-01-27 13:43:29
Question: I want my code to return only the special characters [".", "*", "=", ","]. I want to remove all digits/alphabetical characters ("\W") and all white spaces ("\S"). import re original_string = "John is happy. He owns 3*4=12, apples" new_string = re.findall("\W\S",original_string) print(new_string) But instead I get this as my output: [' i', ' h', ' H', ' o', ' 3', '*4', '=1', ' a'] I have absolutely no idea why this happens. Hence I have two questions: 1) Is it possible to achieve my goal using
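The output in the question is what the pattern asks for: "\W\S" is a sequence of two tokens, so it matches one non-word character immediately followed by one non-whitespace character (for example a space followed by "i"). To express "neither a word character nor whitespace" for a single character, the two classes can be combined in one negated character class. A minimal sketch:

```python
import re

original_string = "John is happy. He owns 3*4=12, apples"

# [^\w\s] matches exactly one character that is neither a word character
# (letter, digit, underscore) nor whitespace.
special_chars = re.findall(r"[^\w\s]", original_string)
print(special_chars)  # ['.', '*', '=', ',']

# For comparison: \W\S matches two-character pairs, as seen in the question.
pairs = re.findall(r"\W\S", original_string)
print(pairs)
```

Note that \w includes the underscore, so an underscore would not be reported as a special character by this pattern.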

Decode Characters Pandas

落爺英雄遲暮 posted on 2021-01-27 13:33:10
Question: Below is a sample of my DF: ROLE NAME GESELLSCHAFTER DUPONT DUPONT GESCHäFTSFüHRER DUPONT DUPONT KOMPLEMENTäR DUPONT DUPONT GESELLSCHAFTER DUPONT DUPONT KOMPLEMENTäR DUPONT DUPONT The aim is to fix the special characters. For example, 'KOMPLEMENTäR' should become 'KOMPLEMENTAR' (with or without the accent doesn't really matter). Thus, I tried to construct a list and replace the values using the dict below. {'A¤':'A', 'A–':'A', 'A¶':'A', 'A€':'A', 'Aƒ':'A', 'A„':'A', 'A\…':'A', 'A¡':'A
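If the values really contain accented letters as displayed in the sample (an assumption; the replacement dict hints that the underlying data may instead be mis-decoded bytes, which this sketch does not address), the stated goal can be reached without a hand-written mapping by decomposing each character and dropping the combining marks. The helper name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(value: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


roles = ["GESELLSCHAFTER", "GESCHäFTSFüHRER", "KOMPLEMENTäR"]

# upper() is applied afterwards because the sample mixes a lowercase 'ä'
# into otherwise uppercase values.
cleaned = [strip_accents(role).upper() for role in roles]
print(cleaned)
# ['GESELLSCHAFTER', 'GESCHAFTSFUHRER', 'KOMPLEMENTAR']
```

With pandas, the same function could be applied to a column through Series.map, for example on the ROLE column from the sample.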

Regex subsequence matching

筅森魡賤 posted on 2021-01-27 13:30:28
Question: I'm using Python, but code in any language will do for this question. Suppose I have 2 strings: sequence = 'abcd' string = 'axyzbdclkd' In the above example, sequence is a subsequence of string. How can I check whether sequence is a subsequence of string using regex? Also check the examples here for the difference between subsequence and subarray and what I mean by subsequence. The only thing I could think of is this, but it's far from what I want. import re c = re.compile('abcd') c.match('axyzbdclkd')
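A subsequence keeps the order of its characters but allows arbitrary gaps between them, which maps directly onto a regex in which the characters are joined by ".*". A minimal sketch of that idea follows; the function name is_subsequence is hypothetical.

```python
import re


def is_subsequence(sequence: str, string: str) -> bool:
    """Check whether the characters of `sequence` appear in `string` in order."""
    # 'abcd' becomes 'a.*b.*c.*d'; re.escape keeps characters such as '.'
    # or '*' in the sequence from being read as regex syntax.
    pattern = ".*".join(re.escape(ch) for ch in sequence)
    # DOTALL lets the gaps span newlines; search() allows a match anywhere.
    return re.search(pattern, string, flags=re.DOTALL) is not None


print(is_subsequence("abcd", "axyzbdclkd"))
print(is_subsequence("adb", "axyzbdclkd"))
```

For long inputs, a plain loop that walks both strings once avoids the backtracking this pattern can trigger, so the regex form is mainly useful when a regex is specifically required.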

bash find using regex is not case sensitive

廉价感情. posted on 2021-01-27 13:26:13
Question: I need to find files starting with three lowercase letters, but for some reason I'm getting undesired case-insensitive behavior. I'm using find with the -regex option, but it also finds files starting with capital letters. $ find . -regextype posix-egrep -regex '.*/[a-z]{3}\w+\.abc' ./TTTxxx.abc ./tttyyy.abc prints the same as: $ find . -regextype posix-egrep -regex '.*/[A-Z]{3}\w+\.abc' ./TTTxxx.abc ./tttyyy.abc If instead of using a range of characters I use a single character, works as

regex issue sending BAN request to Varnish server via curl

心已入冬 posted on 2021-01-27 13:25:21
Question: I have been trying to send a BAN request via curl to the Varnish server to invalidate cached content. The URL contains a regex for Varnish to check against. I have been successfully sending this request: 1. curl -X BAN "https://oursite.com/product/item/(100|7|9||8|7|6|5|4|2|1)" <!DOCTYPE html> <html> <head> <title>200 Ban added</title> </head> <body> <h1>Error 200 Ban added</h1> <p>Ban added</p> <h3>Guru Meditation:</h3> <p>XID: 66211</p> <hr> <p>Varnish cache server</p> </body> </html> but