regex | 易学教程

Matching a word with pound (#) symbol in a regex

Question: I have a regex to check whether some text contains a word (as a whole word, using word boundaries): String regexp = ".*\\bSOME_WORD_HERE\\b.*"; but this regex returns false when "SOME_WORD" starts with # (a hashtag). Example without #: String text = "some text and test word"; String matchingWord = "test"; boolean contains = text.matches(".*\\b" + matchingWord + "\\b.*"); // now contains == true; But with the hashtag, `contains` is false. Example: text = "some text and #test word"; matchingWord = "#test"; contains = text
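The failure comes from how `\b` is defined: it matches only between a word character and a non-word character. `#` is not a word character, so in "and #test" there is no boundary between the space and the `#`. A minimal sketch in Python (whose `re` treats `\b` the same way for these ASCII inputs) compares the original pattern with one possible alternative that uses lookarounds to forbid adjacent word characters instead of requiring a boundary. The function names are hypothetical, chosen for illustration.

```python
import re


def contains_word_boundary(text, word):
    """Original approach: wrap the word in \\b anchors."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


def contains_word_lookaround(text, word):
    """Alternative: only require that no word character touches either end.

    (?<!\\w) and (?!\\w) succeed at the start/end of the string or next to a
    non-word character, so a leading '#' no longer needs a boundary before it.
    """
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    plain = "some text and test word"
    tagged = "some text and #test word"
    print(contains_word_boundary(plain, "test"))      # True
    print(contains_word_boundary(tagged, "#test"))    # False: no \b between ' ' and '#'
    print(contains_word_lookaround(tagged, "#test"))  # True
    print(contains_word_lookaround("a #testing b", "#test"))  # False
```

The same lookaround idea should carry over to Java's `String.matches` (wrapped in `.*` as in the question), with `Pattern.quote(matchingWord)` taking the place of `re.escape`.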

Regular expression to extract text between braces

Question: I am trying to extract text between curly braces in PHP, e.g. Welcome {$user.first_name} to the {$site} version 1.5. Your username is {$user.username}. Your reputation at present is {$user.reputation.name} I have used \{\$(.*?)\}, which works fine in some cases. This matches: {$user.first_name} {$site} {$user.username} {$user.reputation.name} But now I want to match only text that has one or more . (dots) within the braces, i.e. for the above string I should be able to match only: {$user
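One way to keep only the placeholders that contain a dot is to require at least one literal `.` between `{$` and `}` while forbidding braces inside. A minimal sketch using Python's `re` for illustration (the same Perl-style pattern should also work with PHP's `preg_match_all`); it assumes placeholders never nest.

```python
import re

TEMPLATE = (
    "Welcome {$user.first_name} to the {$site} version 1.5. "
    "Your username is {$user.username}. "
    "Your reputation at present is {$user.reputation.name}"
)

# \{\$      literal "{$"
# [^{}]*    a run of characters that are not braces
# \.        at least one literal dot is required
# [^{}]*    the rest of the placeholder
# \}        literal "}"
DOTTED_PLACEHOLDER = re.compile(r"\{\$([^{}]*\.[^{}]*)\}")


def dotted_placeholders(text):
    """Return the inner names of {$...} placeholders that contain a dot."""
    return DOTTED_PLACEHOLDER.findall(text)


if __name__ == "__main__":
    print(dotted_placeholders(TEMPLATE))
    # ['user.first_name', 'user.username', 'user.reputation.name']
```

Because `[^{}]` cannot cross a closing brace, `{$site}` fails at the required `\.` and is skipped rather than being merged with a later placeholder.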

Getting img url from RSS feed swift

Question: I want to retrieve the img URL from a string. Here's a sample of the img URL that I'm trying to retrieve: <p><img width="357" height="500" src="http://images.sgcafe.net/2015/05/OVA1-357x500.jpg" class="attachment- medium wp-post-image" alt="OVA1" /> My current implementation crashes at textCheck, which says it's nil. I looked at the Objective-C solution on Stack Overflow and implemented it in Swift, but it doesn't seem to work. var elementString = item.summary var
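The regex part of this task can be sketched independently of Swift. The example below is written in Python for illustration only; the same pattern could be passed to Foundation's `NSRegularExpression`. It assumes the `src` value is wrapped in double quotes, as in the sample, and returns `None` rather than failing when nothing matches. For arbitrary HTML, a real HTML parser is more robust than a regex.

```python
import re

SAMPLE = (
    '<p><img width="357" height="500" '
    'src="http://images.sgcafe.net/2015/05/OVA1-357x500.jpg" '
    'class="attachment- medium wp-post-image" alt="OVA1" />'
)

# <img       start of an img tag
# [^>]*?     other attributes, without leaving the tag
# \ssrc="    the src attribute (double-quoted, as in the sample)
# ([^"]+)    capture everything up to the closing quote
IMG_SRC = re.compile(r'<img[^>]*?\ssrc="([^"]+)"', re.IGNORECASE)


def first_img_url(html):
    """Return the first <img> src URL, or None when there is no match."""
    match = IMG_SRC.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(first_img_url(SAMPLE))
    # http://images.sgcafe.net/2015/05/OVA1-357x500.jpg
```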

How to remove everything except words and emoji from text?

Question: As part of a text classification problem, I am trying to clean a text dataset. So far I was removing everything except text: punctuation, numbers and emoji were all removed. Now I am trying to use emoji as features, hence I want to retain words as well as emoji. First I search for the emoji in the text and separate them from other words/emoji, because each emoji should be treated individually. So I search for an emoji and pad it with spaces at both ends. But I am at
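The two steps described above (pad each emoji with spaces, then drop everything that is neither a word nor an emoji) can be sketched with the standard `re` module alone. This is a minimal sketch under two loud assumptions: emoji are approximated by a few hypothetical code-point ranges rather than a complete definition, and "words" means ASCII letters only. Multi-code-point emoji (skin tones, ZWJ sequences, flags) are not kept together.

```python
import re

# ASSUMPTION: an approximate set of emoji code-point ranges, not an exhaustive
# definition of emoji.
EMOJI_RANGES = "\U0001F300-\U0001FAFF\u2600-\u27BF"
EMOJI = re.compile(f"[{EMOJI_RANGES}]")
# ASSUMPTION: anything that is not an ASCII letter, whitespace, or inside the
# emoji ranges above is treated as noise.
UNWANTED = re.compile(f"[^A-Za-z\\s{EMOJI_RANGES}]")


def keep_words_and_emoji(text):
    # 1. Pad every emoji with spaces so each one becomes its own token.
    text = EMOJI.sub(lambda m: f" {m.group(0)} ", text)
    # 2. Replace punctuation, digits and other symbols with spaces.
    text = UNWANTED.sub(" ", text)
    # 3. Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


if __name__ == "__main__":
    print(keep_words_and_emoji("Great movie!!! \U0001F600\U0001F600 10/10 \U0001F44D"))
```

Splitting the result on whitespace then yields one token per word and one per emoji, which is the per-emoji separation the question asks for.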

Regex Recursion: Nth Subpatterns

Question: I'm trying to learn about recursion in regular expressions, and have a basic understanding of the concepts in the PCRE flavour. I want to break the string Geese (Flock) Dogs (Pack) into: Full Match: Geese (Flock) Dogs (Pack) Group 1: Geese (Flock) Group 2: Geese Group 3: (Flock) Group 4: Dogs (Pack) Group 5: Dogs Group 6: (Pack) I know neither regex quite does this, but I was more curious about why the first pattern works but the second one doesn't. Pattern 1: ((.*?)(\(\w{1,}\)
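Python's built-in `re` module has no recursion syntax such as PCRE's `(?R)`, so the sketch below does not reproduce either pattern from the question. It is only a point of comparison: for this particular input, a repeated `word (word)` pair can be captured without recursion by iterating over matches, giving one match per pair with the same three parts (pair, name, parenthesised group) that the question lists. It assumes each name is a single word.

```python
import re

TEXT = "Geese (Flock) Dogs (Pack)"

# Group 1: the whole "Name (Collective)" pair
# Group 2: the name
# Group 3: the parenthesised collective noun, parentheses included
PAIR = re.compile(r"((\w+) (\(\w+\)))")


def split_pairs(text):
    """Return one (pair, name, collective) tuple per match."""
    return [m.groups() for m in PAIR.finditer(text)]


if __name__ == "__main__":
    for groups in split_pairs(TEXT):
        print(groups)
    # ('Geese (Flock)', 'Geese', '(Flock)')
    # ('Dogs (Pack)', 'Dogs', '(Pack)')
```

A single regex match cannot return a variable number of capture groups, which is why this version yields one tuple per pair instead of numbered groups 1 to 6.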

Combining regular expressions in Python - \W and \S

Question: I want my code to return only the special characters [".", "*", "=", ","]. I want to remove all digits/alphabetical characters ("\W") and all whitespace ("\S"). import re original_string = "John is happy. He owns 3*4=12, apples" new_string = re.findall("\W\S",original_string) print(new_string) But instead I get this as my output: [' i', ' h', ' H', ' o', ' 3', '*4', '=1', ' a'] I have absolutely no idea why this happens. Hence I have two questions: 1) Is it possible to achieve my goal using
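`\W` matches one non-word character (anything other than letters, digits and underscore) and `\S` matches one non-whitespace character, so `\W\S` matches a two-character sequence: a non-word character followed by any non-whitespace character. That is why each result pairs a space or symbol with the character after it. To match single characters that are neither word characters nor whitespace, both conditions can go inside one negated character class, `[^\w\s]`, as this sketch shows:

```python
import re

original_string = "John is happy. He owns 3*4=12, apples"

# What the question used: two consecutive characters, \W then \S.
pairs = re.findall(r"\W\S", original_string)
print(pairs)  # [' i', ' h', ' H', ' o', ' 3', '*4', '=1', ' a']

# One character that is neither a word character (\w) nor whitespace (\s).
specials = re.findall(r"[^\w\s]", original_string)
print(specials)  # ['.', '*', '=', ',']
```

Raw strings (`r"..."`) also avoid the invalid-escape warnings that recent Python versions emit for a plain `"\W"`.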

Decode Characters Pandas

Question: Below is a sample of my DF: ROLE NAME GESELLSCHAFTER DUPONT DUPONT GESCHÃ¤FTSFÃ¼HRER DUPONT DUPONT KOMPLEMENTÃ¤R DUPONT DUPONT GESELLSCHAFTER DUPONT DUPONT KOMPLEMENTÃ¤R DUPONT DUPONT The aim is to fix the special characters. For example, 'KOMPLEMENTÃ¤R' should become 'KOMPLEMENTAR' (with or without the accent doesn't really matter). So I tried to construct a list and replace the values using the dict below. {'A¤':'A', 'A–':'A', 'A¶':'A', 'A€':'A', 'Aƒ':'A', 'A„':'A', 'A\…':'A', 'A¡':'A
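Values like 'KOMPLEMENTÃ¤R' are typical of UTF-8 text that was decoded as Latin-1: 'ä' is the two bytes C3 A4 in UTF-8, which Latin-1 displays as 'Ã¤'. Assuming that is what happened here (the source encoding is not stated in the question), re-encoding as Latin-1 and decoding as UTF-8 can reverse the damage without a hand-built replacement dict, and `unicodedata` can then drop the accents. A minimal sketch; the function names are hypothetical, and a value that does not round-trip is left unchanged.

```python
import unicodedata


def fix_mojibake(value):
    """Undo UTF-8 text that was mis-decoded as Latin-1, if possible."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 encodable, or not valid UTF-8 afterwards: keep as-is.
        return value


def strip_accents(value):
    """Drop combining marks after NFKD decomposition, e.g. 'ä' -> 'a'."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_role(value):
    """Repair the encoding, remove accents, and keep the column upper-case."""
    return strip_accents(fix_mojibake(value)).upper()


if __name__ == "__main__":
    for raw in ["GESELLSCHAFTER", "GESCH\u00c3\u00a4FTSF\u00c3\u00bcHRER", "KOMPLEMENT\u00c3\u00a4R"]:
        print(raw, "->", clean_role(raw))
```

With pandas, the function could be applied per column, e.g. `df["ROLE"] = df["ROLE"].map(clean_role)`.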

Regex subsequence matching

Question: I'm using Python, but code in any language will do for this question. Suppose I have two strings: sequence = 'abcd' string = 'axyzbdclkd' In the above example, sequence is a subsequence of string. How can I check whether sequence is a subsequence of string using regex? Also check the examples here for the difference between a subsequence and a subarray, and for what I mean by subsequence. The only thing I could think of is this, but it's far from what I want. import re c = re.compile('abcd') c.match('axyzbdclkd')
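One regex approach is to insert `.*?` between the characters of `sequence`, so `abcd` becomes `a.*?b.*?c.*?d`, and then search for that pattern anywhere in `string`. A minimal sketch: `re.escape` protects characters that are special in regex (such as `.`), `re.DOTALL` lets `.` also cross newlines, and `re.search` is used because `match`, as in the question, only tries at the start of the string.

```python
import re


def is_subsequence(sequence, string):
    """True if the characters of `sequence` appear in `string` in order."""
    # 'abcd' -> 'a.*?b.*?c.*?d'; each gap may skip any number of characters.
    pattern = ".*?".join(re.escape(ch) for ch in sequence)
    return re.search(pattern, string, re.DOTALL) is not None


if __name__ == "__main__":
    print(is_subsequence("abcd", "axyzbdclkd"))  # True
    print(is_subsequence("abcd", "axyzbdlkd"))   # False: no 'c' after 'b'
```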

bash find using regex is not case sensitive

Question: I need to find files starting with three lowercase letters, but for some reason I'm getting undesired case-insensitive behavior. I'm using find with the -regex option, but it also finds files starting with capital letters. $ find . -regextype posix-egrep -regex '.*/[a-z]{3}\w+\.abc' ./TTTxxx.abc ./tttyyy.abc prints the same as: $ find . -regextype posix-egrep -regex '.*/[A-Z]{3}\w+\.abc' ./TTTxxx.abc ./tttyyy.abc If I use a single character instead of a range of characters, it works as

regex issue sending BAN request to Varnish server via curl

Question: I have been trying to send a BAN request via curl to the Varnish server to invalidate cached content. The URL contains a regex for Varnish to check against. I have been successfully sending this request: 1. curl -X BAN "https://oursite.com/product/item/(100|7|9||8|7|6|5|4|2|1)" <!DOCTYPE html> <html> <head> <title>200 Ban added</title> </head> <body> <h1>Error 200 Ban added</h1> <p>Ban added</p> <h3>Guru Meditation:</h3> <p>XID: 66211</p> <hr> <p>Varnish cache server</p> </body> </html> but