Matching arbitrary number of digits using grep regex

一个人想着一个人 submitted on 2020-01-15 03:17:32

Question


I've got a file with lines that look similar to the following:

data
datalater
983290842
Data387428later
datafhj893724897290384later
4329804928later

What I am looking to do is use regex to match any line that starts with data and ends with later AND has numbers in between. Here is what I've concocted so far:

^[D,d]ata[0-9]*later$ 

However, the output also includes the datalater lines. I suppose I could pipe the output through grep -v datalater, but I feel like a single expression should do the trick.


Answer 1:


Use + instead of *.

+ matches one or more of the preceding element.
* matches zero or more.

^[Dd]ata[0-9]+later$
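
To see the difference between the two quantifiers, here is a small sketch that runs both against the sample lines from the question. It assumes a grep that supports -E (extended regular expressions), where + needs no escaping; the variable names lines, star, and plus are only for illustration.

```shell
# Sample lines from the question, kept in a variable and fed to grep via printf.
lines='data
datalater
983290842
Data387428later
datafhj893724897290384later
4329804928later'

# '*' allows zero digits, so the bare "datalater" line also matches.
star=$(printf '%s\n' "$lines" | grep -E '^[Dd]ata[0-9]*later$')

# '+' requires at least one digit, so "datalater" is excluded.
plus=$(printf '%s\n' "$lines" | grep -E '^[Dd]ata[0-9]+later$')

printf 'with *:\n%s\n\nwith +:\n%s\n' "$star" "$plus"
```

With * the output contains both datalater and Data387428later; with + only Data387428later remains.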

In grep's default mode (basic regular expressions) you need to escape the +. The escaped \+ is a GNU extension; with other greps, pass -E and use a plain +. Note that \d is a Perl-style shorthand that grep only understands with -P, so in the default mode use [0-9] to match a single digit.

^[Dd]ata[0-9]\+later$

In your example file you also have a line:

datafhj893724897290384later

This line will not be matched yet, because there are letters between data and the digits. We can fix this by adding [^0-9]* to match any non-digit characters after data, up to the digits.

Our final command will be:

grep '^[Dd]ata[^0-9]*[0-9]\+later$' filename
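
As a quick check of the final pattern, the sketch below writes the question's sample lines to a temporary file and runs the same idea with grep -E and [0-9], so it depends on neither \d nor \+ support. The temporary path comes from mktemp, and the variable names sample and final are my own.

```shell
# Write the question's sample lines to a temporary file (path chosen by mktemp).
sample=$(mktemp)
cat > "$sample" <<'EOF'
data
datalater
983290842
Data387428later
datafhj893724897290384later
4329804928later
EOF

# [^0-9]* skips any non-digits after "data"; [0-9]+ then requires at least one digit.
final=$(grep -E '^[Dd]ata[^0-9]*[0-9]+later$' "$sample")
printf '%s\n' "$final"

# Clean up the temporary file.
rm -f "$sample"
```

This should print Data387428later and datafhj893724897290384later, and nothing else.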



Answer 2:


You're matching zero or more digits with the * quantifier. Try

^[Dd]ata\d+later$

instead. Your character class [D,d] also matched a comma as the first character (e.g. ",ata1234later"), so I removed the comma. \d is shorthand for any digit character; note that grep only supports it with the -P flag.
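
The comma issue is easy to reproduce. The following sketch feeds two made-up test lines to both versions of the character class; it uses grep -E with [0-9] rather than \d so that it runs without -P, and the variable names are illustrative.

```shell
# Inside [...] a comma is an ordinary character, so [D,d] matches 'D', ',' or 'd'.
with_comma=$(printf '%s\n' ',ata1234later' 'data1234later' \
    | grep -E '^[D,d]ata[0-9]+later$')

# [Dd] matches only 'D' or 'd', so the line starting with a comma is rejected.
without_comma=$(printf '%s\n' ',ata1234later' 'data1234later' \
    | grep -E '^[Dd]ata[0-9]+later$')

printf '[D,d]:\n%s\n\n[Dd]:\n%s\n' "$with_comma" "$without_comma"
```

The first pattern accepts both lines, while the second accepts only data1234later.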




Answer 3:


You should use "+" (which means one or more) instead of "*" (which means zero or more).




Answer 4:


Using Cygwin, the \d-based commands from the answers above didn't work as posted. I had to modify them to get the desired results.

$ cat > file.txt <<EOL
> data
> datalater
> 983290842
> Data387428later
> datafhj893724897290384later
> 4329804928later
> EOL

I always like to make sure my file has what I expect it to have:

$ cat file.txt
data
datalater
983290842
Data387428later
datafhj893724897290384later
4329804928later

$

I needed to run Perl-style expressions with the -P flag. Rather than the [^0-9]* whose necessity @Tom_Cammann aptly pointed out, I used .*, which matches any sequence of characters (including none) and backtracks as far as needed for the rest of the pattern to match. Here are my command and output.

$ grep -P '^[Dd]ata.*\d+later$' file.txt
Data387428later
datafhj893724897290384later

$

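As far as I can tell, the reason Perl expressions are needed is that \d is a Perl-style shorthand which grep only treats as "any digit" with -P. Under -P, in turn, \+ means a literal plus sign rather than "one or more", so a pattern that combines \d with \+ matches nothing in either mode.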
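
Since -P is not compiled into every grep build, one possible workaround is to probe for it and fall back to a POSIX bracket expression. This is only a sketch under that assumption: the probe, the [[:digit:]] fallback, and the variable names are my own additions, not something from the answers above.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
data
datalater
983290842
Data387428later
datafhj893724897290384later
4329804928later
EOF

# Probe whether this grep build accepts -P and understands \d.
if printf '1\n' | grep -qP '^\d$' 2>/dev/null; then
    # Perl-compatible mode: \d is a digit, + needs no backslash.
    result=$(grep -P '^[Dd]ata.*\d+later$' "$sample")
else
    # Fallback: POSIX character class with extended regex syntax.
    result=$(grep -E '^[Dd]ata.*[[:digit:]]+later$' "$sample")
fi
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

Both branches should select the same two lines, Data387428later and datafhj893724897290384later.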

System Info

$ uname -a
CYGWIN_NT-10.0 A-1052207 2.5.2(0.297/5/3) 2016-06-23 14:29 x86_64 Cygwin

My Results from the previous answers

$ grep '^[Dd]ata[^0-9]*\d\+later$' file2.txt

$ grep '^[Dd]ata\d+later$' file2.txt

$ grep -P '^[Dd]ata[^0-9]*\d\+later$' file2.txt

$ grep -P '^[Dd]ata\d+later$' file2.txt
Data387428later

$


Source: https://stackoverflow.com/questions/14926332/matching-arbitrary-number-of-digits-using-grep-regex
