grep whole words made of only uppercase letters


[愿得一人]

Seems like this is rather simple, but I'm having trouble.

I have a text document that looks, for example, like this:

This is a
TEXT DOCUME


5 Answers

说谎

2021-01-23 05:06
You are missing the * quantifier, and \w matches any word character, not just uppercase letters. A correct regexp is:
```
\<[[:upper:]][[:upper:]]*\>
```
\< and \> match the beginning and the end of a word, respectively.
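As a minimal sketch of how that pattern behaves, the snippet below runs it over a made-up line of text. The sample text and the variable names are assumptions for illustration (the question's input is only partly shown), and -o is added so that only the matched words are printed.

```shell
# Hypothetical sample input; not the asker's actual file.
sample='This is a TEXT DOCUMENT with SOME words in CAPS.'

# \< and \> anchor the match to a whole word;
# -o prints each match on its own line.
bre_out=$(printf '%s\n' "$sample" | grep -o '\<[[:upper:]][[:upper:]]*\>')
printf '%s\n' "$bre_out"
```

The T of "This" is not printed, because it is followed by a lowercase letter and so does not end at a word boundary.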
星月不相逢

2021-01-23 05:10
To complement Zbynek Vyskovsky - kvr000's helpful answer:

grep's -E option enables extended regular expressions, which include the quantifier + (one or more occurrences). That simplifies the solution:
```
 grep -Eo '\<[[:upper:]]+\>' Untitled.txt
```
Also, as mentioned in Benjamin W.'s answer, -w can be used to match only at word boundaries without having to specify them as part of the regex:
```
 grep -Ewo '[[:upper:]]+' Untitled.txt
```
Note, however, that -w is a nonstandard option (but both BSD/OSX and GNU grep implement it).
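A quick way to check that the two forms agree is to run both on the same input. This is only a sketch: the sample line and the variable names are made up for illustration and are not the contents of Untitled.txt.

```shell
# Hypothetical input. FOO_bar and ABCdef should not match, because
# the underscore and the lowercase letters are word characters.
sample='Mixed CASE line: FOO_bar ABCdef PLAIN'

# Explicit word-boundary anchors with an extended regex.
anchored=$(printf '%s\n' "$sample" | grep -Eo '\<[[:upper:]]+\>')

# Same idea, with -w supplying the word-boundary check.
with_w=$(printf '%s\n' "$sample" | grep -Ewo '[[:upper:]]+')

printf '%s\n' "$anchored"
[ "$anchored" = "$with_w" ] && echo 'both forms agree'
```

Both commands should print CASE and PLAIN, one per line.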

As for egrep: it is nothing more than an (effective) alias of grep -E, which, as stated, activates support for extended regular expressions, but the exact set of features is platform-dependent.

Additionally, only GNU grep supports the -P option, which enables PCREs (Perl-Compatible Regular Expressions) and offers even more features and flexibility.
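For comparison, a PCRE version might look like the sketch below. It assumes a grep that was built with -P support, so it probes for that first; the sample text, the variable name, and the fallback message are made up for illustration.

```shell
# Hypothetical input for illustration.
sample='This is a TEXT DOCUMENT'

# Probe for -P first, since not every grep build supports it.
if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
  # \b is the PCRE word-boundary assertion.
  pcre_out=$(printf '%s\n' "$sample" | grep -Po '\b[[:upper:]]+\b')
else
  pcre_out='grep -P is not supported here'
fi
printf '%s\n' "$pcre_out"
```

Where -P is available, this prints TEXT and DOCUMENT on separate lines.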
北恋

2021-01-23 05:19
The example output shows multiple space-separated uppercase words on the same line, which can be achieved with
```
$ grep -ow '[[:upper:]][[:upper:][:space:]]*[[:upper:]]' infile
TEXT DOCUMENT
SOME
BUT NOT
ALL CAPS
```
The pattern matches any sequence that starts and ends with an uppercase character, with only uppercase characters or whitespace in between. -o prints only the matches, and -w makes sure that we don't match something like WORDlowercase.
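One consequence of this pattern is that it needs at least two uppercase characters, so a lone one-letter word is skipped. The sketch below illustrates this on made-up input (the sample text and the variable name are assumptions, not the asker's file).

```shell
# Hypothetical input; the question's file is only partly shown.
sample='This is a TEXT DOCUMENT with
SOME words
and A single letter'

# Uppercase at both ends, uppercase or whitespace in between.
multi_out=$(printf '%s\n' "$sample" |
  grep -ow '[[:upper:]][[:upper:][:space:]]*[[:upper:]]')
printf '%s\n' "$multi_out"
```

This prints "TEXT DOCUMENT" and "SOME", but not the single "A" on the last line.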
面向向阳花

2021-01-23 05:29
You can use this command:
```
grep -o -E "\<[[:upper:]]+\>" Untitled.txt
```
- -E activates extended regular expressions, which makes + available; it stands for one or more repetitions
- \< and \> are anchors marking the beginning and the end of a word
- the whole regex matches a sequence of one or more uppercase characters that makes up an entire word
Your original regexp gave you three-letter matches because \w stands for [_[:alnum:]], so you instructed grep to match a sequence of exactly three characters:
- the first and third from [_[:alnum:]]
- the second from [[:upper:]]
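The three-character behaviour can be reproduced with a small sketch. The asker's original command is not shown here, so the pattern below is an assumption reconstructed from the description above, and the input line is made up.

```shell
# Assumed pattern, reconstructed from the description above;
# the asker's original command is not shown.
three_out=$(printf 'TEXT DOCUMENT\n' | grep -o '\w[[:upper:]]\w')
printf '%s\n' "$three_out"
```

With this input the matches are TEX, DOC and UME: each is exactly three characters long, and the leftover characters that cannot form another three-character match are dropped.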
甜味超标

2021-01-23 05:29

An "old school" RE would be fewer characters:

```
grep -o '[A-Z][A-Z]*' Untitled.txt
```

It uses the -o option to print only the matched text and matches runs of uppercase A through Z. Without -w, it also prints uppercase runs inside mixed-case words, such as the T in This.

Adding -w to match whole words and -E to enable extended regular expressions allows a version that is even shorter:

```
grep -woE '[A-Z]+\>' Untitled.txt
```
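The difference between the two commands shows up on mixed-case words. The sketch below uses a made-up input line and variable names, and sets LC_ALL=C as an assumption so that the A-Z range covers only ASCII uppercase letters.

```shell
# Hypothetical input for illustration.
sample='This is a TEXT'

# Without -w, the capital T of "This" is printed too.
loose=$(printf '%s\n' "$sample" | LC_ALL=C grep -o '[A-Z][A-Z]*')

# With -w, only whole uppercase words remain.
strict=$(printf '%s\n' "$sample" | LC_ALL=C grep -woE '[A-Z]+\>')

printf 'without -w:\n%s\nwith -w:\n%s\n' "$loose" "$strict"
```

The first command prints T and TEXT, the second only TEXT. Since -w already checks both ends of the match, the trailing \> does not change the result here.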
