grep

Merging word counts with Bash and Unix

旧巷老猫 posted on 2021-02-10 19:51:53
Question: I made a Bash script that extracts words from a text file with grep and sed, sorts them with sort, counts the repetitions with wc, and then sorts again by frequency. The example output looks like this:
    12 the
    7 code
    7 with
    7 add
    5 quite
    3 do
    3 well
    1 quick
    1 can
    1 pick
    1 easy
Now I'd like to merge all words with the same frequency into one line, like this:
    12 the
    7 code with add
    5 quite
    3 do well
    1 quick can pick easy
Is there any way to do that with Bash and the standard Unix toolset? Or…
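The excerpt stops before any answer, and the asker wants Bash and standard Unix tools. To show just the grouping logic (collect the words under their count, then print one line per count), here is a minimal Python sketch. `merge_by_count` is a hypothetical name, and the sketch assumes each input line is a count followed by a single word. In the shell, the same idea is usually written with an awk associative array keyed on the count.

```python
def merge_by_count(lines):
    """Merge 'count word' lines so that words sharing a count end up on one line.

    Counts keep the order in which they first appear, so input that is already
    sorted by frequency stays sorted.
    """
    groups = {}  # count -> list of words; dicts preserve insertion order (Python 3.7+)
    for line in lines:
        parts = line.split()  # also drops leading padding such as the kind `uniq -c` adds
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        count, word = parts
        groups.setdefault(count, []).append(word)
    return [count + " " + " ".join(words) for count, words in groups.items()]


sample = ["12 the", "7 code", "7 with", "7 add", "5 quite", "3 do",
          "3 well", "1 quick", "1 can", "1 pick", "1 easy"]
for merged in merge_by_count(sample):
    print(merged)
```

With the sample from the question, this prints the five merged lines the asker wants, from `12 the` down to `1 quick can pick easy`.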

Remove words from a subtitle file that aren't in a wordlist (of common words)

青春壹個敷衍的年華 posted on 2021-02-10 14:51:16
Question: I have some subtitle files, and I don't intend to learn every single word in them; there is no need to learn hard terms like cleidocranial or dysplasia. I found this script here: Remove words from a cell that aren't in a list. But I have no idea how to modify it or run it (I'm using Linux). Here is an example:
Subtitle file (.srt):
    2
    00:00:13,000 --> 00:00:15,000
    People with cleidocranial dysplasia are good.
Wordlist of 3000 common words (.txt): ... people with are good .
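The linked script is not reproduced in the excerpt, so as an illustration only, here is a Python sketch of one way to do this: leave SRT index and timing lines untouched and, in subtitle text, drop every word whose lowercase form is missing from the wordlist. The helper names and the simple `[A-Za-z']+` definition of a word are assumptions, not part of the question.

```python
import re

# Standard SRT timing line, e.g. "00:00:13,000 --> 00:00:15,000".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")
# Simplified notion of a word (an assumption): letters plus apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def load_wordlist(lines):
    """Return the wordlist as a lowercase set (one word per line assumed)."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_text(text, known):
    """Remove every word whose lowercase form is not in known."""
    kept = WORD.sub(lambda m: m.group(0) if m.group(0).lower() in known else "", text)
    return re.sub(r" {2,}", " ", kept).strip()  # collapse gaps left by removed words


def filter_srt(lines, known):
    """Leave index, timing, and blank lines alone; filter subtitle text lines."""
    result = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.isdigit() or TIMESTAMP.match(stripped):
            result.append(stripped)
        else:
            result.append(filter_text(stripped, known))
    return result


known = load_wordlist(["people", "with", "are", "good"])
srt = ["2", "00:00:13,000 --> 00:00:15,000",
       "People with cleidocranial dysplasia are good."]
print("\n".join(filter_srt(srt, known)))
```

On the example, the text line becomes `People with are good.` Punctuation attached to a removed word (for example a sentence ending in a dropped word) is left behind, so the output may need a light manual pass.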

Only extract those words from a list that include no repeating letters, using regex

丶灬走出姿态 posted on 2021-02-10 08:41:52
Question: I have a large wordlist file with one word per line. I would like to filter out the words with repeating letters.
INPUT:
    abducts
    abe
    abeam
    abel
    abele
OUTPUT:
    abducts
    abe
    abel
I'd like to do this using a regex (grep, Perl, or Python). Is that possible?
Answer 1: It's much easier to write a regex that matches words that do have repeating letters, and then negate the match:
    my @input = qw(abducts abe abeam abel abele);
    my @output = grep { not /(\w).*\1/ } @input;
(This code assumes that @input…
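The Perl answer relies on a backreference: `(\w).*\1` captures one character and then requires the same character again later in the word, so any word it matches has a repeated letter. A minimal Python equivalent of that negated match, run on the sample input from the question (`without_repeats` is a hypothetical name):

```python
import re

# (\w) captures one character; .*\1 requires that same character again later.
# A word that does NOT match therefore has no repeated characters.
REPEATED = re.compile(r"(\w).*\1")


def without_repeats(words):
    """Keep only the words in which no character appears twice."""
    return [word for word in words if not REPEATED.search(word)]


print(without_repeats(["abducts", "abe", "abeam", "abel", "abele"]))
# ['abducts', 'abe', 'abel']
```

The match is case-sensitive, so `A` and `a` count as different letters; passing `re.IGNORECASE` to `re.compile` would treat them as the same. For the real file, read one stripped word per line into a list first.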

Why does a question mark appear at the end of the filename when I create a .txt file through a shell script? [duplicate]

孤者浪人 posted on 2021-02-09 05:52:09
Question: This question already has answers here: Shell Scripting unwanted '?' character at the end of file name (2 answers). Closed 4 years ago. I am writing a shell script that is supposed to create a text file. When I do this, a question mark appears at the end of the filename. What is the reason? I have tried the following methods in a Bash script:
    1) grep ERROR a1* > text.txt
    2) touch text.txt
With both methods, instead of text.txt, a file named text.txt? is generated. What should I do to overcome…
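The excerpt cuts off before the answers. A frequent cause of exactly this symptom is a script saved with Windows (CRLF) line endings: Bash treats the trailing carriage return as part of the last word on each line, so the file is created as `text.txt\r`, and `ls` displays the non-printable character as `?`. Assuming that is the cause here, a small Python check (hypothetical helper names) can report which lines still carry a carriage return and convert the text to Unix line endings; the `dos2unix` tool, where installed, does the same conversion.

```python
def crlf_lines(script_text):
    """Return the 1-based numbers of lines that still end in a carriage return."""
    return [number for number, line in enumerate(script_text.split("\n"), start=1)
            if line.endswith("\r")]


def to_unix(script_text):
    """Convert Windows CRLF line endings to Unix LF line endings."""
    return script_text.replace("\r\n", "\n")


def read_raw(path):
    """Read a script without newline translation, so carriage returns are preserved."""
    with open(path, newline="", encoding="utf-8") as f:
        return f.read()


script = "#!/bin/bash\r\ntouch text.txt\r\n"
print(crlf_lines(script))  # lines that end in a stray carriage return
# The last word of line 2, as the shell would see it: 'text.txt\r'
print(repr(script.split("\n")[1].split(" ")[-1]))
print(crlf_lines(to_unix(script)))  # empty after conversion
```

To check a real script, pass `read_raw("yourscript.sh")` to `crlf_lines`; the file name there is a placeholder.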

grep with multiple patterns on a single file: argument list too long

大憨熊 posted on 2021-02-08 23:41:13
Question: I am currently searching for multiple patterns in a file. The file is 90 GB in size, and I am searching on a particular field (positions 6-17 in each line). I am trying to get all the lines that contain any of a particular list of numbers. The syntax I am currently using is:
    grep '^.\{6\}0000000012345\|^.\{6\}0000000012543' somelargeFile.txt > outputFile.txt
This works for a small number of patterns, but for a large number of patterns I get the "Argument list too long" error. One alternative I have…
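The excerpt stops before any answer. One common way around the limit is grep's `-f FILE` option, which reads the patterns from a file instead of the command line. Because every pattern here tests the same fixed-width field, another option is a single pass with a set lookup, sketched below in Python. The helper names are hypothetical, and the sketch assumes the numbers are listed one per line in a separate file, all have the same length, and start right after the first 6 characters of each line, as the `^.\{6\}` prefix in the question implies.

```python
FIELD_START = 6  # characters to skip, matching the ^.\{6\} prefix in the question


def load_targets(lines):
    """Read the numbers to look for (one per line) into a set of bytes."""
    return {line.strip().encode() for line in lines if line.strip()}


def matching_lines(stream, targets, start=FIELD_START):
    """Yield every line whose fixed-width field is one of targets."""
    if not targets:
        return
    width = len(next(iter(targets)))  # assumes all numbers have the same length
    for line in stream:
        if line[start:start + width] in targets:  # one set lookup per line
            yield line


def filter_file(pattern_path, in_path, out_path):
    """Hypothetical driver: stream the large file once and write the matches."""
    with open(pattern_path) as f:
        targets = load_targets(f)
    with open(in_path, "rb") as big, open(out_path, "wb") as out:
        out.writelines(matching_lines(big, targets))


lines = [b"ABCDEF0000000012345 rest\n", b"ABCDEF0000000099999 rest\n",
         b"XYZXYZ0000000012543 more\n"]
targets = load_targets(["0000000012345", "0000000012543"])
print(list(matching_lines(lines, targets)))
```

Reading the big file in binary mode line by line keeps memory use flat, and the cost per line stays the same no matter how many numbers are in the list.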

Using grep from the Python console

流过昼夜 posted on 2021-02-08 08:59:26
Question: Using Python, how can I make this happen?
    python_shell$> print myPhone.print_call_log() | grep 555
The only thing close that I've seen is using "ipython console", assigning output to a variable, and then using a .grep() function on that variable. This is not really what I'm after. I want pipes and grepping on anything in the output (including errors/info).
Answer 1: Python's interactive REPL doesn't have grep or process pipelines, since it's not a Unix shell. You need to work with Python objects…
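Following the answer's point that the REPL works with Python objects rather than text streams, one way to get a grep-like filter is to capture what a call prints and filter those lines in Python. A minimal sketch, assuming the output goes through Python's `sys.stdout` and `sys.stderr`; `grep_call` is a hypothetical helper and `fake_call_log` is a hypothetical stand-in for `myPhone.print_call_log`:

```python
import contextlib
import io
import re
import sys


def grep_call(func, pattern, *args, **kwargs):
    """Call func, capture everything it prints, and return the lines matching pattern."""
    buffer = io.StringIO()
    # Send both stdout and stderr into the same buffer so errors/info are included.
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    text = buffer.getvalue()
    if result is not None:
        # Mimic `print func()`, which also prints the return value.
        text += str(result) + "\n"
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


def fake_call_log():
    """Hypothetical stand-in for myPhone.print_call_log()."""
    print("2021-02-08 call from 555-0101")
    print("2021-02-08 call from 444-0199")
    print("warning: 555-0102 unreachable", file=sys.stderr)


for line in grep_call(fake_call_log, "555"):
    print(line)
```

In a session, the call would look like `grep_call(myPhone.print_call_log, "555")`. Output that C extensions or child processes write directly to the underlying file descriptors bypasses `redirect_stdout` and is not captured.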

How does bgrep work?

我与影子孤独终老i posted on 2021-02-08 05:31:34
Question: I am studying the bgrep command found here. I run
    bgrep "fafafa" test_27.6.2015.bin | less -M
on the binary file test_27.6.2015.bin, but I get
    test_27.6.2015.bin: 00005ee4
    test_27.6.2015.bin: 0000bd3c
I expected to get matches containing the term fafafafa. Two matches is the correct number. These hex numbers probably refer to some segment containing fafafafa. How does bgrep form its search result?
Answer 1: bgrep's search results are formatted this way: printf("%s: %08llx…
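The format string in the answer prints the file name followed by an 8-digit hexadecimal number, which suggests each hit reports the byte offset where a match starts rather than the matched bytes themselves. Assuming, as the name implies, that bgrep reads its pattern as hex digits (so `fafafa` means the three bytes `0xfa 0xfa 0xfa`), here is a Python sketch that reproduces that kind of output. The helper names are hypothetical, and whether bgrep also reports overlapping matches is not shown in the excerpt; this sketch does.

```python
def find_offsets(data, hex_pattern):
    """Return the byte offset of every (possibly overlapping) match of hex_pattern in data."""
    needle = bytes.fromhex(hex_pattern)  # "fafafa" -> b"\xfa\xfa\xfa", three bytes
    if not needle:
        raise ValueError("empty pattern")
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)  # advance one byte so overlaps are found
    return offsets


def format_hits(filename, offsets):
    """Lines in the "%s: %08llx" layout: file name, then the offset as 8 hex digits."""
    return [f"{filename}: {offset:08x}" for offset in offsets]


def search_file(path, hex_pattern):
    """Read the whole file into memory (fine for a sketch) and format the hits."""
    with open(path, "rb") as f:
        return format_hits(path, find_offsets(f.read(), hex_pattern))


sample = b"\x00" * 16 + b"\xfa\xfa\xfa" + b"\x00" * 4
print(format_hits("sample.bin", find_offsets(sample, "fafafa")))
```

Under that reading, `00005ee4` in the question would be the offset of the first matching byte, and a hex dump starting there (for example `xxd -s 0x5ee4 -l 16 test_27.6.2015.bin`) would show the bytes that matched.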