Fast shell command to remove stop words in a text file
Question

I have a 2 GB text file and am trying to remove frequently occurring English stop words from it. I have a stopwords.txt file with contents like this:

a
an
the
for
and
I

What is a fast way to do this with a shell command such as tr, sed, or awk?

Answer 1:

Here's a method using the command line and perl. Save the text below as replacesw.sh:

    #! /bin/bash
    # Join the stop words in $1 (one per line) into a single alternation wrapped in \b(...)\b
    MYREGEX=\\b\(`perl -pe 's/\n/|/g' "$1"`\)\\b
    # Delete every whole-word match from the text file $2 and print the result to stdout
    perl -pe "s/$MYREGEX//g" "$2"

Then if you have saved your file above as stopwords.txt, and have a second