text-processing

Remove first directory components from path of file

让人想犯罪 __ submitted on 2019-11-29 02:50:12
I need to remove one directory component (the leftmost) from variables in Bash. I found ways to remove the whole path, or to use dirname and similar tools, but they removed either everything or one path component on the right side, which doesn't help me. So that you better understand what I need, here is an example: I have a/project/hello.c, a/project/docs/README, ... and I want to remove that a/ so that after some commands I'll have project/hello.c and project/docs/README, ...

You can use any of:

    x=a/b/c/d
    y=a/
    echo ${x#a/}
    echo ${x#$y}
    echo ${x#*/}

All three echo commands produce b/c/d; you could use the value
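For comparison, here is a minimal Python sketch of the same two expansions, assuming the paths use / as the separator. strip_first_component mirrors ${x#*/} (drop everything up to and including the first /) and strip_prefix mirrors ${x#$y}; both function names are hypothetical, not taken from the answer.

```python
def strip_first_component(path: str) -> str:
    """Drop the leftmost component, like Bash's ${x#*/}."""
    # partition() splits at the first "/"; if there is none, the path is
    # returned unchanged, just as ${x#*/} leaves x alone when nothing matches.
    _head, sep, tail = path.partition("/")
    return tail if sep else path


def strip_prefix(path: str, prefix: str) -> str:
    """Remove a literal prefix such as "a/", like ${x#$y} when y holds no glob characters."""
    return path[len(prefix):] if path.startswith(prefix) else path


for p in ["a/project/hello.c", "a/project/docs/README"]:
    print(strip_first_component(p))  # project/hello.c, then project/docs/README
```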

Remove empty lines in a text file via grep

夙愿已清 submitted on 2019-11-28 17:21:51
FILE:

    hello

    world

    foo

    bar

How can I remove all the empty lines in this FILE? Desired output of the command:

    hello
    world
    foo
    bar

DigitalRoss:

    grep . FILE

(And if you really want to do it in sed, then: sed -e '/^$/d' FILE)

(And if you really want to do it in awk, then: awk '/./' FILE)

Mr.Ree: Try the following:

    grep -v -e '^$'

With awk, just check the number of fields; no regex needed:

    $ more file
    hello

    world

    foo

    bar
    $ awk 'NF' file
    hello
    world
    foo
    bar

Marco Coutinho: Here is a solution that removes all lines that are either blank or contain only whitespace characters:

    grep -v '^[[:space:]]*$' foo.txt
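The answers above split into two behaviours: grep ., sed '/^$/d' and grep -v '^$' drop only lines that are completely empty, while grep -v '^[[:space:]]*$' (and, for spaces and tabs, awk 'NF') also drops lines that contain only whitespace. A minimal Python sketch of both filters, assuming the lines come from something like open(...).readlines() and so still carry their trailing newline; the function names are hypothetical.

```python
def drop_empty(lines):
    """Keep lines with at least one character besides the newline (like `grep . FILE`)."""
    return [ln for ln in lines if ln.rstrip("\n") != ""]


def drop_blank(lines):
    """Also drop whitespace-only lines (similar to `grep -v '^[[:space:]]*$'`)."""
    return [ln for ln in lines if ln.strip()]


sample = ["hello\n", "\n", "world\n", "   \n", "foo\n", "bar\n"]
print(drop_empty(sample))  # still contains the "   \n" line
print(drop_blank(sample))  # ['hello\n', 'world\n', 'foo\n', 'bar\n']
```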

Java text classification problem [closed]

心已入冬 submitted on 2019-11-28 16:55:11
I have a set of Book objects; class Book is defined as follows:

    import java.util.ArrayList;

    class Book {
        String title;
        ArrayList<String> taglist;
    }

where title is the title of the book (for example, Javascript for dummies) and taglist is a list of tags describing it (for our example: Javascript, jquery, "web dev", ...). As I said, I have a set of books about different subjects: IT, BIOLOGY, HISTORY, ... Each book has a title and a set of tags describing it. I have to classify those books automatically into separate sets by topic, for example:

IT BOOKS:
    Java for dummies
    Javascript for dummies
    Learn flash in 30 days
    C++ programming
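One very simple baseline, sketched here in Python and not taken from the question: assign each book to the topic whose seed tags overlap most with the book's own tags, and put books with no overlap in an UNKNOWN group. The Book dataclass mirrors the Java class above; the TOPIC_SEEDS sets and the UNKNOWN group are hypothetical, and a fully automatic system would more likely learn the topics from the data (for example by clustering the tag sets) instead of relying on hand-picked seeds.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    tags: list[str] = field(default_factory=list)


# Hypothetical seed tags per topic, chosen for illustration only.
TOPIC_SEEDS = {
    "IT": {"java", "javascript", "jquery", "web dev", "flash", "c++"},
    "BIOLOGY": {"biology", "genetics", "evolution"},
    "HISTORY": {"history", "ancient rome", "world war ii"},
}


def classify(books, seeds=TOPIC_SEEDS):
    """Group book titles by the topic whose seed tags overlap most with the book's tags."""
    groups = {topic: [] for topic in seeds}
    groups["UNKNOWN"] = []  # books sharing no tag with any topic
    for book in books:
        tags = {t.lower() for t in book.tags}
        best_topic, best_score = "UNKNOWN", 0
        for topic, seed in seeds.items():
            score = len(tags & seed)
            if score > best_score:  # on a tie, the earlier topic is kept
                best_topic, best_score = topic, score
        groups[best_topic].append(book.title)
    return groups


books = [
    Book("Javascript for dummies", ["Javascript", "jquery", "web dev"]),
    Book("Java for dummies", ["Java"]),
    Book("C++ programming", ["C++"]),
]
print(classify(books)["IT"])
```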

Add text to file at certain line in Linux [duplicate]

血红的双手。 submitted on 2019-11-28 16:42:05
Question: This question already has an answer here: Insert a line at specific line number with sed or awk (8 answers)

I want to add a specific line, let's say avatar, to the files whose names start with MakeFile, and avatar should be added as the 15th line of the file. This is how to add text to files:

    echo 'avatar' >> MakeFile.websvc

and this, I think, is how to add text to files whose names start with MakeFile:

    echo 'avatar' >> *MakeFile.

But I cannot manage to add this line as the 15th line of the file.

Answer 1: You can
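The linked duplicate covers sed and awk; as a language-neutral illustration, here is a minimal Python sketch, assuming the target files match the glob MakeFile* in a given directory (the question's *MakeFile. pattern would instead match names ending in "MakeFile."). insert_at and add_line_to_makefiles are hypothetical names. Like a sed command addressed to line 15, it leaves files with fewer than 15 lines unchanged.

```python
from pathlib import Path


def insert_at(lines, lineno, text):
    """Return a copy of lines with text placed so that it becomes line lineno (1-based).

    If the input has fewer than lineno lines, it is returned unchanged,
    much as a sed command addressed to that line never fires.
    """
    if lineno < 1:
        raise ValueError("line numbers start at 1")
    if len(lines) < lineno:
        return list(lines)
    return lines[:lineno - 1] + [text] + lines[lineno - 1:]


def add_line_to_makefiles(directory=".", pattern="MakeFile*", lineno=15, text="avatar"):
    """Insert text as line lineno of every regular file in directory matching pattern."""
    for path in Path(directory).glob(pattern):
        if not path.is_file():
            continue
        lines = path.read_text().splitlines(keepends=True)
        path.write_text("".join(insert_at(lines, lineno, text + "\n")))
```

Working on a list of lines keeps the insertion logic testable on its own; the file wrapper only reads, calls insert_at, and writes back.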

Text processing - Python vs Perl performance [closed]

那年仲夏 submitted on 2019-11-28 16:18:17
Here are my Perl and Python scripts for some simple text processing of about 21 log files, each about 300 KB to 1 MB (maximum), repeated 5 times (a total of 125 files, because the logs are repeated 5 times).

Python code (modified to use compiled re and re.I):

    #!/usr/bin/python
    import re
    import fileinput

    exists_re = re.compile(r'^(.*?) INFO.*Such a record already exists', re.I)
    location_re = re.compile(r'^AwbLocation (.*?) insert into', re.I)

    for line in fileinput.input():
        fn = fileinput.filename()
        currline = line.rstrip()

        mprev = exists_re.search(currline)

        if mprev:
            xlogtime = mprev

Algorithms to detect phrases and keywords from text

谁都会走 submitted on 2019-11-28 15:01:13
I have around 100 megabytes of text, without any markup, divided into approximately 10,000 entries. I would like to generate a 'tag' list automatically. The problem is that there are word groups (i.e. phrases) that only make sense when they are grouped together. If I just count the words, I get a large number of really common words (is, the, for, in, am, etc.). I have counted the words and the number of other words that come before and after each one, but now I really cannot figure out what to do next. The information relating to the 2- and 3-word phrases is present, but how do I extract this data?
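The neighbour counts the asker already has are the raw material for a simple n-gram frequency approach. A minimal Python sketch, assuming one string per entry and a deliberately tiny, hypothetical stop-word list: it counts 2- and 3-word sequences, discards those that start or end with a stop word, and keeps the ones seen at least min_count times. This is plain frequency counting; statistical collocation scores such as pointwise mutual information are a common refinement but are not shown here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"is", "the", "for", "in", "am", "a", "an", "of", "and", "to"}


def candidate_phrases(entries, sizes=(2, 3), min_count=2):
    """Return (phrase, count) pairs for frequent n-grams not bounded by stop words."""
    counts = Counter()
    for entry in entries:
        words = re.findall(r"[a-z0-9']+", entry.lower())
        for n in sizes:
            for i in range(len(words) - n + 1):
                gram = tuple(words[i:i + n])
                # Phrases that begin or end with a very common word are rarely useful tags.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[gram] += 1
    return [(" ".join(g), c) for g, c in counts.most_common() if c >= min_count]


entries = [
    "The New York Times is a paper",
    "I read the New York Times",
    "New York is big",
]
print(candidate_phrases(entries))
```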

Finding common value across multiple files containing single column values

拜拜、爱过 submitted on 2019-11-28 14:01:40
I have 100 text files, each containing a single column. The files look like this:

    file1.txt
    10032
    19873
    18326

    file2.txt
    10032
    19873
    11254

    file3.txt
    15478
    10032
    11254

and so on. The size of each file is different. Please tell me how to find the numbers that are common to all 100 files. The same number appears only once within a file.

awk to the rescue! To find the elements common to all files (assuming uniqueness within each file):

    awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' files

This counts all occurrences and prints the values whose count equals the number of files. This will work whether or
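A Python equivalent of the awk answer, as a minimal sketch: instead of counting occurrences, it intersects one set of values per file, so it also works when a value repeats inside a file (the awk version relies on the uniqueness the asker guarantees). common_values and common_values_in_files are hypothetical names, and the file version assumes one value per line.

```python
from pathlib import Path


def common_values(columns):
    """Return the values present in every column (each column is an iterable of strings)."""
    sets = [{v.strip() for v in col if v.strip()} for col in columns]
    return set.intersection(*sets) if sets else set()


def common_values_in_files(paths):
    """Read each file as one value per line and intersect them."""
    return common_values(Path(p).read_text().splitlines() for p in paths)


print(sorted(common_values([
    ["10032", "19873", "18326"],  # file1.txt
    ["10032", "19873", "11254"],  # file2.txt
    ["15478", "10032", "11254"],  # file3.txt
])))  # ['10032']
```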

Removing stop words from single string

↘锁芯ラ submitted on 2019-11-28 12:35:46
My query is string = 'Alligator in water', where in is a stop word. How can I remove it so that I get stop_remove = 'Alligator water' as output? I have tried ismember, but it returns an integer value for the matching word; I want to get the remaining words as output. in is just an example; I'd like to remove all possible stop words.

Use this to remove all stop words. Code:

    % Source of stopwords- http://norm.al/2009/04/14/list-of-english-stop-words/
    stopwords_cellstring={'a', 'about', 'above', 'above', 'across', 'after', ...
        'afterwards', 'again', 'against', 'all', 'almost', 'alone', 'along',
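The MATLAB answer keeps the words that are not in a stop-word list. The same idea as a minimal Python sketch, assuming whitespace-separated words and case-insensitive matching; the short STOP_WORDS set below is only a sample of the list the answer cites, and punctuation attached to a word is not handled.

```python
# Only a sample of the stop-word list cited in the answer; use the full list in practice.
STOP_WORDS = {"a", "about", "above", "across", "after", "afterwards",
              "again", "against", "all", "almost", "alone", "along", "in"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return text with every stop word removed, keeping the original word order."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


print(remove_stop_words("Alligator in water"))  # Alligator water
```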

Output text file with line breaks in PHP

偶尔善良 submitted on 2019-11-28 12:04:30
I'm trying to open a text file and output its contents with the code below. The text file includes line breaks, but when I echo the file it's unformatted. How do I fix this? Thanks.

    <html>
    <head>
    </head>
    <body>
    <?php
    $fh = fopen("filename.txt", 'r');
    $pageText = fread($fh, 25000);
    echo $pageText;
    ?>
    </body>
    </html>

To convert the plain-text line breaks to HTML line breaks, try this:

    $fh = fopen("filename.txt", 'r');
    $pageText = fread($fh, 25000);
    echo nl2br($pageText);

Note the nl2br function wrapping the text.

One line of code:

    echo nl2br( file_get_contents('file.txt') );

If you just want to show the
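nl2br() inserts an HTML line break before each newline in the string. For readers doing the same outside PHP, here is a rough Python analogue, assuming the text should also be HTML-escaped first so that characters such as < in the file are displayed literally; text_to_html is a hypothetical name, and unlike nl2br it normalises \r\n and lone \r to \n.

```python
import html


def text_to_html(text):
    """Escape HTML special characters, then add <br /> before each line break."""
    escaped = html.escape(text)
    # Treat \r\n, lone \r and \n all as line breaks, then emit "<br />\n" for each.
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br />\n")


print(text_to_html("first line\nsecond & last"))
```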

Eliminate partially duplicate lines by column and keep the last one

五迷三道 submitted on 2019-11-28 07:23:11
I have a file that looks like this:

    2011-03-21 name001 line1
    2011-03-21 name002 line2
    2011-03-21 name003 line3
    2011-03-22 name002 line4
    2011-03-22 name001 line5

For each name, I only want its last appearance. So I expect the result to be:

    2011-03-21 name003 line3
    2011-03-22 name002 line4
    2011-03-22 name001 line5

Could someone give me a solution with bash/awk/sed?

This gets unique lines by the second field, but starting from the end of the file or text (like in your result example):

    tac temp.txt | sort -k2,2 -r -u

    awk '{a[$2]=$0} END {for (i in a) print a[i]}' file

If order of appearance is important: Based on
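Because awk's for (i in a) loop visits keys in an unspecified order, the awk answer may print the surviving lines in any order. A minimal Python sketch that keeps each name's last line and also preserves the original order of those lines, assuming whitespace-separated fields; col=1 is the zero-based index of the second field (awk's $2), and keep_last_by_column is a hypothetical name.

```python
def keep_last_by_column(lines, col=1):
    """Keep only the last line for each value of field `col`, in original file order."""
    last_index = {}
    for i, line in enumerate(lines):
        fields = line.split()
        if len(fields) <= col:
            continue  # skip blank or short lines
        last_index[fields[col]] = i  # later lines overwrite earlier ones
    return [lines[i] for i in sorted(last_index.values())]


data = [
    "2011-03-21 name001 line1",
    "2011-03-21 name002 line2",
    "2011-03-21 name003 line3",
    "2011-03-22 name002 line4",
    "2011-03-22 name001 line5",
]
print("\n".join(keep_last_by_column(data)))
```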