text-processing

Using multiple delimiters in awk

不打扰是莪最后的温柔 submitted on 2019-12-17 04:11:26
Question: I have a file which contains the following lines:

    /logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
    /logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
    /logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com

From the above output I want to extract 3 fields (number 2, 4 and the last one, *.example.com). I am currently using:

    cat file | awk -F'/' '{print $3 "\t" $5}'
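
A common approach to this kind of extraction (a sketch of the general technique, not necessarily the accepted answer) is to give -F a bracket expression so awk splits on both "/" and "="; the field numbers below are taken from the sample lines above:

    # Split on "/" or "=", print the 3rd field, the 5th field and the last one,
    # trimming the space that the "=" leaves in front of the hostname.
    awk -F'[/=]' '{ gsub(/^[ \t]+/, "", $NF); print $3 "\t" $5 "\t" $NF }' file

With the sample input this prints tc0001, tomcat7.1 and demo.example.com, separated by tabs.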

How to add a new line of text to an existing file in Java? [duplicate]

烈酒焚心 submitted on 2019-12-17 03:02:09
Question: This question already has answers here: How to append text to an existing file in Java (30 answers). Closed 2 years ago. I would like to append a new line to an existing file without erasing its current contents. In short, here is the approach I am currently using:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.Writer;

    Writer output;
    output = new BufferedWriter(new FileWriter(my_file_name)); // clears file every time
    output.append("New …
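
The usual remedy (a sketch along the lines of the linked duplicate; the file name is a stand-in for the asker's my_file_name) is to open the FileWriter in append mode by passing true as its second argument:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;

    public class AppendLine {
        public static void main(String[] args) throws IOException {
            // The second FileWriter argument (true) opens the file in append mode,
            // so existing content is kept instead of being truncated.
            try (BufferedWriter output = new BufferedWriter(new FileWriter("my_file_name.txt", true))) {
                output.append("New line of text");
                output.newLine(); // platform-specific line separator
            }
        }
    }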

Remove carriage returns from CSV data value

为君一笑 submitted on 2019-12-14 03:24:51
Question: I am importing data from a pipe-delimited CSV into MySQL using a LOAD DATA INFILE statement. I am terminating lines with '\r\n'. My problem is that some of the data within each row has '\r\n' in it, causing the load to fail. I have similar files that just use '\n' within data to indicate line breaks, and those cause no issues.

Example GOOD CSV:

    School|City|State|Country\r
    Harvard University|Cambridge|MA|USA\r
    Princeton University|Princeton|New Jersey |USA\r

Example BAD CSV:

    School|City|State …
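
One common preprocessing approach (a sketch, not a statement of what the asker ultimately did) is to rebuild logical rows before loading, on the assumption that every real row has exactly four pipe-delimited fields as in the samples above; an embedded \r\n then shows up as a physical line with too few fields. The file names bad.csv and clean.csv are assumptions:

    # Glue physical lines together until a logical row has 4 fields, then print it.
    awk '
    {
        sub(/\r$/, "")                          # strip the trailing carriage return
        buf = (buf == "" ? $0 : buf " " $0)     # join continuation lines with a space
        if (split(buf, parts, "|") == 4) {      # 4 fields => the row is complete
            print buf
            buf = ""
        }
    }' bad.csv > clean.csv

The cleaned file can then be loaded with LINES TERMINATED BY '\n'.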

Modify existing line in file - Java [duplicate]

空扰寡人 submitted on 2019-12-13 23:31:14
Question: This question already has answers here: Modify a .txt file in Java (11 answers). Closed 5 years ago. I need some help or code examples to update an existing line. File contents:

    Heinrich: 30
    George: 2020
    Fred: 9090129

Say I wanted to update (write) George's value to, say, 300; how would I achieve this? EDIT: Or would I be better off just using YAML? Thanks.

Answer 1: Here is a way to do it, try it. In this example the file is C:/user.txt and I change George's value to 1234.

    public class …
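
A different, minimal sketch of the same idea (read all lines, rewrite the matching one, write everything back) using java.nio rather than whatever the truncated answer does; the path and the hard-coded new value are assumptions:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.stream.Collectors;

    public class UpdateValue {
        public static void main(String[] args) throws IOException {
            Path file = Paths.get("user.txt"); // stand-in for C:/user.txt
            // Read every line, replace the one that starts with "George:", write all lines back.
            List<String> updated = Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                    .map(line -> line.startsWith("George:") ? "George: 300" : line)
                    .collect(Collectors.toList());
            Files.write(file, updated, StandardCharsets.UTF_8);
        }
    }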

User Warning: Your stop_words may be inconsistent with your preprocessing

不问归期 submitted on 2019-12-13 15:23:26
Question: I am following this document clustering tutorial. As input I give a txt file which can be downloaded here. It's a combined file of 3 other txt files, divided using \n. After creating a tf-idf matrix I received this warning:

    UserWarning: Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens ['abov', 'afterward', 'alon', 'alreadi', 'alway', 'ani', 'anoth', 'anyon', 'anyth', 'anywher', 'becam', 'becaus', 'becom', 'befor', 'besid', …
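
The warning means the vectorizer's tokenizer/stemmer produces tokens (e.g. 'abov', 'becaus') that are not in the raw stop-word list. A common workaround, sketched below under the assumption that the tutorial's NLTK SnowballStemmer is in use (tokenize_and_stem here is a simplified stand-in for the tutorial's tokenizer), is to run the stop words through the same stemmer before handing them to TfidfVectorizer; in most cases this silences the warning:

    from nltk.stem.snowball import SnowballStemmer
    from nltk.tokenize import word_tokenize
    from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

    stemmer = SnowballStemmer("english")

    def tokenize_and_stem(text):
        # Same idea as the tutorial's tokenizer: tokenize, then stem every token.
        return [stemmer.stem(token) for token in word_tokenize(text)]

    # Stem the stop words with the same stemmer, so the list the vectorizer
    # filters on matches what the tokenizer actually produces.
    stemmed_stop_words = sorted({stemmer.stem(word) for word in ENGLISH_STOP_WORDS})

    vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem,
                                 stop_words=stemmed_stop_words)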

Fast shell command to remove stop words in a text file

一世执手 submitted on 2019-12-13 15:03:27
Question: I have a 2 GB text file. I am trying to remove frequently occurring English stop words from this file. I have stopwords.txt containing entries like this:

    a
    an
    the
    for
    and
    I

What is the fastest way to do this with a shell command such as tr, sed or awk?

Answer 1: Here's a method using the command line and perl. Save the text below as replacesw.sh:

    #! /bin/bash
    MYREGEX=\\b\(`perl -pe 's/\n/|/g' $1`\)\\b
    perl -pe "s/$MYREGEX//g" $2

Then if you have saved your file above as stopwords.txt, and have a second …
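
An alternative one-liner sketch with GNU sed (not part of the answer above; the big file's name is an assumption, since the question never gives one):

    # Turn the stop-word list into an alternation like "a|an|the|for|and|I",
    # then delete whole-word matches from the large file.
    pattern=$(paste -sd'|' stopwords.txt)
    sed -E "s/\b($pattern)\b//g" bigfile.txt > cleaned.txt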

Lowercase first element of tuple in list of tuples

你。 submitted on 2019-12-13 14:19:06
Question: I have a list of documents, labeled with their appropriate categories:

    documents = [(list(corpus.words(fileid)), category)
                 for category in corpus.categories()
                 for fileid in corpus.fileids(category)]

which gives me the following list of tuples, where the first element of each tuple is a list of words (the tokens of a sentence). For instance:

    [([u'A', u'pilot', u'investigation', u'of', u'a', u'multidisciplinary', u'quality', u'of', u'life', u'intervention', u'for', u'men', u'with', u'biochemical', u …
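
A minimal sketch of what the title asks for, assuming documents keeps the (word_list, category) shape shown above: lowercase every token in the first element of each tuple while leaving the category untouched.

    # Lowercase each word in the token list; the category passes through unchanged.
    documents = [([word.lower() for word in words], category)
                 for words, category in documents]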

Linux join utility complains about input file not being sorted

强颜欢笑 submitted on 2019-12-13 11:44:41
Question: I have two files. file1 has the format field1;field2;field3;field4 (file1 is initially unsorted); file2 has the format field1 (file2 is sorted). I run the following two commands:

    sort -t\; -k1 file1 -o file1   # to sort file1
    join -t\; -1 1 -2 1 -o 1.1 1.2 1.3 1.4 file1 file2

I get the following message:

    join: file1:27497: is not sorted: line_which_was_identified_as_out_of_order

Why is this happening? (I also tried to sort file1 taking the entire line into consideration, not only the first field …
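
A frequent cause (offered here as a hedged guess, not as the asker's confirmed problem) is that sort and join disagree on collation order, or that -k1 without an end position sorts on everything from field 1 to the end of the line rather than on the first field alone. A sketch that makes both tools use the same byte-wise order and a single-field key:

    # Sort both inputs and run join under the same byte-wise collation (LC_ALL=C);
    # "-k1,1" restricts the sort key to the first ";"-separated field.
    LC_ALL=C sort -t';' -k1,1 file1 -o file1
    LC_ALL=C sort file2 -o file2
    LC_ALL=C join -t';' -1 1 -2 1 -o 1.1,1.2,1.3,1.4 file1 file2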

BufferedReader: read multiple lines into a single string

痞子三分冷 submitted on 2019-12-13 11:36:15
Question: I'm reading numbers from a txt file using BufferedReader for analysis. The way I'm going about this now is: reading a line using .readLine(), then splitting that string into an array of strings using .split().

    public InputFile () {
        fileIn = null;
        // stuff here
        fileIn = new FileReader(filename + ".txt");
        buffIn = new BufferedReader(fileIn);
        return;
        // stuff here
    }

    public String ReadBigStringIn() {
        String line = null;
        try {
            line = buffIn.readLine();
        } catch (IOException e) {}
        return line;
    }

    public …
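
A sketch of reading several lines at once and joining them into one string; the fixed line count n and the file name are assumptions about what "multiple" means here, not something the question specifies:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadLines {
        // Read up to n lines from the reader and join them into one space-separated string.
        static String readBigStringIn(BufferedReader buffIn, int n) throws IOException {
            StringBuilder everything = new StringBuilder();
            String line;
            for (int i = 0; i < n && (line = buffIn.readLine()) != null; i++) {
                if (everything.length() > 0) {
                    everything.append(' ');
                }
                everything.append(line);
            }
            return everything.toString();
        }

        public static void main(String[] args) throws IOException {
            try (BufferedReader buffIn = new BufferedReader(new FileReader("numbers.txt"))) {
                System.out.println(readBigStringIn(buffIn, 5));
            }
        }
    }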

Perl or Python: Convert date from dd/mm/yyyy to yyyy-mm-dd

女生的网名这么多〃 submitted on 2019-12-13 11:33:55
Question: I have lots of dates in a column of a CSV file that I need to convert from dd/mm/yyyy to yyyy-mm-dd format. For example, 17/01/2010 should be converted to 2010-01-17. How can I do this in Perl or Python?

Answer 1:

    >>> from datetime import datetime
    >>> datetime.strptime('02/11/2010', '%d/%m/%Y').strftime('%Y-%m-%d')
    '2010-11-02'

or, a more hackish way (that doesn't check the validity of the values):

    >>> '-'.join('02/11/2010'.split('/')[::-1])
    '2010-11-02'
    >>> '-'.join(reversed('02/11/2010'.split('/'))) …
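
Since the dates live in a CSV column, here is a sketch of applying the datetime conversion to a whole file; the file names and the column index 0 are assumptions:

    import csv
    from datetime import datetime

    # Rewrite the first column of in.csv from dd/mm/yyyy to yyyy-mm-dd, writing to out.csv.
    with open("in.csv", newline="") as src, open("out.csv", "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            row[0] = datetime.strptime(row[0], "%d/%m/%Y").strftime("%Y-%m-%d")
            writer.writerow(row)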