text-processing

Using multiple delimiters in awk

不打扰是莪最后的温柔 submitted on 2019-12-17 04:11:26
Question: I have a file which contains the following lines:

    /logs/tc0001/tomcat/tomcat7.1/conf/catalina.properties:app.env.server.name = demo.example.com
    /logs/tc0001/tomcat/tomcat7.2/conf/catalina.properties:app.env.server.name = quest.example.com
    /logs/tc0001/tomcat/tomcat7.5/conf/catalina.properties:app.env.server.name = www.example.com

From the above output I want to extract 3 fields (number 2, 4 and the last one, *.example.com). I am currently using:

    cat file | awk -F'/' '{print $3 "\t" $5}'
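
A common approach to this kind of extraction (a sketch of the general technique, not necessarily the accepted answer) is to give -F a bracket expression so awk splits on both "/" and "="; the field numbers below are taken from the sample lines above:

    # Split on "/" or "=", print the 3rd field, the 5th field and the last one,
    # trimming the space that the "=" leaves in front of the hostname.
    awk -F'[/=]' '{ gsub(/^[ \t]+/, "", $NF); print $3 "\t" $5 "\t" $NF }' file

With the sample input this prints tc0001, tomcat7.1 and demo.example.com, separated by tabs.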

How to add a new line of text to an existing file in Java? [duplicate]

烈酒焚心 submitted on 2019-12-17 03:02:09
Question: This question already has answers here: How to append text to an existing file in Java (30 answers). Closed 2 years ago. I would like to append a new line to an existing file without erasing its current contents. In short, here is the approach I am currently using:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.Writer;

    Writer output;
    output = new BufferedWriter(new FileWriter(my_file_name)); // clears file every time
    output.append("New …
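
The usual remedy (a sketch along the lines of the linked duplicate; the file name is a stand-in for the asker's my_file_name) is to open the FileWriter in append mode by passing true as its second argument:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;

    public class AppendLine {
        public static void main(String[] args) throws IOException {
            // The second FileWriter argument (true) opens the file in append mode,
            // so existing content is kept instead of being truncated.
            try (BufferedWriter output = new BufferedWriter(new FileWriter("my_file_name.txt", true))) {
                output.append("New line of text");
                output.newLine(); // platform-specific line separator
            }
        }
    }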

Remove carriage returns from CSV data value

为君一笑 submitted on 2019-12-14 03:24:51
Question: I am importing data from a pipe-delimited CSV into MySQL using a LOAD DATA INFILE statement. I am terminating lines with '\r\n'. My problem is that some of the data within each row has '\r\n' in it, causing the load to fail. I have similar files that just use '\n' within data to indicate line breaks, and those cause no issues.

Example GOOD CSV:

    School|City|State|Country\r
    Harvard University|Cambridge|MA|USA\r
    Princeton University|Princeton|New Jersey |USA\r

Example BAD CSV:

    School|City|State …
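
One common preprocessing approach (a sketch, not a statement of what the asker ultimately did) is to rebuild logical rows before loading, on the assumption that every real row has exactly four pipe-delimited fields as in the samples above; an embedded \r\n then shows up as a physical line with too few fields. The file names bad.csv and clean.csv are assumptions:

    # Glue physical lines together until a logical row has 4 fields, then print it.
    awk '
    {
        sub(/\r$/, "")                          # strip the trailing carriage return
        buf = (buf == "" ? $0 : buf " " $0)     # join continuation lines with a space
        if (split(buf, parts, "|") == 4) {      # 4 fields => the row is complete
            print buf
            buf = ""
        }
    }' bad.csv > clean.csv

The cleaned file can then be loaded with LINES TERMINATED BY '\n'.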

Modify existing line in file - Java [duplicate]

空扰寡人 submitted on 2019-12-13 23:31:14
Question: This question already has answers here: Modify a .txt file in Java (11 answers). Closed 5 years ago. I need some help or code examples to update an existing line. File contents:

    Heinrich: 30
    George: 2020
    Fred: 9090129

Say I wanted to update (write) George's value to, say, 300; how would I achieve this? EDIT: Or would I be better off just using YAML? Thanks.

Answer 1: Here is a way to do it, try it. In this example the file is C:/user.txt and I change George's value to 1234.

    public class …
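
A different, minimal sketch of the same idea (read all lines, rewrite the matching one, write everything back) using java.nio rather than whatever the truncated answer does; the path and the hard-coded new value are assumptions:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.stream.Collectors;

    public class UpdateValue {
        public static void main(String[] args) throws IOException {
            Path file = Paths.get("user.txt"); // stand-in for C:/user.txt
            // Read every line, replace the one that starts with "George:", write all lines back.
            List<String> updated = Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                    .map(line -> line.startsWith("George:") ? "George: 300" : line)
                    .collect(Collectors.toList());
            Files.write(file, updated, StandardCharsets.UTF_8);
        }
    }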

User Warning: Your stop_words may be inconsistent with your preprocessing

不问归期 submitted on 2019-12-13 15:23:26
Question: I am following this document clustering tutorial. As input I give a txt file which can be downloaded here. It's a combined file of 3 other txt files, divided using \n. After creating a tf-idf matrix I received this warning:

    UserWarning: Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens ['abov', 'afterward', 'alon', 'alreadi', 'alway', 'ani', 'anoth', 'anyon', 'anyth', 'anywher', 'becam', 'becaus', 'becom', 'befor', 'besid', …
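
The warning means the vectorizer's tokenizer/stemmer produces tokens (e.g. 'abov', 'becaus') that are not in the raw stop-word list. A common workaround, sketched below under the assumption that the tutorial's NLTK SnowballStemmer is in use (tokenize_and_stem here is a simplified stand-in for the tutorial's tokenizer), is to run the stop words through the same stemmer before handing them to TfidfVectorizer; in most cases this silences the warning:

    from nltk.stem.snowball import SnowballStemmer
    from nltk.tokenize import word_tokenize
    from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

    stemmer = SnowballStemmer("english")

    def tokenize_and_stem(text):
        # Same idea as the tutorial's tokenizer: tokenize, then stem every token.
        return [stemmer.stem(token) for token in word_tokenize(text)]

    # Stem the stop words with the same stemmer, so the list the vectorizer
    # filters on matches what the tokenizer actually produces.
    stemmed_stop_words = sorted({stemmer.stem(word) for word in ENGLISH_STOP_WORDS})

    vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem,
                                 stop_words=stemmed_stop_words)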

Fast shell command to remove stop words in a text file

一世执手 submitted on 2019-12-13 15:03:27
Question: I have a 2 GB text file. I am trying to remove frequently occurring English stop words from this file. I have stopwords.txt containing entries like this:

    a
    an
    the
    for
    and
    I

What is the fastest way to do this with a shell command such as tr, sed or awk?

Answer 1: Here's a method using the command line and perl. Save the text below as replacesw.sh:

    #! /bin/bash
    MYREGEX=\\b\(`perl -pe 's/\n/|/g' $1`\)\\b
    perl -pe "s/$MYREGEX//g" $2

Then if you have saved your file above as stopwords.txt, and have a second …
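
An alternative one-liner sketch with GNU sed (not part of the answer above; the big file's name is an assumption, since the question never gives one):

    # Turn the stop-word list into an alternation like "a|an|the|for|and|I",
    # then delete whole-word matches from the large file.
    pattern=$(paste -sd'|' stopwords.txt)
    sed -E "s/\b($pattern)\b//g" bigfile.txt > cleaned.txt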

Lowercase first element of tuple in list of tuples

你。 submitted on 2019-12-13 14:19:06
Question: I have a list of documents, labeled with their appropriate categories:

    documents = [(list(corpus.words(fileid)), category)
                 for category in corpus.categories()
                 for fileid in corpus.fileids(category)]

which gives me the following list of tuples, where the first element of each tuple is a list of words (the tokens of a sentence). For instance:

    [([u'A', u'pilot', u'investigation', u'of', u'a', u'multidisciplinary', u'quality', u'of', u'life', u'intervention', u'for', u'men', u'with', u'biochemical', u …
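
A minimal sketch of what the title asks for, assuming documents keeps the (word_list, category) shape shown above: lowercase every token in the first element of each tuple while leaving the category untouched.

    # Lowercase each word in the token list; the category passes through unchanged.
    documents = [([word.lower() for word in words], category)
                 for words, category in documents]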

Linux join utility complains about input file not being sorted

强颜欢笑 submitted on 2019-12-13 11:44:41
Question: I have two files. file1 has the format field1;field2;field3;field4 (file1 is initially unsorted); file2 has the format field1 (file2 is sorted). I run the following two commands:

    sort -t\; -k1 file1 -o file1   # to sort file1
    join -t\; -1 1 -2 1 -o 1.1 1.2 1.3 1.4 file1 file2

I get the following message:

    join: file1:27497: is not sorted: line_which_was_identified_as_out_of_order

Why is this happening? (I also tried to sort file1 taking the entire line into consideration, not only the first field …
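
A frequent cause (offered here as a hedged guess, not as the asker's confirmed problem) is that sort and join disagree on collation order, or that -k1 without an end position sorts on everything from field 1 to the end of the line rather than on the first field alone. A sketch that makes both tools use the same byte-wise order and a single-field key:

    # Sort both inputs and run join under the same byte-wise collation (LC_ALL=C);
    # "-k1,1" restricts the sort key to the first ";"-separated field.
    LC_ALL=C sort -t';' -k1,1 file1 -o file1
    LC_ALL=C sort file2 -o file2
    LC_ALL=C join -t';' -1 1 -2 1 -o 1.1,1.2,1.3,1.4 file1 file2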

BufferedReader: read multiple lines into a single string

痞子三分冷 submitted on 2019-12-13 11:36:15
Question: I'm reading numbers from a txt file using BufferedReader for analysis. The way I'm going about this now is: reading a line using .readLine(), then splitting that string into an array of strings using .split().

    public InputFile () {
        fileIn = null;
        // stuff here
        fileIn = new FileReader(filename + ".txt");
        buffIn = new BufferedReader(fileIn);
        return;
        // stuff here
    }

    public String ReadBigStringIn() {
        String line = null;
        try {
            line = buffIn.readLine();
        } catch (IOException e) {}
        return line;
    }

    public …
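
A sketch of reading several lines at once and joining them into one string; the fixed line count n and the file name are assumptions about what "multiple" means here, not something the question specifies:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadLines {
        // Read up to n lines from the reader and join them into one space-separated string.
        static String readBigStringIn(BufferedReader buffIn, int n) throws IOException {
            StringBuilder everything = new StringBuilder();
            String line;
            for (int i = 0; i < n && (line = buffIn.readLine()) != null; i++) {
                if (everything.length() > 0) {
                    everything.append(' ');
                }
                everything.append(line);
            }
            return everything.toString();
        }

        public static void main(String[] args) throws IOException {
            try (BufferedReader buffIn = new BufferedReader(new FileReader("numbers.txt"))) {
                System.out.println(readBigStringIn(buffIn, 5));
            }
        }
    }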

Perl or Python: Convert date from dd/mm/yyyy to yyyy-mm-dd

女生的网名这么多〃 submitted on 2019-12-13 11:33:55
Question: I have lots of dates in a column of a CSV file that I need to convert from dd/mm/yyyy to yyyy-mm-dd format. For example, 17/01/2010 should be converted to 2010-01-17. How can I do this in Perl or Python?

Answer 1:

    >>> from datetime import datetime
    >>> datetime.strptime('02/11/2010', '%d/%m/%Y').strftime('%Y-%m-%d')
    '2010-11-02'

or, a more hackish way (that doesn't check the validity of the values):

    >>> '-'.join('02/11/2010'.split('/')[::-1])
    '2010-11-02'
    >>> '-'.join(reversed('02/11/2010'.split('/'))) …
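
Since the dates live in a CSV column, here is a sketch of applying the datetime conversion to a whole file; the file names and the column index 0 are assumptions:

    import csv
    from datetime import datetime

    # Rewrite the first column of in.csv from dd/mm/yyyy to yyyy-mm-dd, writing to out.csv.
    with open("in.csv", newline="") as src, open("out.csv", "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            row[0] = datetime.strptime(row[0], "%d/%m/%Y").strftime("%Y-%m-%d")
            writer.writerow(row)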