text-processing | 易学教程

How to proceed with NLP task for recognizing intent and slots


I want to write a program that answers questions about the weather. Which algorithms and techniques should I start looking at? Example: "Will it be sunny this weekend in Chicago?" From this I want to extract intent = weather query, date = this weekend, location = Chicago. Users can express the same query in many forms. I would like to solve a constrained form of the problem and am looking for ideas on how to get started. The solution only needs to be good enough. Since your input is in natural language, the best way to start is by parsing the sentence structure and running the sentence through
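For the constrained form of the problem, a keyword-and-regex baseline may already be good enough before reaching for a full parser. The sketch below is not the parsing approach the answer starts to describe; it is a simpler stand-in. The word list, the date phrases, and the "capitalized words after `in`" location rule are all illustrative assumptions.

```python
import re

# Hypothetical vocabulary: extend it for the queries you expect.
WEATHER_WORDS = {"sunny", "rain", "rainy", "snow", "cloudy",
                 "weather", "forecast", "hot", "cold"}

# Hypothetical date phrases; a real system would need a proper date parser.
DATE_PATTERN = re.compile(
    r"\b(today|tomorrow|tonight|this weekend|next week)\b", re.IGNORECASE)

# Assumption: the location is the run of capitalized words after "in".
LOCATION_PATTERN = re.compile(r"\bin ([A-Z][a-zA-Z]*(?: [A-Z][a-zA-Z]*)*)")


def parse_query(text):
    """Return a dict with the guessed intent and the date/location slots."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    intent = "weather_query" if tokens & WEATHER_WORDS else "unknown"
    date = DATE_PATTERN.search(text)
    location = LOCATION_PATTERN.search(text)
    return {
        "intent": intent,
        "date": date.group(1).lower() if date else None,
        "location": location.group(1) if location else None,
    }


print(parse_query("Will it be sunny this weekend in Chicago"))
```

A baseline like this fails on lowercase place names and on unlisted phrasings, which is where sentence parsing or a trained slot tagger would take over.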

Add text to file at certain line in Linux [duplicate]


This question already has an answer here: Insert a line at specific line number with sed or awk (8 answers). I want to add a specific line, let's say avatar, to the files whose names start with MakeFile, and avatar should be added at the 15th line of each file. This is how to append text to a file: echo 'avatar' >> MakeFile.websvc and this, I think, is how to append text to files that start with MakeFile: echo 'avatar' >> *MakeFile. But I cannot manage to add this line at the 15th line of the file. You can use sed to solve this: sed "15i avatar" Makefile.txt or use the -i option to save the changes made to the
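For comparison, the same insertion can be sketched in Python. The function names and the glob pattern are illustrative assumptions. Unlike sed "15i", which leaves a file with fewer than 15 lines unchanged, this sketch appends the text at the end in that case.

```python
from pathlib import Path


def insert_at_line(content, line_number, text):
    """Return content with text inserted as line line_number (1-based)."""
    lines = content.splitlines(keepends=True)
    # Clamp the index so a short file gets the text appended at the end.
    index = min(max(line_number - 1, 0), len(lines))
    # When appending, make sure the previous last line is terminated.
    if index == len(lines) and lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    lines.insert(index, text + "\n")
    return "".join(lines)


def insert_in_files(pattern, line_number, text, directory="."):
    """Rewrite every file in directory whose name matches the glob pattern."""
    for path in Path(directory).glob(pattern):
        if path.is_file():
            path.write_text(insert_at_line(path.read_text(), line_number, text))
```

Calling insert_in_files("MakeFile*", 15, "avatar") would then cover the files whose names start with MakeFile.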

How to add double quotes to a line with SED or AWK?


Question: I have the following list of words: name,id,3 I need to have it double-quoted like this: "name,id,3" I have tried sed 's/.*/\"&\"/g' and got: "name,id,3 which has only one double quote and is missing the closing double quote. I've also tried awk {print "\""$1"\""} with exactly the same result. I need help.

Answer 1: Your input file has carriage returns at the end of the lines. You need to run dos2unix on the file to remove them. Or you can do this: sed 's/\(.*\)\r/"\1"/g' which will remove the
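The closing quote is not really missing: it is emitted after the carriage return, so a terminal draws it at the start of the line, on top of the opening quote. A minimal Python sketch of the fix strips the line ending before quoting; the function name is an illustrative assumption.

```python
def quote_line(line):
    """Strip any trailing CR/LF characters, then wrap the line in double quotes."""
    return '"' + line.rstrip("\r\n") + '"'


# A line as it would be read from a file with DOS (CRLF) line endings.
print(quote_line("name,id,3\r\n"))
```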

Code for identifying programming language in a text file [closed]


Question: I'm supposed to write code that, given a text file (source code) as input, outputs which programming language it is written in. This is the most basic definition of the problem. More constraints follow: I must write this in C++. A wide variety of languages should be recognized: HTML, PHP, Perl, Ruby, C, C++, Java, C#... The number of false positives (wrong recognitions) should be low; it is better to output "unknown" than a wrong result. (it will be in the list of probabilities for example as unknown:

shell replace cr\lf by comma


Question: I have input.txt (one value per line): 1 2 3 4 5 I need to get this output.txt: 1,2,3,4,5 How can I do it?

Answer 1: Try this: tr '\n' ',' < input.txt > output.txt

Answer 2: With sed, you could use: sed -e 'H;${x;s/\n/,/g;s/^,//;p;};d' The H appends the pattern space to the hold space (saving the current line in the hold space). The ${...} surrounds actions that apply to the last line only. Those actions are: x swap hold and pattern space; s/\n/,/g substitute embedded newlines with commas; s/^,// delete the leading comma
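The same join can be sketched in Python. Note that tr '\n' ',' also turns the final newline into a trailing comma and leaves any carriage returns in place, whereas str.splitlines handles both LF and CRLF endings and produces no trailing separator. The function name is an illustrative assumption.

```python
def join_lines(text, sep=","):
    """Join the lines of text with sep, ignoring LF or CRLF line endings."""
    return sep.join(text.splitlines())


# Contents of a hypothetical input.txt, once with LF and once with CRLF endings.
print(join_lines("1\n2\n3\n4\n5\n"))
print(join_lines("1\r\n2\r\n3\r\n"))
```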

Extract words surrounding a search word


I have a script that does a word search in text. The search works well and the results are as expected. What I'm trying to achieve is to extract the n words closest to the match. For example: The world is a small place, we should try to take care of it. Suppose I'm looking for place and I need to extract the 3 words on the right and the 3 words on the left. In this case they would be: left -> [is, a, small] right -> [we, should, try] What is the best approach to do this? Thanks!

def search(text, n):
    '''Searches for text, and retrieves n words either side of the text, which are returned separately'''
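One possible approach, sketched under the assumption that a "word" is any run of letters, digits, or underscores (so punctuation is dropped): tokenize the text once, then slice the token list around each match. The function name and signature are illustrative and differ from the truncated search function above.

```python
import re


def context_words(text, word, n):
    """Return a (left, right) pair of word lists for every occurrence of word."""
    words = re.findall(r"\w+", text)
    results = []
    for i, candidate in enumerate(words):
        if candidate.lower() == word.lower():
            left = words[max(i - n, 0):i]    # up to n words before the match
            right = words[i + 1:i + 1 + n]   # up to n words after the match
            results.append((left, right))
    return results


sentence = "The world is a small place, we should try to take care of it."
print(context_words(sentence, "place", 3))
```

Near the start or end of the text the slices simply return fewer than n words.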

BLEU score implementation for sentence similarity detection


Question: I need to calculate a BLEU score to determine whether two sentences are similar. I have read some articles, which are mostly about using the BLEU score to measure machine translation accuracy. But I need a BLEU score to measure the similarity between sentences in the same language (i.e. both sentences are in English). Thanks in anticipation.

Answer 1: Well, if you just want to calculate the BLEU score, it's straightforward. Treat one sentence as the reference translation and the other
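A simplified sentence-level BLEU can be sketched as follows: clipped n-gram precision, a geometric mean over n-gram orders, and a brevity penalty. The default max_n=2 (standard BLEU uses up to 4-grams), the whitespace tokenization, and the absence of smoothing are assumptions chosen to suit short sentences; this is not a reference implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=2):
    """Simplified BLEU of candidate against a single reference sentence."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
```

Because the score is not symmetric, swapping reference and candidate can change the result, which matters when it is used as a similarity measure.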

Effects of Stemming on the term frequency?


Question: How are term frequency (TF) and inverse document frequency (IDF) affected by stop-word removal and stemming? Thanks!

Answer 1: TF is the term frequency. IDF is the inverse document frequency, obtained by dividing the total number of documents by the number of documents containing the term and then taking the logarithm of that quotient. Stemming groups all words derived from the same stem (e.g. played, play, ...); this grouping increases the number of occurrences of this stem
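The grouping effect can be illustrated with a small sketch. The three-document corpus and the toy_stem suffix stripper are made up for illustration; a real system would use an established stemmer such as Porter's. Once "played" and "play" share a stem, more documents contain that stem, so its IDF (as defined above) drops.

```python
import math


def toy_stem(word):
    """Hypothetical stemmer: strip one common suffix if a stem of 3+ letters remains."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def idf(term, docs):
    """log(total documents / documents containing term); 0.0 if the term is absent."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


# Made-up corpus: each document is a list of tokens.
docs = [["played", "football"], ["play", "music"], ["reads", "books"]]
stemmed = [[toy_stem(token) for token in doc] for doc in docs]

print(idf("play", docs))     # "play" occurs in 1 of 3 documents
print(idf("play", stemmed))  # after stemming it occurs in 2 of 3 documents
```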

Deleting the last line of a file with Java


I have a .txt file that I want to process in Java. I want to delete its last line. I need ideas on how to achieve this without copying the entire content into another file and skipping the last line. Any suggestions? You could find the beginning of the last line by scanning the file and then truncate it using FileChannel.truncate or RandomAccessFile.setLength. With RandomAccessFile you can use the seek(long) method to jump forward and read those lines, but you won't know exactly how big the jump should be. To delete the last line you need the position where the last line begins, so before
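The scan-then-truncate idea is shown here as a Python sketch rather than Java, with file.truncate playing the role of RandomAccessFile.setLength. It reads backwards in chunks to find the newline that precedes the last line, so the file is never copied. The function name and chunk_size parameter are illustrative assumptions.

```python
import os


def delete_last_line(path, chunk_size=4096):
    """Remove the last line of the file at path in place, without copying it."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        if end == 0:
            return  # empty file: nothing to delete
        # Skip one trailing newline so it is not mistaken for an empty last line.
        f.seek(end - 1)
        if f.read(1) == b"\n":
            end -= 1
        pos = end
        while pos > 0:
            start = max(pos - chunk_size, 0)
            f.seek(start)
            chunk = f.read(pos - start)
            idx = chunk.rfind(b"\n")
            if idx != -1:
                # Keep everything up to and including the previous newline.
                f.truncate(start + idx + 1)
                return
            pos = start
        # No earlier newline: the file held a single line.
        f.truncate(0)
```

In Java the same positions could be found with RandomAccessFile.seek and read, followed by setLength at the computed offset.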

tcl text processing - rearrange values in rows and columns based on user defined value


I am new to Tcl and would like to use it for text processing in a simple case. The following format is Liberty (a .lib file), which is used in chip design. I would be truly indebted for any help on this. Here is a snippet of my file (the text processing is to be done only on the "values"):

timing () {
  related_pin : "clk";
  timing_type : setup_rising;
  rise_constraint (constraint_template_5X5) {
    index_1 ("0.01, 0.05, 0.12, 0.2, 0.4");
    index_2 ("0.005, 0.025, 0.06, 0.1, 0.3");
    index_3 ("0.084, 0.84, 3.36, 8.4, 13.44") ;
    values ( \
      "1.1, 1.2, 1.3, 1.4, 1.5", \
      "2.1, 2.2, 2.3, 2.4, 2.5", \
      "3.1, 3.2, 3.3, 3