text-processing

Python: How to loop through blocks of lines

只谈情不闲聊 submitted on 2019-11-27 01:47:53
How do I go through blocks of lines separated by an empty line? The file looks like the following:

    ID: 1
    Name: X
    FamilyN: Y
    Age: 20

    ID: 2
    Name: H
    FamilyN: F
    Age: 23

    ID: 3
    Name: S
    FamilyN: Y
    Age: 13

    ID: 4
    Name: M
    FamilyN: Z
    Age: 25

I want to loop through the blocks and grab the fields Name, Family name and Age in a list of 3 columns:

    Y X 20
    F H 23
    Y S 13
    Z M 25

Here's another way, using itertools.groupby. The function groupby iterates through the lines of the file and calls isa_group_separator(line) for each line. isa_group_separator returns either True or False (called the key), and itertools
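The groupby approach described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the original answer's code: the name isa_group_separator comes from the excerpt, while SAMPLE, parse_blocks and the field handling are assumptions made for the example.

```python
import itertools

# Hypothetical sample input: blocks of "Key: value" lines separated by a blank line.
SAMPLE = """\
ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""


def isa_group_separator(line):
    """Return True for the blank lines that separate blocks."""
    return line.strip() == ""


def parse_blocks(lines):
    """Yield one (family name, name, age) tuple per block of lines."""
    for is_separator, group in itertools.groupby(lines, key=isa_group_separator):
        if is_separator:
            continue  # skip the run of blank lines between two blocks
        fields = {}
        for line in group:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        yield fields["FamilyN"], fields["Name"], fields["Age"]


rows = list(parse_blocks(SAMPLE.splitlines()))
for row in rows:
    print(*row)
```

groupby puts consecutive lines with the same key into one group, so every run of non-blank lines arrives as a single block and the blank runs can simply be skipped.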

How to find out if a sentence is a question (interrogative)?

ε祈祈猫儿з submitted on 2019-11-27 00:59:39
Question: Is there an open source Java library/algorithm for finding out whether a particular piece of text is a question? I am working on a question answering system that needs to analyze whether the text input by the user is a question. I think the problem can probably be solved using open source NLP libraries, but it's obviously more complicated than simple part-of-speech tagging. So if someone can instead describe an algorithm for it using an existing open source NLP library, that would be good too. Also let

Finding dictionary words

霸气de小男生 submitted on 2019-11-27 00:58:15
Question: I have a lot of compound strings that are a combination of two or three English words, e.g. "Spicejet" is a combination of the words "spice" and "jet". I need to separate these individual English words from such compound strings. My dictionary is going to consist of around 100,000 words. What would be the most efficient way to separate individual English words from such compound strings? Answer 1: I'm not sure how much time or frequency you have to do this (is it a one-time operation? daily?
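One possible way to attack the splitting problem is a memoized search over prefixes that appear in the dictionary. The sketch below is only an illustration and is not taken from the truncated answer: WORDS is a tiny hypothetical dictionary, and split_compound and max_parts are names invented here.

```python
from functools import lru_cache

# Hypothetical stand-in for the ~100,000-word dictionary; a set gives O(1) lookups.
WORDS = frozenset({"spice", "jet", "air", "line", "airline"})


def split_compound(text, words=WORDS, max_parts=3):
    """Return one split of text into at most max_parts dictionary words, or None."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def solve(start, parts_left):
        if start == len(text):
            return ()  # consumed the whole string
        if parts_left == 0:
            return None  # out of parts but characters remain
        # Try the longest candidate word first.
        for end in range(len(text), start, -1):
            if text[start:end] in words:
                rest = solve(end, parts_left - 1)
                if rest is not None:
                    return (text[start:end],) + rest
        return None

    result = solve(0, max_parts)
    return list(result) if result is not None else None


print(split_compound("Spicejet"))
```

Because the strings are short and limited to two or three parts, the search stays small; the dictionary lookup is the only operation that touches the full word list.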

How does uʍop-ǝpᴉsdn text work?

一曲冷凌霜 submitted on 2019-11-27 00:32:58
Question: Here's a website I found that will produce upside-down versions of any English text. How does it work? Does Unicode have upside-down chars? Or what? How can I write my own text-flipping function? Answer 1: "Does Unicode have upside-down chars?" Yup! Or at least characters that look like they are upside down. Also, regular English alphabet characters can appear to be upside down. For example, u could be an upside-down n. To code it up, you just have to take an array of characters, display them in
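A rough sketch of such a flipping function follows. The answer is cut off before it finishes describing the steps, so reversing the character order is an assumption here, as are the names TURNED and flip; the character map is deliberately partial and covers only a few letters.

```python
# Partial, illustrative map from lowercase letters to look-alike "turned" characters.
TURNED = {
    "a": "ɐ", "e": "ǝ", "i": "ᴉ", "m": "ɯ", "w": "ʍ",
    "b": "q", "d": "p", "n": "u",
}
# Add the reverse direction so that flipping twice restores the input.
TURNED.update({turned: plain for plain, turned in list(TURNED.items())})


def flip(text):
    """Reverse the text and swap each character for its turned look-alike."""
    return "".join(TURNED.get(ch, ch) for ch in reversed(text))


print(flip("upside-down"))
```

Characters without an entry in the map (such as s, o and the hyphen) are passed through unchanged, which works for letters that look similar when rotated.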

summarize text or simplify text [closed]

六月ゝ 毕业季﹏ submitted on 2019-11-26 22:31:16
Question: Is there any library, preferably in Python but at least open source, that can summarize and/or simplify natural-language text? Answer 1: I'm not sure if there are currently any libraries that do this, as text summarization, or at least understandable text summarization, isn't something that will be easily accomplished by a simple plug-and-play library. Here are a few links that I managed to find regarding projects/resources that are related to text summarization to get you started: The Lemur Project

Find all hrefs in page and replace with link maintaining previous link - PHP

我的梦境 submitted on 2019-11-26 16:57:25
Question: I'm trying to find all href links on a webpage and replace each link with my own proxy link. For example,

    <a href="http://www.google.com">Google</a>

needs to become

    <a href="http://www.example.com/?loadpage=http://www.google.com">Google</a>

Answer 1: Use PHP's DOMDocument to parse the page:

    $doc = new DOMDocument();
    // load the string into the DOM (this is your page's HTML), see below for more info
    $doc->loadHTML('<a href="http://www.google.com">Google</a>');
    //Loop through each <a> tag in the dom and

Output text file with line breaks in PHP

非 Y 不嫁゛ submitted on 2019-11-26 16:54:20
Question: I'm trying to open a text file and output its contents with the code below. The text file includes line breaks, but when I echo the file it's unformatted. How do I fix this? Thanks.

    <html>
    <head>
    </head>
    <body>
    <?php
    $fh = fopen("filename.txt", 'r');
    $pageText = fread($fh, 25000);
    echo $pageText;
    ?>
    </body>
    </html>

Answer 1: To convert the plain-text line breaks to HTML line breaks, try this:

    $fh = fopen("filename.txt", 'r');
    $pageText = fread($fh, 25000);
    echo nl2br($pageText);

Note the nl2br function

How to replace ${} placeholders in a text file?

社会主义新天地 submitted on 2019-11-26 14:56:29
I want to pipe the output of a "template" file into MySQL, the file having variables like ${dbName} interspersed. What is the command-line utility to replace these instances and dump the output to standard output?

user
Sed! Given template.txt:

    The number is ${i}
    The word is ${word}

we just have to say:

    sed -e "s/\${i}/1/" -e "s/\${word}/dog/" template.txt

Thanks to Jonathan Leffler for the tip to pass multiple -e arguments to the same sed invocation.

plockc
Update: Here is a solution from yottatsa on a similar question that only does replacement for variables like $VAR or ${VAR}, and is a
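For comparison, the same ${name} substitution can be done with Python's standard string.Template class instead of sed. This is an alternative sketch rather than the command-line utility the question asks for; template_text and values below are made up to mirror the template.txt example.

```python
from string import Template

# Mirrors the template.txt contents from the sed example.
template_text = "The number is ${i}\nThe word is ${word}\n"

# Hypothetical replacement values.
values = {"i": "1", "word": "dog"}

# substitute() raises KeyError if a placeholder has no value;
# safe_substitute() would leave unknown placeholders untouched instead.
rendered = Template(template_text).substitute(values)
print(rendered, end="")
```

Printing the result to standard output means a script like this could be piped into another program in the same way as the sed invocation.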

Using SQL to determine word count stats of a text field

泪湿孤枕 submitted on 2019-11-26 14:39:28
I've recently been working on some database search functionality and wanted to get some information like the average number of words per document (e.g. a text field in the database). The only thing I have found so far (without processing in a language of choice outside the DB) is:

    SELECT AVG(LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1)
    FROM documents

This seems to work* but do you have other suggestions? I'm currently using MySQL 4 (I hope to move to version 5 for this app soon), but am also interested in general solutions. Thanks! * I can imagine that this is a pretty rough way to determine this
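To see how the query behaves, it can be run against a small in-memory table. The sketch below uses Python's sqlite3 module rather than MySQL, which is an assumption made purely so the example is self-contained (SQLite also provides LENGTH and REPLACE); the sample rows are invented.

```python
import sqlite3

# The query from the question, unchanged.
QUERY = """
SELECT AVG(LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1)
FROM documents
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (content TEXT)")
conn.executemany(
    "INSERT INTO documents (content) VALUES (?)",
    [
        ("one two three",),   # 2 spaces, counted as 3 words
        ("four five",),       # 1 space, counted as 2 words
        ("double  space",),   # 2 spaces, counted as 3 words although there are only 2
    ],
)

(average,) = conn.execute(QUERY).fetchone()
print(average)
conn.close()
```

The third row shows why the approach is rough: it counts spaces, not words, so repeated spaces (and leading or trailing spaces) inflate the result.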

Converting a \u escaped Unicode string to ASCII

前提是你 submitted on 2019-11-26 11:27:40
Question: After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page and I have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\\u003D\\\u003Ebig'). I want to convert this to the ASCII string, which should be 'pretty=>big'. More simply, if I set x <- 'pretty\\u003D\\u003Ebig', how do I perform a conversion on x to yield pretty=>big? Any suggestions? Answer 1: Use parse, but don't evaluate the
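The question is about R and the answer above is cut off, so the following is not the answer's parse-based method. It is a small Python sketch of the same transformation, included only to make the intended conversion concrete; decode_u_escapes is a name invented for this example.

```python
import re


def decode_u_escapes(text):
    """Replace each literal backslash-u-XXXX sequence with the character it names."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


# Raw string: the backslashes are literal characters, as in the scraped page source.
x = r"pretty\u003D\u003Ebig"
decoded = decode_u_escapes(x)
print(decoded)
```

The regular expression matches a backslash, the letter u and exactly four hexadecimal digits, and the replacement function converts those digits to the corresponding code point.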