text-processing | 易学教程

Python: How to loop through blocks of lines


Question: How can I go through blocks of lines separated by an empty line? The file looks like the following:

    ID: 1
    Name: X
    FamilyN: Y
    Age: 20

    ID: 2
    Name: H
    FamilyN: F
    Age: 23

    ID: 3
    Name: S
    FamilyN: Y
    Age: 13

    ID: 4
    Name: M
    FamilyN: Z
    Age: 25

I want to loop through the blocks and grab the fields Name, Family name and Age in a list of 3 columns:

    Y X 20
    F H 23
    Y S 13
    Z M 25

Answer: Here's another way, using itertools.groupby. The function groupby iterates through the lines of the file and calls isa_group_separator(line) for each line. isa_group_separator returns either True or False (called the key), and itertools
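The excerpt above is cut off, so the following is only a minimal sketch of the groupby idea it describes, not the answerer's exact code. It assumes every block uses "Field: value" lines; the names SAMPLE, read_blocks and rows are hypothetical, while isa_group_separator is the key function named in the answer.

```python
from itertools import groupby

# Sample input copied from the question; a real script would iterate over an open file.
SAMPLE = """\
ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25
"""


def isa_group_separator(line):
    """Key function: True for the blank lines that separate blocks."""
    return line.strip() == ""


def read_blocks(lines):
    """Yield one dict (field -> value) per run of consecutive non-blank lines."""
    for is_separator, group in groupby(lines, key=isa_group_separator):
        if is_separator:
            continue  # a run of blank lines carries no data
        record = {}
        for line in group:
            field, _, value = line.partition(":")
            record[field.strip()] = value.strip()
        yield record


# Pick the three requested columns: family name, name, age.
rows = [(r["FamilyN"], r["Name"], r["Age"]) for r in read_blocks(SAMPLE.splitlines())]
for row in rows:
    print(*row)
```

groupby only merges consecutive lines that share the same key, which is why the blank separator lines are enough to delimit the blocks without any explicit state tracking.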

How to find out if a sentence is a question (interrogative)?


Question: Is there an open source Java library/algorithm for finding out whether a particular piece of text is a question or not? I am working on a question answering system that needs to analyze whether the text input by the user is a question. I think the problem can probably be solved by using open source NLP libraries, but it's obviously more complicated than simple part-of-speech tagging. So if someone can instead describe an algorithm for it using an existing open source NLP library, that would be good too. Also let

Finding dictionary words


Question: I have a lot of compound strings that are a combination of two or three English words, e.g. "Spicejet" is a combination of the words "spice" and "jet". I need to separate the individual English words from such compound strings. My dictionary is going to consist of around 100000 words. What would be the most efficient way to separate individual English words from such compound strings?

Answer 1: I'm not sure how much time or frequency you have to do this (is it a one-time operation? daily?
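The answer is cut off before it proposes a method, so the sketch below is not the answerer's approach. It shows one common way to attack the problem as stated: keep the dictionary in a set for constant-time lookups and search recursively for a split into at most three words, caching sub-results. The WORDS set and the split_compound function are hypothetical stand-ins for illustration.

```python
from functools import lru_cache

# Hypothetical stand-in for the ~100,000-word dictionary mentioned in the question.
WORDS = frozenset({"spice", "jet", "sun", "flower", "pace"})


def split_compound(text, words=WORDS, max_parts=3):
    """Return a list of dictionary words that concatenate to text, or None."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def solve(start, parts_left):
        # Reached the end of the string: everything before was matched.
        if start == len(text):
            return ()
        # Characters remain but no more words are allowed.
        if parts_left == 0:
            return None
        # Try every prefix of the remaining text that is a dictionary word.
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in words:
                rest = solve(end, parts_left - 1)
                if rest is not None:
                    return (text[start:end],) + rest
        return None

    result = solve(0, max_parts)
    return list(result) if result is not None else None


print(split_compound("Spicejet"))
```

This returns the first split it finds; if a string can be split in several ways, ranking the candidates (for example by word frequency) would be a separate design decision.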

How does uʍop-ǝpᴉsdn text work?


Question: Here's a website I found that will produce upside-down versions of any English text. How does it work? Does Unicode have upside-down chars? Or what? How can I write my own text-flipping function?

Answer 1: "Does Unicode have upside-down chars?" Yup! Or at least characters that look like they are upside down. Also, regular English alphabetical characters can appear to be upside down. For example, u could be an upside-down n. To code it up, you just have to take an array of characters, display them in
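A minimal sketch of the lookup-table idea described in the answer. The answer is truncated, so reversing the character order is an assumption here (it matches how the title of this entry is written), and the FLIP table is a small hypothetical subset of look-alike characters rather than a complete alphabet.

```python
# Hypothetical partial mapping from lowercase letters to upside-down look-alikes.
# Characters that are missing from the table are passed through unchanged.
FLIP = {
    "a": "ɐ", "b": "q", "d": "p", "e": "ǝ", "i": "ᴉ", "m": "ɯ",
    "n": "u", "p": "d", "q": "b", "u": "n", "w": "ʍ",
}


def flip(text):
    """Reverse the text and swap each character for its upside-down look-alike."""
    return "".join(FLIP.get(ch, ch) for ch in reversed(text.lower()))


print(flip("upside-down"))
```

The reversal matters because rotating a line of text by 180 degrees also reverses its reading order; mapping the characters alone would give flipped letters in the wrong sequence.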

summarize text or simplify text [closed]


Question: Is there any library, preferably in Python but at least open source, that can summarize and/or simplify natural-language text?

Answer 1: I'm not sure whether there are currently any libraries that do this, as text summarization, or at least understandable text summarization, isn't something that will be easily accomplished by a simple plug-and-play library. Here are a few links that I managed to find regarding projects / resources that are related to text summarization to get you started: The Lemur Project

Find all hrefs in page and replace with link maintaining previous link - PHP


Question: I'm trying to find all href links on a webpage and replace each link with my own proxy link. For example,

    <a href="http://www.google.com">Google</a>

needs to be

    <a href="http://www.example.com/?loadpage=http://www.google.com">Google</a>

Answer 1: Use PHP's DOMDocument to parse the page:

    $doc = new DOMDocument();
    // load the string into the DOM (this is your page's HTML), see below for more info
    $doc->loadHTML('<a href="http://www.google.com">Google</a>');
    // Loop through each <a> tag in the dom and

Output text file with line breaks in PHP


Question: I'm trying to open a text file and output its contents with the code below. The text file includes line breaks, but when I echo the file it's unformatted. How do I fix this? Thanks.

    <html>
    <head>
    </head>
    <body>
    $fh = fopen("filename.txt", 'r');
    $pageText = fread($fh, 25000);
    echo $pageText;
    </body>
    </html>

Answer 1: To convert the plain-text line breaks to HTML line breaks, try this:

    $fh = fopen("filename.txt", 'r');
    $pageText = fread($fh, 25000);
    echo nl2br($pageText);

Note the nl2br function

How to replace ${} placeholders in a text file?


Question: I want to pipe the output of a "template" file into MySQL, the file having variables like ${dbName} interspersed. What is the command line utility to replace these instances and dump the output to standard output?

Answer (user): Sed! Given template.txt:

    The number is ${i}
    The word is ${word}

we just have to say:

    sed -e "s/\${i}/1/" -e "s/\${word}/dog/" template.txt

Thanks to Jonathan Leffler for the tip to pass multiple -e arguments to the same sed invocation.

Answer (plockc): Update: Here is a solution from yottatsa on a similar question that only does replacement for variables like $VAR or ${VAR}, and is a
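For comparison with the sed command above, here is a small Python sketch of the same substitution. It is not one of the answers from the thread; it relies on the standard library's string.Template, which understands both $VAR and ${VAR} placeholders. The render function and the sample values are assumptions made for illustration.

```python
from string import Template

# Template text copied from the sed answer; a real script might read it from a file.
TEMPLATE_TEXT = "The number is ${i}\nThe word is ${word}\n"


def render(template_text, values):
    """Replace $name / ${name} placeholders, leaving unknown ones untouched."""
    # safe_substitute does not raise on missing keys, unlike substitute.
    return Template(template_text).safe_substitute(values)


rendered = render(TEMPLATE_TEXT, {"i": "1", "word": "dog"})
print(rendered, end="")
```

Printing to standard output keeps the script usable in a pipeline, so its output could be piped into the mysql client in the same way as the sed output.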

Using SQL to determine word count stats of a text field


Question: I've recently been working on some database search functionality and wanted to get some information like the average number of words per document (e.g. a text field in the database). The only thing I have found so far (without processing in a language of choice outside the DB) is:

    SELECT AVG(LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1) FROM documents

This seems to work*, but do you have other suggestions? I'm currently using MySQL 4 (hope to move to version 5 for this app soon), but am also interested in general solutions. Thanks!

* I can imagine that this is a pretty rough way to determine this
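To see why the query is only a rough estimate, the arithmetic can be mirrored outside the database. The sketch below is a hypothetical Python illustration, not part of the question: sql_style_word_count reproduces the "number of spaces plus one" expression, and split_word_count is a whitespace-based count to compare against.

```python
def sql_style_word_count(content):
    """Mirror LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1."""
    return len(content) - len(content.replace(" ", "")) + 1


def split_word_count(content):
    """Count words as runs of non-whitespace characters."""
    return len(content.split())


# Hypothetical sample documents chosen to show where the two counts diverge.
docs = ["one two three", "double  space", " leading space", ""]
for doc in docs:
    print(repr(doc), sql_style_word_count(doc), split_word_count(doc))
```

The two counts agree for single-spaced text but drift apart on repeated spaces, leading or trailing spaces, and empty fields, which all inflate the space-counting estimate.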

Converting a \u escaped Unicode string to ASCII


Question: After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page and I have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\\u003D\\\u003Ebig'). I want to convert this to the ASCII string, which should be 'pretty=>big'. More simply, if I set x <- 'pretty\\u003D\\u003Ebig', how do I perform a conversion on x to yield pretty=>big? Any suggestions?

Answer 1: Use parse, but don't evaluate the
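The R answer is cut off, so it is not reconstructed here. As a language-neutral illustration of the conversion being asked for, the Python sketch below replaces each literal \uXXXX sequence with the character it names. The function name unescape_unicode is hypothetical, and the sketch assumes the string contains a single literal backslash before each u.

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_unicode(text):
    """Replace each literal \\uXXXX sequence with the character it encodes."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# In Python source, "\\u003D" is a backslash followed by u003D, as in the scraped page.
x = "pretty\\u003D\\u003Ebig"
print(unescape_unicode(x))
```

Using a targeted regular expression leaves every other backslash sequence in the text untouched, which is usually what is wanted when cleaning scraped page source.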