text-processing

Finding dictionary words

人盡茶涼 posted on 2019-11-28 05:33:51
I have a lot of compound strings that combine two or three English words; e.g., "Spicejet" combines the words "spice" and "jet". I need to separate these individual English words from such compound strings. My dictionary will contain around 100,000 words. What would be the most efficient way to separate the individual English words from such compound strings?

I'm not sure how much time you have or how often you need to do this (is it a one-time operation? daily? weekly?), but you're obviously going to want a quick, weighted dictionary lookup. You'll also want to have a
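A weighted dictionary lookup of the kind the answer mentions could be sketched as below. This is only a minimal illustration, not the answerer's method: the `WORD_COUNTS` table, its counts, the log-probability scoring, and the function name `split_compound` are all assumptions made for the example. A real dictionary would hold the roughly 100,000 entries from the question.

```python
import math
from functools import lru_cache

# Hypothetical toy dictionary with made-up frequency counts.
WORD_COUNTS = {"spice": 50, "jet": 40, "spic": 1, "e": 1, "ice": 30}
TOTAL = sum(WORD_COUNTS.values())


def split_compound(text, max_parts=3):
    """Split text into at most max_parts dictionary words, or return None."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(start, parts_left):
        # Best (score, words) for text[start:], or None if no split exists.
        if start == len(text):
            return (0.0, ())
        if parts_left == 0:
            return None
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in WORD_COUNTS:
                rest = best(end, parts_left - 1)
                if rest is not None:
                    # Summing log-probabilities favors frequent words
                    # and penalizes splits with many parts.
                    score = math.log(WORD_COUNTS[word] / TOTAL) + rest[0]
                    candidates.append((score, (word,) + rest[1]))
        return max(candidates) if candidates else None

    result = best(0, max_parts)
    return list(result[1]) if result else None
```

Calling `split_compound("Spicejet")` with this toy table prefers the two-word split over the three-word one, because each extra part adds another negative log-probability. The set membership test keeps each lookup constant-time, and memoization keeps the search from re-exploring the same suffix.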

How to find out if a sentence is a question (interrogative)?

南笙酒味 posted on 2019-11-28 05:23:05
Is there an open-source Java library or algorithm for determining whether a particular piece of text is a question? I am working on a question-answering system that needs to analyze whether the text entered by the user is a question. I think the problem can probably be solved with open-source NLP libraries, but it's obviously more complicated than simple part-of-speech tagging. So if someone can instead describe an algorithm for it that uses an existing open-source NLP library, that would be good too. Also let me know if you know a library/toolkit that uses data mining to solve this problem. Although it will be

How does uʍop-ǝpᴉsdn text work?

倾然丶 夕夏残阳落幕 posted on 2019-11-28 04:37:10
Here's a website I found that will produce upside-down versions of any English text. How does it work? Does Unicode have upside-down characters? Or what? How can I write my own text-flipping function?

Patrick Hendricks

Does Unicode have upside-down chars? Yup! Or at least characters that look like they are upside down. Also, regular English alphabetical characters can appear to be upside down; for example, u could be an upside-down n. To code it up, you just have to take an array of characters, display them in reverse order, and replace those characters with their upside-down versions. This will get
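The approach described in the answer (reverse the characters, then swap each one for a look-alike) might be sketched as follows. The mapping table here is an assumption assembled for illustration from commonly used look-alike characters. It covers lowercase letters only, and characters without an entry pass through unchanged.

```python
# Hypothetical lowercase-only mapping to look-alike "turned" characters.
_FORWARD = {
    "a": "ɐ", "b": "q", "c": "ɔ", "d": "p", "e": "ǝ", "f": "ɟ",
    "g": "ƃ", "h": "ɥ", "i": "ᴉ", "j": "ɾ", "k": "ʞ", "m": "ɯ",
    "n": "u", "r": "ɹ", "t": "ʇ", "v": "ʌ", "w": "ʍ", "y": "ʎ",
}
# Add the reverse direction so that flipping twice restores the input.
FLIP = {**_FORWARD, **{turned: plain for plain, turned in _FORWARD.items()}}


def flip(text):
    """Reverse the text and replace each character with its look-alike."""
    return "".join(FLIP.get(ch, ch) for ch in reversed(text))
```

Letters such as l, o, s, x, and z are left as they are because they already read acceptably when rotated. Because the table is symmetric, `flip(flip(s))` returns `s` for any lowercase input.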

Expanding English language contractions in Python

风流意气都作罢 posted on 2019-11-28 04:31:07
The English language has a number of contractions. For instance: you've -> you have, he's -> he is. These can sometimes cause headaches when you are doing natural language processing. Is there a Python library that can expand these contractions?

I made the Wikipedia contraction-to-expansion page into a Python dictionary (see below). Note, as you might expect, that you definitely want to use double quotes when querying the dictionary: Also, I've left in multiple options, as on the Wikipedia page. Feel free to modify it as you wish. Note that disambiguation to the right expansion would be a
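A dictionary-based expansion along these lines could look like the sketch below. The two entries come from the question's examples, while the list-of-options layout, the names `CONTRACTIONS` and `expand_contractions`, and the "take the first option" rule are assumptions made for illustration. The answer's full dictionary is not reproduced here.

```python
import re

# Illustrative subset only. Each key maps to a list so that ambiguous
# contractions can carry several candidate expansions.
CONTRACTIONS = {
    "you've": ["you have"],
    "he's": ["he is", "he has"],  # the right choice depends on context
}

# One alternation over all keys; \b keeps "she's" from matching "he's".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its first listed expansion."""
    def replace(match):
        return CONTRACTIONS[match.group(0).lower()][0]
    return _PATTERN.sub(replace, text)
```

The keys are written with double-quoted string literals because they contain an apostrophe, which matches the answer's remark about quoting. This sketch does not preserve capitalization, does not handle typographic apostrophes, and makes no attempt at disambiguation.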

Extract words surrounding a search word

自作多情 posted on 2019-11-28 04:02:30
Question: I have a script that does a word search in text. The search works well and the results are as expected. What I'm trying to achieve is to extract the n words closest to the match. For example: The world is a small place, we should try to take care of it. Suppose I'm looking for place and I need to extract the 3 words on the right and the 3 words on the left. In this case they would be: left -> [is, a, small] right -> [we, should, try] What is the best approach to do this? Thanks! Answer 1: def search
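Since the answer's code is cut off above, here is one possible way to get the behavior the question describes. It is a sketch, not the answerer's function: the name `words_around` and the choice to tokenize with `\w+` are assumptions.

```python
import re


def words_around(text, target, n=3):
    """Return (left, right) word lists for every occurrence of target."""
    # \w+ drops punctuation, so "place," is tokenized as "place".
    words = re.findall(r"\w+", text)
    matches = []
    for i, word in enumerate(words):
        if word == target:
            left = words[max(0, i - n):i]
            right = words[i + 1:i + 1 + n]
            matches.append((left, right))
    return matches
```

Slicing handles matches near either end of the text by returning fewer than n words. Note that `\w+` also splits words containing apostrophes, so a different tokenizer may suit real text better.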

tcl text processing - rearrange values in rows and columns based on user defined value

戏子无情 posted on 2019-11-27 22:52:02
Question: I am new to Tcl and would like to use it for a simple text-processing task. The following format is Liberty (.lib file), which is used in chip design. I would be truly indebted for any help on this. Here is a snippet of my file (text processing to be done only on the "values") timing () { related_pin : "clk"; timing_type : setup_rising; rise_constraint (constraint_template_5X5) { index_1 ("0.01, 0.05, 0.12, 0.2, 0.4"); index_2 ("0.005, 0.025, 0.06, 0.1, 0.3"); index_3 ("0.084, 0.84, 3.36,

Text processing - Python vs Perl performance [closed]

荒凉一梦 posted on 2019-11-27 19:54:14
Question: Here are my Perl and Python scripts for some simple text processing of about 21 log files, each about 300 KB to 1 MB (maximum) x 5 times repeated (total of 125 files, due to the log repeated 5 times). Python code (modified to use compiled re and re.I) #!/usr/bin/python import re import fileinput exists_re = re.compile(r'^(.*?) INFO.*Such a record already exists', re.I) location_re = re.compile(r'^AwbLocation (.*?) insert into', re.I) for line in fileinput.input(): fn = fileinput

summarize text or simplify text [closed]

霸气de小男生 posted on 2019-11-27 17:08:21
Is there any library, preferably in Python but at least open source, that can summarize and/or simplify natural-language text?

Rion Williams

I'm not sure whether there are currently any libraries that do this, since text summarization, or at least understandable text summarization, isn't something that a simple plug-and-play library will easily accomplish. Here are a few links I managed to find to projects and resources related to text summarization, to get you started: The Lemur Project, Python Natural Language Toolkit, O'Reilly's Book on Natural Language Processing in Python

Remove first directory components from path of file

旧巷老猫 posted on 2019-11-27 17:06:31
Question: I need to remove one directory (the leftmost) from variables in Bash. I found ways to remove the whole path, or to use dirname and similar tools, but they remove either everything or one path component on the right side, which doesn't help me. To give you a better understanding of what I need, here is an example: I have a/project/hello.c , a/project/docs/README , ... and I want to remove that a/ so that after some commands I'll have project/hello.c and project/docs/README , ... Answer 1: You can use any of: x=a/b/c/d
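The question asks about Bash, and the Bash answer is cut off above. For comparison only, the same operation (drop everything up to and including the first slash) can be sketched in Python. The function name `strip_first_component` is hypothetical, and this is a Python analogue rather than the answer's solution.

```python
def strip_first_component(path):
    """Drop the leftmost directory from a relative, slash-separated path."""
    head, sep, tail = path.partition("/")
    # With no slash there is only one component, so leave the path as is.
    return tail if sep else path
```

This treats the path as a plain string, so it suits relative paths like those in the question. An absolute path such as `/a/b` has an empty first component, and only the leading slash would be removed.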

Python: Best Way to remove duplicate character from string

[亡魂溺海] posted on 2019-11-27 14:43:26
How can I remove duplicate characters from a string using Python? For example, let's say I have a string: foo = "SSYYNNOOPPSSIISS" How can I make the string: foo = SYNOPSIS I'm new to Python; here is what I have tried, and it works. I know there is a smarter, better way to do this, and only experience can show it. def RemoveDupliChar(Word): NewWord = " " index = 0 for char in Word: if char != NewWord[index]: NewWord += char index += 1 print(NewWord.strip()) NOTE: Order is important and this question is not similar to this one.

Using itertools.groupby : >>> foo = "SSYYNNOOPPSSIISS" >>> import
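The answer's interpreter session is cut off above. A minimal sketch of the `itertools.groupby` idea it names might look like this, where the function name `squeeze_repeats` is an assumption made for the example.

```python
from itertools import groupby


def squeeze_repeats(text):
    """Collapse each run of identical adjacent characters to one character."""
    # groupby yields one (key, group) pair per run of equal items,
    # so keeping only the keys keeps one character per run.
    return "".join(key for key, _group in groupby(text))
```

Because `groupby` only merges adjacent equal items, the order is preserved and a character may appear again later in the result, which is what turns the question's input into SYNOPSIS rather than a string of unique letters.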