text-segmentation

Converting a String to a List of Words?

Deadly posted on 2019-11-26 15:21:29
Question: I'm trying to convert a string to a list of words using Python. I want to take something like the following:
    string = 'This is a string, with words!'
Then convert it to something like this:
    list = ['This', 'is', 'a', 'string', 'with', 'words']
Notice the omission of punctuation and spaces. What would be the fastest way of going about this?
Bryan: Try this:
    import re
    mystr = 'This is a string, with words!'
    wordList = re.sub(r"[^\w]", " ", mystr).split()
How it works: From the docs: re.sub(pattern, repl, string, count=0, flags=0) Return the string obtained by replacing the leftmost non-overlapping
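A minimal Python sketch of a closely related approach: instead of replacing non-word characters and then splitting, `re.findall` can collect the runs of word characters directly. The function name `to_words` is an illustrative assumption and does not come from the original answer.

```python
import re

def to_words(text):
    """Return the runs of word characters in text, dropping punctuation and spaces."""
    # \w+ matches one or more letters, digits, or underscores.
    return re.findall(r"\w+", text)

print(to_words("This is a string, with words!"))
# → ['This', 'is', 'a', 'string', 'with', 'words']
```

Note that `\w` also matches digits and underscores, so tokens such as `abc_1` are kept whole.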

Explode a paragraph into sentences in PHP

不羁岁月 posted on 2019-11-26 14:36:52
Question: I have been using explode(".",$mystring) to split a paragraph into sentences. However, this doesn't cover sentences that end with other punctuation such as ! ? : ; Is there a way of using an array as a delimiter instead of a single character? Alternatively, is there another neat way of splitting on various punctuation? I tried explode(("." || "?" || "!"),$mystring) hopefully, but it didn't work...
Answer 1: You can do: preg_split('/\.|\?|!/',$mystring); or (simpler): preg_split('
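The thread is about PHP, but the idea of splitting on a set of punctuation characters carries over to other regex engines. Below is a rough Python sketch using a character class; the function name `split_sentences` and the sample text are assumptions made for illustration.

```python
import re

def split_sentences(paragraph):
    """Split on any of . ? ! : ; and drop empty or whitespace-only pieces."""
    # A character class lists every delimiter; no escaping is needed inside [].
    parts = re.split(r"[.?!:;]", paragraph)
    return [p.strip() for p in parts if p.strip()]

print(split_sentences("Hello there! How are you? Fine."))
# → ['Hello there', 'How are you', 'Fine']
```

This discards the punctuation itself and will also split inside abbreviations and decimals, so it is only a starting point.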

How to split a string into words. Ex: “stringintowords” -> “String Into Words”?

柔情痞子 posted on 2019-11-26 10:24:06
Question: What is the right way to split a string into words? (The string doesn't contain any spaces or punctuation marks.) For example: "stringintowords" -> "String Into Words". Could you please advise what algorithm should be used here?
Update: For those who think this question is just for curiosity: this algorithm could be used to camelcase domain names ("sportandfishing .com" -> "SportAndFishing .com"), and this algo is currently used by aboutus dot org to do this conversion dynamically.
Answer 1:
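The answer text is cut off here, so the following is not the original answer. It is a sketch of one common approach: dictionary-based segmentation with dynamic programming. The function `segment` and the toy vocabulary `VOCAB` are hypothetical; a real vocabulary would come from a word list.

```python
def segment(text, vocabulary):
    """Split text into vocabulary words, or return None if no full split exists."""
    # best[i] holds one segmentation of text[:i], or None if none was found.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            if best[start] is not None and text[start:end] in vocabulary:
                best[end] = best[start] + [text[start:end]]
                break
    return best[len(text)]

# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"string", "into", "words", "in", "to", "sport", "and", "fishing"}

words = segment("stringintowords", VOCAB)
print(" ".join(w.capitalize() for w in words))
# → String Into Words
```

This returns the first segmentation it finds. When several splits are possible, choosing the most plausible one needs extra information such as word frequencies.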

How to capitalize first letter of first word in a sentence?

无人久伴 posted on 2019-11-26 07:45:03
Question: I am trying to write a function to clean up user input. I am not trying to make it perfect. I would rather have a few names and acronyms in lowercase than a full paragraph in uppercase. I think the function should use regular expressions, but I'm pretty bad with those and I need some help. If the following expressions are followed by a letter, I want to make that letter uppercase: "." ". " (followed by a space) "!" "! " (followed by a space) "?" "? " (followed by a space) Even
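The rule described in the question (uppercase a letter that follows ".", "!" or "?", with or without a space) can be sketched in Python roughly as follows. The function name `capitalize_sentences` is an assumption, and the sketch additionally capitalizes the first letter of the whole text.

```python
import re

# Group 1: start of text, or a terminator plus optional whitespace.
# Group 2: the lowercase ASCII letter to uppercase.
PATTERN = re.compile(r"(^\s*|[.!?]\s*)([a-z])")

def capitalize_sentences(text):
    """Uppercase the first letter of the text and any letter following . ! or ?"""
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).upper(), text)

print(capitalize_sentences("hello world. how are you?i am fine! ok"))
# → Hello world. How are you?I am fine! Ok
```

In keeping with the "not perfect" goal, this also capitalizes after periods inside abbreviations or domain names such as `example.com`.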

php sentence boundaries detection

穿精又带淫゛_ posted on 2019-11-26 05:31:35
Question: I would like to divide a text into sentences in PHP. I'm currently using a regex, which gives ~95% accuracy, and would like to improve on it with a better approach. I've seen NLP tools that do that in Perl, Java, and C but didn't see anything that fits PHP. Do you know of such a tool?
Answer 1: An enhanced regex solution. Assuming you do care about handling Mr. and Mrs. etc. abbreviations, the following single regex solution works pretty well: <?php // test.php Rev:20160820_1800 $split
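The PHP regex from the answer is truncated above and is not reproduced here. As a much simpler stand-in, this Python sketch shows the general idea of guarding a few abbreviations with negative lookbehinds. The abbreviation list and the name `split_on_boundaries` are assumptions for illustration.

```python
import re

# Split on whitespace that follows . ! or ?, unless the period ends a listed
# abbreviation. Each lookbehind is fixed-width, as Python's re module requires.
BOUNDARY = re.compile(r"(?<!Mr\.)(?<!Mrs\.)(?<!Dr\.)(?<=[.!?])\s+")

def split_on_boundaries(text):
    """Return the sentences of text, keeping their closing punctuation."""
    return [s for s in BOUNDARY.split(text.strip()) if s]

print(split_on_boundaries("Mr. Smith went home. Mrs. Jones stayed! Did Dr. Who call?"))
# → ['Mr. Smith went home.', 'Mrs. Jones stayed!', 'Did Dr. Who call?']
```

A fixed abbreviation list like this will miss many cases (initials, "e.g.", quoted speech), which is part of why regex-only approaches tend to plateau below full accuracy.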

Converting a String to a List of Words?

梦想的初衷 posted on 2019-11-26 04:20:30
Question: I'm trying to convert a string to a list of words using Python. I want to take something like the following:
    string = 'This is a string, with words!'
Then convert it to something like this:
    list = ['This', 'is', 'a', 'string', 'with', 'words']
Notice the omission of punctuation and spaces. What would be the fastest way of going about this?
Answer 1: Try this:
    import re
    mystr = 'This is a string, with words!'
    wordList = re.sub(r"[^\w]", " ", mystr).split()
How it works: From the docs:

How to split a string into a list?

回眸只為那壹抹淺笑 posted on 2019-11-25 22:16:26
Question: I want my Python function to split a sentence (input) and store each word in a list. My current code splits the sentence, but does not store the words as a list. How do I do that?
    def split_line(text):
        # split the text
        words = text.split()
        # for each word in the line:
        for word in words:
            # print the word
            print(words)
Answer 1: text.split() This should be enough to store each word in a list. words is already a list of the words from the sentence, so there is no need for the loop. Second, it might be
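A short sketch of the point made in the answer: `str.split()` already returns a list, so the function only needs to return it. The sample sentence is an assumption for illustration.

```python
def split_line(text):
    """Return the whitespace-separated words of text as a list."""
    # With no arguments, split() breaks on any run of whitespace.
    return text.split()

words = split_line("The quick brown fox")
for word in words:
    print(word)  # print each word, not the whole list
```

The loop is only needed when each word should be processed or printed one at a time.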