text-segmentation

Sentence segmentation tools to use when input sentence has no punctuation (is normalized)

微笑、不失礼 posted on 2020-01-01 11:57:20
Question: Suppose there is a sentence like "find me some jazz music and play it", where all the text is normalized and there are no punctuation marks (the output of a speech recognition library). What online or offline tools can be used to do "sentence segmentation", other than the naive approach of splitting on conjunctions? Input: find me some jazz music and play it Output: find me some jazz music play it Answer 1: A dependency parser should help. Answer 2: You can use a semantic role tagger like mate tools etc...
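
For reference, the naive baseline that the question wants to improve on can be sketched in a few lines of Python. This is only an illustration: the `CONJUNCTIONS` list and the function name `split_on_conjunctions` are assumptions made here, not part of any tool named in the answers.

```python
import re

# Hypothetical, deliberately small list of conjunctions to split on.
CONJUNCTIONS = ("and", "then", "but")


def split_on_conjunctions(utterance):
    """Split a normalized, unpunctuated utterance at conjunction words."""
    pattern = r"\b(?:" + "|".join(CONJUNCTIONS) + r")\b"
    parts = re.split(pattern, utterance)
    # Drop empty pieces, e.g. those produced by "and then".
    return [part.strip() for part in parts if part.strip()]


print(split_on_conjunctions("find me some jazz music and play it"))
```

The weakness is easy to see: a phrase such as "rock and roll" would also be split, which is why the answers point towards syntactic or semantic analysis instead.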

A Viable Solution for Word Splitting Khmer?

狂风中的少年 posted on 2019-12-30 03:55:13
Question: I am working on a solution to split long lines of Khmer (the Cambodian language) into individual words (in UTF-8). Khmer does not use spaces between words. There are a few solutions out there, but they are far from adequate (here and here), and those projects have fallen by the wayside. Here is a sample line of Khmer that needs to be split (lines can be longer than this): ចូរសរសើរដល់ទ្រង់ដែលទ្រង់បានប្រទានការទាំងអស់នោះមកដល់រូបអ្នកដោយព្រោះអង្គព្រះយេស៊ូវ

How to uppercase the first letter in a sentence in PHP? [duplicate]

前提是你 posted on 2019-12-25 05:17:05
Question: This question already has answers here: Closed 7 years ago. Possible Duplicate: How do I display the first letter as uppercase?; PHP capitalize first letter of first word in a sentence. I want to uppercase the first letter in a sentence and after a period. Can anyone suggest how to do this? For example, //I have the following in a language class. "%s needs to identify areas of strength and weakness. %s sets goals for self-improvement."; // in a view $contone=$this->lang->line($colstr);// e.g get

Counting the number of sentences in a paragraph in C

不打扰是莪最后的温柔 posted on 2019-12-24 03:22:26
Question: As part of my course, I have to learn C using Turbo C (unfortunately). Our teacher asked us to write a piece of code that counts the number of characters, words and sentences in a paragraph (using only printf, getch() and a while loop; he doesn't want us to use any other commands yet). Here is the code I wrote: #include <stdio.h> #include <conio.h> void main(void) { clrscr(); int count = 0; int words = 0; int sentences = 0; char ch; while ((ch = getch()) != '\n') { printf("%c", ch); while (

Split paragraph into sentences with titles and numbers

你说的曾经没有我的故事 posted on 2019-12-22 08:41:22
Question: I'm using the BreakIterator class in Java to break a paragraph into sentences. This is my code: public Map<String, Double> breakSentence(String document) { sentences = new HashMap<String, Double>(); BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US); bi.setText(document); Double tfIdf = 0.0; int start = bi.first(); for(int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) { String sentence = document.substring(start, end); sentences.put(sentence, tfIdf); }

Split paragraph into sentences with titles and numbers

两盒软妹~` posted on 2019-12-22 08:41:05
Question: I'm using the BreakIterator class in Java to break a paragraph into sentences. This is my code: public Map<String, Double> breakSentence(String document) { sentences = new HashMap<String, Double>(); BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US); bi.setText(document); Double tfIdf = 0.0; int start = bi.first(); for(int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) { String sentence = document.substring(start, end); sentences.put(sentence, tfIdf); }

How can I split a sentence into words and punctuation marks?

你说的曾经没有我的故事 posted on 2019-12-22 08:17:08
Question: For example, I want to split this sentence: I am a sentence. Into an array with 5 parts: I, am, a, sentence, and a final period (.). I'm currently using preg_split after trying explode, but I can't seem to find something suitable. This is what I've tried: $sentence = explode(" ", $sentence); /* returns array(4) { [0]=> string(1) "I" [1]=> string(2) "am" [2]=> string(1) "a" [3]=> string(8) "sentence." } */ And also this: $sentence = preg_split("/[.?!\s]/", $sentence); /* returns array(5) { [0]=> string(1
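
One way to get the five parts is to match tokens rather than split on separators. The Python sketch below shows the idea under stated assumptions: the function name `split_words_and_punctuation` is made up for illustration, and each punctuation mark is treated as its own token. It is not taken from the question or its answers.

```python
import re


def split_words_and_punctuation(sentence):
    """Return words and punctuation marks as separate tokens."""
    # \w+ matches a run of word characters (a word);
    # [^\w\s] matches one character that is neither word nor whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)


print(split_words_and_punctuation("I am a sentence."))
```

The same "match, don't split" idea should carry over to PHP through `preg_match_all`, although the exact pattern may need adjusting there.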

Anyone know an example algorithm for word segmentation using dynamic programming? [closed]

喜夏-厌秋 posted on 2019-12-21 21:34:24
Question: Closed. This question is off-topic. It is not currently accepting answers. Closed last year. If you search Google for word segmentation, there really are no very good descriptions of it, and I'm just trying to fully understand the process a dynamic programming algorithm takes to find a segmentation of a string into individual words. Does anyone know a place where there is a good description of a word
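
As a rough illustration of the kind of algorithm the question is about, here is a minimal dictionary-based dynamic programming sketch in Python. It assumes a known set of valid words (the `dictionary` argument) and returns only one possible segmentation. The function name `segment` and the sample vocabulary are hypothetical, and practical segmenters usually score competing candidates with word probabilities instead of taking the first match.

```python
def segment(text, dictionary):
    """Return one segmentation of text into dictionary words, or None."""
    n = len(text)
    # best[i] holds a list of words that exactly covers text[:i],
    # or None if no such covering has been found.
    best = [None] * (n + 1)
    best[0] = []  # the empty prefix is covered by zero words
    max_len = max((len(word) for word in dictionary), default=0)
    for end in range(1, n + 1):
        # Only look back as far as the longest dictionary word.
        for start in range(max(0, end - max_len), end):
            candidate = text[start:end]
            if best[start] is not None and candidate in dictionary:
                best[end] = best[start] + [candidate]
                break
    return best[n]


words = {"find", "me", "some", "jazz"}
print(segment("findmesomejazz", words))
```

Each prefix is solved once and reused, so the work is bounded by the text length times the longest word length, rather than by the number of possible split points.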

Word counter for some hieroglyphics languages?

只愿长相守 posted on 2019-12-19 11:54:17
Question: Is there any available library for word counting in some hieroglyphics languages (ex: Chinese, Japanese, Korean...)? I found that MS Word counts texts in these languages effectively. Can I add a reference to the MS Word libraries in my .NET application to implement this function? Or are there any other solutions to achieve this? Answer 1: Is there any available library for word-counting of some hieroglyphics language (ex: chinese, japanese, korean...)? Hieroglyphics? No, they're not. They're

Formatting sentences in a string using C#

女生的网名这么多〃 posted on 2019-12-19 03:09:14
Question: I have a string with multiple sentences. How do I capitalize the first letter of the first word in every sentence, something like paragraph formatting in Word? E.g. "this is some code. the code is in C#. " The output must be "This is some code. The code is in C#". One way would be to split the string on '.', capitalize the first letter of each piece and then rejoin. Is there a better solution? Answer 1: In my opinion, when it comes to potentially complex rules-based string matching and replacing - you
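
One possible alternative to splitting and rejoining is a single regular-expression substitution. The Python sketch below is an illustration only: it assumes sentences end with `.`, `!` or `?` followed by whitespace, the function name `capitalize_sentences` is hypothetical, and abbreviations such as "e.g." would need extra rules. A similar approach should be expressible in C# with `Regex.Replace` and a match evaluator.

```python
import re


def capitalize_sentences(text):
    """Uppercase the first letter of the text and of each following sentence."""
    # Group 1: start of text (plus optional spaces) or sentence-ending
    # punctuation followed by whitespace. Group 2: the lowercase letter to fix.
    pattern = r"(^\s*|[.!?]\s+)([a-z])"
    return re.sub(pattern, lambda m: m.group(1) + m.group(2).upper(), text)


print(capitalize_sentences("this is some code. the code is in C#. "))
```

Unlike split-and-rejoin, the substitution leaves the original spacing and punctuation untouched.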