text-extraction

Extract Dates in any format from Text in R

纵然是瞬间 submitted on 2020-01-03 06:35:31
Question: I want to extract dates from a given text. The dates can be in any format, such as April 10 2018, 10-04-2018, 10/04/2018, 2018/04/10, 04.10.2018, and others. I have news data and want to extract dates from the text, for example: "My friend is coming on july 10 2018 or 10/07/2018". Please help. Thanks in advance. Answer 1: We extract it using str_extract and then get the format with anydate:
library(anytime)
library(stringr)
anydate(str_extract_all(str1, "[
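Answer 1 uses R (anytime and stringr). For readers working in Python, here is a minimal sketch of the same idea. It finds candidate substrings with regular expressions, then parses each one against a list of explicit formats. The pattern table and function names are hypothetical. Numeric dates such as 10/04/2018 are ambiguous, so the sketch assumes day-first order and English month names (the default locale); swap the formats if your data is month-first.

```python
import re
from datetime import date, datetime
from typing import List, Optional

# Hypothetical pattern table: each regex is paired with the strptime formats tried
# on its matches. Numeric dates are assumed to be day-first (10/04/2018 = 10 April).
DATE_PATTERNS = [
    (r"\b\d{4}/\d{1,2}/\d{1,2}\b", ["%Y/%m/%d"]),
    (r"\b\d{1,2}[-/.]\d{1,2}[-/.]\d{4}\b", ["%d-%m-%Y", "%d/%m/%Y", "%d.%m.%Y"]),
    (r"\b[A-Za-z]+ \d{1,2},? \d{4}\b", ["%B %d %Y", "%B %d, %Y", "%b %d %Y", "%b %d, %Y"]),
]


def _parse(candidate: str, formats: List[str]) -> Optional[date]:
    """Try each format in turn; return None if none of them fits."""
    for fmt in formats:
        try:
            return datetime.strptime(candidate, fmt).date()
        except ValueError:
            continue
    return None


def extract_dates(text: str) -> List[date]:
    """Return every parseable date in `text`, in order of appearance."""
    found = []
    for pattern, formats in DATE_PATTERNS:
        for match in re.finditer(pattern, text):
            parsed = _parse(match.group(), formats)
            if parsed is not None:  # skips look-alikes such as "page 12 2018"
                found.append((match.start(), parsed))
    return [d for _, d in sorted(found)]


print(extract_dates("My friend is coming on july 10 2018 or 10/07/2018"))
```

The sketch silently skips anything the patterns do not cover (for example "10th of July"). The table therefore has to grow with the formats that actually occur in the news data.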

Text Extraction from Notebook

心不动则不痛 submitted on 2020-01-03 05:24:06
Question: I am trying to extract handwritten text from images. I use Python with OpenCV functions such as find_contours. It was all going pretty well when I used images like this one: [image] It works fine because the background is plain. But then I tested it with this image: [image] Because of the notebook's lines in the background, I am not able to extract only the text. Although the text is red, I convert all images to grayscale, or sometimes threshold them, so everything turns black, just like the notebook lines. That way the
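One common way around ruled lines is to use the colour before discarding it. Keep only pixels that are clearly red and turn everything else white, so the notebook lines disappear before any contour search. This is offered as a sketch, not as an answer taken from the thread. The thresholds `min_red` and `margin` are hypothetical starting values to tune, and the function names are illustrative. The sketch assumes an RGB array. `cv2.imread` returns BGR, so convert first, for example with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np


def red_ink_mask(rgb: np.ndarray, min_red: int = 120, margin: int = 50) -> np.ndarray:
    """Boolean mask of pixels whose red channel clearly dominates green and blue."""
    channels = rgb.astype(np.int16)  # widen so subtraction cannot wrap around in uint8
    r, g, b = channels[..., 0], channels[..., 1], channels[..., 2]
    return (r >= min_red) & (r - np.maximum(g, b) >= margin)


def keep_red_ink(rgb: np.ndarray, **thresholds) -> np.ndarray:
    """Single-channel image: red ink becomes black (0), everything else white (255)."""
    mask = red_ink_mask(rgb, **thresholds)
    out = np.full(rgb.shape[:2], 255, dtype=np.uint8)
    out[mask] = 0
    return out
```

The result is black text on white. `cv2.findContours` treats non-zero pixels as the objects, so invert the result (`255 - out`) before passing it to that step.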

Extract a Word from String Containing a Specific Character within Substring

浪子不回头ぞ submitted on 2019-12-30 11:55:10
Question: In MS Excel, I would like to use a formula to extract only the word that contains a specific character ("=") from a cell's text.
A2: Dolly made me a homemade=cake and some muffins
A3: we had cheese=cake for dinner
A4: Everyone loves how the bakery makes some awesome=cakes
A5: Johnny made his own dinner=lastnight and then cleaned the kitchen
A6: There was a tremendous amount of raing State=Oklahoma
I would like the cells in column A (A2:A4) to produce the following results in
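The thread asks for an Excel formula, which is not reproduced here. As a language-neutral illustration of the rule being asked for, here is a minimal Python sketch. It splits the cell text on whitespace and returns the first word that contains the marker character. The function name is hypothetical.

```python
def word_with_char(text: str, char: str = "=") -> str:
    """Return the first whitespace-separated word that contains `char`, or ''."""
    for word in text.split():
        if char in word:
            return word
    return ""


cells = {
    "A2": "Dolly made me a homemade=cake and some muffins",
    "A3": "we had cheese=cake for dinner",
    "A5": "Johnny made his own dinner=lastnight and then cleaned the kitchen",
}
for ref, value in cells.items():
    print(ref, word_with_char(value))
```

Splitting on whitespace is what makes this work. In the sample rows the `=` never sits next to a space, so the word containing it is returned whole.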

php: Get plain text from html - simplehtmldom or php strip_tags?

主宰稳场 submitted on 2019-12-30 11:15:18
Question: I am looking at getting the plain text from HTML. Which one should I choose, PHP strip_tags or simplehtmldom plaintext extraction? One pro for simplehtmldom is its support for invalid HTML; is that sufficient in itself? Answer 1: You should probably use simplehtmldom for the reason you mentioned, and because strip_tags may also leave you with non-text content like JavaScript or CSS contained within script/style blocks. You would also be able to filter text from elements that aren't displayed (inline style
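The thread is about PHP, but the answer's main point carries over. A real parser can skip script and style blocks, whose contents a plain tag stripper leaves in the output. The sketch below illustrates this with Python's standard `html.parser` module, not simplehtmldom. The class and function names are hypothetical. It does not attempt the answer's other suggestion of dropping elements hidden by inline styles.

```python
from html.parser import HTMLParser
from typing import List


class PlainTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

`html.parser` is lenient with malformed markup, which addresses the same concern the question raises about invalid HTML.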

List the words in a vocabulary according to occurrence in a text corpus, Scikit-Learn

笑着哭i submitted on 2019-12-29 04:21:19
Question: I have fitted a CountVectorizer to some documents in scikit-learn. I would like to see all the terms and their corresponding frequencies in the text corpus, in order to select stop words. For example: 'and' 123 times, 'to' 100 times, 'for' 90 times, and so on. Is there any built-in function for this? Answer 1: If cv is your CountVectorizer and X is the vectorized corpus, then zip(cv.get_feature_names(), np.asarray(X.sum(axis=0)).ravel()) returns a list of (term, frequency) pairs for each distinct
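Answer 1's expression needs two small updates on current versions. First, scikit-learn 1.0 deprecated `get_feature_names()` in favour of `get_feature_names_out()`, and the old name was removed in 1.2. Second, in Python 3 `zip` returns an iterator, so wrap it in `sorted(..., key=lambda p: p[1], reverse=True)` to rank the terms. For a quick dependency-free check of the same counts, the sketch below tokenises with CountVectorizer's default settings (`lowercase=True`, `token_pattern=r"(?u)\b\w\w+\b"`) and tallies terms with `collections.Counter`. It ignores any custom analyzer, n-gram, or stop-word settings your vectorizer may use. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Same defaults as CountVectorizer: lowercase the text, then keep tokens of 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def corpus_term_frequencies(documents: Iterable[str]) -> List[Tuple[str, int]]:
    """(term, total count) pairs over the whole corpus, most frequent first."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(TOKEN_PATTERN.findall(doc.lower()))
    return counts.most_common()


docs = ["The cat and the hat", "The dog and a cat"]
print(corpus_term_frequencies(docs)[:3])
```

Single-character tokens such as "a" are dropped, just as the default token pattern drops them.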

Extract data of a cell after colon [duplicate]

天大地大妈咪最大 submitted on 2019-12-26 02:45:44
Question: This question already has answers here: Excel Formula needed in separate column (2 answers). Closed last year. My data looks like this:
Department code
ABCD : ZERT
ABCD : ZERT : TYUI
ABCD : ZERT : TYUI_1
ABCD : ZERT : TYUI_2
ABCD : ZERT : TYOP
ABCD : ZERT : TYOM
ABCD : ZERT : TYOM : WXCV
Basically, I am looking for a function in Excel that lets me extract only the characters after the last colon. Expected data would look like this: Department code Expected data ABCD :
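The question asks for an Excel formula, and this thread is closed as a duplicate. The rule itself is simple to state in code: take everything after the last colon and trim the surrounding spaces. Here is a minimal Python sketch for checking expected values; the function name is hypothetical.

```python
def after_last_colon(value: str) -> str:
    """Text after the last ':' with surrounding spaces removed; the whole value if there is no colon."""
    return value.rsplit(":", 1)[-1].strip()


codes = ["ABCD : ZERT", "ABCD : ZERT : TYUI_1", "ABCD : ZERT : TYOM : WXCV"]
for code in codes:
    print(code, "->", after_last_colon(code))
```

`rsplit` with a limit of 1 splits only at the last colon, so codes with any number of levels are handled the same way.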

Repeat text extraction with Python

蹲街弑〆低调 submitted on 2019-12-25 04:25:07
Question: I have the following code, which I would like to use to extract the text between <font color='#FF0000'> and </font>. It works fine, but it only extracts one unit (the first one), whereas I would like to extract all textual units between these tags. I tried to do this with a bash loop, but it didn't work.
import os
directory_path = 'C:\\My_folder\\tmp'
for files in os.listdir(directory_path):
    print(files)
    path_for_files = os.path.join(directory_path, files)
    text = open(path_for_files
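The code is cut off in this excerpt. If the missing part uses a single `re.search` or `find` per file, that would explain why only the first unit comes back. The sketch below uses `re.findall` with a non-greedy group instead, so every `<font color='#FF0000'>...</font>` span is returned. It assumes UTF-8 files and the tag exactly as in the question, apart from quote style and letter case. The function names are hypothetical.

```python
import os
import re
from typing import Dict, List

# Non-greedy (.*?) stops at the first closing </font>; DOTALL lets a unit span lines.
RED_FONT = re.compile(r"""<font color=["']#FF0000["']>(.*?)</font>""", re.DOTALL | re.IGNORECASE)


def extract_red_units(text: str) -> List[str]:
    """Every text unit between the red <font> tag and its closing tag."""
    return RED_FONT.findall(text)


def extract_from_folder(directory_path: str) -> Dict[str, List[str]]:
    """Map each file name in the folder to the red units found in it."""
    results = {}
    for name in os.listdir(directory_path):
        path = os.path.join(directory_path, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                results[name] = extract_red_units(handle.read())
    return results
```

A regular expression is enough for flat markup like this. Nested `<font>` tags inside a red unit would need a real HTML parser instead.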

Extract file in PHP and display content in chart

这一生的挚爱 submitted on 2019-12-25 02:41:07
Question: I have a file named "test.txt" with the following content:
BEGIN_SESSION 7
1h+ 47
30mn-1h 20
15mn-30mn 16
5mn-15mn 43
2mn-5mn 29
30s-2mn 35
0s-30s 170
END_SESSION
Thanks to the user wumm, I found a way to extract the data and display it. The problem is: I've written this function that displays the data in a pie chart:
function awstats_extract_session($session) {
    # Session range - Number of visits
    $session = explode("\n", $session);
    unset($session[(count($session)-1)]);
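The PHP function is cut off, so this is not a fix to it. It is a Python sketch of the parsing step that feeds the chart: read the lines between BEGIN_SESSION and END_SESSION, split each one into a range label and a visit count, and compute the percentage shares a pie chart needs. It assumes one "range count" pair per line, as in the sample file. The function names are hypothetical.

```python
from typing import Dict

SAMPLE = """BEGIN_SESSION 7
1h+ 47
30mn-1h 20
15mn-30mn 16
5mn-15mn 43
2mn-5mn 29
30s-2mn 35
0s-30s 170
END_SESSION"""


def parse_sessions(content: str) -> Dict[str, int]:
    """Map each session range (e.g. '1h+') to its number of visits."""
    sessions: Dict[str, int] = {}
    inside = False
    for line in content.splitlines():
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if parts[0] == "BEGIN_SESSION":
            inside = True
        elif parts[0] == "END_SESSION":
            break
        elif inside and len(parts) == 2:
            sessions[parts[0]] = int(parts[1])
    return sessions


def pie_shares(sessions: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all visits per range, i.e. the slice sizes of the pie chart."""
    total = sum(sessions.values())
    return {label: 100.0 * visits / total for label, visits in sessions.items()}


visits = parse_sessions(SAMPLE)
print(len(visits), sum(visits.values()))
```

Stopping at END_SESSION, rather than dropping the last array element as the PHP does with `unset`, means trailing lines or a missing final newline do not shift the data.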

How to get NN and NNS from a text?

删除回忆录丶 submitted on 2019-12-25 00:42:57
Question: I want to get the NN and NNS words from a sample text, as given in the script below. When I use the code below, the output is: types synchronization phase synchronization -RSB- synchronization -LSB- -RSB- projection synchronization. Why am I getting -RSB- and -LSB- here? Should I use a different pattern to get NN and NNS at the same time?
atic = "So far, many different types of synchronization have been investigated, such as complete synchronization [8], generalized synchronization [9]
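-LSB- and -RSB- are Penn Treebank escapes. PTB-style tokenizers, such as Stanford CoreNLP's, rewrite `[` and `]` as `-LSB-` and `-RSB-`, and parentheses as `-LRB-` and `-RRB-`. The citation markers like `[8]` in the sample text produce these tokens, and the tagger can then label them NN. There are two simple remedies: strip numeric citation markers before tagging, or drop bracket tokens after tagging. The sketch below shows both on plain Python data. It assumes the tagger output is available as a list of (token, tag) pairs. The helper names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Penn Treebank escapes for (), [] and {} produced by PTB-style tokenizers.
PTB_BRACKETS = {"-LRB-", "-RRB-", "-LSB-", "-RSB-", "-LCB-", "-RCB-"}


def strip_citations(text: str) -> str:
    """Remove numeric citation markers such as ' [8]' before tagging."""
    return re.sub(r"\s*\[\d+\]", "", text)


def nouns(tagged: Iterable[Tuple[str, str]], tags=("NN", "NNS")) -> List[str]:
    """Tokens tagged NN or NNS, skipping escaped bracket tokens."""
    return [tok for tok, tag in tagged if tag in tags and tok not in PTB_BRACKETS]
```

Stripping citations first is usually the cleaner option, because the brackets never reach the tagger and cannot distort the tags of neighbouring words.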

Select text between key words

被刻印的时光 ゝ submitted on 2019-12-24 23:05:25
Question: This is a follow-on question to Select block of text and merge into new document. I have an SGM document to which I have added start/stop comments. I need to extract the strings between the start/stop comments so I can put them in a temporary file for modification. Right now, it selects everything, including the start/stop comments and the data outside them.
Dim DirFolder As String = txtDirectory.Text
Dim Directory As New IO.DirectoryInfo(DirFolder)
Dim allFiles As
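This excerpt does not show the actual start/stop comment text, so the markers below (`<!-- START -->` and `<!-- STOP -->`) are placeholders to replace with the real ones. The sketch is in Python rather than VB.NET, but the idea carries over to .NET's `Regex` class. Use a non-greedy capture between the escaped markers, with the dot allowed to match newlines (`RegexOptions.Singleline` in .NET). This returns only the text inside each start/stop pair and nothing outside it. The function name is hypothetical.

```python
import re
from typing import List

# Placeholder markers: substitute the actual start/stop comments from the SGM file.
START_MARK = "<!-- START -->"
STOP_MARK = "<!-- STOP -->"


def blocks_between(text: str, start: str = START_MARK, stop: str = STOP_MARK) -> List[str]:
    """Text between each start/stop pair, with the markers themselves excluded."""
    # re.escape treats the markers literally; DOTALL lets a block span several lines;
    # the non-greedy (.*?) stops at the nearest stop marker instead of the last one.
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(stop), re.DOTALL)
    return [block.strip() for block in pattern.findall(text)]
```

Because only the captured group is returned, the markers and any data outside them never reach the temporary file.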