Get the date from each sentence in a variable

Submitted by 落花浮王杯 on 2020-01-04 05:48:48

Question


I have the following text:

This sentence contains a certain date which is 06-08-2003.  
On this date, 29-12-1945 my grandmother was born.  
12-04-1997 was an important year for celebrations.  

I would like to extract the date from each sentence into a new variable, but the substr() function does not seem to work.


Answer 1:


You do not show us your code so we cannot tell you what is wrong with substr(). That said, the substr() function works as intended if you know the position of the desired item in a string.

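In this case, the dates appear in different places within each string. One way to get the desired output is to use the strpos() function to find the position of the first hyphen. Because every date has the form dd-mm-yyyy, the date starts two characters before that hyphen and is ten characters long, so you can use the hyphen as a reference point to calculate the starting position of the date in each string: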

clear
set obs 3

input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

generate new_string = ""

forvalues i = 1 / 3 {
    local pos = strpos(string[`i'], "-") - 2
    replace new_string = substr(string, `pos', 10) in `i'
}


list string new_string

   +-------------------------------------------------------------------------+
   |                                                     string   new_string |
   |-------------------------------------------------------------------------|
1. | This sentence contains a certain date which is 06-08-2003.   06-08-2003 |
2. |          On this date, 29-12-1945 my grandmother was born.   29-12-1945 |
3. |         12-04-1997 was an important year for celebrations.   12-04-1997 |
   +-------------------------------------------------------------------------+

This approach assumes that the dates in your strings are consistent: they all have the same format, contain no mistakes, and no other hyphen appears before the date. In practice, this will often not be the case.
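Note that the loop is not strictly required, because strpos() and substr() are evaluated separately for every observation. A minimal sketch of the same idea without forvalues, assuming the same three example sentences and the same variable names as above:

```stata
clear
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

* strpos() and substr() work observation by observation, so no loop is needed
generate new_string = substr(string, strpos(string, "-") - 2, 10)

* strpos() returns 0 when a string has no hyphen; blank out those observations
replace new_string = "" if strpos(string, "-") == 0

list string new_string
```

The replace line is a precaution: without it, a sentence lacking a hyphen would produce a negative starting position, which substr() interprets as counting from the end of the string.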

A better way of obtaining the desired output is by using the regular expression functions regexm() and regexs():

generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
regexm(string,"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")

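The regular expression above not only finds the date in each string, but also applies some basic validity checks: the day must be between 01 and 31, the month between 01 and 12, and the year must start with 19 or 20. The separator may be a hyphen, space, slash, or period. For example: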

replace string = "On this date, 29-131945 my grandmother was born." in 2

drop new_string

generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
regexm(string,"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")


list string new_string

   +-------------------------------------------------------------------------+
   |                                                     string   new_string |
   |-------------------------------------------------------------------------|
1. | This sentence contains a certain date which is 06-08-2003.   06-08-2003 |
2. |           On this date, 29-131945 my grandmother was born.              |
3. |         12-04-1997 was an important year for celebrations.   12-04-1997 |
   +-------------------------------------------------------------------------+

As you can see, when the date in the second string is malformed (for example 29-13-1945 or 29-131945), new_string is empty for that observation. This approach will therefore often prevent you from getting nonsensical results, while also identifying problematic cases.

Note, however, that even this approach is not bulletproof: you will have to adapt the regular expression if you want to handle more complex cases.
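One case the pattern lets through is a date that looks valid but does not exist in the calendar, such as 31-02-1945. A possible extra check, sketched here under the assumption that you also want a numeric Stata date, is to pass the extracted text to date() with the "DMY" mask, which returns missing for impossible dates. The second example sentence is altered for illustration and date_var is a hypothetical variable name:

```stata
clear
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 31-02-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

* Extract the date text with the same pattern as in the answer
generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
    regexm(string, "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")

* Convert to a numeric daily date; impossible dates become missing
generate date_var = date(new_string, "DMY")
format date_var %td

* Strings that matched the pattern but are not real calendar dates
list string new_string if new_string != "" & missing(date_var)
```

Here the second observation matches the regular expression, yet date_var is missing for it, so the final list command flags it for manual inspection.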



Source: https://stackoverflow.com/questions/49966317/get-the-date-from-each-sentence-in-a-variable
