Question
I have the following text:
This sentence contains a certain date which is 06-08-2003.
On this date, 29-12-1945 my grandmother was born.
12-04-1997 was an important year for celebrations.
I would like to get each date into a variable, but the substr() function does not seem to work.
Answer 1:
You do not show us your code, so we cannot tell you what is wrong with substr().
That said, the substr() function works as intended if you know the position of the desired item in a string. In this case, the dates appear in different places within each string. One way to get the desired output is to use the strpos() function to find where the first hyphen is. You can then use this position as a reference point to calculate where the date starts in each string:
clear
set obs 3
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end
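generate new_string = ""
forvalues i = 1 / 3 {
    // the day starts two characters before the first hyphen in the sentence
    local pos = strpos(string[`i'], "-") - 2
    // a dd-mm-yyyy date is 10 characters long
    replace new_string = substr(string, `pos', 10) in `i'
}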
list string new_string
     +-------------------------------------------------------------------------+
     |                                                     string   new_string |
     |-------------------------------------------------------------------------|
  1. | This sentence contains a certain date which is 06-08-2003.   06-08-2003 |
  2. |          On this date, 29-12-1945 my grandmother was born.   29-12-1945 |
  3. |         12-04-1997 was an important year for celebrations.   12-04-1997 |
     +-------------------------------------------------------------------------+
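The loop is not strictly necessary. Below is a minimal sketch of the same idea written as a single generate statement, since strpos() and substr() are evaluated separately for each observation. It keeps the loop's assumption that the day starts two characters before the first hyphen. The if condition is an addition not present in the original code: it leaves new_string empty when there is no hyphen at position 3 or later, because a negative start position would make substr() count from the end of the string.

```stata
* Sketch: the strpos()/substr() approach without a loop.
* Assumes, like the loop, that the first hyphen belongs to a dd-mm-yyyy date.
clear
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end
* Start two characters before the first hyphen and take 10 characters;
* skip observations where that start position would be below 1
generate new_string = substr(string, strpos(string, "-") - 2, 10) ///
    if strpos(string, "-") >= 3
list string new_string
```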
This approach assumes that the dates in your strings are consistent: they all use the same dd-mm-yyyy format, the first hyphen in each string belongs to the date, and there are no typos. In practice, this will often not be the case.
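For instance, one hyphen appearing before the date is enough to break it. The sketch below uses a made-up sentence (not one from the question) containing the word well-known: strpos() finds the hyphen in that word, so substr() returns ll-known d instead of 06-08-2003.

```stata
* Hypothetical sentence: the hyphen in "well-known" comes before the date
clear
set obs 1
generate str40 string = "A well-known date is 06-08-2003."
* Same extraction rule as above: two characters before the first hyphen
generate new_string = substr(string, strpos(string, "-") - 2, 10)
* new_string is "ll-known d", not the date
list string new_string
```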
A more robust way of obtaining the desired output is to use the regexm() and regexs() functions:
capture drop new_string   // remove the variable created by the loop, if it exists
generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
    regexm(string, "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")
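To see what each parenthesized group captures, the pattern can be tried on a single sentence with display. The local macro name re is only an illustrative choice. Groups 3 and 4 together hold the year, which is why the generate statement above concatenates regexs(3) and regexs(4) without a separator.

```stata
* Store the pattern in a local macro (name chosen for illustration)
local re "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])"
display regexm("On this date, 29-12-1945 my grandmother was born.", "`re'")
display regexs(0)   // the whole match: 29-12-1945
display regexs(1)   // day: 29
display regexs(2)   // month: 12
display regexs(3)   // first two digits of the year: 19
display regexs(4)   // last two digits of the year: 45
```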
The regular expression above not only finds the date in each string but also applies some simple validity checks: the day must be between 01 and 31, the month between 01 and 12, and the year must begin with 19 or 20. For example:
replace string = "On this date, 29-131945 my grandmother was born." in 2
drop new_string
generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
    regexm(string, "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")
list string new_string
     +-------------------------------------------------------------------------+
     |                                                     string   new_string |
     |-------------------------------------------------------------------------|
  1. | This sentence contains a certain date which is 06-08-2003.   06-08-2003 |
  2. |           On this date, 29-131945 my grandmother was born.              |
  3. |         12-04-1997 was an important year for celebrations.   12-04-1997 |
     +-------------------------------------------------------------------------+
As you can see, if the date in the second string is 29-13-1945 or 29-131945, the corresponding value of new_string is empty. Thus, this approach will often prevent you from getting nonsensical results, while also identifying problematic cases.
Note, however, that even this approach is not bulletproof: for example, it accepts impossible dates such as 31-02-2003. You will have to add flexibility (or stricter checks) by altering the regular expression if you want to handle more complex cases.
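One extra safeguard, going slightly beyond the original answer, is to convert the extracted text into a Stata date with date(), which returns missing for dates that do not exist on the calendar. The sketch below reuses the modified second sentence from above and adds a hypothetical fourth sentence containing 31-02-2003; the variable name date_value is illustrative. Listing the observations where date_value is missing then flags both the sentence without a valid-looking date and the impossible one.

```stata
clear
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 29-131945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
"A hypothetical impossible date: 31-02-2003."
end
* Extract dd-mm-yyyy text exactly as in the answer
generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
    regexm(string, "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")
* Convert to a daily date; empty or impossible dates become missing
generate date_value = date(new_string, "DMY")
format date_value %td
* Flag sentences with no matching date or with an impossible date
list string new_string if missing(date_value)
```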
Source: https://stackoverflow.com/questions/49966317/get-the-date-from-each-sentence-in-a-variable