Question
I now have the following text:
Two important events took place on 19/11/1923 and 30/02/1934 respectively.
I would like to extract both dates, but I want them saved in different variables.
I have already tried the regex solution described in a previous question of mine, but in this case it is not working as expected.
Is it possible to save both dates?
Answer 1:
It is important whenever you ask a question to provide the code you have tried and a reproducible example. Please read this page for tips on how to ask a good question.
Consider your current and previous examples:
clear
input str80 string
"This sentence contains a certain date which is 06-08-2003."
"Two important events took place on 19-11-1923 and 30-02-1934 respectively."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end
list string
     +----------------------------------------------------------------------------+
     |                                                                     string |
     |----------------------------------------------------------------------------|
  1. |                 This sentence contains a certain date which is 06-08-2003. |
  2. | Two important events took place on 19-11-1923 and 30-02-1934 respectively. |
  3. |                          On this date, 29-12-1945 my grandmother was born. |
  4. |                         12-04-1997 was an important year for celebrations. |
     +----------------------------------------------------------------------------+
Yes, it is possible to extract both dates by combining Stata's regular-expression functions with assert in a loop:
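* Work on a copy so the original variable is left untouched
clonevar temp_string = string
generate date1 = ""
generate date2 = ""
* Pattern: day, separator, month, separator, four-digit year (19xx or 20xx)
local reg_ex "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])"
forvalues i = 1/4 {
    * Reset the list of dates collected for the current observation
    local dates
    local j = 0
    while `j' == 0 {
        * regexm() is the documented match function; _rc is 0 when a date is found
        capture assert regexm(temp_string[`i'], "`reg_ex'")
        if _rc == 0 {
            * Record the match, then overwrite it so the next pass finds the next date
            local dates = "`dates' " + regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4)
            replace temp_string = regexr(temp_string[`i'], "`reg_ex'", "null") in `i'
        }
        else {
            * No dates left: store what was collected and leave the while loop
            local dates_n : word count `dates'
            if `dates_n' == 1 {
                replace date1 = trim("`dates'") in `i'
            }
            else {
                tokenize `dates'
                replace date1 = "`1'" in `i'
                replace date2 = "`2'" in `i'
            }
            local j = 1
        }
    }
}
drop temp_string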
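In essence, this block of code searches each string for dates one at a time: every match is recorded and then replaced with "null", so that the next pass finds the next date, until no match remains. If the string contains only one date, it is saved in the variable date1. If it contains a second date, that date is saved in the separate variable date2. In this case: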
list date1 date2
     +-------------------------+
     |      date1        date2 |
     |-------------------------|
  1. | 06-08-2003              |
  2. | 19-11-1923   30-02-1934 |
  3. | 29-12-1945              |
  4. | 12-04-1997              |
     +-------------------------+
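Note that the pattern checks only the shape of a date, so a string such as 30-02-1934 is matched even though February has no 30th day. As a minimal sketch, assuming the extracted strings are in day-month-year order, Stata's date() function can convert them to daily dates, and it returns missing for impossible dates. The variable names d1, d2 and bad_date2 are illustrative and not part of the original answer; the data are re-entered here only to keep the example self-contained.

```stata
clear
input str10 date1 str10 date2
"06-08-2003" ""
"19-11-1923" "30-02-1934"
"29-12-1945" ""
"12-04-1997" ""
end

* Convert the day-month-year strings to Stata daily dates
generate d1 = date(date1, "DMY")
generate d2 = date(date2, "DMY")
format d1 d2 %td

* Flag strings that matched the pattern but are not real calendar dates
generate byte bad_date2 = date2 != "" & missing(d2)

list date1 d1 date2 d2 bad_date2
```

In this sketch, an empty date2 and an impossible date2 both give a missing d2, which is why bad_date2 also checks that the string is not empty.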
You can easily adapt this example to extract more dates.
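One possible adaptation, offered as a sketch rather than as the original answer's method, drops the observation-by-observation loop and instead loops over date slots, letting regexm() and regexr() work on all observations at once. The local max_dates is a hypothetical upper bound on the number of dates per string and should be chosen to suit the data.

```stata
clear
input str80 string
"This sentence contains a certain date which is 06-08-2003."
"Two important events took place on 19-11-1923 and 30-02-1934 respectively."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

local reg_ex "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])"

* Hypothetical upper bound on the number of dates per string
local max_dates 3

clonevar temp_string = string
forvalues k = 1/`max_dates' {
    generate date`k' = ""
    * Fill slot k with the first remaining date, where there is one
    replace date`k' = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) ///
        if regexm(temp_string, "`reg_ex'")
    * Overwrite that first match so the next pass finds the next date
    replace temp_string = regexr(temp_string, "`reg_ex'", "null")
}
drop temp_string

list date1 date2 date3
```

With the four example sentences, date1 and date2 should match the listing above and date3 should stay empty, because no string contains a third date.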
Source: https://stackoverflow.com/questions/49968830/get-both-dates-from-each-sentence-in-different-variables