Get both dates from each sentence in different variables

Submitted by 六眼飞鱼酱① on 2019-12-11 01:25:14

Question


I now have the following text:

Two important events took place on 19/11/1923 and 30/02/1934 respectively. 

I would like to extract both dates, but I want them saved in different variables.

I have already tried the regex solution described in a previous question of mine, but in this case it is not working as expected.

Is it possible to save both dates?


Answer 1:


Whenever you ask a question, it is important to provide the code you have tried and a reproducible example. Please read this page for tips on how to ask a good question.

Consider your current and previous examples:

clear

input str80 string
"This sentence contains a certain date which is 06-08-2003."
"Two important events took place on 19-11-1923 and 30-02-1934 respectively."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

list string

   +----------------------------------------------------------------------------+
   |                                                                     string |
   |----------------------------------------------------------------------------|
1. |                 This sentence contains a certain date which is 06-08-2003. |
2. | Two important events took place on 19-11-1923 and 30-02-1934 respectively. |
3. |                          On this date, 29-12-1945 my grandmother was born. |
4. |                         12-04-1997 was an important year for celebrations. |
   +----------------------------------------------------------------------------+

Yes, it is possible to extract both dates by combining Stata's regular-expression functions with assert inside a forvalues loop:

* Work on a copy so the original variable is left untouched
clonevar temp_string = string
generate date1 = ""
generate date2 = ""

* Day, month and year separated by "-", " ", "/" or "."
local reg_ex "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])"

forvalues i = 1 / 4 {
    local dates
    local j = 0
    while `j' == 0 {
        * _rc is 0 only while observation i still contains a date
        capture assert regexm(temp_string[`i'],"`reg_ex'")

        if _rc == 0 {
            * Collect the match, then overwrite it so the next pass finds the next date
            local dates = "`dates' " + regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4)
            replace temp_string = regexr(temp_string[`i'], "`reg_ex'", "null") in `i'
        }

        else {
            * No date left: count the collected dates and store them
            local dates_n : word count `dates' 

            if `dates_n' == 1 {
                replace date1 = trim("`dates'") in `i'
            }

            else {
                tokenize `dates'
                replace date1 = "`1'" in `i'
                replace date2 = "`2'" in `i'
            }

            local j = 1
        }
    }
}

drop temp_string

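For each observation, this block of code repeatedly searches the string for a date. Every match is appended to the local macro dates and then replaced with "null" in the temporary copy, so that the next pass finds the following date. Once no match remains, the collected dates are counted: a single date is saved in the variable date1, and when there are two, the first is saved in date1 and the second in a separate variable date2. In this case: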

list date1 date2

   +-------------------------+
   |      date1        date2 |
   |-------------------------|
1. | 06-08-2003              |
2. | 19-11-1923   30-02-1934 |
3. | 29-12-1945              |
4. | 12-04-1997              |
   +-------------------------+
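Note that date1 and date2 are string variables, and that the pattern only checks the shape of a date, not whether it exists in the calendar (30-02-1934 in row 2 is not a real date). As a minimal sketch, assuming you want Stata daily dates afterwards, the strings could be converted with the date() function. The variable names date1_td and date2_td are illustrative and not part of the original answer:

```stata
* Sketch: convert the extracted strings to Stata daily dates
* (date1_td and date2_td are hypothetical names chosen for this example)
generate date1_td = date(date1, "DMY")
generate date2_td = date(date2, "DMY")
format date1_td date2_td %td

* date() returns missing for empty strings and for impossible dates
* such as 30-02-1934, so those rows can be inspected afterwards
list date1 date1_td date2 date2_td
```

Rows in which the string is non-empty but the converted value is missing point to dates that matched the pattern without being valid.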

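The loop that fills date1 and date2 can be adapted to extract more than two dates per sentence, for example by generating a date3 variable and assigning the third token to it.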
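One possible way to handle more dates, sketched here as an alternative rather than as the method of the original answer, is to let regexm() and regexr() work on all observations at once and repeat the match once per output variable. The names max_dates, work and d1, d2, d3 are assumptions made for this example, and the sketch starts from the same string variable as above:

```stata
* Sketch: keep up to max_dates dates per observation in d1, d2, d3
* (max_dates, work and the d-variables are hypothetical names)
local max_dates 3
local reg_ex "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])"

* Work on a copy so the original variable is left untouched
clonevar work = string

forvalues k = 1 / `max_dates' {
    * regexs(0) is the whole matched date, with its original separators
    generate d`k' = regexs(0) if regexm(work, "`reg_ex'")
    * Overwrite the first match so the next pass finds the following date
    replace work = regexr(work, "`reg_ex'", "null")
}

drop work
```

Unlike the loop above, this version keeps the separators as they appear in the text instead of rewriting them as hyphens, and observations with fewer dates than max_dates are simply left empty in the remaining variables.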



Source: https://stackoverflow.com/questions/49968830/get-both-dates-from-each-sentence-in-different-variables
