Splitting strings in R

Submitted by 我与影子孤独终老i on 2019-12-12 00:53:07

Question


I have the following string:

    x<-"CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

I want to extract "CUST_Id_8", "Mr.Praveen Kumar", and whatever follows each of DOB:, Mother's Name:, Contact Num:, and so on, and store the values in variables such as Customer Id, Name, and DOB.

Please help.

I tried:

    strsplit(x, ":")

The result is a list of text fragments, but I need a blank wherever nothing follows a field name.

Can anyone tell me how to extract the string between two words? For example, how do I extract "Mr.Praveen Kumar" from between Name: and DOB?


Answer 1:


You can use regexec and regmatches to pull out the various data items as substrings. Here's a worked example:

Sample data:

    txt <- c("CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:",
             "CUST_Id_15Name:Mr.Joe JohnsonDOB:01/02/1973Mother's Name:BarbaraContact Num:0123 456789Email address:joe@joesville.comOwns Car:YesProducts held with Bank:Savings, CurrentCompany Name:Joes villeSalary per. month:$100000Background:shady")

Pattern to match:

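    # One capture group per field; each (.*) may match an empty string
    pattern <- "CUST_Id_(.*)Name:(.*)DOB:(.*)Mother's Name:(.*)Contact Num:(.*)Email address:(.*)Owns Car:(.*)Products held with Bank:(.*)Company Name:(.*)Salary per. month:(.*)Background:(.*)"
    # Recover the field names by splitting the pattern on ":(.*)" and "_(.*)"
    var_names <- strsplit(pattern, "[:_]\\(\\.\\*\\)")[[1]]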

Run the match:

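    # regmatches() returns the full match plus one element per capture group;
    # [, -1] drops the full-match column
    data <- as.data.frame(do.call("rbind", regmatches(txt, regexec(pattern, txt))))[, -1]
    colnames(data) <- var_names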

Output:

    #  CUST_Id             Name        DOB Mother's Name Contact Num
    #1       8 Mr.Praveen Kumar                                     
    #2      15   Mr.Joe Johnson 01/02/1973       Barbara 0123 456789
    #      Email address Owns Car Products held with Bank Company Name
    #1                                                                
    #2 joe@joesville.com      Yes        Savings, Current   Joes ville
    #  Salary per. month Background
    #1                             
    #2           $100000      shady
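
To pull out a single value between two markers, as asked in the question, the same regexec and regmatches pair can be used with one capture group. The following is a minimal sketch: `extract_between` is a hypothetical helper name, not a base R function, and it assumes the markers contain no regex metacharacters that need escaping.

```r
x <- "CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:"

# Hypothetical helper: text between the first `start` marker and the next `end` marker.
# Returns "" when the markers are adjacent and NA when the pair is not found.
# Assumption: `start` and `end` are plain text with no regex special characters.
extract_between <- function(x, start, end) {
  pattern <- paste0(start, "(.*?)", end)   # lazy group: stop at the first `end`
  hits <- regmatches(x, regexec(pattern, x))
  vapply(hits,
         function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1))
}

customer_name <- extract_between(x, "Name:", "DOB:")
dob           <- extract_between(x, "DOB:", "Mother's Name:")
print(customer_name)
print(dob)
```

The lazy quantifier matters when the end marker can appear more than once; with a greedy `(.*)` the match would extend to the last occurrence.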



Answer 2:


If you know the keys beforehand, you could extract the values like this:

    # Field labels in order of appearance; note that the first key hard-codes the customer ID
    keys <- c("CUST_Id_8Name", "DOB", "Mother's Name", 
      "Contact Num", "Email address", "Owns Car", "Products held with Bank", 
      "Company Name", "Salary per. month", "Background")
    # Split on the labels, drop the empty piece before the first label,
    # then strip the colon left at the start of each value
    cbind(keys, values = sub("^:", "", strsplit(x, paste0(keys, collapse = "|"))[[1]][-1]))
    #       keys                      values
    #  [1,] "CUST_Id_8Name"           "Mr.Praveen Kumar"
    #  [2,] "DOB"                     ""
    #  [3,] "Mother's Name"           ""
    #  [4,] "Contact Num"             ""
    #  [5,] "Email address"           ""
    #  [6,] "Owns Car"                ""
    #  [7,] "Products held with Bank" ""
    #  [8,] "Company Name"            ""
    #  [9,] "Salary per. month"       ""
    # [10,] "Background"              ""
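
The key-based idea above can be sketched so that the customer ID is not baked into the first key and several records are handled at once. This is only an illustration under stated assumptions: `parse_record` is a hypothetical helper, the records are shortened to four fields, every label is assumed to occur in each record in the given order, and no value is assumed to contain the text of a later label.

```r
# Shortened sample records (assumption: only four fields, for readability)
txt <- c("CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:",
         "CUST_Id_15Name:Mr.Joe JohnsonDOB:01/02/1973Mother's Name:BarbaraContact Num:0123 456789")
keys <- c("Name", "DOB", "Mother's Name", "Contact Num")

# Hypothetical helper: locate each "key:" label in order, then cut out the text between labels.
parse_record <- function(rec, keys) {
  labels <- paste0(keys, ":")
  starts <- integer(length(labels))
  pos <- 1L
  for (i in seq_along(labels)) {
    # fixed = TRUE treats the label as literal text, so "." needs no escaping
    hit <- as.integer(regexpr(labels[i], substring(rec, pos), fixed = TRUE))
    if (hit < 0) stop("label not found: ", labels[i])
    starts[i] <- pos + hit - 1L
    pos <- starts[i] + nchar(labels[i])
  }
  value_from <- starts + nchar(labels)
  value_to   <- c(starts[-1] - 1L, nchar(rec))
  values <- substring(rec, value_from, value_to)   # "" when a label has no value
  id <- sub("^CUST_Id_", "", substring(rec, 1, starts[1] - 1L))
  out <- c(id, values)
  names(out) <- c("CUST_Id", keys)
  out
}

# One row per record, one column per field
result <- do.call(rbind, lapply(txt, parse_record, keys = keys))
print(result)
```

Searching for each label from the end of the previous one keeps "Name:" from being confused with the "Name:" inside "Mother's Name:".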


Source: https://stackoverflow.com/questions/31513552/splitting-strings-in-r
