R substr function on multiple columns

Submitted by 懵懂的女人 on 2021-02-05 11:57:23

Question


I have 3 columns. The first column has a unique ID; the second and third columns have string data and some NA values. I need to extract information from column 2 and put it in separate columns, and do the same for column 3. I am building a function using for loops, as follows. I need to split each value after the third letter. For example, in the V1 column below, AAAbbb has to be broken into AAA and bbb, each in its own column. I know I can use substr to do this, but I am new to R. Please help.


UID      V1      V2
Z001NL   AAAbbb  IADSFO
Z001NP   IADSFO  NA
Z0024G   SFOHNL  NLSFO0
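Because each value is split at a fixed position, substr() can process a whole column at once: it is vectorized over its first argument, and an NA element simply yields NA. A minimal sketch, using the V2 values from the table above typed in as a character vector (the variable names are illustrative only):

```r
# V2 column from the example table; the second row is missing
v2 <- c("IADSFO", NA, "NLSFO0")

# substr() works element-wise on the whole vector in one call
first  <- substr(v2, 1, 3)  # characters 1-3: "IAD", NA, "NLS"
second <- substr(v2, 4, 6)  # characters 4-6: "SFO", NA, "FO0"

print(first)
print(second)
```

No loop over rows is needed, and the NA row stays NA in both parts.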


Here's my code.

test=read.csv("c:/some/path/in/windows/test.csv", header=TRUE)

substring_it = function(test)
{
  for(i in 1:3){
    for(j in 2:3){
      answer = transform(test, code1 = substr(test[i, j], 1, 3), code2 = substr(test[i, j], 4, 6))
    }
  }
  return(answer)
}

hello = substring_it(test)

test is the data frame that I read in.

This is the output I need:


UID      V1.1  V1.2  V2.1  V2.2
Z001NL   AAA   bbb   IAD   SFO
Z001NP   IAD   SFO   NA    NA
Z0024G   SFO   HNL   NLS   FO0
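For comparison, a layout like the output above can be produced without any loop, because transform() evaluates its arguments against the data frame's columns and substr() is vectorized. This is a sketch, not the asker's code: it assumes the columns are named UID, V1 and V2 as in the example, that they are character (the default since R 4.0.0), and it types the data in instead of reading the CSV. split_df is an illustrative name.

```r
test <- data.frame(
  UID = c("Z001NL", "Z001NP", "Z0024G"),
  V1  = c("AAAbbb", "IADSFO", "SFOHNL"),
  V2  = c("IADSFO", NA, "NLSFO0"),
  stringsAsFactors = FALSE
)

# Each new column is computed from a whole existing column in one call
split_df <- transform(
  test,
  V1.1 = substr(V1, 1, 3), V1.2 = substr(V1, 4, 6),
  V2.1 = substr(V2, 1, 3), V2.2 = substr(V2, 4, 6)
)

# Keep only the ID and the new columns, in the requested order
split_df <- split_df[c("UID", "V1.1", "V1.2", "V2.1", "V2.2")]
print(split_df)
```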



Answer 1:


You can use sapply to apply a function to each element of a vector. That is useful here: calling sapply on the columns of your original data frame (test) gives you the columns for your new data frame.

Here's a solution that does this:

test = data.frame(UID = c('Z001NL', 'Z001NP', 'Z0024G'), 
  V1 = c('AAAbbb', 'IADSFO', 'SFOHNL'),
  V2 = c('IADSFO', NA, 'NLSFO0'))

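substring_it = function(x){
  # x is a data frame: column 1 is the ID, columns 2 and 3 are split
  # USE.NAMES = FALSE keeps the input strings from becoming row names
  c1 = sapply(x[,2], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
  c2 = sapply(x[,2], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
  c3 = sapply(x[,3], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
  c4 = sapply(x[,3], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
  return(data.frame(UID=x[,1], c1, c2, c3, c4))
}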

substring_it(test)
# returns:
#     UID  c1  c2   c3   c4
#1 Z001NL AAA bbb  IAD  SFO
#2 Z001NP IAD SFO <NA> <NA>
#3 Z0024G SFO HNL  NLS  FO0
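For a character column, sapply() names its result after the input strings (its USE.NAMES argument defaults to TRUE), and data.frame() takes its row names from the first component that carries names. Since R 4.0.0 made stringsAsFactors = FALSE the default, the columns here are character, so those names can leak into the result. The sketch below assumes R 4.0.0 or later and uses illustrative variable names; it shows where the names end up and two ways to avoid them: USE.NAMES = FALSE, or calling the already vectorized substr() directly.

```r
v1  <- c("AAAbbb", "IADSFO", "SFOHNL")
ids <- c("Z001NL", "Z001NP", "Z0024G")

# sapply() over a character vector names each result after its input
named <- sapply(v1, function(s) substr(s, 1, 3))
print(names(named))

# Either drop the names or call substr() on the whole vector directly
unnamed <- sapply(v1, function(s) substr(s, 1, 3), USE.NAMES = FALSE)
direct  <- substr(v1, 1, 3)
print(identical(unnamed, direct))

# data.frame() takes row names from the first named component
df_named   <- data.frame(UID = ids, c1 = named)
df_unnamed <- data.frame(UID = ids, c1 = unnamed)
print(rownames(df_named))
print(rownames(df_unnamed))
```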

EDIT: here's a way to loop over the columns if you have to do this many times. I'm not sure what order your original data frame's columns are in, or what order you want the new data frame's columns to end up in, so you may need to adjust the pos counter. I also assumed the columns to be split are columns 2 through 201 (colindex), so you'll probably need to change that range.

newcolumns = list()
pos = 1 # counter for the column index of the new data frame
for(colindex in 2:201){ # assumed range of columns to split; adjust to your data
    newcolumns[[pos]] = sapply(test[,colindex], function(s) substr(s, 1, 3), USE.NAMES = FALSE)
    newcolumns[[pos+1]] = sapply(test[,colindex], function(s) substr(s, 4, 6), USE.NAMES = FALSE)
    pos = pos+2
}
newdataframe = data.frame(UID = test[,1], newcolumns)
# update names(newdataframe) as needed
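If the columns to split are not known in advance, the same idea can be written without a position counter by iterating over column names with lapply(). This is a sketch under these assumptions: every column except the first is split, each value is split into characters 1-3 and 4-6, and the new columns are named like the question's desired output (V1.1, V1.2, ...). split_columns is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: split every column except the first into two parts
split_columns <- function(df) {
  cols <- names(df)[-1]
  pieces <- lapply(cols, function(col) {
    part <- list(substr(df[[col]], 1, 3), substr(df[[col]], 4, 6))
    names(part) <- paste0(col, c(".1", ".2"))  # e.g. V1.1, V1.2
    part
  })
  # Flatten the per-column pairs into one list and put the ID column first
  out <- c(df[1], unlist(pieces, recursive = FALSE))
  as.data.frame(out, stringsAsFactors = FALSE)
}

test <- data.frame(
  UID = c("Z001NL", "Z001NP", "Z0024G"),
  V1  = c("AAAbbb", "IADSFO", "SFOHNL"),
  V2  = c("IADSFO", NA, "NLSFO0"),
  stringsAsFactors = FALSE
)
print(split_columns(test))
```

Because the names are built from the original column names, no renaming step is needed afterwards, and adding more columns to test needs no change to the helper.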


Source: https://stackoverflow.com/questions/20783034/r-substr-function-on-multiple-columns
