Trim part of a string in dataframe

橙三吉。 submitted on 2019-11-29 16:10:25

If all the strings have the same format at the start, with exactly three characters before the underscore, this will work:

df1 <- structure(list(Col = c("AA1_123.zip", "BB2_456.txt", 
                              "CCC_789.doc"
)), .Names = "Col", class = "data.frame", row.names = c(NA, -3L))

substr(df1$Col, 1, 3)
#[1] "AA1" "BB2" "CCC"
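A fixed stop position of 3 only works when every prefix is exactly three characters long. A minimal sketch for prefixes of varying length, assuming every value contains at least one underscore, computes the stop position per element with regexpr(). The vector x is a hypothetical example; a value with no underscore would make regexpr() return -1 and produce an empty string.

```r
# Hypothetical input whose prefixes have different lengths
x <- c("AA1_123.zip", "B2_456.txt", "CCCC_789.doc")

# regexpr() gives the position of the first "_" in each string;
# fixed = TRUE treats "_" literally instead of as a regular expression
stop_at <- regexpr("_", x, fixed = TRUE) - 1

# substr() is vectorised over its stop argument
prefix <- substr(x, 1, stop_at)
prefix
#[1] "AA1"  "B2"   "CCCC"
```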

You could try sub(), which removes the first underscore and everything after it:

sub('_.*', '', df1$Col)
#[1] "AA1" "BB2" "CCC"
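To trim the column in place, assign the result of sub() back to it. The sketch below uses a hypothetical data frame df2 to show two edge cases: because sub() replaces only the leftmost match, "_.*" starts at the first underscore even when there are several, and a value with no underscore is returned unchanged.

```r
# Hypothetical data: the second value has two underscores, the third has none
df2 <- data.frame(Col = c("AA1_123.zip", "BB2_4_56.txt", "CCC789.doc"))

# "_.*" matches from the first underscore to the end of the string;
# values without "_" have no match and are left as they are
df2$Col <- sub("_.*", "", df2$Col)
df2$Col
#[1] "AA1"        "BB2"        "CCC789.doc"
```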

You could also re-read the column with read.table(), using comment.char = "_" so that everything from the underscore to the end of each line is discarded:

df <- data.frame(x = c("AA1_123.zip", "BB2_456.txt", "CCC_789.doc"))

read.table(text = as.character(df$x), comment.char="_")
#    V1
# 1 AA1
# 2 BB2
# 3 CCC
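One caveat with read.table(): it guesses column types, so prefixes that look like numbers are converted and lose leading zeros, and the result is a data frame rather than a vector. A minimal sketch, using hypothetical numeric-looking codes, keeps the prefixes as text with colClasses = "character" and extracts the V1 column:

```r
# Hypothetical values whose prefixes look like numbers
codes <- c("001_123.zip", "002_456.txt", "010_789.doc")

# Without colClasses, V1 would be converted to integers (1, 2, 10);
# colClasses = "character" keeps the prefixes exactly as written
trimmed <- read.table(text = codes, comment.char = "_",
                      colClasses = "character")$V1
trimmed
#[1] "001" "002" "010"
```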

Or you can use scan() with the same comment.char trick, which returns a character vector instead of a data frame:

scan(text = as.character(df$x), what = "", comment.char="_")
# Read 3 items
# [1] "AA1" "BB2" "CCC"
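Since scan() returns a plain character vector, its result can be stored straight back in the data frame. A minimal sketch, adding a hypothetical column named prefix: quiet = TRUE suppresses the "Read 3 items" message. This assumes the values contain no spaces, because scan() splits on whitespace and would otherwise return more items than there are rows.

```r
df <- data.frame(x = c("AA1_123.zip", "BB2_456.txt", "CCC_789.doc"))

# what = "" reads character items; comment.char = "_" drops the rest of
# each line; quiet = TRUE suppresses the "Read 3 items" message
df$prefix <- scan(text = as.character(df$x), what = "",
                  comment.char = "_", quiet = TRUE)
df$prefix
#[1] "AA1" "BB2" "CCC"
```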