Remove column names in a DataFrame

Submitted by 这一生的挚爱 on 2019-12-13 21:07:37

Question


In SparkR I have a DataFrame called data. When I type head(data) I get this output:

  C0      C1               C2         C3
1 id user_id foreign_model_id machine_id 
2  1   3145                4         12 
3  2   4079                1          8 
4  3   1174                7          1    
5  4   2386                9          9    
6  5   5524                1          7

I want to remove C0, C1, C2 and C3 because they cause problems later on. For example, when I use the filter function:

filter(data,data$machine_id==1)

the call fails because of these column names.


I read the data like this:

data <- read.df(sqlContext, "/home/ole/.../data", "com.databricks.spark.csv")

Answer 1:


SparkR treated the header line as the first data row and generated default column names, because the header option defaults to "false". Set header="true" when reading the file and the problem goes away.

data <- read.df(sqlContext, "/home/ole/.../data", "com.databricks.spark.csv", header="true")



Answer 2:


Try using the first row as the column names and then dropping that row:

colnames(data) <- unlist(data[1,])
data <- data[-1,]
data
#  id user_id foreign_model_id machine_id
#2  1    3145                4         12
#3  2    4079                1          8
#4  3    1174                7          1
#5  4    2386                9          9
#6  5    5524                1          7

If you wish, you can add rownames(data) <- NULL to correct for the row numbers after the deletion of the first row.

After this manipulation, you can select rows that correspond to certain criteria, like

subset(data, data$machine_id==1)
#  id user_id foreign_model_id machine_id
#4  3    1174                7          1

In base R, the function filter() suggested in the OP is part of the stats namespace and is usually reserved for the analysis of time series.
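
To illustrate the point about stats::filter(): it applies a linear filter to a numeric series rather than selecting rows. A minimal sketch with a made-up series (the values and the three-point window are assumptions chosen for illustration):

```r
# A made-up numeric series; stats::filter() treats it as a time series.
x <- c(1, 2, 3, 4, 5)

# Three-point centred moving average: each value is the mean of itself
# and its two neighbours; the ends have no full window and become NA.
ma <- stats::filter(x, rep(1 / 3, 3))

print(ma)
```

The result is a smoothed series of the same length, which is why subset() is the right tool for picking rows from a base R data frame.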

Data used in this answer:

data <- structure(list(C0 = structure(c(6L, 1L, 2L, 3L, 4L, 5L), 
      .Label = c("1", "2", "3", "4", "5", "id"), class = "factor"), 
       C1 = structure(c(6L, 3L, 4L, 1L, 2L, 5L), .Label = c("1174", "2386", 
      "3145", "4079", "5524", "user_id"), class = "factor"), 
      C2 = structure(c(5L, 2L, 1L, 3L, 4L, 1L), 
     .Label = c("1", "4", "7", "9", "foreign_model_id"), class = "factor"), 
      C3 = structure(c(6L, 2L, 4L, 1L, 5L, 3L), 
      .Label = c("1", "12", "7", "8", "9", "machine_id"), class = "factor")), 
     .Names = c("C0", "C1", "C2", "C3"), class = "data.frame", 
     row.names = c("1", "2", "3", "4", "5", "6"))
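
One caveat with this approach is that the columns keep their original type (factor or character) after the header row is removed, so numeric comparisons rely on implicit coercion. The following is a minimal base R sketch of the same idea with the types re-inferred afterwards. The data frame `raw` and the names `new_names` and `clean` are hypothetical and only stand in for a table whose header was read as its first row.

```r
# Hypothetical stand-in for a table whose header was read as the first row.
raw <- data.frame(
  C0 = c("id", "1", "2", "3"),
  C1 = c("machine_id", "12", "8", "1"),
  stringsAsFactors = FALSE
)

# Take the first row as plain character strings (also works for factor columns).
new_names <- vapply(raw[1, ], as.character, character(1))

# Drop the header row, apply the names, and reset the row numbers.
clean <- raw[-1, , drop = FALSE]
names(clean) <- unname(new_names)
rownames(clean) <- NULL

# Re-infer column types; as.is = TRUE keeps text as character instead of factor.
clean[] <- lapply(clean, function(col) type.convert(as.character(col), as.is = TRUE))

# Row selection now compares integers rather than strings.
subset(clean, machine_id == 1)
```

This works only on a local R data.frame; it does not apply to a SparkR DataFrame, which cannot be indexed with `[1, ]`.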



Answer 3:


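Try this:

# Collect the values of the first row as character strings
new_names <- character(0)
for (i in seq_along(names(data))) {
   new_names <- c(new_names, toString(data[1, i]))
}

# Use them as the column names, then drop the first row
names(data) <- new_names
data <- data[-1, ]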



Answer 4:


I can't use these answers because they don't run in SparkR; they fail with the error: object of type 'S4' is not subsettable. I solved the problem as follows, although I suspect there is a better way.

data <- withColumnRenamed(data, "C0","id")
data <- withColumnRenamed(data, "C1","user_id")
data <- withColumnRenamed(data, "C2","foreign_model_id")
data <- withColumnRenamed(data, "C3","machine_id")

And now I can successfully use the filter function as I want to.
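
Note that withColumnRenamed only changes the column names, so the original header line (id, user_id, ...) is still present as a data row. The following is a rough SparkR sketch that renames every column in a loop and then drops that row. It assumes `data` is the DataFrame from the question, read without header="true", that head(data, 1) returns the header line (likely for a single input file, but not guaranteed), and that one of the resulting columns is called id. The names `header` and `old_names` are introduced here for illustration.

```r
library(SparkR)

# Assumes `data` is the SparkR DataFrame from the question (columns C0..C3).
header <- head(data, 1)        # local R data.frame holding the first row
old_names <- columns(data)     # current names: "C0", "C1", ...

# Rename each column to the value found in the first row.
for (i in seq_along(old_names)) {
  data <- withColumnRenamed(data, old_names[i], as.character(header[[i]]))
}

# The header line is still a data row; remove it by matching its own value.
data <- filter(data, data$id != "id")
```

Reading the file with header="true", as in Answer 1, avoids both steps and is the simpler choice when re-reading is possible.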



Source: https://stackoverflow.com/questions/35840073/remove-column-names-in-a-dataframe
