Concatenate rows of a data frame

醉梦人生 2020-11-30 07:28

I would like to take a data frame with characters and numbers, and concatenate all of the elements of each row into a single string, which would be stored as a single el

4 answers
  •  没有蜡笔的小新
    2020-11-30 07:49

    If you want to start with

    df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors=TRUE)
    

    ... then there is no general rule about how df$letters will be interpreted by any given function. Modelling functions treat it as a factor, some functions treat it as character, and others treat it as integer. Even a single function such as paste may interpret it differently, depending on how you use it:

    paste(df[1,], collapse="") # "11"
    apply(df, 1, paste, collapse="") # "A1" "B2" "C3" "D4" "E5"
    

    There is no obvious logic to it, except that it will probably make sense once you know the internals of each function.
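
    If the goal is simply one string per row, a minimal sketch that sidesteps the question is to hand the columns to paste0 as separate arguments via do.call. Each factor column then goes through as.character, which returns its labels rather than its integer codes. The name row_strings is made up for this example.

    ```r
    df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = TRUE)

    # A data frame is a list of columns, so do.call passes each column to
    # paste0 as its own argument and the pasting is done element-wise.
    # unname() drops the column names so that a column called "collapse"
    # could not be mistaken for a paste0 argument.
    row_strings <- do.call(paste0, unname(as.list(df)))
    row_strings

    # Store the result as a new column holding one string per row.
    df$combined <- row_strings
    ```

    This works column by column rather than row by row, so no row is ever turned into a list or a vector on the way.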

    The factors seem to be converted to integers when an argument is coerced to a vector. As you know, data frames are lists of vectors of equal length, so the first row of a data frame is also a list, and when it is forced to be a vector, something like this happens:

    df[1,]
    #   letters numbers
    # 1       A       1
    unlist(df[1,])
    # letters numbers 
    #       1       1 
    

    I don't know how apply achieves what it does (i.e., factors are represented by their character values); if you're interested, look at its source code. It may be useful to know, though, that you can trust apply in this specific case (and in this specific sense). More generally, it is useful to store every piece of data in a sensible format, which includes storing strings as strings, i.e., using stringsAsFactors=FALSE.
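
    For what it is worth, my understanding is that apply first coerces a data frame to a matrix with as.matrix, and that for a data frame with any non-numeric column this gives a character matrix in which factors appear as their labels. A small sketch to check this (the names m, via_apply and via_matrix are only for illustration):

    ```r
    df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = TRUE)

    # Coerce the whole data frame to a matrix, as apply is assumed to do.
    m <- as.matrix(df)
    is.character(m)   # TRUE
    m[1, ]            # the first row now holds "A" and "1" as strings

    # Pasting the rows of the data frame and of the matrix should agree.
    via_apply  <- apply(df, 1, paste, collapse = "")
    via_matrix <- apply(m, 1, paste, collapse = "")
    identical(via_apply, via_matrix)
    ```

    One caveat, as far as I know: numeric columns pass through format() during that conversion, so numbers of differing widths can come out padded with spaces. It is worth checking on your own data.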

    Btw, every introductory R book should have this idea in a subtitle. For example, my plan for retirement is to write "A (not so) gentle introduction to the zen of data fishery with R, the stringsAsFactors=FALSE way".
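
    To make the stringsAsFactors=FALSE advice concrete, here is a short sketch: one data frame built with strings kept as strings, and one existing data frame whose factor columns are converted afterwards. The names df_chr, df_fac and is_fac are made up for the example.

    ```r
    # Strings stay strings from the start.
    df_chr <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = FALSE)
    class(df_chr$letters)   # "character"
    unlist(df_chr[1, ])     # a character vector; the letter stays "A"

    # An existing data frame with factors: convert only the factor columns.
    df_fac <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = TRUE)
    is_fac <- vapply(df_fac, is.factor, logical(1))
    df_fac[is_fac] <- lapply(df_fac[is_fac], as.character)
    class(df_fac$letters)   # "character"
    ```

    Since R 4.0.0, stringsAsFactors = FALSE is the default for data.frame, so the first form is what you get without asking on a current installation.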
