Restructuring a data frame for 3D plots in R

匆匆过客 submitted on 2020-01-05 08:07:33

Question


I realize that 3D plots are often not the most efficient way to present a set of data, but the 2D plots I've made for a particular dataset suggest that a 3D plot would help break the information into more distinct clusters for analysis. That said, I've never done this in R, and I'm having trouble restructuring my data frame before making a 3D scatterplot with plot3d().

At the moment, my data frame has 2 columns and a few thousand rows. Column 1 is an identifier (A, B, C, ...) and column 2 is one measured feature for that identifier.

For example:

ID Area 
A   1.2
A   3.0
A   2.7
B   1.4
B   2.5
C   4.3
C   2.1
C   1.7

I will plot Area on the y axis. Using a function like table(), I can get the number of times each ID occurs (A = 3, B = 2, C = 3), and this count will become the x coordinate for every row with that ID. What I would also like is a third column, Z, that gives each ID a unique z value for its x coordinate. In other words, Z should count how many IDs with a given x value have appeared so far, increasing by 1 for each new ID that shares that x. The goal is for the area values (y) of all the objects within a particular ID to be stacked above each other over a unique (x, z) coordinate. This is where I am stuck. Given the input above, I would like the final data frame to look like this:

ID(x) Area(y)  Z
    3    1.2   1
    3    3.0   1
    3    2.7   1
    2    1.4   1
    2    2.5   1
    3    4.3   2
    3    2.1   2
    3    1.7   2 
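The first step described above, turning the table() counts into a per-row x coordinate, can be sketched in base R by indexing the named table with each row's ID. This is a minimal sketch; the data frame 'df' and the column name 'x' are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for the real data: one ID column, one measured Area
df <- data.frame(ID = c("A", "A", "A", "B", "B", "C", "C", "C"),
                 Area = c(1.2, 3.0, 2.7, 1.4, 2.5, 4.3, 2.1, 1.7))

counts <- table(df$ID)                 # A = 3, B = 2, C = 3
# Index the named table by each row's ID so every row gets its ID's count
df$x <- as.vector(counts[df$ID])
df$x
# [1] 3 3 3 2 2 3 3 3
```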

Answer 1:


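We can do this in several ways. Each method below starts from the original 'df1' given under 'data' at the end. Note that method 2 adds columns to 'df1' and method 3 converts it to a data.table in place, so re-create 'df1' before running the methods one after another.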

1. base R - aggregate/ave

We can use aggregate to count the rows for each element of the 'ID' column (stored as 'IDx'), add a 'Z' column to the resulting dataset ('dfN') based on the duplicated values in 'IDx', and then merge 'dfN' with the original dataset 'df1'.

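# Count the rows for each ID; the count column is named 'IDx'
dfN <- aggregate(cbind(IDx = seq_along(ID)) ~ ID, df1, FUN = length)
# Within each group of equal counts, number the IDs 1, 2, ... to get 'Z'
dfN$Z <- with(dfN, ave(IDx, IDx, FUN = function(x) cumsum(duplicated(x)) + 1L))
# Attach IDx and Z to every row of df1, then drop the ID column
merge(df1, dfN, by = 'ID')[-1]
#  Area IDx Z
#1  1.2   3 1
#2  3.0   3 1
#3  2.7   3 1
#4  1.4   2 1
#5  2.5   2 1
#6  4.3   3 2
#7  2.1   3 2
#8  1.7   3 2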
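The 'Z' step in method 1 relies on one idiom: within a group of equal counts, cumsum(duplicated(x)) + 1L numbers the members 1, 2, 3, ... in order of appearance. Because all values within such a group are identical, this gives the same result as seq_along(x). A small check on the per-ID counts from the example (the vector 'counts' is a hypothetical stand-in for the 'IDx' column of 'dfN'):

```r
counts <- c(A = 3, B = 2, C = 3)   # rows per ID, as aggregate() produces them
# Within each group of equal counts, number the IDs in order of appearance
z1 <- ave(counts, counts, FUN = function(x) cumsum(duplicated(x)) + 1L)
z2 <- ave(counts, counts, FUN = seq_along)   # same numbering, stated directly
z1   # values: A = 1, B = 1, C = 2
```

Either form works here; seq_along states the intent more directly.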

2. base R - ave/rle

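We can create the 'IDx' column with ave and then use 'rle'/'inverse.rle' to create the 'Z' column. Because rle only groups consecutive equal values, this method assumes that all rows of a given ID are contiguous, as they are in the example.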

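# Number of rows for each ID, repeated on every row of that ID
df1$IDx <- with(df1, ave(seq_along(ID), ID, FUN = length))
# One key per row combining the ID and its count
v1 <- with(df1, paste0(ID, IDx))
# Each run of v1 is one ID: renumber the runs that share the same length,
# then expand the run values back to one value per row
df1$Z <- inverse.rle(within.list(rle(v1), values <- ave(lengths,
             lengths, FUN = function(x) cumsum(duplicated(x)) + 1L)))
df1
#  ID Area IDx Z
#1  A  1.2   3 1
#2  A  3.0   3 1
#3  A  2.7   3 1
#4  B  1.4   2 1
#5  B  2.5   2 1
#6  C  4.3   3 2
#7  C  2.1   3 2
#8  C  1.7   3 2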

3. data.table

Convert the data.frame to a data.table (setDT) and create 'IDx', i.e. the number of rows (.N) for each 'ID'. Based on the duplicated elements in 'IDx', we can create the 'Z' column. Then set the key to 'ID' (setkey), join with 'df1', and remove the unneeded column by assigning it NULL (ID := NULL).

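library(data.table)
# 1. count rows per ID (IDx); 2. copy IDx to IDx1, because inside j a 'by'
# column holds only its single group value; 3. within each IDx group, number
# the IDs (Z); 4. key on ID, join back to df1 and drop the ID column
setkey(setDT(df1)[, list(IDx = .N), by = ID][, IDx1 := IDx][,
     list(ID, Z = cumsum(duplicated(IDx1)) + 1L), by = IDx], ID)[df1][, ID := NULL][]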

#   IDx Z Area
#1:   3 1  1.2
#2:   3 1  3.0
#3:   3 1  2.7
#4:   2 1  1.4
#5:   2 1  2.5
#6:   3 2  4.3
#7:   3 2  2.1
#8:   3 2  1.7

4. dplyr

The idea is the same as above, but we use left_join instead of merge.

library(dplyr)
left_join(df1,
          df1 %>%
            group_by(ID) %>%
            summarise(IDx = n()) %>%                    # rows per ID
            group_by(IDx) %>%
            mutate(Z = cumsum(duplicated(IDx)) + 1L),   # number the IDs sharing a count
          by = 'ID') %>%
  select(-ID)
 #  Area IDx Z
 #1  1.2   3 1
 #2  3.0   3 1
 #3  2.7   3 1
 #4  1.4   2 1
 #5  2.5   2 1
 #6  4.3   3 2
 #7  2.1   3 2
 #8  1.7   3 2
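For comparison, the same columns can be added in base R without merge() or a join, by looking up each row's ID with match(). One difference: merge() sorts its result by 'ID', whereas this keeps the original row order. This is a minimal sketch; the helper name 'add_xz' is hypothetical, and, as in methods 1 and 4, Z follows the alphabetical order of the IDs because table() sorts them.

```r
# Hypothetical helper: returns df with IDx (rows per ID) and Z columns added,
# keeping the original row order
add_xz <- function(df) {
  counts <- table(df$ID)                   # rows per ID, IDs sorted alphabetically
  n <- as.vector(counts)
  z <- ave(n, n, FUN = seq_along)          # number the IDs that share a count
  idx <- match(df$ID, names(counts))       # position of each row's ID in the table
  df$IDx <- n[idx]
  df$Z <- z[idx]
  df
}

df1 <- data.frame(ID = c("A", "A", "A", "B", "B", "C", "C", "C"),
                  Area = c(1.2, 3.0, 2.7, 1.4, 2.5, 4.3, 2.1, 1.7))
add_xz(df1)
```

Applied to 'df2' from the data below, it gives Z = 1, 2 and 3 for the IDs A, C and D, which all have 3 rows.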

NOTE: I also tested these methods with a second dataset, 'df2', included below.

data

df1 <- structure(list(ID = c("A", "A", "A", "B", "B", "C", "C", "C"), 
Area = c(1.2, 3, 2.7, 1.4, 2.5, 4.3, 2.1, 1.7)), .Names = c("ID", 
"Area"), class = "data.frame", row.names = c(NA, -8L))

df2 <-  structure(list(ID = c("A", "A", "A", "B", "B", "C", "C", "C", 
"D", "D", "D", "E", "E", "F"), Area = c(1.2, 3, 2.7, 1.4, 2.5, 
4.3, 2.1, 1.7, 1.2, 1.4, 2.1, 1.2, 1.5, 2.3)), .Names = c("ID", 
"Area"), class = "data.frame", row.names = c(NA, -14L))


Source: https://stackoverflow.com/questions/29337599/restructuring-a-data-frame-for-3d-plots-in-r
