R - Wrong error message - Error: Duplicate identifiers for rows [duplicate]

江枫思渺然 submitted on 2020-01-06 06:16:12

Question


I have a problem with a dataframe that I need to reshape.

I have this command:

library(tidyverse)
df1 = df1 %>% gather(Day, value, Day01:Day31) %>% spread(Station, value)
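A minimal sketch that reproduces the same kind of failure on a tiny, hypothetical data frame. The Year and Month columns and all values are made up; only Station and the DayNN column layout mirror the question. Station "A" appears twice for the same Year and Month, so after gather() two long rows share every identifier and spread() cannot decide which value goes into the cell. The exact error wording depends on the installed tidyr version, so the sketch only checks that an error is raised.

```r
library(tidyr)  # gather(), spread()

# Hypothetical toy data: Station "A" appears twice for Year 2000, Month 1
toy <- data.frame(
  Year    = c(2000, 2000, 2000),
  Month   = c(1, 1, 1),
  Station = c("A", "B", "A"),
  Day01   = c(1.0, 2.0, 1.5),
  Day02   = c(3.0, 4.0, 3.5),
  Day03   = c(5.0, 6.0, 5.5),
  stringsAsFactors = FALSE
)

# 3 original rows x 3 day columns = 9 long rows
long <- gather(toy, Day, value, Day01:Day03)

# spread() needs every (Year, Month, Day, Station) combination to be unique.
# Rows 1 and 3 of 'toy' collide on all of them, so this call fails.
result <- tryCatch(spread(long, Station, value), error = function(e) e)
print(inherits(result, "error"))
print(conditionMessage(result))
```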

And I get this error:

Error: Duplicate identifiers for rows (130933, 131029), (389113, 389209), (647293, 647389), (905473, 905569), (1163653, 1163749), (1421833, 1421929), (1680013, 1680109), (1938193, 1938289), (2196373, 2196469), (2454553, 2454649), (2712733, 2712829), (2970913, 2971009), (3229093, 3229189), (3487273, 3487369), (3745453, 3745549), (4003633, 4003729), (4261813, 4261909), (4519993, 4520089), (4778173, 4778269), (5036353, 5036449), (5294533, 5294629), (5552713, 5552809), (5810893, 5810989), (6069073, 6069169), (6327253, 6327349), (6585433, 6585529), (6843613, 6843709), (7101793, 7101889), (7359973, 7360069), (7618153, 7618249), (7876333, 7876429), (130934, 131030), (389114, 389210), (647294, 647390), (905474, 905570), (1163654, 1163750), (1421834, 1421930), (1680014, 1680110), (1938194, 1938290), (2196374, 2196470), (2454554, 2454650), (2712734, 2712830), (2970914, 2971010), (3229094, 3229190), (3487274, 3487370), (3745454, 3745550), (4003634, 4003730), (4261814, 4261910), (4519994, 4520090

The strange thing is that I also get these results:

library(dplyr)
test = rownames_to_column(df1, "VALUE")
length(unique(test$VALUE)) ### Result 258180 = Same as number of rows
length(unique(test$VALUE)) == nrow(test) #### Result TRUE

As you can see, the error message also lists row numbers that do not even exist in my data frame.
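The row-name check above cannot reveal the problem: row names are unique by construction, so that test is always TRUE. What spread(Station, value) needs to be unique after gather(Day, value, Day01:Day31) is the combination of all remaining columns, which in terms of the original data means no two rows may agree on every non-Day column. A base-R sketch of that check, again on hypothetical toy data:

```r
# Same hypothetical toy data: Station "A" appears twice for Year 2000, Month 1
toy <- data.frame(
  Year    = c(2000, 2000, 2000),
  Month   = c(1, 1, 1),
  Station = c("A", "B", "A"),
  Day01   = c(1.0, 2.0, 1.5),
  Day02   = c(3.0, 4.0, 3.5),
  Day03   = c(5.0, 6.0, 5.5),
  stringsAsFactors = FALSE
)

# Row names are unique by construction, so this is always TRUE
print(length(unique(rownames(toy))) == nrow(toy))

# Columns that gather() turns into Day/value (use 1:31 for the real data)
day_cols <- sprintf("Day%02d", 1:3)

# Every other column (here Year, Month, Station) acts as an identifier
# for spread(Station, value), so those combinations must be unique
id_cols <- toy[, setdiff(names(toy), day_cols), drop = FALSE]

# Flag every member of a duplicated group, not just the later copies
dup <- duplicated(id_cols) | duplicated(id_cols, fromLast = TRUE)
print(which(dup))
print(toy[dup, ])
```

To run it on the real data, replace toy with df1 and 1:3 with 1:31 (assuming no identifier column is itself named DayNN). If this diagnosis is right, the flagged rows should include original rows 130933 and 131029, since the first pair in the error indexes the Day01 block of the gathered data, which keeps the original row order.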

The command works fine on all my other data frames, which have exactly the same structure; they just have fewer rows.

I don't know how to share the data frame here since it's so large, so I uploaded it to my university's server, where you can download it.

Here is the link (I hope it's allowed to post it like this):

https://megastore.uni-augsburg.de/get/pmAS15z6TN/


Answer 1:


This ought to work. As a comment noted, the error occurs because spread() finds rows that are no longer uniquely identified after the gather(): two or more rows share the same values in every column except value, so spread() cannot tell which value belongs in a given cell. rowid_to_column() is a simple function that turns the row ids into a column, which makes every row unique. The row numbers in the error are larger than your original data set because they refer to the gathered data frame, which has 258180 × 31 = 8003580 rows.

library(tidyverse)

df2 <- df1 %>%
    gather(Day, value, Day01:Day31) %>%
    tibble::rowid_to_column() %>%  # unique id per long row, so spread() finds no duplicate identifiers
    spread(Station, value)
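Applied to hypothetical toy data with one duplicated station record, this approach runs without error. It is worth seeing what it produces: because rowid makes every long row unique, spread() never merges rows, so the result has as many rows as the gathered data and each row has exactly one non-NA station column. On the full data that means 8003580 output rows, and the duplicated records that triggered the error are still there.

```r
library(tidyr)   # gather(), spread(); tidyr also re-exports %>%
library(tibble)  # rowid_to_column()

# Hypothetical toy data: Station "A" appears twice for Year 2000, Month 1
toy <- data.frame(
  Year    = c(2000, 2000, 2000),
  Month   = c(1, 1, 1),
  Station = c("A", "B", "A"),
  Day01   = c(1.0, 2.0, 1.5),
  Day02   = c(3.0, 4.0, 3.5),
  Day03   = c(5.0, 6.0, 5.5),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  gather(Day, value, Day01:Day03) %>%  # 3 rows x 3 days = 9 long rows
  rowid_to_column() %>%                # adds 'rowid' = 1..9, unique per long row
  spread(Station, value)               # no identifier collisions any more

print(nrow(wide))                            # one output row per long row
print(rowSums(!is.na(wide[, c("A", "B")])))  # exactly one filled station per row
```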

I ran into memory issues trying to actually do this on my laptop though.
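The 8003580 figure follows from 258180 rows × 31 day columns. Because gather() stacks the Day columns one block after another, a row index i from the error can be mapped back to an original row and a day column with integer division. A small sketch, assuming df1 was not reordered before the gather() (long_to_original is a hypothetical helper, not part of tidyr):

```r
n_rows <- 258180  # rows in df1, from the question
n_days <- 31      # Day01..Day31

print(n_rows * n_days)  # rows after gather()

# Long row i comes from original row ((i - 1) %% n) + 1
# and from day column ((i - 1) %/% n) + 1
long_to_original <- function(i, n) {
  data.frame(
    long_row     = i,
    original_row = ((i - 1) %% n) + 1,
    day          = sprintf("Day%02d", ((i - 1) %/% n) + 1),
    stringsAsFactors = FALSE
  )
}

res <- long_to_original(c(130933, 131029, 389113, 389209, 7876333, 7876429), n_rows)
print(res)
```

This maps the pairs (130933, 131029) and (389113, 389209) from the error to original rows 130933 and 131029 on Day01 and Day02, and the pair (7876333, 7876429) to the same two rows on Day31. So the message repeats one pair of original rows once per day rather than pointing at rows that do not exist.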



Source: https://stackoverflow.com/questions/47878700/r-wrong-error-message-error-duplicate-identifiers-for-rows
