fill=TRUE fails in read.table when rows with a different number of columns occur after the first 5 rows [duplicate]

Submitted by 时光毁灭记忆、已成空白 on 2019-12-07 13:13:24

Question


Let's say we have a tab-separated file named test.txt in which the number of columns per row is not known in advance:

1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5
1   2   3   4   5   6   7   8
1   2   3   4   5
1   2   3   4   5   6
1   2   3   4   5   6
1   2   3   4   5   6

Reading it with fill=T gives the wrong result, because line 8 has more than 5 columns:

read.table('test.txt', header=F, sep='\t', fill=T)

Result:

   V1 V2 V3 V4 V5
1   1  2  3  4  5
2   1  2  3  4  5
3   1  2  3  4  5
4   1  2  3  4  5
5   1  2  3  4  5
6   1  2  3  4  5
7   1  2  3  4  5
8   1  2  3  4  5
9   6  7  8 NA NA
10  1  2  3  4  5
11  1  2  3  4  5
12  6 NA NA NA NA
13  1  2  3  4  5
14  6 NA NA NA NA
15  1  2  3  4  5
16  6 NA NA NA NA

But with skip=3, everything works fine:

read.table('test.txt', header=F, sep='\t', fill=T, skip=3)

We got what we expected:

  V1 V2 V3 V4 V5 V6 V7 V8
1  1  2  3  4  5 NA NA NA
2  1  2  3  4  5 NA NA NA
3  1  2  3  4  5 NA NA NA
4  1  2  3  4  5 NA NA NA
5  1  2  3  4  5  6  7  8
6  1  2  3  4  5 NA NA NA
7  1  2  3  4  5  6 NA NA
8  1  2  3  4  5  6 NA NA
9  1  2  3  4  5  6 NA NA

Why does this happen? Is it because fill=T only checks the first 5 rows? Is there any way to work around this?


Answer 1:


Use col.names = paste0("V", seq_len(N)) within read.table, where N is the maximum number of columns.
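As a rough, reproducible sketch of this suggestion: the snippet below recreates the question's data in a temporary file (the tempfile path and the variable names rows, path and df are illustrative, not from the original post) and assumes N = 8 is already known to be the widest row.

```r
# Recreate the example data from the question in a temporary file
# (the question uses 'test.txt'; a tempfile is used here for illustration).
rows <- c(rep("1\t2\t3\t4\t5", 7),
          "1\t2\t3\t4\t5\t6\t7\t8",
          "1\t2\t3\t4\t5",
          rep("1\t2\t3\t4\t5\t6", 3))
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

# Assumption: the maximum number of columns is known beforehand.
N <- 8

# Supplying N column names tells read.table how many columns to create,
# so fill = TRUE pads the shorter rows with NA instead of wrapping long ones.
df <- read.table(path, header = FALSE, sep = "\t", fill = TRUE,
                 col.names = paste0("V", seq_len(N)))
dim(df)
```

With the data above, every input line should end up as exactly one row of df, with columns V1 to V8.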




Answer 2:


I found the answer right in the Examples section of the read.table help page.

ncol <- max(count.fields('test.txt', sep = "\t"))
read.table('test.txt', header=F, sep='\t', fill=T, col.names=paste0('V', seq_len(ncol)))

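This happens because, when col.names is not given, read.table determines the number of columns from the first five lines of input only, and fill=T does not change that. The solution is to specify col.names so that it covers the widest row.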
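To see the mismatch directly, one can compare the field counts of the first five lines with those of the whole file. The sketch below is illustrative only: it rebuilds the question's data in a temporary file, and read_ragged is a hypothetical helper name, not part of base R.

```r
# Recreate the example data from the question in a temporary file.
rows <- c(rep("1\t2\t3\t4\t5", 7),
          "1\t2\t3\t4\t5\t6\t7\t8",
          "1\t2\t3\t4\t5",
          rep("1\t2\t3\t4\t5\t6", 3))
path <- tempfile(fileext = ".txt")
writeLines(rows, path)

# Number of fields on each line of the file.
n_fields <- count.fields(path, sep = "\t")

# Widest of the first five lines: the lines read.table inspects
# (per its documentation) when col.names is not supplied.
head_width <- max(n_fields[1:5])

# Widest line in the whole file: the width that is actually needed.
full_width <- max(n_fields)

# Hypothetical helper wrapping the count.fields + col.names approach.
read_ragged <- function(file, sep = "\t") {
  n <- max(count.fields(file, sep = sep))
  read.table(file, header = FALSE, sep = sep, fill = TRUE,
             col.names = paste0("V", seq_len(n)))
}

df <- read_ragged(path)
```

Here head_width is 5 while full_width is 8, which is why the default call splits the longer lines across extra rows. Note that this approach scans the file twice (once in count.fields, once in read.table), which may matter for very large files.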



Source: https://stackoverflow.com/questions/32066049/fill-true-will-fail-when-different-number-of-column-occurr-after-5-rows-in-read
