SparkR collect() and head() error for Spark DataFrame: arguments imply differing number of rows


Question


I read a Parquet file from HDFS:

path <- "hdfs://part_2015"
AppDF <- parquetFile(sqlContext, path)
printSchema(AppDF)

root
 |-- app: binary (nullable = true)
 |-- category: binary (nullable = true)
 |-- date: binary (nullable = true)
 |-- user: binary (nullable = true)
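
Note that all four columns come back as binary rather than string. For reference, the same file can also be loaded through SparkR's generic data-source API; this is just a sketch assuming the same sqlContext and path as above:

# Equivalent load via the generic data-source reader in SparkR 1.4;
# "parquet" names the data source, the path is the same HDFS location.
AppDF <- read.df(sqlContext, "hdfs://part_2015", source = "parquet")
printSchema(AppDF)  # shows the same all-binary schema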

class(AppDF)

[1] "DataFrame"
attr(,"package")
[1] "SparkR"

collect(AppDF)
.....error:
arguments imply differing number of rows: 46021, 39175, 62744, 27137

head(AppDF)
.....error:
arguments imply differing number of rows: 36, 30, 48
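
This error message does not come from Spark itself; it is raised by base R's data.frame() when the column vectors it receives have different lengths. A minimal illustration in plain R:

# data.frame() refuses columns of unequal length with exactly this message:
data.frame(a = 1:3, b = 1:2)
# Error in data.frame(a = 1:3, b = 1:2) :
#   arguments imply differing number of rows: 3, 2

So the numbers in the message (46021, 39175, ...) are the lengths of the deserialized columns, which suggests the binary columns are not coming back as one value per row.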

I've read some threads about this problem, but they don't match my case. In fact, I just read a table from the Parquet file and call head() or collect() on it. My Parquet table looks like the following:

app   category  date        user
aaa   test      20150101    123
aaa   test      20150102    345
aaa   test      20150103    678
aaaa  testA     20150104    123
aaaa  testA     20150105    234
aaaa  testA     20150106    4345
bbbb  testB     20150101    5435

I'm using spark-1.4.0-bin-hadoop2.6, and I run this on a cluster using

./sparkR --master yarn-client

I've also tried it locally, and the same problem occurs.

showDF(AppDF)

+-----------+-----------+-----------+-----------+
|        app|   category|       date|       user|
+-----------+-----------+-----------+-----------+
|[B@217fa749|[B@43bfbacd|[B@60810b7a|[B@3818a815|
|[B@5ac31778|[B@3e39f5d5|[B@4f3a92dd| [B@e8013ce|
|[B@7a9440d1|[B@1b2b9836|[B@4b160f29|[B@153d7342|
|[B@7559fcf2|[B@66edb00e|[B@7ec19bec|[B@58e3e3f7|
|[B@598b9ab8|[B@5c5ad3f5|[B@4f11a931|[B@107af885|
|[B@7951ec36|[B@716b0b73|[B@2abce531|[B@576b09e2|
|[B@34560144|[B@7a6d3233|[B@16faf110|[B@34e85d39|
| [B@3406452|[B@787a4528|[B@235282e3|[B@7e0f1732|
|[B@10bc1446|[B@2bd7083f|[B@325e7695|[B@57bb4a08|
|[B@48f98037|[B@7450c04e|[B@61817c8a|[B@7c177a08|
|[B@694ce2dd|[B@36c2512d| [B@f5f7d71|[B@46248d99|
|[B@479dee25|[B@517de3de|[B@1ffb2d9e|[B@236ff079|
|[B@52ac196f|[B@20b9f0d0| [B@f70f879|[B@41c8d7da|
|[B@68d34af3| [B@7ddcd49|[B@72d077a7|[B@545fafd4|
|[B@5610b292|[B@623bbb62|[B@3f8b5150|[B@53877bc7|
|[B@63cf70a8|[B@47ed58c9|[B@2f601903|[B@4e0a2c41|
|[B@7ddf876d|[B@5e3445aa|[B@39c9cc37|[B@6f7e4c84|
|[B@4cd1a74b|[B@583e5453|[B@64124267|[B@6ac5ab84|
|[B@577f9ddf|[B@7b55c859|[B@3cd48a51|[B@25c4eb0a|
|[B@2322f0e5|[B@4af55c68|[B@3285d64a|[B@70b7ae2f|
+-----------+-----------+-----------+-----------+
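
Those [B@... values are Java's default toString() for a byte[], which confirms every column really is raw binary on the JVM side. A possible workaround (only a sketch, assuming the binary columns actually hold UTF-8 text) is to cast everything to string on the Spark side before collecting:

# Hypothetical workaround: cast each binary column to string in Spark SQL,
# then collect the resulting all-string DataFrame into R.
AppStr <- selectExpr(AppDF,
                     "cast(app as string) as app",
                     "cast(category as string) as category",
                     "cast(date as string) as date",
                     "cast(user as string) as user")
head(AppStr)

If the file was written by a tool that stores strings as plain binary (older Hive/Impala output, for example), the Spark SQL option spark.sql.parquet.binaryAsString=true may also make the columns read back as strings directly, though I have not verified that from SparkR.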

I've also tried reading this Parquet file in Scala and doing a collect() operation; everything works fine there. So it seems to be an issue specific to SparkR.

Source: https://stackoverflow.com/questions/31555667/sparkr-collect-and-head-error-for-spark-dataframe-arguments-imply-differing
