Let us assume I have a 1 TB data file. Each node in a ten-node cluster has 3 GB of memory.
I want to process the file using Spark. But how does the one terabyte fit in memory?
By default the storage level is MEMORY_ONLY, which tries to keep the data in memory. If the data cannot fit, partitions are not cached and are recomputed when needed, and in practice you can also run into out-of-memory issues.
Spark supports other storage levels such as MEMORY_AND_DISK, DISK_ONLY, etc. You can go through the Spark documentation to understand the different storage levels, and invoke the persist function on an RDD to choose one.
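As a minimal sketch (the HDFS path and the simple count job here are hypothetical), persisting with MEMORY_AND_DISK lets Spark spill partitions that do not fit in executor memory to local disk instead of failing:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PersistExample")
    val sc = new SparkContext(conf)

    // Hypothetical input path; replace with the location of your 1 TB file.
    val lines = sc.textFile("hdfs:///data/one-terabyte-file.txt")

    // MEMORY_AND_DISK keeps as many partitions in memory as fit
    // and spills the rest to local disk instead of dropping them.
    lines.persist(StorageLevel.MEMORY_AND_DISK)

    // Trigger an action. Each executor only needs to hold a few
    // partitions at a time, not the entire file.
    println(lines.count())

    sc.stop()
  }
}
```

Also note that a plain pass over the file (such as count or map) does not require the whole 1 TB to be in memory at once; Spark processes the data partition by partition, so persisting is only needed if you want to reuse the RDD across multiple actions.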