Spark execution of a 1 TB file in memory

死守一世寂寞 · 2021-01-01 01:25

Let us assume I have a 1 TB data file. Each node in a ten-node cluster has 3 GB of memory.

I want to process the file using Spark. But how does the one terabyte fit in memory?

2 answers
  •  南方客 · 2021-01-01 02:27

    By default the storage level is MEMORY_ONLY, which tries to keep the data in memory. It will fail with out-of-memory errors if the data cannot fit into memory.

    Spark supports other storage levels such as MEMORY_AND_DISK, DISK_ONLY, etc. You can go through the Spark documentation to understand the different storage levels. You can call persist on an RDD to use a different storage level.
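    A minimal Scala sketch of what this answer describes, assuming a hypothetical HDFS path and a simple count job: persist is called on the RDD with MEMORY_AND_DISK, so partitions that do not fit in memory are spilled to local disk instead of being kept only in memory.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object PersistExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PersistExample")
          .getOrCreate()

        // Hypothetical input path; replace with the actual location of the 1 TB file.
        val lines = spark.sparkContext.textFile("hdfs:///data/one_tb_file.txt")

        val words = lines.flatMap(_.split("\\s+"))

        // MEMORY_AND_DISK keeps partitions in memory when they fit and spills
        // the rest to local disk, so the dataset does not have to fit entirely
        // in the cluster's memory.
        words.persist(StorageLevel.MEMORY_AND_DISK)

        println(words.count())

        spark.stop()
      }
    }
    ```

    Note that persist only matters when the RDD is reused across multiple actions; a single pass over the file is processed partition by partition regardless of storage level.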
