How to set Parquet file encoding in Spark
The Parquet documentation describes a few different encodings here. Does the encoding change somehow inside the file during read/write, or can I set it myself? There is nothing about it in the Spark documentation. I only found slides from a talk by Ryan Blue of the Netflix team, where he sets Parquet configurations on the sqlContext:

sqlContext.setConf("parquet.filter.dictionary.enabled", "true")

but that does not look like it controls plain dictionary encoding in Parquet files.

I eventually found an answer to my question on the Twitter engineering blog: Parquet applies dictionary encoding automatically to a column when the number of unique values is below 10^5. Here is a post announcing
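Since dictionary encoding is decided by the writer rather than by Spark itself, the usual way to influence it is through the parquet-mr write properties, which Spark passes down via the Hadoop configuration. A minimal sketch (assuming a 1.x-era SQLContext API to match the snippet above; the property `parquet.enable.dictionary` is a standard parquet-mr writer option, but how it is propagated can vary by Spark version):

```scala
import org.apache.spark.sql.SQLContext

// Sketch: turn off Parquet's automatic dictionary encoding for a write.
// "parquet.enable.dictionary" is a parquet-mr writer property; Spark
// forwards Hadoop configuration entries to the Parquet writer.
def writeWithoutDictionary(sqlContext: SQLContext, path: String): Unit = {
  sqlContext.sparkContext.hadoopConfiguration
    .set("parquet.enable.dictionary", "false")

  // Any DataFrame write after this point uses the updated writer config.
  sqlContext.sql("SELECT 1 AS id").write.parquet(path)
}
```

Leaving the property at its default (`true`) keeps the automatic behavior described below: the writer starts with dictionary encoding and falls back to plain encoding once a column exceeds the dictionary size limits.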