How to set PYTHONHASHSEED on AWS EMR
Is there any way to set an environment variable on all nodes of an EMR cluster?

I am getting an error regarding the hash seed when trying to use reduceByKey() in Python 3 PySpark. I can see this is a known error, and that the environment variable PYTHONHASHSEED needs to be set to the same value on all nodes of the cluster, but I haven't had any luck with it.

I have tried adding a variable to spark-env through the cluster configuration:

    [
      {
        "Classification": "spark-env",
        "Configurations": [
          {
            "Classification": "export",
            "Properties": {
              "PYSPARK_PYTHON": "/usr/bin/python3",