How to load a Java properties file and use it in Spark?


Question


I want to store Spark arguments such as the input file and output file in a Java properties file and pass that file to the Spark driver. I'm using spark-submit to submit the job but couldn't find a parameter to pass the properties file. Do you have any suggestions?


Answer 1:


Here is one solution I found:

Properties file (mypropsfile.conf). Note: prefix your keys with "spark.", otherwise the properties will be ignored.

spark.myapp.input /input/path
spark.myapp.output /output/path

Launch:

$SPARK_HOME/bin/spark-submit --properties-file  mypropsfile.conf

How to access the values in code:

sc.getConf.get("spark.driver.host")  // localhost
sc.getConf.get("spark.myapp.input")       // /input/path
sc.getConf.get("spark.myapp.output")      // /output/path



Answer 2:


The previous answer's approach has the restriction that every property in the file must start with the spark. prefix, e.g.

spark.myapp.input
spark.myapp.output

Suppose you have a property that doesn't start with spark.:

job.property:

app.name=xyz

$SPARK_HOME/bin/spark-submit --properties-file  job.property

Spark will ignore all properties that don't have the spark. prefix, with the message:

Warning: Ignoring non-spark config property: app.name=test

How I manage a properties file in the application's driver and executors:

${SPARK_HOME}/bin/spark-submit --files job.properties

Java code to access the cached file (job.properties):

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;

// Load the file into a Properties object using the HDFS FileSystem API.
// SparkFiles.get() returns the absolute local path of the file distributed via --files.
String fileName = SparkFiles.get("job.properties");
Configuration hdfsConf = new Configuration();
FileSystem fs = FileSystem.get(hdfsConf);
FSDataInputStream is = fs.open(new Path(fileName));

// Or use plain Java IO instead:
// InputStream is = new FileInputStream(fileName);

Properties prop = new Properties();
// Load the properties
prop.load(is);
is.close();
// Retrieve a property
prop.getProperty("app.name");

If you have environment-specific properties (dev/test/prod), pass a custom APP_ENV Java system property in spark-submit:

${SPARK_HOME}/bin/spark-submit \
  --conf "spark.driver.extraJavaOptions=-DAPP_ENV=dev" \
  --conf "spark.executor.extraJavaOptions=-DAPP_ENV=dev" \
  --properties-file dev.property

Then change the file-name lookup in your driver or executor code:

// Load the environment-specific properties file using the HDFS FileSystem as above
String fileName = SparkFiles.get(System.getProperty("APP_ENV") + ".properties");
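
Putting it together, a minimal sketch of loading the environment-specific file could look like this (the PropertyLoader helper class is hypothetical; it assumes the file was shipped with --files and APP_ENV was set via extraJavaOptions as above):

import java.io.InputStream;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;

// Hypothetical helper, not part of the original answer
public class PropertyLoader {
    public static Properties load() throws Exception {
        // e.g. resolves to "dev.properties" when started with -DAPP_ENV=dev
        String fileName = SparkFiles.get(System.getProperty("APP_ENV") + ".properties");
        FileSystem fs = FileSystem.get(new Configuration());
        try (InputStream is = fs.open(new Path(fileName))) {
            Properties prop = new Properties();
            prop.load(is);
            return prop;
        }
    }
}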


Source: https://stackoverflow.com/questions/31115881/how-to-load-java-properties-file-and-use-in-spark
