apache-spark

PySpark: Getting output layer neuron values for Spark ML Multilayer Perceptron Classifier

Submitted by 半城伤御伤魂 on 2021-02-07 09:07:43
Question: I am doing binary classification using the Spark ML Multilayer Perceptron Classifier.

    mlp = MultilayerPerceptronClassifier(labelCol="evt", featuresCol="features",
                                         layers=[inputneurons, (inputneurons * 2) + 1, 2])

The output layer has two neurons, since this is a binary classification problem. Now I would like to get the values of those two output neurons for each row in the test set, instead of just the prediction column containing either 0 or 1. I could not find anything for this in the API documentation.

Answer 1:
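
If you are on Spark 2.3 or later, the fitted MLP model extends ProbabilisticClassificationModel, so the transformed output already carries rawPrediction and probability columns holding the per-class values of the output layer. A minimal Scala sketch of that idea (the PySpark API mirrors it); inputNeurons, train and test are placeholders:

    // Sketch, assuming Spark >= 2.3, where MultilayerPerceptronClassificationModel
    // exposes rawPrediction and probability columns.
    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

    val mlp = new MultilayerPerceptronClassifier()
      .setLabelCol("evt")
      .setFeaturesCol("features")
      .setLayers(Array(inputNeurons, inputNeurons * 2 + 1, 2))

    val model = mlp.fit(train)

    // "probability" is the softmax over the two output neurons,
    // "rawPrediction" the corresponding raw output-layer values.
    model.transform(test)
      .select("rawPrediction", "probability", "prediction")
      .show(false)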

Connecting oracle database using spark with Kerberos authentication?

Submitted by 本小妞迷上赌 on 2021-02-07 09:01:15
Question: My JDBC client connects to the Oracle database through Krb5LoginModule without any issue, using either a keytab file or a ticket cache. However, for performance reasons, I want to connect to the Oracle database from Spark. With a plain username and password I can connect my Spark application to Oracle using the snippet below:

    Dataset<Row> empDF = sparkSession.read().format("jdbc")
        .option("url", "jdbc:oracle:thin:hr/1234@//127.0.0.1:1522/orcl")
        .option("dbtable", "hr.employees")
        //.option("user", "hr")
        //
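
For the Kerberos case, one approach often suggested (sketched below, untested) is to drop the credentials from the URL and pass the Oracle thin driver's Kerberos connection properties through the JDBC reader options; a valid ticket cache or keytab login must then be visible on the driver and on every executor, since each task opens its own connection. The URL, cache path and property values are assumptions taken from Oracle's JDBC documentation:

    // Untested sketch (Scala; the Java reader chain is analogous). Spark forwards
    // extra reader options to the JDBC driver as connection properties, so the
    // Oracle thin driver's Kerberos settings can be passed this way.
    val empDF = sparkSession.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//127.0.0.1:1522/orcl")
      .option("dbtable", "hr.employees")
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("oracle.net.authentication_services", "(KERBEROS5)")
      .option("oracle.net.kerberos5_cc_name", "/tmp/krb5cc_1000")
      .option("oracle.net.kerberos5_mutual_authentication", "true")
      .load()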

Delete functionality with spark sql dataframe

Submitted by 泪湿孤枕 on 2021-02-07 08:47:36
Question: I have a requirement to load and delete specific records from a Postgres database in my Spark application. For loading, I am using a Spark DataFrame as shown below:

    sqlContext.read.format("jdbc").options(Map(
      "url" -> "postgres url",
      "user" -> "user",
      "password" -> "xxxxxx",
      "table" -> "(select * from employee where emp_id > 1000) as filtered_emp")).load()

To delete the data, I am issuing SQL directly instead of using DataFrames:

    delete from employee where emp_id > 1000

The question is: is there
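
As far as the delete goes, Spark's DataFrame API has no delete operation for JDBC sources, so issuing the DELETE through plain JDBC from the driver is a reasonable pattern. A minimal sketch; the URL and credentials are placeholders matching the read options above:

    // Sketch: run the DELETE with a plain JDBC statement from the driver,
    // since DataFrames cannot express deletes against a JDBC source.
    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:postgresql://host:5432/db", "user", "xxxxxx")
    try {
      val deleted = conn.createStatement()
        .executeUpdate("delete from employee where emp_id > 1000")
      println(s"Deleted $deleted rows")
    } finally {
      conn.close()
    }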

Iterate each row in a dataframe, store it in val and pass as parameter to Spark SQL query

Submitted by 浪尽此生 on 2021-02-07 08:44:35
Question: I am trying to fetch the rows of a lookup table (3 rows, 3 columns), iterate over them row by row, and pass the values from each row as parameters to a Spark SQL query.

    DB | TBL   | COL
    ----------------
    db | txn   | ID
    db | sales | ID
    db | fee   | ID

I tried this in the spark-shell for one row and it worked, but I am finding it difficult to iterate over the rows.

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val db_name: String = "db"
    val tbl_name: String = "transaction"
    val unique_col: String = "transaction_number"
    val
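
For a lookup table this small, one option (sketched below) is to collect() it to the driver and loop over the rows, substituting each row's values into the query text; lookup_db.lookup_tbl and the aggregate query are placeholders:

    // Sketch: collect() the small lookup table and build one Spark SQL query per row.
    val lookup = sqlContext.table("lookup_db.lookup_tbl")   // columns: DB, TBL, COL

    lookup.collect().foreach { row =>
      val db  = row.getAs[String]("DB")
      val tbl = row.getAs[String]("TBL")
      val col = row.getAs[String]("COL")

      // Substitute the row's values into the query text.
      val result = sqlContext.sql(s"SELECT COUNT(DISTINCT $col) AS cnt FROM $db.$tbl")
      result.show()
    }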

Spark CSV 2.1 File Names

Submitted by 假装没事ソ on 2021-02-07 08:32:28
Question: I'm trying to save a DataFrame to CSV using the new Spark 2.1 CSV writer:

    df.select(myColumns: _*).write
      .mode(SaveMode.Overwrite)
      .option("header", "true")
      .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
      .csv(absolutePath)

Everything works fine and I don't mind having the part-000XX prefix, but now it seems a UUID has been added as a suffix, i.e. I get part-00032-10309cf5-a373-4233-8b28-9e10ed279d2b.csv.gz where I want part-00032.csv.gz. Does anyone know how I can remove this suffix and stay only with
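
Spark itself does not let you choose the part-file names, so one workaround (sketched below) is to rename the written files with the Hadoop FileSystem API once the write finishes; df and absolutePath are the same names as above:

    // Sketch: strip the UUID block from each part file, turning
    // "part-00032-<uuid>.csv.gz" into "part-00032.csv.gz".
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(df.sparkSession.sparkContext.hadoopConfiguration)
    fs.listStatus(new Path(absolutePath))
      .map(_.getPath)
      .filter(_.getName.startsWith("part-"))
      .foreach { p =>
        val cleaned = p.getName.replaceAll("^(part-\\d+)-[^.]+", "$1")
        fs.rename(p, new Path(p.getParent, cleaned))
      }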

Mocking SparkSession for unit testing

Submitted by 本小妞迷上赌 on 2021-02-07 08:15:24
Question: I have a method in my Spark application that loads data from a MySQL database. The method looks something like this:

    trait DataManager {
      val session: SparkSession

      def loadFromDatabase(input: Input): DataFrame = {
        session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0", input.columnName,
          0L, input.maxId, input.parallelism, input.connectionProperties)
      }
    }

The method does nothing other than call the jdbc method and load data from the database. How can I test this method? The
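
One pragmatic alternative to mocking SparkSession, sketched below, is to instantiate the trait in the test with a local-mode session and point the JDBC options at an embedded H2 database. ScalaTest, the H2 driver on the test classpath, and an Input case class with the fields used above are all assumptions here:

    // Sketch: exercise loadFromDatabase against a real local SparkSession and an
    // in-memory H2 database instead of a mocked SparkSession. The Input case class
    // and its field names are assumed from the snippet above.
    import java.sql.DriverManager
    import java.util.Properties

    import org.apache.spark.sql.SparkSession
    import org.scalatest.funsuite.AnyFunSuite

    class DataManagerSuite extends AnyFunSuite {

      test("loadFromDatabase returns the rows behind the select query") {
        val url = "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1"

        // Seed the embedded database with plain JDBC.
        val conn = DriverManager.getConnection(url)
        conn.createStatement().execute("CREATE TABLE users (id BIGINT, name VARCHAR(50))")
        conn.createStatement().execute("INSERT INTO users VALUES (1, 'a'), (2, 'b')")
        conn.close()

        val manager = new DataManager {
          val session: SparkSession =
            SparkSession.builder().master("local[2]").appName("test").getOrCreate()
        }

        val input = Input(
          jdbcUrl = url,
          selectQuery = "SELECT id, name FROM users",
          columnName = "id",
          maxId = 2L,
          parallelism = 1,
          connectionProperties = new Properties())

        assert(manager.loadFromDatabase(input).count() == 2)
      }
    }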
