apache-spark-sql

Spark Dataset : data transformation

旧城冷巷雨未停 submitted on 2021-02-07 10:19:06
Question: I have a Spark Dataset of the format:

+--------------+--------+-----+
|name          |type    |cost |
+--------------+--------+-----+
|AAAAAAAAAAAAAA|XXXXX   |0.24 |
|AAAAAAAAAAAAAA|YYYYY   |1.14 |
|BBBBBBBBBBBBBB|XXXXX   |0.78 |
|BBBBBBBBBBBBBB|YYYYY   |2.67 |
|BBBBBBBBBBBBBB|ZZZZZ   |0.15 |
|CCCCCCCCCCCCCC|XXXXX   |1.86 |
|CCCCCCCCCCCCCC|YYYYY   |1.50 |
|CCCCCCCCCCCCCC|ZZZZZ   |1.00 |
+--------------+--------+-----+

I want to transform this into an object of type:

public class CostPerName {
    private String name;
    private Map
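
One possible approach (a sketch in Scala rather than Java, since the target class above is cut off): group by name and build the type-to-cost map with collect_list and map_from_arrays (available since Spark 2.4). The DataFrame below only recreates a slice of the question's sample data; mapping the result onto a Java CostPerName bean would be a separate step.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Recreate part of the sample data from the question.
val df = Seq(
  ("AAAAAAAAAAAAAA", "XXXXX", 0.24),
  ("AAAAAAAAAAAAAA", "YYYYY", 1.14),
  ("BBBBBBBBBBBBBB", "XXXXX", 0.78)
).toDF("name", "type", "cost")

// One row per name with a type -> cost map column.
val costPerName = df.groupBy("name")
  .agg(map_from_arrays(collect_list($"type"), collect_list($"cost")).as("costs"))

costPerName.show(false)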

Connecting oracle database using spark with Kerberos authentication?

本小妞迷上赌 submitted on 2021-02-07 09:01:15
Question: My JDBC connection to the Oracle database works without any issue using Krb5LoginModule, supplying either a keytab file or a ticket cache. However, for performance reasons I want to connect to the Oracle database through Spark. With a plain username and password, I am able to connect my Spark application to Oracle using the snippet below:

Dataset<Row> empDF = sparkSession.read().format("jdbc")
    .option("url", "jdbc:oracle:thin:hr/1234@//127.0.0.1:1522/orcl")
    .option("dbtable", "hr.employees")
    //.option("user", "hr")
    //
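
A hedged sketch of the Kerberos variant, assuming the Oracle thin driver's Kerberos connection properties (oracle.net.authentication_services, oracle.net.kerberos5_cc_name) and that Spark forwards extra JDBC options to the driver as connection properties; the ticket-cache path is a placeholder, and the krb5/JAAS configuration also has to be visible on the executors. Whether this works depends on the driver and Spark version, so treat it as a starting point rather than a confirmed solution.

// Scala sketch; the original snippet above is Java, but the options are the same.
import org.apache.spark.sql.{Dataset, Row, SparkSession}

val sparkSession = SparkSession.builder.getOrCreate()

val empDF: Dataset[Row] = sparkSession.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//127.0.0.1:1522/orcl")
  .option("dbtable", "hr.employees")
  .option("oracle.net.authentication_services", "(KERBEROS5)")  // Oracle thin driver Kerberos property
  .option("oracle.net.kerberos5_cc_name", "/tmp/krb5cc_1000")   // placeholder ticket cache
  .load()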

Delete functionality with spark sql dataframe

泪湿孤枕 submitted on 2021-02-07 08:47:36
Question: I have a requirement to load and delete specific records from a Postgres database for my Spark application. For loading, I am using a Spark DataFrame in the format below:

sqlContext.read.format("jdbc").options(Map(
  "url" -> "postgres url",
  "user" -> "user",
  "password" -> "xxxxxx",
  "table" -> "(select * from employee where emp_id > 1000) as filtered_emp"
)).load()

To delete the data, I am writing SQL directly instead of using DataFrames:

delete from employee where emp_id > 1000

The question is, is there
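
The JDBC DataFrame source only reads and writes whole result sets; it does not issue DELETE statements. One common pattern, sketched below with placeholder connection details, is to run the delete through a plain JDBC statement from the driver and keep Spark for the load:

import java.sql.DriverManager

// Placeholder URL/credentials standing in for the question's Postgres setup.
val conn = DriverManager.getConnection("jdbc:postgresql://host:5432/db", "user", "xxxxxx")
try {
  val stmt = conn.createStatement()
  stmt.executeUpdate("delete from employee where emp_id > 1000")
  stmt.close()
} finally {
  conn.close()
}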

Iterate each row in a dataframe, store it in val and pass as parameter to Spark SQL query

浪尽此生 submitted on 2021-02-07 08:44:35
Question: I am trying to fetch rows from a lookup table (3 rows and 3 columns), iterate over it row by row, and pass the values in each row to a Spark SQL query as parameters.

DB | TBL   | COL
----------------
db | txn   | ID
db | sales | ID
db | fee   | ID

I tried this in spark-shell for one row and it worked, but I am finding it difficult to iterate over the rows.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val db_name: String = "db"
val tbl_name: String = "transaction"
val unique_col: String = "transaction_number"
val
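
Since the lookup table has only three rows, one straightforward sketch is to collect() it to the driver and build one query string per row. Here lookupDF stands in for however the lookup table is loaded, and the COUNT(DISTINCT ...) aggregate is only an example of a parameterized query:

// Assumes lookupDF holds the 3x3 lookup table with columns DB, TBL and COL.
val queries = lookupDF.collect().map { row =>
  val db  = row.getAs[String]("DB")
  val tbl = row.getAs[String]("TBL")
  val col = row.getAs[String]("COL")
  s"SELECT COUNT(DISTINCT $col) AS cnt FROM $db.$tbl"   // example query only
}
queries.foreach(q => sqlContext.sql(q).show())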

How to delete rows in a table created from a Spark dataframe?

不羁岁月 submitted on 2021-02-07 07:01:47
Question: Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:

pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\n DELETE a.* FROM adsquare a \n-----------------------^^^\n"

This is the script that I'm using:

sq = SparkSession.builder.config('spark.rpc.message.maxSize', '1536').config("spark.sql.shuffle.partitions", str(shuffle_value)).getOrCreate()
adsquare = sq.read.csv(f, schema
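
Spark SQL cannot DELETE from a temporary view backed by a DataFrame; the usual workaround is to keep only the rows you want with a filter (or an anti-join) and re-register the view. A sketch of that idea in Scala (the question uses PySpark, but the approach is the same; the column name, predicate and path are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Stand-in for the CSV read in the question.
val adsquare = spark.read.option("header", "true").csv("/path/to/file.csv")

// Instead of DELETE, keep the rows you want and re-register the temp view.
val kept = adsquare.filter($"some_column" =!= "value_to_remove")
kept.createOrReplaceTempView("adsquare")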

How to convert map to dataframe?

冷暖自知 submitted on 2021-02-07 06:52:43
Question: m is a map as follows:

scala> m
res119: scala.collection.mutable.Map[Any,Any] = Map(A -> 0.11164610291904906, B -> 0.11856755943424617, C -> 0.1023171832681312)

I want to get:

name  score
A     0.11164610291904906
B     0.11856755943424617
C     0.1023171832681312

How do I get the final dataframe?

Answer 1: First convert it to a Seq, then you can use the toDF() function.

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._
val m = Map("A" -> 0.11164610291904906, "B" -> 0.11856755943424617, "C" -
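
A self-contained sketch of the answer's approach: give the map concrete key and value types (String and Double rather than Any, so the implicit encoder applies), convert it to a Seq of tuples, then call toDF with column names:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val m = Map("A" -> 0.11164610291904906, "B" -> 0.11856755943424617, "C" -> 0.1023171832681312)
// Map -> Seq of (key, value) tuples -> DataFrame with named columns.
val df = m.toSeq.toDF("name", "score")
df.show()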

Spark & Scala: saveAsTextFile() exception

橙三吉。 submitted on 2021-02-07 03:31:45
Question: I'm new to Spark and Scala, and I got an exception after calling saveAsTextFile(). I hope someone can help. Here is my input.txt:

Hello World, I'm a programmer
Hello World, I'm a programmer

This is the info after running "spark-shell" on CMD:

C:\Users\Nhan Tran>spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://DLap:4040
Spark context available as 'sc' (master = local[
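
The question is cut off before the exception itself, but on Windows a frequent cause of saveAsTextFile() failures is a missing HADOOP_HOME/winutils.exe setup rather than the Spark code. A minimal sketch of the kind of call involved, with placeholder paths and a guessed word-count transformation, run inside spark-shell where sc already exists:

// Placeholder paths and logic; adjust to the actual program.
val lines = sc.textFile("C:/tmp/input.txt")
val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("C:/tmp/output")   // often the failing call on Windows without winutils.exe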