apache-spark-sql

Spark Dataset : data transformation

旧城冷巷雨未停 submitted on 2021-02-07 10:19:06
Question: I have a Spark Dataset of the format:

+--------------+--------+-----+
|name          |type    |cost |
+--------------+--------+-----+
|AAAAAAAAAAAAAA|XXXXX   |0.24 |
|AAAAAAAAAAAAAA|YYYYY   |1.14 |
|BBBBBBBBBBBBBB|XXXXX   |0.78 |
|BBBBBBBBBBBBBB|YYYYY   |2.67 |
|BBBBBBBBBBBBBB|ZZZZZ   |0.15 |
|CCCCCCCCCCCCCC|XXXXX   |1.86 |
|CCCCCCCCCCCCCC|YYYYY   |1.50 |
|CCCCCCCCCCCCCC|ZZZZZ   |1.00 |
+--------------+--------+-----+

I want to transform this into an object of type:

public class CostPerName {
    private String name;
    private Map
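
One possible approach (a sketch in Scala rather than Java, since the target class above is cut off): group by name and build the type-to-cost map with collect_list and map_from_arrays (available since Spark 2.4). The DataFrame below only recreates a slice of the question's sample data; mapping the result onto a Java CostPerName bean would be a separate step.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Recreate part of the sample data from the question.
val df = Seq(
  ("AAAAAAAAAAAAAA", "XXXXX", 0.24),
  ("AAAAAAAAAAAAAA", "YYYYY", 1.14),
  ("BBBBBBBBBBBBBB", "XXXXX", 0.78)
).toDF("name", "type", "cost")

// One row per name with a type -> cost map column.
val costPerName = df.groupBy("name")
  .agg(map_from_arrays(collect_list($"type"), collect_list($"cost")).as("costs"))

costPerName.show(false)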

Connecting oracle database using spark with Kerberos authentication?

本小妞迷上赌 submitted on 2021-02-07 09:01:15
Question: My JDBC connection to the Oracle database works without any issue using Krb5LoginModule, supplying either a keytab file or a ticket cache. However, for performance reasons I want to connect to the Oracle database through Spark. With a plain username and password, I am able to connect my Spark application to Oracle using the snippet below:

Dataset<Row> empDF = sparkSession.read().format("jdbc")
    .option("url", "jdbc:oracle:thin:hr/1234@//127.0.0.1:1522/orcl")
    .option("dbtable", "hr.employees")
    //.option("user", "hr")
    //
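
A hedged sketch of the Kerberos variant, assuming the Oracle thin driver's Kerberos connection properties (oracle.net.authentication_services, oracle.net.kerberos5_cc_name) and that Spark forwards extra JDBC options to the driver as connection properties; the ticket-cache path is a placeholder, and the krb5/JAAS configuration also has to be visible on the executors. Whether this works depends on the driver and Spark version, so treat it as a starting point rather than a confirmed solution.

// Scala sketch; the original snippet above is Java, but the options are the same.
import org.apache.spark.sql.{Dataset, Row, SparkSession}

val sparkSession = SparkSession.builder.getOrCreate()

val empDF: Dataset[Row] = sparkSession.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//127.0.0.1:1522/orcl")
  .option("dbtable", "hr.employees")
  .option("oracle.net.authentication_services", "(KERBEROS5)")  // Oracle thin driver Kerberos property
  .option("oracle.net.kerberos5_cc_name", "/tmp/krb5cc_1000")   // placeholder ticket cache
  .load()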

Delete functionality with spark sql dataframe

泪湿孤枕 submitted on 2021-02-07 08:47:36
Question: I have a requirement to load and delete specific records from a Postgres database for my Spark application. For loading, I am using a Spark DataFrame in the format below:

sqlContext.read.format("jdbc").options(Map(
  "url" -> "postgres url",
  "user" -> "user",
  "password" -> "xxxxxx",
  "table" -> "(select * from employee where emp_id > 1000) as filtered_emp"
)).load()

To delete the data, I am writing SQL directly instead of using DataFrames:

delete from employee where emp_id > 1000

The question is, is there
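
The JDBC DataFrame source only reads and writes whole result sets; it does not issue DELETE statements. One common pattern, sketched below with placeholder connection details, is to run the delete through a plain JDBC statement from the driver and keep Spark for the load:

import java.sql.DriverManager

// Placeholder URL/credentials standing in for the question's Postgres setup.
val conn = DriverManager.getConnection("jdbc:postgresql://host:5432/db", "user", "xxxxxx")
try {
  val stmt = conn.createStatement()
  stmt.executeUpdate("delete from employee where emp_id > 1000")
  stmt.close()
} finally {
  conn.close()
}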

Iterate each row in a dataframe, store it in val and pass as parameter to Spark SQL query

浪尽此生 submitted on 2021-02-07 08:44:35
Question: I am trying to fetch rows from a lookup table (3 rows and 3 columns), iterate over it row by row, and pass the values in each row to a Spark SQL query as parameters.

DB | TBL   | COL
----------------
db | txn   | ID
db | sales | ID
db | fee   | ID

I tried this in spark-shell for one row and it worked, but I am finding it difficult to iterate over the rows.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val db_name: String = "db"
val tbl_name: String = "transaction"
val unique_col: String = "transaction_number"
val
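
Since the lookup table has only three rows, one straightforward sketch is to collect() it to the driver and build one query string per row. Here lookupDF stands in for however the lookup table is loaded, and the COUNT(DISTINCT ...) aggregate is only an example of a parameterized query:

// Assumes lookupDF holds the 3x3 lookup table with columns DB, TBL and COL.
val queries = lookupDF.collect().map { row =>
  val db  = row.getAs[String]("DB")
  val tbl = row.getAs[String]("TBL")
  val col = row.getAs[String]("COL")
  s"SELECT COUNT(DISTINCT $col) AS cnt FROM $db.$tbl"   // example query only
}
queries.foreach(q => sqlContext.sql(q).show())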

How to delete rows in a table created from a Spark dataframe?

不羁岁月 submitted on 2021-02-07 07:01:47
Question: Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:

pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\n DELETE a.* FROM adsquare a \n-----------------------^^^\n"

This is the script that I'm using:

sq = SparkSession.builder.config('spark.rpc.message.maxSize', '1536').config("spark.sql.shuffle.partitions", str(shuffle_value)).getOrCreate()
adsquare = sq.read.csv(f, schema
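
Spark SQL cannot DELETE from a temporary view backed by a DataFrame; the usual workaround is to keep only the rows you want with a filter (or an anti-join) and re-register the view. A sketch of that idea in Scala (the question uses PySpark, but the approach is the same; the column name, predicate and path are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Stand-in for the CSV read in the question.
val adsquare = spark.read.option("header", "true").csv("/path/to/file.csv")

// Instead of DELETE, keep the rows you want and re-register the temp view.
val kept = adsquare.filter($"some_column" =!= "value_to_remove")
kept.createOrReplaceTempView("adsquare")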

How to convert map to dataframe?

冷暖自知 submitted on 2021-02-07 06:52:43
Question: m is a map as follows:

scala> m
res119: scala.collection.mutable.Map[Any,Any] = Map(A -> 0.11164610291904906, B -> 0.11856755943424617, C -> 0.1023171832681312)

I want to get:

name  score
A     0.11164610291904906
B     0.11856755943424617
C     0.1023171832681312

How do I get the final dataframe?

Answer 1: First convert it to a Seq, then you can use the toDF() function.

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._
val m = Map("A" -> 0.11164610291904906, "B" -> 0.11856755943424617, "C" -
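
A self-contained sketch of the answer's approach: give the map concrete key and value types (String and Double rather than Any, so the implicit encoder applies), convert it to a Seq of tuples, then call toDF with column names:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val m = Map("A" -> 0.11164610291904906, "B" -> 0.11856755943424617, "C" -> 0.1023171832681312)
// Map -> Seq of (key, value) tuples -> DataFrame with named columns.
val df = m.toSeq.toDF("name", "score")
df.show()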

Spark & Scala: saveAsTextFile() exception

橙三吉。 submitted on 2021-02-07 03:31:45
Question: I'm new to Spark and Scala, and I got an exception after calling saveAsTextFile(). I hope someone can help. Here is my input.txt:

Hello World, I'm a programmer
Hello World, I'm a programmer

This is the info after running "spark-shell" on CMD:

C:\Users\Nhan Tran>spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://DLap:4040
Spark context available as 'sc' (master = local[
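
The question is cut off before the exception itself, but on Windows a frequent cause of saveAsTextFile() failures is a missing HADOOP_HOME/winutils.exe setup rather than the Spark code. A minimal sketch of the kind of call involved, with placeholder paths and a guessed word-count transformation, run inside spark-shell where sc already exists:

// Placeholder paths and logic; adjust to the actual program.
val lines = sc.textFile("C:/tmp/input.txt")
val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("C:/tmp/output")   // often the failing call on Windows without winutils.exe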