scala

How to detect Parquet files?

家住魔仙堡 submitted on 2021-02-20 02:13:40
问题 (Question): I have a script I am writing that will take either plain text or Parquet files. If the input is a Parquet file, the script reads it in using a DataFrame and does a few other things. On the cluster I am working on, the easiest first solution was to check whether the file's extension was .parquet:

    if (parquetD(1) == "parquet") {
      if (args.length != 2) {
        println(usage2)
        System.exit(1)
        println(args)
      }
    }

and if so, read it in with the DataFrame. The problem is that I have a bunch of files some people have created with no …
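
Since an extension check breaks on files without an extension, a more robust test is to read the file's magic bytes: Parquet files begin (and end) with the ASCII sequence PAR1. Below is a minimal sketch of that idea, assuming the files sit on a Hadoop-compatible filesystem; the function name looksLikeParquet and the pathStr parameter are hypothetical, not from the original post.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Returns true if the file starts with the Parquet magic bytes "PAR1".
    def looksLikeParquet(pathStr: String): Boolean = {
      val path = new Path(pathStr)
      val fs   = FileSystem.get(path.toUri, new Configuration())
      val in   = fs.open(path)
      try {
        val magic = new Array[Byte](4)
        in.readFully(0, magic)                    // positioned read of the first 4 bytes
        new String(magic, "US-ASCII") == "PAR1"
      } catch {
        case _: Exception => false                // too short or unreadable: not Parquet
      } finally {
        in.close()
      }
    }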

How to skip the first and last line of a .dat file and make it into a DataFrame using Scala in Databricks

老子叫甜甜 submitted on 2021-02-19 08:59:30
问题 (Question): The input file looks like this:

    H|*|D|*|PA|*|BJ|*|S|*|2019.05.27 08:54:24|##|
    H|*|AP_ATTR_ID|*|AP_ID|*|OPER_ID|*|ATTR_ID|*|ATTR_GROUP|*|LST_UPD_USR|*|LST_UPD_TSTMP|##|
    779045|*|Sar|*|SUPERVISOR HIERARCHY|*|Supervisor|*|2|*|128|*|2019.05.14 16:48:16|##|
    779048|*|KK|*|SUPERVISOR HIERARCHY|*|Supervisor|*|2|*|116|*|2019.05.14 16:59:02|##|
    779054|*|Nisha - A|*|EXACT|*|CustomColumnRow120|*|2|*|1165|*|2019.05.15 12:11:48|##|
    T|*||*|2019.05.27 08:54:28|##|

The file name is PA.dat. I need to skip the first line and also the last line of the …
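
One common way to do this (a sketch, not the accepted answer) is to index the lines with zipWithIndex, drop the first and last, and split the rest on the |*| delimiter. It assumes the spark session that Databricks provides; the file path and column names are placeholders.

    import spark.implicits._

    val raw   = spark.sparkContext.textFile("dbfs:/FileStore/tables/PA.dat")   // placeholder path
    val total = raw.count()

    val df = raw
      .zipWithIndex()
      .filter { case (_, idx) => idx != 0 && idx != total - 1 }   // skip first and last line
      .map { case (line, _) => line.stripSuffix("|##|").split("\\|\\*\\|", -1) }
      .map(f => (f(0), f(1), f(2), f(3), f(4), f(5), f(6)))
      .toDF("AP_ATTR_ID", "AP_ID", "OPER_ID", "ATTR_ID", "ATTR_GROUP", "LST_UPD_USR", "LST_UPD_TSTMP")

    // Note: the sample's second line (the H|*|AP_ATTR_ID|... header row) survives this filter;
    // it can be dropped the same way, or by filtering out lines that start with "H|*|".
    df.show(false)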

Error with Spark Row.fromSeq for a text file

久未见 submitted on 2021-02-19 08:25:07
问题 (Question):

    import org.apache.log4j.{Level, Logger}
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark._
    import org.apache.spark.sql.types._
    import org.apache.spark.sql._

    object fixedLength {
      def main(args: Array[String]) {
        // Build a Row of four fixed-width string columns from one input line.
        def getRow(x: String): Row = {
          val columnArray = new Array[String](4)
          columnArray(0) = x.substring(0, 3)
          columnArray(1) = x.substring(3, 13)
          columnArray(2) = x.substring(13, 18)
          columnArray(3) = x.substring(18, 22)
          Row.fromSeq(columnArray)
        }
        …
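
The snippet is cut off here. For context, this is how Rows built with Row.fromSeq are usually paired with an explicit schema (a hedged sketch continuing inside main and reusing the imports and getRow above; the input path, column names, and all-String schema are assumptions, not the poster's original continuation). A common cause of errors at this point is a schema whose field types don't match the values in the Row: getRow produces only Strings, so every field must be StringType unless the values are cast first.

    val spark = SparkSession.builder().appName("fixedLength").master("local[*]").getOrCreate()

    val schema = StructType(Seq(
      StructField("col1", StringType, nullable = true),
      StructField("col2", StringType, nullable = true),
      StructField("col3", StringType, nullable = true),
      StructField("col4", StringType, nullable = true)
    ))

    val rowRDD = spark.sparkContext.textFile("/path/to/fixed_width.txt").map(getRow)
    val df     = spark.createDataFrame(rowRDD, schema)
    df.show(false)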

Scala tail recursive method has a divide and remainder error

旧街凉风 submitted on 2021-02-19 07:47:28
问题 (Question): I'm currently computing the binomial coefficient of two natural numbers by writing a tail-recursive method in Scala, but something is wrong with how I divide the numbers: integer division by k, the way I do it, leaves a non-zero remainder and therefore introduces rounding errors. Could anyone help me figure out how to fix it?

    def binom(n: Int, k: Int): Int = {
      require(0 <= k && k <= n)
      def binomtail(n: Int, k: Int, ac: Int): Int = {
        if (n == k || k == 0) ac
        else binomtail(n - 1, k - 1, …
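
The code is cut off here. One standard way around the rounding problem (a hedged sketch, not the original poster's solution) is to multiply before dividing, so every intermediate value is itself a binomial coefficient and each division is exact:

    def binom(n: Int, k: Int): Int = {
      require(0 <= k && k <= n)
      val m = math.min(k, n - k)                 // C(n, k) == C(n, n - k)
      @annotation.tailrec
      def loop(i: Int, acc: Int): Int =
        if (i > m) acc
        else loop(i + 1, acc * (n - m + i) / i)  // exact: acc * (n - m + i) is divisible by i
      loop(1, 1)
    }

    // binom(4, 2) == 6, binom(5, 2) == 10; Int overflow is still possible for large n.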

How to register a Java Spark UDF in the Spark shell?

三世轮回 submitted on 2021-02-19 07:35:34
问题 (Question): Below is my Java UDF code:

    package com.udf;

    import org.apache.spark.sql.api.java.UDF1;

    public class SparkUDF implements UDF1<String, String> {
        @Override
        public String call(String arg) throws Exception {
            if (validateString(arg))
                return arg;
            return "INVALID";
        }

        public static boolean validateString(String arg) {
            // short-circuit || so a null arg is never dereferenced
            if (arg == null || arg.length() != 11)
                return false;
            else
                return true;
        }
    }

I am building the jar with this class as SparkUdf-1.0-SNAPSHOT.jar. I have a table named sample in Hive …
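
A hedged sketch of how such a UDF1 is commonly registered from the Scala spark-shell (the jar path, the SQL name validate, and the column name are placeholders, not from the original post):

    // Start the shell with the jar on the classpath:
    //   spark-shell --jars /path/to/SparkUdf-1.0-SNAPSHOT.jar

    import org.apache.spark.sql.types.StringType

    // Register the Java UDF1 under a name that SQL can call.
    spark.udf.register("validate", new com.udf.SparkUDF(), StringType)

    // Use it against the Hive table mentioned in the question.
    spark.sql("SELECT validate(some_column) FROM sample").show()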

Set current project to default-6c6f02 (in build file:/home/user_name/Videos/ [duplicate]

可紊 submitted on 2021-02-19 07:21:37
问题 (Question): This question already has an answer here: Why is sbt current project name "default" in 0.10? (1 answer). Closed 6 years ago.

What does it mean when running the sbt command on the command line for a Scala project prints

    Set current project to default-6c6f02 (in build file:/home/user_name/Videos/

What should I set after this statement?

回答1 (Answer 1): This happens if you call the sbt command in a folder where you don't have a build.sbt or project/Build.scala; as I understand it, in your case that is /home/user_name/Videos/. And because …
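
The answer is cut off above; for context, giving the project an explicit name usually means creating a build.sbt in the directory you run sbt from. A minimal sketch with placeholder values:

    // build.sbt in the project root (values are placeholders)
    name := "my-project"

    version := "0.1.0"

    scalaVersion := "2.11.8"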

How to set a type parameter bound in Scala to make a generic function for numerics?

非 Y 不嫁゛ submitted on 2021-02-19 07:12:58
问题 (Question): I want to make a sum function that works with all Numeric types. This works:

    object session {
      def mapReduce[A](f: A => A, combine: (A, A) => A, zero: A, inc: A)
                      (a: A, b: A)
                      (implicit num: Numeric[A]): A = {
        def loop(acc: A, a: A) =
          if (num.gt(a, b)) acc
          else combine(f(a), mapReduce(f, combine, zero, inc)(num.plus(a, inc), b))
        loop(zero, a)
      }

      def sum(f: Int => Int)(a: Int, b: Int): Int = {
        mapReduce(f, (x: Int, y: Int) => x + y, 0, 1)(a, b)
      }

      sum(x => x)(3, 4)                         //> res0: Int = 7

      def product(f: …
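
The snippet is cut off at product. For the sum part of the question, one way to make it generic over Numeric as well (a hedged sketch, not the accepted answer) is shown below; it reuses the mapReduce above inside object session, and the name sumGeneric is hypothetical, chosen to avoid clashing with the existing sum:

      // Generic over any Numeric type: the instance supplies zero, one, plus, and the ordering.
      def sumGeneric[A](f: A => A)(a: A, b: A)(implicit num: Numeric[A]): A = {
        import num._
        mapReduce(f, (x: A, y: A) => x + y, zero, one)(a, b)
      }

      // sumGeneric((x: Int) => x)(3, 4)        //> 7
      // sumGeneric((x: Double) => x)(3.0, 4.0) //> 7.0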

Enable macro paradise to expand macro annotations

孤人 submitted on 2021-02-19 06:42:38
问题 (Question): I wanted to try some examples with annotations in macro paradise, and I am getting the error described in this example. I have linked the projects, and the other Scala macros (the ones without annotations) work very well. I have included the library paradise_2.11.6-2.1.0-M5 (in both projects as well :( ). I just do not understand what is meant by *to enable* it. By the way, I am using Scala IDE in Eclipse.

回答1 (Answer 1): By enable, I meant adding it as a compiler plugin, as e.g. in https://github.com …
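
The link is cut off; for sbt builds, enabling it typically means adding the plugin in build.sbt, as in the sketch below, which uses the milestone version mentioned in the question. In Scala IDE the equivalent is passing the paradise jar to scalac via the -Xplugin option in the project's compiler settings.

    // build.sbt: load macro paradise as a compiler plugin so macro annotations expand
    addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0-M5" cross CrossVersion.full)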