apache-spark-1.2

Run Spark with built-in Hive and configure a remote PostgreSQL database for the Hive Metastore

落花浮王杯 submitted on 2019-12-07 16:04:02
Question: I am new to Spark and Hive. I am running Spark v1.0.1 with built-in Hive (Spark built with SPARK_HIVE=true sbt/sbt assembly/assembly). I also configured Hive to store its Metastore in a PostgreSQL database (following the instructions at http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html). I was able to configure a standalone Hive (not the one built into Spark) to use PostgreSQL, but I don't know how to get this working with the Hive built into Spark. In the instructions, I see that I need to put

Run Spark with built-in Hive and configure a remote PostgreSQL database for the Hive Metastore

你。 submitted on 2019-12-05 19:26:14
I am new to Spark and Hive. I am running Spark v1.0.1 with built-in Hive (Spark built with SPARK_HIVE=true sbt/sbt assembly/assembly). I also configured Hive to store its Metastore in a PostgreSQL database (following the instructions at http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html). I was able to configure a standalone Hive (not the one built into Spark) to use PostgreSQL, but I don't know how to get this working with the Hive built into Spark. In the instructions, I see that I need to put or link postgresql-jdbc.jar into hive/lib so that Hive can include the postgresql-jdbc driver when it runs $

How to encode categorical features in Apache Spark

妖精的绣舞 submitted on 2019-11-30 07:04:02
Question: I have a data set from which I want to build a classification model. Each row has the following form:
user1,class1,product1
user1,class1,product2
user1,class1,product5
user2,class1,product2
user2,class1,product5
user3,class2,product1
There are about 1M users, 2 classes, and 1M products. What I would like to do next is create sparse vectors (something MLlib already supports), but in order to apply that function I first have to create the dense vectors (with the 0s). In other

How to encode categorical features in Apache Spark

点点圈 submitted on 2019-11-29 00:24:23
I have a data set from which I want to build a classification model. Each row has the following form:
user1,class1,product1
user1,class1,product2
user1,class1,product5
user2,class1,product2
user2,class1,product5
user3,class2,product1
There are about 1M users, 2 classes, and 1M products. What I would like to do next is create sparse vectors (something MLlib already supports), but in order to apply that function I first have to create the dense vectors (with the 0s). In other words, I have to binarize my data. What's the easiest (or most elegant) way of doing that? Given that I
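
As a rough illustration of the binarization the question describes, the sketch below maps each product to an integer index and stores, for every user, only the indices of the products that occur, so the zero-filled dense vector is never built. It is plain Python with no Spark dependency. The helper names build_index and encode_sparse are hypothetical, and it assumes each user belongs to exactly one class, as in the sample rows.

```python
# Illustrative sketch only: plain Python, not the MLlib API.
# Sample rows copied from the question: (user, class, product).
rows = [
    ("user1", "class1", "product1"),
    ("user1", "class1", "product2"),
    ("user1", "class1", "product5"),
    ("user2", "class1", "product2"),
    ("user2", "class1", "product5"),
    ("user3", "class2", "product1"),
]


def build_index(values):
    """Map each distinct value to an integer index, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def encode_sparse(rows):
    """Return (num_features, {user: (label, indices, values)}).

    Assumption: every user has a single class, so the class seen on the
    user's first row is used as the label.
    """
    product_index = build_index(p for _, _, p in rows)
    class_index = build_index(c for _, c, _ in rows)

    per_user = {}
    for user, cls, product in rows:
        # Create the (label, active-index set) entry on first sight of a user.
        _, active = per_user.setdefault(user, (class_index[cls], set()))
        active.add(product_index[product])

    encoded = {
        user: (label, sorted(active), [1.0] * len(active))
        for user, (label, active) in per_user.items()
    }
    return len(product_index), encoded


num_features, encoded = encode_sparse(rows)
for user in sorted(encoded):
    print(user, encoded[user])
```

Under these assumptions, each (indices, values) pair could then be passed to MLlib, for example through Vectors.sparse(num_features, indices, values) from pyspark.mllib.linalg. With roughly 1M users and 1M products, the index would probably be built with distributed operations rather than in a single process.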