Hadoop

Select if table exists in Apache Hive

Posted by 坚强是说给别人听的谎言 on 2021-02-07 14:49:24
Question: I have a Hive query of the form

    select . . . from table1 left join (select . . . from table2) on (some_condition)

table2 might not be present, depending on the environment, so I would like to perform the join only when table2 exists and otherwise ignore the subquery. The query below returns the table name if the table exists:

    show tables in {DB_NAME} like '{table_name}'

but I don't know how to integrate this into my query so that it selects from table2 only if it exists. Is there a way in a Hive query to check whether a table exists?
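Hive itself cannot branch on table existence inside a single query, so one workaround is to make the decision outside HiveQL. Below is a minimal sketch of that idea in Spark/Scala, assuming a SparkSession with Hive support; the database name, join column, and subquery are placeholders, and spark.catalog.tableExists answers the same question as the SHOW TABLES probe above. The same check-then-choose pattern can also be done in a wrapper shell script around hive/beeline.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("conditional-join-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val base = spark.table("db_name.table1")                 // hypothetical database name

    // Join against table2 only when it actually exists in the metastore;
    // otherwise fall back to table1 alone, which mirrors "ignore the subquery".
    val result =
      if (spark.catalog.tableExists("db_name", "table2")) {
        val sub = spark.sql("select /* ... */ * from db_name.table2")  // stand-in for the subquery
        base.join(sub, base("join_col") === sub("join_col"), "left")   // hypothetical join condition
      } else {
        base
      }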

hadoop hdfs points to file:/// not hdfs://

Posted by 二次信任 on 2021-02-07 13:44:02
Question: I installed Hadoop via Cloudera Manager (CDH3u5) on CentOS 5. When I run

    hadoop fs -ls /

I expected to see the contents of hdfs://localhost.localdomain:8020/, but it returned the contents of file:/// instead. It goes without saying that I can reach HDFS explicitly with

    hadoop fs -ls hdfs://localhost.localdomain:8020/

but when installing other applications such as Accumulo, Accumulo automatically detects the Hadoop filesystem as file:///. Has anyone run into this before?
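This symptom usually means the client-side configuration never sets a default filesystem, so Hadoop falls back to the local one. A minimal core-site.xml sketch is shown below, assuming the NameNode address from the question; on CDH3 the property is fs.default.name (newer Hadoop releases call it fs.defaultFS), and the file has to be on the classpath (HADOOP_CONF_DIR) of whatever tool is doing the detecting, Accumulo included.

    <!-- core-site.xml: sets the default filesystem so "hadoop fs -ls /" resolves to HDFS -->
    <configuration>
      <property>
        <name>fs.default.name</name>   <!-- fs.defaultFS on Hadoop 2 and later -->
        <value>hdfs://localhost.localdomain:8020</value>
      </property>
    </configuration>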

Running shell script from oozie through Hue

Posted by 时光怂恿深爱的人放手 on 2021-02-07 13:18:23
Question: I am invoking a bash shell script using the Oozie editor in Hue. I used the shell action in the workflow and tried the following options for the shell command:

- uploaded the shell script using 'choose a file'
- gave the local directory path where the shell script is present
- gave the HDFS path where the shell script is present

All of these options gave the following error:

    Cannot run program "sec_test_oozie.sh" (in directory "/data/hadoop/yarn/local/usercache/user/appcache/application_1399542362142_0086/container…
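This error normally means the script was never shipped into the YARN container's working directory. A sketch of the usual fix is below: a <file> element in the shell action pointing at the script's HDFS location (the /user/... path here is a placeholder), which Oozie then localizes next to the <exec> command; the Hue editor exposes the same thing as the action's "Files" field.

    <action name="shell-node">
      <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>sec_test_oozie.sh</exec>
        <!-- Copies the script from HDFS into the container's working directory,
             so "sec_test_oozie.sh" is found at run time. The path is a placeholder. -->
        <file>/user/your_user/scripts/sec_test_oozie.sh#sec_test_oozie.sh</file>
      </shell>
      <ok to="end"/>
      <error to="fail"/>
    </action>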

Simple User/Password authentication for HiveServer2 (without Kerberos/LDAP)

Posted by 蹲街弑〆低调 on 2021-02-07 12:51:41
Question: How can I provide simple property-file or database user/password authentication for HiveServer2? I already found a presentation about this, but it is not in English. The Cloudera reference manual talks about the hive.server2.authentication property, which supports a CUSTOM mode where you plug in your own implementation via hive.server2.custom.authentication.class. How do I implement that?

Answer 1: In essence, you have to provide a Java class that performs your authentication, for example against a MySQL database or a property file.
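A minimal sketch of such a class in Scala (the interface itself is Java and ships with hive-service): it implements org.apache.hive.service.auth.PasswdAuthenticationProvider and checks credentials against a user=password property file. The file path and format are assumptions for illustration; HiveServer2 is then pointed at the class with hive.server2.authentication=CUSTOM and hive.server2.custom.authentication.class, with the jar placed on HiveServer2's classpath.

    import javax.security.sasl.AuthenticationException
    import org.apache.hive.service.auth.PasswdAuthenticationProvider
    import scala.io.Source

    class PropertyFileAuthenticator extends PasswdAuthenticationProvider {

      // Hypothetical credential store: one "user=password" entry per line.
      private val credentials: Map[String, String] = {
        val src = Source.fromFile("/etc/hive/conf/hive-users.properties")
        try src.getLines()
             .map(_.trim)
             .filter(line => line.nonEmpty && line.contains("="))
             .map { line => val Array(u, p) = line.split("=", 2); u -> p }
             .toMap
        finally src.close()
      }

      // HiveServer2 calls this for every new connection; throwing rejects the login.
      override def Authenticate(user: String, password: String): Unit = {
        if (!credentials.get(user).contains(password)) {
          throw new AuthenticationException(s"Invalid user name or password for '$user'")
        }
      }
    }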

Spark + Hive : Number of partitions scanned exceeds limit (=4000)

Posted by 有些话、适合烂在心里 on 2021-02-07 11:03:50
Question: We upgraded our Hadoop platform (Spark 2.3.0, Hive 3.1), and I'm now facing this exception when reading some Hive tables in Spark: "Number of partitions scanned on table 'my_table' exceeds limit (=4000)". The tables we are working on:

- table1: external table with a total of ~12,300 partitions, partitioned by (col1: String, date1: String), stored as ORC compressed with ZLIB
- table2: external table with a total of 4,585 partitions, partitioned by (col21: String, date2: Date, col22: String), stored as ORC uncompressed
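The cap itself is enforced on the metastore side (typically the hive.metastore.limit.partition.request setting), so the usual ways out are to raise that limit or, better, to query with predicates on the partition columns so that Spark's metastore partition pruning only requests the partitions it needs. A sketch of the second approach, using the table and column names from the question and a made-up predicate:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder()
      .appName("partition-pruning-sketch")
      // Push partition predicates down to the Hive metastore.
      .config("spark.sql.hive.metastorePartitionPruning", "true")
      .enableHiveSupport()
      .getOrCreate()

    // Filtering on the partition columns (col1, date1) means the metastore returns only
    // the matching partitions instead of all ~12300, staying under the 4000 cap.
    val df = spark.table("table1")
      .where(col("date1") >= "2020-01-01" && col("col1") === "some_value")  // hypothetical values

    df.show()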

How to read and write Parquet files efficiently?

Posted by 心不动则不痛 on 2021-02-07 10:50:32
Question: I am working on a utility that reads multiple Parquet files at a time and writes them into one single output file. The implementation is very straightforward: it reads the Parquet files from a directory, reads the Group records from all the files and puts them into a list, and then uses ParquetWriter to write all these Groups into a single file. After reading about 600 MB it throws an out-of-memory error for Java heap space, and it also takes 15-20 minutes to read and write 500 MB of data. Is there a way to make this more efficient?
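Buffering every Group in a list is what exhausts the heap; streaming each record straight from a reader into the writer keeps memory flat. Below is a sketch in Scala on top of parquet-hadoop's example Group API, assuming all input files share the same schema; the paths are placeholders. If the schemas really are identical, merging at the row-group level (ParquetFileWriter's append support, as used by parquet-tools merge) can avoid decoding records at all, but the streaming version is the smallest change to the approach described.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.parquet.example.data.Group
    import org.apache.parquet.format.converter.ParquetMetadataConverter
    import org.apache.parquet.hadoop.example.{ExampleParquetWriter, GroupReadSupport}
    import org.apache.parquet.hadoop.{ParquetFileReader, ParquetReader}

    val conf = new Configuration()
    val inputDir = new Path("/data/parquet/input")           // placeholder
    val output = new Path("/data/parquet/merged.parquet")    // placeholder

    val files = FileSystem.get(conf).listStatus(inputDir)
      .map(_.getPath).filter(_.getName.endsWith(".parquet"))

    // Take the schema from the first file's footer; no row data is loaded for this.
    val schema = ParquetFileReader
      .readFooter(conf, files.head, ParquetMetadataConverter.NO_FILTER)
      .getFileMetaData.getSchema

    val writer = ExampleParquetWriter.builder(output)
      .withConf(conf)
      .withType(schema)
      .build()

    // Stream one Group at a time from each reader into the writer: nothing is
    // accumulated in memory, so heap usage no longer grows with input size.
    for (file <- files) {
      val reader = ParquetReader.builder(new GroupReadSupport(), file).withConf(conf).build()
      try {
        var group: Group = reader.read()
        while (group != null) {
          writer.write(group)
          group = reader.read()
        }
      } finally {
        reader.close()
      }
    }
    writer.close()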

How to convert an Iterable to an RDD

Posted by 戏子无情 on 2021-02-07 10:45:26
Question: To be more specific, how can I convert a scala.Iterable to an org.apache.spark.rdd.RDD? I have an RDD of (String, Iterable[(String, Integer)]) and I want this to be converted into an RDD of (String, RDD[(String, Integer)]), so that I can apply a reduceByKey function to the internal RDD. For example, I have an RDD where the key is the 2-letter prefix of a person's name and the value is a list of pairs of the person's name and the hours they spent in an event. My RDD is: ("To", List(("Tom",50),("Tod","30"),("Tom",70 …
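RDDs cannot be nested inside other RDDs, so the (String, RDD[(String, Int)]) shape is not achievable; the usual alternative is to flatten to a composite key and run a single reduceByKey over the whole data set. A sketch, using made-up sample data shaped like the question's:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("iterable-to-rdd-sketch").setMaster("local[*]"))

    // Sample data shaped like the question's RDD[(String, Iterable[(String, Int)])].
    val byPrefix = sc.parallelize(Seq(
      ("To", Seq(("Tom", 50), ("Tod", 30), ("Tom", 70), ("Tod", 25))),
      ("Ho", Seq(("Hom", 90), ("Hop", 10)))
    ))

    // Flatten to a composite (prefix, name) key, then reduce once over the whole RDD;
    // this yields the same per-name totals a nested reduceByKey would have produced.
    val totals = byPrefix
      .flatMap { case (prefix, people) => people.map { case (name, hours) => ((prefix, name), hours) } }
      .reduceByKey(_ + _)

    totals.collect().foreach(println)   // e.g. ((To,Tom),120), ((To,Tod),55), ...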

What determines the number of mappers/reducers to use given a specified set of data [closed]

Posted by 情到浓时终转凉″ on 2021-02-07 10:35:30
Question: (Closed 8 years ago as not a good fit for the Q&A format.) What are the factors that decide the number of mappers and reducers to use for a given set of data in order to achieve optimal performance?
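As a rough rule, the number of map tasks equals the number of input splits, which FileInputFormat derives from the HDFS block size bounded by the configured minimum and maximum split sizes, while the number of reducers is whatever the job sets explicitly (mapreduce.job.reduces). A small sketch of that arithmetic in Scala, with made-up sizes:

    // Split size as computed by FileInputFormat: max(minSplit, min(maxSplit, blockSize)).
    def splitSize(blockSize: Long, minSplit: Long, maxSplit: Long): Long =
      math.max(minSplit, math.min(maxSplit, blockSize))

    // Example: a 1 GiB input file, 128 MiB blocks, default min/max split sizes
    // -> 128 MiB splits -> 8 map tasks.
    val fileSize   = 1024L * 1024 * 1024
    val split      = splitSize(blockSize = 128L * 1024 * 1024, minSplit = 1L, maxSplit = Long.MaxValue)
    val numMappers = math.ceil(fileSize.toDouble / split).toInt   // 8

    // Reducers are not derived from the data: they are set on the job, e.g.
    // job.setNumReduceTasks(8) in the Java API or -D mapreduce.job.reduces=8,
    // and are typically tuned to the cluster's reduce container capacity.
    println(s"splits/mappers = $numMappers")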