HDFS

Can I have different block placement policies in HDFS?

Submitted by 偶尔善良 on 2020-07-20 04:29:24
Question: I.e., one cluster that hosts multiple apps, where each app has different requirements for where its copies are located. Can I set it up to support these multiple apps?

Answer 1: Yes, it is possible to do so. CAUTION: proceed at your own risk. Writing a block placement strategy is extremely complicated and risky, and it seems like a code smell that your apps need to determine how replicas are placed. Think hard about whether you really need to write block placement strategies. Having warned you, proceed if…
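If you do go down that road, the hook is the dfs.block.replicator.classname key, which the NameNode reads to pick its BlockPlacementPolicy implementation. A minimal sketch of the wiring, assuming a hypothetical com.example.AppAwareBlockPlacementPolicy class; in practice the key belongs in hdfs-site.xml on the NameNode rather than in client code:

    import org.apache.hadoop.conf.Configuration;

    public class PlacementPolicyWiring {
        public static void main(String[] args) {
            // Illustration only: this key is read by the NameNode at startup, so it
            // really lives in the NameNode's hdfs-site.xml, not in application code.
            Configuration conf = new Configuration();
            conf.set("dfs.block.replicator.classname",
                     "com.example.AppAwareBlockPlacementPolicy"); // hypothetical custom policy
            System.out.println(conf.get("dfs.block.replicator.classname"));
        }
    }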

Load a keytab from HDFS

Submitted by 情到浓时终转凉″ on 2020-07-20 03:43:08
Question: I want to use Oozie with a Java action that needs Kerberos. I have my keytab in HDFS. How can I tell the login code that the file is in HDFS?

    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "Kerberos");
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(kerberosPrincipal, kerberosKeytab);

I have tried a path like hdfs://xxxx:8020/tmp/myKeytab.keytab, and I set conf.set("fs.defaultFS", "hdfs://server:8020"); as well, but it…
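A sketch of one workaround, with the principal and paths as placeholders: UserGroupInformation.loginUserFromKeytab expects a local file path, so copy the keytab out of HDFS onto the node's local disk first and then log in with the local copy.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabFromHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "Kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Pull the keytab down to the local filesystem; loginUserFromKeytab
            // cannot read an hdfs:// URI directly.
            FileSystem fs = FileSystem.get(conf);
            Path hdfsKeytab = new Path("hdfs://server:8020/tmp/myKeytab.keytab"); // placeholder
            Path localKeytab = new Path("/tmp/myKeytab.keytab");                  // placeholder
            fs.copyToLocalFile(false, hdfsKeytab, localKeytab, true);

            UserGroupInformation.loginUserFromKeytab(
                    "user@EXAMPLE.COM",                 // placeholder principal
                    localKeytab.toUri().getPath());
        }
    }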

Kafka to HDFS sink: Missing required configuration “confluent.topic.bootstrap.servers” which has no default value

Submitted by 纵然是瞬间 on 2020-06-23 16:45:35
Question: Status: my HDFS was installed via Ambari (HDP). I'm currently trying to load Kafka topics into the HDFS sink. Kafka and HDFS are installed on the same machine, x.x.x.x. I didn't change much from the default settings, apart from some ports, according to my needs. Here is how I run Kafka Connect:

    /usr/hdp/3.1.4.0-315/kafka/bin/connect-standalone.sh /etc/kafka/connect-standalone.properties /etc/kafka-connect-hdfs/quickstart-hdfs.properties

Inside connect-standalone.properties: bootstrap.servers=x.x.x.x…
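One plausible fix, sketched under the assumption that the installed HDFS sink is one of Confluent's self-managed connectors: those connectors need their own confluent.topic.bootstrap.servers setting inside the connector properties file, separate from bootstrap.servers in connect-standalone.properties. The snippet below just generates such a quickstart-hdfs.properties; the connector class, topic and addresses are placeholders, not values from the original post.

    import java.io.FileOutputStream;
    import java.util.Properties;

    public class HdfsSinkConfig {
        public static void main(String[] args) throws Exception {
            Properties p = new Properties();
            p.setProperty("name", "hdfs-sink");
            p.setProperty("connector.class", "io.confluent.connect.hdfs.HdfsSinkConnector");
            p.setProperty("topics", "test_hdfs");              // placeholder topic
            p.setProperty("hdfs.url", "hdfs://x.x.x.x:8020");  // placeholder NameNode address
            p.setProperty("flush.size", "3");
            // The property the error message complains about: point it at the
            // same Kafka brokers the Connect worker uses.
            p.setProperty("confluent.topic.bootstrap.servers", "x.x.x.x:9092"); // placeholder broker
            p.setProperty("confluent.topic.replication.factor", "1");

            try (FileOutputStream out = new FileOutputStream("quickstart-hdfs.properties")) {
                p.store(out, "HDFS sink connector config sketch");
            }
        }
    }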

How to use the WebHDFS REST API to copy a file and store it in another directory?

Submitted by 廉价感情. on 2020-06-18 12:36:30
Question: How do I use HttpFS to copy a file into another directory in HDFS? For example, I can use (http://1.12.134.1234:2020/webhdfs/v1/user/xx/x?op=create&user=hello) to create a file, but I did not find a copy operation in https://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#File_and_Directory_Operations

Answer 1: There is no WebHDFS 'copy' command. Use the hadoop distcp command, or retrieve the contents, create a new file, and write the contents back, if the file is expected to be a smallish one. Copy a file…
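If distcp is overkill, the read-then-rewrite approach can also be done through Hadoop's FileSystem client over the webhdfs:// scheme. A minimal sketch, treating the host, port and paths from the question as placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class WebHdfsCopy {
        public static void main(String[] args) throws Exception {
            // WebHDFS has no single "copy" operation, so read the source and
            // rewrite it to the target path through the webhdfs:// filesystem.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("webhdfs://1.12.134.1234:2020"), conf);

            Path src = new Path("/user/xx/source.txt");         // placeholder source
            Path dst = new Path("/user/xx/backup/source.txt");  // placeholder destination
            FileUtil.copy(fs, src, fs, dst, false /* keep source */, conf);
        }
    }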

HDFS space consumed: “hdfs dfs -du /” vs “hdfs dfsadmin -report”

Submitted by 本秂侑毒 on 2020-06-12 04:59:08
Question: Which tool is the right one to measure the HDFS space consumed? When I sum up the output of "hdfs dfs -du /" I always get less space consumed than "hdfs dfsadmin -report" shows (the "DFS Used" line). Is there data that du does not take into account?

Answer 1: Hadoop file systems provide reliable storage by putting copies of the data on several nodes. The number of copies is the replication factor, and it is usually greater than one. The command hdfs dfs -du / shows the space consumed by your data without…
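The gap is mostly replication: du reports logical file sizes before replication, while the report counts raw bytes across every replica (plus any non-DFS usage on the DataNodes). A small sketch that prints both views of a path via ContentSummary:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DuVsRaw {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            ContentSummary cs = fs.getContentSummary(new Path("/"));

            // getLength()        ~ "hdfs dfs -du" (logical bytes, before replication)
            // getSpaceConsumed() ~ raw bytes across all replicas, closer to "DFS Used"
            System.out.println("logical size : " + cs.getLength());
            System.out.println("raw consumed : " + cs.getSpaceConsumed());
            System.out.println("implied replication ~ "
                    + (double) cs.getSpaceConsumed() / cs.getLength());
        }
    }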

While writing to an HDFS path, getting error java.io.IOException: Failed to rename

Submitted by 夙愿已清 on 2020-06-04 04:40:47
Question: I am using spark-sql-2.4.1v, which uses hadoop-2.6.5.jar. I need to save my data to HDFS first and move it to Cassandra later. Hence I am trying to save the data to HDFS as below:

    String hdfsPath = "/user/order_items/";
    cleanedDs.createTempViewOrTable("source_tab");

    givenItemList.parallelStream().forEach(item -> {
        String query = "select $item as itemCol , avg($item) as mean groupBy year";
        Dataset<Row> resultDs = sparkSession.sql(query);
        saveDsToHdfs(hdfsPath, resultDs);
    });

    public…
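A common cause of "Failed to rename" here is several writers committing into the same output directory at once, each renaming files out of the shared _temporary directory. One sketch of a workaround, reusing the question's names (sparkSession, givenItemList, source_tab) and treating the query as a placeholder: write each item into its own sub-directory so the commits never collide.

    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class PerItemWriter {
        static void writeItems(SparkSession sparkSession, List<String> givenItemList) {
            String hdfsPath = "/user/order_items/";
            for (String item : givenItemList) {            // sequential, not parallelStream
                // Placeholder query shaped like the one in the question.
                Dataset<Row> resultDs = sparkSession.sql(
                        "select " + item + " as itemCol, avg(" + item + ") as mean "
                      + "from source_tab group by year");
                resultDs.write()
                        .mode(SaveMode.Overwrite)
                        .parquet(hdfsPath + item);          // one output directory per item
            }
        }
    }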

Sqoop import from Postgres to S3 failing

Submitted by 只谈情不闲聊 on 2020-06-01 07:22:05
Question: I'm currently importing Postgres data into HDFS, and I'm planning to move the storage from HDFS to S3. When I try to provide an S3 location, the Sqoop job fails. I'm running it on an EMR (emr-5.27.0) cluster, and I have read/write access to that S3 bucket from all nodes in the cluster.

    sqoop import \
      --connect "jdbc:postgresql://<machine_ip>:<port>/<database>?sslfactory=org.postgresql.ssl.NonValidatingFactory&ssl=true" \
      --username <username> \
      --password-file <password_file_path> \
      --table…
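Before digging into Sqoop itself, it can help to confirm that the Hadoop configuration on the EMR nodes can reach the bucket at all through a Hadoop filesystem client. A sketch under assumptions: EMR typically serves s3:// through EMRFS, and the s3a:// scheme and bucket name below are placeholders for illustration, not values from the original post.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3AccessCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder bucket; use the bucket the Sqoop target location points at.
            FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket"), conf);
            for (FileStatus status : fs.listStatus(new Path("s3a://my-bucket/"))) {
                System.out.println(status.getPath());
            }
        }
    }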
