HDFS

Can I have different block placement policies in HDFS?

Submitted by 偶尔善良 on 2020-07-20 04:29:24
Question: I.e., one cluster that hosts multiple apps, where each app has different requirements for where its copies are located. Can I set it up to support these multiple apps?

Answer 1: Yes, it is possible to do so. CAUTION: proceed at your own risk. Writing a block placement strategy is extremely complicated and risky, and it seems like a code smell that your apps need to determine how replicas are placed. Think hard about whether you really need to write block placement strategies. Having warned you, proceed if…
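If you do go down that road, the hook is the dfs.block.replicator.classname key, which the NameNode reads to pick its BlockPlacementPolicy implementation. A minimal sketch of the wiring, assuming a hypothetical com.example.AppAwareBlockPlacementPolicy class; in practice the key belongs in hdfs-site.xml on the NameNode rather than in client code:

    import org.apache.hadoop.conf.Configuration;

    public class PlacementPolicyWiring {
        public static void main(String[] args) {
            // Illustration only: this key is read by the NameNode at startup, so it
            // really lives in the NameNode's hdfs-site.xml, not in application code.
            Configuration conf = new Configuration();
            conf.set("dfs.block.replicator.classname",
                     "com.example.AppAwareBlockPlacementPolicy"); // hypothetical custom policy
            System.out.println(conf.get("dfs.block.replicator.classname"));
        }
    }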

Load a keytab from HDFS

Submitted by 情到浓时终转凉″ on 2020-07-20 03:43:08
Question: I want to use Oozie with a Java action that needs Kerberos. I have my keytab in HDFS. How can I tell the login code that the file is in HDFS?

    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "Kerberos");
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(kerberosPrincipal, kerberosKeytab);

I have tried a path like hdfs://xxxx:8020/tmp/myKeytab.keytab, and I set conf.set("fs.defaultFS", "hdfs://server:8020"); as well, but it…
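A sketch of one workaround, with the principal and paths as placeholders: UserGroupInformation.loginUserFromKeytab expects a local file path, so copy the keytab out of HDFS onto the node's local disk first and then log in with the local copy.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabFromHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "Kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Pull the keytab down to the local filesystem; loginUserFromKeytab
            // cannot read an hdfs:// URI directly.
            FileSystem fs = FileSystem.get(conf);
            Path hdfsKeytab = new Path("hdfs://server:8020/tmp/myKeytab.keytab"); // placeholder
            Path localKeytab = new Path("/tmp/myKeytab.keytab");                  // placeholder
            fs.copyToLocalFile(false, hdfsKeytab, localKeytab, true);

            UserGroupInformation.loginUserFromKeytab(
                    "user@EXAMPLE.COM",                 // placeholder principal
                    localKeytab.toUri().getPath());
        }
    }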

Kafka to HDFS sink: Missing required configuration “confluent.topic.bootstrap.servers” which has no default value

Submitted by 纵然是瞬间 on 2020-06-23 16:45:35
Question: Status: my HDFS was installed via Ambari (HDP). I'm currently trying to load Kafka topics into the HDFS sink. Kafka and HDFS are installed on the same machine, x.x.x.x. I didn't change much from the default settings, apart from some ports, according to my needs. Here is how I run Kafka Connect:

    /usr/hdp/3.1.4.0-315/kafka/bin/connect-standalone.sh /etc/kafka/connect-standalone.properties /etc/kafka-connect-hdfs/quickstart-hdfs.properties

Inside connect-standalone.properties: bootstrap.servers=x.x.x.x…
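One plausible fix, sketched under the assumption that the installed HDFS sink is one of Confluent's self-managed connectors: those connectors need their own confluent.topic.bootstrap.servers setting inside the connector properties file, separate from bootstrap.servers in connect-standalone.properties. The snippet below just generates such a quickstart-hdfs.properties; the connector class, topic and addresses are placeholders, not values from the original post.

    import java.io.FileOutputStream;
    import java.util.Properties;

    public class HdfsSinkConfig {
        public static void main(String[] args) throws Exception {
            Properties p = new Properties();
            p.setProperty("name", "hdfs-sink");
            p.setProperty("connector.class", "io.confluent.connect.hdfs.HdfsSinkConnector");
            p.setProperty("topics", "test_hdfs");              // placeholder topic
            p.setProperty("hdfs.url", "hdfs://x.x.x.x:8020");  // placeholder NameNode address
            p.setProperty("flush.size", "3");
            // The property the error message complains about: point it at the
            // same Kafka brokers the Connect worker uses.
            p.setProperty("confluent.topic.bootstrap.servers", "x.x.x.x:9092"); // placeholder broker
            p.setProperty("confluent.topic.replication.factor", "1");

            try (FileOutputStream out = new FileOutputStream("quickstart-hdfs.properties")) {
                p.store(out, "HDFS sink connector config sketch");
            }
        }
    }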

How to use the WebHDFS REST API to copy a file and store it in another directory?

Submitted by 廉价感情. on 2020-06-18 12:36:30
Question: How do I use HttpFS to copy a file into another directory in HDFS? For example, I can use (http://1.12.134.1234:2020/webhdfs/v1/user/xx/x?op=create&user=hello) to create a file, but I did not find a copy operation in https://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#File_and_Directory_Operations

Answer 1: There is no WebHDFS 'copy' command. Use the hadoop distcp command, or retrieve the contents, create a new file, and write the contents back, if the file is expected to be a smallish one. Copy a file…
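If distcp is overkill, the read-then-rewrite approach can also be done through Hadoop's FileSystem client over the webhdfs:// scheme. A minimal sketch, treating the host, port and paths from the question as placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class WebHdfsCopy {
        public static void main(String[] args) throws Exception {
            // WebHDFS has no single "copy" operation, so read the source and
            // rewrite it to the target path through the webhdfs:// filesystem.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("webhdfs://1.12.134.1234:2020"), conf);

            Path src = new Path("/user/xx/source.txt");         // placeholder source
            Path dst = new Path("/user/xx/backup/source.txt");  // placeholder destination
            FileUtil.copy(fs, src, fs, dst, false /* keep source */, conf);
        }
    }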

HDFS space consumed: “hdfs dfs -du /” vs “hdfs dfsadmin -report”

Submitted by 本秂侑毒 on 2020-06-12 04:59:08
Question: Which tool is the right one to measure the HDFS space consumed? When I sum up the output of "hdfs dfs -du /" I always get less space consumed than "hdfs dfsadmin -report" shows (the "DFS Used" line). Is there data that du does not take into account?

Answer 1: Hadoop file systems provide reliable storage by putting copies of the data on several nodes. The number of copies is the replication factor, and it is usually greater than one. The command hdfs dfs -du / shows the space consumed by your data without…
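The gap is mostly replication: du reports logical file sizes before replication, while the report counts raw bytes across every replica (plus any non-DFS usage on the DataNodes). A small sketch that prints both views of a path via ContentSummary:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DuVsRaw {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            ContentSummary cs = fs.getContentSummary(new Path("/"));

            // getLength()        ~ "hdfs dfs -du" (logical bytes, before replication)
            // getSpaceConsumed() ~ raw bytes across all replicas, closer to "DFS Used"
            System.out.println("logical size : " + cs.getLength());
            System.out.println("raw consumed : " + cs.getSpaceConsumed());
            System.out.println("implied replication ~ "
                    + (double) cs.getSpaceConsumed() / cs.getLength());
        }
    }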

While writing to an HDFS path, getting error java.io.IOException: Failed to rename

Submitted by 夙愿已清 on 2020-06-04 04:40:47
Question: I am using spark-sql-2.4.1v, which uses hadoop-2.6.5.jar. I need to save my data to HDFS first and move it to Cassandra later. Hence I am trying to save the data to HDFS as below:

    String hdfsPath = "/user/order_items/";
    cleanedDs.createTempViewOrTable("source_tab");

    givenItemList.parallelStream().forEach(item -> {
        String query = "select $item as itemCol , avg($item) as mean groupBy year";
        Dataset<Row> resultDs = sparkSession.sql(query);
        saveDsToHdfs(hdfsPath, resultDs);
    });

    public…
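A common cause of "Failed to rename" here is several writers committing into the same output directory at once, each renaming files out of the shared _temporary directory. One sketch of a workaround, reusing the question's names (sparkSession, givenItemList, source_tab) and treating the query as a placeholder: write each item into its own sub-directory so the commits never collide.

    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class PerItemWriter {
        static void writeItems(SparkSession sparkSession, List<String> givenItemList) {
            String hdfsPath = "/user/order_items/";
            for (String item : givenItemList) {            // sequential, not parallelStream
                // Placeholder query shaped like the one in the question.
                Dataset<Row> resultDs = sparkSession.sql(
                        "select " + item + " as itemCol, avg(" + item + ") as mean "
                      + "from source_tab group by year");
                resultDs.write()
                        .mode(SaveMode.Overwrite)
                        .parquet(hdfsPath + item);          // one output directory per item
            }
        }
    }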

Sqoop import from Postgres to S3 failing

Submitted by 只谈情不闲聊 on 2020-06-01 07:22:05
Question: I'm currently importing Postgres data into HDFS, and I'm planning to move the storage from HDFS to S3. When I try to provide an S3 location, the Sqoop job fails. I'm running it on an EMR (emr-5.27.0) cluster, and I have read/write access to that S3 bucket from all nodes in the cluster.

    sqoop import \
      --connect "jdbc:postgresql://<machine_ip>:<port>/<database>?sslfactory=org.postgresql.ssl.NonValidatingFactory&ssl=true" \
      --username <username> \
      --password-file <password_file_path> \
      --table…
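Before digging into Sqoop itself, it can help to confirm that the Hadoop configuration on the EMR nodes can reach the bucket at all through a Hadoop filesystem client. A sketch under assumptions: EMR typically serves s3:// through EMRFS, and the s3a:// scheme and bucket name below are placeholders for illustration, not values from the original post.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3AccessCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder bucket; use the bucket the Sqoop target location points at.
            FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket"), conf);
            for (FileStatus status : fs.listStatus(new Path("s3a://my-bucket/"))) {
                System.out.println(status.getPath());
            }
        }
    }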
