What is --direct mode in sqoop?

Submitted by 爱⌒轻易说出口 on 2019-12-05 23:35:43

Question


As I understand it, Sqoop is used to import or export tables/data between a database and HDFS, Hive, or HBase.

We can import a single table or a list of tables directly. Internally, a MapReduce program (I think only map tasks) will run.

My question is: what is Sqoop direct mode, and when should I use the --direct option?


Answer 1:


Just read the Sqoop documentation!

  • General principles are covered in the documentation's sections on imports and on exports:

Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools (...)


Some databases provides a direct mode for exports as well (...)

Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in Section 25
  • Section 25 under MySQL
  • Section 25 under Oracle data connector for Hadoop
  • etc.

Bottom line: "direct mode" means different things for different databases.
For MySQL or PostgreSQL it relates to bulk loader/unloader utilities (i.e. completely bypassing JDBC), while for Oracle it relates to "direct path INSERT", i.e. still over JDBC but in a non-transactional mode (so you'd better use a temp table, or you might end up with duplicate primary keys and a corrupt table).




Answer 2:


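To be short and precise, it's a fast-import mode that bypasses JDBC and uses the database's native tools instead. Note that it still runs as a map-only job: the mappers invoke those tools rather than issuing SQL queries.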

sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --direct

Notes:

  1. In older Sqoop releases, --direct is supported only for MySQL and PostgreSQL; later releases add direct connectors for other databases such as Oracle.
  2. Sqoop’s direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns.
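
The first note can be turned into a small pre-flight check. The sketch below is an illustration only, not part of Sqoop: `direct_flag` and `build_import` are hypothetical helper names, the check is restricted to the two databases named in note 1, and the command is printed rather than executed. It does not inspect column types, so tables with BLOB, CLOB, or LONGVARBINARY columns still have to be excluded by hand.

```shell
# Hypothetical helper (not part of Sqoop): print "--direct" only for
# JDBC URLs whose scheme is MySQL or PostgreSQL, as in note 1.
direct_flag() {
  case "$1" in
    jdbc:mysql:*|jdbc:postgresql:*) printf '%s' '--direct' ;;
    *) printf '%s' '' ;;
  esac
}

# Build the import command as a string; it is printed, not executed.
build_import() {
  url=$1
  table=$2
  flag=$(direct_flag "$url")
  # ${flag:+ $flag} appends " --direct" only when flag is non-empty.
  printf 'sqoop import --connect %s --table %s%s\n' "$url" "$table" "${flag:+ $flag}"
}

build_import jdbc:mysql://db.foo.com/corp EMPLOYEES
build_import jdbc:sqlserver://db.foo.com EMPLOYEES
```

The first call prints the command with --direct appended; the second prints a plain JDBC import because the URL scheme is not in the allowed list.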



Answer 3:


From Managing Big Data in Clusters and Cloud Storage

By default, Sqoop uses JDBC to connect to the database. However, depending on the database, there may be a faster, database-specific connector available, which you can use by using the --direct option.

So, use the --direct option when you want a faster, database-specific connector instead of the default JDBC one.




Answer 4:


--direct - Use direct import fast path

By supplying the --direct argument, you are specifying that Sqoop should attempt the direct import channel. This channel may be higher performance than using JDBC.

For MySQL:

MySQL Direct Connector allows faster import and export to/from MySQL using mysqldump and mysqlimport tools functionality instead of SQL selects and inserts.

Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in Section 25, “Notes for specific connectors”.
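
Because the MySQL direct connector delegates to mysqldump, the Sqoop user guide lets you forward extra arguments to that tool by placing them after a bare `--` on the command line. A minimal sketch of that pattern follows; `build_direct_import` is a hypothetical helper name, the host and table are placeholders, and the command is printed rather than run.

```shell
# Build (but do not run) a direct-mode MySQL import. Anything after the
# first two arguments is forwarded to the underlying tool after a bare "--".
build_direct_import() {
  url=$1
  table=$2
  shift 2
  cmd="sqoop import --connect $url --table $table --direct"
  # Only add the "--" separator when there are tool arguments to forward.
  if [ "$#" -gt 0 ]; then
    cmd="$cmd -- $*"
  fi
  printf '%s\n' "$cmd"
}

build_direct_import jdbc:mysql://server.foo.com/db bar --default-character-set=latin1
```

Here `--default-character-set=latin1` is a mysqldump option, not a Sqoop one; Sqoop passes it through untouched.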




Answer 5:


You can improve performance by passing the --direct option to Sqoop.

However, avoid it for non-priority jobs, since heavy use of direct mode can overload or even bring down the source/target database.

http://archive.cloudera.com/docs-backup/sqoop/_direct_mode_imports.html
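
One way to limit the load a direct job puts on the database is to cap the number of parallel map tasks with Sqoop's --num-mappers option. The sketch below assumes a site-specific ceiling; `MAX_MAPPERS`, `clamp_mappers`, and `build_throttled_import` are hypothetical names, the value 4 is arbitrary, and the command is printed rather than executed.

```shell
# Hypothetical ceiling on parallel connections to the database; tune per site.
MAX_MAPPERS=4

# Print the smaller of the requested mapper count and MAX_MAPPERS.
clamp_mappers() {
  if [ "$1" -gt "$MAX_MAPPERS" ]; then
    printf '%s\n' "$MAX_MAPPERS"
  else
    printf '%s\n' "$1"
  fi
}

# Print (not execute) a direct import with a capped --num-mappers value.
build_throttled_import() {
  printf 'sqoop import --connect %s --table %s --direct --num-mappers %s\n' \
    "$1" "$2" "$(clamp_mappers "$3")"
}

build_throttled_import jdbc:mysql://db.foo.com/corp EMPLOYEES 16
```

With a requested count of 16, the printed command uses --num-mappers 4, so at most four tasks hit the database at once.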



Source: https://stackoverflow.com/questions/39150465/what-is-direct-mode-in-sqoop
