Hive creates multiple small files in HDFS for each insert

Asked by 轻奢々 on 2020-12-14 13:19

The following has already been achieved:

  1. Kafka Producer pulling data from twitter using Spark Streaming.
  2. Kafka Consumer ingesting data into a Hive external table.
3 answers
  •  暗喜 — 2020-12-14 14:14

    Hive was designed for massive batch processing, not for transactions. That's why you get at least one data file for each LOAD or INSERT-SELECT command, and also why there was originally no INSERT-VALUES command, hence the clumsy syntax shown in your post as a necessary workaround.

    Well... that was true until transaction support was introduced. In a nutshell, you need (a) Hive 0.14 or later, (b) an ORC table, and (c) transaction support enabled on that table (i.e. locks, periodic background compaction, etc.).
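
    As a minimal sketch of points (a)-(c), the DDL below creates a transactional ORC table. The table and column names are illustrative, and it assumes the transaction manager has already been configured in hive-site.xml:

    ```sql
    -- Assumed prerequisites in hive-site.xml (Hive 0.14+):
    --   hive.support.concurrency       = true
    --   hive.txn.manager               = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
    --   hive.compactor.initiator.on    = true
    --   hive.compactor.worker.threads  = 1

    -- Transactional tables must be stored as ORC, and before Hive 3
    -- they must also be bucketed.
    CREATE TABLE tweets_acid (
      tweet_id  BIGINT,
      user_name STRING,
      body      STRING
    )
    CLUSTERED BY (tweet_id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true');

    -- Small inserts now land in delta files that the background
    -- compactor periodically merges into base files, instead of
    -- accumulating as separate small files in HDFS.
    INSERT INTO tweets_acid VALUES (1, 'alice', 'hello hive');
    ```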

    The wiki about Streaming data ingest in Hive might be a good start.
