How do I output the results of a HiveQL query to CSV using a shell script?

佛祖请我去吃肉 2021-01-15 17:26

I would like to run multiple Hive queries, preferably in parallel rather than sequentially, and store the output of each query in a CSV file. For example, query1

2 answers
  •  谎友^ (original poster)
    2021-01-15 17:33

    With GNU Parallel it looks like this:

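    # Run one query and write its tab-separated output as comma-separated text.
    doit() {
      id="$1"
      hive -e "SELECT * FROM db.table$id;" | tr "\t" "," > "example$id.csv"
    }
    # Export the function so the shells started by GNU Parallel can see it.
    export -f doit
    # Run doit once per argument, in parallel, with a progress bar.
    parallel --bar doit ::: 1 2 3 4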
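
One caveat with `tr "\t" ","`: it swaps every tab for a comma, so a field that itself contains a comma or a double quote yields a malformed CSV row. A minimal sketch of a stricter converter follows, assuming the Hive CLI output is tab-separated and that no field contains an embedded tab or newline. `tsv_to_csv` is a hypothetical helper name, not part of Hive or GNU Parallel.

```shell
# Convert tab-separated lines on stdin to CSV on stdout.
# Fields containing a comma or a double quote are wrapped in double quotes,
# and embedded double quotes are doubled (RFC 4180 style).
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
  {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /[",]/) {
        gsub(/"/, "\"\"", $i)
        $i = "\"" $i "\""
      }
    }
    $1 = $1   # force awk to rebuild the record using OFS
    print
  }'
}
```

It would replace the `tr` stage, for example `hive -e "..." | tsv_to_csv > out.csv`. If it is called from inside a function run by GNU Parallel, it probably needs `export -f tsv_to_csv` as well.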
    

    If your queries do not share the same template you can do:

    queries.txt:
    SELECT * FROM db.table1;
    SELECT id,name FROM db.person;
    ... other queries ...
    
    cat queries.txt | parallel --bar 'hive -e {} | tr "\t" "," > example{#}.csv'
    

    Spend 15 minutes reading chapters 1 and 2 of https://doi.org/10.5281/zenodo.1146014 to learn the basics, and chapter 7 to learn more about running jobs in parallel.
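
If GNU Parallel is not installed, the queries.txt case can be approximated with plain shell background jobs. This is a different technique from the one above, sketched under these assumptions: one query per line, no progress bar, and no limit on how many `hive` processes start at once. `run_queries` is a hypothetical function name.

```shell
# Run every non-empty line of the given file as a Hive query in the background,
# writing query N to exampleN.csv, then wait for all of them to finish.
run_queries() {
  local n=0 query
  while IFS= read -r query; do
    [ -n "$query" ] || continue          # skip blank lines
    n=$((n + 1))
    # stdin comes from /dev/null so hive cannot consume the query file
    hive -e "$query" < /dev/null | tr '\t' ',' > "example${n}.csv" &
  done < "$1"
  wait                                   # block until all background jobs exit
}
```

Calling `run_queries queries.txt` numbers the output files in the same order as the lines of the file. With many queries, starting them all at once may overload the cluster, and that is the problem GNU Parallel's job limiting is meant to solve.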
