How to Access Hive via Python?

小蘑菇 2020-11-30 17:11

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated.

When I add this to /etc/profile:

export PYTHONP         


        
16 Answers
  •  心在旅途
    2020-11-30 17:34

    pyhs2 is no longer maintained. A better alternative is impyla.

    Don't be confused that some of the examples below are about Impala; just change the port to 10000 (the HiveServer2 default) and they will work the same way for Hive. Impala and Hive both use the same Thrift-based protocol.

    https://github.com/cloudera/impyla

    It has many more features than pyhs2; for example, it supports Kerberos authentication, which is a must for us.
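    As a rough sketch of how the port change and the Kerberos option fit together, the helper below builds keyword arguments for impyla's `connect`. The helper names `hive_connect_kwargs` and `open_hive_connection` are hypothetical, and the service name `"hive"` is an assumption about a typical HiveServer2 setup; check your cluster's configuration. impyla's `auth_mechanism` accepts `'NOSASL'`, `'PLAIN'`, `'GSSAPI'` and `'LDAP'`, and an unsecured Hive cluster usually needs `'PLAIN'` rather than impyla's default.

```python
def hive_connect_kwargs(host, port=10000, kerberos=False, service_name="hive"):
    """Build keyword arguments for impala.dbapi.connect (hypothetical helper).

    port=10000 is the HiveServer2 default mentioned above.
    service_name="hive" is an assumption; it must match the server's principal.
    """
    kwargs = {"host": host, "port": port}
    if kerberos:
        # GSSAPI is the SASL mechanism impyla uses for Kerberos.
        kwargs["auth_mechanism"] = "GSSAPI"
        kwargs["kerberos_service_name"] = service_name
    else:
        # Unsecured HiveServer2 typically speaks SASL PLAIN.
        kwargs["auth_mechanism"] = "PLAIN"
    return kwargs


def open_hive_connection(**kwargs):
    """Open the connection; impyla is imported lazily so the helper above
    can be used and tested without impyla installed."""
    from impala.dbapi import connect
    return connect(**kwargs)
```

    With a valid Kerberos ticket in place, something like `open_hive_connection(**hive_connect_kwargs('my.host.com', kerberos=True))` should then return a connection object.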

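    from impala.dbapi import connect

    # HiveServer2 listens on port 10000 by default
    conn = connect(host='my.host.com', port=10000)
    cursor = conn.cursor()

    # Option 1: fetch the whole result set at once
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    print(cursor.description)  # prints the result set's schema
    results = cursor.fetchall()

    # Option 2: iterate over the cursor row by row
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    for row in cursor:
        print(row)  # replace with your own row handling

    cursor.close()
    conn.close()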
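    Between fetching everything with `fetchall()` and iterating row by row, a middle ground is to pull rows in batches with the DB-API `fetchmany()` call, which impyla cursors also provide. The sketch below is only an illustration: `iter_batches` is a hypothetical helper, and an in-memory `sqlite3` database stands in for Hive so the example runs without a cluster.

```python
import sqlite3


def iter_batches(cursor, batch_size=1000):
    """Yield lists of rows from any DB-API cursor until it is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in for a Hive cursor: sqlite3 follows the same DB-API 2.0 interface.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE mytable (id INTEGER)")
demo_cur.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(5)])

demo_cur.execute("SELECT id FROM mytable ORDER BY id")
batch_sizes = [len(batch) for batch in iter_batches(demo_cur, batch_size=2)]
print(batch_sizes)

demo_conn.close()
```

    With an impyla cursor, the same helper would be called after `cursor.execute(...)` in place of the `for row in cursor` loop.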
    

    Cloudera is now putting more effort into hs2client (https://github.com/cloudera/hs2client), a C/C++ HiveServer2/Impala client. It might be a better option if you move a lot of data to or from Python. It also has Python bindings: https://github.com/cloudera/hs2client/tree/master/python

    Some more information on impyla:

    • http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/
    • https://github.com/cloudera/impyla/blob/master/README.md
