How to Access Hive via Python?

小蘑菇 2020-11-30 17:11

https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-Python appears to be outdated.

When I add this to /etc/profile:

export PYTHONP

16 Answers
  •  鱼传尺愫
    2020-11-30 17:56

    I believe the easiest way is to use PyHive.

    To install you'll need these libraries:

    pip install sasl
    pip install thrift
    pip install thrift-sasl
    pip install PyHive
    

    Please note that although you install the library as PyHive, you import the module as pyhive, all lower-case.
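
    A quick way to confirm that all four packages landed in the active environment is to look up their import names without actually importing them. This is only a sketch: it assumes the import names are sasl, thrift, thrift_sasl and pyhive, and the helper name missing_modules is made up for illustration.

```python
import importlib.util

# Import names for the four pip packages (note the underscore and the lower case).
REQUIRED_MODULES = ["sasl", "thrift", "thrift_sasl", "pyhive"]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All PyHive dependencies were found.")
```

    If anything is reported as not installed, re-run the matching pip install command in the same environment that runs your script.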

    If you're on Linux, you may need to install the SASL development headers separately before running the commands above. On Debian or Ubuntu, install libsasl2-dev with apt-get; on yum-based distributions the equivalent package is typically named cyrus-sasl-devel. On Windows there are some options on GNU.org, where you can download a binary installer. On a Mac, SASL should already be available if you've installed the Xcode developer tools (xcode-select --install in Terminal).

    After installation, you can connect to Hive like this:

    from pyhive import hive
    # Replace the placeholders; PORT is an integer (HiveServer2 listens on 10000 by default)
    conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
    

    Now that you have the Hive connection, there are several ways to use it. You can query it directly:

    cursor = conn.cursor()
    cursor.execute("SELECT cool_stuff FROM hive_table")
    for result in cursor.fetchall():  # each result is a tuple of column values
        use_result(result)  # use_result is a placeholder for your own handler
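
    Note that fetchall() loads the entire result set into memory, which can be a problem for large Hive tables. Here is a minimal sketch of a batched alternative. It relies only on the standard DB-API cursor methods (execute, fetchmany, close), which PyHive cursors are expected to provide. The name iter_rows is hypothetical, and sqlite3 stands in for Hive so the snippet runs without a cluster.

```python
import sqlite3
from contextlib import closing


def iter_rows(conn, query, batch_size=1000):
    """Yield rows one at a time, pulling at most batch_size rows from the cursor per call."""
    with closing(conn.cursor()) as cursor:
        cursor.execute(query)
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:  # an empty batch means the result set is exhausted
                break
            yield from batch


# Stand-in connection (assumption: this is sqlite3, not Hive) with made-up demo rows.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE hive_table (cool_stuff TEXT)")
demo_conn.executemany(
    "INSERT INTO hive_table VALUES (?)", [("a",), ("b",), ("c",)]
)

for row in iter_rows(demo_conn, "SELECT cool_stuff FROM hive_table", batch_size=2):
    print(row)
```

    Against a real cluster, you would pass the conn created by hive.Connection in place of demo_conn.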
    

    ...or use the connection to build a pandas DataFrame:

    import pandas as pd
    df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
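
    For results too large to hold in a single DataFrame, pd.read_sql accepts a chunksize argument and then returns an iterator of DataFrames instead of one frame. The sketch below is illustrative only: sqlite3 stands in for the Hive connection so that it runs locally, and the table contents are invented for the demo. Recent pandas versions may also emit a warning when given a non-SQLAlchemy connection such as PyHive's.

```python
import sqlite3

import pandas as pd

# Stand-in for the Hive connection (assumption) with made-up demo data.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE hive_table (cool_stuff INTEGER)")
demo_conn.executemany("INSERT INTO hive_table VALUES (?)", [(i,) for i in range(5)])
demo_conn.commit()

# With chunksize set, read_sql yields DataFrames of at most that many rows each.
chunks = pd.read_sql("SELECT cool_stuff FROM hive_table", demo_conn, chunksize=2)

total = 0
for chunk in chunks:
    # Process each chunk independently, then let it go out of scope.
    total += int(chunk["cool_stuff"].sum())
print(total)
```

    Aggregating chunk by chunk keeps peak memory bounded by the chunk size rather than by the size of the full query result.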
    
