Access Hive Data Using Python

有刺的猬 2020-12-09 19:13

I have some data in HDFS, and I need to access it using Python. Can anyone tell me how to access Hive data from Python?

4 answers
  •  再見小時候
    2020-12-09 19:52

    You'll need to install these libraries:

    pip install sasl
    pip install thrift
    pip install thrift-sasl
    pip install PyHive
    

    If you're on Linux, you may need to install SASL separately before running the commands above. Install the libsasl2-dev package using apt-get, yum, or whichever package manager your distribution uses. For Windows there are some options on GNU.org. On a Mac, SASL should be available if you've installed the Xcode developer tools (xcode-select --install).
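Before connecting, it can help to confirm that the four packages actually installed. The sketch below is only a convenience check, not part of PyHive: the helper `missing_modules` and the list `REQUIRED_MODULES` are names made up for this example. It relies on the fact that the thrift-sasl package is imported as `thrift_sasl` and PyHive as `pyhive`.

```python
from importlib.util import find_spec

# Import names for the four pip packages listed above.
# Note: the pip package "thrift-sasl" is imported as "thrift_sasl".
REQUIRED_MODULES = ["sasl", "thrift", "thrift_sasl", "pyhive"]


def missing_modules(names):
    """Return the module names from `names` that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If `sasl` shows up as missing on Linux, that usually points back to the system SASL headers mentioned above.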

    After installation, you can connect to Hive like this:

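    from pyhive import hive
    # Replace the placeholders with your own values.
    # PORT is the HiveServer2 port (10000 by default).
    conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")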
    

    Now that you have the Hive connection, there are a few ways to use it. You can query it directly:

    cursor = conn.cursor()
    cursor.execute("SELECT cool_stuff FROM hive_table")
    # Each result is one row; use_result stands for your own processing function.
    for result in cursor.fetchall():
        use_result(result)
    
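Note that `fetchall()` loads the whole result set into memory. For large Hive tables, one possible alternative is to pull rows in batches with the DB-API `fetchmany()` method, which PyHive cursors also provide. The sketch below is an assumption-laden illustration: `iter_rows` is a hypothetical helper, and an in-memory SQLite table stands in for Hive so the code runs without a server. With a real connection you would pass the cursor from `conn.cursor()` instead.

```python
import sqlite3  # stand-in only, so the sketch runs without a Hive server


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time, fetching them from the cursor in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


# Stand-in data source: an in-memory SQLite table instead of hive_table.
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE hive_table (cool_stuff INTEGER)")
demo_cursor.executemany(
    "INSERT INTO hive_table VALUES (?)", [(i,) for i in range(5)]
)

# Same query shape as above, consumed two rows at a time.
demo_cursor.execute("SELECT cool_stuff FROM hive_table ORDER BY cool_stuff")
rows = list(iter_rows(demo_cursor, batch_size=2))
demo_conn.close()
```

The batch size is a trade-off between memory use and the number of round trips; 1000 here is an arbitrary default, not a recommended value.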

    ...or use the connection to build a pandas DataFrame:

    import pandas as pd
    df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
    
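If you would rather not depend on pandas, a cursor's DB-API `description` attribute carries the column names, so rows can be turned into plain dictionaries. This is a minimal sketch under stated assumptions: `rows_as_dicts` is a hypothetical helper, the `label` column is invented for the demo, and SQLite again stands in for Hive so the example is runnable on its own.

```python
import sqlite3  # stand-in only, so the sketch runs without a Hive server


def rows_as_dicts(cursor):
    """Return every remaining row as a dict keyed by column name."""
    # In the DB-API, description is a sequence of tuples whose first
    # item is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in data source: an in-memory SQLite table instead of hive_table.
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE hive_table (cool_stuff INTEGER, label TEXT)")
demo_cursor.executemany(
    "INSERT INTO hive_table VALUES (?, ?)", [(1, "a"), (2, "b")]
)

demo_cursor.execute("SELECT cool_stuff, label FROM hive_table ORDER BY cool_stuff")
records = rows_as_dicts(demo_cursor)
demo_conn.close()
```

A list of dictionaries like `records` can also be passed to `pd.DataFrame(records)` later if you do end up wanting a DataFrame.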
