Multiple query execution in cloudera impala

拟墨画扇 提交于 2020-01-04 14:11:31

问题


Is it possible to execute multiple queries at the same time in impala ? If yes, how does impala handle it?


回答1:


I would certainly do some tests on your own, but I was not able to get multiple queries to execute: I was using Impala connection, and reading query from a .sql file. This works for single commands.

from impala.dbapi import connect
# actual server and port changed for this post for security
conn=connect(host='impala server', port=11111,auth_mechanism="GSSAPI")
cursor = conn.cursor()
cursor.execute((open("sandbox/z_temp.sql").read()))

This is the error I received.

 HiveServer2Error: AnalysisException: Syntax error in line 2:

This is what the SQL looked like in the .sql file.

Select * FROM database1.table1;
Select * FROM database1.table2;

I was able to run multiple commands with the SQL commands in separate .sql files iterating over all .sql files in a specified folder.

#Create list of file names for recon .sql files this will be sorted
#Numbers at begining of filename are important to sort so that files will be executed in correct order

file_names = glob.glob('folder/.sql')

asc_names = sorted(file_names, reverse = False)
filename = ""
for file_name in asc_names:
  str_filename = str(file_name)
  print(filename)
  query = (open(str_filename).read())

  cursor = conn.cursor()

# creates an error log dataframe to print, or write to file at end of job.

  try:
# Each SQL command must be executed seperately
    cursor.execute(query)
    df_id= pd.DataFrame([{'test_name': str_filename[-40:], 'test_status': 'PASS'}])
    df_log = df_log.append(df_id, ignore_index=True)

  except:
    df_id= pd.DataFrame([{'test_name': str_filename[-40:], 'test_status': 'FAIL'}])
    df_log = df_log.append(df_id, ignore_index=True)
    continue

Another way to do this would be to have all of the SQL statements in one .sql file separated by ; then loop thru the .sql file splitting statements out by ; running one at a time.

from impala.dbapi import connect
from impala.util import as_pandas

conn=connect(host='impalaserver', port=11111, auth_mechanism='GSSAPI')
cursor = conn.cursor()

# split SQL statements from one file seperated by ';', Note: last command will not have semicolon at end.

sql_file = open("sandbox/temp.sql").read()
sql = sql_file.split(';')
for cmd in sql:
# This gets rid of the non printing characters you may have
      cmd = cmd.replace('/r','')
      cmd = cmd.replace('/n','')
# This runs your SQL commands one at a time.
      cursor.execute(cmd)
  print(cmd)



回答2:


Impala can execute multiple queries at the same time as long as it doesn't hit the memory cap.




回答3:


You can issue a command like impala-shell -f <<file_name>>, where the file has multiple queries each complete query separated by a semi colon (;)




回答4:


If you are a python geek, you can even try the impyla package to create multiple connections and run all your queries at once.

pip install impyla



来源:https://stackoverflow.com/questions/17519385/multiple-query-execution-in-cloudera-impala

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!