Why is predicate pushdown not working?

℡╲_俬逩灬. 提交于 2019-12-24 14:09:52

问题


Programm Sketch

  • I create a HiveContext hiveContext.
  • With that context, I create a DataFrame df from a JDBC relational table.
  • I register the DataFrame df via df.registerTempTable("TESTTABLE").
  • I start a HiveThriftServer2 via HiveThriftServer2.startWithContext(hiveContext).

The TESTTABLE contains 1,000,000 entries, columns are ID (INT) and NAME (VARCHAR)

+-----+--------+
| ID  |  NAME  |
+-----+--------+
| 1   | Hello  |
| 2   | Hello  |
| 3   | Hello  |
| ... | ...    |

With Beeline I access the SQL Endpoint (at port 10000) of the HiveThriftServer and perform a query. E.g.

SELECT * FROM TESTTABLE WHERE ID='3'

When I inspect the QueryLog of the DB with the SQL Statements executed I see

/*SQL #:1000000 t:657*/  SELECT \"ID\",\"NAME\" FROM test;

So there happens no predicate pushdown , as the where clause is missing.

Questions

This gives raise to the following questions:

  • Why is no predicate pushdown performed?
  • Can this be changed by not using registerTempTable?
  • If so, how? Or is this a known restriction of the HiveThriftServer?

Counterexample

If I create a DataFrame df in Spark SQLContext and call

df.filter( df("ID") === 3).show()

I observe

/*SQL #:1*/SELECT \"ID\",\"NAME\" FROM test WHERE ID = 3;

as expected.


回答1:


It's probably too late to answer. In this scenario, this did not work because ID is defined as Int and in the original query you are passing a string ('3'). Predicate pushdown looks for the same column name and the type as well.



来源:https://stackoverflow.com/questions/33462386/why-is-predicate-pushdown-not-working

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!