Presto

AWS Athena: Delete partitions between date range

Posted by 不打扰是莪最后的温柔 on 2019-12-25 01:37:11
Question: I have an Athena table with partitions based on dates, like 20190218. I want to delete all the partitions that were created last year. I tried the queries below, but they didn't work:

```
ALTER TABLE tblname DROP PARTITION (partition1 < '20181231');
ALTER TABLE tblname DROP PARTITION (partition1 > '20181010'), PARTITION (partition1 < '20181231');
```

Answer 1: According to https://docs.aws.amazon.com/athena/latest/ug/alter-table-drop-partition.html, ALTER TABLE tblname DROP PARTITION takes a partition spec,
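If the partition spec only accepts equality comparisons (key = 'value' pairs), one way to clear a date range is to enumerate the dates. A sketch, assuming the partition column is named partition1 as in the question:

```
-- Sketch: drop one partition per date in the range, one equality spec each.
ALTER TABLE tblname DROP IF EXISTS PARTITION (partition1 = '20181010');
ALTER TABLE tblname DROP IF EXISTS PARTITION (partition1 = '20181011');
-- ...repeat (or generate the statements with a script) for each date up to 20181231.
```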

How do I convert a string which is actually a date with timezone to a timestamp in Presto?

Posted by 烂漫一生 on 2019-12-24 20:00:56
Question: Example: 2017-12-24 23:59:59.000 PST

This does not work:

```
select date_parse('2017-12-24 23:59:59.000 PST', '%Y-%m-%d %T.%f %x')
```

Sure, I can truncate the time zone, which solves it:

```
select date_parse(substr('2017-12-24 23:59:59.000 PST', 1, 23), '%Y-%m-%d %T.%f')
```

Is there a way to do this without truncating the TZ?

Answer 1: date_parse doesn't seem to support time zones; use parse_datetime instead:

```
presto> select parse_datetime('2017-12-24 23:59:59.000 PST', 'YYYY-MM-dd HH:mm:ss.SSS z');
 _col0
-----------------
```
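parse_datetime returns a timestamp with time zone. If a plain timestamp is wanted afterwards, the value can be normalized and cast instead of truncating the input; a sketch, assuming Presto's standard AT TIME ZONE and cast semantics:

```
-- Sketch: shift the parsed value to UTC, then drop the zone by casting.
select cast(parse_datetime('2017-12-24 23:59:59.000 PST', 'YYYY-MM-dd HH:mm:ss.SSS z')
            at time zone 'UTC' as timestamp);
```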

SQL query to get min, max rows

Posted by 人走茶凉 on 2019-12-24 19:44:24
Question: I have the following sample data, and I want to get the min and max time of every consecutive status.

```
cat     subcat  status  logtime
fruits  apple   0       30-10-2017 06:00
fruits  apple   0       30-10-2017 06:03
fruits  apple   0       30-10-2017 06:06
fruits  apple   0       30-10-2017 06:09
fruits  apple   0       30-10-2017 06:12
fruits  apple   0       30-10-2017 06:15
fruits  apple   0       30-10-2017 06:18
fruits  apple   0       30-10-2017 06:21
fruits  apple   0       30-10-2017 06:24
fruits  apple   0       30-10-2017 06:27
fruits  apple   0       30-10-2017 06:30
fruits  apple   0       30-10-2017 06
```
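Min/max over runs of consecutive equal statuses is the classic gaps-and-islands problem, which window functions solve directly. A sketch, assuming a hypothetical table name logs, the column names from the question, and that logtime sorts correctly as a timestamp:

```
-- Sketch: tag each run of equal statuses (gaps-and-islands), then take
-- min/max logtime per run. The difference of the two row_number() values
-- is constant within a run of the same status.
with flagged as (
  select cat, subcat, status, logtime,
         row_number() over (partition by cat, subcat order by logtime)
       - row_number() over (partition by cat, subcat, status order by logtime) as grp
  from logs
)
select cat, subcat, status,
       min(logtime) as start_time,
       max(logtime) as end_time
from flagged
group by cat, subcat, status, grp
order by start_time;
```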

How to connect to Presto JDBC in PySpark?

Posted by Deadly on 2019-12-24 08:12:38
Question: I want to connect to a Presto server using JDBC in PySpark. I followed a tutorial that was written in Java. I am trying to do the same in my Python 3 code but am getting an error:

```
: java.sql.SQLException: No suitable driver
```

I have tried to execute the following code:

```
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:presto://my_machine_ip:8080/hive/default") \
    .option("user", "airflow") \
    .option("dbtable", "may30_1") \
    .load()
```

It should be noted that I am using Spark on EMR and so,

mismatched input 'ROW' expecting <EOF> error while creating hive table

Posted by  ̄綄美尐妖づ on 2019-12-24 05:36:06
Question: I am trying to create a Hive table using Java. Here is my code:

```
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveCreateTable {
    private static String driverName = "com.facebook.presto.jdbc.PrestoDriver";

    public static void main(String[] args) throws SQLException {
        // Register driver and create driver instance
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            //
```

How to deduplicate in Presto

Posted by 落花浮王杯 on 2019-12-24 05:14:14
Question: I have a Presto table; assume it has columns [id, name, update_time] and this data:

```
(1, Amy,       2018-08-01)
(1, Amy,       2018-08-02)
(1, Amyyyyyyy, 2018-08-03)
(2, Bob,       2018-08-01)
```

Now I want to execute a SQL query whose result is:

```
(1, Amyyyyyyy, 2018-08-03)
(2, Bob,       2018-08-01)
```

Currently, my best way to deduplicate in Presto is the query below:

```
select t1.id, t1.name, t1.update_time
from table_name t1
join (select id, max(update_time) as update_time
      from table_name
      group by id) t2
  on t1.id = t2.id and t1.update_time = t2.update_time
```
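The join against a grouped subquery works, but Presto also offers a shorter aggregate form. A sketch using Presto's max_by aggregate, assuming ties on update_time within an id don't matter:

```
-- Sketch: keep the latest name per id with max_by, avoiding the self-join.
select id,
       max_by(name, update_time) as name,
       max(update_time) as update_time
from table_name
group by id;
```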

How to connect a Presto database to MySQL

Posted by 随声附和 on 2019-12-24 03:59:11
Question: Is it possible to join two tables from two different Presto catalogs? Assume I have a product table in MySQL and another table, sales_order, in Hive; can I join them? If yes, how can I specify the catalog for MySQL, and what would my query look like?

Answer 1: EDIT: Since v0.76, Presto ships a MySQL connector and a PostgreSQL connector. (The original answer below predates that.) For now, prestodb is not capable of connecting to a relational database like MySQL. More info in the Presto FAQ: Does Presto connect to MySQL / PostgreSQL /
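With the MySQL connector configured, every table is addressed as catalog.schema.table, so a cross-catalog join is ordinary SQL. A sketch, where the catalog names mysql and hive and all schema and column names are assumptions:

```
-- Sketch: cross-catalog join; catalog names come from the coordinator's
-- etc/catalog/*.properties files. Schema and key columns are hypothetical.
select p.name, s.quantity
from mysql.shop.product p
join hive.default.sales_order s
  on p.id = s.product_id;
```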

Is there a way to use Facebook Presto 0.131 with Cassandra 3.0.0?

Posted by て烟熏妆下的殇ゞ on 2019-12-23 18:11:06
Question: When querying a Cassandra 3.0.0 cluster using Presto 0.131, I get:

```
All host(s) tried for query failed [..snip...] InvalidQueryException: unconfigured table schema_keyspaces
```

I assume this is due to some change in Cassandra's system schemas? If so, is there a workaround, or should we wait for Presto to support Cassandra 3.0.0?

2016-11-05 update: Newer versions of PrestoDB work fine with Cassandra 3+. I am currently using 0.147; the latest version is 0.156, and they all work fine with the new

How to execute Presto query using Java API? [closed]

Posted by て烟熏妆下的殇ゞ on 2019-12-23 06:21:24
Question: I am using Presto in Qubole Data Service on Azure and want to execute a Presto query from a Java program. How can I execute a query against a Presto cluster that runs on Qubole Data Service on Azure from a Java program?

Answer 1: Presto offers a normal JDBC driver that allows you to run SQL queries. All you have to do is to include

Reusing subqueries in AWS Athena generate large amount of data scanned

Posted by 笑着哭i on 2019-12-23 03:44:10
Question: On AWS Athena, I am trying to reuse computed data using a WITH clause, e.g.:

```
WITH temp_table AS (...)
SELECT ...
FROM temp_table t0, temp_table t1, temp_table t2
WHERE ...
```

Although the query is fast, the "Data scanned" figure goes through the roof, as if temp_table were computed once for each time it is referenced in the FROM clause. I don't see the issue if I create a temporary table separately and use it multiple times in the query. Is there a way to really reuse a subquery multiple times without any penalty?
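Since Athena bills by data scanned and appears to re-evaluate the CTE for every reference, the workaround the question hints at is to materialize the intermediate result once with CTAS and self-join the stored table. A sketch, where the S3 path and the elided queries are assumptions:

```
-- Sketch: materialize the expensive subquery once, reference the stored
-- copy several times, then clean up. The S3 location is hypothetical.
CREATE TABLE temp_table
WITH (external_location = 's3://my-bucket/tmp/temp_table/') AS
SELECT ...;   -- the expensive computation from the WITH clause

SELECT ...
FROM temp_table t0
JOIN temp_table t1 ON ...
JOIN temp_table t2 ON ...;

DROP TABLE temp_table;
```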