presto

Extract substrings from field sql/presto

左心房为你撑大大i submitted on 2020-01-05 03:46:10
Question: I have columns in my database that contain values separated by /. I am trying to extract certain values from the columns and create a new row with them. Examples of the data look like below:
user/values2/class/year/subject/18/9/2000291.csv
holiday/booking/type/1092/1921/1.csv
drink/water/juice/1/232/89.json
drink/water1/soft/90091/2/89.csv
car/type/1/001/1.json
game/mmo/1/2/3.json
I want to extract the last 3 numbers from the data, e.g. from user/values2/class/year/subject/18/9/2000291.csv I want x =
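A sketch of one way to do this in Presto SQL, assuming the column is called path and the table is called my_table (both names are placeholders, not from the question): split the string on '/', read the last elements with negative indexes, and strip the file extension from the final one.
    SELECT
      element_at(parts, -3) AS x,                           -- third value from the end
      element_at(parts, -2) AS y,                           -- second value from the end
      regexp_extract(element_at(parts, -1), '^[0-9]+') AS z -- last value, extension dropped
    FROM (
      SELECT split(path, '/') AS parts
      FROM my_table
    );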

How to cast varchar to MAP(VARCHAR,VARCHAR) in presto

与世无争的帅哥 submitted on 2020-01-04 05:36:52
Question: I have a table in Presto, with one column named "mappings" that holds key-value pairs as a string: select mappings from hello; Ex: {"foo": "baar", "foo1": "bar1" } I want to cast the "mappings" column into a MAP, like select CAST("mappings" as MAP) from hello; This throws an error in Presto. How can we translate this to a map? Answer 1: There is no canonical string representation for a MAP in Presto, so there's no way to cast it directly to MAP(VARCHAR, VARCHAR) . But, if your string contains a JSON map, you can
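A sketch of the JSON route the truncated answer is pointing at, assuming the strings really are valid JSON objects: parse the string with json_parse, then cast the resulting JSON value to a map.
    -- parse the JSON text first, then cast the JSON value to a map
    SELECT CAST(json_parse(mappings) AS MAP(VARCHAR, VARCHAR)) AS mappings_map
    FROM hello;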

Presto unnest json

我们两清 submitted on 2020-01-02 11:05:52
Question: Following this question: how to cross join unnest a json array in presto, I tried to run the example provided but I get an error while doing so. The SQL command:
select x.n from unnest(cast(json_extract('{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}','$.payload') as array<varchar>)) as x(n)
The error I got: Value cannot be cast to array<varchar> java.lang.RuntimeException: java.lang.NullPointerException: string is null
Answer 1: SELECT JSON_EXTRACT('{"payload":[{"type":"b","value"
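One common workaround (a sketch, not necessarily the truncated answer): cast the extracted payload to ARRAY(JSON) instead of array<varchar>, then read fields out of each JSON element.
    SELECT
      json_extract_scalar(x.n, '$.type')  AS type,   -- pull scalar fields from each array element
      json_extract_scalar(x.n, '$.value') AS value
    FROM UNNEST(
      CAST(json_extract('{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}', '$.payload') AS ARRAY(JSON))
    ) AS x(n);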

reduce the amount of data scanned by Athena when using aggregate functions

浪尽此生 submitted on 2020-01-02 07:34:10
Question: The query below scans 100 MB of data:
select * from table where column1 = 'val' and partition_id = '20190309';
However, the query below scans 15 GB of data (there are over 90 partitions):
select * from table where column1 = 'val' and partition_id in (select max(partition_id) from table);
How can I optimize the second query to scan the same amount of data as the first?
Answer 1: There are two problems here. The efficiency of the scalar subquery above, select max(partition_id) from table , and the
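A sketch of the usual workaround, keeping the table and column names from the question: resolve the latest partition first from partition metadata (which scans no table data), then pass it in as a literal so Athena can prune partitions. Splitting the work into two statements is an assumption about what is acceptable here.
    -- Step 1: read only the partition metadata (the "$partitions" table is a Hive-connector feature)
    SELECT max(partition_id) FROM "table$partitions";
    -- Step 2: substitute the value returned by step 1 as a literal
    SELECT * FROM "table" WHERE column1 = 'val' AND partition_id = '20190309';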

Presto server - Cannot connect to discovery server for announce

别来无恙 submitted on 2020-01-02 05:45:11
Question: Trying to run Presto with a standalone coordinator and several worker nodes. The coordinator node starts, but cannot announce itself to the Discovery service (running on the same node). Starting a Presto worker on another node also fails to announce to the Discovery service, hence this problem when querying: failed: No nodes available to run query . Coordinator/Discovery node config:
coordinator=true
datasources=jmx
http-server.http.port=8000
presto-metastore.db.type=h2
presto-metastore.db.filename
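For comparison, a minimal coordinator config sketch with the embedded discovery service enabled, following the pattern in the standard Presto deployment docs (the hostname and port are placeholders, and this is not the poster's full file):
    coordinator=true
    node-scheduler.include-coordinator=false
    http-server.http.port=8000
    discovery-server.enabled=true
    # workers must point discovery.uri at this same host:port
    discovery.uri=http://coordinator-host:8000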

Setup Standalone Hive Metastore Service For Presto and AWS S3

泄露秘密 submitted on 2020-01-01 05:03:33
Question: I'm working in an environment where I have an S3 service being used as a data lake, but not AWS Athena. I'm trying to set up Presto to be able to query the data in S3, and I know I need to define the data structure as Hive tables through the Hive Metastore service. I'm deploying each component in Docker, so I'd like to keep the container size as minimal as possible. What components from Hive do I need to be able to run just the Metastore service? I don't actually care about running Hive
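A sketch of the minimal footprint, assuming Hive 3.x, where the metastore ships as its own standalone-metastore tarball: the schema tool and the metastore launcher are essentially all that is needed, plus a metastore-site.xml carrying the JDBC and S3 settings.
    # initialize the backing database schema, then start the metastore service
    bin/schematool -initSchema -dbType mysql
    bin/start-metastore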

Casting not working correctly in Amazon Athena (Presto)?

不羁的心 submitted on 2019-12-31 05:15:06
Question: I have a doctor license registry dataset which includes the total_submitted_charge_amount for each doctor as well as the number of entitlements with Medicare & Medicaid. I used the query from the answer suggested below:
with datamart AS (SELECT npi, provider_last_name, provider_first_name, provider_mid_initial, provider_address_1, provider_address_2, provider_city, provider_zipcode, provider_state_code, provider_country_code, provider_type, number_of_services, CASE WHEN REPLACE(num
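The usual pattern behind that CASE/REPLACE fragment (a sketch; the column name comes from the question, while the comma-stripping and the placeholder table name my_table are assumptions): remove thousands separators before casting, otherwise the cast fails or silently misbehaves.
    -- strip commas such as '1,234.56' before casting the string to a number
    SELECT CAST(REPLACE(total_submitted_charge_amount, ',', '') AS DOUBLE) AS charge_amount
    FROM my_table;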

How to convert Java timestamp stored as bigint to timestamp in Presto?

对着背影说爱祢 submitted on 2019-12-31 01:44:10
Question: I've had little luck searching for this over a couple of days. If my Avro schema for the data in a Hive table is:
{ "type" : "record", "name" : "messages", "namespace" : "com.company.messages", "fields" : [ { "name" : "timeStamp", "type" : "long", "logicalType" : "timestamp-millis" }, { …
and I use Presto to query this, I do not get formatted timestamps.
select "timestamp", typeof("timestamp") as type, current_timestamp as "current_timestamp", typeof(current_timestamp) as current_type from db
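A common conversion for epoch-millisecond values (a sketch, assuming the bigint really holds milliseconds since the epoch; the table name messages is a placeholder): divide by 1000 and feed the result to from_unixtime.
    SELECT from_unixtime("timestamp" / 1000.0) AS event_time  -- from_unixtime expects seconds
    FROM messages;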

Presto - get timestamp difference

╄→гoц情女王★ submitted on 2019-12-25 03:24:45
Question: I am new to PrestoDB and want to write a query which will compare two timestamps: the date in the first row will be compared with the date in the immediately following row, and if the difference is greater than 15 minutes, that row should be printed. I have written the query below, but while executing it throws the error: "unexpected parameter(timestamp with timezone) for function from_iso8601_timestamp".
SELECT mt.logical_name, mt.cable_name, mt.dt, mt.met_date, date_diff('second', from_iso8601_timestamp(met_date),
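The error means met_date is already a timestamp with time zone, so it does not need from_iso8601_timestamp at all. A sketch of the row-to-row comparison using lag() (column names come from the question; the table name my_table and the ordering column are assumptions):
    SELECT logical_name, cable_name, dt, met_date
    FROM (
      SELECT mt.*,
             lag(met_date) OVER (ORDER BY met_date) AS prev_met_date  -- previous row's timestamp
      FROM my_table mt
    )
    WHERE date_diff('minute', prev_met_date, met_date) > 15;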