presto

Extract substrings from field sql/presto

左心房为你撑大大i submitted on 2020-01-05 03:46:10
Question: I have columns in my database that contain values separated by /. I am trying to extract certain values from the columns and create a new row with them. Examples of the data look like below:
user/values2/class/year/subject/18/9/2000291.csv
holiday/booking/type/1092/1921/1.csv
drink/water/juice/1/232/89.json
drink/water1/soft/90091/2/89.csv
car/type/1/001/1.json
game/mmo/1/2/3.json
I want to extract the last 3 numbers from the data, e.g. from user/values2/class/year/subject/18/9/2000291.csv I want x =
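A sketch of one way to do this in Presto SQL, assuming the column is called path and the table is called my_table (both names are placeholders, not from the question): split the string on '/', read the last elements with negative indexes, and strip the file extension from the final one.
    SELECT
      element_at(parts, -3) AS x,                           -- third value from the end
      element_at(parts, -2) AS y,                           -- second value from the end
      regexp_extract(element_at(parts, -1), '^[0-9]+') AS z -- last value, extension dropped
    FROM (
      SELECT split(path, '/') AS parts
      FROM my_table
    );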

How to cast varchar to MAP(VARCHAR,VARCHAR) in presto

与世无争的帅哥 submitted on 2020-01-04 05:36:52
Question: I have a table in Presto, with one column named "mappings" that holds key-value pairs as a string: select mappings from hello; Ex: {"foo": "baar", "foo1": "bar1" } I want to cast the "mappings" column into a MAP, like select CAST("mappings" as MAP) from hello; This throws an error in Presto. How can we translate this to a map? Answer 1: There is no canonical string representation for a MAP in Presto, so there's no way to cast it directly to MAP(VARCHAR, VARCHAR) . But, if your string contains a JSON map, you can
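A sketch of the JSON route the truncated answer is pointing at, assuming the strings really are valid JSON objects: parse the string with json_parse, then cast the resulting JSON value to a map.
    -- parse the JSON text first, then cast the JSON value to a map
    SELECT CAST(json_parse(mappings) AS MAP(VARCHAR, VARCHAR)) AS mappings_map
    FROM hello;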

Presto unnest json

我们两清 submitted on 2020-01-02 11:05:52
Question: Following this question: how to cross join unnest a json array in presto, I tried to run the example provided but I get an error while doing so. The SQL command:
select x.n from unnest(cast(json_extract('{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}','$.payload') as array<varchar>)) as x(n)
The error I got: Value cannot be cast to array<varchar> java.lang.RuntimeException: java.lang.NullPointerException: string is null
Answer 1: SELECT JSON_EXTRACT('{"payload":[{"type":"b","value"
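One common workaround (a sketch, not necessarily the truncated answer): cast the extracted payload to ARRAY(JSON) instead of array<varchar>, then read fields out of each JSON element.
    SELECT
      json_extract_scalar(x.n, '$.type')  AS type,   -- pull scalar fields from each array element
      json_extract_scalar(x.n, '$.value') AS value
    FROM UNNEST(
      CAST(json_extract('{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}', '$.payload') AS ARRAY(JSON))
    ) AS x(n);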

reduce the amount of data scanned by Athena when using aggregate functions

浪尽此生 submitted on 2020-01-02 07:34:10
Question: The query below scans 100 MB of data:
select * from table where column1 = 'val' and partition_id = '20190309';
However, the query below scans 15 GB of data (there are over 90 partitions):
select * from table where column1 = 'val' and partition_id in (select max(partition_id) from table);
How can I optimize the second query to scan the same amount of data as the first?
Answer 1: There are two problems here. The efficiency of the scalar subquery above, select max(partition_id) from table , and the
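A sketch of the usual workaround, keeping the table and column names from the question: resolve the latest partition first from partition metadata (which scans no table data), then pass it in as a literal so Athena can prune partitions. Splitting the work into two statements is an assumption about what is acceptable here.
    -- Step 1: read only the partition metadata (the "$partitions" table is a Hive-connector feature)
    SELECT max(partition_id) FROM "table$partitions";
    -- Step 2: substitute the value returned by step 1 as a literal
    SELECT * FROM "table" WHERE column1 = 'val' AND partition_id = '20190309';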

Presto server - Cannot connect to discovery server for announce

别来无恙 submitted on 2020-01-02 05:45:11
Question: Trying to run Presto with a standalone coordinator and several worker nodes. The coordinator node starts, but cannot announce itself to the Discovery service (running on the same node). Starting a Presto worker on another node also fails to announce to the Discovery service, hence this problem when querying: failed: No nodes available to run query . Coordinator/Discovery node config:
coordinator=true
datasources=jmx
http-server.http.port=8000
presto-metastore.db.type=h2
presto-metastore.db.filename
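For comparison, a minimal coordinator config sketch with the embedded discovery service enabled, following the pattern in the standard Presto deployment docs (the hostname and port are placeholders, and this is not the poster's full file):
    coordinator=true
    node-scheduler.include-coordinator=false
    http-server.http.port=8000
    discovery-server.enabled=true
    # workers must point discovery.uri at this same host:port
    discovery.uri=http://coordinator-host:8000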

Setup Standalone Hive Metastore Service For Presto and AWS S3

泄露秘密 submitted on 2020-01-01 05:03:33
Question: I'm working in an environment where I have an S3 service being used as a data lake, but not AWS Athena. I'm trying to set up Presto to be able to query the data in S3, and I know I need to define the data structure as Hive tables through the Hive Metastore service. I'm deploying each component in Docker, so I'd like to keep the container size as minimal as possible. What components from Hive do I need to be able to run just the Metastore service? I don't actually care about running Hive
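A sketch of the minimal footprint, assuming Hive 3.x, where the metastore ships as its own standalone-metastore tarball: the schema tool and the metastore launcher are essentially all that is needed, plus a metastore-site.xml carrying the JDBC and S3 settings.
    # initialize the backing database schema, then start the metastore service
    bin/schematool -initSchema -dbType mysql
    bin/start-metastore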

Casting not working correctly in Amazon Athena (Presto)?

不羁的心 submitted on 2019-12-31 05:15:06
Question: I have a doctor license registry dataset which includes the total_submitted_charge_amount for each doctor as well as the number of entitlements with Medicare & Medicaid. I used the query from the answer suggested below:
with datamart AS (SELECT npi, provider_last_name, provider_first_name, provider_mid_initial, provider_address_1, provider_address_2, provider_city, provider_zipcode, provider_state_code, provider_country_code, provider_type, number_of_services, CASE WHEN REPLACE(num
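The usual pattern behind that CASE/REPLACE fragment (a sketch; the column name comes from the question, while the comma-stripping and the placeholder table name my_table are assumptions): remove thousands separators before casting, otherwise the cast fails or silently misbehaves.
    -- strip commas such as '1,234.56' before casting the string to a number
    SELECT CAST(REPLACE(total_submitted_charge_amount, ',', '') AS DOUBLE) AS charge_amount
    FROM my_table;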

How to convert Java timestamp stored as bigint to timestamp in Presto?

对着背影说爱祢 submitted on 2019-12-31 01:44:10
Question: I've had little luck searching for this over a couple of days. If my Avro schema for the data in a Hive table is:
{ "type" : "record", "name" : "messages", "namespace" : "com.company.messages", "fields" : [ { "name" : "timeStamp", "type" : "long", "logicalType" : "timestamp-millis" }, { …
and I use Presto to query this, I do not get formatted timestamps.
select "timestamp", typeof("timestamp") as type, current_timestamp as "current_timestamp", typeof(current_timestamp) as current_type from db
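A common conversion for epoch-millisecond values (a sketch, assuming the bigint really holds milliseconds since the epoch; the table name messages is a placeholder): divide by 1000 and feed the result to from_unixtime.
    SELECT from_unixtime("timestamp" / 1000.0) AS event_time  -- from_unixtime expects seconds
    FROM messages;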

Presto - get timestamp difference

╄→гoц情女王★ submitted on 2019-12-25 03:24:45
Question: I am new to PrestoDB and want to write a query which will compare two timestamps: the date in the first row will be compared with the date in the immediately following row, and if the difference is greater than 15 minutes, that row should be printed. I have written the query below, but while executing it throws the error: "unexpected parameter(timestamp with timezone) for function from_iso8601_timestamp".
SELECT mt.logical_name, mt.cable_name, mt.dt, mt.met_date, date_diff('second', from_iso8601_timestamp(met_date),
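The error means met_date is already a timestamp with time zone, so it does not need from_iso8601_timestamp at all. A sketch of the row-to-row comparison using lag() (column names come from the question; the table name my_table and the ordering column are assumptions):
    SELECT logical_name, cable_name, dt, met_date
    FROM (
      SELECT mt.*,
             lag(met_date) OVER (ORDER BY met_date) AS prev_met_date  -- previous row's timestamp
      FROM my_table mt
    )
    WHERE date_diff('minute', prev_met_date, met_date) > 15;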