presto

How can I get query results in JSON format from Athena in AWS?

白昼怎懂夜的黑 submitted on 2021-02-19 03:18:58
Question: I want to get query results from Athena in AWS in JSON format. When I select from Athena, the result looks like this:
{test.value={report_1=test, report_2=normal, report_3=hard}}
Is there any way to get the result as JSON without replacing "=" with ":"? The column type is map<string,map<string,string>>.
Answer 1:
select mycol from mytable ;
+--------------------------------------------------------------+
| mycol |
+--------------------------------------------------------------+
| {test.value=
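The `{key=value}` text is a display format, not JSON; it has the same shape that Java's `Map.toString()` produces for nested maps. Inside Athena, a commonly suggested route is to cast the column, e.g. `CAST(mycol AS JSON)` (wrapped in `json_format(...)` if you need a varchar); check the Presto JSON documentation for your engine version. If the rows are already in a client program, the sketch below (plain Java with no JSON library; the class and method names are illustrative) shows what a serializer does instead: quote every key and string value and put `:` between them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapToJson {
    // Wrap a string in double quotes, escaping the characters JSON forbids raw.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Recursively render nested Maps as JSON objects and everything else as JSON strings.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(quote(String.valueOf(e.getKey()))).append(':').append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        return quote(value.toString());
    }

    public static void main(String[] args) {
        Map<String, String> inner = new LinkedHashMap<>();
        inner.put("report_1", "test");
        inner.put("report_2", "normal");
        inner.put("report_3", "hard");
        Map<String, Map<String, String>> mycol = new LinkedHashMap<>();
        mycol.put("test.value", inner);
        System.out.println(mycol);         // {test.value={report_1=test, report_2=normal, report_3=hard}}
        System.out.println(toJson(mycol)); // {"test.value":{"report_1":"test","report_2":"normal","report_3":"hard"}}
    }
}
```

In a real client an established JSON library would normally do this; the hand-written escaping here covers only quotes, backslashes and control characters.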

How to get all database tables referenced in a presto query using presto-parser

丶灬走出姿态 submitted on 2021-02-11 14:52:38
Question: I'm using presto-parser to figure out which database tables are queried by any given Presto query. I implemented a class that extends DefaultTraversalVisitor like this:
public class ProcessTables extends DefaultTraversalVisitor<Void, Void> {
    private Set<String> tables;

    ProcessTables() {
        this.tables = new LinkedHashSet<>();
    }

    Set<String> getTables() {
        return tables;
    }

    @Override
    protected Void visitTable(Table node, Void context) {
        this.tables.add(node.getName().toString());
        return
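presto-parser's own classes are not reproduced here. The sketch below uses hypothetical stand-ins (`Node`, `Table`, `TraversalVisitor`, none of which are presto-parser's API) to illustrate the pattern the question relies on: a default traversal reaches every table reference in the tree and calls one overridable hook for each, and a `LinkedHashSet` drops repeats while keeping first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TableCollectorSketch {
    // Hypothetical, heavily simplified AST (NOT presto-parser's classes).
    static class Node {
        final List<Node> children;
        Node(Node... children) { this.children = Arrays.asList(children); }
    }

    static class Table extends Node {
        final String name;
        Table(String name) {
            super();
            this.name = name;
        }
    }

    // Stand-in for a default traversal visitor: it walks every child node and
    // calls visitTable at each table reference; subclasses override only that hook.
    static abstract class TraversalVisitor {
        void process(Node node) {
            if (node instanceof Table) {
                visitTable((Table) node);
            }
            for (Node child : node.children) {
                process(child);
            }
        }

        abstract void visitTable(Table table);
    }

    // Mirrors the question's ProcessTables: collect names in first-seen order, no duplicates.
    static class ProcessTables extends TraversalVisitor {
        private final Set<String> tables = new LinkedHashSet<>();

        @Override
        void visitTable(Table table) {
            tables.add(table.name);
        }

        Set<String> getTables() {
            return tables;
        }
    }

    static Set<String> collectTables(Node root) {
        ProcessTables visitor = new ProcessTables();
        visitor.process(root);
        return visitor.getTables();
    }

    public static void main(String[] args) {
        // A join of two tables, plus a second reference to the first table elsewhere.
        Node query = new Node(
            new Node(new Table("db.orders"), new Table("db.customers")),
            new Table("db.orders"));
        System.out.println(collectTables(query)); // [db.orders, db.customers]
    }
}
```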

Presto SQL / Athena: select between times across different days

让人想犯罪 __ submitted on 2021-02-11 14:21:37
Question: I have a database that contains a series of events and their timestamps. I need to select all events that happen between 11:00 and 11:10, and between 21:00 and 21:05, on every day. What I do now is extract the hour and the minute from the timestamp and then:
SELECT * WHERE (hour = 11 AND minute <= 10) OR (hour = 21 AND minute <= 05)
However, I was wondering if there's a simpler / less verbose way to do this, such as when you query between dates:
SELECT * WHERE date BETWEEN '2020-07-01'
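Because the original condition compares whole minutes, it is equivalent to a range test on minutes since midnight: 11:00 to 11:10 is 660 to 670, and 21:00 to 21:05 is 1260 to 1265. In Presto that could be written as `hour(ts) * 60 + minute(ts) BETWEEN 660 AND 670 OR hour(ts) * 60 + minute(ts) BETWEEN 1260 AND 1265`, where `ts` is a placeholder for the timestamp column. The Java sketch below applies the same arithmetic to `java.time.LocalDateTime` values so the boundaries can be checked; the class and method names are illustrative.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Collectors;

public class TimeWindowFilter {
    // Minutes since midnight: 11:00 -> 660, 21:05 -> 1265. Seconds are ignored.
    static int minuteOfDay(LocalDateTime ts) {
        return ts.getHour() * 60 + ts.getMinute();
    }

    // Same semantics as (hour = 11 AND minute <= 10) OR (hour = 21 AND minute <= 5):
    // whole minutes are compared, so 11:10:59 still matches.
    static boolean inWindows(LocalDateTime ts) {
        int m = minuteOfDay(ts);
        return (m >= 11 * 60 && m <= 11 * 60 + 10)
            || (m >= 21 * 60 && m <= 21 * 60 + 5);
    }

    public static void main(String[] args) {
        List<LocalDateTime> events = List.of(
            LocalDateTime.of(2020, 7, 1, 11, 5),
            LocalDateTime.of(2020, 7, 1, 11, 15),
            LocalDateTime.of(2020, 7, 2, 21, 5, 30),
            LocalDateTime.of(2020, 7, 3, 21, 6));
        List<LocalDateTime> hits = events.stream()
            .filter(TimeWindowFilter::inWindows)
            .collect(Collectors.toList());
        System.out.println(hits); // [2020-07-01T11:05, 2020-07-02T21:05:30]
    }
}
```

Comparing a time-of-day value against an upper bound of exactly 11:10:00 would exclude 11:10:30, whereas the minute-based form keeps the whole 11:10 minute, matching the original query.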

Presto (Athena): loading a CSV file with quote-escaped commas

别来无恙 submitted on 2021-02-10 13:37:16
Question: Consider the following row in a CSV file:
1,0,True,"{""foo"":null,""bar"":null}",0,1
The comma between null and ""bar"" is part of a column. That is, this full text: "{""foo"":null,""bar"":null}" is the value of a single column. However, AWS Athena interprets that comma as a column delimiter, incorrectly splitting the text into multiple columns. I know I could change the column delimiter to something else to avoid this problem. My question is: Is this a bug in AWS Athena / Presto? How
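The row is valid CSV under the usual quoting rules (RFC 4180): a field wrapped in double quotes may contain commas, and a doubled quote `""` inside it stands for one literal quote. Whether Athena honours those rules depends on the SerDe the table is declared with: LazySimpleSerDe, the default for delimited text, does not treat quotes specially, and OpenCSVSerDe is the one usually recommended for quoted fields (check the Athena SerDe documentation for its exact options). The minimal Java splitter below is a sketch that handles one line without embedded newlines; it shows how a quote-aware parser yields six fields for the row in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Split {
    // Split one CSV record: commas inside double quotes are data, and "" inside a
    // quoted field stands for a single literal quote (RFC 4180 rules).
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote -> one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c); // includes commas inside the quoted field
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString()); // unquoted comma ends the field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        String row = "1,0,True,\"{\"\"foo\"\":null,\"\"bar\"\":null}\",0,1";
        List<String> fields = split(row);
        System.out.println(fields.size()); // 6
        System.out.println(fields.get(3)); // {"foo":null,"bar":null}
    }
}
```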

Hundreds of Millions of Rows, Responses in Seconds: How Does Smartbi Do It?

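ぃ、小莉子 submitted on 2021-02-09 11:57:34
Smartbi seems to carry many labels: true Excel, complex reports, performance, self-service analysis, data mining, NLP.... The "performance" label alone comes with plenty of legends in the industry, for example its use in analyzing flight data from a Mars probe, in a provincial economic census, and in large-scale data mining at a bank.

Data-processing performance is the most basic requirement for BI software. Yet it is exactly this most basic requirement that best reveals a product's quality and lets it stand out among its competitors.

So how does Smartbi achieve such strong data-processing performance?

1. Support for columnar databases

Traditional row-oriented databases store all the fields of a row together, one row after another. For transactional operations such as writing a row into the database, modifying some fields of a row, or deleting an entire row, this layout is both intuitive and efficient.

For statistical analysis, however, this storage format is not efficient. Computing year-over-year changes in sales and profit per region, or each department's progress against its targets, touches only some of the fields, yet a row-oriented database still has to read every field of every row. When only sales and profit are analyzed, the other fields, such as customer name, signing date and account manager, are read in as well, wasting a lot of resources. Indexes help to some extent, but the storage consumed by many indexes and the time spent maintaining them grow rapidly.

Columnar databases store all the values of one "column" together. When inserting a row of data,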
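The row-versus-column argument can be made concrete with a toy cost model, which is an assumption of this sketch rather than how any particular database measures I/O: reading a record from a row store costs one unit per field in the record, while a column store reads only the values of the column being aggregated. The sample records (customer, region, signing date, sales, profit) are made up for illustration.

```java
public class RowVsColumnScan {
    // Result of a scan: the aggregate plus how many field values had to be read.
    static final class Scan {
        final double totalSales;
        final long cellsRead;

        Scan(double totalSales, long cellsRead) {
            this.totalSales = totalSales;
            this.cellsRead = cellsRead;
        }
    }

    // Row layout: {customer, region, signDate, sales, profit}.
    // Each record is read whole, even though only "sales" is used.
    static Scan sumSalesRowStore(String[][] rows) {
        double total = 0;
        long read = 0;
        for (String[] row : rows) {
            read += row.length;                   // every field of the record is read
            total += Double.parseDouble(row[3]);  // but only the sales field is used
        }
        return new Scan(total, read);
    }

    // Column layout: "sales" is its own array, so only that array is read.
    static Scan sumSalesColumnStore(double[] salesColumn) {
        double total = 0;
        for (double v : salesColumn) {
            total += v;
        }
        return new Scan(total, salesColumn.length);
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Acme", "North", "2021-01-05", "100.0", "20.0"},
            {"Globex", "South", "2021-01-06", "250.0", "40.0"},
        };
        double[] sales = {100.0, 250.0};
        Scan r = sumSalesRowStore(rows);
        Scan c = sumSalesColumnStore(sales);
        System.out.println(r.totalSales + " " + r.cellsRead); // 350.0 10
        System.out.println(c.totalSales + " " + c.cellsRead); // 350.0 2
    }
}
```

With five fields per record, the row layout reads five values per row to sum one of them, while the column layout reads one; real columnar engines often add compression on top of this saving.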

How does S3 Select pricing work? What do "data returned" and "data scanned" mean in S3 Select?

左心房为你撑大大i submitted on 2021-02-07 13:40:38
Question: I have 1M rows of CSV data. If I select 10 rows, will I be billed for 10 rows? What do "data returned" and "data scanned" mean in S3 Select? There is little documentation on these terms.
Answer 1: To keep things simple, let's forget for a while that S3 reads in a columnar way. Suppose you have the following data:
| City       | Last Updated Date |
|------------|-------------------|
| London     | 1st Jan           |
| London     | 2nd Jan           |
| New Delhi  | 2nd Jan           |
A query for fetching the latest update date forces
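The distinction can be approximated in a few lines: every byte S3 Select has to read to evaluate the query counts as scanned, and only the bytes of the result it sends back count as returned. The sketch below assumes an uncompressed CSV object, a `SELECT *`-style query (so returned bytes are whole matching lines), and placeholder per-GB prices passed in as parameters; it ignores any per-request charges, and the real rates are on the Amazon S3 pricing page. Under those assumptions, filtering 1M rows down to 10 with a WHERE clause still scans every row; only the returned portion shrinks.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Predicate;

public class S3SelectCostSketch {
    // Bytes of one CSV line including its trailing newline.
    static long lineBytes(String line) {
        return line.getBytes(StandardCharsets.UTF_8).length + 1L;
    }

    // Every line must be read to evaluate the WHERE clause, so all of it counts as scanned.
    static long scannedBytes(List<String> lines) {
        return lines.stream().mapToLong(S3SelectCostSketch::lineBytes).sum();
    }

    // Only lines that pass the filter are sent back, so only they count as returned.
    static long returnedBytes(List<String> lines, Predicate<String> where) {
        return lines.stream().filter(where).mapToLong(S3SelectCostSketch::lineBytes).sum();
    }

    // Placeholder per-GB prices supplied by the caller; not AWS's actual rates.
    static double estimateCost(long scanned, long returned,
                               double pricePerGbScanned, double pricePerGbReturned) {
        double gb = 1024.0 * 1024 * 1024;
        return scanned / gb * pricePerGbScanned + returned / gb * pricePerGbReturned;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("London,1st Jan", "London,2nd Jan", "New Delhi,2nd Jan");
        // Roughly: SELECT * FROM s3object s WHERE City = 'New Delhi'
        Predicate<String> isNewDelhi = line -> line.startsWith("New Delhi,");
        long scanned = scannedBytes(csv);
        long returned = returnedBytes(csv, isNewDelhi);
        System.out.println(scanned + " bytes scanned, " + returned + " bytes returned");
        // prints "48 bytes scanned, 18 bytes returned"
    }
}
```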

Presto check if NULL and return default (NVL analog)

[亡魂溺海] submitted on 2021-02-06 14:48:18
Question: Is there an analog of NVL in Presto DB? I need to check whether a field is NULL and return a default value. Currently I solve it like this:
SELECT CASE WHEN my_field is null THEN 0 ELSE my_field END FROM my_table
But I'm curious whether there is something that could simplify this code.
Answer 1: The ISO SQL function for that is COALESCE:
coalesce(my_field, 0)
https://prestodb.io/docs/current/functions/conditional.html
P.S. COALESCE can be used with multiple arguments. It will return the first (from the left)
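COALESCE returns its first non-NULL argument, which is why `coalesce(my_field, 0)` can replace the CASE expression and why extra arguments act as successive fallbacks. The Java method below mirrors those semantics for illustration; `coalesce` is just a helper defined in this sketch, not a Java library method (for the two-argument case, Java 9+ also offers `Objects.requireNonNullElse`, which throws if the default is null too).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CoalesceSketch {
    // Returns the first non-null argument, or null if all are null,
    // mirroring how SQL COALESCE(a, b, c, ...) picks its result.
    @SafeVarargs
    static <T> T coalesce(T... values) {
        for (T v : values) {
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Integer> myField = Arrays.asList(5, null, 7);
        // Equivalent of: SELECT coalesce(my_field, 0) FROM my_table
        List<Integer> result = myField.stream()
            .map(v -> coalesce(v, 0))
            .collect(Collectors.toList());
        System.out.println(result); // [5, 0, 7]
    }
}
```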

AWS Athena (Presto) how to transpose map to columns

℡╲_俬逩灬. submitted on 2021-02-05 10:51:49
Question: An AWS Athena query question: I have a nested map in my rows, and I would like to transpose its keys into columns. I could name the columns explicitly, like items['label_a'], but in this case the keys are dynamic...
From these rows:
{id=1, items={label_a=foo, label_b=foo}}
{id=2, items={label_a=bar, label_c=bar}}
{id=3, items={label_b=baz, label_c=baz}}
I would like to get a table like this:
| id | label_a | label_b | label_c |
------------------------------------
| 1 | foo | foo | |
|
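A SQL query's output columns have to be known when the query is planned, so dynamic keys are usually handled in two passes: first list the distinct keys (for example with `map_keys(items)`), then generate a query with one `element_at(items, 'label_a')`-style column per key, which yields NULL where a row lacks that key. The Java sketch below performs the same two steps in memory on the rows from the question; the class and method names are illustrative, and a missing key becomes an empty string here rather than NULL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PivotMapSketch {
    // Pass 1: the column list is the sorted union of every key seen in any row's map.
    static List<String> columns(Map<Integer, Map<String, String>> rows) {
        Set<String> keys = new TreeSet<>();
        for (Map<String, String> items : rows.values()) {
            keys.addAll(items.keySet());
        }
        return new ArrayList<>(keys);
    }

    // Pass 2: one output row per id; a key missing from that row's map becomes an empty cell.
    static List<List<String>> pivot(Map<Integer, Map<String, String>> rows, List<String> cols) {
        List<List<String>> out = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, String>> e : rows.entrySet()) {
            List<String> line = new ArrayList<>();
            line.add(String.valueOf(e.getKey()));
            for (String c : cols) {
                line.add(e.getValue().getOrDefault(c, ""));
            }
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put(1, Map.of("label_a", "foo", "label_b", "foo"));
        rows.put(2, Map.of("label_a", "bar", "label_c", "bar"));
        rows.put(3, Map.of("label_b", "baz", "label_c", "baz"));
        List<String> cols = columns(rows);
        System.out.println("id | " + String.join(" | ", cols));
        for (List<String> line : pivot(rows, cols)) {
            System.out.println(String.join(" | ", line));
        }
    }
}
```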