presto | 易学教程

Presto equivalent of MySQL group_concat

Question: I'm new to Presto and looking to get the same functionality as the group_concat function in MySQL. Are the following two equivalent? If not, any suggestions for how I can recreate the group_concat functionality in Presto?
MySQL: select a, group_concat(b separator ',') from table group by a
Presto: select a, array_join(array_agg(b), ',') from table group by a
(Found this as a suggested Presto workaround here when searching group_concat functionality.)
Answer 1: Try using this in place of group
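A rough Python model of what both queries in the question compute may help when comparing them: collect the b values for each a (the array_agg step), then join them with a separator (the array_join step). This is only a sketch of the intended result, not of either engine's behavior. The name group_concat_model and the sample rows are made up for illustration, and NULL handling, element ordering, and MySQL's group_concat_max_len limit are deliberately left out.

```python
from collections import defaultdict


def group_concat_model(rows, sep=","):
    """Group (a, b) pairs by a and join each group's b values with sep.

    Hypothetical helper: models the result shape of
    array_join(array_agg(b), ',') ... group by a, nothing more.
    """
    groups = defaultdict(list)  # one list per group, like array_agg(b)
    for a, b in rows:
        groups[a].append(str(b))
    # Join each list with the separator, like array_join(..., ',')
    return {a: sep.join(values) for a, values in groups.items()}


# Made-up sample rows of (a, b)
sample_rows = [("x", 1), ("y", 2), ("x", 3)]
print(group_concat_model(sample_rows))
```

Values are joined in the order the rows arrive here. A real query gives no such guarantee unless an ordering is requested explicitly, which is one place where the two SQL forms may differ in practice.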

extract json in array in AWS Athena

Question: I have sent logs from Kubernetes to an S3 bucket and want to query them using Athena. The log looks like this: [{ "date":1589895855.077230, "log":"192.168.85.35 - - [19/May/2020:13:44:15 +0000] \"GET /healthz HTTP/1.1\" 200 3284 \"-\" \"ELB-HealthChecker/2.0\" \"-\"", "stream":"stdout", "time":"2020-05-19T13:44:15.077230187Z", "kubernetes":{ "pod_name":"myapp-deployment-cd984ffb-kjfbm", "namespace_name":"master", "pod_id":"eace0175-99cd-11ea-95e4-0aee746ae5d6", "labels":{ "app":"myapp", "pod

Extract nested nested JSON array in Presto

Question: Say I have a JSON object that looks like this: {"attributes":{"blah":"bleh","transactionlist":[{"ids":["a","b","c","d"]}]}} I've attempted to extract the ids (a,b,c,d) into rows in Presto. From looking at other resources, it seems I should be casting the "ids" element into a map and then an array, and eventually unnest it. However, I am having some trouble doing this because the "ids" element is nested within a nested element. Anyone have any tips? Thanks!
Answer 1: Since the ids element in a JSON array
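The path the question is asking about can be modeled in a few lines of Python: walk down attributes, then transactionlist, and flatten every ids array into one row per id. This is a sketch of the target result only, using the sample object from the question. The function name extract_ids is made up for illustration, and the sketch does not claim to be the approach the truncated answer goes on to describe. In Presto, the corresponding building blocks would be a JSON path extraction, a cast to an array type, and UNNEST.

```python
import json

# Sample object copied from the question
doc = '{"attributes":{"blah":"bleh","transactionlist":[{"ids":["a","b","c","d"]}]}}'


def extract_ids(raw):
    """Return one entry per id found under attributes.transactionlist[*].ids."""
    obj = json.loads(raw)
    transactions = obj["attributes"]["transactionlist"]  # list of objects
    # Flatten: each transaction contributes all of its ids, like UNNEST
    return [id_ for tx in transactions for id_ in tx["ids"]]


for row in extract_ids(doc):
    print(row)
```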

Introduction to Presto Architecture and Principles (Repost)

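Presto is a distributed SQL query engine for big data, written in Java and released by Facebook. It supports interactive queries over data ranging from several gigabytes to several petabytes, with query speeds comparable to commercial data warehouses; its performance is reportedly more than 10 times that of Hive. Presto can query Hive, Cassandra, and even some commercial data stores, and a single Presto query can combine data from multiple sources for unified analysis. Presto's goal is to return query results within a predictable response time. Facebook uses Presto internally for interactive queries against several data stores, including a 300 PB data warehouse; more than 1,000 Facebook employees use Presto every day, running more than 30,000 queries and scanning more than 1 PB of data per day.
Contents: Presto architecture; why Presto has low latency; Presto storage plugins; Presto execution process; Presto engine comparison.
Presto architecture: the Presto query engine uses a Master-Slave architecture made up of three parts: one Coordinator node, one Discovery Server node, and multiple Worker nodes.
Coordinator: parses SQL statements, generates execution plans, and distributes execution tasks to the Worker nodes.
Discovery Server: usually embedded in the Coordinator node.
Worker nodes: carry out the actual query tasks.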

Some Comparisons Between Hive SQL and Presto SQL

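Because of things going on at work and in life, I haven't been on cnblogs (博客园) for quite a while, but the habit of blogging has to be kept up. The new year calls for more effort: learning through difficulty, lifelong learning, and keeping a beginner's mind every day. Enough preamble; here are some notes comparing the Presto SQL and Hive SQL I have been using recently.
### 1. JSON handling
Hive: select get_json_object(json, '$.book');
Presto: select json_extract_scalar(json, '$.book');
Note that in Presto, json_extract_scalar returns a string. There is also a function, json_extract, that returns JSON directly, so you need to know what type of value you are actually extracting.
### 2. Column-to-row conversion
Hive: select student, score from tests lateral view explode(split(scores, ',')) t as score;
Presto: select student, score from tests cross join unnest(split(scores, ',')) as t (score);
Put simply, this takes a single-column value in the scores field whose scores are separated by commas, such as 80,90,99,80, and turns it into rows that map one-to-many against the student column.
### 3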
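The column-to-row conversion in section 2 can be modeled in Python to show the expected result shape: each comma-separated scores value becomes one row per score, paired with its student. This is a sketch of the semantics only. The function name explode_scores and the student name alice are made up for illustration, and the scores stay strings because a split produces strings rather than numbers.

```python
def explode_scores(rows, sep=","):
    """Turn (student, "80,90,...") rows into one (student, score) row per score."""
    return [
        (student, score)
        for student, scores in rows
        for score in scores.split(sep)  # like split(scores, ',') followed by explode / unnest
    ]


# Hypothetical input row, using the sample scores value from the text
tests = [("alice", "80,90,99,80")]
for student, score in explode_scores(tests):
    print(student, score)
```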

Kafka KSQL: SQL Queries

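Background: In its early days Kafka was a log messaging system and was very popular with operations teams; paired with ELK it was a pleasure to use. As Kafka gradually turned into a streaming platform, developers got involved as well, and some business systems started integrating with Kafka, where it was just as well received. Business needs meant that some newcomers inevitably ended up working with Kafka, and they could never contain their curiosity: they wanted to look up one exact record in Kafka. As the service provider, I was at a loss about how to respond. We could not afford to offend the business side, so all we could do was write a consumer to read the data and then search it by hand.
Requirement: Is there a way to query the data already in Kafka directly? That was when Presto caught my eye. A first look showed that Presto really is powerful, on par with the Impala we were using, and it supports more data sources: Redis, MongoDB, and Kafka can all be queried with SQL. It looked like a lifesaver, since those newcomers could query the data directly through Presto. However, unless you develop a plugin, Presto imposes format requirements on Kafka data: it supports JSON and Avro. For my evaluation of Presto, see presto实战 (Presto in Practice). But all I wanted was to query Kafka with SQL, and because Presto does so much, the framework as a whole is inevitably heavyweight. Is there a more lightweight tool?
Introduction: Then one day KSQL, Kafka's own offspring, was born. KSQL is a streaming SQL engine for Apache Kafka. It lowers the barrier to entry for stream processing and provides a simple, fully interactive SQL interface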

A New Scenario for Flink: OLAP Engine Performance Optimization and Use Cases

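Abstract: This article is based on a talk by He Xiaoling (贺小令, alias 晓令), a technical expert at Alibaba, and introduces a new scenario for Apache Flink: the OLAP engine. It is divided into four parts: background; the Flink OLAP engine; case studies; future plans.
I. Background
1. OLAP and its categories. OLAP is a computing approach that lets users analyze data quickly and conveniently from different perspectives. Mainstream OLAP falls into three categories: multidimensional OLAP (MOLAP), relational OLAP (ROLAP), and hybrid OLAP (HOLAP).
(1) Multidimensional OLAP (MOLAP): the traditional OLAP analysis approach; data is stored in multidimensional data sets.
(2) Relational OLAP (ROLAP): built around a relational database; represents multidimensional data with relational structures; uses SQL where conditions to provide the slice and dice functions of traditional OLAP.
(3) Hybrid OLAP (HOLAP): combines the strengths of MOLAP and ROLAP to achieve better performance.
The characteristics of each category are described in detail below.
■ Multidimensional OLAP (MOLAP). Typical examples of MOLAP are Kylin and Druid. MOLAP processing flow: first, the raw data is preprocessed; then the preprocessed data is stored in a data warehouse, and user requests query the data in the warehouse through the OLAP server. Advantages and disadvantages of MOLAP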

[Exception] Cannot construct instance of `com.facebook.presto.jdbc.internal.client.QueryResults`, problem...

1. Exception: Caused by: com.facebook.presto.jdbc.internal.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `com.facebook.presto.jdbc.internal.client.QueryResults`, problem: stats is null
2. Solution: setting the following session property is enough: connection.setSessionProperty("enable_hive_syntax","true");
Source: oschina. Link: https://my.oschina.net/u/4353702/blog/4260621
