presto

Query fails on presto-cli for a table created in hive in orc format with data residing in s3

旧城冷巷雨未停 submitted on 2020-02-25 03:55:37
Question: I set up an Amazon EMR cluster with 1 master and 1 core node (m4.large) running the following versions: EMR 5.5.0, Presto 0.170, Hadoop 2.7.3 (HDFS), Hive 2.1.1 (Metastore). My Spark app wrote the data to Amazon S3 in ORC format. I then created the table in Hive ( create external table TABLE ... partition() stored as ORC location 's3a://' ) and tried to query it from presto-cli. The query SELECT * from TABLE fails with the following error: Query 20170615_033508_00016_dbhsn failed: com

Can't read data in Presto - can in Hive

≯℡__Kan透↙ submitted on 2020-01-24 04:51:07
Question: I have a Hive database in which I created a table compatible with the Parquet file type: CREATE EXTERNAL TABLE `default.table`( `date` date, `udid` string, `message_token` string) PARTITIONED BY ( `dt` date) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://Bucket/Folder' I added partitions to this table,

Presto array contains an element that likes some pattern

╄→гoц情女王★ submitted on 2020-01-22 12:53:26
Question: For example, one column in my table is an array, and I want to check whether that column contains an element that contains the substring "denied" (so elements like "denied at 12:00 pm" and "denied by admin" all count; I believe I will have to use "like" to match the pattern). How do I write the SQL for this? Answer 1: Use Presto's array functions: filter(), which returns the elements that satisfy the given condition, and cardinality(), which returns the size of an array. Like this: where cardinality(filter(myArray, x
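
The answer above is cut off, so the following is only a sketch of how filter() and cardinality() might be combined, not the answerer's exact query. The inline VALUES rows and the id column are hypothetical stand-ins for the real table; myArray is the column name used in the answer, and the lambda body is an assumption.

```sql
-- Hypothetical inline rows standing in for the real table.
SELECT id, myArray
FROM (
    VALUES
        (1, ARRAY['denied at 12:00 pm', 'approved']),
        (2, ARRAY['approved', 'pending']),
        (3, ARRAY['denied by admin'])
) AS t(id, myArray)
-- Keep a row when at least one element contains the substring "denied".
WHERE cardinality(filter(myArray, x -> x LIKE '%denied%')) > 0;
```

filter() keeps only the elements for which the lambda is true, so a non-zero cardinality means at least one element matched; here rows 1 and 3 qualify.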

How to develop a Presto Listener

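折月煮酒 submitted on 2020-01-18 08:07:46
Development: Create two classes that implement EventListenerFactory and EventListener respectively. Define a custom Plugin that implements com.facebook.presto.spi.Plugin and overrides getEventListenerFactories to return an instance of the custom EventListenerFactory. Under the resources directory, create the folder META-INF/services, and in it create a file named com.facebook.presto.spi.Plugin containing the fully qualified class name of the custom Plugin.
Configuration: In the etc directory under the Presto root, create event-listener.properties and add:
# Required; must match EventListenerFactory.getName()
event-listener.name=custom-event-listener
# Optional; these keys and values are passed as the argument to EventListenerFactory.create so you can implement your own logic. Omit them if not needed; multiple entries are allowed.
custom.key1=value2
custom.key2=value2
Deployment:
$ # Deploy the jar to the Presto coordinator; only the coordinator needs it, so there is no need to distribute it to every node
$ cd $PRESTO_HOME
$ cd plugin
$ mkdir custom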

Spark Streaming app gets stuck while writing to and reading from Cassandra simultaneously

偶尔善良 submitted on 2020-01-17 08:11:28
Question: I was doing some benchmarking that consists of the following data flow: Kafka --> Spark Streaming --> Cassandra --> PrestoDB. Infrastructure: My Spark Streaming application runs on 4 executors (2 cores and 4 GB of memory each). Each executor runs on a datanode on which Cassandra is installed. 4 PrestoDB workers are also co-located on the datanodes. My cluster has 5 nodes, each with an Intel Core i5, 32 GB of DDR3 RAM, a 500 GB SSD and a 1 Gigabit network. Spark Streaming application: My Spark

Get specific values from JSON column in Presto

老子叫甜甜 submitted on 2020-01-16 08:17:12
Question: I have a table with a JSON column points, where one of the rows is: {"0": 0.2, "1": 1.2, "2": 0.5, "15": 1.2, "20": 0.7} I want to get the values for keys "1" and "20" and return them under aliases such as first and second in a query. What I've done so far is: SELECT points, k, v from rewards CROSS JOIN UNNEST(SPLIT_TO_MAP(points, ',', ':')) AS m(k,v) where name='John' But this query gives me all the rows of k, v. How do I select only the two values corresponding to "1" and "20"? Answer 1: JSON
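
The answer above is truncated, so what follows is one possible approach rather than the answerer's own: parse the JSON text into a map and look up the two keys directly. It assumes points is stored as varchar (if the column is already of type JSON, the json_parse call can be dropped); the inline rewards row is a hypothetical stand-in built from the sample in the question.

```sql
-- Hypothetical one-row stand-in for the rewards table from the question.
WITH rewards(name, points) AS (
    VALUES ('John', '{"0": 0.2, "1": 1.2, "2": 0.5, "15": 1.2, "20": 0.7}')
)
SELECT
    element_at(m, '1')  AS "first",   -- value stored under key "1"
    element_at(m, '20') AS "second"   -- value stored under key "20"
FROM (
    -- Parse the JSON text once and cast it to a typed map.
    SELECT CAST(json_parse(points) AS map(varchar, double)) AS m
    FROM rewards
    WHERE name = 'John'
) AS parsed;
```

element_at() returns NULL when a key is absent, whereas the subscript form m['1'] raises an error, which is why it is used here. The aliases are double-quoted only as a precaution against keyword clashes.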

Connecting Presto to Cassandra

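最后都变了- submitted on 2020-01-15 00:58:16
Because our business requires it and Cassandra queries lack global ordering, we tested a Presto + Cassandra query setup. Cassandra version used for testing: Cassandra 3.11.3. Presto version used for testing: presto-server-0.230. Test environment: three CentOS nodes:
10.28.3.137 cluster1 localhost
10.28.3.142 cluster2 localhost
10.28.3.144 cluster3 localhost
1. Download and install Cassandra. The test cluster already has the Cassandra service installed; for the installation steps, see the official Cassandra site or other blog posts on installing Cassandra.
2. Download and install Presto.
1. Download the following from the official Presto site, https://prestodb.io/download.html, and upload them to /usr/local on the server: presto-server-0.230.tar.gz, presto-cli-0.230-executable.jar
2. Extract: tar -xzvf presto-server-0.230.tar.gz The extracted directory is [presto-server-0.230].
3. Configure. Most of the work in Presto seems to be in the configuration.
3.1 First, create an [etc] folder under the [presto-server-0.230] directory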

splitting a text string to matching columns in presto

你说的曾经没有我的故事 submitted on 2020-01-06 07:53:31
Question: I have a report from a Presto query that gives me information in a single string. The raw data looks something like this: c_pre=CI2UhdX95uACFcKIdwodZ8QETQ;gtm=2od241;auiddc=*;u1=cz;u10=Not Available;u11=Not Available;u12=1;u13=Not Available;u14=SGD;u15=Not Available;u3=pdp;u4=undefined;u6=Not Available;~oref=https://www.bbc.com/ I found an Excel workaround that splits this into separate columns (screenshot attached for reference). This process still takes quite a long time, and I was hoping to use
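
The question is cut off before any answer, so this is only a sketch of one way the split could be done inside Presto: split_to_map() turns the `;`-separated `key=value` pairs into a map, and each key of interest becomes its own column. The report table, its raw column, and the shortened sample string are hypothetical stand-ins for the real query output.

```sql
-- Hypothetical stand-in for the report; the string is a shortened copy of the sample.
WITH report(raw) AS (
    VALUES ('gtm=2od241;auiddc=*;u1=cz;u14=SGD;u3=pdp')
)
SELECT
    element_at(kv, 'gtm') AS gtm,
    element_at(kv, 'u1')  AS u1,
    element_at(kv, 'u14') AS u14,
    element_at(kv, 'u3')  AS u3
FROM (
    -- ';' separates the entries, '=' separates each key from its value.
    SELECT split_to_map(raw, ';', '=') AS kv
    FROM report
) AS parsed;
```

One caveat: split_to_map() raises an error when the same key appears twice in a string, so real data with repeated keys would need a different approach.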
