presto | 易学教程 (Easy Learning Tutorials)

Query fails in presto-cli for a table created in Hive in ORC format with data residing in S3

Question: I set up an Amazon EMR cluster with 1 master node and 1 core node (m4.large) and the following versions: EMR 5.5.0; Presto 0.170; Hadoop 2.7.3 (HDFS); Hive 2.1.1 (metastore). My Spark app wrote the data out in ORC to Amazon S3. Then I created the table in Hive (create external table TABLE ... partition() stored as ORC location 's3a://'), tried to query it from presto-cli, and got the following error for the query SELECT * from TABLE: Query 20170615_033508_00016_dbhsn failed: com

Can't read data in Presto - can in Hive

Question: I have a Hive database. I created a table compatible with the Parquet file format: CREATE EXTERNAL TABLE `default.table`( `date` date, `udid` string, `message_token` string) PARTITIONED BY ( `dt` date) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://Bucket/Folder'. I added partitions to this table,

Presto array contains an element that matches a LIKE pattern

Question: One column in my table is an array. I want to check whether that column contains an element that contains the substring "denied" (so elements like "denied at 12:00 pm" and "denied by admin" should both count; I believe I have to use LIKE to match the pattern). How do I write the SQL for this? Answer 1: Use Presto's array functions: filter(), which returns the elements that satisfy the given condition, and cardinality(), which returns the size of an array. Like this: where cardinality(filter(myArray, x
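
The filter()/cardinality() idea can be sanity-checked outside Presto. The Python sketch below is a hypothetical emulation, not Presto code: it mirrors a predicate of the form cardinality(filter(myArray, x -> x LIKE '%denied%')) > 0 by translating the LIKE pattern into a regular expression ('%' for any run of characters, '_' for exactly one) and counting the matching elements. It assumes no ESCAPE clause and treats None as SQL NULL, which filter() drops because NULL LIKE ... is not true.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (no ESCAPE clause) into a compiled regex.

    '%' matches any run of characters (including none), '_' matches exactly
    one character, and every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL lets '%' and '_' match newline characters too.
    return re.compile("".join(parts), re.DOTALL)


def array_contains_like(arr, pattern):
    """Emulate cardinality(filter(arr, x -> x LIKE pattern)) > 0.

    None stands in for SQL NULL and never counts as a match.
    """
    rx = like_to_regex(pattern)
    matches = [x for x in arr if x is not None and rx.fullmatch(x)]
    return len(matches) > 0


print(array_contains_like(["denied at 12:00 pm", "approved"], "%denied%"))  # True
print(array_contains_like(["approved", None], "%denied%"))  # False
```

Note that LIKE in Presto is case-sensitive, so "Denied by admin" would not match '%denied%'; applying lower() to the element inside the lambda is one way to relax that.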

How to develop a Presto Listener

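Development:
Create two new classes that implement EventListenerFactory and EventListener respectively.
Create a custom Plugin that implements com.facebook.presto.spi.Plugin and overrides getEventListenerFactories to return an instance of your custom EventListenerFactory.
Under the resources directory, create the folder META-INF/services, and in it create a file named com.facebook.presto.spi.Plugin containing the fully qualified class name of your custom Plugin class (Presto discovers plugins through this service file).
Configuration:
In the etc directory under the Presto root, create event-listener.properties and add:
# Required; must match the value returned by EventListenerFactory.getName()
event-listener.name=custom-event-listener
# Optional: these keys and values are passed to EventListenerFactory.create as its input, so you can use them in your own logic. Omit them if you don't need them; you can add as many as you like.
custom.key1=value1
custom.key2=value2
Deployment:
$ # Deploy the jar to the Presto coordinator (master) node only; it does not need to be distributed to every node
$ cd $PRESTO_HOME
$ cd plugin
$ mkdir custom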
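
The configuration step can be illustrated with a small Python model of how the coordinator uses etc/event-listener.properties. This is a hypothetical sketch, not Presto's source: it assumes the server reads the file, removes event-listener.name, looks up the registered EventListenerFactory whose getName() returns that value, and passes the remaining keys (custom.key1, custom.key2) to create(). CustomEventListenerFactory and the helper functions are illustrative stand-ins for the Java classes.

```python
def load_properties(text):
    """Minimal .properties reader for this sketch (not a full java.util.Properties parser).

    Skips blank lines and comment lines starting with '#' or '!', and splits each
    remaining line at its first '='.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


class CustomEventListenerFactory:
    """Hypothetical stand-in for a Java class implementing EventListenerFactory."""

    def get_name(self):
        # Must equal event-listener.name in etc/event-listener.properties.
        return "custom-event-listener"

    def create(self, config):
        # A real factory would build an EventListener; this just records its input.
        return {"listener": self.get_name(), "config": config}


def create_configured_listener(properties_text, factories):
    """Pick the factory named by event-listener.name and hand it the remaining keys."""
    props = load_properties(properties_text)
    name = props.pop("event-listener.name", None)
    if not name:
        raise ValueError("event-listener.name is required")
    by_name = {factory.get_name(): factory for factory in factories}
    if name not in by_name:
        raise ValueError(f"no event listener factory registered as {name!r}")
    return by_name[name].create(props)


EXAMPLE = """
# Required; must match EventListenerFactory.getName()
event-listener.name=custom-event-listener
# Optional; passed through to EventListenerFactory.create
custom.key1=value1
custom.key2=value2
"""

listener = create_configured_listener(EXAMPLE, [CustomEventListenerFactory()])
print(listener["config"])  # {'custom.key1': 'value1', 'custom.key2': 'value2'}
```

The name lookup is why event-listener.name has to match getName() exactly: any other value leaves no factory to call.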

Spark Streaming app gets stuck while writing to and reading from Cassandra simultaneously

Question: I was running a benchmark with the following data flow: Kafka --> Spark Streaming --> Cassandra --> PrestoDB. Infrastructure: My Spark Streaming application runs on 4 executors (2 cores and 4 GB of memory each). Each executor runs on a datanode on which Cassandra is installed, and 4 PrestoDB workers are also co-located on the datanodes. My cluster has 5 nodes, each with an Intel Core i5, 32 GB of DDR3 RAM, a 500 GB SSD, and a 1 Gbit network. Spark Streaming application: My Spark

Get specific values from a JSON column in Presto

Question: I have a table with a JSON column, points; one of its rows is: {"0": 0.2, "1": 1.2, "2": 0.5, "15": 1.2, "20": 0.7}. I want to get the values for keys "1" and "20" and return them under aliases such as first and second in a query. What I've done so far is: SELECT points, k, v from rewards CROSS JOIN UNNEST(SPLIT_TO_MAP(points, ',', ':')) AS m(k,v) where name='John'. But this query returns every (k, v) row. How do I select only the two values corresponding to "1" and "20"? Answer 1: JSON
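
One way to see why SPLIT_TO_MAP(points, ',', ':') does not produce usable keys is to emulate it on the sample row. In the Python sketch below, split_to_map is a rough stand-in for the Presto function (it assumes each entry contains the key/value delimiter exactly once), and json.loads plays the role of a real JSON parse. Text splitting leaves braces, quotes and spaces in the keys (' "1"' rather than '1'), so a filter on k = '1' never matches. In Presto itself, one option, assuming points is a VARCHAR holding JSON, is json_extract_scalar(points, '$["1"]') AS first and json_extract_scalar(points, '$["20"]') AS second; json_extract_scalar returns VARCHAR, so a CAST to DOUBLE may be needed.

```python
import json


def split_to_map(text, entry_delimiter, key_value_delimiter):
    """Rough emulation of Presto's split_to_map for this sketch.

    Splits text into entries, then each entry into key and value. Assumes every
    entry contains the key/value delimiter exactly once.
    """
    result = {}
    for entry in text.split(entry_delimiter):
        if entry.count(key_value_delimiter) != 1:
            raise ValueError(f"expected exactly one {key_value_delimiter!r} in {entry!r}")
        key, value = entry.split(key_value_delimiter)
        result[key] = value
    return result


points = '{"0": 0.2, "1": 1.2, "2": 0.5, "15": 1.2, "20": 0.7}'

# Text splitting keeps the JSON punctuation inside keys and values.
naive = split_to_map(points, ",", ":")
print(list(naive))  # ['{"0"', ' "1"', ' "2"', ' "15"', ' "20"']

# Parsing as JSON gives clean keys, so "1" and "20" can be looked up directly.
parsed = json.loads(points)
first, second = parsed["1"], parsed["20"]
print(first, second)  # 1.2 0.7
```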

Connecting Presto to Cassandra

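Because of business requirements, and because Cassandra's query features lack global sorting, we tested a Presto + Cassandra query setup.
Cassandra version used for testing: Cassandra 3.11.3
Presto version used for testing: presto-server-0.230
Test environment: three CentOS nodes:
10.28.3.137 cluster1 localhost
10.28.3.142 cluster2 localhost
10.28.3.144 cluster3 localhost
1. Download and install Cassandra
The test cluster already had the Cassandra service installed; for installation steps, refer to the official Cassandra website or other blog posts about installing Cassandra.
2. Download and install Presto
   1. Download presto-server-0.230.tar.gz and presto-cli-0.230-executable.jar from the Presto website (https://prestodb.io/download.html) and upload them to /usr/local on the server.
   2. Extract the archive: tar -xzvf presto-server-0.230.tar.gz. The extracted directory is presto-server-0.230.
   3. Configuration: I reckon most of the work with Presto is in the configuration...
   3.1 First, create an etc folder inside the presto-server-0.230 directory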
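
The post breaks off before the configuration files themselves. As a hedged illustration of where the Cassandra connection is usually declared, the Python sketch below renders a catalog file for the three test nodes. It assumes the file would be saved as etc/catalog/cassandra.properties and uses the Cassandra connector properties connector.name, cassandra.contact-points and cassandra.native-protocol-port (9042 is Cassandra's default native-protocol port). The helper names are hypothetical, and the other files under etc (node, JVM and server settings) are not shown.

```python
def render_properties(props):
    """Render key/value pairs in .properties syntax, one key=value per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def cassandra_catalog(contact_points, port=9042):
    """Hypothetical helper: minimal settings for a Presto catalog backed by Cassandra."""
    return {
        "connector.name": "cassandra",
        # Comma-separated list of Cassandra hosts the connector contacts first.
        "cassandra.contact-points": ",".join(contact_points),
        "cassandra.native-protocol-port": str(port),
    }


nodes = ["10.28.3.137", "10.28.3.142", "10.28.3.144"]
print(render_properties(cassandra_catalog(nodes)), end="")
```

In Presto the catalog takes its name from the file name, so with a file called cassandra.properties the tables would be addressed as cassandra.<keyspace>.<table> in presto-cli.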

Splitting a text string into matching columns in Presto

Question: I have a report from a Presto query that gives me information in a single string. The raw data looks something like this: c_pre=CI2UhdX95uACFcKIdwodZ8QETQ;gtm=2od241;auiddc=*;u1=cz;u10=Not Available;u11=Not Available;u12=1;u13=Not Available;u14=SGD;u15=Not Available;u3=pdp;u4=undefined;u6=Not Available;~oref=https://www.bbc.com/ I found an Excel workaround that splits this into separate columns (screenshot attached for reference). This process still takes quite a long time, and I was hoping to use
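
In Presto, a string of key=value pairs separated by ';' maps naturally onto split_to_map: for example split_to_map(raw, ';', '=') followed by element_at(m, 'u14') for each wanted column (raw and m are hypothetical names; element_at returns NULL for a missing key, whereas m['u14'] raises an error). One caveat is that split_to_map is strict about delimiters, so a value containing '=' (such as a URL with a query string) would make it fail. The Python sketch below models the same parse but splits each entry only at its first '=', which tolerates '=' inside values; it illustrates the idea and is not a drop-in replacement for the query.

```python
def parse_pairs(raw, entry_delimiter=";", key_value_delimiter="="):
    """Split 'k1=v1;k2=v2;...' into a dict.

    Each entry is split at its first '=' only, so values may contain '='.
    Empty entries (e.g. from a trailing ';') are skipped.
    """
    fields = {}
    for entry in raw.split(entry_delimiter):
        if not entry:
            continue
        key, _, value = entry.partition(key_value_delimiter)
        fields[key] = value
    return fields


raw = (
    "c_pre=CI2UhdX95uACFcKIdwodZ8QETQ;gtm=2od241;auiddc=*;u1=cz;u10=Not Available;"
    "u11=Not Available;u12=1;u13=Not Available;u14=SGD;u15=Not Available;u3=pdp;"
    "u4=undefined;u6=Not Available;~oref=https://www.bbc.com/"
)

fields = parse_pairs(raw)
# Pick the report's columns; .get returns None for absent keys, like element_at's NULL.
row = {column: fields.get(column) for column in ["u1", "u3", "u14", "~oref"]}
print(row)  # {'u1': 'cz', 'u3': 'pdp', 'u14': 'SGD', '~oref': 'https://www.bbc.com/'}
```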
