ClickHouse

How to create primary keys in ClickHouse

不打扰是莪最后的温柔 submitted on 2019-12-05 11:28:46
I found a few examples in the documentation where primary keys are created by passing parameters to the ENGINE section, but I did not find any description of what each ENGINE argument means or how to create a primary key. Thanks in advance. It would be great to add this information to the documentation if it's not already present. A primary key is supported by the MergeTree family of storage engines. https://clickhouse.yandex/reference_en.html#MergeTree Note that for most serious tasks, you should use engines from the MergeTree family. The primary key is specified as parameters to the storage engine. The engine accepts
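As a rough, hypothetical sketch (table and column names below are illustrative, not taken from the question): in the MergeTree family the primary key is the tuple of columns given in the engine parameters (older syntax) or in the ORDER BY clause (newer syntax), and it defines the sort order and sparse index rather than a uniqueness constraint.

-- Older parameterized syntax: MergeTree(date column, primary key tuple, index granularity)
CREATE TABLE hits_old
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
) ENGINE = MergeTree(EventDate, (CounterID, EventDate), 8192);

-- Newer syntax: the primary key is the ORDER BY tuple
CREATE TABLE hits_new
(
    EventDate Date,
    CounterID UInt32,
    UserID UInt64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate);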

ClickHouse Kafka Performance

梦想的初衷 submitted on 2019-12-04 08:22:11
Following the example from the documentation: https://clickhouse.yandex/docs/en/table_engines/kafka/ I created a table with the Kafka engine and a materialized view that pushes data to a MergeTree table. Here is the structure of my tables: CREATE TABLE games ( UserId UInt32, ActivityType UInt8, Amount Float32, CurrencyId UInt8, Date String ) ENGINE = Kafka('XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092', 'games', 'click-1', 'JSONEachRow', '3'); CREATE TABLE tests.games_transactions ( day Date, UserId UInt32, Amount
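For context, here is a minimal sketch of the pattern the question describes; the target table definition and the materialized view below are assumptions for illustration, not the poster's actual DDL.

-- Hypothetical MergeTree target for rows consumed from the Kafka table `games`.
CREATE TABLE tests.games_transactions
(
    day Date,
    UserId UInt32,
    Amount Float32
) ENGINE = MergeTree()
PARTITION BY day
ORDER BY (day, UserId);

-- The materialized view continuously moves rows from the Kafka engine table
-- into the MergeTree table as they are consumed.
CREATE MATERIALIZED VIEW tests.games_consumer TO tests.games_transactions
AS SELECT toDate(Date) AS day, UserId, Amount
FROM default.games;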

Why am I getting a line feed error when loading data via PowerShell into ClickHouse in Docker on Windows?

廉价感情. submitted on 2019-12-02 13:24:31
I am attempting to load data into ClickHouse in a Docker container built with Windows Docker Desktop. I have my mock data prepared in R and written as a CSV, and my table created in ClickHouse (I'm omitting the connection setup): library(dplyr) library(data.table) library(clickhouse) setwd("C:/Users/xxxx/Documents/testing_load") my_df = data.table(datetime = as.character(c("2018-01-01 11:21:00", "2019-01-01 11:45:00"))) c(2018, 2019) %>% lapply(function(y) { print(y) fwrite(my_df[substr(datetime,1,4) == y], paste("test_",y,".csv"), row.names = F, col.names = F ) }) dbSendQuery(con, paste( "CREATE TABLE

ClickHouse as time-series storage

跟風遠走 submitted on 2019-11-30 22:46:37
I wonder whether ClickHouse can be used for storing time-series data in a case like this: a schema with the columns "some_entity_id", "timestamp", "metric1", "metric2", "metric3", ..., "metricN", where each new column carrying a metric name can be added to the table dynamically whenever an entry with that metric name arrives. I have not found any information about dynamically extending a table in the official documentation. Can this case be implemented in ClickHouse? UPD: After some benchmarks we found out that ClickHouse writes new data faster than our current time-series storage, but reads data much more
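Two illustrative ways to handle the dynamic-metric requirement are sketched below (table and column names are placeholders, not the poster's schema): adding a column on demand, or keeping a fixed schema with key/value arrays so that new metric names need no DDL at all.

-- Option 1: add a new metric column when it first appears; in ClickHouse this
-- is a cheap, metadata-only change.
ALTER TABLE metrics ADD COLUMN IF NOT EXISTS metricN Float64 DEFAULT 0;

-- Option 2: fixed schema with paired arrays of metric names and values.
CREATE TABLE metrics_kv
(
    some_entity_id UInt64,
    timestamp DateTime,
    metric_names Array(String),
    metric_values Array(Float64)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (some_entity_id, timestamp);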

ClickHouse single-node deployment and incremental data sync from MySQL

天涯浪子 submitted on 2019-11-30 03:56:28
Background: as data volumes grow, OLAP keeps coming up in discussion. Druid and Kylin can solve the OLAP problem, but both have to be used together with the whole Hadoop stack, which is extremely heavyweight, and frankly I couldn't manage it, so I had to find a technology I could handle. Hence ClickHouse. I started following ClickHouse back in 2017 and wrote some introductory material; only in 2019, as its feature set gradually matured, did I pick it up again, write a sync program that replicates MySQL data to ClickHouse in real time, and finally put it into production. For what ClickHouse is, see the official site: https://clickhouse.yandex/ Official ClickHouse benchmarks: https://clickhouse.yandex/benchmark.html For massive data, for example a single table with more than ten billion rows, ClickHouse can run as a cluster (replication + sharding); for smaller volumes, say one to two billion rows per table, a single node is enough to satisfy query needs. Replication requires ZooKeeper; for more on clusters, consult the official documentation. Single-node deployment (earlier posts also covered this): when ClickHouse was first open-sourced in 2016, Ubuntu support was very good and a single apt command installed it, while support for CentOS and similar systems was poor: you had to build it yourself and the build did not always succeed. As the user base has grown, CentOS support is now also very good, and RPM packages can be installed directly
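The author's own sync program is not reproduced here. As a rough illustration of a built-in alternative for pulling MySQL data into ClickHouse (not the author's method; the host, credentials, database, and table names are placeholders), the mysql table function can be used for an initial or periodic load:

-- Placeholder names throughout; reads directly from MySQL over its wire protocol.
INSERT INTO clickhouse_db.orders
SELECT *
FROM mysql('mysql-host:3306', 'source_db', 'orders', 'user', 'password');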

What you need is not a real-time data warehouse | What you need is a powerful OLAP database (Part 2)

六眼飞鱼酱① submitted on 2019-11-29 15:53:50
In the previous chapter we discussed building a real-time data warehouse. Internet big-data technology has by now matured in essentially every area, and there is a wide range of solutions to choose from. In real-time data warehouse construction the solutions are mature: the message queue Kafka, plus Redis and HBase, have few rivals and have nearly monopolized their roles. The choice of OLAP engine, however, constrains the capability of the entire real-time warehouse. In today's open-source heyday, the OLAP databases available to us are dazzling in number, so in this chapter we pick several of the most commonly used open-source OLAP engines for analysis, hoping to help those of you doing technology selection or planning future architecture upgrades. This article gives performance benchmarks of common open-source OLAP engines: https://blog.csdn.net/oDaiLiDong/article/details/86570211 OLAP: a hundred schools of thought. A brief introduction to OLAP: OLAP, also called an Online Analytical Processing system and sometimes a DSS (decision support system), is what we mean by a data warehouse. Its counterpart is OLTP, the online transaction processing system. The concept of online analytical processing (OLAP) was first proposed in 1993 by E. F. Codd, the father of the relational database. It drew a strong response, and OLAP as a product category became clearly distinct from online transaction processing (OLTP). Codd argued that OLTP could no longer satisfy end users' requirements for database query and analysis

ClickHouse

烈酒焚心 submitted on 2019-11-27 21:03:02
ClickHouse was briefly introduced in the previous reposted article. Recently, in a company project, we ran into slow queries over a large volume of data, so I took the opportunity to put ClickHouse into practice; this post focuses on data synchronization. What follows covers using Flume to sync Oracle data into ClickHouse. Install Flume: 1. Download Flume: wget http://www.apache.org/dist/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz 2. Unpack and install: tar zxvf apache-flume-1.5.2-bin.tar.gz Build flume-ng-sql-source: 1. Download the source from GitHub: https://github.com/keedio/flume-ng-sql-source 2. Build and package it locally: (1) To sync correctly to ClickHouse, the code needs to be modified, as shown in the figure: change the default delimiter from ',' to '\t' (without this change, inserting data into ClickHouse raises an error); (2) build and package: mvn package -Dmaven.test.skip=true; (3) upload the resulting jar, flume-ng-sql-source-1.5.2.jar, to Flume's lib directory. Build flume-clickhouse-sink: this one is a bit more involved, and there is no detailed material online, so I will describe it as thoroughly as I can; contact me if you run into problems. 1.
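To illustrate why the tab delimiter matters (a sketch under assumed table and column names, not the post's actual schema): the Flume sink ultimately hands the tab-delimited events to ClickHouse, where they line up with the TabSeparated input format.

-- Hypothetical target table for the Oracle rows pushed by Flume.
CREATE TABLE ora_events
(
    id UInt64,
    event_time DateTime,
    payload String
) ENGINE = MergeTree()
ORDER BY (event_time, id);

-- Tab-delimited rows are accepted as-is by the TabSeparated format,
-- which is why the source delimiter is switched from ',' to '\t'.
-- Fields in the data row below are separated by tab characters.
INSERT INTO ora_events FORMAT TabSeparated
1	2019-11-27 21:00:00	example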