Background:
Our company's microservices are gradually being onboarded onto the ES APM monitoring stack, but the metrics write volume is heavy (each metric document is tiny, but the write frequency is very high). When shipping data to ES through Logstash, we were frequently hit with "write queue full" rejections, so the ops team needed to tune ES for write throughput.
Optimization Measures
1. Tune the ES index durability settings
The four settings that matter are:
"index.translog.durability" : "async",
"index.translog.flush_threshold_size" : "512mb",
"index.translog.sync_interval" : "120s",
"index.refresh_interval" : "120s"
Pay particular attention to index.refresh_interval: if your indices must be searchable in near real time, do not raise it as far as above. The default is 1 s; if the delay is acceptable, increase it — 60 s is a reasonable value for a log system and yields a large performance gain.
The exact semantics of each setting are covered in the official documentation, so they are not repeated here.
Note that these are index-level settings: they must be re-applied whenever a new index is created, otherwise new indices silently keep the defaults. We therefore run an update_settings.sh script on a schedule:
# Tune the index write settings, trading durability for write throughput
curl -s -H 'Content-Type: application/json' --user elastic:'xxxxxx' -XPUT 'http://1.2.3.4:9200/_all/_settings?preserve_existing=true' -d '
{
"index.translog.durability" : "async",
"index.translog.flush_threshold_size" : "512mb",
"index.translog.sync_interval" : "120s",
"index.refresh_interval" : "120s"
}
' | jq .
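Since this has to run on a schedule, the same call can also be sketched in Python, which makes it easier to extend later (for example, targeting only today's indices instead of _all). This is only a sketch — the endpoint and credentials are the same placeholders as in the curl example, and it builds the request pieces without performing the HTTP call:

```python
import json

# Sketch of update_settings.sh in Python. The endpoint is the same
# placeholder address used in the curl example above.
ES_URL = "http://1.2.3.4:9200"

# The four write-tuning settings, trading durability for throughput.
SETTINGS = {
    "index.translog.durability": "async",
    "index.translog.flush_threshold_size": "512mb",
    "index.translog.sync_interval": "120s",
    "index.refresh_interval": "120s",
}

def build_update_request(base_url, target="_all"):
    """Build (url, json_body) for the settings update.

    preserve_existing=true means the call only fills in settings that are
    not already set on an index, so the periodic job never overwrites a
    manually tuned index.
    """
    url = f"{base_url}/{target}/_settings?preserve_existing=true"
    return url, json.dumps(SETTINGS)

url, body = build_update_request(ES_URL)
```

Paired with any HTTP client, this can run from cron or a Kubernetes CronJob just like the shell version.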
2. Tune the Logstash runtime settings
The three settings that matter are:
pipeline.workers: 8
pipeline.batch.size: 4000
pipeline.batch.delay: 50
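A quick back-of-envelope check of what these numbers mean per Logstash instance (the ~500-byte average document size below is an assumption for illustration, not a measured value):

```python
# In-flight events per Logstash instance: each pipeline worker processes
# one batch at a time, so workers * batch size events can be in memory
# (and in one bulk request to ES) at once.
workers = 8
batch_size = 4000
batch_delay_ms = 50  # max wait before flushing an under-filled batch

in_flight_events = workers * batch_size  # 32,000 events per instance

# Assuming ~500-byte APM metric documents (illustrative, not measured),
# each bulk request to ES is then roughly 2 MB.
approx_bulk_bytes = batch_size * 500
```

Larger batches amortize bulk-request overhead but raise heap pressure, so increasing pipeline.batch.size usually also calls for a larger Logstash heap.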
The Logstash configuration lives in a ConfigMap (cat logstash-configmap.yaml):
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-configmap
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
    http.port: 9600-9700
    log.level: info
    path.logs: /var/log/logstash
    pipeline.workers: 8
    pipeline.batch.size: 4000
    pipeline.batch.delay: 50
  logstash.conf: |
    input {
      kafka {
        group_id => "logstash-kafka-apm-new"
        bootstrap_servers => "10.10.1.14:9092,10.10.1.13:9092,10.10.1.12:9092"
        topics => ["elastic-apm"]
        auto_offset_reset => "latest"
        max_partition_fetch_bytes => "10485760"
        codec => "json"
      }
    }
    output {
      elasticsearch {
        hosts => ["1.2.3.4:9200"]
        manage_template => false
        index => "apm-7.4.0-%{[processor][event]}-%{+YYYY.MM.dd}"
        user => elastic
        password => "xxxxxxxxxxxx"
      }
    }
The Deployment is as follows (cat logstash-7.4-apm-deployment.yaml):
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: logstash-apm-prod
  name: logstash-apm-prod
  namespace: logging
spec:
  replicas: 6
  selector:
    matchLabels:
      app: logstash-apm-prod
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: logstash-apm-prod
    spec:
      containers:
      - command:
        - /usr/share/logstash/bin/logstash
        image: logstash:7.4.0
        imagePullPolicy: IfNotPresent
        name: logstash
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: config-volume
          mountPath: /usr/share/logstash/config
        - name: logstash-pipeline-volume
          mountPath: /usr/share/logstash/pipeline
      hostAliases:
      - ip: "10.10.1.12"
        hostnames:
        - "kafka-01"
      - ip: "10.10.1.13"
        hostnames:
        - "kafka-02"
      - ip: "10.10.1.14"
        hostnames:
        - "kafka-03"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
          - key: logstash.yml
            path: logstash.yml
      - name: logstash-pipeline-volume
        configMap:
          name: logstash-configmap
          items:
          - key: logstash.conf
            path: logstash.conf
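One sanity check worth doing on this Deployment: the total number of concurrent bulk writers hitting ES is replicas times pipeline workers, and if that total exceeds what ES's write thread pool and queue can absorb, the original "write queue full" rejections come back:

```python
# Total concurrent bulk writers hitting ES from this Deployment:
# 6 replicas, each running the pipeline with 8 workers.
replicas = 6
pipeline_workers = 8

concurrent_bulk_writers = replicas * pipeline_workers  # 48
```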
Performance
Hardware: 5 ES nodes, 8 cores / 32 GB RAM each, with ordinary SSDs.
After these changes, ES write throughput improved dramatically.
Day-to-day consumption: ES ingests roughly 1.1 million documents per minute.
Limit test: with 12 Logstash instances consuming, peak ES write throughput reached about 2.2 million documents per minute, at which point Logstash reported bulk write errors ("ES write queue full").
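Converting those per-minute figures to the per-second and per-node rates that ES sizing discussions usually work in:

```python
# Measured cluster-wide ingest rates from the section above.
daily_per_min = 1_100_000   # day-to-day consumption, docs/minute
peak_per_min = 2_200_000    # limit test with 12 Logstash instances
nodes = 5                   # ES data nodes (8C/32G, SSD)

daily_per_sec = daily_per_min / 60        # ~18,300 docs/s cluster-wide
peak_per_sec = peak_per_min / 60          # ~36,700 docs/s cluster-wide
peak_per_node = peak_per_sec / nodes      # ~7,300 docs/s per node at peak
```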
Source: oschina
Link: https://my.oschina.net/u/4360870/blog/4683749