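National Day (October 1) is almost here, none of us can keep our minds on work, and all we want is to celebrate the motherland's birthday. While waiting, here is how I worked around a problem with the Logstash date filter.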
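Logstash's @timestamp stores the time in UTC, which is 8 hours behind China Standard Time (UTC+8). As a result we cannot query by our own local time; we have to subtract 8 hours from every local time we search for, which is a real inconvenience. I tried setting a time zone, but that did not change the value: the date filter's timezone option only tells Logstash how to interpret timestamps that carry no offset of their own, and @timestamp itself is always stored in UTC. So I had to find another way.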
Since that is the case, rather than relying on @timestamp, I add my own field and take the long way around.
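To see where the 8-hour gap comes from, here is a small Python sketch (Python is used only for illustration; it is not part of the Logstash setup). It parses the $time_local value from the sample nginx log line shown below and converts it to UTC, which is how @timestamp stores the same instant. It assumes Python 3 running in the default C locale, so that %b accepts English month abbreviations.

```python
from datetime import datetime, timezone

FMT = "%d/%b/%Y:%H:%M:%S %z"  # layout of nginx $time_local

# $time_local value taken from the sample log line
local_time = datetime.strptime("30/Sep/2016:14:18:33 +0800", FMT)
# The same instant expressed in UTC, which is how @timestamp stores it
utc_time = local_time.astimezone(timezone.utc)
print(local_time.isoformat())  # 2016-09-30T14:18:33+08:00
print(utc_time.isoformat())    # 2016-09-30T06:18:33+00:00

# An entry logged shortly after local midnight lands on the previous UTC day
early = datetime.strptime("30/Sep/2016:02:00:00 +0800", FMT)
print(early.astimezone(timezone.utc).date())  # 2016-09-29
```

The second case is the painful one: for entries logged between midnight and 08:00 local time, any date derived from @timestamp is off by a whole day.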
The nginx log format is configured as:
log_format access '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" $http_x_forwarded_for';
Here is a typical nginx log entry:
127.0.0.1 - - [30/Sep/2016:14:18:33 +0800] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"
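The timestamp field used in the steps below comes from grok's COMMONAPACHELOG pattern. The Python regex here is a simplified stand-in for that pattern (an assumption: the real grok definition is more permissive), with group names matching the fields grok produces, to show which part of the sample line ends up in timestamp.

```python
import re

# Simplified approximation of grok's COMMONAPACHELOG; group names mirror grok's fields
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)

line = ('127.0.0.1 - - [30/Sep/2016:14:18:33 +0800] "GET / HTTP/1.1" 200 396 '
        '"-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"')

# The pattern stops after the byte count, so the referrer and user agent are ignored
event = COMMON_LOG.match(line).groupdict()
print(event["timestamp"])  # 30/Sep/2016:14:18:33 +0800
```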
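As you can see, 30/Sep/2016:14:18:33 +0800 is the actual access time, written by the $time_local variable. It uses the Common Log Format timestamp layout (dd/MMM/yyyy:HH:mm:ss Z), not the yyyy-MM-dd form we normally work with, and the layout of $time_local is fixed: changing it would mean modifying the nginx source and rebuilding nginx, which is a hassle. So instead we parse the string ourselves, reassemble it into the format we need, and output that.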
First, add three fields for the year, month, and day, each set to %{timestamp} (the field grok extracts from $time_local):
add_field => {"access_year" => "%{timestamp}"}
add_field => {"access_month" => "%{timestamp}"}
add_field => {"access_day" => "%{timestamp}"}
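add_field is a common option, so these three lines can go in any filter, as long as that filter comes after the grok filter that creates timestamp. I put them in urldecode, for example: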
urldecode {
add_field => {"access_year" => "%{timestamp}"}
add_field => {"access_month" => "%{timestamp}"}
add_field => {"access_day" => "%{timestamp}"}
all_fields => true
}
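Then use regular expressions to strip each field down to just its year, month, or day:
mutate{
# year: delete everything up to the last "/" and everything from the first ":"
gsub =>[
"access_year","[\W\w]*/|:[\s\S]*",""
]
# month: [...] is a character class, so digits and "/" are deleted one character at a time; everything from ":" on is deleted too
gsub => [
"access_month","[(\d+/)|(/\d+)]|:[\s\S]*",""
]
# day: delete everything from the first "/"
gsub =>[
"access_day","/[\s\S]*|:[\s\S]*",""
]
}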
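Together, these regular expressions pull the year, month, and day out of 30/Sep/2016:14:18:33 +0800, leaving access_year = 2016, access_month = Sep, and access_day = 30.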
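The three substitutions can be checked outside Logstash. This Python sketch runs the same patterns through re.sub on the sample timestamp. mutate's gsub uses Ruby regular expressions; for these particular patterns Python's re module gives the same result. The CLEAR_MONTH_PATTERN at the end is a hypothetical alternative, not part of the original config.

```python
import re

timestamp = "30/Sep/2016:14:18:33 +0800"

# Patterns copied from the mutate/gsub step
YEAR_PATTERN = r"[\W\w]*/|:[\s\S]*"          # up to the last "/", and from the first ":"
MONTH_PATTERN = r"[(\d+/)|(/\d+)]|:[\s\S]*"  # class of single chars: digits ( ) + / |; plus ":" onwards
DAY_PATTERN = r"/[\s\S]*|:[\s\S]*"           # from the first "/" onwards

access_year = re.sub(YEAR_PATTERN, "", timestamp)
access_month = re.sub(MONTH_PATTERN, "", timestamp)
access_day = re.sub(DAY_PATTERN, "", timestamp)
print(access_year, access_month, access_day)  # 2016 Sep 30

# Hypothetical clearer month pattern: drop the leading "dd/" and everything from the next "/"
CLEAR_MONTH_PATTERN = r"^[^/]*/|/.*"
print(re.sub(CLEAR_MONTH_PATTERN, "", timestamp))  # Sep
```

The original month pattern works only because its bracket expression happens to contain every digit and the slash; the anchored variant states the intent directly and yields the same Sep.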
At this point, however, the month is an English abbreviation rather than a number, so we convert it with the translate filter:
translate{
exact => true
regex => true
dictionary => [
"Jan","01",
"Feb","02",
"Mar","03",
"Apr","04",
"May","05",
"Jun","06",
"Jul","07",
"Aug","08",
"Sep","09",
"Oct","10",
"Nov","11",
"Dec","12"
]
field => "access_month"
destination => "access_month_temp"
}
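In plain terms, the translate dictionary is a twelve-entry lookup table. A minimal Python equivalent (the MONTHS name is hypothetical):

```python
# Lookup table equivalent to the translate dictionary
MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}

access_month = "Sep"
# .get() returns None on a miss, mirroring translate leaving the destination unset
access_month_temp = MONTHS.get(access_month)
print(access_month_temp)  # 09
```

Without a fallback setting, translate leaves access_month_temp unset when nothing matches, and the later %{access_month_temp} reference would then remain in access_date as literal text.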
Finally, add the field we actually want to display, assembling it from the temporary fields in whatever order we like:
alter{
add_field => {"access_date"=>"%{access_year}-%{access_month_temp}-%{access_day}"}
remove_field=>["access_year","access_month","access_day","access_month_temp","bytes","ident","auth"]
remove_tag=>["tags"]
}
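This adds an access_date field in yyyy-MM-dd format, and remove_field deletes the temporary fields (along with the bytes, ident, and auth fields from grok that we do not need). The access_date field that Logstash then writes to Redis or MongoDB has exactly the format we want, and you can define and assemble the format however your needs dictate.
Here is the relevant part of my complete configuration file: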
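The alter step boils down to string concatenation plus field deletion. This Python sketch mirrors it on a dict standing in for the Logstash event (finish_event is a hypothetical helper) and cross-checks the result against a direct strptime/strftime parse of the same timestamp, assuming the default C locale for %b.

```python
from datetime import datetime

def finish_event(event: dict) -> dict:
    """Hypothetical helper mirroring the alter step: build access_date, drop temporaries."""
    event = dict(event)  # work on a copy
    event["access_date"] = "{}-{}-{}".format(
        event["access_year"], event["access_month_temp"], event["access_day"]
    )
    for name in ("access_year", "access_month", "access_day",
                 "access_month_temp", "bytes", "ident", "auth"):
        event.pop(name, None)  # like remove_field, a missing field is simply skipped
    return event

event = {
    "timestamp": "30/Sep/2016:14:18:33 +0800",
    "access_year": "2016", "access_month": "Sep", "access_day": "30",
    "access_month_temp": "09", "bytes": "396", "ident": "-", "auth": "-",
}
result = finish_event(event)
print(result)  # {'timestamp': '30/Sep/2016:14:18:33 +0800', 'access_date': '2016-09-30'}

# Cross-check against a direct parse of the same timestamp
expected = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z").strftime("%Y-%m-%d")
assert result["access_date"] == expected
```

Because the year, month, and day come straight from $time_local, access_date is the local (UTC+8) date rather than the UTC date.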
input {
stdin { }
file {
path => "/usr/local/nginx/logs/gateway_access.log"
start_position => beginning
}
}
filter{
grok { # use grok to parse the line with the Apache common log pattern
match => { "message" => "%{COMMONAPACHELOG}" }
}
#date{
# locale => "en"
# timezone => "Asia/Shanghai"
# match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
#}
kv {
source => "request"
field_split => "&?"
value_split => "="
}
urldecode {
add_field => {"access_year" => "%{timestamp}"}
add_field => {"access_month" => "%{timestamp}"}
add_field => {"access_day" => "%{timestamp}"}
all_fields => true
}
mutate{
gsub =>[
"access_year","[\W\w]*/|:[\s\S]*",""
]
gsub => [
"access_month","[(\d+/)|(/\d+)]|:[\s\S]*",""
]
gsub =>[
"access_day","/[\s\S]*|:[\s\S]*",""
]
}
translate{
exact => true
regex => true
dictionary => [
"Jan","01",
"Feb","02",
"Mar","03",
"Apr","04",
"May","05",
"Jun","06",
"Jul","07",
"Aug","08",
"Sep","09",
"Oct","10",
"Nov","11",
"Dec","12"
]
field => "access_month"
destination => "access_month_temp"
}
alter{
add_field => {"access_date"=>"%{access_year}-%{access_month_temp}-%{access_day}"}
remove_field=>["access_year","access_month","access_day","access_month_temp","bytes","ident","auth"]
remove_tag=>["tags"]
}
}
output {
stdout { codec => rubydebug }
mongodb {
collection => "pagelog"
database => "statistics"
uri => "mongodb://192.168.1.52:27017"
}
}
Source: oschina
Link: https://my.oschina.net/u/2457218/blog/753799