Logstash Grok Pattern vs Python Regex?

Submitted by 只愿长相守 on 2019-12-13 00:45:43

Question


I am trying to configure logstash to manage my various log sources, one of which is Mongrel2. The format used by Mongrel2 is tnetstring, where a log message will take the form

86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]

I want to write my own grok patterns to extract certain fields from the above format. I started by testing my regex against the above message in an online regex tester. The regex is

^(?:[^:]*\:){2}([^,]*)

this matches localhost. When I use the same regex as a grok pattern in the form

TEST ^(?:[^:]*\:){2}([^,]*)
MONGREL %{TEST:test}

and configure logstash with

filter {
  grok {
    match => [ "message", "%{MONGREL}" ]
  }
}

the same regex results in the match 86:9:localhost. I can't figure out where I am going wrong. Is it that the regex engine I was using to test is based on Python, while the grok filter regex is based on Oniguruma?

Currently testing it in grokdebug with the following input

86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]

and the following pattern

(?<hostname>^(?:[^:]*\:){2}([^,]*))

resulting in

{
  "hostname": [
    [
      "86:9:localhost"
    ]
  ]
}

where I want

{
  "hostname": [
    [
      "localhost"
    ]
  ]
}

Answer 1:


A pattern like this will extract the host name:

^(\d+)?:(\d+)?:(?<hostname>[^,]+),

Or, written in a manner similar to your original:

^(?:[^:]*\:){2}(?<hostname>[^,]*)

The capture name needs to be inside the parentheses around the part you want to capture. Your pattern wrapped the entire expression in the named group, so it captured everything matched up to that point, including the 86:9: prefix.
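
To see the difference outside of logstash, here is a minimal sketch using Python's re module. It is only an illustration, not the asker's setup: the variable names (line, outer, inner, explicit) are made up for this example, and Python spells named groups (?P<name>...) where grok/Oniguruma accepts (?<name>...). Under those assumptions, it suggests the result depends on where the named group sits rather than on the regex engine.

```python
import re

# Sample Mongrel2 log line from the question.
line = ("86:9:localhost,12:192.168.33.1,5:57089#10:1411396297"
        "#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]")

# Named group wraps the WHOLE expression, as in the question's grokdebug test.
# (Python syntax is (?P<name>...); grok accepts (?<name>...).)
outer = re.match(r"(?P<hostname>^(?:[^:]*:){2}([^,]*))", line)

# Named group wraps only the part after the two length prefixes.
inner = re.match(r"^(?:[^:]*:){2}(?P<hostname>[^,]*)", line)

# The first pattern from this answer, with Python's named-group syntax.
explicit = re.match(r"^(\d+)?:(\d+)?:(?P<hostname>[^,]+),", line)

print(outer.group("hostname"))     # 86:9:localhost
print(outer.group(2))              # localhost (the inner, unnamed group)
print(inner.group("hostname"))     # localhost
print(explicit.group("hostname"))  # localhost
```

The same reasoning applies to %{TEST:test} in the question: that form names the text matched by the entire TEST pattern, so the prefix is included unless the name is moved onto the inner group.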




Answer 2:


Give http://grokdebug.herokuapp.com/ a try. It is the best way to debug grok patterns without losing your hair.



Source: https://stackoverflow.com/questions/26013516/logstash-grok-pattern-vs-python-regex
