Fluentd is not filtering as intended before writing to Elasticsearch

Question

Using:

Elasticsearch 7.5.1
Fluentd 1.11.2
fluent-plugin-elasticsearch 4.1.3
Spring Boot 2.3.3

I have a Spring Boot application with Logback configured with an appender that, in addition to the app's STDOUT, sends logs to Fluentd:

<appender name="FLUENT_TEXT"
          class="ch.qos.logback.more.appenders.DataFluentAppender">
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
        <level>INFO</level>
    </filter>
    
    <tag>myapp</tag>
    <label>myservicename</label>
    <remoteHost>fluentdservicename</remoteHost>
    <port>24224</port>
    <useEventTime>false</useEventTime>
</appender>

The Fluentd config file looks like this:

<ROOT>
  <source>
    @type forward
    port 24224
    bind "0.0.0.0"
  </source>

  <filter myapp.**>
    @type parser
    key_name "message"
    reserve_data true
    remove_key_name_field false
    <parse>
      @type "json"
    </parse>
  </filter>

  <match myapp.**>
    @type copy
    <store>
      @type "elasticsearch"
      host "elasticdb"
      port 9200
      logstash_format true
      logstash_prefix "applogs"
      logstash_dateformat "%Y%m%d"
      include_tag_key true
      type_name "app_log"
      tag_key "@log_name"
      flush_interval 1s
      user "elastic"
      password xxxxxx
      <buffer>
        flush_interval 1s
      </buffer>
    </store>
    <store>
      @type "stdout"
    </store>
  </match>
</ROOT>

So it just adds a filter that parses the message field (a JSON string) into structured fields and then writes the result to Elasticsearch (as well as to Fluentd's STDOUT). Note that I use the myapp.** tag pattern in both the filter and the match blocks so that they apply to my app's events.
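To make the tag pattern concrete, here is a minimal Python sketch of how a pattern such as myapp.** selects events by tag. It is only an illustration, not Fluentd's code: the function name tag_matches is hypothetical, and it covers only the * (exactly one tag part) and ** (zero or more tag parts) wildcards, not the other pattern forms Fluentd supports.

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified model of Fluentd tag matching for '*' and '**' only."""
    p_parts = pattern.split(".")
    t_parts = tag.split(".")

    def match(i: int, j: int) -> bool:
        # Pattern exhausted: succeed only if the tag is exhausted too.
        if i == len(p_parts):
            return j == len(t_parts)
        # '**' may consume zero or more of the remaining tag parts.
        if p_parts[i] == "**":
            return any(match(i + 1, k) for k in range(j, len(t_parts) + 1))
        # Any other pattern part needs a tag part to compare against.
        if j == len(t_parts):
            return False
        # '*' matches exactly one tag part; otherwise require equality.
        if p_parts[i] == "*" or p_parts[i] == t_parts[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# The events in this question carry the tag "myapp.myservice".
matches_app = tag_matches("myapp.**", "myapp.myservice")
matches_other = tag_matches("other.**", "myapp.myservice")
```

The point of the sketch is that the pattern selects events by tag only. It says nothing about the content of a record, which is why every event tagged myapp.* reaches the match block.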

Everything is up and running properly in OpenShift. Spring Boot sends the logs to Fluentd correctly, and Fluentd writes them to Elasticsearch.

The problem is that every log the app generates gets written, not just the ones I want. Every INFO log, for example the initial Spring configuration or anything else the app sends through Logback, also ends up in Elasticsearch.

Example of "wanted" log:

2020-11-04 06:33:42.312840352 +0000 myapp.myservice: {"traceId":"bf8195d9-16dd-4e58-a0aa-413d89a1eca9","spanId":"f597f7ffbe722fa7","spanExportable":"false","X-Span-Export":"false","level":"INFO","X-B3-SpanId":"f597f7ffbe722fa7","idOrq":"bf8195d9-16dd-4e58-a0aa-413d89a1eca9","logger":"es.organization.project.myapp.commons.services.impl.LoggerServiceImpl","X-B3-TraceId":"f597f7ffbe722fa7","thread":"http-nio-8085-exec-1","message":"{\"traceId\":\"bf8195d9-16dd-4e58-a0aa-413d89a1eca9\",\"inout\":\"IN\",\"startTime\":1604471622281,\"finishTime\":null,\"executionTime\":null,\"entrySize\":5494.0,\"exitSize\":null,\"differenceSize\":null,\"user\":\"pmmartin\",\"methodPath\":\"Method Path\",\"errorMessage\":null,\"className\":\"CamelOrchestrator\",\"methodName\":\"preauthorization_validate\"}","idOp":"","inout":"IN","startTime":1604471622281,"finishTime":null,"executionTime":null,"entrySize":5494.0,"exitSize":null,"differenceSize":null,"user":"pmmartin","methodPath":"Method Path","errorMessage":null,"className":"CamelOrchestrator","methodName":"preauthorization_validate"}

Example of "unwanted" logs (note the Fluentd warning for each unexpected log message):

2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.InternalRouteStartupManager","thread":"restartedMain","message":"Route: route6 started and consuming from: servlet:/preAuth"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Total 20 routes, of which 20 are started'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"org.apache.camel.impl.engine.AbstractCamelContext", "thread"=>"restartedMain", "message"=>"Total 20 routes, of which 20 are started"}
2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.AbstractCamelContext","thread":"restartedMain","message":"Total 20 routes, of which 20 are started"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"org.apache.camel.impl.engine.AbstractCamelContext", "thread"=>"restartedMain", "message"=>"Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds"}
2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.AbstractCamelContext","thread":"restartedMain","message":"Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Started MyServiceApplication in 15.446 seconds (JVM running for 346.061)'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"es.organization.project.myapp.MyService", "thread"=>"restartedMain", "message"=>"Started MyService in 15.446 seconds (JVM running for 346.061)"}
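The warnings above appear because the message field of those records is plain text rather than JSON. The following Python sketch models that behaviour in a simplified way, assuming what the logs suggest: with reserve_data true, a record whose message cannot be parsed is still passed on unchanged. The function name parse_message_field is hypothetical and this is not Fluentd's implementation.

```python
import json


def parse_message_field(record: dict, key_name: str = "message"):
    """Simplified model of a JSON parser filter that keeps the original data.

    Returns (new_record, parsed_ok). On failure the original record is
    returned unchanged, which is where Fluentd logged a ParserError warning.
    """
    try:
        parsed = json.loads(record[key_name])
    except (KeyError, TypeError, json.JSONDecodeError):
        return dict(record), False
    if not isinstance(parsed, dict):
        # A bare JSON scalar or array has no fields to merge into the record.
        return dict(record), False
    merged = dict(record)
    merged.update(parsed)  # original fields are kept, parsed fields are added
    return merged, True


unwanted = {
    "level": "INFO",
    "thread": "restartedMain",
    "message": "Total 20 routes, of which 20 are started",
}
wanted = {
    "level": "INFO",
    "message": '{"inout":"IN","user":"pmmartin"}',
}

unwanted_out, unwanted_ok = parse_message_field(unwanted)
wanted_out, wanted_ok = parse_message_field(wanted)
```

In this model the parser never decides whether a record is kept. It only adds fields when it can, so dropping the unwanted records needs a separate filtering step.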

The question is: how do I tell Fluentd to actually filter the records it receives, so that the unwanted ones are discarded?

Answer 1:

Thanks to @Azeem and the documentation for the grep filter and its regexp directive, I got it working :).

I just added this to my Fluentd config file:

<filter myapp.**>
  @type grep
  <regexp>
    key message
    pattern /^.*inout.*$/
  </regexp>
</filter>

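Any record whose message field does not contain the word "inout" is now excluded. Fluentd applies filters in the order they appear in the config, so placing this block before the parser filter should also stop the ParserError warnings, because the non-JSON messages are dropped before they are parsed.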
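As a rough check of what that pattern keeps, here is a small Python sketch that applies the same regular expression to message values taken from the logs in the question. The helper name grep_keep is hypothetical, and Python's re module stands in for the Ruby regex engine Fluentd uses. For single-line messages like these the two should agree, but that is an assumption.

```python
import re

# Same pattern as in the grep filter above. For a single-line message it is
# equivalent to asking whether the text contains "inout".
PATTERN = re.compile(r"^.*inout.*$")


def grep_keep(record: dict, key: str = "message") -> bool:
    """Return True if the record would be kept by the regexp rule."""
    value = record.get(key)
    if value is None:
        return False  # a record without the key cannot match the pattern
    return PATTERN.search(str(value)) is not None


records = [
    {"level": "INFO", "message": '{"traceId":"bf8195d9","inout":"IN","user":"pmmartin"}'},
    {"level": "INFO", "message": "Total 20 routes, of which 20 are started"},
    {"level": "INFO", "message": "Route: route6 started and consuming from: servlet:/preAuth"},
]

kept = [r for r in records if grep_keep(r)]
```

One consequence of matching on a substring is that any other log message that happens to contain "inout" would also be kept, so a stricter pattern may be worth considering if that can occur.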

Source: https://stackoverflow.com/questions/64669749/fluentd-is-not-filtering-as-intended-before-writing-to-elasticsearch

Tags

spring-boot

ElasticSearch

logback

fluentd