Using an HDFS Sink and rollInterval in Flume-ng to batch up 90 seconds of log information

轮回少年 2021-02-06 16:04

I am trying to use Flume-ng to grab 90 seconds of log information and put it into a file in HDFS. I have Flume working to watch the log file via an exec source and tail; however it i
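For reference, the setup described above presumably looks something like the following agent definition. This is only a sketch: the agent, source, channel, and sink names and the log path are hypothetical placeholders, not taken from the question.

    # Hypothetical layout: an exec source tailing the log, a memory channel,
    # and an HDFS sink that should roll a new file every 90 seconds.
    agent1.sources  = tail-src
    agent1.channels = mem-ch
    agent1.sinks    = hdfs-sink

    agent1.sources.tail-src.type     = exec
    agent1.sources.tail-src.command  = tail -F /var/log/app/app.log
    agent1.sources.tail-src.channels = mem-ch

    agent1.channels.mem-ch.type     = memory
    agent1.channels.mem-ch.capacity = 10000

    agent1.sinks.hdfs-sink.type          = hdfs
    agent1.sinks.hdfs-sink.channel       = mem-ch
    agent1.sinks.hdfs-sink.hdfs.path     = hdfs://namenode:8020/flume/logs
    agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
    agent1.sinks.hdfs-sink.hdfs.rollInterval = 90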

2 Answers
  •  一个人的身影
    2021-02-06 16:38

    According to the source code of org.apache.flume.sink.hdfs.BucketWriter:

    /**
     * Internal API intended for HDFSSink use.
     * This class does file rolling and handles file formats and serialization.
     * Only the public methods in this class are thread safe.
     */
    class BucketWriter {
      ...
      /**
       * open() is called by append()
       * @throws IOException
       * @throws InterruptedException
       */
      private void open() throws IOException, InterruptedException {
        ...
        // if time-based rolling is enabled, schedule the roll
        if (rollInterval > 0) {
          Callable<Void> action = new Callable<Void>() {
            public Void call() throws Exception {
              LOG.debug("Rolling file ({}): Roll scheduled after {} sec elapsed.",
                  bucketPath, rollInterval);
              try {
                // Roll the file and remove reference from sfWriters map.
                close(true);
              } catch(Throwable t) {
                LOG.error("Unexpected error", t);
              }
              return null;
            }
          };
          timedRollFuture = timedRollerPool.schedule(action, rollInterval,
              TimeUnit.SECONDS);
        }
        ...
      }
      ...
      /**
       * check if time to rotate the file
       */
      private boolean shouldRotate() {
        boolean doRotate = false;
    
        if (writer.isUnderReplicated()) {
          this.isUnderReplicated = true;
          doRotate = true;
        } else {
          this.isUnderReplicated = false;
        }
    
        if ((rollCount > 0) && (rollCount <= eventCounter)) {
          LOG.debug("rolling: rollCount: {}, events: {}", rollCount, eventCounter);
          doRotate = true;
        }
    
        if ((rollSize > 0) && (rollSize <= processSize)) {
          LOG.debug("rolling: rollSize: {}, bytes: {}", rollSize, processSize);
          doRotate = true;
        }
    
        return doRotate;
      }
    ...
    }
    

    and org.apache.flume.sink.hdfs.AbstractHDFSWriter:

    public abstract class AbstractHDFSWriter implements HDFSWriter {
    ...
      @Override
      public boolean isUnderReplicated() {
        try {
          int numBlocks = getNumCurrentReplicas();
          if (numBlocks == -1) {
            return false;
          }
          int desiredBlocks;
          if (configuredMinReplicas != null) {
            desiredBlocks = configuredMinReplicas;
          } else {
            desiredBlocks = getFsDesiredReplication();
          }
          return numBlocks < desiredBlocks;
        } catch (IllegalAccessException e) {
          logger.error("Unexpected error while checking replication factor", e);
        } catch (InvocationTargetException e) {
          logger.error("Unexpected error while checking replication factor", e);
        } catch (IllegalArgumentException e) {
          logger.error("Unexpected error while checking replication factor", e);
        }
        return false;
      }
    ...
    }
    

    The rolling of HDFS files is controlled by four conditions:

    1. hdfs.rollSize
    2. hdfs.rollCount
    3. hdfs.minBlockReplicas (highest priority, but usually not the cause of small rolled files)
    4. hdfs.rollInterval

    Change the values according to the corresponding if blocks shown above in BucketWriter.
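    To get files that each cover roughly 90 seconds, time-based rolling has to be the only condition that ever fires. A sketch of the relevant sink properties, assuming the sink is named hdfs-sink on an agent named agent1 (both names are illustrative):

        # Roll purely on time: a value of 0 disables the size- and count-based
        # checks, as the (rollCount > 0) and (rollSize > 0) guards above show.
        agent1.sinks.hdfs-sink.hdfs.rollInterval = 90
        agent1.sinks.hdfs-sink.hdfs.rollCount    = 0
        agent1.sinks.hdfs-sink.hdfs.rollSize     = 0
        # Optional: keep isUnderReplicated() from forcing extra rolls by
        # lowering the replica count the sink expects to see.
        agent1.sinks.hdfs-sink.hdfs.minBlockReplicas = 1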
