Which NoSQL database should I use for logging?

走了就别回头了 2020-12-22 17:16

Do you have any experience logging to NoSQL databases for scalable apps? I have done some research on NoSQL databases for logging and found that MongoDB seems to be a good c

3 answers
  •  难免孤独
    2020-12-22 17:41

    I've seen a lot of companies use MongoDB to store application logs. Its schemaless design is a good fit for application logs, whose schema tends to change from time to time. Its capped collection feature is also useful: a capped collection has a fixed size and automatically overwrites the oldest documents once it is full, which helps keep the data small enough to fit in memory.
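
    As a rough illustration of the capped collection idea, here is a minimal sketch. It assumes `db` behaves like a pymongo `Database` object; the collection name `app_logs` and the 512 MB size are hypothetical values chosen for the example, not recommendations.

```python
def ensure_capped_log_collection(db, name="app_logs", size_bytes=512 * 1024 * 1024):
    """Return a capped collection for logs, creating it if it does not exist.

    `db` is assumed to behave like a pymongo Database; the default name
    and size are illustrative only.
    """
    if name in db.list_collection_names():
        # Reuse the existing collection rather than trying to recreate it.
        return db[name]
    # capped=True plus a size in bytes gives a fixed-size collection:
    # once it is full, MongoDB overwrites the oldest documents.
    return db.create_collection(name, capped=True, size=size_bytes)
```

    With pymongo installed, this would be called along the lines of `ensure_capped_log_collection(MongoClient()["logging"])`, where the database name `logging` is again just a placeholder.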

    People aggregate the logs with the normal group command or MapReduce, but neither is very fast. In particular, MongoDB's MapReduce runs in a single thread, and its JavaScript execution overhead is large. The newer aggregation framework could solve this problem.
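
    To give a feel for what an aggregation-framework query over logs might look like, here is a small sketch that builds a pipeline counting log entries per level. The field names `timestamp` and `level` are assumptions about the log document shape, not something MongoDB prescribes.

```python
def count_by_level_pipeline(since):
    """Build an aggregation pipeline that counts log entries per level.

    Assumes each log document has a `timestamp` (datetime) field and a
    `level` field such as "INFO" or "ERROR"; both names are hypothetical.
    """
    return [
        # Keep only the entries logged at or after `since`.
        {"$match": {"timestamp": {"$gte": since}}},
        # Group by level and count the documents in each group.
        {"$group": {"_id": "$level", "count": {"$sum": 1}}},
        # Most frequent level first.
        {"$sort": {"count": -1}},
    ]
```

    With pymongo, the resulting list could be passed to `collection.aggregate(...)`; the stages run inside the server rather than in JavaScript.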

    When you use MongoDB for logging, the main concern is lock contention under high write throughput. Although MongoDB's insert is fire-and-forget by default, calling insert() many times causes heavy write-lock contention. This can hurt application performance and prevent readers from aggregating or filtering the stored logs.

    One solution is to use a log collector such as Fluentd, Logstash, or Flume. These daemons are meant to run on every application node and collect the logs from the app processes.

    Fluentd plus MongoDB

    They buffer the logs and asynchronously write the data out to other systems such as MongoDB or PostgreSQL. The writes are done in batches, so this is much more efficient than writing directly from the apps. This link describes how to send logs to Fluentd from a PHP program.

    • Fluentd: Data Import from PHP Applications
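
    The batching idea itself can be sketched in a few lines. This is not how Fluentd is implemented; it is a simplified, in-process illustration in which `BatchingLogBuffer`, `sink`, and `batch_size` are all hypothetical names. Real collectors also flush on a timer, retry on failure, and run outside the application process.

```python
import threading


class BatchingLogBuffer:
    """Collect log records and hand them to `sink` one batch at a time."""

    def __init__(self, sink, batch_size=100):
        self._sink = sink              # callable that accepts a list of records
        self._batch_size = batch_size
        self._records = []
        self._lock = threading.Lock()  # emit() may be called from many threads

    def emit(self, record):
        """Buffer one record; write a batch once enough have accumulated."""
        with self._lock:
            self._records.append(record)
            if len(self._records) < self._batch_size:
                return
            # Swap the full buffer out while holding the lock.
            batch, self._records = self._records, []
        # Call the sink outside the lock so other threads can keep buffering.
        self._sink(batch)

    def flush(self):
        """Write out whatever is still buffered, e.g. at shutdown."""
        with self._lock:
            batch, self._records = self._records, []
        if batch:
            self._sink(batch)
```

    With pymongo, `sink` could be a collection's `insert_many` method, so that 100 log records cost one write call instead of 100.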

    Here are some tutorials on MongoDB + Fluentd.

    • Fluentd + MongoDB: The Easiest Way to Log Your Data Effectively on 10gen blog
    • Fluentd: Store Apache Logs into MongoDB

    MongoDB's problem is that it starts slowing down when the data volume exceeds the memory size. At that point, you can switch to other solutions such as Apache Hadoop or Cassandra. If you have the distributed logging layer described above, you can readily switch to another backend as you grow. This tutorial describes how to store logs in HDFS using Fluentd.

    • Fluentd: Fluentd + HDFS: Instant Big Data Collection
