HBase - What's the difference between WAL and MemStore?

我的未来我决定 submitted on 2019-12-20 21:53:33

Question


I am trying to understand the HBase architecture. I see two different terms that appear to be used for the same purpose.

The Write Ahead Log and the MemStore are both used to store new data that hasn't yet been persisted to permanent storage.

What's the difference between WAL and MemStore?

Update:

WAL - is used to recover not-yet-persisted data in case a server crashes. MemStore - stores updates in memory as sorted KeyValues.

It seems like a lot of data duplication before the data is written to disk.


Answer 1:


The WAL is for recovery, NOT for data duplication. (See also my answer here.)

Please go through the points below to understand more:

  • An HBase Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.

  • The Write Ahead Log (WAL) records all changes to data in HBase to file-based storage. If a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.

  • With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be sequential. This causes the WAL to be a performance bottleneck.

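  • The WAL can be disabled to relieve this performance bottleneck. The HBase documentation describes this as calling the HBase client field

Mutation.writeToWAL(false)

In current client versions the field is set through Mutation.setDurability(Durability.SKIP_WAL).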
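The points above can be sketched with a toy model. This is not HBase code: the class ToyRegionStore and all of its members are hypothetical names, and a flush here simply clears the log, which is a simplification of how HBase retires old WAL files. It only illustrates the division of labour: the WAL is an append-only record in arrival order that survives a crash, while the MemStore is a sorted in-memory structure that serves reads and is later flushed as a sorted file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of one Store's write path. Hypothetical names, not the HBase API.
class ToyRegionStore {
    // WAL: append-only, keeps edits in arrival order (stands in for a file on HDFS).
    final List<String[]> wal = new ArrayList<>();
    // MemStore: lives in memory only, kept sorted by row key.
    TreeMap<String, String> memStore = new TreeMap<>();
    // Flushed, sorted files (stand-ins for StoreFiles/HFiles).
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    void put(String row, String value) {
        wal.add(new String[] {row, value}); // 1. log the edit first
        memStore.put(row, value);           // 2. then apply it in memory
    }

    // Flush: the sorted MemStore becomes a store file; its logged edits are no longer needed.
    void flush() {
        storeFiles.add(memStore);
        memStore = new TreeMap<>();
        wal.clear(); // simplification: real WAL files are retired later, not cleared in place
    }

    // Crash: memory is lost, the WAL and the store files survive.
    void crash() {
        memStore = new TreeMap<>();
    }

    // Recovery: replay the surviving WAL edits into a fresh MemStore.
    void replayWal() {
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    // Read: check the MemStore first, then the store files from newest to oldest.
    String get(String row) {
        if (memStore.containsKey(row)) {
            return memStore.get(row);
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String value = storeFiles.get(i).get(row);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyRegionStore store = new ToyRegionStore();
        store.put("row-b", "2");
        store.put("row-a", "1");
        System.out.println(store.wal.get(0)[0]);       // first logged edit: row-b
        System.out.println(store.memStore.firstKey()); // first sorted key: row-a
        store.crash();
        System.out.println(store.get("row-a"));        // null: the MemStore is gone
        store.replayWal();
        System.out.println(store.get("row-a"));        // 1: recovered from the WAL
    }
}
```

The same edit does exist in two places for a while, but the two copies serve different purposes: one is only ever read during recovery, the other is the structure that reads and flushes work from.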

General note: when bulk loading data, it is common practice to disable the WAL for speed. The side effect is that with the WAL disabled there is nothing to replay, so any data still in the MemStore is lost if the RegionServer crashes before a flush.
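A minimal sketch of that trade-off, again as a toy model rather than HBase code. The class SkipWalSketch and the writeToWal parameter are hypothetical and stand in for the client's per-mutation durability setting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of writing with and without the WAL. Hypothetical names, not the HBase API.
class SkipWalSketch {
    private final List<String[]> wal = new ArrayList<>();       // survives a crash
    private TreeMap<String, String> memStore = new TreeMap<>(); // lost on a crash

    // writeToWal = false models a mutation that skips the log for speed.
    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {row, value});
        }
        memStore.put(row, value);
    }

    // Crash before any flush: memory is wiped, then whatever was logged is replayed.
    void crashAndRecover() {
        memStore = new TreeMap<>();
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    String get(String row) {
        return memStore.get(row);
    }

    public static void main(String[] args) {
        SkipWalSketch store = new SkipWalSketch();
        store.put("logged", "kept", true);
        store.put("unlogged", "lost", false);
        store.crashAndRecover();
        System.out.println(store.get("logged"));   // prints "kept"
        System.out.println(store.get("unlogged")); // prints "null"
    }
}
```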

Moreover, if you use Solr + HBase + Lily, i.e. Lily Morphline NRT indexing with HBase, the indexer works off the WAL. If you disable the WAL for performance reasons, Solr NRT indexing won't work, since Lily relies on the WAL.

Please have a look at the HBase architecture section.
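To illustrate why a WAL-driven indexer misses unlogged writes, here is a rough sketch. It assumes only that the indexer consumes edits from the log, as described above. WalTailingIndexerSketch, searchIndex and runIndexer are hypothetical names and do not correspond to Lily or Solr APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an indexer fed from the WAL. Hypothetical names, not Lily or Solr code.
class WalTailingIndexerSketch {
    final List<String[]> wal = new ArrayList<>();            // edits that were logged
    final Map<String, String> table = new HashMap<>();       // data visible to table reads
    final Map<String, String> searchIndex = new HashMap<>(); // stand-in for the Solr index
    private int indexedUpTo = 0;                             // WAL position already consumed

    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {row, value});
        }
        table.put(row, value);
    }

    // The indexer only ever sees edits that reached the WAL.
    void runIndexer() {
        while (indexedUpTo < wal.size()) {
            String[] edit = wal.get(indexedUpTo++);
            searchIndex.put(edit[0], edit[1]);
        }
    }

    public static void main(String[] args) {
        WalTailingIndexerSketch hbase = new WalTailingIndexerSketch();
        hbase.put("doc-1", "indexed", true);
        hbase.put("doc-2", "invisible to search", false);
        hbase.runIndexer();
        System.out.println(hbase.searchIndex.containsKey("doc-1")); // prints "true"
        System.out.println(hbase.searchIndex.containsKey("doc-2")); // prints "false"
        System.out.println(hbase.table.containsKey("doc-2"));       // prints "true"
    }
}
```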



Source: https://stackoverflow.com/questions/40067933/hbase-whats-the-difference-between-wal-and-memstore
