Can Bloom Filters in BigTable be used to filter based only on row ID?

Submitted by 落花浮王杯 on 2019-12-11 03:32:37

Question


BigTable uses Bloom filters so that point reads can skip SSTables that contain no data for a given row-column pair. Can these Bloom filters also be used to skip SSTables when the query specifies only the row ID and no column ID?

BigTable inserts row-column pairs as the keys of its Bloom filters. A point read that specifies a row-column pair can therefore probe these filters directly.

Now suppose a query fetches all columns of a row given only the row ID. As far as I can tell, such a query does not know in advance which columns belong to the row, so it cannot enumerate the possible row-column pairs to probe. It therefore may not be able to use the Bloom filters at all, which would make it less efficient.

In theory, BigTable could address this by also inserting the bare row ID into the Bloom filters, but I can't tell whether the current implementation does so.
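To make the two options concrete, here is a minimal Python sketch, not BigTable's actual implementation. The `BloomFilter` class, the `build_filter` helper, the `\x00` separator between row and column, and the filter sizes are all assumptions made for illustration. It builds one filter keyed only on row-column pairs and one that additionally stores the bare row ID.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: k hash probes into an m-slot bit array."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: bytes):
        # Derive k positions by salting a single hash with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def build_filter(cells, include_row_keys):
    """Build the filter for one (hypothetical) SSTable from (row, column) cells."""
    bf = BloomFilter()
    for row, col in cells:
        bf.add(row + b"\x00" + col)  # row-column key, as the question describes
        if include_row_keys:
            bf.add(row)              # assumed extra key: the bare row ID
    return bf


cells = [(b"row1", b"cf:a"), (b"row1", b"cf:b"), (b"row2", b"cf:a")]
pair_only = build_filter(cells, include_row_keys=False)
pair_and_row = build_filter(cells, include_row_keys=True)

print(pair_only.might_contain(b"row1\x00cf:a"))  # point read on a stored pair
print(pair_and_row.might_contain(b"row1"))       # row-only read, row key stored
print(pair_only.might_contain(b"row1"))          # row-only read, row key not stored
```

Probing `pair_only` with a bare row ID will almost always report "absent" even though the row exists in the SSTable, so a row-only read could not trust that filter and would have to open the SSTable anyway. With the extra row key, the same read gets a usable answer, at the cost of more entries per filter.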

This matters for designing efficient queries to run on BigTable. Any hints would be wonderful.


Answer 1:


HBase supports Bloom filters keyed on the row alone as well as on the row-column pair (the ROW and ROWCOL Bloom filter types, chosen per column family). HBase was modeled on the BigTable paper, so BigTable most probably does something similar.

HBase Bloom Filter is a space-efficient mechanism to test whether a StoreFile contains a specific row or row-col cell.
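As a rough model of how the two HBase filter types relate to the question, the sketch below shows which key would go into the filter and which point reads could consult it. `NONE`, `ROW`, and `ROWCOL` are HBase's Bloom filter type names; the `BloomType` enum, `bloom_key`, `filter_usable`, and the `\x00` separator are hypothetical names and choices for this illustration, not HBase API.

```python
from enum import Enum


class BloomType(Enum):
    NONE = "NONE"
    ROW = "ROW"
    ROWCOL = "ROWCOL"


def bloom_key(bloom_type, row: bytes, col: bytes):
    """Key stored in the filter for one cell, or None if no filter is kept."""
    if bloom_type is BloomType.ROW:
        return row
    if bloom_type is BloomType.ROWCOL:
        return row + b"\x00" + col  # separator is an assumption of this sketch
    return None


def filter_usable(bloom_type, query_has_column: bool) -> bool:
    """Can a point read consult the filter to skip a store file?"""
    if bloom_type is BloomType.ROW:
        return True               # the row is always known in a point read
    if bloom_type is BloomType.ROWCOL:
        return query_has_column   # probing needs the full row-column key
    return False


for bloom_type in BloomType:
    print(bloom_type.name,
          "row-only read:", filter_usable(bloom_type, query_has_column=False),
          "row+column read:", filter_usable(bloom_type, query_has_column=True))
```

The asymmetry is the point: a row-keyed filter can serve a row-column read, but a row-column-keyed filter cannot serve a row-only read, which is exactly the situation the question describes.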

Reference: https://learning.oreilly.com/library/view/hbase-administration-cookbook/9781849517140/ch09s11.html

The 2006 BigTable paper, however, mentions Bloom filters only for lookups on a row-column pair:
https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf



Source: https://stackoverflow.com/questions/54280508/can-bloom-filters-in-bigtable-be-used-to-filter-based-only-on-row-id
