We have to write the nodes of a binary tree to a file. What is the most space efficient way of writing a binary tree . We can store it in array format with parent in positi
Since only 2ⁿ-1 items can be stored without any space wasted, you'll have to split the N elements into subtrees, stored sequentially, and keeping a “master index” of the values between trees to decide in which tree to search in. An obvious way to decide which will be the subtrees, is to take advantage of the binary format of a number, and here I'm talking about N, the count of items in the tree.
Let's assume you have a sorted sequence of 20 distinct items:
['aeu', 'bfz', 'cdi', 'dfc', 'eap', 'ggk', 'gsb', 'guj', 'idm', 'ieg',
'izr', 'pba', 'plp', 'rap', 'rhp', 'tat', 'uho', 'uwb', 'wdf', 'yhp']
20₁₀ is 10100₂ and the 1-bits decide the splits.
The first entry in your master table will be item[15], 0 (15₁₀≡01111₂ one less than the largest power of 2 less than or equal to N, and 0 being the length of your output list so far aka the start index for the subtree ). The previous 15 (0–14) items will be appended to your output list, again by taking advantage of binary arithmetic: item[7₁₀≡0111₂] at 0, item[3₁₀≡0011₂] at 1, item[11₁₀≡1001₂] at 2 and so on, thus preserving the rule that item[i] < item[2*i] && item[i] > item[2*i+1].
After this first step:
master_table = [('tat', 0)]
output_list = ['guj', 'dfc', 'pba', 'bfz', 'ggk', 'ieg', 'rap', 'aeu', 'cdi', 'eap', 'gsb', 'idm', 'izr', 'plp', 'rhp']
If N == 2ⁿ then this python generator produces the desired order:
def heap_order(pow2):
"""Return the tree order of items
pow2 MUST be a power of two"""
bit= pow2>>1
while bit:
yield from range(bit - 1, pow2, bit*2)
bit>>= 1
Delete (or ignore) the first 16 items from your list and repeat with the next 1-bit in the list length:
The remaining input list is ['uho', 'uwb', 'wdf', 'yhp'], and its length is 4₁₀==100₂. Largest 2ⁿ-1 is 3, so you append zero-based item 3 along with the current length of the output list to your master table, and then append the previous 3 items to your output list resulting in:
master_table = [('tat', 0), ('yhp', 15)]
# master table items: (key, subtree_base_index)
output_list = [
'guj', 'dfc', 'pba', 'bfz', 'ggk',
'ieg', 'rap', 'aeu', 'cdi', 'eap',
'gsb', 'idm', 'izr', 'plp', 'rhp',
'uwb', 'uho', 'wdf'
]
The whole result is equivalent to a binary tree where all left subtrees have either leaf nodes or tree nodes with both children.
To search for a value: scan the master table to find the key larger than your search key, and search your key in the corresponding subtree, always using index - subtree_base_index adjustments in your 2*index, 2*index-1 calculations. The maximum total (master table + subtree) count of comparisons needed for failed search is usually ⌈log₂N⌉, and sometimes —when the search key is larger than the first entry of the master table— less than that (no subtrees are created for 0-bits of N).
You should also store the level count of each subtree in the master table (so that you know after how many searches you should stop searching the subtree).