I don\'t know if this is the place to ask about algorithms. But let\'s see if I get any answers ... :)
If anything is unclear I\'m very happy to clarify things.
At a glance, it sounds like you've implemented a Patricia Trie. This approach also is called path compression in some of the literature. There should be copies of that paper that aren't behind the ACM paywall, which will include an insertion algorithm.
There's also another compression method you may want to look at: level compression. The idea behind path compression is to replace strings of single child nodes with a single super node that has a "skip" count. The idea behind level compression is to replace full or nearly full subtrees with a super node with a "degree" count that says how many digits of the key the node decodes. There's also a 3rd approach called width compression, but I'm afraid my memory fails me and I couldn't find a description of it with quick googling.
Level compression can shorten the average path considerably, but insertion and removal algorithms get quite complicated as they need to manage the trie nodes as similarly to dynamic arrays. For the right data sets, level compressed trees can be fast. From what I remember, they're the 2nd fastest approach for storing IP routing tables, the fastest is some sort of hash trie.