I was reading this question on MD5 hash values and the accepted answer confuses me. One of the main properties, as I understand it, of a cryptopgraphic hash function is tha
1: The primary purpose of a hash is to map a very, very large space to a smaller but still very large space (e.g., MD5, which will take 'anything' and convert it into a space of size 2^128 -- big, but not nearly as big as aleph-0.)
In addition to other features, good hashes fill the destination space homogeneously. Bad hashes fill the space in a clumpy fashion, coming up with the same hash for many common inputs.
Imagine the idiotic hash function sum(), which just adds all the digits of the input number: it succeeds in mapping down, but there are a bunch of collisions (inputs with the same output, like 3 and 12 and 21) at the low end of the output space and the upper end of the space is nearly empty. As a result it makes very poor use of the space, is easy to crack, etc.
So a good hash that makes even use of the destination space will make it difficult to find two inputs with the same output, just by the odds: if MD5 were perfect, the odds that two inputs would have the same output would be 2^-128. That's pretty decent odds: the best you can do without resorting to a larger output space. (In truth MD5 isn't perfect, which is one of the things that makes it vulnerable.)
But it will still be true that a huge number of inputs will map to any given hash, because the input space is 'infinite', and dividing infinity by 2^128 still gives you infinity.
2: Yes, hashes always cause data loss, except in the case where your output space is the same as, or larger than, your input space -- and in that case you probably didn't need to hash!
3: For smaller inputs, best practice is to salt the input. Actually, that's good practice for any cryptographic hashing, because otherwise an attacker can feed you specific inputs and try to figure out which hash you are using. 'Salt' is just a set of additional information that you append (or prepend) to your input; you then hash the result.
edit: In cryptography, it is also important that the hash function is resistant to preimage attacks, intuitively, that is hard to guess the input for a given output even knowing many other input/output pairs. The "sum" function could probably be guessed rather easily (but since it destroys data still might not be easy to reverse).