Using Git, I don't understand how SHA can generate just a 40-hexadecimal-digit code that can then be mapped to any file, which could be hundreds of lines long.
A SHA-1 hash is 160 bits long. That gives you 2^160, or exactly
1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976
possible hashes.
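To make the fixed-size point concrete, here is a minimal Python sketch using the standard hashlib module. The helper name git_blob_sha1 is my own, and it assumes Git's blob object format (a "blob <size>\0" header followed by the file contents), which is what git hash-object hashes for a file. Whatever the input length, the digest is always 40 hex digits:

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """SHA-1 of content wrapped in a Git blob header (hypothetical helper name)."""
    # Assumed Git blob format: b"blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

small = b"hello\n"
large = b"x" * 1_000_000  # a million bytes still yields a 160-bit digest

for data in (small, large):
    digest = git_blob_sha1(data)
    print(len(data), "bytes ->", digest, f"({len(digest)} hex digits)")

# The size of the SHA-1 output space, printed with thousands separators.
print(f"{2**160:,}")
```

The hash is not a compressed copy of the file; Git stores the contents separately and uses the hash only as the lookup key.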
Assuming hash values are more or less unpredictable, the odds of two files accidentally having the same hash are so small that it's just not worth worrying about.
Quoting from Scott Chacon's book "Pro Git":
However, you should be aware of how ridiculously unlikely this scenario is. The SHA–1 digest is 20 bytes or 160 bits. The number of randomly hashed objects needed to ensure a 50% probability of a single collision is about 2^80.
...
Here’s an example to give you an idea of what it would take to get a SHA–1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (1 million Git objects) and pushing it into one enormous Git repository, it would take 5 years until that repository contained enough objects to have a 50% probability of a single SHA–1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
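The quoted scenario can be sanity-checked with a rough calculation. The sketch below uses the standard birthday approximation p ≈ 1 - exp(-n^2 / (2 * 2^160)), which is my choice of formula and not necessarily the one the book used; the variable names and the 365.25-day year are also my assumptions.

```python
import math

HASH_BITS = 160
SPACE = 2 ** HASH_BITS  # number of possible SHA-1 values

def collision_probability(n: float) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * SPACE))."""
    # -expm1(-x) computes 1 - exp(-x) without losing precision for small x.
    return -math.expm1(-(n * n) / (2 * SPACE))

# Figures from the quoted scenario.
people = 6.5e9
objects_per_second = 1e6              # per person
seconds = 5 * 365.25 * 24 * 3600      # five years (assumed 365.25-day years)
n = people * objects_per_second * seconds

print(f"objects after 5 years: {n:.3e} (about 2^{math.log2(n):.1f})")
print(f"collision probability: {collision_probability(n):.2f}")
print(f"probability at 2^80 objects: {collision_probability(2.0 ** 80):.2f}")
```

When I run the numbers this way, five years of that workload lands just under 2^80 objects, with a collision probability of around 30%, and exactly 2^80 objects gives about 39%. So "about 2^80" and "50%" are order-of-magnitude figures; under this approximation the 50% point sits at roughly 1.18 times 2^80.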
It's true that there must be two 21-byte files that have the same SHA-1 hash (since there are 2^168 such files and only 2^160 possible SHA-1 hashes). No such files have ever been discovered.
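The same pigeonhole argument can be watched in action on a toy scale. This sketch deliberately truncates SHA-1 to its first 16 bits (an assumption made only so the search finishes instantly; tiny_hash and find_collision are hypothetical names) and feeds it 3-byte inputs, of which there are 2^24. A collision must then appear within 2^16 + 1 distinct inputs:

```python
import hashlib
from itertools import count

TRUNCATED_BYTES = 2  # keep only 16 bits of the digest (toy assumption)

def tiny_hash(data: bytes) -> bytes:
    """First 16 bits of SHA-1; NOT what Git uses, just a small stand-in."""
    return hashlib.sha1(data).digest()[:TRUNCATED_BYTES]

def find_collision():
    """Return two distinct 3-byte inputs with the same tiny_hash, plus inputs tried."""
    seen = {}
    for i in count():
        data = i.to_bytes(3, "big")  # 2^24 possible inputs, only 2^16 possible hashes
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, i + 1
        seen[h] = data

a, b, tried = find_collision()
print(a.hex(), b.hex(), "share hash", tiny_hash(a).hex(), "after", tried, "inputs")

# For full SHA-1: average number of 21-byte files per hash value.
print("average 21-byte files per SHA-1 value:", 2**168 // 2**160)
```

With the full 160 bits the collisions are just as certain to exist, but a search like this one is hopelessly out of reach.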
UPDATE: As of February 2017, two distinct PDF files with identical SHA-1 checksums have been generated, using a technique that's more than 100,000 times as fast as a brute-force attack. Details here: https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
Linus Torvalds (the author of Git) has posted a (preliminary) response here: http://marc.info/?l=git&m=148787047422954