I have two 3GB text files, each with around 80 million lines. They share 99.9% of their lines (file A has 60,000 unique lines and file B has 80,000 unique lines).
Python has difflib, which claims to be quite competitive with other diff utilities; see: http://docs.python.org/library/difflib.html
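For reference, here is a minimal sketch of how difflib is typically called on small inputs. The helper name `differing_lines` and the sample lines are made up for illustration. It holds both sequences entirely in memory, so it says nothing about how difflib would behave on files of this size.

```python
import difflib

def differing_lines(a, b):
    """Return (lines only on the A side, lines only on the B side).

    `a` and `b` are lists of lines already loaded into memory
    (an assumption; this is not a streaming approach).
    """
    # autojunk=False turns off difflib's heuristic that treats very
    # frequent lines as junk in long sequences.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    only_a, only_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "delete" cover lines dropped from A;
        # "replace" and "insert" cover lines added in B.
        if tag in ("replace", "delete"):
            only_a.extend(a[i1:i2])
        if tag in ("replace", "insert"):
            only_b.extend(b[j1:j2])
    return only_a, only_b

# Hypothetical sample data standing in for the two files.
a_lines = ["alpha", "beta", "gamma"]
b_lines = ["alpha", "delta", "gamma"]
only_a, only_b = differing_lines(a_lines, b_lines)
print(only_a, only_b)
```

Note that this is a positional diff: a line that merely moved to a different place is reported on both sides, which may or may not match what "unique lines" means for these files.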