Here's a very simple way to build a suffix array from a string in Python:
def sort_offsets(a, b):
    return cmp(content[a:], content[b:])
content = "foobar
+1 for a very interesting problem! I can't see any obvious way to do this directly, but I was able to get a significant speedup (an order of magnitude for 100,000-character strings) by using the following comparison function in place of yours:
def compare_offsets2(a, b):
    return (cmp(content[a:a+10], content[b:b+10]) or
            cmp(content[a:], content[b:]))
In other words, start by comparing the first 10 characters of each suffix; only if the result of that comparison is 0, indicating that you've got a match for the first 10 characters, do you go on to compare the entire suffixes.
Obviously 10 could be anything: experiment to find the best value.
This comparison function is also a nice example of something that isn't easily replaced with a key function.
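For anyone on Python 3, where the built-in cmp and the cmp= argument to sort no longer exist, here is a rough sketch of the same prefix-first comparison wrapped with functools.cmp_to_key. The names build_suffix_array and prefix_len are my own and not from the question, and the local cmp helper is just a stand-in for the old built-in, so treat this as an illustration rather than a tuned implementation:

```python
from functools import cmp_to_key


def cmp(x, y):
    """Three-way comparison, standing in for the Python 2 built-in."""
    return (x > y) - (x < y)


def build_suffix_array(content, prefix_len=10):
    """Return the suffix start offsets of content, in sorted suffix order.

    build_suffix_array and prefix_len are hypothetical names for this sketch.
    """

    def compare_offsets(a, b):
        # Cheap check first: compare only the leading prefix_len characters.
        # Fall back to the full suffixes only when those prefixes tie.
        return (cmp(content[a:a + prefix_len], content[b:b + prefix_len])
                or cmp(content[a:], content[b:]))

    return sorted(range(len(content)), key=cmp_to_key(compare_offsets))


print(build_suffix_array("banana", prefix_len=2))  # [5, 3, 1, 0, 4, 2]
```

The prefix length only affects speed, not the result, so it is safe to try a few values and time them on your own data.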