I have a huge set of arbitrary natural language strings. For my tool to analyze them, I need to convert each string to a unique color value (RGB or other). I need color contras
You can use MinHash or another locality-sensitive hashing (LSH) method, and define the similarity of two strings as the Jaccard coefficient of their shingle sets, i.e. the size of the intersection divided by the size of the union.
There is a good description in Mining of Massive Datasets, Ch. 3, by Rajaraman and Ullman.
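
To make the idea concrete, here is a minimal sketch of shingling, exact Jaccard similarity, and a MinHash estimate of it. The function names, the shingle length k=3, the 128 hash functions, and the use of seeded BLAKE2b as the hash family are all my own assumptions for illustration, not something taken from the book. It also stops at the similarity estimate and does not map signatures to colors.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-shingles of text (k=3 is an assumed default)."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard coefficient: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(shingle, seed):
    """Seeded 64-bit hash; each seed acts as one hash function of the family."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """For each hash function, keep the minimum hash value over the set."""
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, which estimates the Jaccard coefficient."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog")
    b = shingles("the quick brown fox jumped over a lazy dog")
    print("exact Jaccard:   ", jaccard(a, b))
    print("MinHash estimate:", estimate_similarity(minhash_signature(a), minhash_signature(b)))
```

The signatures have a fixed length regardless of string length, so for a huge set you compare (or bucket, via LSH banding) the short signatures instead of the full shingle sets. More hash functions give a tighter estimate at the cost of larger signatures.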