Are there any good, open source engines out there for detecting what language a text is in, perhaps with a probability metric? One that I can run locally and doesn\'t query
I don't think you need anything very sophisticated - for example to detect if a document is in English, with a pretty high level of certainty, simply test if it contains the N most common English words - something like:
"the a an is to are in on in it"
If it contains all of those, I would say it is almost definitely English.