Set locale encoding in Python

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-10 23:56:29

Question


I'm calling a Java program from my Python code in the following way:

subprocess.check_output(["java", "-classpath", "/Users/feralvam/Programas/semanticvectors-3.4/semanticvectors-3.4.jar:/Users/feralvam/Programas/lucene-3.5.0/lucene-core-3.5.0.jar:/Users/feralvam/Programas/lucene-3.5.0/contrib/demo/lucene-demo-3.5.0.jar:", "pitt.search.semanticvectors.CompareTerms", "-queryvectorfile","/Users/feralvam/termvectors.bin",term1,term2])

"term1" and "term2" are strings read from a text file that is in UTF-8 encoding.

When I run this command from PyDev (version 2.5 in Eclipse 3.7.2), I get the following output (here, "term1" = "Eles" and "term2" = "é"):

Jun 26, 2012 11:20:55 AM pitt.search.semanticvectors.CompareTerms main
INFO: Opened query vector store from file: /Users/feralvam/termvectors.bin
Jun 26, 2012 11:20:55 AM pitt.search.semanticvectors.CompareTerms main
INFO: Couldn't open Lucene index at 
Jun 26, 2012 11:20:55 AM pitt.search.semanticvectors.CompareTerms main
INFO: No Lucene index for query term weighting, so all query terms will have same weight.
Didn't find vector for 'Eles'
No vector for 'Eles'
Didn't find vector for '??'
No vector for '??'
Jun 26, 2012 11:20:55 AM pitt.search.semanticvectors.CompareTerms main
INFO: Outputting similarity of "Eles" with "??" ...

But if I run the same command from the terminal, I get:

Jun 26, 2012 11:30:26 AM pitt.search.semanticvectors.CompareTerms main
INFO: Opened query vector store from file: /Users/feralvam/termvectors.bin
Jun 26, 2012 11:30:26 AM pitt.search.semanticvectors.CompareTerms main
INFO: Couldn't open Lucene index at 
Jun 26, 2012 11:30:26 AM pitt.search.semanticvectors.CompareTerms main
INFO: No Lucene index for query term weighting, so all query terms will have same weight.
Didn't find vector for 'Eles'
No vector for 'Eles'
Found vector for 'é'
Jun 26, 2012 11:30:26 AM pitt.search.semanticvectors.CompareTerms main
INFO: Outputting similarity of "Eles" with "é" ...

Leaving aside how SemanticVectors works, the problem is that "term2" reaches the Java program with the correct encoding in the second case (terminal) but not in the first (PyDev).

Now, using this command:

print locale.getpreferredencoding(), sys.getdefaultencoding()

I get the following output: "US-ASCII utf-8" in PyDev and "UTF-8 ascii" in the terminal.
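To compare the two environments more completely, it can help to print the locale-related environment variables next to those two encodings, since a child process such as the JVM inherits those variables. The following is only a diagnostic sketch: the helper name locale_report is hypothetical, and it assumes a POSIX-style system where LC_ALL takes precedence over LC_CTYPE, which in turn takes precedence over LANG.

```python
import locale
import os
import sys


def locale_report():
    """Collect the settings that influence the encoding a child process sees."""
    report = {
        "preferred": locale.getpreferredencoding(),
        "default": sys.getdefaultencoding(),
    }
    # Listed in POSIX precedence order: LC_ALL, then LC_CTYPE, then LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        report[name] = os.environ.get(name)  # None when the variable is unset
    return report


if __name__ == "__main__":
    for key, value in sorted(locale_report().items()):
        print("%s = %s" % (key, value))
```

Running this once from PyDev and once from the terminal should show which of these variables differ between the two launches.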

So I think the US-ASCII encoding is being used to pass the arguments when the program runs under PyDev, and the result is wrong because non-ASCII words such as "é" do not arrive with the proper encoding. By the way, I'm using Python 2.7.

Is there any way to change this?

Thanks in advance for any help.


Answer 1:


You can pass the locale name in the LANG environment variable when you start the process. Do something like this:

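import os
import subprocess

env = os.environ.copy()      # copy, so the parent's environment is unchanged
env['LANG'] = 'en_US.UTF-8'  # locale the child process will inherit
subprocess.check_output(..., env=env)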
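As a self-contained illustration of this approach, the sketch below wraps the call in a small helper and uses a Python one-liner as a stand-in for the Java command, so it can be run without SemanticVectors installed. The helper name check_output_with_locale and its locale_name parameter are hypothetical, and the sketch assumes the en_US.UTF-8 locale is available on the machine.

```python
import os
import subprocess
import sys


def check_output_with_locale(cmd, locale_name="en_US.UTF-8"):
    """Run cmd with LANG set to locale_name and return its raw output."""
    env = os.environ.copy()    # leave the parent's environment untouched
    env["LANG"] = locale_name  # the child process inherits this value
    return subprocess.check_output(cmd, env=env)


if __name__ == "__main__":
    # Stand-in child process: echoes the LANG value it received.
    child = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
    print(check_output_with_locale(child).decode("ascii").strip())
```

If LC_ALL or LC_CTYPE is already set in the inherited environment, it takes precedence over LANG, so in that case the same override would need to be applied to that variable instead.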


Source: https://stackoverflow.com/questions/11205571/set-locale-encoding-in-python
