Using Hebrew in Python

Submitted by 为君一笑 on 2019-12-23 17:55:28

Question


I have a problem printing Hebrew words. I am using Counter from the collections module to count the words in my text (which is in Hebrew). The counter does count the words and identifies the language, because I am using # -*- coding: utf-8 -*-

The problem is that when I print my counter, I get weird symbols (I am using Eclipse). Here is the code and its output:

# -*- coding: utf-8 -*-
import string
from collections import Counter

class classifier:
    def __init__(self, filename):
        self.myFile = open(filename)
        self.cnt = Counter()

    def generateList(self):
        exclude = set(string.punctuation)
        for lines in self.myFile:
            for word in lines.split():
                # Skip tokens that are a single punctuation character
                if word not in exclude:
                    # Rebuild the word without its punctuation characters
                    nWord = ""
                    for letter in word:
                        if letter not in exclude:
                            nWord += letter
                    self.cnt[nWord] += 1
        print self.cnt

Output:

Counter({'\xd7\x97\xd7\x94': 465, '\xd7\x96\xd7\x95': 432, '\xd7\xa1\xd7\x92\xd7\x95\xd7\xa8': 421, '\xd7\x94\xd7\x92\xd7\x91': 413})

Any idea how to print the words correctly?


Answer 1:


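The "weird symbols" you are getting are Python's way of representing byte strings. In Python 2, each word read from the file is a str holding UTF-8 encoded bytes, and printing the Counter shows the repr of each key, which escapes every non-ASCII byte as \xNN.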

You need to decode them, for example:

>>> print '\xd7\x97\xd7\x94'.decode('UTF8')
חה
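
The same idea can be applied to the whole counter rather than a single word. The following is only a minimal sketch, not the asker's code: the helper names count_words and format_counts and the sample input are hypothetical, and it assumes the text is UTF-8. It decodes each line before counting, so the keys are Unicode text, and it prints each entry on its own line instead of printing the Counter object, whose repr would still show escapes in Python 2. Unlike the original loop, it also skips tokens that consist only of punctuation.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import string
from collections import Counter


def count_words(lines, encoding="utf-8"):
    """Count words in an iterable of lines, decoding byte strings first.

    Hypothetical helper for illustration; assumes the input is UTF-8.
    """
    exclude = set(string.punctuation)
    cnt = Counter()
    for raw in lines:
        # Decode bytes to text up front so the Counter keys are Unicode strings.
        text = raw.decode(encoding) if isinstance(raw, bytes) else raw
        for word in text.split():
            # Drop ASCII punctuation characters from the word.
            cleaned = "".join(ch for ch in word if ch not in exclude)
            if cleaned:  # ignore tokens that were punctuation only
                cnt[cleaned] += 1
    return cnt


def format_counts(cnt):
    """Return one 'word: count' string per entry, most common first."""
    return ["%s: %d" % (word, n) for word, n in cnt.most_common()]


# Hypothetical sample: UTF-8 bytes, as they would come from a file
# opened in binary mode (or from any file in Python 2).
sample = [b"\xd7\x97\xd7\x94 \xd7\x96\xd7\x95, \xd7\x97\xd7\x94"]
counts = count_words(sample)

# Print each word individually rather than the Counter itself.
for line in format_counts(counts):
    print(line)
```

Whether the Hebrew characters actually display still depends on the console's encoding, so the Eclipse console may need to be set to UTF-8.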



Source: https://stackoverflow.com/questions/18079690/using-hebrew-on-python
