How do I convert filenames from Unicode to ASCII?

Anonymous (unverified), submitted 2019-12-03 08:52:47

Question:

I have a bunch of music files on an NTFS partition mounted on Linux whose filenames contain Unicode characters. I'm having trouble writing a script to rename the files so that every filename uses only ASCII characters. I think the iconv command should work, but I'm having trouble escaping the characters for the 'mv' command.

EDIT: It doesn't matter if there is no direct transliteration for a Unicode character. I guess I'll just replace those with a "?" character.

Answer 1:

I don't think iconv has any character replacement facilities. This in Python might help:

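#!/usr/bin/env python3
import sys

def unistrip(s):
    # Accept raw bytes as well as str; bytes are assumed to be UTF-8.
    if isinstance(s, bytes):
        s = s.decode('utf-8')
    # Keep ASCII characters and replace everything else with '?'.
    chars = []
    for c in s:
        if ord(c) > 0x7f:
            chars.append('?')
        else:
            chars.append(c)
    return ''.join(chars)

if __name__ == '__main__':
    print(unistrip(sys.argv[1]))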

Then call as:

Also:

You might want to test it a bit first. For large move operations, it is advisable to generate a list of mv commands (i.e., write code that writes a script), so that you can look over the move commands before executing them.
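The "write code to write a script" idea could look roughly like the sketch below. It is not from the original answer: the helper names `ascii_name` and `mv_commands` are made up for illustration, and it assumes a flat directory (no recursion) and a POSIX shell. It only prints `mv` lines, it renames nothing itself.

```python
import os
import shlex
import sys


def ascii_name(name):
    """Replace every non-ASCII character in name with '?'."""
    return "".join(c if ord(c) < 0x80 else "?" for c in name)


def mv_commands(directory):
    """Yield one shell-quoted 'mv' command per entry whose name would change."""
    for name in sorted(os.listdir(directory)):
        new_name = ascii_name(name)
        if new_name != name:
            src = os.path.join(directory, name)
            dst = os.path.join(directory, new_name)
            # '--' makes mv treat names starting with '-' as files, not options.
            yield "mv -- {} {}".format(shlex.quote(src), shlex.quote(dst))


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    out = sys.stdout.buffer
    out.write(b"#!/bin/sh\n")
    for command in mv_commands(target):
        # Write raw bytes so names that are not valid UTF-8 survive unchanged.
        out.write(os.fsencode(command) + b"\n")
```

Saved under a hypothetical name such as genmoves.py, you would redirect its output to a file, read that file over, and only then run it with sh. Note that two different names can collapse to the same target (both become "??.mp3", say), so check the generated list for duplicate destinations before executing it.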



Answer 2:

Sometimes mv will not be able to read the filename in a shell, so you can try the inode reference.

To get the inode of a file:

$ ls -il

Output will be something like this:

Then use find to locate the file by its inode number, perhaps together with the Python code by Thanatos:

$ find . -inum 9340480 -exec ./unistrip.py {} \;

You could also use the above command with iconv in a shell.
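If you would rather stay in Python for the whole thing, the inode lookup can be done there too. This is only a sketch under a few assumptions: `rename_by_inode` and `ascii_name` are hypothetical helpers (not part of any answer here), only one directory is searched rather than the whole tree like `find .`, and an existing target is treated as an error instead of being overwritten.

```python
import os


def ascii_name(name):
    """Replace every non-ASCII character in name with '?'."""
    return "".join(c if ord(c) < 0x80 else "?" for c in name)


def rename_by_inode(directory, inode):
    """Rename the entry in directory that has the given inode number.

    Returns the resulting path, or None if no entry has that inode.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.inode() != inode:
                continue
            new_path = os.path.join(directory, ascii_name(entry.name))
            if new_path == entry.path:
                return new_path  # name is already pure ASCII
            if os.path.lexists(new_path):
                # Refuse to clobber a file that already has the target name.
                raise FileExistsError(new_path)
            os.rename(entry.path, new_path)
            return new_path
    return None
```

For example, `rename_by_inode(".", 9340480)` would rename the file from the find command above, using the inode number reported by `ls -il`.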

Hope this helps someone out, and excuse me for any mistakes[first answer].



Answer 3:

convmv is a good Perl script to convert file name encodings. But it can't handle characters that aren't in the destination encoding.

You can change any character not in ASCII to '?' using the rename utility distributed with Perl:

rename 's/[^ -~]/?/g' * 

Unfortunately, this replaces each multi-byte character with multiple '?'s, one per byte. Depending on the Unicode encoding in use and the characters involved, changing the regex may help, e.g.

rename 's/[^ -~]{2}/?/g' * 

for 2-byte characters.
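To see why the byte count matters, here is a small Python illustration (not part of the original answer, and the sample filename is made up). It assumes the names are UTF-8 and compares replacing per byte, which is what the regex above effectively does when it sees raw bytes, with decoding first and replacing per character.

```python
def is_printable_ascii(code):
    """True for the range matched by the regex class [ -~] (0x20 to 0x7e)."""
    return 0x20 <= code <= 0x7E


def replace_per_byte(raw_name):
    """One '?' for every byte outside printable ASCII."""
    return "".join(chr(b) if is_printable_ascii(b) else "?" for b in raw_name)


def replace_per_char(raw_name, encoding="utf-8"):
    """Decode first, then one '?' for every character outside printable ASCII."""
    text = raw_name.decode(encoding, errors="replace")
    return "".join(c if is_printable_ascii(ord(c)) else "?" for c in text)


# 'ö' and 'ó' take 2 bytes each in UTF-8, the en dash takes 3.
raw = "Bj\u00f6rk \u2013 J\u00f3ga.mp3".encode("utf-8")
print(replace_per_byte(raw))  # Bj??rk ??? J??ga.mp3
print(replace_per_char(raw))  # Bj?rk ? J?ga.mp3
```

Since UTF-8 characters can be 2, 3, or 4 bytes long, a fixed `{2}` in the regex only gets the 2-byte ones right; decoding before replacing avoids having to guess the length.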


