utf-16

How can I decode UTF-16 data in Perl when I don't know the byte order?

 ̄綄美尐妖づ submitted on 2019-11-29 09:38:11
Question: If I open a file (and specify an encoding directly):

    open(my $file, "<:encoding(UTF-16)", "some.file") || die "error $!\n";
    while (<$file>) { print "$_\n"; }
    close($file);

I can read the file contents nicely. However, if I do:

    use Encode;
    open(my $file, "some.file") || die "error $!\n";
    while (<$file>) { print decode("UTF-16", $_); }
    close($file);

I get the following error:

    UTF-16:Unrecognised BOM d at F:/Perl/lib/Encode.pm line 174

How can I make it work with decode? EDIT: here are the first …
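
The question is about Perl, but the failure mode is language-independent: only the first chunk of the stream carries the BOM, so decoding line by line (and splitting UTF-16 data on "\n" bytes) cannot work. A minimal sketch of BOM sniffing in Python, assuming the whole file fits in memory and that a BOM-less file should default to little-endian:

    import codecs

    def decode_utf16(raw: bytes, default: str = "utf-16-le") -> str:
        """Decode UTF-16 bytes, choosing the byte order from the BOM if present."""
        if raw.startswith(codecs.BOM_UTF16_LE):
            return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
        if raw.startswith(codecs.BOM_UTF16_BE):
            return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
        return raw.decode(default)  # no BOM: fall back to an assumed byte order

    with open("some.file", "rb") as f:  # file name taken from the question
        text = decode_utf16(f.read())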

UTF-16 file seeking in Python. How?

Deadly submitted on 2019-11-29 09:14:05
For some reason I cannot seek in my UTF-16 file. It produces 'UnicodeException: UTF-16 stream does not start with BOM'. My code:

    f = codecs.open(ai_file, 'r', 'utf-16')
    seek = self.ai_map[self._cbClass.Text]  # seek is a valid int
    f.seek(seek)
    while True:
        ln = f.readline().strip()

I tried random stuff like first reading something from the stream; it didn't help. I checked the offset being seeked to using a hex editor: the string starts at a character, not a null byte (I guess that's a good sign, right?). So how do I seek in a UTF-16 file in Python? Well, the error message is telling you why: it's not reading a byte order mark. The byte …
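
One workaround is to seek on the underlying binary file and decode with an explicit byte order, so the codec never goes looking for a BOM in the middle of the stream. A sketch, assuming a little-endian file with a BOM; the file name and offset are placeholders standing in for the question's ai_file and ai_map lookup:

    import codecs
    import io

    ai_file = "data.utf16"  # placeholder path
    seek_offset = 128       # placeholder byte offset; must land on a code-unit boundary

    with open(ai_file, "rb") as raw:
        bom = raw.read(2)
        assert bom == codecs.BOM_UTF16_LE  # assumption: little-endian file with BOM
        raw.seek(seek_offset)
        text_stream = io.TextIOWrapper(raw, encoding="utf-16-le")  # explicit byte order, no BOM needed
        line = text_stream.readline().strip()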

Storing UTF-16/Unicode data in SQL Server

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-29 07:28:03
According to this, SQL Server 2K5 uses UCS-2 internally. It can store UTF-16 data in UCS-2 (with the appropriate data types, nchar etc.), however if there is a supplementary character this is stored as 2 UCS-2 characters. This brings the obvious issues with the string functions, namely that what is one character is treated as 2 by SQL Server. I am somewhat surprised that SQL Server is basically only able to handle UCS-2, and even more so that this is not fixed in SQL 2K8. I do appreciate that some of these characters may not be all that common. Aside from the functions suggested in the article, …
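
A small illustration of the code-unit arithmetic behind the complaint, sketched in Python: any character above U+FFFF occupies two UTF-16 code units, and that pair is what a UCS-2-based length function counts.

    ch = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP

    print(len(ch))                           # 1 code point
    print(len(ch.encode("utf-16-le")) // 2)  # 2 UTF-16 code units, which UCS-2 sees as 2 characters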

Is there a standard technique for packing binary data into a UTF-16 string?

百般思念 submitted on 2019-11-29 06:26:22
(In .NET) I have arbitrary binary data stored in a byte[] (an image, for example). Now, I need to store that data in a string (a "Comment" field of a legacy API). Is there a standard technique for packing this binary data into a string? By "packing" I mean that for any reasonably large and random data set, bytes.Length/2 is about the same as packed.Length, because two bytes are more-or-less a single character. The two "obvious" answers don't meet all the criteria:

    string base64 = System.Convert.ToBase64String(bytes)

doesn't make very efficient use of the string since it only uses 64 …
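
For scale, a quick Python check of the efficiency criterion in the question: base64 spends roughly 1.33 characters per byte, while the "packed" target is 0.5 characters per byte (two bytes per 16-bit code unit).

    import base64
    import os

    data = os.urandom(30_000)                # stand-in for the question's byte[] image
    b64_chars = len(base64.b64encode(data))  # base64 emits 4 characters per 3 bytes
    ideal_chars = len(data) // 2             # the question's target: 2 bytes per character

    print(b64_chars / len(data))             # ~1.33 characters per byte
    print(ideal_chars / len(data))           # 0.50 characters per byte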

git gui - can it be made to display UTF16?

我与影子孤独终老i submitted on 2019-11-29 05:14:38
Is there any way to make git gui display and show diffs for UTF-16 files somehow? I found some information, but this is mostly referring to the command line rather than the GUI. I have been working on a much better solution with help from the msysGit people, and have come up with this clean/smudge filter. The filter uses the GNU file and iconv commands to determine the type of the file and convert it to and from msysGit's internal UTF-8 format. This type of clean/smudge filter gives you much more flexibility. It should allow Git to treat your mixed-format files as UTF-8 text in most cases: …
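
Not the shell filter from the answer (which delegates type detection to GNU file and the conversion to iconv); just a minimal Python sketch of what the clean side of such a filter does, namely read UTF-16 bytes on stdin and hand Git UTF-8 on stdout. A matching smudge filter reverses the conversion, and both are wired up through .gitattributes and a filter entry in the Git config.

    import sys

    raw = sys.stdin.buffer.read()
    try:
        # Uses the BOM if present, otherwise assumes native byte order; a crude
        # stand-in for the detection the real filter delegates to file(1).
        text = raw.decode("utf-16")
    except UnicodeError:
        sys.stdout.buffer.write(raw)  # not decodable as UTF-16: pass through untouched
    else:
        sys.stdout.buffer.write(text.encode("utf-8"))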

Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a Windows GUI

流过昼夜 submitted on 2019-11-29 00:26:58
Question: I'm working on an English-only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that. I already read the question titled "std::wstring VS std::string". It was very helpful, but I still don't quite understand how to apply all of that information to my problem. The program I'm working on displays data in a Windows GUI. That data is persisted as XML. We often transform that XML using XSLT into HTML …

Using Unicode characters bigger than 2 bytes with .NET

放肆的年华 submitted on 2019-11-28 23:36:12
I'm using this code to generate U+10FFFC:

    var s = Encoding.UTF8.GetString(new byte[] {0xF4,0x8F,0xBF,0xBC});

I know it's for private use and such, but it does display as a single character, as I'd expect, when displaying it. The problems come when manipulating this Unicode character. If I later do this:

    foreach(var ch in s)
    {
        Console.WriteLine(ch);
    }

instead of it printing just the single character, it prints two characters (i.e. the string is apparently composed of two characters). If I alter my loop to add these characters back to an empty string like so:

    string tmp = "";
    foreach(var ch in s) { …
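
The two characters the loop prints are the two UTF-16 code units (a surrogate pair) that encode U+10FFFC; a .NET string is a sequence of UTF-16 code units, not of code points. The same arithmetic, sketched in Python:

    s = "\U0010FFFC"

    print(len(s))           # 1 code point
    units = s.encode("utf-16-le")
    print(len(units) // 2)  # 2 UTF-16 code units
    print([hex(int.from_bytes(units[i:i + 2], "little")) for i in (0, 2)])
    # ['0xdbff', '0xdffc'] -- the high and low surrogate the loop prints separately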

Converting xml from UTF-16 to UTF-8 using PowerShell

耗尽温柔 submitted on 2019-11-28 23:15:31
What's the easiest way to convert XML from UTF-16 to a UTF-8 encoded file?

Ben Laan: This may not be the most optimal, but it works. Simply load the XML and push it back out to a file. The XML declaration is lost though, so it has to be re-added.

    $files = get-ChildItem "*.xml"
    foreach ( $file in $files )
    {
        [System.Xml.XmlDocument]$doc = new-object System.Xml.XmlDocument;
        $doc.set_PreserveWhiteSpace( $true );
        $doc.Load( $file );
        $root = $doc.get_DocumentElement();
        $xml = $root.get_outerXml();
        $xml = '<?xml version="1.0" encoding="utf-8"?>' + $xml
        $newFile = $file.Name + ".new"
        Set-Content -Encoding …
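
For comparison, the same re-encoding step sketched in Python rather than PowerShell, with placeholder file names; parsing and rewriting with an explicit encoding also regenerates the XML declaration, assuming the input carries a BOM or encoding declaration the parser can detect:

    import xml.etree.ElementTree as ET

    tree = ET.parse("input.xml")  # UTF-16 input, detected from its BOM/declaration
    tree.write("input.utf8.xml", encoding="utf-8", xml_declaration=True)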

Is there any reason to prefer UTF-16 over UTF-8?

大城市里の小女人 submitted on 2019-11-28 22:18:21
Question: Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16. However, checking out Java and C#, it looks like strings and chars there default to UTF-16. I was thinking that it might be for historic reasons, or perhaps for performance reasons, but couldn't find any information. Does anyone know why these languages chose UTF-16? And is there any valid reason for me to do that as well? EDIT: Meanwhile I've also found this answer, which seems relevant and has some …
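
One attribute usually weighed in this comparison is encoded size, and it cuts both ways depending on the script. A quick Python illustration:

    ascii_text = "plain English text" * 100
    cjk_text = "统一码字符编码标准" * 100

    for label, text in (("ascii", ascii_text), ("cjk", cjk_text)):
        # ASCII-heavy text is half the size in UTF-8; CJK-heavy text is smaller in UTF-16.
        print(label, len(text.encode("utf-8")), len(text.encode("utf-16-le")))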

Storing UTF-8 string in a UnicodeString

落爺英雄遲暮 submitted on 2019-11-28 21:39:10
Question: In Delphi 2007 you can store a UTF-8 string in a WideString and then pass that on to a Win32 function, e.g.:

    var
      UnicodeStr: WideString;
      UTF8Str: WideString;
    begin
      UnicodeStr := 'some unicode text';
      UTF8Str := UTF8Encode(UnicodeStr);
      Windows.SomeFunction(PWideChar(UTF8Str), ...)
    end;

Delphi 2007 does not interfere with the contents of UTF8Str, i.e. it is left as a UTF-8 encoded string stored in a WideString. But in Delphi 2010 I'm struggling to find a way to do the same thing, i.e. store a UTF-8 …
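
Not Delphi, but a Python sketch of what "a UTF-8 string stored in a WideString" amounts to: each UTF-8 byte is widened into its own 16-bit code unit, and the receiving side narrows the units back to bytes before decoding.

    text = "some unicode text"

    utf8_bytes = text.encode("utf-8")
    wide_holding_utf8 = utf8_bytes.decode("latin-1")  # byte values 0..255 become code units 0..255

    # The reverse, on the receiving side:
    round_tripped = wide_holding_utf8.encode("latin-1").decode("utf-8")
    assert round_tripped == text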