Question
I'm slowly converting my existing code to Delphi 2010 and have read several of the articles on the Embarcadero web site as well as Marco Cantù's white paper.
There are still some things I haven't understood, so here are two functions to exemplify my question:
function RemoveSpace(InStr: string): string;
var
  Ans : string;
  I : Word;
  L : Word;
  TestChar: string[1];
begin
  Ans := '';
  L := Length(InStr);
  if L > 0 then
  begin
    for I := 1 to L do
    begin
      TestChar := Copy(InStr, I, 1);
      if TestChar <> ' ' then Ans := Ans + TestChar;
    end;
  end;
  RemoveSpace := Ans;
end;
function ReplaceStr(const S, Srch, Replace: string): string;
var
  I: Integer;
  Source: string;
begin
  Source := S;
  Result := '';
  repeat
    I := Pos(Srch, Source);
    if I > 0 then begin
      Result := Result + Copy(Source, 1, I - 1) + Replace;
      Source := Copy(Source, I + Length(Srch), MaxInt);
    end
    else Result := Result + Source;
  until I <= 0;
end;
For the RemoveSpace function, if no Unicode character is passed ('aa bb' for example), all is well. Now if I pass the text 'ab cd' then the function doesn't work as expected (I get ab??cd as the output).
How can I account for possible Unicode characters in a string? Using Length(InStr) is obviously incorrect, as is Copy(InStr, I, 1).
What's the best way of converting this code so that it accounts for Unicode characters?
Thanks!
Answer 1:
If those are your real functions and you're just trying to get them working, then:
function RemoveSpace(const InStr: string): string;
begin
  Result := StringReplace(InStr, ' ', '', [rfReplaceAll]);
end;
function ReplaceStr(const S, Srch, Replace: string): string;
begin
  // No rfIgnoreCase here: the original Pos-based loop matches case-sensitively.
  Result := StringReplace(S, Srch, Replace, [rfReplaceAll]);
end;
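As a rough illustration of how the StringReplace flags change the result, here is a minimal console sketch. The program name is made up for this example, and it assumes the standard SysUtils unit. Note that rfIgnoreCase makes the match case-insensitive, whereas the Pos-based loop in the question is case-sensitive.

```pascal
program StringReplaceDemo;  // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils;

begin
  // rfReplaceAll replaces every occurrence of the search string.
  Writeln(StringReplace('a b c', ' ', '', [rfReplaceAll]));                     // abc

  // Without rfReplaceAll only the first occurrence is replaced.
  Writeln(StringReplace('a b c', ' ', '', []));                                 // ab c

  // Matching is case-sensitive by default ...
  Writeln(StringReplace('Foo foo', 'foo', 'x', [rfReplaceAll]));                // Foo x

  // ... and case-insensitive once rfIgnoreCase is added.
  Writeln(StringReplace('Foo foo', 'foo', 'x', [rfReplaceAll, rfIgnoreCase]));  // x x
end.
```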
Answer 2:
(We do not use Delphi 2010 at the moment, so beware!)
The problem in Delphi is with string literals that contain characters outside the basic ASCII range. When they are passed to string routines, the non-ASCII characters are replaced with question marks.
To avoid this, cast the string literals to WideString before passing them as parameters to the function.
I do not know whether this applies to StringReplace, but Delphi's search routines Pos and PosEx do not handle Unicode correctly. We had to replace them with our own variant. For that replacement routine it is important that the parameters are of type WideString, not the plain string type.
We did this in D7 when handling Unicode, and it all works well.
Answer 3:
Although string is a Unicode type now, when you specify a length, you still get the non-Unicode ShortString type. The TestChar variable in your RemoveSpace function is a non-Unicode one-character string. What you should have been using all along is a real Char variable. I expect you came from the VB world, where one-character strings were the same as single characters. In Delphi, a string isn't the same as a character, so when you call Copy, you get a string.
In Unicode Delphi, that one-character string gets reduced to a non-Unicode string, and if there's no representation for that character in the current code page, you get a question mark instead. Fix it like this:
function RemoveSpace(const InStr: string): string;
var
  I: Integer;
  TestChar: Char;
begin
  Result := '';
  for I := 1 to Length(InStr) do
  begin
    TestChar := InStr[I];
    if TestChar <> ' ' then
      Result := Result + TestChar;
  end;
end;
I got rid of Ans. Since the first version of Delphi, you can use the implicitly declared Result variable instead of declaring your own and then assigning it to the function name. Result is readable and writable. Also, you don't need to worry about zero-length input. When the upper bound of a "for-to" loop is less than the lower bound, the loop simply doesn't run, so you don't need to check beforehand. Finally, I used the bracket operator on InStr to extract the character at the given index instead of getting a one-character-long string.
You say that your uses of Length and Copy are obviously incorrect, but you're wrong. Those functions continue to work just fine in Unicode. They know that Char is two bytes wide now, so if you call them on UnicodeString variables, you'll get the right characters. They also continue to work on AnsiString variables. In fact, they also work fine on WideString variables, even in older Delphi versions.
The primary problem in your code was where you stored a Unicode character into a non-Unicode string type.
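The difference between the two types can be seen in a small console sketch. This assumes Delphi 2009 or later, where string is UnicodeString and Char is WideChar. The program and variable names are invented for illustration, and the test character (U+03A9) is an arbitrary choice. The compiler will typically warn about the lossy assignment to the short string, which is exactly the point.

```pascal
program ShortStringDemo;  // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

var
  Wide: string;       // UnicodeString in Delphi 2009 and later
  Narrow: string[1];  // ShortString: one byte per character
  C: Char;            // WideChar in Delphi 2009 and later
begin
  // 'a', GREEK CAPITAL LETTER OMEGA (U+03A9), 'b'
  Wide := 'a' + Char($03A9) + 'b';

  // Length and Copy count 16-bit Char elements, so they still work.
  Writeln(Length(Wide));              // 3
  Writeln(Length(Copy(Wide, 2, 1)));  // 1

  // A Char variable keeps the full 16-bit value.
  C := Wide[2];
  Writeln(Ord(C) = $03A9);            // TRUE

  // Assigning to string[1] converts to the system ANSI code page.
  // What ends up in Narrow depends on that code page (often '?'),
  // but a single byte can never hold the value $03A9.
  Narrow := Copy(Wide, 2, 1);
  Writeln(Ord(Narrow[1]) = $03A9);    // FALSE
end.
```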
Answer 4:
Guessing from your problem description, you seem to process UTF-8-encoded strings. That's almost always a bad idea. Decode them into a saner representation first, and then operate on them. When you're done, you can encode everything as UTF-8 again.
I think the data type for wide-character strings in Delphi is "WideString"; I can't look it up right now.
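A minimal sketch of that decode, process, re-encode round trip, assuming Delphi 2009 or later (where, as far as I know, UTF8ToString and UTF8Encode are available in the System unit). The program and variable names are made up for this example, and the UTF-8 bytes are built by hand to stand in for data read from a file or socket.

```pascal
program Utf8RoundTrip;  // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

var
  Raw: UTF8String;  // UTF-8 bytes, as they might arrive from outside
  Decoded: string;  // UnicodeString (UTF-16) in Delphi 2009 and later
begin
  // 'a', U+00E9 (encoded in UTF-8 as the two bytes $C3 $A9), 'b'
  SetLength(Raw, 4);
  Raw[1] := 'a';
  Raw[2] := AnsiChar($C3);
  Raw[3] := AnsiChar($A9);
  Raw[4] := 'b';

  // Decode first, then work on whole characters.
  Decoded := UTF8ToString(Raw);
  Writeln(Length(Raw));               // 4 (bytes)
  Writeln(Length(Decoded));           // 3 (Char elements)
  Writeln(Ord(Decoded[2]) = $00E9);   // TRUE

  // Encode again once processing is done.
  Raw := UTF8Encode(Decoded);
  Writeln(Length(Raw));               // 4
end.
```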
Answer 5:
string[1] does not have a Unicode version.
Try Char instead.
Source: https://stackoverflow.com/questions/1531250/convert-function-to-delphi-2009-2010-unicode