What are TCHAR strings and the 'A' or 'W' version of Win32 API functions? [duplicate]

天大地大妈咪最大 submitted on 2019-12-01 08:35:39

Question


What are TCHAR strings, such as LPTSTR and LPCTSTR, and how can I work with them? When I create a new project in Visual Studio, it creates this code for me:

#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
   return 0;
}

How can I, for instance, concatenate all the command line arguments?

If I want to open a file with the name given by the first command line argument, how can I do this? The Windows API defines 'A' and 'W' versions of many of its functions, such as CreateFile, CreateFileA and CreateFileW; how do these differ from one another, and which one should I use?


Answer 1:


Let me start off by saying that you should preferably not use TCHAR for new Windows projects and instead directly use Unicode. On to the actual answer:

Character Sets

The first thing we need to understand is how character sets work in Visual Studio. The project property page has an option to select the character set used:

  • Not Set
  • Use Unicode Character Set
  • Use Multi-Byte Character Set

Depending on which of the three options you choose, a lot of definitions change to accommodate the selected character set. Three main classes of definitions are affected: strings, string routines from tchar.h, and API functions:

  • 'Not Set' corresponds to TCHAR = char using ANSI encoding, where you use the standard 8-bit code page of the system for strings. All tchar.h string routines use the basic char versions. All API functions that work with strings will use the 'A' version of the API function.
  • 'Unicode' corresponds to TCHAR = wchar_t using UTF-16 encoding. All tchar.h string routines use the wchar_t versions. All API functions that work with strings will use the 'W' version of the API function.
  • 'Multi-Byte' corresponds to TCHAR = char, using some multi-byte encoding scheme. All tchar.h string routines use the multi-byte character set versions. All API functions that work with strings will use the 'A' version of the API function.

Related reading: About the "Character set" option in visual studio 2010

TCHAR.h header

The tchar.h header provides generic names for the C string operations, which switch to the correct function for the selected character set. For instance, _tcscat will switch to either strcat (not set), wcscat (unicode), or _mbscat (mbcs). _tcslen will switch to either strlen (not set), wcslen (unicode), or strlen (mbcs).

The switch happens by defining all _txxx symbols as macros that evaluate to the correct function, depending on the compiler switches.
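The switching mechanism can be sketched in a few lines. This is a simplified illustration, not the actual contents of tchar.h: the names MY_UNICODE, my_tchar, MY_T and my_tcslen are hypothetical stand-ins for _UNICODE, TCHAR, _T and _tcslen, chosen so that the sketch compiles without the Windows headers.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins; the real header keys off _UNICODE and _MBCS.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x              // wide literal
#define my_tcslen std::wcslen     // wide string routine
#else
typedef char my_tchar;
#define MY_T(x) x                 // narrow literal
#define my_tcslen std::strlen     // narrow string routine
#endif

// The same source line compiles against either character type.
std::size_t greeting_length()
{
    const my_tchar *greeting = MY_T("hello");
    return my_tcslen(greeting);
}
```

Building with or without MY_UNICODE defined changes the character type, the literal and the routine together, which is the property the generic names rely on.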

The idea behind it is that you can use the encoding-agnostic types TCHAR (or _TCHAR) and the encoding-agnostic functions that work on them, from tchar.h, instead of the regular string functions from string.h.

Similarly, _tmain is defined to be either main or wmain. See also: What is the difference between _tmain() and main() in C++?

A helper macro _T(..) is defined for getting string literals of the correct type, either "regular literals" or L"wchar_t literals".

See the caveats mentioned here: Is TCHAR still relevant? -- dan04's answer

_tmain example

For the example of main in the question, the following code concatenates all the strings passed as command line arguments into one.

int _tmain(int argc, _TCHAR *argv[])
{
   TCHAR szCommandLine[1024];

   if (argc < 2) return 0;

   _tcscpy(szCommandLine, argv[1]);
   for (int i = 2; i < argc; ++i)
   {
       _tcscat(szCommandLine, _T(" "));
       _tcscat(szCommandLine, argv[i]);
   }

   /* szCommandLine now contains the command line arguments */

   return 0;
}

(Error checking is omitted; in particular, nothing stops the arguments from overflowing the 1024-element buffer.) This code works for all three cases of the character set, because we consistently used TCHAR, the tchar.h string functions, and _T for string literals. Forgetting to surround your string literals with _T(..) is a common source of compiler errors when writing such TCHAR programs. If we had not done all these things, then switching character sets would cause the code to either not compile or, worse, compile but misbehave at runtime.

Windows API functions

Windows API functions that work on strings, such as CreateFile and GetCurrentDirectory, are implemented in the Windows headers as macros that, like the tchar.h macros, switch to either the 'A' version or the 'W' version. For instance, CreateFile is a macro that is defined as CreateFileA for ANSI and MBCS, and as CreateFileW for Unicode.

Whenever you use the flat form (without 'A' or 'W') in your code, the actual function called will switch depending on the selected character set. You can force the use of a particular version by using the explicit 'A' or 'W' names.

The conclusion is that you should always use the unqualified name, unless you want to always refer to a specific version, independently of the character set option.
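To make the difference between the unqualified and the explicit names concrete, here is a minimal sketch using a made-up API. NameLength, NameLengthA, NameLengthW, MY_TEXT and MY_UNICODE are all hypothetical names introduced for illustration; they only mimic the pattern the Windows headers use for functions such as CreateFile.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical API with a narrow ('A') and a wide ('W') implementation.
std::size_t NameLengthA(const char *name) { return std::strlen(name); }
std::size_t NameLengthW(const wchar_t *name) { return std::wcslen(name); }

// MY_UNICODE is a hypothetical stand-in for the UNICODE switch.
#ifdef MY_UNICODE
#define NameLength NameLengthW
#define MY_TEXT(x) L##x
#else
#define NameLength NameLengthA
#define MY_TEXT(x) x
#endif

// Unqualified name: follows the character set selected at compile time.
std::size_t length_generic() { return NameLength(MY_TEXT("file.txt")); }

// Explicit names: always the same version, whatever the character set.
std::size_t length_narrow() { return NameLengthA("file.txt"); }
std::size_t length_wide() { return NameLengthW(L"file.txt"); }
```

Only length_generic changes meaning when the switch is flipped; the two explicit calls keep compiling unchanged, which is why the explicit names are the right choice when a specific version is required.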

For the example in the question, where we want to open the file given by the first argument:

#include <windows.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR *argv[])
{
   if (argc < 2) return 1;

   HANDLE hFile = CreateFile(argv[1], GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);

   /* Read from file and do other stuff */
   ...

   CloseHandle(hFile);

   return 0;
}

(Error checking is omitted.) Note that in this example we did not need any TCHAR-specific code beyond _tmain and _TCHAR, because the macro definitions have already taken care of this for us.

Utilising C++ strings

We've seen how we can use the tchar.h routines to use C style string operations to work with TCHARs, but it would be nice if we could leverage C++ strings to work with this.

My advice would foremost be to not use TCHAR and instead use Unicode directly, see the Conclusion section, but if you want to work with TCHAR you can do the following.

To use TCHAR, what we want is an instance of std::basic_string that uses TCHAR. You can do this by typedefing your own tstring:

typedef std::basic_string<TCHAR> tstring;

For string literals, don't forget to use _T.

You'll also need to use the correct versions of cin and cout. You can use references to implement a tcin and tcout:

#include <iostream>

#if defined(_UNICODE)
std::wistream &tcin = std::wcin;
std::wostream &tcout = std::wcout;
#else
std::istream &tcin = std::cin;
std::ostream &tcout = std::cout;
#endif

This should allow you to do almost anything. There might be the occasional exception, such as std::to_string and std::to_wstring, for which you can find a similar workaround.
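One possible workaround for std::to_string and std::to_wstring is sketched below. The helper to_tstring and the typedef tchar_type are hypothetical names; tchar_type stands in for TCHAR only so that the sketch compiles without <tchar.h>, and in a real project you would use TCHAR itself.

```cpp
#include <string>

// Stand-in for TCHAR so the sketch compiles without <tchar.h>.
#if defined(_UNICODE)
typedef wchar_t tchar_type;
#else
typedef char tchar_type;
#endif

typedef std::basic_string<tchar_type> tstring;

// Hypothetical helper: selects std::to_wstring or std::to_string
// at compile time, matching the character type of tstring.
inline tstring to_tstring(int value)
{
#if defined(_UNICODE)
    return std::to_wstring(value);
#else
    return std::to_string(value);
#endif
}
```

The same pattern should carry over to other functions that come in a narrow and a wide flavour: wrap the pair once behind a single name and use that name everywhere else.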

Conclusion

This answer (hopefully) details what TCHAR is and how it's used and intertwined with Visual Studio and the Windows headers. However, we should also wonder if we want to use it.

My advice is to use Unicode directly for all new Windows programs and not to use TCHAR at all!

Others giving the same advice: Is TCHAR still relevant?

To use Unicode after creating a new project, first ensure the character set is set to Unicode. Then, remove the #include <tchar.h> from your source file (or from stdafx.h). Fix up any TCHAR or _TCHAR to wchar_t and _tmain to wmain:

int wmain(int argc, wchar_t *argv[])

For non-console projects, the entry point for Windows applications is WinMain and will appear in TCHAR-jargon as

int APIENTRY _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPTSTR lpCmdLine, int nCmdShow)

and should become

int APIENTRY wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPWSTR lpCmdLine, int nCmdShow)

After this, only use wchar_t strings and/or std::wstrings.
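As a rough illustration of the Unicode-only style, the argument concatenation from the question could be written with std::wstring as follows. The function name join_arguments is hypothetical; it would be called from wmain with the argc and argv it receives.

```cpp
#include <string>

// Hypothetical helper: joins argv[1] .. argv[argc-1] with single spaces.
// argv[0] (the program name) is skipped, as in the TCHAR example.
std::wstring join_arguments(int argc, wchar_t *argv[])
{
    std::wstring result;
    for (int i = 1; i < argc; ++i)
    {
        if (i > 1) result += L' ';   // separator between arguments
        result += argv[i];
    }
    return result;
}
```

Because std::wstring grows as needed, this version has no fixed-size buffer to overflow.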

Further caveats

  • Be careful when writing sizeof(szMyString) for TCHAR arrays (strings), because sizeof always yields the size in bytes. For ANSI this equals the number of characters. For Unicode the number of TCHAR elements is half the size in bytes, and the number of characters is at most that. For MBCS the number of characters may or may not equal the size in bytes. Both Unicode and MBCS can use multiple TCHARs to encode a single character.
  • Mixing TCHAR stuff and fixed char or wchar_t is very annoying; you have to convert the strings from one to the other, using the correct code page! A simple copy will not work in the general case.
  • There is a slight difference between _UNICODE and UNICODE, relevant if you want to conditionally define your own functions. See Why both UNICODE and _UNICODE?
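The sizeof caveat can be checked at compile time. The sketch below uses a hypothetical helper, element_count, to obtain the number of array elements; MSVC ships a _countof macro for the same purpose. The buffer names and the size 260 are arbitrary choices for the example.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a fixed-size array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N])
{
    return N;
}

static char narrow_buffer[260];
static wchar_t wide_buffer[260];

// For char, the byte size and the element count agree.
static_assert(sizeof(narrow_buffer) == 260, "char: bytes == elements");

// For wchar_t, sizeof reports bytes, which is more than the element count.
static_assert(sizeof(wide_buffer) == 260 * sizeof(wchar_t),
              "wchar_t: bytes == elements * sizeof(wchar_t)");

// The element count does not depend on the character type.
static_assert(element_count(narrow_buffer) == 260, "260 char elements");
static_assert(element_count(wide_buffer) == 260, "260 wchar_t elements");
```

When an API expects a buffer length in characters, passing the element count rather than sizeof keeps the call correct for every character set setting.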

A very good, complementary answer is: Difference between MBCS and UTF-8 on Windows



Source: https://stackoverflow.com/questions/33836706/what-are-tchar-strings-and-the-a-or-w-version-of-win32-api-functions
