Question
I need to read through many files and search for specific text in them. I want to open only text files, i.e., no image, movie, etc. files, so I am looking for a way to identify non-text files. Since I will be using a FileStream and doing a byte search, it seems to me I can stop reading and close a file as soon as I encounter a byte whose decimal value is greater than 127 (outside the ASCII range). Does this seem like a good approach?
Answer 1:
There's no foolproof answer for this. If you know that your text files will only ever contain ASCII characters (encoded in ASCII, UTF-8 or something similar) then yes, that will work... although it may not catch all non-text files.
However:
- It will fail for any text file that contains non-ASCII text.
- It could still fail for a file which is a valid binary file for some format, but happens not to contain any values above 127.
Does the sequence of bytes { 34, 87, 23, 10 } represent text or binary data? There's simply no way of knowing for sure. Anything you do will be heuristic.
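To make the trade-off concrete, here is a minimal sketch of the byte scan described in the question. The helper name `LooksLikeAscii` and the `maxBytes` limit are assumptions for illustration, not part of the original question or answer.

```csharp
using System;
using System.IO;

// Hypothetical helper: returns false as soon as a byte outside the
// 7-bit ASCII range (0-127) is read, true if none is found within
// the first maxBytes bytes.
static bool LooksLikeAscii(Stream stream, int maxBytes = 4096)
{
    for (int i = 0; i < maxBytes; i++)
    {
        int b = stream.ReadByte();   // -1 signals end of stream
        if (b == -1) break;
        if (b > 127) return false;   // non-ASCII byte: stop early
    }
    return true;
}

// The four bytes from the answer pass the check, whether they are text or not.
using var ambiguous = new MemoryStream(new byte[] { 34, 87, 23, 10 });
Console.WriteLine(LooksLikeAscii(ambiguous));   // True

// "café" in UTF-8 ends with 0xC3 0xA9 and is rejected although it is text.
using var accented = new MemoryStream(new byte[] { 0x63, 0x61, 0x66, 0xC3, 0xA9 });
Console.WriteLine(LooksLikeAscii(accented));    // False
```

Both outcomes illustrate the caveats above: the scan accepts an ambiguous byte sequence and rejects legitimate non-ASCII text, so it is only a heuristic.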
Answer 2:
Not sure if this is a home-grown application where you just want a quick and dirty solution.
If so, you could make use of Path.GetExtension:
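string p = @"C:\Myfile.txt";
string e = Path.GetExtension(p);   // requires: using System.IO;
// Compare case-insensitively so that ".TXT" also matches
if (string.Equals(e, ".txt", StringComparison.OrdinalIgnoreCase))
{
    // do stuff; process the file
}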
Keep in mind that an extension does not dictate the data type. This approach is only valuable if you can guarantee that the extension is representative of the file's contents.
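If more than one extension should count as text, the same idea could be sketched with a set lookup. The list of extensions and the helper name `HasTextExtension` are assumptions for illustration; only trust extensions you control.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Assumed whitelist; adjust it to the extensions you actually trust.
var textExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".txt", ".csv", ".log"
};

// Hypothetical helper: true when the path's extension is in the whitelist.
bool HasTextExtension(string path) =>
    textExtensions.Contains(Path.GetExtension(path));

Console.WriteLine(HasTextExtension(@"C:\Myfile.TXT"));   // True
Console.WriteLine(HasTextExtension(@"C:\movie.avi"));    // False
```

The case-insensitive comparer avoids normalizing each extension by hand; the caveat about extensions not guaranteeing content still applies.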
Answer 3:
Can you just check whether the file extension is ".txt", ".csv", etc.?
The thing is, you're going to have to know the encoding; see "How can I detect the encoding/codepage of a text file".
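As one partial illustration of encoding detection, the sketch below lets `StreamReader` inspect a byte order mark (BOM). This only helps for files that actually start with a BOM; the helper name `DetectEncodingFromBom` and the UTF-8 fallback are assumptions, and files without a BOM still require guessing.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns the encoding StreamReader infers from a
// byte order mark, or the supplied fallback when no BOM is present.
static Encoding DetectEncodingFromBom(Stream stream, Encoding fallback)
{
    using var reader = new StreamReader(stream, fallback,
        detectEncodingFromByteOrderMarks: true);
    reader.Peek();                  // forces the reader to inspect the first bytes
    return reader.CurrentEncoding;
}

// UTF-16 little-endian BOM (0xFF 0xFE) followed by the letter "A".
var utf16 = new MemoryStream(new byte[] { 0xFF, 0xFE, 0x41, 0x00 });
Console.WriteLine(DetectEncodingFromBom(utf16, Encoding.UTF8).CodePage);   // 1200
```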
Source: https://stackoverflow.com/questions/4751191/how-to-detect-if-a-file-is-not-a-text-file-in-c-sharp