How to detect if a file is not a text file in C#

筅森魡賤 submitted on 2019-12-11 03:39:39

Question


I need to read through many files and search for specific text in them. I want to open only text files, i.e., not images, movies, and so on, so I am looking for a way to identify non-text files. Since I will be using a FileStream and doing a byte search, it seems to me I can stop reading and close a file as soon as a byte whose decimal value is greater than 127 is encountered. Does this seem like a good approach?


Answer 1:


There's no foolproof answer for this. If you know that any text files will only ever be ASCII characters (and encoded in ASCII, UTF-8 or something similar) then yes, that will work... although it may not catch all non-text files.

However:

  • It will fail for any text file that contains non-ASCII text.
  • It could still fail for a file that is a valid binary file in some format but happens not to contain any byte values above 127.

Does the sequence of bytes { 34, 87, 23, 10 } represent text or binary data? There's simply no way of knowing for sure. Anything you do will be heuristic.
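
To make the heuristic concrete, here is a minimal sketch of the byte scan described in the question. The class name `TextSniffer`, the method `IsProbablyAsciiText`, the 4096-byte sample limit, and the decision to also reject control bytes other than tab, LF, and CR are all assumptions added for illustration; none of them come from the answer above.

```csharp
using System;
using System.IO;

public static class TextSniffer
{
    // Hypothetical helper: scans up to maxBytes from the stream and returns
    // false as soon as it sees a byte that is not printable ASCII or common
    // whitespace. An empty stream is reported as text.
    public static bool IsProbablyAsciiText(Stream stream, int maxBytes = 4096)
    {
        var buffer = new byte[1024];
        int total = 0;
        int read;
        while (total < maxBytes &&
               (read = stream.Read(buffer, 0, Math.Min(buffer.Length, maxBytes - total))) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buffer[i];
                bool isWhitespace = b == 9 || b == 10 || b == 13; // tab, LF, CR
                // Reject DEL and anything above it, plus other control bytes.
                if (b >= 127 || (b < 32 && !isWhitespace))
                    return false;
            }
            total += read;
        }
        return true;
    }
}
```

Under these particular assumptions the bytes { 34, 87, 23, 10 } are rejected, because 23 is a control character, but a UTF-8 file containing "café" is rejected too, which is exactly the first failure mode listed above. A different threshold or whitelist would classify both inputs differently, so the result is only ever a guess.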




Answer 2:


Not sure whether this is a home-grown application where you just want a quick-and-dirty solution.

If so, you could make use of Path.GetExtension:

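    using System;
    using System.IO;

    string p = @"C:\Myfile.txt";
    string e = Path.GetExtension(p);   // includes the leading dot, e.g. ".txt"
    // Compare case-insensitively so that ".TXT" also matches.
    if (string.Equals(e, ".txt", StringComparison.OrdinalIgnoreCase))
    {
        // do stuff; process the file
    }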

Keep in mind that an extension does not dictate data type. This is only valuable if you can guarantee the extension type is representative of the data.
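
If several extensions should count as text, the same idea can be sketched with an allow-list. The class `ExtensionFilter`, the method `HasTextExtension`, and the particular extensions listed are hypothetical choices for illustration, not part of the answer; the caveat above still applies, since nothing here inspects the file contents.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExtensionFilter
{
    // Assumed allow-list; adjust to whichever extensions you trust to be text.
    // The comparer makes lookups case-insensitive.
    private static readonly HashSet<string> TextExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".txt", ".csv", ".log" };

    public static bool HasTextExtension(string path)
    {
        // Path.GetExtension returns the extension with its leading dot,
        // or an empty string when the path has no extension.
        string ext = Path.GetExtension(path);
        return !string.IsNullOrEmpty(ext) && TextExtensions.Contains(ext);
    }
}
```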




Answer 3:


Can you just check whether the file extension is ".txt", ".csv", etc.?

The thing is, you're going to have to know the encoding. See the related question "How can I detect the encoding/codepage of a text file".
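
As a rough illustration of why the encoding matters, the sketch below checks a byte sample in two ways: whether it decodes as strict UTF-8, and whether it starts with a byte order mark. `EncodingSniffer`, `IsValidUtf8`, and `DetectBom` are hypothetical names introduced here, and this is only one possible heuristic, not the method from the linked question.

```csharp
using System;
using System.Text;

public static class EncodingSniffer
{
    // Hypothetical helper: true if the bytes are well-formed UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes decoding throw instead of
        // silently substituting U+FFFD for malformed sequences.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Hypothetical helper: names the byte order mark at the start of the
    // sample, or returns null if none is recognized. UTF-32 is not handled,
    // so a UTF-32 LE BOM (FF FE 00 00) is reported as UTF-16 LE.
    public static string DetectBom(byte[] bytes)
    {
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return "UTF-8";
        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return "UTF-16 LE";
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return "UTF-16 BE";
        return null;
    }
}
```

Many text files carry no byte order mark, and a sample cut off in the middle of a multi-byte character will fail the strict UTF-8 check even though the file is valid, so both results should be treated as hints rather than proof.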



Source: https://stackoverflow.com/questions/4751191/how-to-detect-if-a-file-is-not-a-text-file-in-c-sharp
