Read a text file and search for a string in a memory-efficient way (and abort when found)

Question

I'm searching for a string in a text file (the files may also be XML). This is what I thought of first:

using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        if (s.Contains("mySpecialString"))
            return true;
    }
}

return false;

I want to read line by line to minimize the amount of RAM used. As soon as the string has been found, the operation should abort. I don't process the file as XML because it would have to be parsed, which would also consume more memory than necessary.

Another easy implementation would be

bool found = File.ReadAllText(path).Contains("mySpecialString");

but that would read the complete file into memory, which isn't what I want. On the other hand, it might be faster.

Another one would be this

foreach (string line in File.ReadLines(path))
{
    if (line.Contains("mySpecialString"))
    {
        return true;
    }
}
return false;

Which of these (or another approach you would suggest) is the most memory efficient?

Answer 1:

You can use a query with File.ReadLines, so it only reads as many lines as it needs to, in order to satisfy your query. The Any() method will stop when it hits a line containing your string.

return File.ReadLines(fileName).Any(line => line.Contains("mySpecialString"));

Answer 2:

I also prefer the accepted answer. Maybe I'm micro-optimizing here, but you asked for a memory-efficient approach. Also consider that the text you are searching for could itself contain newline characters such as '\r', '\n' or "\r\n", and that a large file could theoretically consist of a single line, which negates the benefit of ReadLines.

So you could use this method:

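public static bool FileContainsString(string path, string str, bool caseSensitive = true)
{
    if (String.IsNullOrEmpty(str))
        return false;

    // Normalize the pattern once, character by character, exactly as file characters are normalized below.
    char[] pattern = str.ToCharArray();
    if (!caseSensitive)
        for (int i = 0; i < pattern.Length; i++)
            pattern[i] = Char.ToUpperInvariant(pattern[i]);

    // Knuth-Morris-Pratt failure table: fail[i] is the length of the longest proper
    // prefix of pattern[0..i] that is also a suffix of it.
    int[] fail = new int[pattern.Length];
    for (int i = 1, k = 0; i < pattern.Length; i++)
    {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];
        if (pattern[i] == pattern[k])
            k++;
        fail[i] = k;
    }

    using (var reader = new StreamReader(path))
    {
        int matched = 0; // number of pattern characters matched so far
        int next;
        while ((next = reader.Read()) != -1) // Read() returns -1 at the end of the file
        {
            char fileChar = caseSensitive ? (char)next : Char.ToUpperInvariant((char)next);
            // On a mismatch, fall back to the longest partial match that is still possible,
            // so overlapping candidates such as "aab" in "aaab" are not skipped.
            while (matched > 0 && fileChar != pattern[matched])
                matched = fail[matched - 1];
            if (fileChar == pattern[matched])
                matched++;
            if (matched == pattern.Length)
                return true;
        }
    }
    return false;
}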

bool containsString = FileContainsString(path, "mySpecialString", false); // ignore case if desired

Note that this might be the most efficient approach and, hidden in a method, it is also readable. It has one drawback, though: a culture-sensitive comparison is not feasible, because it compares single characters rather than substrings.

So you have to keep in mind some edge cases where you can run into issues, such as the famous Turkish i example or surrogate pairs.
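
The Turkish i case can be made concrete with a small sketch. The class and method names below (CaseFoldingDemo, EqualsIgnoreCase, EqualsIgnoreCaseInvariant) are illustrative and not part of the answer above, and the Turkish result mentioned afterwards assumes the runtime has culture data for tr-TR available, which is not the case in invariant-globalization mode.

```csharp
using System;
using System.Globalization;

public static class CaseFoldingDemo
{
    // Compares two characters after upper-casing both under the given culture.
    public static bool EqualsIgnoreCase(char a, char b, CultureInfo culture)
    {
        return char.ToUpper(a, culture) == char.ToUpper(b, culture);
    }

    // Per-character invariant comparison, the kind of check a
    // character-by-character file search can perform.
    public static bool EqualsIgnoreCaseInvariant(char a, char b)
    {
        return char.ToUpperInvariant(a) == char.ToUpperInvariant(b);
    }
}
```

With Turkish culture data present, EqualsIgnoreCase('i', '\u0130', CultureInfo.GetCultureInfo("tr-TR")) should return true, because Turkish upper-cases 'i' to the dotted capital 'İ' (U+0130), while EqualsIgnoreCaseInvariant('i', '\u0130') returns false. A case-insensitive invariant search for "file" would therefore not match "FİLE", even though a Turkish reader might expect it to.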

Answer 3:

I think both of your line-by-line solutions are equivalent. See the MSDN documentation: https://msdn.microsoft.com/en-us/library/dd383503%28v=vs.110%29.aspx

There it says: "The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned"

The same article also suggests that ReadLines should be used in conjunction with LINQ to Objects.
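
The lazy behaviour the article describes can be observed by counting how many lines are actually examined before the search stops. This is only a minimal sketch; LazySearchDemo, ContainsLine and the linesRead counter are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LazySearchDemo
{
    // Returns true if any line of the file contains 'value' (ordinal comparison).
    // 'linesRead' reports how many lines were handed to the predicate before Any() stopped.
    public static bool ContainsLine(string path, string value, out int linesRead)
    {
        int count = 0; // local counter, because a lambda cannot capture an out parameter
        bool found = File.ReadLines(path).Any(line =>
        {
            count++;
            return line.Contains(value);
        });
        linesRead = count;
        return found;
    }
}
```

For a file whose third line contains the search string, linesRead comes back as 3 no matter how many lines follow, whereas File.ReadAllLines would have loaded every line into an array first.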

Source: https://stackoverflow.com/questions/30078432/read-a-text-file-and-search-for-string-in-memory-efficient-way-and-abort-when-f

Tags

string

file

text

system.io.file