How do you search a large text file for a string without going line by line in C#?

灰色年华 2020-12-15 08:23

I have a large text file that I need to search for a specific string. Is there a fast way to do this without reading line by line?

This method is extremely slow beca

14 Answers
  • 2020-12-15 08:48

    Stick it into SQL Server 2005/2008 and use its full-text search capability.

  • 2020-12-15 08:50

    Here's a simple one-function solution reading character by character. Worked fine for me.

    // Requires: using System; using System.IO;

    /// <summary>
    /// Find <paramref name="toFind"/> in <paramref name="reader"/>.
    /// </summary>
    /// <param name="reader">The <see cref="TextReader"/> to find <paramref name="toFind"/> in.</param>
    /// <param name="toFind">The string to find.</param>
    /// <returns>Zero-based position within <paramref name="reader"/> where <paramref name="toFind"/> starts or -1 if not found.</returns>
    /// <exception cref="ArgumentNullException">When <paramref name="reader"/> is null.</exception>
    /// <exception cref="ArgumentException">When <paramref name="toFind"/> is null or empty.</exception>
    public int FindString(TextReader reader, string toFind)
    {
        if(reader == null)
            throw new ArgumentNullException("reader");

        if(string.IsNullOrEmpty(toFind))
            throw new ArgumentException("String to find may not be null or empty.", "toFind");

        int charsRead = 0; // characters consumed from the reader so far
        int pos = 0;       // characters of toFind matched so far
        int chr;

        while((chr = reader.Read()) >= 0)
        {
            charsRead++;

            // On a mismatch, restart the match (the current character may itself
            // begin a new match). This simple restart can still miss a match when
            // toFind repeats its own prefix, e.g. "aab" in "aaab".
            if(chr == toFind[pos])
                pos++;
            else
                pos = chr == toFind[0] ? 1 : 0;

            if(pos == toFind.Length)
                return charsRead - toFind.Length;
        }

        return -1;
    }
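
    A scan that simply restarts on a mismatch can miss a match when the search string overlaps with itself (for example "aab" inside "aaab"). The following is a minimal sketch of one way around that, using the Knuth-Morris-Pratt algorithm, which is a different technique from the function above. The class name `KmpSearch` and the method `IndexOf` are hypothetical names chosen for illustration.

    ```csharp
    using System;
    using System.IO;

    public static class KmpSearch
    {
        // Hypothetical helper: returns the zero-based character offset of the
        // first occurrence of toFind in reader, or -1 if there is none.
        public static long IndexOf(TextReader reader, string toFind)
        {
            if (reader == null)
                throw new ArgumentNullException(nameof(reader));
            if (string.IsNullOrEmpty(toFind))
                throw new ArgumentException("String to find may not be null or empty.", nameof(toFind));

            // fail[i] = length of the longest proper prefix of toFind[0..i]
            // that is also a suffix of toFind[0..i].
            int[] fail = new int[toFind.Length];
            for (int i = 1, k = 0; i < toFind.Length; i++)
            {
                while (k > 0 && toFind[i] != toFind[k]) k = fail[k - 1];
                if (toFind[i] == toFind[k]) k++;
                fail[i] = k;
            }

            long charsRead = 0; // characters consumed so far
            int matched = 0;    // characters of toFind currently matched
            int chr;
            while ((chr = reader.Read()) >= 0)
            {
                charsRead++;
                // Fall back to the longest prefix that can still be extended.
                while (matched > 0 && chr != toFind[matched]) matched = fail[matched - 1];
                if (chr == toFind[matched]) matched++;
                if (matched == toFind.Length) return charsRead - toFind.Length;
            }
            return -1;
        }
    }
    ```

    Each character is still read only once, so this stays a single forward pass over the reader; the only extra cost is the small table built from the search string.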
    

    Hope that helps.

  • 2020-12-15 08:54

    In any case, you will have to scan through the file until the string is found, which in the worst case means the whole file.

    Look up the Rabin-Karp string search algorithm or a similar one.
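
    As a rough illustration of that suggestion, here is a minimal sketch of a Rabin-Karp search over a `TextReader`. It keeps a rolling hash of the last few characters and only compares characters when the hashes agree. The names `RabinKarpSearch`, `IndexOf` and `WindowEquals`, as well as the base and modulus constants, are assumptions made for this example.

    ```csharp
    using System;
    using System.IO;

    public static class RabinKarpSearch
    {
        private const ulong Base = 257;        // hash base (an arbitrary choice)
        private const ulong Mod = 1000000007;  // prime modulus; products fit in a ulong

        // Returns the zero-based character offset of the first match, or -1.
        public static long IndexOf(TextReader reader, string toFind)
        {
            if (reader == null)
                throw new ArgumentNullException(nameof(reader));
            if (string.IsNullOrEmpty(toFind))
                throw new ArgumentException("String to find may not be null or empty.", nameof(toFind));

            int m = toFind.Length;
            ulong patternHash = 0, windowHash = 0, highPower = 1; // highPower = Base^(m-1) mod Mod
            for (int i = 0; i < m; i++)
            {
                patternHash = (patternHash * Base + toFind[i]) % Mod;
                if (i > 0) highPower = (highPower * Base) % Mod;
            }

            char[] window = new char[m]; // circular buffer holding the last m characters
            long charsRead = 0;
            int chr;
            while ((chr = reader.Read()) >= 0)
            {
                int slot = (int)(charsRead % m);
                if (charsRead >= m)
                {
                    // Remove the contribution of the character leaving the window.
                    windowHash = (windowHash + Mod - (window[slot] * highPower) % Mod) % Mod;
                }
                window[slot] = (char)chr;
                windowHash = (windowHash * Base + (ulong)chr) % Mod;
                charsRead++;

                // Equal hashes may be a collision, so confirm character by character.
                if (charsRead >= m && windowHash == patternHash
                    && WindowEquals(window, (int)(charsRead % m), toFind))
                {
                    return charsRead - m;
                }
            }
            return -1;
        }

        // Compares the circular buffer, starting at its oldest character, with toFind.
        private static bool WindowEquals(char[] window, int start, string toFind)
        {
            for (int i = 0; i < toFind.Length; i++)
                if (window[(start + i) % window.Length] != toFind[i]) return false;
            return true;
        }
    }
    ```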

  • 2020-12-15 08:56

    I have a large text file that I need to search for a specific string. Is there a fast way to do this without reading line by line?

    The only way to avoid searching across the entire file is to sort or organize the input beforehand. For example, if this is an XML file and you need to do many of these searches, it would make sense to parse the XML file into a DOM tree. Or if this is a list of words and you're looking for all the words which start with the letters "aero", it might make sense to sort the entire input first if you do a lot of that kind of searching on the same file.
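
    For the word-list case, the idea can be sketched roughly as follows: sort once, then answer each prefix query with a binary search. This is only an illustration; the class `PrefixIndex` and its methods `Build` and `StartingWith` are hypothetical names, and ordinal (case-sensitive) comparison is an assumption.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class PrefixIndex
    {
        // Sorts the words once so that later prefix lookups can use binary search.
        public static string[] Build(IEnumerable<string> words)
        {
            var sorted = new List<string>(words);
            sorted.Sort(StringComparer.Ordinal);
            return sorted.ToArray();
        }

        // Returns every word in sortedWords that starts with prefix.
        public static List<string> StartingWith(string[] sortedWords, string prefix)
        {
            // Binary search for the first word that is >= prefix in ordinal order.
            int lo = 0, hi = sortedWords.Length;
            while (lo < hi)
            {
                int mid = lo + (hi - lo) / 2;
                if (string.CompareOrdinal(sortedWords[mid], prefix) < 0) lo = mid + 1;
                else hi = mid;
            }

            // All words sharing the prefix are adjacent from that point on.
            var result = new List<string>();
            for (int i = lo;
                 i < sortedWords.Length && sortedWords[i].StartsWith(prefix, StringComparison.Ordinal);
                 i++)
            {
                result.Add(sortedWords[i]);
            }
            return result;
        }
    }
    ```

    The sort costs time up front, so this only pays off when the same file is queried many times.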

  • 2020-12-15 08:56

    The bottleneck here could well be the time taken to load the file into memory before performing the search. Try profiling your application to see where the time goes. If it is the file load, you could try "chunking" it, so that the file is streamed in small chunks and each chunk is searched as it arrives (taking care not to miss a match that straddles two chunks).

    Obviously, if the string to be found is at the end of the file, there will be no performance gain.
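
    A minimal sketch of the chunked approach, under a few assumptions: the class `ChunkedSearch`, the method `IndexOf` and the default chunk size are made up for this example. The last `toFind.Length - 1` characters of each chunk are carried over so that a match spanning a chunk boundary is still found.

    ```csharp
    using System;
    using System.IO;

    public static class ChunkedSearch
    {
        // Returns the zero-based character offset of the first match, or -1.
        public static long IndexOf(TextReader reader, string toFind, int chunkSize = 64 * 1024)
        {
            if (reader == null)
                throw new ArgumentNullException(nameof(reader));
            if (string.IsNullOrEmpty(toFind))
                throw new ArgumentException("String to find may not be null or empty.", nameof(toFind));
            if (chunkSize <= 0)
                throw new ArgumentOutOfRangeException(nameof(chunkSize));

            int overlap = toFind.Length - 1;               // tail carried between chunks
            char[] buffer = new char[overlap + chunkSize];
            int carried = 0;                               // carried-over chars at buffer start
            long bufferOffset = 0;                         // offset in the input of buffer[0]

            int read;
            while ((read = reader.Read(buffer, carried, chunkSize)) > 0)
            {
                int filled = carried + read;
                int hit = new string(buffer, 0, filled).IndexOf(toFind, StringComparison.Ordinal);
                if (hit >= 0) return bufferOffset + hit;

                // Keep the tail so a match that straddles the chunk boundary
                // is still seen in the next round.
                int keep = Math.Min(overlap, filled);
                Array.Copy(buffer, filled - keep, buffer, 0, keep);
                bufferOffset += filled - keep;
                carried = keep;
            }
            return -1;
        }
    }
    ```

    Memory use stays bounded by the chunk size regardless of the file size, and the search stops as soon as the first match is found.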

  • 2020-12-15 08:57

    You should be able to read the file byte by byte, matching each byte against the search string until you reach the end of the search string, in which case you have a match. If at any point the byte you've read doesn't match the one you're looking for, reset the matched count to 0 and start again. For example (a sketch, not tested against real files):

    // Requires: using System.IO; using System.Text;
    // fileName is assumed to hold the path of the file to search.
    byte[] lookingFor = Encoding.UTF8.GetBytes("hello world");
    int index = 0;          // bytes of lookingFor matched so far
    long position = -1;     // byte offset of the first match, or -1 if none
    bool matchFound = false;

    using (FileStream fileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
      int b;
      while ((b = fileStream.ReadByte()) != -1)
      {
        if (b == lookingFor[index])
          index++;
        else
          index = (b == lookingFor[0]) ? 1 : 0; // restart; may miss self-overlapping patterns

        if (index == lookingFor.Length)
        {
           matchFound = true;
           position = fileStream.Position - lookingFor.Length;
           break;
        }
      }
    }


    That is one of many algorithms you could use. It only finds the first match, so you will probably want to wrap the while loop in another loop to find multiple matches.

    Also, one thing to note about reading the file line by line is that if the desired string to match spans lines you're not going to find it. If that's fine then you can search line by line but if you need search strings to span lines you'll want to use an algorithm like I detailed above.

    Finally, if you're looking for best speed, which it sounds like you are, you'll want to migrate the code above to use a StreamReader or some other buffered reader.
