Reading a text file word by word

梦谈多话 2020-12-10 18:58

I have a text file containing just lowercase letters and no punctuation except for spaces. I would like to know the best way of reading the file char by char, in a way that

9 Answers
  • 2020-12-10 19:21

    I wrote a simple console program for your exact requirement, using the files you mentioned. It should be easy to run and check. The code is below; I hope it helps.

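    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {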
    
            string[] input = File.ReadAllLines(@"C:\Users\achikhale\Desktop\file.txt");
            string[] array1File = File.ReadAllLines(@"C:\Users\achikhale\Desktop\array1.txt");
            string[] array2File = File.ReadAllLines(@"C:\Users\achikhale\Desktop\array2.txt");
    
            List<string> finalResultarray1File = new List<string>();
            List<string> finalResultarray2File = new List<string>();
    
            foreach (string inputstring in input)
            {
                string[] wordTemps = inputstring.Split(' ');
    
                foreach (string array1Filestring in array1File)
                {
                    string[] word1Temps = array1Filestring.Split(' ');
    
                    var result = word1Temps.Where(y => !string.IsNullOrEmpty(y) && wordTemps.Contains(y)).ToList();
    
                    if (result.Count > 0)
                    {
                        finalResultarray1File.AddRange(result);
                    }
    
                }
    
            }
    
            foreach (string inputstring in input)
            {
                string[] wordTemps = inputstring.Split(' ');
    
                foreach (string array2Filestring in array2File)
                {
                    string[] word1Temps = array2Filestring.Split(' ');
    
                    var result = word1Temps.Where(y => !string.IsNullOrEmpty(y) && wordTemps.Contains(y)).ToList();
    
                    if (result.Count > 0)
                    {
                        finalResultarray2File.AddRange(result);
                    }
    
                }
    
            }
    
            if (finalResultarray1File.Count > 0)
            {
                Console.WriteLine("file array1.txt contains words: {0}", string.Join(";", finalResultarray1File));
            }
    
            if (finalResultarray2File.Count > 0)
            {
                Console.WriteLine("file array2.txt contains words: {0}", string.Join(";", finalResultarray2File));
            }
    
            Console.ReadLine();
    
        }
    }
    
  • 2020-12-10 19:25

    I would do something like this:

    IEnumerable<string> ReadWords(StreamReader reader)
    {
        string line;
        while((line = reader.ReadLine())!=null)
        {
            foreach(string word in line.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries))
            {
                yield return word;
            }
        }
    }
    

    If you use File.ReadAllText (or reader.ReadToEnd) instead, the entire file is loaded into memory, so a very large file can cause an OutOfMemoryException and other problems.
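    As a rough sketch of the same streaming idea, the variant below uses File.ReadLines, which enumerates lines lazily instead of loading the whole file. The class name StreamingWordsDemo and the temporary sample file are assumptions made for illustration only; they are not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StreamingWordsDemo
{
    // Lazily yields the space-separated words of a file, one line at a time.
    public static IEnumerable<string> ReadWords(string path)
    {
        // File.ReadLines reads lines on demand, so only the current line is held in memory.
        foreach (string line in File.ReadLines(path))
        {
            foreach (string word in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                yield return word;
            }
        }
    }

    static void Main()
    {
        // Hypothetical sample input written to a temporary file for the demo.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "one two\nthree  four\n");
        try
        {
            foreach (string word in ReadWords(path))
            {
                Console.WriteLine(word);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

    Because the words are produced on demand, the caller can stop early (for example with LINQ's Take) without reading the rest of the file.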

  • 2020-12-10 19:26

    First of all: StringReader reads from a string which is already in memory. This means that you will have to load up the input file in its entirety before being able to read from it, which kind of defeats the purpose of reading a few characters at a time; it can also be undesirable or even impossible if the input is very large.

    The class to read from a text stream (which is an abstraction over a source of data) is StreamReader, and you might want to use that one instead. Now StreamReader and StringReader share an abstract base class TextReader, which means that if you code against TextReader then you can have the best of both worlds.

    TextReader's public interface will indeed support your example code, so I'd say it's a reasonable starting point. You just need to fix the one glaring bug: there is no check for Read returning -1 (which signifies the end of available data).
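    The question's original code is not shown here, so the following is only a minimal sketch of what coding against TextReader with the -1 check might look like. The names CharByCharWords and ReadWords are made up for this example, and the demo uses a StringReader purely so that it runs without a file; a StreamReader could be passed in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CharByCharWords
{
    // Builds words by calling TextReader.Read() one character at a time.
    // Accepts any TextReader, e.g. a StreamReader (file) or a StringReader (string).
    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        var word = new StringBuilder();
        int c;
        // Read() returns -1 once no more characters are available.
        while ((c = reader.Read()) != -1)
        {
            if (char.IsWhiteSpace((char)c))
            {
                // Whitespace ends the current word, if there is one.
                if (word.Length > 0)
                {
                    yield return word.ToString();
                    word.Clear();
                }
            }
            else
            {
                word.Append((char)c);
            }
        }
        // Emit the last word when the input does not end with whitespace.
        if (word.Length > 0)
        {
            yield return word.ToString();
        }
    }

    static void Main()
    {
        using (TextReader reader = new StringReader("the quick  brown fox"))
        {
            foreach (string w in ReadWords(reader))
            {
                Console.WriteLine(w);
            }
        }
    }
}
```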

  • 2020-12-10 19:26

    This method splits the text into words, whether they are separated by a single space or by several (two spaces, for example).

    StreamReader streamReader = new StreamReader(filePath); //get the file
    string stringWithMultipleSpaces= streamReader.ReadToEnd(); //load file to string
    streamReader.Close();
    
    Regex r = new Regex(" +"); //specify delimiter (spaces)
    string [] words = r.Split(stringWithMultipleSpaces); //(convert string to array of words)
    
    foreach (String W in words)
    {
       MessageBox.Show(W);
    }
    
  • 2020-12-10 19:28

    All in one line, here you go (assuming ASCII and a file that is not, say, 2 GB):

    var file = File.ReadAllText(@"C:\myfile.txt", Encoding.ASCII).Split(new[] { ' ' });
    

    This returns a string array, which you can iterate over and do whatever you need with.

  • 2020-12-10 19:30

    There is a much better way of doing this: string.Split(). If you read the entire file into a string, C# can split it on every space for you:

    string[] words = reader.ReadToEnd().Split(' ');
    

    The words array now contains all of the words in the file and you can do whatever you want with them.

    Additionally, you may want to investigate the File.ReadAllText method in the System.IO namespace; it may make reading a file into a string much easier.

    Edit: I guess this assumes that your file is not abhorrently large; as long as the entire thing can be reasonably read into memory, this will work most easily. If you have gigabytes of data to read in, you'll probably want to shy away from this. I'd suggest using this approach though, if possible: it makes better use of the framework that you have at your disposal.
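    One caveat with Split(' ') is that consecutive spaces produce empty entries, and line breaks are not treated as separators. A possible refinement, sketched here with a hard-coded sample string standing in for the file contents (an assumption for the demo), is to pass a null separator array together with StringSplitOptions.RemoveEmptyEntries, which makes Split treat any whitespace as a delimiter:

```csharp
using System;

static class SplitOnWhitespaceDemo
{
    // Splits on any run of whitespace and drops empty entries.
    public static string[] SplitWords(string text)
    {
        // A null separator array tells string.Split to use whitespace as the delimiter.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Sample text with a double space, a newline, and a trailing space.
        string text = "the quick  brown\nfox ";
        foreach (string word in SplitWords(text))
        {
            Console.WriteLine(word);
        }
    }
}
```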
