I have a text file containing just lowercase letters and no punctuation except for spaces. I would like to know the best way of reading the file char by char, in a way that
I created a simple console program for your exact requirement, using the files you mentioned. It should be easy to run and check. The code is below. Hope this helps.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        string[] input = File.ReadAllLines(@"C:\Users\achikhale\Desktop\file.txt");
        string[] array1File = File.ReadAllLines(@"C:\Users\achikhale\Desktop\array1.txt");
        string[] array2File = File.ReadAllLines(@"C:\Users\achikhale\Desktop\array2.txt");

        // Words from each array file that also occur in the input file.
        List<string> finalResultarray1File = FindMatches(input, array1File);
        List<string> finalResultarray2File = FindMatches(input, array2File);

        if (finalResultarray1File.Count > 0)
        {
            Console.WriteLine("file array1.txt contains words: {0}", string.Join(";", finalResultarray1File));
        }
        if (finalResultarray2File.Count > 0)
        {
            Console.WriteLine("file array2.txt contains words: {0}", string.Join(";", finalResultarray2File));
        }
        Console.ReadLine(); // keep the console window open
    }

    // For every (input line, candidate line) pair, collects the candidate
    // words that also appear in the input line.
    static List<string> FindMatches(string[] input, string[] candidates)
    {
        var matches = new List<string>();
        foreach (string inputLine in input)
        {
            string[] inputWords = inputLine.Split(' ');
            foreach (string candidateLine in candidates)
            {
                matches.AddRange(candidateLine.Split(' ')
                    .Where(w => !string.IsNullOrEmpty(w) && inputWords.Contains(w)));
            }
        }
        return matches;
    }
}
I would do something like this:
IEnumerable<string> ReadWords(StreamReader reader)
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // Split on spaces and skip the empty entries produced by repeated spaces.
        foreach (string word in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            yield return word;
        }
    }
}
If you use File.ReadAllText (or reader.ReadToEnd) instead, the entire file is loaded into memory, so with a large file you can get an OutOfMemoryException and a lot of other problems.
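To make the memory point concrete, here is a minimal sketch of the same idea built on File.ReadLines, which enumerates lines lazily instead of loading the whole file. The class name LazyWords, the method name FromFile, and the file name words.txt are hypothetical and used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LazyWords
{
    // Yields the words of a file one at a time. File.ReadLines is lazy,
    // so only the current line is held in memory.
    public static IEnumerable<string> FromFile(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            foreach (string word in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                yield return word;
            }
        }
    }

    public static void Main()
    {
        // "words.txt" is a hypothetical file created here just for the demo.
        string path = Path.Combine(Path.GetTempPath(), "words.txt");
        File.WriteAllText(path, "the quick  brown\nfox jumps\n");

        foreach (string word in FromFile(path))
        {
            Console.WriteLine(word);
        }
    }
}
```

Note that any line-based approach only helps when the file actually contains line breaks; a file that is one enormous line is still read in one piece.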
First of all: StringReader reads from a string which is already in memory. This means that you will have to load up the input file in its entirety before being able to read from it, which kind of defeats the purpose of reading a few characters at a time; it can also be undesirable or even impossible if the input is very large.
The class to read from a text stream (which is an abstraction over a source of data) is StreamReader, and you might want to use that one instead. Now StreamReader and StringReader share an abstract base class TextReader, which means that if you code against TextReader then you can have the best of both worlds.
TextReader's public interface will indeed support your example code, so I'd say it's a reasonable starting point. You just need to fix the one glaring bug: there is no check for Read returning -1 (which signifies the end of available data).
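As a rough sketch of what coding against TextReader with that check could look like (this is not the asker's original code, which is not shown here; the class name CharWordReader and its method are hypothetical), the loop below reads one character at a time and stops when Read returns -1. It treats any whitespace character as a separator, which is an assumption that goes slightly beyond "spaces only" so that a trailing newline does not end up inside a word.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CharWordReader
{
    // Reads one character at a time and yields each whitespace-separated word.
    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        var current = new StringBuilder();
        int next;
        // Read returns -1 at end of input, so check before casting to char.
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (char.IsWhiteSpace(c))
            {
                if (current.Length > 0)
                {
                    yield return current.ToString();
                    current.Clear();
                }
            }
            else
            {
                current.Append(c);
            }
        }
        // Emit the last word if the input did not end with whitespace.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }

    public static void Main()
    {
        // StringReader is used for the demo; a StreamReader over a file
        // works the same way because both derive from TextReader.
        using (TextReader reader = new StringReader("hello  big world"))
        {
            foreach (string word in ReadWords(reader))
            {
                Console.WriteLine(word);
            }
        }
    }
}
```

Because the method takes a TextReader, it can be tested with a StringReader and run in production with a StreamReader without any changes.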
This method splits the text into words, whether they are separated by a single space or by several (two spaces, for example):
string stringWithMultipleSpaces;
using (StreamReader streamReader = new StreamReader(filePath)) // open the file
{
    stringWithMultipleSpaces = streamReader.ReadToEnd(); // load the whole file into a string
}
Regex r = new Regex(" +"); // delimiter: one or more spaces
string[] words = r.Split(stringWithMultipleSpaces); // convert the string to an array of words
foreach (string w in words)
{
    MessageBox.Show(w);
}
All in one line, here you go (assuming ASCII and perhaps not a 2 GB file):
var file = File.ReadAllText(@"C:\myfile.txt", Encoding.ASCII).Split(new[] { ' ' });
This returns a string array, which you can iterate over and do whatever you need with.
There is a much better way of doing this: string.Split(). If you read the entire string in, C# can automatically split it on every space:
string[] words = reader.ReadToEnd().Split(' ');
The words array now contains all of the words in the file and you can do whatever you want with them.
Additionally, you may want to investigate the File.ReadAllText method in the System.IO namespace - it may make your life much easier for file imports to text.
Edit: I guess this assumes that your file is not abhorrently large; as long as the entire thing can be reasonably read into memory, this will work most easily. If you have gigabytes of data to read in, you'll probably want to shy away from this. I'd suggest using this approach though, if possible: it makes better use of the framework that you have at your disposal.
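One caveat with Split(' ') is worth illustrating: it produces empty strings for repeated spaces and does not split on line breaks. The sketch below, with a hypothetical SplitDemo class, compares it with passing a null separator array plus StringSplitOptions.RemoveEmptyEntries, which makes Split treat any whitespace as a delimiter and drop the empty entries.

```csharp
using System;

public static class SplitDemo
{
    // Splits on any run of whitespace and drops empty entries.
    public static string[] SplitWords(string text)
    {
        // A null separator array tells Split to use whitespace characters as delimiters.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string text = "one  two\nthree ";
        Console.WriteLine(text.Split(' ').Length);   // 4: "one", "", "two\nthree", ""
        Console.WriteLine(SplitWords(text).Length);  // 3: "one", "two", "three"
    }
}
```

The cast to char[] is there because a bare null can be ambiguous between Split overloads on newer .NET versions.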