I have some .csv files which I parse before storing them in a database.
I would like to make the application more robust and validate the .csv files before
You should probably take a look at http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
We have been using this in our projects; it's quite robust and does what it says.
adrianm and Nipun Ambastha
Thank you for your response to my question.
I solved my problem by writing a solution to validate my .csv file myself.
A more elegant solution could quite possibly be built on adrianm's code. I didn't do that, but I encourage others to give adrianm's code a look.
I validate the following:
Empty file: new FileInfo(dto.AbsoluteFileName).Length == 0
Wrong formatting of file lines: string[] items = line.Split('\t'); if (items.Count() == 20)
Wrong datatype in line fields: int number; bool isNumber = int.TryParse(dataRow.ItemArray[0].ToString(), out number);
Missing required line fields: if (dataRow.ItemArray[4].ToString().Length < 1)
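The four checks above can be sketched as one routine that collects every problem instead of stopping at the first. This is only a sketch under assumptions: the class name TabFileValidator is hypothetical, it works on plain split strings rather than the DataRow used in the snippets, and the layout (20 tab-separated fields, field 0 an integer, field 4 required) is taken from the snippets above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TabFileValidator
{
    // Assumed layout, taken from the checks listed above.
    public const int ExpectedFieldCount = 20;

    // Returns one message per problem found; an empty list means the input passed.
    public static List<string> Validate(IReadOnlyList<string> lines)
    {
        var errors = new List<string>();

        // 1. Empty file
        if (lines.Count == 0)
        {
            errors.Add("File is empty.");
            return errors;
        }

        for (int i = 0; i < lines.Count; i++)
        {
            int lineNumber = i + 1;
            string[] items = lines[i].Split('\t');

            // 2. Wrong formatting: the field count must match before fields are indexed
            if (items.Length != ExpectedFieldCount)
            {
                errors.Add($"Line {lineNumber}: expected {ExpectedFieldCount} fields, found {items.Length}.");
                continue;
            }

            // 3. Wrong datatype: field 0 must be an integer
            if (!int.TryParse(items[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out _))
                errors.Add($"Line {lineNumber}: field 1 is not an integer.");

            // 4. Missing required field: field 4 must not be empty
            if (items[4].Length < 1)
                errors.Add($"Line {lineNumber}: required field 5 is missing.");
        }

        return errors;
    }
}
```

It could be called as TabFileValidator.Validate(File.ReadAllLines(dto.AbsoluteFileName)), rejecting the file when the returned list is not empty.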
To work through the contents of the .csv file I based my code on this code example:
http://bytes.com/topic/c-sharp/answers/256797-reading-tab-delimited-file
I do it like this:
Create a class to hold each parsed line with the expected types
internal sealed class Record {
    public int Field1 { get; set; }
    public DateTime Field2 { get; set; }
    public decimal? PossibleEmptyField3 { get; set; }
    ...
}
Create a method that parses a line into the record
public Record ParseRecord(string[] fields) {
    if (fields.Length < SomeLineLength)
        throw new MalformedLineException(...);
    var record = new Record();
    // Parse with the invariant culture so the result does not depend on machine settings
    record.Field1 = int.Parse(fields[0], NumberStyles.None, CultureInfo.InvariantCulture);
    record.Field2 = DateTime.ParseExact(fields[1], "yyyyMMdd", CultureInfo.InvariantCulture);
    if (fields[2] != "")
        record.PossibleEmptyField3 = decimal.Parse(fields[2]...);
    return record;
}
Create a method parsing the entire file
public List<Record> ParseStream(Stream stream) {
    // TextFieldParser lives in Microsoft.VisualBasic.FileIO
    var tfp = new TextFieldParser(stream);
    ...
    try {
        while (!tfp.EndOfData) {
            records.Add(ParseRecord(tfp.ReadFields()));
        }
    }
    catch (FormatException ex) {
        ... // show error
    }
    catch (MalformedLineException ex) {
        ... // show error
    }
    return records;
}
And then I create a number of methods validating the fields
public void ValidateField2(IEnumerable<Record> records) {
    foreach (var invalidRecord in records.Where(x => x.Field2 < DateTime.Today))
        ... // show error
}
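For reference, here is a compact version of the pieces above that should compile as written. It is a sketch, not the exact code I run: the three-field tab-delimited layout, the RecordFileParser name and the errors list are assumptions made for illustration. One deliberate difference is that the try/catch sits inside the loop, so one bad record is reported and parsing continues with the next one.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public sealed class Record
{
    public int Field1 { get; set; }
    public DateTime Field2 { get; set; }
    public decimal? PossibleEmptyField3 { get; set; }
}

public static class RecordFileParser
{
    // Assumption for this sketch: every record has exactly these three fields.
    private const int ExpectedFieldCount = 3;

    public static Record ParseRecord(string[] fields)
    {
        if (fields.Length < ExpectedFieldCount)
            throw new FormatException($"Expected {ExpectedFieldCount} fields, found {fields.Length}.");

        var record = new Record
        {
            Field1 = int.Parse(fields[0], NumberStyles.None, CultureInfo.InvariantCulture),
            Field2 = DateTime.ParseExact(fields[1], "yyyyMMdd", CultureInfo.InvariantCulture)
        };
        if (fields[2] != "")
            record.PossibleEmptyField3 = decimal.Parse(fields[2], NumberStyles.Number, CultureInfo.InvariantCulture);
        return record;
    }

    // Parses every record, appending one message to 'errors' per record that fails.
    public static List<Record> Parse(TextReader reader, List<string> errors)
    {
        var records = new List<Record>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");

            int recordNumber = 0;
            while (!parser.EndOfData)
            {
                recordNumber++;
                try
                {
                    records.Add(ParseRecord(parser.ReadFields()));
                }
                catch (FormatException ex)
                {
                    errors.Add($"Record {recordNumber}: {ex.Message}");
                }
                catch (MalformedLineException ex)
                {
                    errors.Add($"Record {recordNumber}: {ex.Message}");
                }
            }
        }
        return records;
    }
}
```

The field-level validators such as ValidateField2 then run over the returned list, and the file is only stored when errors is still empty afterwards.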
I have tried various tools, but since the pattern is straightforward they don't help much. (You should use a tool to split the line into fields.)
You can use FileHelpers, a free, open-source .NET library, to deal with CSV and many other file formats.