I would not try to validate the file beforehand: I would rather go through it line by line, dealing with each line separately:
- Reading one line
- Verifying it's OK
- Using the data
- And going to the next line.
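In PHP, that loop could look roughly like this. It's only a sketch: it assumes fgetcsv() with a ';' delimiter (as in your sample file), and processLineByLine(), $isValid and $useRow are hypothetical names standing for your own verification and processing code:

```php
<?php
// Sketch of the "read one line, verify it, use it, go to the next" loop.
// Assumptions: ';' is the delimiter, and $isValid / $useRow are
// hypothetical callables supplied by the caller.
function processLineByLine($handle, callable $isValid, callable $useRow): array
{
    $rejected = [];   // numbers of the lines that failed verification
    $lineNo = 0;
    // fgetcsv() returns false once the end of the file is reached
    while (($fields = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
        $lineNo++;
        if ($isValid($fields)) {
            $useRow($fields);       // the line is OK: use its data right away
        } else {
            $rejected[] = $lineNo;  // skip this line, but keep going
        }
    }
    return $rejected;
}
```

If the file starts with a header line (like Date;Id;Shown), one extra fgetcsv() call before the loop is enough to skip it.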
Now, what could "verify it's OK" mean?
- At least: make sure I can read the line as CSV with my normal set of functions (maybe fgetcsv, maybe some other function specific to my project). Anyway, if the function that reads hundreds of other lines fails on this one, there's probably a problem on that line.
- Then, check the number of fields.
- Then, for each field, check whether it contains "valid" data:
  - mandatory? optional?
  - numeric?
  - string?
  - date?
  - and so on
- Then, for each field, some more careful checks:
  - for instance, for a "code" field: does it correspond to a value that's legal for my application?
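Those checks could be expressed as a list of per-field rules. This is a minimal sketch, assuming one callable per column that returns an error message (or null when the value is fine); checkLine(), the rule names and the allowed codes are hypothetical, to be replaced by your project's actual rules:

```php
<?php
// Sketch of the per-line checks: number of fields first, then one rule per field.
// Assumption: each rule is a callable taking the raw field and returning
// an error message, or null when the value is acceptable.
function checkLine(array $fields, array $rules): array
{
    // Wrong number of fields: no point checking the values themselves
    if (count($fields) !== count($rules)) {
        return ['expected ' . count($rules) . ' fields, got ' . count($fields)];
    }
    $errors = [];
    foreach (array_values($rules) as $i => $rule) {
        $error = $rule((string) $fields[$i]);
        if ($error !== null) {
            $errors[] = 'field ' . ($i + 1) . ': ' . $error;
        }
    }
    return $errors;
}

// Hypothetical rules for a two-column file: a mandatory numeric field,
// and a "code" field that must be a value your application accepts.
$allowedCodes = ['A', 'B', 'C'];
$rules = [
    'amount' => fn(string $v) => $v === '' ? 'mandatory'
        : (is_numeric($v) ? null : 'not numeric'),
    'code' => fn(string $v) => in_array($v, $allowedCodes, true) ? null : 'unknown code',
];
```

Returning a list of messages rather than a plain true/false makes it easier to tell the user what was wrong, and on which field.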
If all that goes OK, well, there's not much more to do, except use the data ;-)
And when you're done with one line, just repeat for the next one.
Of course, if you want to either accept or reject the whole file before doing any database write (or anything like that), you'll have to:
- parse the file, line by line, applying the "verifying" ideas
- store the data of each line in memory
- and, when the whole file has been read into memory:
  - either start using the data
  - or, if there's been an error on any line, reject everything.
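A sketch of that "all or nothing" variant, assuming fgetcsv() with a ';' delimiter; loadAllOrNothing() and $isValid are hypothetical names, $isValid standing for whatever per-line verification you use:

```php
<?php
// Sketch of "accept or reject the whole file": every line is verified and
// kept in memory, and the decision is only made once the file has been read.
function loadAllOrNothing($handle, callable $isValid): array
{
    $rows = [];
    $badLines = [];
    $lineNo = 0;
    while (($fields = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
        $lineNo++;
        if ($isValid($fields)) {
            $rows[] = $fields;      // keep valid data in memory for now
        } else {
            $badLines[] = $lineNo;  // remember where the problems are
        }
    }
    // Whole file read: either everything is usable, or nothing is
    return $badLines === []
        ? ['ok' => true, 'rows' => $rows, 'badLines' => []]
        : ['ok' => false, 'rows' => [], 'badLines' => $badLines];
}
```

Reading to the end instead of stopping at the first bad line lets you report every faulty line at once; the price is that all rows stay in memory until the decision is made.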
In your specific case, you have three kinds of fields:

    Date;Id;Shown
    15-Mar-10;231;345
    15-Mar-10;232;346
From what I can guess:
- The first one must be a date.
  - Using a regex to validate that will not be easy: months don't all have the same number of days, there are twelve of them to handle, February's number of days depends on the year, ...
  - In such a case, I would probably try to parse the date with something like strtotime (not sure it works with the format you're using, though).
  - Or I would just explode the string, making sure:
    - there are three parts
    - the third one is 2 digits
    - the second one is one of Jan, Feb, Mar, ...
    - the first one is a valid day number, given the month and year from the two others.
- The second one:
  - must be an integer
  - must be a valid value that exists in your database?
    - If so, a simple SQL query will allow you to check that.
- For the third one, not really sure...
  - I'm guessing it has to be an integer?
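Here is a sketch of those checks for your three columns. It follows the explode() route rather than strtotime(), since, as far as I know, strtotime() tends to accept out-of-range days such as 31-Feb and silently roll them over into March; checkdate() takes care of the days-per-month and leap-year logic. Assumptions: the year is two digits meaning 20yy, Id and Shown are non-negative integers, and the items table and id column in the hypothetical idExists() are placeholders for your own schema:

```php
<?php
// Sketch of the checks for one "Date;Id;Shown" line, e.g. "15-Mar-10;231;345".
// Assumptions: dates are day-Mon-yy with a two-digit year meaning 20yy,
// and Id / Shown are non-negative integers.

const MONTHS = [
    'Jan' => 1, 'Feb' => 2, 'Mar' => 3, 'Apr' => 4, 'May' => 5, 'Jun' => 6,
    'Jul' => 7, 'Aug' => 8, 'Sep' => 9, 'Oct' => 10, 'Nov' => 11, 'Dec' => 12,
];

function isValidDate(string $value): bool
{
    $parts = explode('-', $value);
    if (count($parts) !== 3) {
        return false;                    // must have exactly three parts
    }
    [$day, $month, $year] = $parts;
    if (strlen($year) !== 2 || !ctype_digit($year)) {
        return false;                    // third part: two digits
    }
    if (!array_key_exists($month, MONTHS)) {
        return false;                    // second part: Jan, Feb, Mar, ...
    }
    if (strlen($day) > 2 || !ctype_digit($day)) {
        return false;                    // first part: a 1- or 2-digit number
    }
    // checkdate() knows the days per month, February in leap years included
    return checkdate(MONTHS[$month], (int) $day, 2000 + (int) $year);
}

function isNonNegativeInt(string $value): bool
{
    return ctype_digit($value);          // digits only: no sign, no decimals, not empty
}

// Hypothetical database check for the Id column: "items" and "id" are
// placeholder names, to adapt to your own schema.
function idExists(PDO $pdo, int $id): bool
{
    $stmt = $pdo->prepare('SELECT 1 FROM items WHERE id = ?');
    $stmt->execute([$id]);
    return $stmt->fetchColumn() !== false;
}

function isValidLine(array $fields): bool
{
    return count($fields) === 3
        && isValidDate((string) $fields[0])
        && isNonNegativeInt((string) $fields[1])
        && isNonNegativeInt((string) $fields[2]);
}
```

If the Id really has to exist in your database, you could add a call to idExists() (with your own PDO connection) to isValidLine(); I kept it separate so the format checks can run without a database.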