How to validate a CSV file?

感情败类 2020-12-05 11:29

How can we validate a CSV file?

I have a CSV file with the following structure:

Date;Id;Shown
15-Mar-10;231;345
15-Mar-10;232;346
and so on and on!!! approx arou

4 answers
  •  悲哀的现实
    2020-12-05 12:14

    I would not try to validate the file beforehand. I would rather go through it line by line, dealing with each line separately:

    • reading one line
    • verifying it's OK
    • using the data
    • and going on to the next line.
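
The loop above can be sketched as follows. This is a minimal sketch in Python rather than the PHP functions mentioned in this answer; `process_csv`, `is_valid` and `use_row` are hypothetical names, and the `;` delimiter and header line are taken from the sample in the question.

```python
import csv
import io


def process_csv(lines, is_valid, use_row):
    """Read rows one at a time, validate each, and use the valid ones.

    Returns the line numbers of the rows that failed validation.
    """
    reader = csv.reader(lines, delimiter=";")
    next(reader, None)  # skip the "Date;Id;Shown" header line
    bad_lines = []
    for row in reader:
        if is_valid(row):
            use_row(row)  # e.g. insert into a database
        else:
            bad_lines.append(reader.line_num)
    return bad_lines


# Small in-memory sample; the last line is missing a field on purpose.
sample = io.StringIO("Date;Id;Shown\n15-Mar-10;231;345\n15-Mar-10;232\n")
used = []
bad = process_csv(sample, lambda row: len(row) == 3, used.append)
```

Here the validity check is only a field count; a real check would be stricter.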


    Now, what could "verifying it's OK" mean?

    • At the very least: make sure I can read the line as CSV with my normal set of functions (maybe fgetcsv, maybe some other function specific to my project). If the function that reads hundreds of lines cannot read this one, there is probably a problem on that line.
    • Then, check the number of fields
    • Then, for each field, check whether it contains "valid" data
      • mandatory? optional?
      • numeric?
      • string?
      • date?
      • and so on
    • Then, for each field, some more careful checks
      • for instance, for a "code" field: does it correspond to a value that's legal for my application?

    If all that goes OK -- well, not much more to do, except use the data ;-)
    And when you're done with one line, just repeat for the next one.
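
As a rough illustration of those per-line checks (field count, mandatory fields, numeric fields, legal values), here is a minimal Python sketch. The function name `check_row` and the `allowed_ids` set are assumptions made for the example; the three-column layout comes from the sample in the question.

```python
import re


def is_unsigned_int(text):
    """True if text consists only of ASCII digits."""
    return re.fullmatch(r"[0-9]+", text) is not None


def check_row(row, allowed_ids):
    """Return a list of problems found in one parsed row (empty if OK)."""
    if len(row) != 3:
        return [f"expected 3 fields, got {len(row)}"]
    date_text, id_text, shown_text = (field.strip() for field in row)
    errors = []
    if not date_text:
        errors.append("Date is mandatory")
    if not is_unsigned_int(id_text):
        errors.append("Id must be an integer")
    elif int(id_text) not in allowed_ids:
        # The "more careful" check: is the value legal for the application?
        errors.append("Id is not a known value")
    if not is_unsigned_int(shown_text):
        errors.append("Shown must be an integer")
    return errors
```

Returning a list of messages instead of a boolean makes it easier to report which field was wrong on which line.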


    Of course, if you want to either accept or reject the whole file before writing anything to the database (or anything like that), you'll have to:

    • parse the file, line by line, applying the "verifying" ideas
    • store the data of each line in memory
    • and, when the whole file has been read into memory,
      • either start using the data
      • or, if there was an error on any line, reject everything.
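
The accept-or-reject-everything variant might look like this in Python. It is only a sketch: `load_all_or_nothing` and `is_valid` are hypothetical names, and it assumes the file is small enough to keep in memory.

```python
import csv
import io


def load_all_or_nothing(lines, is_valid):
    """Return every row if all rows are valid; otherwise raise ValueError."""
    reader = csv.reader(lines, delimiter=";")
    next(reader, None)  # skip the header line
    rows, bad_lines = [], []
    for row in reader:
        if is_valid(row):
            rows.append(row)  # keep in memory, nothing is written yet
        else:
            bad_lines.append(reader.line_num)
    if bad_lines:
        raise ValueError(f"file rejected, invalid lines: {bad_lines}")
    return rows


def three_fields(row):
    return len(row) == 3


good = io.StringIO("Date;Id;Shown\n15-Mar-10;231;345\n15-Mar-10;232;346\n")
rows = load_all_or_nothing(good, three_fields)  # safe to write to the database now
```

Only after the function returns without raising would the rows be written anywhere.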


    In your specific case, you have three kinds of fields:

    Date;Id;Shown
    15-Mar-10;231;345
    15-Mar-10;232;346
    

    From what I can guess:

    • The first one must be a date
      • Validating that with a regex will not be easy: months do not all have the same number of days, and the number of days in February depends on the year.
      • In such a case, I would probably try to parse the date with something like strtotime (not sure it's OK for the format you're using, though)
      • Or I would just explode the string
        • making sure there are three parts
        • that the third one is 2 digits
        • that the second one is one of Jan, Feb, Mar, ...
        • that the first one is a valid day number for that month and year
    • The second one:
      • must be an integer
      • must be a valid value that exists in your database?
        • If so, a simple SQL query will allow you to check that
    • For the third one, I'm not really sure...
      • I'm guessing it has to be an integer?
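
For these three fields, the checks could be sketched in Python as below. This is an illustration under assumptions, not the poster's method: it lets `datetime.strptime` do the calendar work instead of splitting the string by hand, it assumes English month abbreviations (the default C locale), and the `items` table in the in-memory SQLite database is purely hypothetical.

```python
import re
import sqlite3
from datetime import datetime


def parse_date(text):
    """Parse 'DD-Mon-YY' such as 15-Mar-10; return a date, or None if invalid."""
    try:
        # strptime rejects impossible dates such as 30-Feb-10.
        return datetime.strptime(text, "%d-%b-%y").date()
    except ValueError:
        return None


def parse_int(text):
    """Return the integer value of an all-digit string, or None otherwise."""
    return int(text) if re.fullmatch(r"[0-9]+", text) else None


def id_exists(conn, item_id):
    """Check with a simple SQL query that the Id is known (hypothetical table)."""
    row = conn.execute("SELECT 1 FROM items WHERE id = ?", (item_id,)).fetchone()
    return row is not None


# Hypothetical reference data, only so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(231,), (232,)])
```

A row would then be accepted only if `parse_date` and both `parse_int` calls return a value and `id_exists` is true for the Id.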
