Reading a CSV file with some missing columns

没有蜡笔的小新 2021-01-11 12:45

I am trying to read a CSV file into my VB.NET application using the following code:

While Not EOF(1)
    Input(1, dummy)
    Input(1, phone_number)
    Inp         


        
3 answers
  •  天命终不由人
    2021-01-11 12:53

    One thing that you should come to grips with is that those Filexxxx methods are all but officially and formally deprecated. When using them, Intellisense pops up with:

    ...The My feature gives you better productivity and performance in file I/O operations than FileOpen. For more information, see Microsoft.VisualBasic.FileIO.FileSystem.

    They are talking about My.Computer.FileSystem, but there are some even more useful .NET methods.

    The post doesn't reveal how the data will be stored, but if it is an array of any sort and/or a structure, those are at least suboptimal if not also outdated. The code below stores the data in a class, so that the numeric data can be stored as numbers, and uses a List in place of an array.

    I made a quick file similar to yours with some random data: {"CustName", "Phone", "UserName", "Product", "Cost", "Price", "Profit", "SaleDate", "RefCode"}:

    • The CustName is present 70% of the time
    • The username is never present
    • The RefCode is present 30% of the time
    • I added a SaleDate to illustrate that data conversion

    Ziggy Aurantium,132-5562,,Cat Food,8.26,9.95,1.69,08/04/2016,
    Catrina Caison,899-8599,,Knife Sharpener,4.95,6.68,1.73,10/12/2016,X-873-W3
    ,784-4182,,Vapor Compressor,11.02,12.53,1.51,09/12/2016,

    Code to Parse the CSV

    Note: this is a bad way to parse a CSV. There are lots of problems that can arise doing it this way; plus it takes more code. It is presented because it is a simple way to avoid dealing with the missing fields. See The Right Way.

    ' form/class level var:
    Private SalesItems As List(Of SaleItem)
    

    SaleItem is a simple class to store the elements you care about. SalesItems is a collection which can store only SaleItem objects. The properties in that class allow Price and Cost to be stored as Decimal and the date as a DateTime.
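    The answer never shows SaleItem itself, so the following is only a minimal sketch of what such a class might look like, assuming the property names used in the parsing code. The calculated Profit property is left out here because it is covered under Notes.

```vbnet
Imports System

' Hypothetical sketch of SaleItem; the original class is not shown in the answer.
' Property names are taken from the parsing code, types from the description above.
Public Class SaleItem
    Public Property CustName As String
    Public Property Phone As String
    Public Property Product As String
    ' Money values are stored as Decimal rather than as text
    Public Property Cost As Decimal
    Public Property Price As Decimal
    ' The sale date is stored as a real date, not a string
    Public Property SaleDate As DateTime
End Class
```

    Auto-implemented properties keep the class short, and a List(Of SaleItem) can then hold any number of these objects without manual resizing.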

    ' temp var
    Dim item As SaleItem
    ' create the collection
    SalesItems = New List(Of SaleItem)
        
    ' load the data....all of it (File requires Imports System.IO)
    Dim data = File.ReadAllLines("C:\Temp\custdata.csv")
    
    ' parse data lines 
    ' Start at 1 to skip the header line
    For n As Int32 = 1 To data.Length - 1
        Dim split = data(n).Split(","c)
    
        ' check if it is a good line
        If split.Length = 9 Then
            ' create a new item
            item = New SaleItem
            ' store SOME data to it
            item.CustName = split(0)
            item.Phone = split(1)
            ' don't care about user name (2)
            item.Product = split(3)
            ' convert numbers: Cost is column 4, Price is column 5
            item.Cost = Convert.ToDecimal(split(4))
            item.Price = Convert.ToDecimal(split(5))
            ' don't use the PROFIT, calculate it in the class (6)
    
            ' convert date
            item.SaleDate = Convert.ToDateTime(split(7))
    
            ' ignore the RefCode, which is often empty (8)
    
            ' add new item to collection
            ' a List sizes itself as needed!
            SalesItems.Add(item)
        Else
            ' To Do: make note of a bad line format
        End If
    Next
    
    ' show in DGV for approval/debugging
    dgvMem.DataSource = SalesItems
    

    Result:

    Notes
    It is generally a bad idea to store something which can be simply calculated. So the Profit property is:

    Public ReadOnly Property Profit As Decimal
        Get
            Return (Price - Cost)
        End Get
    End Property
    

    It can never be "stale" if the cost or price is updated.

    As shown, the resulting collection can be displayed to the user very easily. Given a DataSource, the DataGridView will create the columns and populate the rows.

    The Right Way

    String.Split(c) is a very bad idea because if the product is: "Hose, Small Green" it will chop that up and treat it as 2 fields. There are a number of tools which will do nearly all the work for you:

    1. Read the file
    2. Parse the lines
    3. Map the CSV data to a class
    4. Convert the text into the proper data type
    5. Create an economical collection

    Aside from the class, all the above could be done in just a few lines using CsvHelper:

    Private CustData As List(Of SaleItem)
    ...
    Using sr As New StreamReader("C:\Temp\custdata.csv", False),
         csv = New CsvReader(sr)
        csv.Configuration.HasHeaderRecord = True
    
        CustData = csv.GetRecords(Of SaleItem)().ToList()
    End Using
    

    Two or three lines of code to read, parse, and create a collection of 250 items.

    Even if you want to do it manually for some reason, CsvHelper can help. Rather than having it create a List(Of SaleItem) for you, you can use it just to read and parse the data:

    ' ... same setup as above
    csv.Configuration.HasHeaderRecord = True
    Do Until csv.Read() = False
        For n As Int32 = 0 To csv.Parser.FieldCount - 1
            DoSomethingWith(csv.GetField(n))
        Next
    Loop
    

    This will return the fields to you one by one. It won't convert any dates or prices, but it won't choke on missing data elements either.
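    To make the "Hose, Small Green" problem concrete, here is a small sketch comparing String.Split with a quote-aware parser. It uses TextFieldParser from Microsoft.VisualBasic.FileIO (the namespace the Intellisense message points to) rather than CsvHelper, only so that it runs without a third-party package. The module name, the ParseLine helper and the sample line are made up for illustration; the line is adapted from the sample rows above.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module QuotedFieldDemo
    ' Hypothetical helper: parses a single CSV line with TextFieldParser,
    ' which keeps a quoted field together even if it contains a comma.
    Function ParseLine(line As String) As String()
        Using parser As New TextFieldParser(New StringReader(line))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True
            Return parser.ReadFields()
        End Using
    End Function

    Sub Main()
        ' Illustrative line: the UserName field is empty and the product contains a comma
        Dim line = "Catrina Caison,899-8599,,""Hose, Small Green"",4.95,6.68,1.73,10/12/2016,X-873-W3"

        ' Naive split: the comma inside the quotes produces an extra field
        Console.WriteLine(line.Split(","c).Length)   ' 10

        ' Quote-aware parse: nine fields, empty UserName preserved as ""
        Dim fields = ParseLine(line)
        Console.WriteLine(fields.Length)             ' 9
        Console.WriteLine(fields(3))                 ' Hose, Small Green
    End Sub
End Module
```

    Like the manual CsvHelper loop, this hands back plain strings, so an empty field arrives as an empty string and any conversion to Decimal or DateTime is still up to the caller.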

    Resources

    • Five Minute Intro To Classes and Lists
    • CSVHelper
    • Class vs Structure
      • Eric Lippert's Thoughts on the matter
    • The File Class has gobs of useful methods
