Reading CSV file some missing columns

泪湿孤枕 提交于 2019-11-30 22:48:34

问题


I am trying to read in a CSV file into my VB.net application using the following code:

While Not EOF(1)
    Input(1, dummy)
    Input(1, phone_number)
    Input(1, username)
    Input(1, product_name)
    Input(1, wholesale_cost)
    Input(1, dummy)
    Input(1, dummy)
End While

My CSV file (as text) looks like this:

Customer Name,Phone Number,Username,Product,Wholesale Cost,Sales Price,Gross Profit, Customer Reference
  ,00000000000,00000000000,Product Name,25.00,35.00,10.00,
  ,00000000000,00000000000,Product Name,1.00,1.40,0.40,

As you can see, not all fields are always included and therefore an error displays when reading the file because it cannot reach the end of the line.

How can I handle this type of file?

Sometimes the fields will be there on some lines, and others not.

UPDATE

I have tried the answer that Zenacity provided but when trying to read using sArray(1) inside the loop it returns Index was outside the bounds of the array


回答1:


One thing that you should come to grips with is that those Filexxxx methods are all but officially and formally deprecated. When using them, Intellisense pops up with:

...The My feature gives you better productivity and performance in file I/O operations than FileOpen. For more information, see Microsoft.VisualBasic.FileIO.FileSystem.

They are talking about My.Computer.FileSystem but there are some even more useful NET methods.

The post doesnt reveal how the data will be stored, but if it is an array of any sort and/or a structure, those are at least suboptimal if not also outdated. This will store it in a class so that the numeric data can be stored as numbers and a List will be used in place of an array.

I made a quick file similar to yours with some random data: {"CustName", "Phone", "UserName", "Product", "Cost", "Price", "Profit", "SaleDate", "RefCode"}:

  • The CustName is present 70% of the time
  • The username is never present
  • The RefCode is present 30% of the time
  • I added a SaleDate to illustrate that data conversion

Ziggy Aurantium,132-5562,,Cat Food,8.26,9.95,1.69,08/04/2016,
Catrina Caison,899-8599,,Knife Sharpener,4.95,6.68,1.73,10/12/2016,X-873-W3
,784-4182,,Vapor Compressor,11.02,12.53,1.51,09/12/2016,

Code to Parse the CSV

Note: this is a bad way to parse a CSV. There are lots of problems that can arise doing it this way; plus it takes more code. It is presented because it is a simple way to not have to deal with the missing fields. See The Right Way

' form/class level var:
Private SalesItems As List(Of SaleItem)

SaleItem is a simple class to store the elements you care about. SalesItems is a collection which can store only SaleItem objects. The properties in that class allow Price and Cost to be stored as Decimal and the date as a DateTime.

' temp var
Dim item As SaleItem
' create the collection
SalesItems = New List(Of SaleItem)

' load the data....all of it
Dim data = File.ReadAllLines("C:\Temp\custdata.csv")

' parse data lines 
' Start at 1 to skip a Header
For n As Int32 = 0 To data.Length - 1
    Dim split = data(n).Split(","c)

    ' check if it is a good line
    If split.Length = 9 Then
        ' create a new item
        item = New SaleItem
        ' store SOME data to it
        item.CustName = split(0)
        item.Phone = split(1)
        ' dont care anout user name (2)
        item.Product = split(3)
        ' convert numbers
        item.Price = Convert.ToDecimal(split(4))
        item.Cost = Convert.ToDecimal(split(5))
        ' dont use the PROFIT, calculate it in the class (6)

        ' convert date
        item.SaleDate = Convert.ToDateTime(split(7))

        ' ignore nonexistant RefCode (8)

        ' add new item to collection
        ' a List sizes itself as needed!
        SalesItems.Add(item)
    Else
        ' To Do: make note of a bad line format
    End If
Next

' show in DGV for approval/debugging
dgvMem.DataSource = SalesItems

Result:

Notes
It is generally a bad idea to store something which can be simply calculated. So the Profit property is:

Public ReadOnly Property Profit As Decimal
    Get
        Return (Cost - Price)
    End Get
End Property

It can never be "stale" if the cost or price is updated.

As shown, using the resulting collection can be displayed to the user very easily. Given a DataSource, the DataGridView will create the columns and populate the rows.

The Right Way

String.Split(c) is a very bad idea because if the product is: "Hose, Small Green" it will chop that up and treat it as 2 fields. There are a number of tools which will do nearly all the work for you:

  1. Read the file
  2. Parse the lines
  3. Map the CSV data to a class
  4. convert the text into the proper data type
  5. create an economical collecton

Aside from the class, all the above could be done in just a few lines using CSVHelper:

Private CustData As List(Of SaleItem)
...
Using sr As New StreamReader("C:\Temp\custdata.csv", False),
     csv = New CsvReader(sr)
    csv.Configuration.HasHeaderRecord = True

    CustData = csv.GetRecords(Of SaleItem)().ToList()
End Using

Two or three lines of code to read, parse, and create a collection of 250 items.

Even if you want to do it manually for some reason, CSVHelper can help. Rather than create a List(Of SaleItem) for you, you can use it to read and parse the data:

... like above
csv.Configuration.HasHeaderRecord = True
Do Until csv.Read() = False
    For n As Int32 = 0 To csv.Parser.FieldCount - 1
        DoSomethingWith(csv.GetField(n))
    Next
Loop

This will return the fields to you one by one. It wont convert any dates or prices, but it wont choke on missing data elements either.

Resources

  • Five Minute Intro To Classes and Lists
  • CSVHelper
  • Class vs Structure
    • Eric Lippert's Thoughts on the matter
  • The File Class has gobs of useful methods



回答2:


Warning: if CustomerName or ProductName values can contains comma's (.i.e. CustomerName = "Callaway , Mark") you cannot use the String.Split() method. and it is better to search for a third party csv parser, Or you can use TextFieldParser Class --> MSDN article

  • You can follow this link to learn about importing csv using TextFieldParser

    https://www.daniweb.com/programming/software-development/code/438959/read-in-text-from-csv-with-textfieldparser

  • Earlier i faced a problem importing Csv file using SQL server Integration Services (Fields contains delimiter) you can take a look (code in Vb.net):

    Import CSV File Error : Column Value containing column delimiter

My answer is assuming that missing fields are always from the right side of the line and that Fields value does not contains comma's (else @Plutonix answer is what you are looking for)

Using this code you will be able to import lines with missing fields.

You have to read each line from the csv file, Count the "," occurence in this line using the following code

Line.Count(Function(c As Char) c = ",")

if count is less than 7 (8 Columns) you will add missing ","

 String.PadRight((7 - intCommaCount), ",")

Note: if missing comma's are from the left side you can use String.PadLeft((7 - intCommaCount), ",")

And split the line into Item properties

I Created the following Item Class

Public Class MyItem


Public Property CustomerName As String
Public Property PhoneNumber As String
Public Property Username As String
Public Property Product As String
Public Property WholesaleCost As String
Public Property SalesPrice As String
Public Property GrossProfit As String
Public Property CustomerReference As String

Public Shared Function CreateObjectFromLine(ByVal Line As String) As MyItem

    'Count Comma occurence in Line
    Dim intCommaCount As Integer = Line.Count(Function(c As Char) c = CChar(","))
    Dim strTemp = Line

    'Add missing comma's
    If intCommaCount < 7 Then

        strTemp = strTemp.PadRight((7 - intCommaCount), ",")

    End If

    'Split Line and return MyItem Class
    Dim str() As String = strTemp.Split(",")

    Return New MyItem With {.CustomerName = str(0),
        .PhoneNumber = str(1),
        .Username = str(2),
        .Product = str(3),
        .WholesaleCost = str(4),
        .SalesPrice = str(5),
        .GrossProfit = str(6),
        .CustomerReference = str(7)}




End Function



End Class

And i used the following Code to Import Data from CSV File

    Dim SalesItems As New List(Of MyItem)
    Dim csvFile As String = "C:\1.csv"


    Using csvStreamReader As New IO.StreamReader(csvFile)

        While Not csvStreamReader.EndOfStream

       Dim strLine as string = csvStreamReader.ReadLine

       ' Skip Header
       If strLine.StartsWith("Customer Name") Then Continue While

            Dim item As MyItem = MyItem.CreateObjectFromLine(strLine)

            SalesItems.Add(item)



        End While


    End Using

    'Showing Result in a DataGridView
    dgvItems.DataSource = SalesItems

Note: This is an simple example which need adding error Handling Try... Catch , Null Checking




回答3:


By using the following Function, you can evaluate the file content line by line and take the appropriate action.

Imports System.IO    
Private Sub ParseCSVFile(psFile As String)
    Dim sArray() As String
    Dim Customer_Name As String = String.Empty
    Dim Phone_Number As String = String.Empty
    Dim Username As String = String.Empty
    Dim Product As String = String.Empty
    Dim Wholesale_Cost As String = String.Empty
    Dim Sales_Price As String = String.Empty
    Dim Gross_Profit As String = String.Empty
    Dim Customer_Reference As String = String.Empty

    Try
        Using objStreamReader As StreamReader = New StreamReader(psFile) 'should be full path
            Dim sLine As String = String.Empty
            Do
                sLine = objStreamReader.ReadLine()
                If sLine <> Nothing Then
                    sArray = Split(sLine, ",")
                    Customer_Name = sArray(0)
                    Phone_Number = sArray(1)
                    Username = sArray(2)
                    Product = sArray(3)
                    Wholesale_Cost = sArray(4)
                    Sales_Price = sArray(5)
                    Gross_Profit = sArray(6)
                    Customer_Reference = sArray(7)
                    Debug.Print(Customer_Name & "," & Phone_Number & "," & Username & "," & Product & "," & Wholesale_Cost & "," & Sales_Price & "," & Gross_Profit & "," & Customer_Reference)
                End If
            Loop Until sLine Is Nothing
        End Using
    Catch
        'log error
    End Try
End Sub


来源:https://stackoverflow.com/questions/41108845/reading-csv-file-some-missing-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!