Want VBA in excel to read very large CSV and create output file of a small subset of the CSV

前端 未结 7 1955
故里飘歌
故里飘歌 2020-11-30 08:18

I have a csv file of 1.2 million records of text. The alphanumeric fields are wrapped in quotation marks, the date/time or numeric fields are not.

For example \"Fre

相关标签:
7条回答
  • 2020-11-30 08:32

    This doesn't directly answer your question, but grep (or one of the Windows equivalents) would really shine for this, e.g.,

    grep -e <regex_filter> foo.csv > bar.csv
    
    0 讨论(0)
  • 2020-11-30 08:32

    Look at the Input # statement in the Excel help

    Sample usage would be:

    Input #fnInput, s_Forename, s_Surname, dt_DOB, i_Something, s_Street, s_Town, s_County, s_Postcode
    

    and then use the Write # statement to write matching records out again

    The only issue might be that the date format in the output will end up as #1967-07-01# but this format is unambiguous unlike 01/07/1967 which would represent 1st July in the UK and 7th January in the US. If you need to preserve the formatting of the date then write it out as a string:

    s_DOB = Format(dt_DOB, "dd/mm/yyyy")
    
    0 讨论(0)
  • 2020-11-30 08:38

    I used the following derivative of the code given above to successfully open an arbitrary csv file from VBA in Excel.

    Option Explicit
    Public cn As Connection
    Public Sub DoIt()
    Dim strcon As String
    Dim strsql As String
    Dim rs As Recordset

    Set cn = CreateObject("ADODB.Connection")

    strcon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\bin\HomePlanet\;" _
    & "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

    cn.Open strcon

    strsql = "SELECT * FROM astuname.csv "
    Set rs = New ADODB.Recordset
    rs.Open strsql, cn
    DoEvents ' pause here to inspect objects and properties rs.Close
    End Sub

    The rs (recordset) has a collection of fields, with a Count property. Each field as a Type property.

    You can reference the fields by sequence number ...

    Debug.Print rs.Fields(rs.Fields.Count - 1).Type

    Is this sufficient?

    If not, post the first several rows of the input file and I'll take it the rest of the way.

    0 讨论(0)
  • 2020-11-30 08:43

    I'd suggest taking a look at the Regular Expression library (you should see it in "Tools...References" as "Microsoft VBScript Regular Expressions 5.5" or something very similar.

    There are samples of both the Reg Exp and a fairly comprehensive character-by-character at this location: http://www.xbeat.net/vbspeed/c_ParseCSV.php. Note that the Regexp version is waaaay shorter!

    Have fun...

    0 讨论(0)
  • 2020-11-30 08:50

    How about VBScript, though this would also work in Excel:

    Set cn = CreateObject("ADODB.Connection")
    
    'Note HDR=Yes, that is, first row contains field names '
    'and FMT delimted, ie CSV '
    
    strCon="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Docs\;" _
    & "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
    
    cn.open strcon
    
    'You would not need delimiters ('') if last field is numeric: '    
    strSQL="SELECT FieldName1, FieldName2 INTO New.csv FROM Old.csv " _
    & " WHERE LastFieldName='SomeTextValue'"
    
    'Creates new csv file
    cn.Execute strSQL
    
    0 讨论(0)
  • 2020-11-30 08:53

    The following code should do the trick. I don't have Excel in front of me, so I haven't tested it, but the concept is sound.

    If this ends up being too slow, we can look at ways to improve the efficiency.

    Sub SelectSomeRecords()
        Dim testLine As String
    
        Open inputFileName For Input As #1
        Open outputFileName For Output As #2
    
        While Not EOF(1)
            Line Input #1, testLine
            If RecordIsInteresting(testLine) Then
                Print #2, testLine
            End If
        Wend
    
        Close #1
        Close #2
    End Sub
    
    Function RecordIsInteresting(recordLine As String) As Boolean
        Dim lineItems(1 to 8) As String
    
        GetRecordItems(lineItems(), recordLine)
    
        ''// do your custom checking here:
        RecordIsInteresting = lineItems(8) = "LS1 7AA"
    End Function
    
    Sub GetRecordItems(items() As String, recordLine as String)
        Dim finishString as Boolean
        Dim itemString as String
        Dim itemIndex as Integer
        Dim charIndex as Long
        Dim inQuote as Boolean
        Dim testChar as String
    
        inQuote = False
        charIndex = 1
        itemIndex = 1
        itemString = ""
        finishString = False
    
        While charIndex <= Len(recordLine)
            testChar = Mid$(recordLine, charIndex, 1)
    
            finishString = False
    
            If inQuote Then
                If testChar = Chr$(34) Then
                    inQuote = False
                    finishString = True
                    charIndex = charIndex + 1 ''// ignore the next comma
                Else
                    itemString = itemString + testChar
                End If
            Else
                If testChar = Chr$(34) Then
                    inQuote = True
                ElseIf testChar = "," Then
                    finishString = True
                Else
                    itemString = itemString + testChar
                End If
            End If
    
            If finishString Then
                items(itemIndex) = itemString
                itemString = ""
                itemIndex = itemIndex + 1
            End If
    
            charIndex = charIndex + 1
        Wend
    End Sub
    
    0 讨论(0)
提交回复
热议问题