Importing CSV having Duplicate Column names with SQL Query

Question

I have three CSV files located in different network folders, and the folder and sub-folder names may contain spaces. I want to join these three files into a single ADO Recordset containing only the required columns.

Test1.csv (I have excluded unnecessary columns from all csv's)

T1Id  | Gpos | lbl
-----------------------
1001  |  0   | Innovate
1002  |  1   | Buys
1003  |  2   | Sales
1004  |  3   | Forecasts
1005  |  4   | Usage
1006  |  5   | Forum

Test2.csv: (I have excluded unnecessary columns from all csv's)

T2Id |  T1Id |  Apos | tval
-----------------------------------
382  |  1001 |  1   | my life my rules.
203  |  1001 |  2   | earth wind rain and fire.
658  |  1002 |  1   | wealth power blood desire.
200  |  1003 |  1   | one good to live for.
301  |  1003 |  2   | before we die.
439  |  1004 |  1   | one taste to glory
795  |  1004 |  2   | one mouthful of sky.
494  |  1004 |  3   | some other text.

Test3.csv: (I have excluded unnecessary columns from all csv's)

(blank) | Aggregate | (blank) | Aggregate | 149_SG_Bryl_Cream | 891_SG_Myo__Sky_Blue_Dress
---------------------------------------------------------------------------------------------
X0.1    | 0.422300  |         | 0.424658  | 0.458014          | 0.434639
X0.2    | 0.318628  |         | 0.345475  | 0.334548          | 0.333675
X0.3    | 0.274694  |         | 0.274643  | 0.243424          | 0.286865
X0.4    | 0.294568  |         | 0.346758  | 0.276552          | 0.366648
X1.1    | 0.565734  |         | 0.293436  | 0.283564          | 0.235366
X1.2    | 0.286657  |         | 0.755456  | 0.283233          | 0.310544
X2.1    | 0.234643  |         | 0.245459  | 0.245434          | 0.343423
X2.2    | 0.343645  |         | 0.455659  | 0.343282          | 0.334343
X2.3    | 0.234643  |         | 0.245459  | 0.245434          | 0.343423

As you can see, Test3.csv has 4 issues:

It has columns with blank headers that contain data, e.g. X0.1.
It has columns with blank headers that contain no data.
It has 2 Aggregate columns with the same name.
It has many SG columns, and I need to keep only the part of each column name starting from 'SG'.
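
To make the fourth issue concrete, here is a minimal sketch of the name extraction in plain VBA. The function name `ExtractSGName` is hypothetical, and the sketch assumes the first occurrence of "SG" in a header marks the start of the part to keep; a header that contains "SG" inside another word would also match.

```vba
' Hypothetical helper: returns the part of a column header starting at "SG",
' or an empty string when the header contains no "SG" at all.
Function ExtractSGName(ByVal header As String) As String
    Dim sgAt As Long
    ' Case-sensitive search for the first "SG" in the header.
    sgAt = InStr(1, header, "SG", vbBinaryCompare)
    If sgAt > 0 Then ExtractSGName = Trim$(Mid$(header, sgAt))
End Function
```

With the headers shown above, `ExtractSGName("149_SG_Bryl_Cream")` gives `SG_Bryl_Cream`, and a header such as `Aggregate` gives an empty string.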

Requirements

Test1 & Test2 CSVs need to be joined on T1Id.
Only the lbl and tval columns from these files need to be kept.
Gpos and Apos will then be joined to the 2 columns (gval and pos) created by splitting the 1st blank-header column (containing values like X0.1) in the Test3.csv file.
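
The split described in the last requirement can be sketched in plain VBA, which may help when checking the result of the join. The helper name `SplitRowKey` and its signature are hypothetical, and the sketch assumes every key has the shape X<gval>.<pos>; anything else (for example a 'Base Sizes' row) is rejected.

```vba
' Hypothetical helper: splits a row key such as "X0.1" into its two numeric parts.
' Returns False (leaving gval and pos untouched) when the key has another shape.
Function SplitRowKey(ByVal key As String, ByRef gval As Long, ByRef pos As Long) As Boolean
    Dim dotAt As Long
    dotAt = InStr(key, ".")
    ' Need at least "X", one digit, a dot, and one digit after the dot.
    If dotAt < 3 Or dotAt = Len(key) Then Exit Function
    If UCase$(Left$(key, 1)) <> "X" Then Exit Function
    If Not IsNumeric(Mid$(key, 2, dotAt - 2)) Then Exit Function
    If Not IsNumeric(Mid$(key, dotAt + 1)) Then Exit Function
    gval = CLng(Mid$(key, 2, dotAt - 2))   ' digits between "X" and the dot
    pos = CLng(Mid$(key, dotAt + 1))       ' digits after the dot
    SplitRowKey = True
End Function
```

For the key `X0.1` this sets gval to 0 and pos to 1, which are the values that Gpos and Apos are matched against.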

Code:

Sub Doit(cFiles As Collection)

    On Error GoTo ErrorHandler   ' without this, the ErrorHandler label below is never reached

    Application.DisplayAlerts = False
    Application.ScreenUpdating = False

    Dim strSQL1$, TempSG$,sFullDirectory$

    sFullDirectory = "\\xxx.xxx.xxx\client name\"

    strFF1 = "Test1.csv"
    strFF2 = "Test2.csv"
    strFF3 = "Test3.csv"

    ' Test1.csv path: "\\xxx.xxx.xxx\client name\XYZ\sub-folder1 name\"
    strF1 = cFiles(strFF1)

    ' Test2.csv path: "\\xxx.xxx.xxx\client name\ABC\sub-folder2 name\"
    strF2 = cFiles(strFF2)

    ' Test3.csv path: "\\xxx.xxx.xxx\client name\DEF\GHI\sub-folder1 name\"
    strF3 = cFiles(strFF3)

    Set oCon = CreateObject("ADODB.Connection")
    Set oRs = CreateObject("ADODB.Recordset")
    strCon = "Driver=Microsoft Access Text Driver (*.txt, *.csv);Dbq=" & sFullDirectory & ";Extensions=asc,csv,tab,txt;HDR=Yes;"

    ' Select TOP 1 row with headers from `Test3.csv`
    strSQL = "SELECT TOP 1 * FROM " & strF3 & strFF3
    oCon.Open strCon
    Set oRs = oCon.Execute(strSQL)

    i = 1
    strSQL = "SELECT "

    For Each Fld In oRs.Fields
        Select Case True
            Case Is = Fld.Name = "NoName"  'Blank header columns
                If Fld.Value <> vbNullString Then
                    strSQL = strSQL & " CLng(Replace(Left(" & Fld.Name & ", InStr(" & Fld.Name & ", '.') - 1), 'X', ''))" & " AS [gval],"
                    strSQL = strSQL & " CLng(Right(" & Fld.Name & ", Len(" & Fld.Name & ") - InStr(" & Fld.Name & ", '.')))" & " AS [pos],"
                Else
                ' Do nothing here
                End If
            Case Is = Fld.Name = "Aggregate"
                strSQL = strSQL & " CDbl([" & Fld.Name & "]) AS [Aggregate " & i & "],"
                i = i + 1
            Case Is = InStr(1, Fld.Name, "SG") > 0
                TempSG = Trim(Mid(Fld.Name, InStr(1, Fld.Name, "SG"), Len(Fld.Name)))
                strSQL = strSQL & " CDbl([" & Fld.Name & "]) AS [" & TempSG & "], "
        End Select
    Next Fld

    If Right(Trim(strSQL), 1) = "," Then strSQL = Left(Trim(strSQL), Len(Trim(strSQL)) - 1)
    strSQL = strSQL & " FROM " & strF3 & strFF3
    strSQL = strSQL & " WHERE ((NoName) <> 'Base Sizes')"

    oRs.Close

    ' This `strSQL1` will be used to join `strSQL`.
    strSQL1 = "SELECT  G.[lbl], A.[tval], Q.*"
    strSQL1 = strSQL1 & " FROM "
    strSQL1 = strSQL1 & " (SELECT G.[Gpos], A.[Apos], G.[lbl], A.[tval] FROM " & strF1 & strFF1 & "  G," & strF2 & strFF2 & "  A WHERE G.[T1Id] = A.[T1Id])  T, (" & strSQL & ")  Q "
    strSQL1 = strSQL1 & " WHERE (CLng(T.[G].[Gpos]) = CLng(Q.[gval])) AND (CLng(T.[A].[Apos]) = CLng(Q.[pos]))"
    strSQL1 = strSQL1 & " ORDER BY CLng(Q.[gval]), CDbl(Q.[Aggregate 1]) DESC, G.[lbl];"

    'CREATE RECORDSET FROM SQL STRING
    Set oRs = oCon.Execute(strSQL1)

ExitSub:
    oRs.Close
    oCon.Close
    Set oRs = Nothing
    Set oCon = Nothing

    Application.ScreenUpdating = True
    Application.DisplayAlerts = True

Exit Sub
ErrorHandler:
    MsgBox "Error No: " & Err.Number & vbCrLf & "Description: " & Err.Description, vbCritical + vbOKOnly, "An Error occurred!"
    Err.Clear
    On Error GoTo 0
    Resume ExitSub

End Sub

After splitting, the Test3.csv table would look like this:

gval | pos | Aggregate | (blank) | Aggregate | 149_SG_Bryl_Cream | 891_SG_Myo__Sky_Blue_Dress
------------------------------------------------------------------------------------------------
0    | 1   | 0.422300  |         | 0.424658  | 0.458014          | 0.434639
0    | 2   | 0.318628  |         | 0.345475  | 0.334548          | 0.333675
0    | 3   | 0.274694  |         | 0.274643  | 0.243424          | 0.286865
0    | 4   | 0.294568  |         | 0.346758  | 0.276552          | 0.366648
1    | 1   | 0.565734  |         | 0.293436  | 0.283564          | 0.235366
1    | 2   | 0.286657  |         | 0.755456  | 0.283233          | 0.310544
2    | 1   | 0.234643  |         | 0.245459  | 0.245434          | 0.343423
2    | 2   | 0.343645  |         | 0.455659  | 0.343282          | 0.334343
2    | 3   | 0.234643  |         | 0.245459  | 0.245434          | 0.343423

Final Table: (short example)

lbl       tval                      gval    pos  Aggregate 1       Aggregate 2   SG_Bryl_Cream    SG_Myo__Sky_Blue_Dress
-------------------------------------------------------------------------------------------------------------------
Innovate  My life my rules.          0      1    0.422300          0.424658    0.458014          0.434639
Innovate  earth wind rain and fire.  0      2    0.318628          0.345475    0.334548          0.333675
Buys      my life my rules.          1      1    0.565734          0.293436    0.283564          0.235366
Buys      earth wind rain and fire.  1      2    0.286657          0.755456    0.283233          0.310544
Sales     my life my rules.          2      1    0.234643          0.245459    0.245434          0.343423
Sales     earth wind rain and fire.  2      2    0.343645          0.455659    0.343282          0.334343
Sales     Some other text.           2      3    0.234643          0.245459    0.245434          0.343423
...

Questions

Is there a way to pick up both duplicate Aggregate columns via SQL?
Is there a way to select only the SG columns from Q, i.e. Test3.csv, via SQL?

For example, the current statement is:

strSQL1 = "SELECT  G.[lbl], A.[tval], Q.* "

and I would like something like this instead:

strSQL1 = "SELECT  G.[lbl], A.[tval], Q.* LIKE 'SG' "

Answer 1:

Consider running pure SQL by querying the CSV file directly with needed column aliases:

SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage]
FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t;
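
When the folder and file names come from variables, the FROM source can be assembled with a small helper. This is only a sketch: the function name `CsvSource` is hypothetical, and it brackets the whole file name (`[my File.csv]`), which is an assumption about what the engine accepts rather than the exact form written in the query above.

```vba
' Hypothetical helper: builds an inline text-source reference for a Jet/ACE query.
' folder    - directory holding the CSV (may contain spaces)
' fileName  - CSV file name including its extension
' aliasName - table alias to use in the query
Function CsvSource(ByVal folder As String, ByVal fileName As String, _
                   ByVal aliasName As String) As String
    ' Drop a trailing backslash so the folder is written the same way every time.
    If Right$(folder, 1) = "\" Then folder = Left$(folder, Len(folder) - 1)
    CsvSource = "[text;database=" & folder & "].[" & fileName & "] AS " & aliasName
End Function
```

For example, `"SELECT t.* FROM " & CsvSource("C:\Folder\To\CSV With Spaces\", "my File.csv", "t")` produces a statement whose source is `[text;database=C:\Folder\To\CSV With Spaces].[my File.csv] AS t`.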

Additionally, the query can be integrated into action queries:

Make-Table Query

SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage]
INTO [myNewtable]
FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t;

Append Query

INSERT INTO myFinalTable (Food, Bev, Meds, Average, Midpoint, OtherAverage)
SELECT t.Food, t.Bev, t.Meds, t.Average, t.Midpoint, t.Average AS [OtherAverage]
FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t;

To run this with ADO in VBA, use the Jet/ACE SQL engine through either the Excel or the Access ODBC driver. Which workbook or database file the connection points to does not matter, since the query itself reaches out to the CSV:

Set conn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")

' EXCEL DRIVER (open the connection with either this driver or the Access driver below, not both)
conn.Open "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
            & "DBQ=" & ThisWorkbook.FullName & ";"
rst.Open "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
            & " FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t", conn 

' ACCESS DRIVER
conn.Open "Driver={Microsoft Access Driver (*.mdb, *.accdb)};" _
            & "DBQ=C:\Path\To\Any\Database.accdb"
rst.Open "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
            & " FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t", conn

Using the Access ODBC text driver

Set oCon = CreateObject("ADODB.Connection")
Set oRs = CreateObject("ADODB.Recordset")

sFullDirectory = "C:\Folder\To\CSV With Spaces"

strCon = "Driver=Microsoft Access Text Driver (*.txt, *.csv);" _
           & "Dbq=" & sFullDirectory & ";Extensions=asc,csv,tab,txt;HDR=Yes;"
strSQL = "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
            & " FROM [my File.csv] AS t"

oCon.Open strCon
Set oRs = oCon.Execute(strSQL)
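
Before writing aliases for blank or duplicate headers, it can help to check what names the driver actually assigned to each column, since that may differ between drivers and versions. The sketch below only uses the `Fields` collection of an already opened ADO Recordset; the procedure name `DumpFieldNames` is hypothetical.

```vba
' Hypothetical helper: prints the ordinal position and the driver-assigned name
' of every column in an open ADO Recordset to the Immediate window.
Sub DumpFieldNames(ByVal rs As Object)
    Dim i As Long
    For i = 0 To rs.Fields.Count - 1
        Debug.Print i, rs.Fields(i).Name
    Next i
End Sub
```

It could be called right after the recordset is opened, for example `DumpFieldNames oRs`, and the printed names can then be used in the SELECT list with explicit aliases.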

Source: https://stackoverflow.com/questions/58309879/importing-csv-having-duplicate-column-names-with-sql-query

Tags: sql, excel, vba, odbc, import-csv