I have 3 CSV files located in different network folders. The network folders/sub-folders may have spaces in their names. I want to join these 3 CSV files to create a single ADO Recordset containing only the required columns.
Test1.csv: (I have excluded unnecessary columns from all CSVs)
T1Id | Gpos | lbl
-----------------------
1001 | 0 | Innovate
1002 | 1 | Buys
1003 | 2 | Sales
1004 | 3 | Forecasts
1005 | 4 | Usage
1006 | 5 | Forum
Test2.csv: (I have excluded unnecessary columns from all csv's)
T2Id | T1Id | Apos | tval
-----------------------------------
382 | 1001 | 1 | my life my rules.
203 | 1001 | 2 | earth wind rain and fire.
658 | 1002 | 1 | wealth power blood desire.
200 | 1003 | 1 | one good to live for.
301 | 1003 | 2 | before we die.
439 | 1004 | 1 | one taste to glory
795 | 1004 | 2 | one mouthful of sky.
494 | 1004 | 3 | some other text.
Test3.csv: (I have excluded unnecessary columns from all csv's)
(blank) | Aggregate | (blank) | Aggregate | 149_SG_Bryl_Cream | 891_SG_Myo__Sky_Blue_Dress
------------------------------------------------------------------------------------------
X0.1 | 0.422300 | | 0.424658 | 0.458014 | 0.434639
X0.2 | 0.318628 | | 0.345475 | 0.334548 | 0.333675
X0.3 | 0.274694 | | 0.274643 | 0.243424 | 0.286865
X0.4 | 0.294568 | | 0.346758 | 0.276552 | 0.366648
X1.1 | 0.565734 | | 0.293436 | 0.283564 | 0.235366
X1.2 | 0.286657 | | 0.755456 | 0.283233 | 0.310544
X2.1 | 0.234643 | | 0.245459 | 0.245434 | 0.343423
X2.2 | 0.343645 | | 0.455659 | 0.343282 | 0.334343
X2.3 | 0.234643 | | 0.245459 | 0.245434 | 0.343423
As you can see, Test3.csv has 4 issues:
- It has columns with blank headers that contain data, e.g. X0.1.
- It has columns with blank headers and no data.
- It has 2 Aggregate columns with the same name.
- It has (many) SG columns, from whose names I need to extract only the part starting from 'SG'.
Requirements
- The Test1 & Test2 CSVs need to be joined on T1Id.
- Only the lbl and tval columns from these files need to be kept.
- The Apos and Gpos columns will be used to join against the 2 columns created from the 1st blank-header column (containing values like X0.1) in the Test3.csv file.
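The split described in the last requirement can be tried out in plain VBA before it is moved into SQL. This is only a minimal sketch, assuming every key has the form X<gval>.<pos>; SplitKey is a hypothetical helper name that does not appear in the code below, and it mirrors the Left/Right/InStr expressions used there.

```vba
' Hypothetical helper (not part of the original code):
' splits a key such as "X0.1" into Array(gval, pos), e.g. Array(0, 1).
Function SplitKey(ByVal key As String) As Variant
    Dim dotPos As Long
    dotPos = InStr(1, key, ".")
    ' Assumption: a key without a "." is treated as invalid input.
    If dotPos = 0 Then Err.Raise 5, , "No '.' found in key: " & key
    ' Part before the dot, with the leading X removed -> gval
    ' Part after the dot -> pos
    SplitKey = Array(CLng(Replace(Left$(key, dotPos - 1), "X", "")), _
                     CLng(Mid$(key, dotPos + 1)))
End Function
```

For example, SplitKey("X2.3") gives gval = 2 and pos = 3, which are the values matched against Gpos and Apos.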
Code:
Sub Doit(cFiles As Collection)
Application.DisplayAlerts = False
Application.ScreenUpdating = False
Dim strSQL1$, TempSG$,sFullDirectory$
sFullDirectory = "\\xxx.xxx.xxx\client name\"
strFF1 = "Test1.csv"
strFF2 = "Test2.csv"
strFF3 = "Test3.csv"
' Test1.csv Path: \\xxx.xxx.xxx\client name\XYZ\sub-folder1 name\
strF1 = cFiles(strFF1)
' Test2.csv Path: \\xxx.xxx.xxx\client name\ABC\sub-folder2 name\
strF2 = cFiles(strFF2)
' Test3.csv Path: \\xxx.xxx.xxx\client name\DEF\GHI\sub-folder1 name\
strF3 = cFiles(strFF3)
Set oCon = CreateObject("ADODB.Connection")
Set oRs = CreateObject("ADODB.Recordset")
strCon = "Driver=Microsoft Access Text Driver (*.txt, *.csv);Dbq=" & sFullDirectory & ";Extensions=asc,csv,tab,txt;HDR=Yes;"
' Select TOP 1 row with headers from `Test3.csv`
strSQL = "SELECT TOP 1 * FROM " & strF3 & strFF3
oCon.Open strCon
Set oRs = oCon.Execute(strSQL)
i = 1
strSQL = "SELECT "
For Each Fld In oRs.Fields
Select Case True
Case Is = Fld.Name = "NoName" 'Blank header columns
If Fld.Value <> vbNullString Then
strSQL = strSQL & " CLng(Replace(Left(" & Fld.Name & ", InStr(" & Fld.Name & ", '.') - 1), 'X', ''))" & " AS [gval],"
strSQL = strSQL & " CLng(Right(" & Fld.Name & ", Len(" & Fld.Name & ") - InStr(" & Fld.Name & ", '.')))" & " AS [pos],"
Else
' Do nothing here
End If
Case Is = Fld.Name = "Aggregate"
strSQL = strSQL & " CDbl([" & Fld.Name & "]) AS [Aggregate " & i & "],"
i = i + 1
Case Is = InStr(1, Fld.Name, "SG") > 0
TempSG = Trim(Mid(Fld.Name, InStr(1, Fld.Name, "SG"), Len(Fld.Name)))
strSQL = strSQL & " CDbl([" & Fld.Name & "]) AS [" & TempSG & "], "
End Select
Next Fld
If Right(Trim(strSQL), 1) = "," Then strSQL = Left(Trim(strSQL), Len(Trim(strSQL)) - 1)
strSQL = strSQL & " FROM " & strF3 & strFF3
strSQL = strSQL & " WHERE ((NoName) <> 'Base Sizes')"
oRs.Close
' This `strSQL1` will be used to join `strSQL`.
strSQL1 = "SELECT G.[lbl], A.[tval], Q.*"
strSQL1 = strSQL1 & " FROM "
strSQL1 = strSQL1 & " (SELECT G.[Gpos], A.[Apos], G.[lbl], A.[tval] FROM " & strF1 & strFF1 & " G," & strF2 & strFF2 & " A WHERE G.[T1Id] = A.[T1Id]) T, (" & strSQL & ") Q "
strSQL1 = strSQL1 & " WHERE (CLng(T.[G].[Gpos]) = CLng(Q.[gval])) AND (CLng(T.[A].[Apos]) = CLng(Q.[pos]))"
strSQL1 = strSQL1 & " ORDER BY CLng(Q.[gval]), CDbl(Q.[Aggregate 1]) DESC, G.[lbl];"
'CREATE RECORDSET FROM SQL STRING
Set oRs = oCon.Execute(strSQL1)
ExitSub:
oRs.Close
oCon.Close
Set oRs = Nothing
Set oCon = Nothing
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Exit Sub
ErrorHandler:
MsgBox "Error No: " & Err.Number & vbCrLf & "Description: " & Err.Description, vbCritical + vbOKOnly, "An Error occurred!"
Err.Clear
On Error GoTo 0
Resume ExitSub
End Sub
After splitting, the Test3.csv table would look like this:
gval pos (blank) Aggregate (blank) Aggregate 149_SG_Bryl_Cream 891_SG_Myo__Sky_Blue_Dress
------------------------------------------------------------------------------
0 1 0.422300 0.424658 0.458014 0.434639
0 2 0.318628 0.345475 0.334548 0.333675
0 3 0.274694 0.274643 0.243424 0.286865
0 4 0.294568 0.346758 0.276552 0.366648
1 1 0.565734 0.293436 0.283564 0.235366
1 2 0.286657 0.755456 0.283233 0.310544
2 1 0.234643 0.245459 0.245434 0.343423
2 2 0.343645 0.455659 0.343282 0.334343
2 3 0.234643 0.245459 0.245434 0.343423
Final Table: (short example)
lbl tval gval pos Aggregate 1 Aggregate 2 SG_Bryl_Cream SG_Myo__Sky_Blue_Dress
-------------------------------------------------------------------------------------------------------------------
Innovate My life my rules. 0 1 0.422300 0.424658 0.458014 0.434639
Innovate earth wind rain and fire. 0 2 0.318628 0.345475 0.334548 0.333675
Buys my life my rules. 1 1 0.565734 0.293436 0.283564 0.235366
Buys earth wind rain and fire. 1 2 0.286657 0.755456 0.283233 0.310544
Sales my life my rules. 2 1 0.234643 0.245459 0.245434 0.343423
Sales earth wind rain and fire. 2 2 0.343645 0.455659 0.343282 0.334343
Sales Some other text. 2 3 0.234643 0.245459 0.245434 0.343423
...
Questions
- Is there a way to pick up both duplicate Aggregate columns via SQL?
- Is there a way to select only the SG columns from Q, i.e. Test3.csv, via SQL?
e.g., instead of:
strSQL1 = "SELECT G.[lbl], A.[tval], Q.* "
something like:
strSQL1 = "SELECT G.[lbl], A.[tval], Q.* LIKE 'SG' "
Consider running pure SQL by querying the CSV file directly with needed column aliases:
SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage]
FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t;
Additionally, the query can be integrated into action queries:
Make-Table Query
SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage]
INTO [myNewtable]
FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t;
Append Query
INSERT INTO myFinalTable (Food, Bev, Meds, Average, Midpoint, OtherAverage)
SELECT t.Food, t.Bev, t.Meds, t.Average, t.Midpoint, t.Average AS [OtherAverage]
FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t;
To run with ADO in VBA, use the Jet/ACE SQL engine through either the Excel or the Access ODBC driver. Which workbook or database file serves as the connection source does not matter, since the query itself points to the CSV:
Set conn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")
' EXCEL DRIVER
conn.Open "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
& "DBQ=" & ThisWorkbook.FullName & ";"
rst.Open "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
& " FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t", conn
' The two drivers are alternatives: close the open objects before reusing them,
' because calling Open on an already open ADO object raises an error.
rst.Close
conn.Close
' ACCESS DRIVER
conn.Open "Driver={Microsoft Access Driver (*.mdb, *.accdb)};" _
& "DBQ=C:\Path\To\Any\Database.accdb"
rst.Open "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
& " FROM [text;database=C:\Folder\To\CSV With Spaces].[my File].csv AS t", conn
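Because each source in the FROM clause carries its own folder, this form may suit CSVs that sit in different folders. The following is a minimal sketch of building such a source string; CsvSource is a hypothetical helper name, and bracketing the whole file name (as in the [my File.csv] form of the text-driver example) is an assumption rather than something tested against your files.

```vba
' Hypothetical helper: builds a Jet/ACE FROM-clause source for one CSV file.
' folder     - directory holding the CSV (may contain spaces)
' fileName   - CSV file name, e.g. "my File.csv"
' tableAlias - alias to use for the source in the query
Function CsvSource(ByVal folder As String, ByVal fileName As String, _
                   ByVal tableAlias As String) As String
    ' Drop a trailing backslash so the database= part is a plain folder path.
    If Right$(folder, 1) = "\" Then folder = Left$(folder, Len(folder) - 1)
    ' Assumption: the whole file name is wrapped in one pair of brackets.
    CsvSource = "[text;database=" & folder & "].[" & fileName & "] AS " & tableAlias
End Function
```

A join over two files in separate folders could then be assembled as "SELECT ... FROM " & CsvSource(folder1, "Test1.csv", "G") & " INNER JOIN " & CsvSource(folder2, "Test2.csv", "A") & " ON G.T1Id = A.T1Id", where folder1 and folder2 stand for your own paths.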
Using the Access ODBC text driver
Set oCon = CreateObject("ADODB.Connection")
Set oRs = CreateObject("ADODB.Recordset")
sFullDirectory = "C:\Folder\To\CSV With Spaces"
strCon = "Driver=Microsoft Access Text Driver (*.txt, *.csv);" _
& "Dbq=" & sFullDirectory & ";Extensions=asc,csv,tab,txt;HDR=Yes;"
strSQL = "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
& " FROM [my File.csv] AS t"
oCon.Open strCon
Set oRs = oCon.Execute(strSQL)
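On selecting only the SG columns: as far as I know, Jet/ACE SQL cannot filter Q.* by a column-name pattern, so the column list has to be assembled in VBA from the field names, much as the loop in the question already does. Below is a minimal sketch, assuming the header names have already been read (for example from oRs.Fields) into an array; BuildSgSelectList is a hypothetical helper name and the Q alias is taken from the question.

```vba
' Hypothetical helper: returns a SELECT list holding only the columns whose
' name contains "SG", each aliased to the part of the name starting at "SG".
Function BuildSgSelectList(ByVal fieldNames As Variant) As String
    Dim i As Long, p As Long
    Dim parts As String
    For i = LBound(fieldNames) To UBound(fieldNames)
        ' Case-sensitive search, so a lowercase "sg" is not matched.
        p = InStr(1, fieldNames(i), "SG", vbBinaryCompare)
        If p > 0 Then
            If Len(parts) > 0 Then parts = parts & ", "
            parts = parts & "Q.[" & fieldNames(i) & "] AS [" & _
                    Mid$(fieldNames(i), p) & "]"
        End If
    Next i
    BuildSgSelectList = parts
End Function
```

The result can be concatenated into the outer query in place of Q.*, for instance "SELECT G.[lbl], A.[tval], " & BuildSgSelectList(names), where names is your array of headers.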
Source: https://stackoverflow.com/questions/58309879/importing-csv-having-duplicate-column-names-with-sql-query