Question
This is my first post. I'm new to VBA but fairly familiar with VB6. I wrote some code that takes the text from Nasdaq and pastes it into a worksheet, and it finally works. However, there is a lot of extraneous data scattered above and below the Annual Income Statement. I would like to parse out the important data and place it somewhere I can automate analysis. My idea is to search the cells until I find "Annual Income Statement" and then extract the data to a different sheet. Any suggestions would be very much appreciated. Here's what I've got:
Sub TransferWebData()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .Navigate "http://www.nasdaq.com/symbol/gd/financials"
        Do Until .ReadyState = 4: DoEvents: Loop
        .ExecWB 17, 0 'SelectAll
        .ExecWB 12, 2 'Copy selection
        Sheets("GD").Range("A1").Select
        Sheets("GD").PasteSpecial Format:="HTML", Link:=False, DisplayAsIcon:=False, NoHTMLFormatting:=True
        .Quit
    End With
End Sub
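For the extraction step described above, a minimal sketch of the search-and-copy idea might look like the following; the destination sheet name "Parsed" and the 30-row block size are assumptions, not part of the original code:

Sub ExtractAnnualIncomeStatement()
    ' Minimal sketch: locate the "Annual Income Statement" header on the pasted
    ' sheet and copy a block of rows below it to another sheet for analysis.
    ' The destination sheet "Parsed" and the 30-row block size are assumptions.
    Dim src As Worksheet, dst As Worksheet
    Dim hit As Range

    Set src = ThisWorkbook.Sheets("GD")
    Set dst = ThisWorkbook.Sheets("Parsed")

    Set hit = src.Cells.Find(What:="Annual Income Statement", LookIn:=xlValues, LookAt:=xlPart)
    If hit Is Nothing Then
        MsgBox "Annual Income Statement not found."
        Exit Sub
    End If

    ' Copy the header row plus the next 30 rows to the destination sheet.
    src.Rows(hit.Row & ":" & hit.Row + 30).Copy Destination:=dst.Range("A1")
End Sub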
Answer 1:
This should keep you busy.
Set references to Microsoft HTML Object Library and Microsoft Internet Controls.
In Google Chrome I navigated to the webpage and used Inspect Element to open the WebKit developer tools and copy the XPath to the element. This gave me an outline for composing a rough draft of my function. After an hour and a half of tedious debugging I was able to extract the data into an array.
//*[@id="financials-iframe-wrap"]/div[1]/table/tbody/tr[1]/td[2]
Sub TransferWebData()
    Dim Data
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .Navigate "http://www.nasdaq.com/symbol/gd/financials"
        Do Until .ReadyState = 4: DoEvents: Loop
        Data = getFinancialsArray(IE.document)
        With Worksheets("GD")
            .Cells.ClearContents
            .Range("A1").Resize(UBound(Data, 1) + 1, UBound(Data, 2)).Value = Data
            .Columns.AutoFit
        End With
        IE.Quit
    End With
End Sub
' //*[@id="financials-iframe-wrap"]/div[1]/table/tbody/tr[1]/td[2]
Function getFinancialsArray(doc As HTMLDocument)
    Dim Data
    Dim x As Long, y As Long, y1 As Long
    Dim divfinancials As HTMLDivElement, div1 As HTMLDivElement
    Dim tbl As HTMLTable, allRows

    ' Walk the DOM: wrapper div -> first inner div -> first table -> all of its rows
    Set divfinancials = doc.getElementById("financials-iframe-wrap")
    Set div1 = divfinancials.getElementsByTagName("div").Item(0)
    Set tbl = div1.getElementsByTagName("table").Item(0)
    Set allRows = tbl.getElementsByTagName("tr")

    ReDim Data(allRows.Length, 10)

    ' Copy each non-empty row's cells into the next free row of the array
    For y = 0 To allRows.Length - 1
        If Len(Trim(allRows.Item(y).innerText)) Then 'If the row has data
            For x = 0 To allRows.Item(y).Cells.Length - 1
                Data(y1, x) = allRows.Item(y).Cells(x).innerText
            Next
            y1 = y1 + 1
        End If
    Next

    getFinancialsArray = Data
End Function
Output
Answer 2:
Take a look at the example below, which retrieves the data using XHR and RegEx, without IE automation:
Option Explicit

Sub GetDataFromNasdaq()
    Dim sContent As String
    Dim l As Long
    Dim i As Long
    Dim j As Long
    Dim cMatches As Object
    Dim r() As String

    ' retrieve html content
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.nasdaq.com/symbol/gd/financials", False
        .Send
        sContent = .ResponseText
    End With

    ' parse with regex
    With CreateObject("VBScript.RegExp")
        .MultiLine = True
        .IgnoreCase = True
        ' simplification
        .Global = True
        .Pattern = "<(\w*) .*?>"
        sContent = .Replace(sContent, "<$1>")
        .Pattern = ">\s*<"
        sContent = .Replace(sContent, "><")
        .Pattern = "<thead>|<tbody>|</thead>|</tbody>"
        sContent = .Replace(sContent, "")
        .Pattern = "<(/?)th>"
        sContent = .Replace(sContent, "<$1td>")
        ' remove nested tables from target table
        .Global = False
        .Pattern = "(Annual Income Statement[\s\S]*?<table.*?>(?:(?!</table)[\s\S])*)<table.*?>(?:(?!<table|</table)[\s\S])*</table>"
        Do
            l = Len(sContent)
            sContent = .Replace(sContent, "$1")
        Loop Until l = Len(sContent)
        ' trim target table
        .Pattern = "Annual Income Statement[\s\S]*?(<table.*?>(?:(?!</table)[\s\S])*</table>)"
        sContent = .Execute(sContent).Item(0).SubMatches(0)
        ' match rows
        .Global = True
        .Pattern = "<tr><td>(.*?)</td>(?:<td>.*?</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td>)?</tr>"
        Set cMatches = .Execute(sContent)
        ' populate resulting array
        ReDim r(1 To cMatches.Count, 1 To 5)
        For i = 1 To cMatches.Count
            For j = 1 To 5
                r(i, j) = cMatches(i - 1).SubMatches(j - 1)
            Next
        Next
    End With

    ' output resulting array
    With ThisWorkbook.Sheets(1)
        .Cells.Delete ' clear the target sheet before writing the result
        Output .Cells(1, 1), r
    End With
End Sub
Sub Output(oDstRng As Range, aCells As Variant)
    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1 _
            )
            '.NumberFormat = "@"
            .Value = aCells
            .Columns.AutoFit
        End With
    End With
End Sub
It takes about 2 seconds to complete the processing; the output is as follows:
Answer 3:
You could also import Income Statement line items for multiple stock tickers.
Sub ImportYrlyFS()
    Dim QT As QueryTable
    Dim ThisSheet As String
    Dim Symbol As String
    Dim myurl As String

    ThisSheet = ActiveSheet.Name
    Range("A2").Select
    ' Loop down column A until the first blank cell
    Do Until ActiveCell.Value = ""
        Symbol = UCase(ActiveCell.Value)
        Sheets(ThisSheet).Select
        Sheets.Add
        ' Pull the annual income statement table from Yahoo Finance into the new sheet
        myurl = "http://finance.yahoo.com/q/is?s=" & Symbol & "+Income+Statement&annual"
        Set QT = ActiveSheet.QueryTables.Add( _
            Connection:="URL;" & myurl, _
            Destination:=Range("A1"))
        With QT
            .WebSelectionType = xlSpecifiedTables
            .WebTables = "9"
            .Refresh BackgroundQuery:=False
        End With
        QT.Delete
        Sheets(ActiveSheet.Name).Name = Symbol
        Sheets(ThisSheet).Select
        ActiveCell.Offset(1, 0).Select
    Loop
End Sub
Your sheet should list the ticker symbols in column A, starting in cell A2; the loop stops at the first blank cell and creates a new sheet named after each symbol.
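A rough sketch of the assumed input layout (the "Symbol" header and the tickers other than GD are hypothetical placeholders):

    A1: Symbol   (optional header; the loop starts at A2)
    A2: GD
    A3: BA
    A4: LMT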
Source: https://stackoverflow.com/questions/40458725/transferring-webpage-data-to-an-excel-worksheet-using-vba