OleDb connection to Excel; how do I select fixed width, unbounded height?

爱一瞬间的悲伤 2021-01-13 09:14

I'm using OleDb to select data from Excel spreadsheets. Each spreadsheet can contain many small tables, and possibly furniture like titles and labels. So it might look like

5 answers
  • 2021-01-13 10:01

    You say that in a previous step, the users have selected the headers. Who's to say that below the region of current interest there aren't a few blank rows followed by another unrelated table? I suggest that you get them to select the whole range that they are interested in -- that should fix both problems.

  • 2021-01-13 10:05

    We read the entire spreadsheet (i.e. SELECT * FROM [Sheet1$]) and handle everything else in our application code. It's easy enough to race through the resulting OleDbDataReader to get to the starting point of your data and start processing.

    It may not be the absolutely fastest way to suck data from Excel, but it is reliable.
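
    A minimal sketch of that approach, assuming the rows of the whole sheet have already been fetched into a list of tuples (with HDR=NO, so header cells arrive as ordinary data). The names is_blank and extract_table, and the rule that a fully blank row ends the table, are illustrative assumptions rather than anything from the answer above.

```python
def is_blank(row):
    """True when every cell is None or an empty/whitespace-only string."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def extract_table(rows, header, first_col=0):
    """Return the data rows found under `header`, stopping at the first blank row.

    rows      -- iterable of row tuples as read from the worksheet
    header    -- sequence of header labels that marks the start of the table
    first_col -- zero-based index of the column where the header starts
    """
    header = tuple(header)
    width = len(header)
    found = False
    table = []
    for row in rows:
        # Only look at the columns the table occupies.
        window = tuple(row[first_col:first_col + width])
        if not found:
            # Skip titles, labels and other tables until the header matches.
            found = window == header
            continue
        if is_blank(window):
            break  # a blank row marks the end of this table
        table.append(window)
    return table
```

    Called with the sheet's rows, the header ("Name", "Qty", "Price") and first_col=1, it returns only the rows between that header and the next blank row, ignoring any titles above and any unrelated table below.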

  • 2021-01-13 10:06

    I would go with the solution from John (reading 1000 rows at a time).

    If you have Excel installed you could also use OLE automation.

    I have recorded a simple macro in Excel which selects the last cell in the current table.

    
    Sub Macro2()
        Range("B14").Select
        Selection.End(xlDown).Select
        'MsgBox ActiveCell.Address, vbOKOnly
    End Sub
    
    

    Now you just need to translate this into C# and read the address of the active cell.
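
    The same lookup can be sketched in Python instead of C#, assuming pywin32 and a local Excel installation. The function names and the default start cell are hypothetical. XL_DOWN is the numeric value of Excel's xlDown constant, and Range.End is called directly rather than via Select, which should land on the same cell as the macro.

```python
XL_DOWN = -4121  # numeric value of Excel's xlDown (XlDirection enumeration)


def last_cell_address(sheet, start="B14"):
    """Return the address of the last filled cell below `start`.

    `sheet` is an Excel Worksheet COM object, or any object that exposes the
    same Range(...).End(...).Address chain.
    """
    return sheet.Range(start).End(XL_DOWN).Address


def find_last_cell(path, sheet_name, start="B14"):
    """Open a workbook through Excel automation and locate the last cell.

    Assumes `path` is an absolute path to the workbook.
    """
    import win32com.client  # imported here: needs pywin32 and Excel installed

    excel = win32com.client.Dispatch("Excel.Application")
    workbook = excel.Workbooks.Open(path)
    try:
        return last_cell_address(workbook.Worksheets(sheet_name), start)
    finally:
        workbook.Close(False)  # close without saving changes
        excel.Quit()
```

    The returned address (for example "$B$20") gives the last row, which can then replace the hard-coded 65535 in the OleDb range.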

  • 2021-01-13 10:07

    A couple of possible solutions:

    1. Put your tables on separate worksheets, then simply query the whole worksheet.
    2. Give each table in Excel a name (in Excel 2007, select the table, right-click, and choose Name a range...), then in your query, use this name instead of "Sheet1$B14:D65535".

    Hope that helps.

    EDIT

    Here's a third idea:

    I'm not sure what you're using to query your database, but if your query engine supports variables (like SQL Server, for example) you could store the result of...

    SELECT COUNT(*) FROM NameOfServer...Sheet1$

    ...in a variable called @UsedRowCount, that will give you the number of rows actually used in the worksheet. So, @UsedRowCount = LastRowUsed - InitialBlankRows.

    You might then be able to use string concatenation to replace "65535" with @UsedRowCount + @InitialBlankRows. You would have to set @InitialBlankRows to a constant (in your example, it would be 3, since the heading row of the first table is located at Row 4).
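
    The arithmetic can be sketched as a small helper, assuming the row count has already been obtained by the COUNT(*) query described above. Whether that count really equals the number of used rows depends on the provider, so treat this as an illustration only. The function name bounded_range and its parameters are hypothetical.

```python
def bounded_range(sheet, first_cell, last_col, used_row_count, initial_blank_rows):
    """Build a range such as 'Sheet1$B14:D43' in place of 'Sheet1$B14:D65535'.

    The last row is estimated as used_row_count + initial_blank_rows,
    following the relationship LastRowUsed = UsedRowCount + InitialBlankRows.
    """
    last_row = used_row_count + initial_blank_rows
    return "%s$%s:%s%d" % (sheet, first_cell, last_col, last_row)


def bounded_query(sheet, first_cell, last_col, used_row_count, initial_blank_rows):
    """Wrap the bounded range in a SELECT statement."""
    return "SELECT * FROM [%s]" % bounded_range(
        sheet, first_cell, last_col, used_row_count, initial_blank_rows
    )
```

    For example, with 40 used rows and 3 initial blank rows, the range for a table starting at B14 and ending in column D becomes Sheet1$B14:D43.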

  • 2021-01-13 10:14

    Prerequisite: you can easily determine in your code what the maximum number of rows is.

    Assume that (1) each SELECT carries a big overhead, so selecting one row at a time is slow, and (2) selecting 64K or 8M rows (even if blank) is also slow. A chunk size somewhere in the middle may then be faster. Try this:

    Select CHUNKSIZE (e.g. 100 or 1000) rows at a time (less when you would otherwise over-run MAX_ROWS). Scan each chunk for the blank row that marks end-of-data.
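
    A rough sketch of the chunked scan, assuming a caller-supplied fetch function that runs the SELECT for a given range string and returns the rows as a list of tuples. The names chunk_ranges, read_until_blank and fetch are hypothetical, and treating an empty chunk as end-of-data is an assumption about how the provider behaves.

```python
def chunk_ranges(first_row, max_rows, chunk_size, first_col="B", last_col="D"):
    """Yield (start_row, range_string) pairs covering first_row..max_rows."""
    start = first_row
    while start <= max_rows:
        # The final chunk is shortened so it never runs past max_rows.
        end = min(start + chunk_size - 1, max_rows)
        yield start, "%s%d:%s%d" % (first_col, start, last_col, end)
        start = end + 1


def read_until_blank(fetch, first_row, max_rows, chunk_size=1000,
                     first_col="B", last_col="D"):
    """Collect rows chunk by chunk until a fully blank row is seen.

    fetch -- hypothetical callable taking a range string such as "B14:D1013"
             and returning the rows of that range as a list of tuples
    """
    rows = []
    for _, cell_range in chunk_ranges(first_row, max_rows, chunk_size,
                                      first_col, last_col):
        chunk = fetch(cell_range)
        if not chunk:
            return rows  # assumption: an empty chunk means no more data
        for row in chunk:
            if all(cell is None for cell in row):
                return rows  # blank sentinel row: end of the table
            rows.append(row)
    return rows
```

    With a table of 26 rows starting at row 14 and a chunk size of 10, this issues three queries (B14:D23, B24:D33, B34:D43) and stops at the first blank row instead of reading to the bottom of the sheet.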

    UPDATE: Actually answering the explicit questions:

    Q: Does anyone know of a way to write a query that says either;

    Q1: 'select everything down and right of B14'?

    A1: select * from [Sheet1$B12:] doesn't work. You would have to do ...B12:IV in Excel 2003 and whatever it is in Excel 2007. However you don't need that because you know what your rightmost column is; see below.

    Q2: 'select everything in columns B->D'

    A2: select * from [Sheet1$B:D]

    Q3: 'select B12:D*' where * means 'everything you can'

    A3: select * from [Sheet1$B12:D]

    Tested with Python 2.5 using the following code:

    import win32com.client
    import sys
    # Arguments: workbook path, sheet name, cell range (e.g. B12:D)
    filename, sheetname, cell_range = sys.argv[1:4]
    # Jet connection string for an Excel workbook
    DSN= """
        PROVIDER=Microsoft.Jet.OLEDB.4.0;
        DATA SOURCE=%s;
        Extended Properties='Excel 8.0;READONLY=true;IMEX=1';
        """ % filename
    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open(DSN)
    rs = win32com.client.Dispatch("ADODB.Recordset")
    # HDR=NO: the first row of the range is returned as data, not as column names
    sql = (
        "SELECT * FROM [Excel 8.0;HDR=NO;IMEX=1;Database=%s;].[%s$%s]"
        % (filename, sheetname, cell_range)
        )
    rs.Open(sql, conn)
    nrows = 0
    while not rs.EOF:
        nrows += 1
        nf = rs.Fields.Count
        values = [rs.Fields.Item(i).Value for i in xrange(nf)]
        print nrows, values
        # A row in which every cell is empty marks the end of the data
        if not any(value is not None for value in values):
            print "sentinel found"
            break
        rs.MoveNext()
    rs.Close()
    conn.Close()
    