OleDb connection to Excel; how do I select fixed width, unbounded height?

前端 未结 5 713
爱一瞬间的悲伤
爱一瞬间的悲伤 2021-01-13 09:14

I\'m using OleDb to select data from excel spreadsheets. Each spreadsheet can contain many small tables, and possibly furniture like titles and labels. So it might look like

5条回答
  •  猫巷女王i
    2021-01-13 10:14

    Pre-requisite: you can easily determine in your code what the maximum number number of rows is.

    Assuming (1) there's a big overhead per SELECT, so SELECTing a row at a time is slow (2) SELECTing 64K or 8M rows (even if blank) is slow ... so you want to see if somewhere in the middle can be faster. Try this:

    Select CHUNKSIZE (e.g. 100 or 1000) rows at a time (less when you would otherwise over-run MAX_ROWS). Scan each chunk for the blank row that marks end-of-data.

    UPDATE: Actually answering the explicit questions:

    Q: Does anyone know of a way to write a query that says either;

    Q1: 'select everything down and right of B14'?

    A1: select * from [Sheet1$B12:] doesn't work. You would have to do ...B12:IV in Excel 2003 and whatever it is in Excel 2007. However you don't need that because you know what your rightmost column is; see below.

    Q2: 'select everything in columns B->D'

    A2: select * from [Sheet1$B:D]

    Q3: 'select B12:D*' where * means 'everything you can'

    A3: select * from [Sheet1$B12:D]

    Tested with Python 2.5 using the following code:

    import win32com.client
    import sys
    filename, sheetname, range = sys.argv[1:4]
    DSN= """
        PROVIDER=Microsoft.Jet.OLEDB.4.0;
        DATA SOURCE=%s;
        Extended Properties='Excel 8.0;READONLY=true;IMEX=1';
        """ % filename
    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open(DSN)
    rs = win32com.client.Dispatch("ADODB.Recordset")
    sql = (
        "SELECT * FROM [Excel 8.0;HDR=NO;IMEX=1;Database=%s;].[%s$%s]"
        % (filename, sheetname, range)
        )
    rs.Open(sql, conn)
    nrows = 0
    while not rs.EOF:
        nrows += 1
        nf = rs.Fields.Count
        values = [rs.Fields.Item(i).Value for i in xrange(nf)]
        print nrows, values
        if not any(value is not None for value in values):
            print "sentinel found"
            break
        rs.MoveNext()
    rs.Close()
    conn.Close()
    

提交回复
热议问题