I\'m using OleDb to select data from excel spreadsheets. Each spreadsheet can contain many small tables, and possibly furniture like titles and labels. So it might look like
Pre-requisite: you can easily determine in your code what the maximum number number of rows is.
Assuming (1) there's a big overhead per SELECT, so SELECTing a row at a time is slow (2) SELECTing 64K or 8M rows (even if blank) is slow ... so you want to see if somewhere in the middle can be faster. Try this:
Select CHUNKSIZE (e.g. 100 or 1000) rows at a time (less when you would otherwise over-run MAX_ROWS). Scan each chunk for the blank row that marks end-of-data.
UPDATE: Actually answering the explicit questions:
Q: Does anyone know of a way to write a query that says either;
Q1: 'select everything down and right of B14'?
A1: select * from [Sheet1$B12:] doesn't work. You would have to do ...B12:IV in Excel 2003 and whatever it is in Excel 2007. However you don't need that because you know what your rightmost column is; see below.
Q2: 'select everything in columns B->D'
A2: select
* from [Sheet1$B:D]
Q3: 'select B12:D*' where * means 'everything you can'
A3: select * from [Sheet1$B12:D]
Tested with Python 2.5 using the following code:
import win32com.client
import sys
filename, sheetname, range = sys.argv[1:4]
DSN= """
PROVIDER=Microsoft.Jet.OLEDB.4.0;
DATA SOURCE=%s;
Extended Properties='Excel 8.0;READONLY=true;IMEX=1';
""" % filename
conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(DSN)
rs = win32com.client.Dispatch("ADODB.Recordset")
sql = (
"SELECT * FROM [Excel 8.0;HDR=NO;IMEX=1;Database=%s;].[%s$%s]"
% (filename, sheetname, range)
)
rs.Open(sql, conn)
nrows = 0
while not rs.EOF:
nrows += 1
nf = rs.Fields.Count
values = [rs.Fields.Item(i).Value for i in xrange(nf)]
print nrows, values
if not any(value is not None for value in values):
print "sentinel found"
break
rs.MoveNext()
rs.Close()
conn.Close()