问题
I have the following select statement (using sqlite3 and the pysqlite module):
self.cursor.execute("SELECT precursor_id FROM MSMS_precursor "+
"JOIN spectrum ON spectrum_id = spectrum_spectrum_id "+
"WHERE spectrum_id = spectrum_spectrum_id "+
"AND ROUND(ion_mz,9) = ? AND ROUND(scan_start_time,4) = ? "+
"AND msrun_msrun_id = ?", select_inputValues)
Which takes 55 seconds when running in Python. When running it directly on the SQLite command line it only takes 15ms. Now, I noticed that when it's in this step the Python program goes into uninterrupted sleep (31283 ndeklein 18 0 126m 24m 3192 D 1.0 0.0 2:02.50 python
, The D in top output) and it goes down from 100% CPU to around 1% CPU. Now that I noticed it during this query, I also looked at the top output when running the query I asked about here. During this time top also shows it going into uninterrupted sleep, although it switches back and forth between R and D and only slows down to around 50% (it fluctuates depending on if it's in D or R status).
So now I think that this is what is slowing my querying down (please correct me if uninterrupted sleep has nothing to do with programs speed). If this is true, how can I make sure a program does not go into this status?
Update 1:
The EXPLAIN QUERY PLAN using Python returned:
(0, 0, 1, u'SCAN TABLE spectrum (~50000 rows)')
The EXPLAIN QUERY PLAN using sqlite's command line returned:
0|0|1|SCAN TABLE spectrum (~50000 rows)
0|1|0|SEARCH TABLE MSMS_precursor USING INDEX fk_MSMS_precursor_spectrum_spectrum_id_1 (spectrum_spectrum_id=?) (~2 rows)
The EXPLAIN using Python returned:
(0, u'Trace', 0, 0, 0, u'', u'00', None)
The EXPLAIN using sqlite returned:
0|Trace|0|0|0||00|
1|Real|0|1|0|438.718658447|00|
2|Real|0|2|0|692.6345000000001|00|
3|Integer|1|3|0||00|
4|Goto|0|39|0||00|
5|OpenRead|1|33|0|13|00|
6|OpenRead|0|39|0|5|00|
7|OpenRead|2|41|0|keyinfo(1,BINARY)|00|
8|Rewind|1|35|0||00|
9|Column|1|8|5||00|
10|RealAffinity|5|0|0||00|
11|Integer|4|6|0||00|
12|Function|2|5|4|round(2)|02|
13|Ne|2|34|4||6a|
14|Column|1|12|4||00|
15|Ne|3|34|4|collseq(BINARY)|6c|
16|Column|1|0|8||00|
17|IsNull|8|34|0||00|
18|Affinity|8|1|0|d|00|
19|SeekGe|2|34|8|1|00|
20|IdxGE|2|34|8|1|01|
21|IdxRowid|2|7|0||00|
22|Seek|0|7|0||00|
23|Column|1|0|9||00|
24|Column|2|0|10||00|
25|Ne|10|33|9|collseq(BINARY)|6b|
26|Column|0|1|5||00|
27|RealAffinity|5|0|0||00|
28|Integer|9|6|0||00|
29|Function|2|5|11|round(2)|02|
30|Ne|1|33|11||6a|
31|Column|0|0|13||00|
32|ResultRow|13|1|0||00|
33|Next|2|20|0||00|
34|Next|1|9|0||01|
35|Close|1|0|0||00|
36|Close|0|0|0||00|
37|Close|2|0|0||00|
38|Halt|0|0|0||00|
39|Transaction|0|0|0||00|
40|VerifyCookie|0|31|0||00|
41|TableLock|0|33|0|spectrum|00|
42|TableLock|0|39|0|MSMS_precursor|00|
43|Goto|0|5|0||00|
And iostat returned:
io-bash-3.2$ iostat
Linux 2.6.18-194.26.1.el5 (ningal.cluster.lifesci.ac.uk) 06/04/2012
avg-cpu: %user %nice %system %iowait %steal %idle
14.35 0.00 0.30 0.01 0.00 85.34
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.16 4.55 17.22 1520566 5752802
sda1 0.00 0.02 0.00 5074 34
sda2 1.16 4.53 17.22 1515184 5752768
sdb 0.00 0.02 0.00 5108 0
dm-0 2.29 3.88 16.70 1297226 5579336
dm-1 0.00 0.00 0.00 928 0
dm-2 0.11 0.65 0.52 216106 173432
Update 2
I migrated the database to MySQL and here the query only takes about 0.001 second, even though for all the other queries I'm doing it is actually slower than sqlite (I optimized for sqlite so this might or might not be surprising).
回答1:
As I mentioned in an answer to a prior question you asked, did you give the sqlite module apsw a try? From the website:
APSW is a Python wrapper for the SQLite embedded relational database engine. In contrast to other wrappers such as pysqlite it focuses on being a minimal layer over SQLite attempting just to translate the complete SQLite API into Python. The documentation has a section on the differences between APSW and pysqlite.
I tried it myself and it seems to indeed reflect better how SQL statements are executed by the "real" Sqlite (i.e. the client or the C library).
回答2:
There is a performance issue with SQLite and Python. Read this thread for more information. There are a few suggestions there - try them, it might work - like adding an index to your join fields or using pysqlite.
http://www.mail-archive.com/python-list@python.org/msg253067.html
来源:https://stackoverflow.com/questions/10854817/is-uninterruptible-sleep-the-cause-of-my-python-program-being-really-slow-and-i