Question
I have a Lisp file which does lots of sampling, file I/O, and arithmetic in a loop (I do particle filtering in Common Lisp). I compile my Lisp file with the compile-file
command, and I also put (declaim (optimize (speed 3) (debug 0) (safety 0)))
at the beginning of the file, as I want my results as fast as possible.
I use (time (load "/....../myfile.lisp"))
and (time (load "/......./myfile.dx64fsl"))
to measure speed. The issue is that compiling brings me no advantage: there is no improvement. Am I doing something wrong? Is there a way to improve things? Speed is the most important criterion, so I can sacrifice a lot in order to get a fast response. I have no idea about these kinds of issues, so any help would be appreciated.
Moreover, when I increase the number of particles (each particle is a vector of size ~40) to around 10000, the code gets really slow, so there may be some memory issues too.
Thank you very much in advance.
Edit: These are the profiling results with 1000 particles and 50 iterations.
(LOAD "/.../myfile.dx64fsl") took 77,488,810 microseconds (77.488810 seconds) to run
with 8 available CPU cores.
During that period, 44,925,468 microseconds (44.925470 seconds) were spent in user mode
32,005,440 microseconds (32.005440 seconds) were spent in system mode
2,475,291 microseconds (2.475291 seconds) was spent in GC.
1,701,028,429 bytes of memory allocated.
1,974 minor page faults, 0 major page faults, 0 swaps.
; Warning: Function CREATE-MY-DBN has been redefined, so times may be inaccurate.
; MONITOR it again to record calls to the new definition.
; While executing: MONITOR::MONITOR-INFO-VALUES, in process repl-thread(10).
Function                   %Time   %Cons       Calls  Sec/Call  Cons/Call  Total Time    Total Cons
----------------------------------------------------------------------------------------------------
SAMPLE:                    25.61   26.14     2550000  0.000005        174      13.526     443040000
DISCRETE-PARENTS:          19.66    3.12     4896000  0.000002         11      10.384      52800000
LINEAR-GAUSSIAN-MEAN:        8.86    3.12     1632000  0.000003         32       4.679      52800000
DISCRETE-PARENT-VALUES:      7.47   12.33     3264000  0.000001         64       3.946     208896000
LIST-DIFFERENCE:             6.41   25.69     6528000  0.000001         67       3.384     435392000
CONTINUOUS-PARENTS:          6.33    0.00     1632000  0.000002          0       3.343             0
PF-STEP:                     5.17    0.23          48  0.056851      80080       2.729       3843840
CONTINUOUS-PARENT-VALUES:    4.13    7.20     1632000  0.000001         75       2.184     122048000
TABLE-LOOKUP:                3.85    8.39     2197000  0.000001         65       2.035     142128000
PHI-INVERSE:                 3.36    0.00     1479000  0.000001          0       1.777             0
PHI-INTEGRAL:                3.32    1.38     2958000  0.000001          8       1.755      23344000
PARENT-VALUES:               2.38   10.65     1122000  0.000001        161       1.259     180528016
CONDITIONAL-PROBABILITY:     1.41    0.00      255000  0.000003          0       0.746             0
----------------------------------------------------------------------------------------------------
TOTAL:                      97.96   98.24    30145048                             51.746    1664819856
Estimated monitoring overhead: 21.11 seconds
Estimated total monitoring overhead: 23.93 seconds
With 10000 particles and 50 iterations:
(LOAD "/.../myfile.dx64fsl") took 809,931,702 microseconds (809.931700 seconds) to run
with 8 available CPU cores.
During that period, 476,627,937 microseconds (476.627930 seconds) were spent in user mode
328,716,555 microseconds (328.716550 seconds) were spent in system mode
54,274,625 microseconds (54.274624 seconds) was spent in GC.
16,973,590,588 bytes of memory allocated.
10,447 minor page faults, 417 major page faults, 0 swaps.
; Warning: Function CREATE-MY-DBN has been redefined, so times may be inaccurate.
; MONITOR it again to record calls to the new definition.
; While executing: MONITOR::MONITOR-INFO-VALUES, in process repl-thread(10).
Function                   %Time   %Cons       Calls  Sec/Call  Cons/Call  Total Time    Total Cons
----------------------------------------------------------------------------------------------------
SAMPLE:                    25.48   26.11    25500000  0.000006        174     144.211    4430400000
DISCRETE-PARENTS:          18.41    3.11    48960000  0.000002         11     104.179     528000000
LINEAR-GAUSSIAN-MEAN:        8.61    3.11    16320000  0.000003         32      48.751     528000000
LIST-DIFFERENCE:             7.57   25.66    65280000  0.000001         67      42.823    4353920000
DISCRETE-PARENT-VALUES:      7.50   12.31    32640000  0.000001         64      42.456    2088960000
CONTINUOUS-PARENTS:          5.83    0.00    16320000  0.000002          0      32.980             0
PF-STEP:                     5.05    0.23          48  0.595564     800080      28.587      38403840
TABLE-LOOKUP:                4.52    8.38    21970000  0.000001         65      25.608    1421280000
CONTINUOUS-PARENT-VALUES:    4.25    7.19    16320000  0.000001         75      24.041    1220480000
PHI-INTEGRAL:                3.15    1.38    29580000  0.000001          8      17.849     233440000
PHI-INVERSE:                 3.12    0.00    14790000  0.000001          0      17.641             0
PARENT-VALUES:               2.87   10.64    11220000  0.000001        161      16.246    1805280000
CONDITIONAL-PROBABILITY:     1.36    0.00     2550000  0.000003          0       7.682             0
----------------------------------------------------------------------------------------------------
TOTAL:                      97.71   98.12   301450048                            553.053   16648163840
Estimated monitoring overhead: 211.08 seconds
Estimated total monitoring overhead: 239.13 seconds
Answer 1:
Typical arithmetic code in Common Lisp can be slow. Improving it is possible, but it takes a bit of knowledge.
Reasons:
- Common Lisp numbers are not what the machine provides (bignums, rationals, complexes, ...)
- automatic change from fixnum to bignum and back
- generic math operations
- tagging uses bits of the word size
- consing of numbers
One thing you can see from the profiling output is that you generate 1.7 GB of garbage. This is a typical hint that your number operations cons. Getting rid of that is often not easy, and it is just a guess on my side that these really are number operations.
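As a minimal sketch of the kind of change this usually means (the function and its types are hypothetical, not taken from the question's code): with specialized array types and a typed accumulator, the compiler can keep the double-floats unboxed instead of allocating a fresh heap object for every intermediate result.

;; Hypothetical inner-loop function. Without the declarations each
;; intermediate DOUBLE-FLOAT may be heap-allocated (consed); with
;; them the compiler can keep the arithmetic in machine registers.
(defun weighted-sum (xs ws)
  (declare (type (simple-array double-float (*)) xs ws)
           (optimize (speed 3) (safety 1)))
  (let ((acc 0.0d0))
    (declare (type double-float acc))
    (dotimes (i (length xs) acc)
      (incf acc (* (aref xs i) (aref ws i))))))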
Ken Anderson (unfortunately he died a few years ago) has some advice on his web site for improving numeric software: http://openmap.bbn.com/~kanderso/performance/
A usual solution is to give the code to an experienced Lisp developer who knows a bit about the compiler used and/or about optimization.
Answer 2:
First of all, never ever declare (speed 3) together with (safety 0) at the top level, i.e. globally. This will sooner or later come back and bite your head off. At these levels, most Common Lisp compilers do less safety checking than C compilers. For instance, some Lisps drop the checking for interrupt signals in (safety 0) code. Next, (safety 0) very rarely gives noticeable gains. I would declare (speed 3) (safety 1) (debug 1) in hot functions, possibly going to (debug 0) if that brings a noticeable gain.
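In practice that looks something like this (the function name is hypothetical): the aggressive settings are scoped to the one hot function, and the global policy is left alone.

;; Hypothetical hot function: the optimization policy is declared
;; locally, so the rest of the system keeps its normal safety checks.
(defun resample-step (particles weights)
  (declare (optimize (speed 3) (safety 1) (debug 1)))
  ;; ... the hot loop goes here ...
  (values particles weights))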
Otherwise, without actually looking at some of the code, it's hard to come up with suggestions. Judging from the time output, the GC pressure seems quite high. Make sure that you use open-coded arithmetic in hot functions, and don't needlessly box floats or ints. Use (disassemble 'my-expensive-function) to take a close look at the code the compiler generated. SBCL will provide lots of helpful output when compiling with a high priority on speed, and it can be worthwhile to try to eliminate some of those warnings.
It is also important that you use a fast data structure for representing particles, using specialized arrays and macrology if needed.
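For instance, under the assumption from the question that a particle is a vector of about 40 numbers (all names here are hypothetical), a specialized double-float vector per particle keeps the elements unboxed:

;; A particle as a fixed-size vector of unboxed double-floats; the
;; size 40 comes from the question, the names are made up.
(deftype particle () '(simple-array double-float (40)))

(defun make-particle ()
  (make-array 40 :element-type 'double-float :initial-element 0d0))

;; The population is a simple vector of particles.
(defun make-population (n)
  (let ((pop (make-array n)))
    (dotimes (i n pop)
      (setf (svref pop i) (make-particle)))))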
Answer 3:
Unless all the code where you do your computations is contained in "myfile.lisp", compiling that file will not noticeably improve your run time. The difference between the two cases will probably amount to "we compiled a few loops" that call functions which are either compiled or interpreted in both cases alike.
To gain an improvement from compiling, you also need to compile the code that is called. You may also need to type-annotate your code, in order to allow your compiler to optimize better. SBCL has quite good compiler diagnostics for missing annotations (to the point that people complain it is too verbose while compiling).
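A sketch of the first point (the file names are hypothetical): compile and load every file of the system, not just the driver, so the functions called from the hot loop are compiled too.

;; Compile each source file and load the resulting fasl
;; (the file names are made up for illustration):
(dolist (file '("dbn.lisp" "sampling.lisp" "particle-filter.lisp"))
  (load (compile-file file)))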
As far as loading time goes, it may actually be the case that loading a compiled file takes longer (there's a whole slew of essentially dynamic linking happening). If you're not changing your code frequently, but are changing the data you process, it may be an advantage to prepare a new core file with your particle-filtering code already in-core.
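A minimal sketch of dumping such a core, assuming the particle-filtering code has already been loaded into the session; save-lisp-and-die and save-application are the standard entry points of SBCL and Clozure CL respectively.

;; SBCL: write a core file and exit; restart later with
;;   sbcl --core pf.core
(sb-ext:save-lisp-and-die "pf.core")

;; Clozure CL: save an image; restart later with
;;   ccl -I pf.image
(ccl:save-application "pf.image")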
Answer 4:
Several points:
- Try to move file I/O out of the loop if possible: read the data into memory in one batch before iterating (see the sketch after this list). File I/O is several orders of magnitude slower than memory access.
- Try SBCL if execution speed is important to you.
- A tenfold increase in your input results in roughly a tenfold increase in execution time, which is linear, so your algorithm seems fine; you just need to work on the constant factor.
- Take advantage of the Lisp workflow: edit a function, compile the function, and test, rather than edit the file, compile the file, and test. The difference becomes pronounced as your projects get bigger (or when you try SBCL, which takes longer to analyze and optimize your programs in order to produce faster code).
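The I/O point could look like this (a sketch; the file name and the one-datum-per-line format are assumptions): read every observation once, up front, then iterate over the in-memory vector.

;; Read all observations into memory once, before the filtering loop
;; (assumes one datum per line; name and format are hypothetical).
(defun read-observations (path)
  (with-open-file (in path)
    (coerce (loop for line = (read-line in nil)
                  while line
                  collect (read-from-string line))
            'vector)))

;; Then loop over the in-memory data instead of re-reading the file:
;; (loop for obs across (read-observations "observations.txt") ...)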
Answer 5:
Welcome to Clozure Common Lisp Version 1.7-r14925M (DarwinX8664)!
? (inspect 'print)
[0] PRINT
[1] Type: SYMBOL
[2] Class: #<BUILT-IN-CLASS SYMBOL>
Function
[3] EXTERNAL in package: #<Package "COMMON-LISP">
[4] Print name: "PRINT"
[5] Value: #<Unbound>
[6] Function: #<Compiled-function PRINT #x30000011C9DF>
[7] Arglist: (CCL::OBJECT &OPTIONAL STREAM)
[8] Plist: NIL
Inspect> (defun test (x) (+ x 1))
TEST
Inspect> (inspect 'test)
[0] TEST
[1] Type: SYMBOL
[2] Class: #<BUILT-IN-CLASS SYMBOL>
Function
[3] INTERNAL in package: #<Package "COMMON-LISP-USER">
[4] Print name: "TEST"
[5] Value: #<Unbound>
[6] Function: #<Compiled-function TEST #x302000C5EFFF>
[7] Arglist: (X)
[8] Plist: NIL
Inspect>
Note that both #'print and #'test are listed as 'compiled'. This means that the only performance difference between loading a .lisp file and loading a compiled file is the compile time, which I'm assuming is not the bottleneck in your scenario. It usually isn't, unless you are using a bunch of macros and performing code transformations is the primary purpose of your program.
This is one of the main reasons I never deal with compiled Lisp files. I just load all the shared libraries/packages I need into my core file, and then load a few specific .lisp functions/files on top of that when I'm working on a particular project. And, at least for SBCL and CCL in my experience, everything is listed as 'compiled'.
Source: https://stackoverflow.com/questions/9561109/common-lisp-compiling-and-execution-time