Question
I have a program doing a LOT of iterations (thousands, to millions, to hundreds of millions). It's starting to take quite a lot of time (a few minutes to a few days), and despite all my efforts to optimize it, I'm still a bit stuck.
Profile: using cProfile via console call
ncalls tottime percall cumtime percall filename:lineno(function)
500/1 0.018 0.000 119.860 119.860 {built-in method builtins.exec}
1 0.006 0.006 119.860 119.860 Simulations_profiling.py:6(<module>)
6/3 0.802 0.134 108.302 36.101 Simulations_profiling.py:702(optimization)
38147 0.581 0.000 103.411 0.003 Simulations_profiling.py:270(overlap_duo_combination)
107691 28.300 0.000 102.504 0.001 Simulations_profiling.py:225(duo_overlap)
12615015 69.616 0.000 69.616 0.000 {built-in method builtins.round}
The first question is: what are the first 2 lines? I assumed that it was the program being called.
I'm going to replace the round() calls with a tolerance comparison in my if / else statements, thus avoiding this time consumption. I would also like to further optimize the two following functions, but I can't find a new approach.
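A minimal sketch of the tolerance comparison mentioned above (pulses_overlap and TOL are hypothetical names, and the chosen tolerance is an assumption; it is not bit-for-bit identical to the rounded comparison exactly at the boundary, but avoids the expensive round() call):

```python
TOL = 1e-9  # assumed tolerance; tune it to the precision of your timestamps

def pulses_overlap(t1, t2, width, tol=TOL):
    # Same intent as: round(abs(t2 - t1), 3) < width, without calling round()
    return abs(t2 - t1) < width + tol

print(pulses_overlap(0.0, 0.2, 0.3))   # 0.2 < 0.3 -> the pulses overlap
print(pulses_overlap(0.0, 20.0, 0.3))  # 20.0 > 0.3 -> no overlap
```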
import itertools
import numpy as np

class Signal:
    def __init__(self, fq, t0, tf, width=0.3):
        self.fq = fq                # frequency in Hz
        self.width = float(width)   # cathodic phase width in ms
        self.t0 = t0                # instant of the first pulse in ms
        self.tf = tf                # end point in ms

        # Instants at which a stim pulse is triggered, in ms
        self.timeline = np.round(np.arange(t0, self.tf, 1/fq*1000), 3)

    def find_closest_t(self, t):
        val = min(self.timeline, key=lambda x: abs(x - t))
        id = np.where(self.timeline == val)[0][0]

        if val <= t or id == 0:
            return val, id
        else:
            return self.timeline[id-1], id-1
def duo_overlap(S1, S2, perc):
    pulse_id_t1, pulse_id_t2 = [], []

    start = max(S1.t0, S2.t0) - max(S1.width, S2.width)
    if start <= 0:
        start = 0
        start_id_S1 = 0
        start_id_S2 = 0
    else:
        start_id_S1 = S1.find_closest_t(start)[1]
        start_id_S2 = S2.find_closest_t(start)[1]

    stop = min(S1.tf, S2.tf) + max(S1.width, S2.width)

    for i in range(start_id_S1, len(S1.timeline)):
        if S1.timeline[i] > stop:
            break
        for j in range(start_id_S2, len(S2.timeline)):
            if S2.timeline[j] > stop:
                break
            # Distance between the two pulse instants
            d = round(abs(S2.timeline[j] - S1.timeline[i]), 3)
            if S1.timeline[i] <= S2.timeline[j] and d < S1.width:
                pulse_id_t1.append(i)
                pulse_id_t2.append(j)
            elif S1.timeline[i] >= S2.timeline[j] and d < S2.width:
                pulse_id_t1.append(i)
                pulse_id_t2.append(j)

    return pulse_id_t1, pulse_id_t2
def overlap_duo_combination(signals, perc=0):
    overlap = dict()
    for i in range(len(signals)):
        overlap[i] = list()

    for subset in itertools.combinations(signals, 2):
        p1, p2 = duo_overlap(subset[0], subset[1], perc=perc)
        overlap[signals.index(subset[0])] += p1
        overlap[signals.index(subset[1])] += p2

    return overlap
Explanation of the program:
I have square signals of width Signal.width and frequency Signal.fq, starting at Signal.t0 and ending at Signal.tf. During the initialization of a Signal instance, the timeline is computed: it is a 1D array of floats in which each number corresponds to the instant at which a pulse is triggered.
Example:
IN: Signal(50, 0, 250).timeline
OUT: array([ 0., 20., 40., 60., 80., 100., 120., 140., 160., 180., 200., 220., 240.])
A pulse is triggered at t = 0, t = 20, t = 40, ... Each pulse has a width of 0.3.
duo_overlap() takes 2 signals as input (and a percentage that we will keep fixed at 0 for this example). This function computes the IDs of the pulses of S1 and of S2 (indices in the timeline array) that overlap with one another.
Example:
If a pulse starts at t = 0 for S1 and a pulse starts at t = 0.2 for S2, since 0.2 - 0 = 0.2 < 0.3 (S1.width), the 2 pulses overlap.
I tried to optimize this function by looping only over the IDs at which the pulses can possibly overlap (start_id to stop, computed ahead of time). But as you can see, this function is still really time-consuming because of the high number of calls.
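As a side sketch (not from the original post), the start-index lookup itself could be done with a binary search instead of the linear scan inside find_closest_t(); start_index below is a hypothetical helper assuming a sorted timeline:

```python
import numpy as np

def start_index(timeline, start):
    # np.searchsorted does a binary search (O(log n)); side='right' minus 1
    # yields the index of the last pulse at or before `start`.
    idx = np.searchsorted(timeline, start, side='right') - 1
    return max(idx, 0)  # clamp, like the id == 0 branch of find_closest_t

timeline = np.round(np.arange(0, 250, 1 / 50 * 1000), 3)  # Signal(50, 0, 250)
print(start_index(timeline, 35.0))  # index of the pulse at t = 20, i.e. 1
```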
The last function, overlap_duo_combination(), takes N signals as input in a list (or tuple / iterable) (2 <= N <= 6 in practice) and creates a dict() in which the key is the ID of the signal in the input list and the value is a list of overlapping pulse IDs (pairwise comparison of the signals within the input list).
Example:
Input: signals = (S1, S2, S3). Pulse n°2 of S1 overlaps with pulse n°3 of S2, and pulse n°3 of S2 overlaps with pulse n°5 of S3.
Output: dict[0] = [2] / dict[1] = [3, 3] / dict[2] = [5]
The 3 pops out twice for S2 because it is added the first time duo_overlap() is called on S1 and S2, and a second time when it is called on S2 and S3.
I don't want to remove the duplicates, since they carry information about how many different pulses are overlapping (in this case, 2 pulses overlap with pulse n°3 of S2).
Would you have any idea, suggestion, code or whatever to reduce the time complexity of this part of the code?
I am currently looking into a PyCUDA implementation since I have an Nvidia 1080 Ti at my disposal, but I don't know the C language. Would it be worth moving to the GPU this inner function, which doesn't take long per call but is called thousands of times?
Thanks for reading such a long post, and thanks for the help!
Answer 1:
If I understood your problem correctly, you could speed up your duo_overlap() function by relying on numpy instead of performing all the loops.
You would like to subtract all values of S2.timeline from S1.timeline and compare the differences to the width of the signal. The following function repeats S1.timeline (as columns) and subtracts S2.timeline from each row. Thus, the row indices correspond to S1.timeline and the column indices correspond to S2.timeline.
def duo_overlap_np(S1, S2):
    x = S1.timeline
    y = S2.timeline
    mat = np.repeat(x, y.shape[0]).reshape(x.shape[0], y.shape[0])
    mat = mat - y

    # End of S1 pulse overlaps with start of S2 pulse
    overlap_1 = np.where((0 >= mat) & (mat >= -S1.width))

    # End of S2 pulse overlaps with start of S1 pulse
    overlap_2 = np.where((0 <= mat) & (mat <= S2.width))

    # Flatten the overlap arrays. The first array returned by np.where
    # corresponds to S1, the second to S2.
    S1_overlap = np.concatenate([overlap_1[0], overlap_2[0]])
    S2_overlap = np.concatenate([overlap_1[1], overlap_2[1]])

    return S1_overlap, S2_overlap
A quick speed comparison on my machine gives,
S1 = Signal(50, 0, 1000, width=0.3)
S2 = Signal(25.5, 20.2, 1000, width=0.6)
%timeit duo_overlap(S1, S2, 0)
# 7 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit duo_overlap_np(S1, S2)
# 38.2 µs ± 2.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
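As a minimal sketch of an equivalent formulation (duo_overlap_np_bcast is a hypothetical name, not from the original answer): NumPy broadcasting builds the same difference matrix without the explicit repeat/reshape step.

```python
import numpy as np

def duo_overlap_np_bcast(S1, S2):
    # x[:, None] - y broadcasts to a (len(S1.timeline), len(S2.timeline))
    # matrix of pairwise differences, identical to the repeat/reshape version.
    mat = S1.timeline[:, None] - S2.timeline

    overlap_1 = np.where((0 >= mat) & (mat >= -S1.width))
    overlap_2 = np.where((0 <= mat) & (mat <= S2.width))

    S1_overlap = np.concatenate([overlap_1[0], overlap_2[0]])
    S2_overlap = np.concatenate([overlap_1[1], overlap_2[1]])
    return S1_overlap, S2_overlap
```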
Source: https://stackoverflow.com/questions/49380594/time-optimization-of-function-for-signal-processing