Let's say you want to calculate the remaining download time, and you have all the information needed, that is: file size, downloaded size, size left, time elapsed, and the momentary download speed.
Well, as you said, using the instantaneous download speed isn't a great method, because it tends to fluctuate. However, something like an overall average isn't a great idea either, because long-past measurements can skew it just as badly.
Consider if I start downloading a file at the same time as 9 others. I'm only getting 10% of my normal speed, but halfway through the file, the other 9 finish. Now I'm downloading at 10x the speed I started at. My original 10% speed shouldn't be a factor in the "how much time is left?" calculation any more.
Personally, I'd probably take an average over the last 30 seconds or so and use that. That bases the calculation on recent speed without fluctuating wildly. 30 seconds may not be the right window; it would take some experimentation to figure out a good value.
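For instance, here's a minimal sketch of such a sliding-window average in PHP (the function name, the one-sample-per-second polling, and the 30-second window are my own assumptions, for illustration only):

// Keep one speed sample per second; average only the samples
// that fall inside the window, discarding anything older.
function windowedSpeed(array &$samples, float $currentSpeed, int $window = 30): float
{
    $samples[] = $currentSpeed;       // one sample per second
    if (count($samples) > $window)
        array_shift($samples);        // drop the oldest sample
    return array_sum($samples) / count($samples);
}

$samples = [];
// inside the download loop, once per second:
$avg = windowedSpeed($samples, $momentarySpeed);
$secondsLeft = $sizeLeft / max($avg, 1e-9);   // guard against division by zero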
Another option would be to set a sort of "fluctuation threshold", where you don't do any recalculation until the speed changes by more than that threshold. For example (again an arbitrary number that would require experimentation), you could set the threshold at 10%. Then, if you're downloading at 100 kB/s, you don't recalculate the remaining time until the download speed moves below 90 kB/s or above 110 kB/s. If one of those changes happens, the time is recalculated and a new threshold is set.
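A rough sketch of that threshold idea, again in PHP (the variable names and the once-per-second polling are assumptions):

// Recalculate the ETA only when the speed leaves the 10% band
// around the speed used for the previous calculation.
$threshold = 0.10;
$baseSpeed = null;    // speed at the time of the last recalculation
$eta       = null;

// inside the download loop, once per second:
if ($baseSpeed === null
    || $currentSpeed < $baseSpeed * (1 - $threshold)
    || $currentSpeed > $baseSpeed * (1 + $threshold))
{
    $baseSpeed = $currentSpeed;                    // new band around this speed
    $eta = $sizeLeft / max($currentSpeed, 1e-9);   // recalculated remaining time
}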
The obvious way would be something in between: you need a "moving average" of the download speed.
For anyone interested, I wrote an open-source library in C# called Progression that has a "moving-average" implementation: ETACalculator.cs.
The Progression library defines an easy-to-use structure for reporting several types of progress. It also easily handles nested progress for very smooth progress reporting.
You could use an averaging algorithm where the old values decay linearly. If S_n is the speed at time n and A_{n-1} is the average at time n-1, then define your average speed as follows.
A_1 = S_1
A_2 = (S_1 + S_2)/2
A_n = S_n/(n-1) + A_{n-1}(1-1/(n-1))
In English, this means that the further in the past a measurement occurred, the less it matters, because its weight has decayed.
Compare this to the normal averaging algorithm: A_n = S_n/n + A_{n-1}(1-1/n)
You could also have it geometrically decay, which would weight the most recent speeds very heavily: A_n = S_n/2 + A_{n-1}/2
If the speeds are 4, 3, 5, 6, then
A_4 = 4.5 (simple average)
A_4 = 4.8333... (linear decay)
A_4 = 5.125 (geometric decay)
Beware that $n+1 (not $n) is the number of current data points, due to PHP's arrays being zero-indexed. To match the formulas above, set n == $n+1, or equivalently n-1 == $n.
<?php
$s = [4, 3, 5, 6];

// average
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    else
    {
        // $n+1 = number of data points so far
        $weight = 1 / ($n + 1);
        $a[$n] = $s[$n] * $weight + $a[$n - 1] * (1 - $weight);
    }
}
var_dump($a);

// linear decay
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    elseif ($n == 1)
        $a[$n] = ($s[$n] + $s[$n - 1]) / 2;
    else
    {
        // $n = number of data points so far - 1
        $weight = 1 / $n;
        $a[$n] = $s[$n] * $weight + $a[$n - 1] * (1 - $weight);
    }
}
var_dump($a);

// geometric decay
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    else
    {
        $weight = 1 / 2;
        $a[$n] = $s[$n] * $weight + $a[$n - 1] * (1 - $weight);
    }
}
var_dump($a);
Output:

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4
  3 => float 4.5

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4.25
  3 => float 4.8333333333333

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4.25
  3 => float 5.125
EDIT: Here's what I finally suggest; I tried it, and it provides quite satisfying results:
I keep a zero-initialized array with one slot for each download speed between 0 and 500 kB/s (it could go higher if you expect such speeds), in 1 kB/s steps. I sample the download speed periodically (every second is a good interval) and increment the corresponding array item by one. Now I know how many seconds I have spent downloading the file at each speed. The sum of all these values is the elapsed time (in seconds), and the sum of these values multiplied by their corresponding speeds is the size downloaded so far. If I take the ratio between each value in the array and the elapsed time, then, assuming the speed variation pattern stabilizes, I can form a formula to predict the time any given size will take; the size in this case is the size remaining. So that's what I do: I take the sum of each array item's value multiplied by its corresponding speed (the index), divide it by the elapsed time, and then divide the size left by that value. That's the time left.
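A minimal sketch of that histogram approach in PHP (the 500 kB/s cap and all variable names are assumptions for illustration):

// One bucket per 1 kB/s step; $hist[$kbps] counts the seconds
// spent downloading at that speed.
$hist = array_fill(0, 501, 0);

// inside the download loop, once per second:
$kbps = (int)min(round($currentSpeedKbps), 500);
$hist[$kbps]++;

// elapsed time is the total number of samples taken
$elapsed = array_sum($hist);

// effective speed: kB downloaded so far divided by elapsed seconds
$downloaded = 0;
foreach ($hist as $speed => $seconds)
    $downloaded += $speed * $seconds;    // kB transferred at this speed
$effective = $downloaded / max($elapsed, 1);

$secondsLeft = $sizeLeftKb / max($effective, 1e-9);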
It takes a few seconds to stabilize, and then works pretty damn well.
Note that this is a "complicated" average, so the method of discarding older values (moving average) might improve it even further.
You could also keep track of your overall average speed and show an estimate calculated from that as well.
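In the simplest form that's just the following (variable names assumed):

// overall average: total downloaded divided by total elapsed time
$avgSpeed    = $downloadedSize / max($elapsedSeconds, 1);
$secondsLeft = $sizeLeft / max($avgSpeed, 1e-9);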