Question
This script uses PHP and MySQL to compute a one-minute rolling average to reduce the impact of outliers on my data (one minute = six 10-second rows). It computes everything correctly, but it is not efficient enough to process more than 150 rows at a time. I'd like to process as many rows as I can at a time, ideally 5,000-10,000, as my table has over 150,000 rows and I insert approximately 8,000 rows per day.
Does anyone have any suggestions as to how I can make this script run more efficiently?
Thanks!
<?php
//connect to database
mysql_connect("localhost","user","password");//database connection
mysql_select_db("database");
$result = mysql_query("SELECT Timestamp FROM table");
if (!$result) {
die('Could not query:' . mysql_error());
}
//get number of rows in table
$resultA = mysql_query("SELECT * FROM table");
$num_rows = mysql_num_rows($result);
echo "There are $num_rows rows.</br>";
//select column to be averaged
$resultB = mysql_query("SELECT PortRPMSignal FROM table");
if (!$resultB) {
die('Could not query:' . mysql_error());
}
//set start equal to the first row you want to calculate the averages from, likely the first null row
$start = 5;
//calculate 1 minute average, the average is correct
for($i = $start; $i<$num_rows; $i++){
$output = mysql_result($result,$i);
$test = mysql_result($resultB,$i)+mysql_result($resultB,$i-1)+mysql_result($resultB,$i-2)+mysql_result($resultB,$i-3)+mysql_result($resultB,$i-4)+mysql_result($resultB,$i-5);
$test2 = $test/6;
$round = round($test2,4);
$temp = mysql_query("SELECT Timestamp FROM table");
if(!$temp){
die('Could not query:' . mysql_error());
}
//gets timestamp at row $i, and inserts new average value into that row in RPMAve column
$time = mysql_result($result,$i);
mysql_query("UPDATE table SET PortMinuteAveRPM = $round WHERE Timestamp = '$time'");
}
Answer 1:
For starters, the initial "count" block here can be cleaned up by adding the COUNT() aggregate:
$resultA = mysql_query("SELECT * FROM table");
$num_rows = mysql_num_rows($result);
echo "There are $num_rows rows.</br>";
Change to:
$resultA = mysql_query("SELECT COUNT(*) FROM table");
$row = mysql_fetch_array($resultA);
$num_rows = $row[0];
echo "There are $num_rows rows.</br>";
That should speed things up considerably on its own. Without it, you're selecting all of the data from the table - a query that will only grow slower the more you put into the table.
For the averages you're computing, is there any logic required that can't be accomplished directly in a MySQL query? Something such as:
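UPDATE table SET PortMinuteAveRPM=(SELECT avg_rpm FROM (SELECT AVG(PortRPMSignal) AS avg_rpm FROM table WHERE Timestamp BETWEEN '$startTime' AND '$endTime') AS t) WHERE Timestamp='$endTime'
The inner SELECT is wrapped in a derived table because MySQL does not allow a subquery to read directly from the table being updated (error 1093). This may save you from looping through results, if it's feasible.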
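If the averaging does have to stay in PHP, the six lookups per row can be replaced by a running sum. The following is a minimal sketch, not the asker's code: rollingAverage() is a hypothetical helper, and it assumes the PortRPMSignal values have already been fetched into a plain array in timestamp order.

```php
<?php
/**
 * Trailing rolling average over a fixed window, computed in one pass.
 * Hypothetical helper: $values is assumed to hold PortRPMSignal readings
 * already fetched in timestamp order.
 *
 * Returns an array keyed by the index of the last row in each window.
 */
function rollingAverage(array $values, int $window = 6, int $precision = 4): array
{
    $values = array_values($values); // make sure indexes run 0, 1, 2, ...
    $averages = [];
    $sum = 0.0;
    foreach ($values as $i => $value) {
        $sum += $value;                     // newest reading enters the window
        if ($i >= $window) {
            $sum -= $values[$i - $window];  // oldest reading leaves the window
        }
        if ($i >= $window - 1) {            // window is full from this row on
            $averages[$i] = round($sum / $window, $precision);
        }
    }
    return $averages;
}

$averages = rollingAverage([10, 20, 30, 40, 50, 60, 70]);
// $averages[5] is the mean of rows 0..5, $averages[6] the mean of rows 1..6
```

Each row then costs one addition and one subtraction, rather than six separate result lookups.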
Answer 2:
It sounds like you're trying to calculate an autoregressive moving average (ARMA), but there are numerous issues with your interpretation of your data and how you are capturing it.
If you've got a complete set of data (though your question implies that you don't), then work out what time interval contains the required number of records and get it directly from the database, e.g.
SELECT a.timestamp as base, AVG(b.PortRPMSignal)
FROM table a, table b
WHERE b.timestamp BETWEEN a.timestamp AND a.timestamp+INTERVAL 6 HOUR
GROUP BY a.timestamp
If you want to thin out the data points, then try something like:
SELECT a.timestamp as base, AVG(b.PortRPMSignal)
FROM table a, table b
WHERE b.timestamp BETWEEN a.timestamp AND a.timestamp+INTERVAL 6 HOUR
AND DATE_FORMAT(a.timestamp, '%i%s')='0000'
GROUP BY a.timestamp
A better solution, if you don't have a complete dataset but there is only a small amount of jitter, would be to use the modulus of an auto-increment id to pick out fewer rows from 'a'.
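The modulus idea can be sketched as follows. This is only an illustration under assumptions: it presumes an auto-increment column called id (not shown in the question), and thinById() is a hypothetical helper that filters rows already loaded into PHP. The same condition could instead go into the query as MOD(a.id, 6) = 0.

```php
<?php
/**
 * Keep every $step-th row, judged by its auto-increment id.
 * Assumes each row is an associative array with an integer 'id' key
 * (a hypothetical column; the question's table may not have one).
 */
function thinById(array $rows, int $step = 6): array
{
    return array_values(array_filter(
        $rows,
        function (array $row) use ($step): bool {
            return $row['id'] % $step === 0;
        }
    ));
}

// Sample data: 18 rows with ids 1..18 and made-up signal values.
$rows = [];
for ($id = 1; $id <= 18; $id++) {
    $rows[] = ['id' => $id, 'PortRPMSignal' => 100 + $id];
}
$thinned = thinById($rows);
// $thinned holds the rows with id 6, 12 and 18
```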
Answer 3:
It's only a start, but you can bin this bit
//get number of rows in table
$resultA = mysql_query("SELECT * FROM table");
$num_rows = mysql_num_rows($result);
echo "There are $num_rows rows.</br>";
Because the following line
$resultB = mysql_query("SELECT PortRPMSignal FROM table");
...will give you a result set that you can use mysql_num_rows on.
Using * in a query increases the load on the database.
In your for loop you then have this
$temp = mysql_query("SELECT Timestamp FROM table");
if(!$temp){
die('Could not query:' . mysql_error());
}
which means this query runs every time you loop and you're not even using the results.
I don't know if mysqli will give you better performance, but you should use it: the mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0.
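As a rough sketch of what the write-back could look like with mysqli, the statement below is prepared once outside the loop and only executed inside it. writeAverages() and the shape of $averages (Timestamp string => rolling average) are assumptions made for illustration, and `table` is the placeholder table name from the question.

```php
<?php
/**
 * Write precomputed averages back using one prepared statement.
 * Sketch only: $db is an open mysqli connection and $averages maps a
 * Timestamp string to its rolling average (both are assumptions).
 *
 * Returns the total number of rows MySQL reports as changed.
 */
function writeAverages(mysqli $db, array $averages): int
{
    // Prepared once; the question's loop re-ran queries on every row.
    $stmt = $db->prepare(
        'UPDATE `table` SET PortMinuteAveRPM = ? WHERE Timestamp = ?'
    );
    if ($stmt === false) {
        throw new RuntimeException('Could not prepare: ' . $db->error);
    }

    // bind_param binds by reference, so the loop below only has to
    // assign new values to these two variables before each execute().
    $average = 0.0;
    $timestamp = '';
    $stmt->bind_param('ds', $average, $timestamp);

    $updated = 0;
    foreach ($averages as $timestamp => $average) {
        if (!$stmt->execute()) {
            throw new RuntimeException('Could not update: ' . $stmt->error);
        }
        $updated += $stmt->affected_rows;
    }
    $stmt->close();
    return $updated;
}
```

A call might look like writeAverages(new mysqli('localhost', 'user', 'password', 'database'), $averages), with the credentials being the placeholders from the question.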
Source: https://stackoverflow.com/questions/11613903/rolling-average-efficiency-php-mysql