PHP - Fastest way to convert a 2d array into a 3d array that is grouped by a specific value

Submitted by 别等时光非礼了梦想 on 2020-01-04 14:14:46

Question


I would like to convert this two dimensional array of records:

[records] => Array
(
  [0] => Array
  (
    [0] => Pears
    [1] => Green
    [2] => Box
    [3] => 20
  )
  [1] => Array
  (
    [0] => Pears
    [1] => Yellow
    [2] => Packet
    [3] => 4
  )
  [2] => Array
  (
    [0] => Peaches
    [1] => Orange
    [2] => Packet
    [3] => 4
  )
  [3] => Array
  (
    [0] => Apples
    [1] => Red
    [2] => Box
    [3] => 20
  )
)

Into this three dimensional array where each array key is grouped by a certain value from the original array:

[converted_records] => Array
(
  [Pears] => Array
  (
    [0] => Array
    (
      [0] => Green
      [1] => Box
      [2] => 20
    )
    [1] => Array
    (
      [0] => Yellow
      [1] => Packet
      [2] => 4
    )
  )
  [Peaches] => Array
  (
    [0] => Array
    (
      [0] => Orange
      [1] => Packet
      [2] => 4
    )
  )
  [Apples] => Array
  (
    [0] => Array
    (
      [0] => Red
      [1] => Box
      [2] => 20
    )
  )
)

I can do this like so:

$array = array(); // sample data like the first array above
$storage = array();
foreach ($array as $values) {
  // Appending with [] re-indexes each group from 0,
  // matching the desired output above
  $storage[$values[0]][] = array (
    0 => $values[1],
    1 => $values[2],
    2 => $values[3]
  );
}

I wanted to know if there is a faster way to do this. I am not aware of any built-in PHP function that does this grouping, so I can only assume this is basically how it would be done.

The problem is that this is going to be repeated many, many times, and every millisecond counts, so I really want to know the best way to accomplish this task.

EDIT

The records array is created by parsing a .CSV file as follows:

$records = array_map('str_getcsv', file('file.csv'));
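For reference, `str_getcsv` parses one CSV line into an indexed array of string fields; a minimal sketch using a line in the same format as the sample data above:

```php
<?php
// One line in the same format as the sample data above.
$line   = 'Pears,Green,Box,20';
$fields = str_getcsv($line);

print_r($fields);
// Note: every field comes back as a string, including the numeric "20".
```

`file('file.csv')` yields one such line per element, so the `array_map` call applies this split to every row.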

EDIT #2

I ran a simple benchmark on a set of 10 results (5k records each) and got an average runtime of 0.645478 seconds. Granted, there are a few other things going on before this, so it is not a true indication of actual performance, but it is a good baseline for comparing other methods.
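For comparison, a minimal timing sketch might look like the following. This is not the exact harness used for the numbers above; the `group_records` helper and the synthetic data set are assumptions for illustration:

```php
<?php
// Hypothetical helper wrapping the grouping routine from the question.
function group_records(array $records): array
{
    $storage = array();
    foreach ($records as $values) {
        // First column becomes the group key; the rest becomes the row.
        $storage[$values[0]][] = array_slice($values, 1);
    }
    return $storage;
}

// Build a synthetic 5k-record data set, comparable in size to one test file.
$records = array();
for ($i = 0; $i < 5000; $i++) {
    $records[] = array('Fruit' . ($i % 50), 'Green', 'Box', (string)$i);
}

$start   = microtime(true);
$grouped = group_records($records);
printf("Grouped %d keys in %.6f seconds\n", count($grouped), microtime(true) - $start);
```

Timing only the grouping loop in isolation like this removes the "few other things going on before this" from the measurement.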

EDIT #3

I did a test with about 20x the records. The average for my routine was 14.91971 seconds.

At some point the answer below by @num8er had $records[$key][] = array_shift($data); before the answer was updated to its current form.

When I tried testing with the larger set of results, that version ran out of memory, as it generated an error for each record.

That said, once I changed it to $records[$key][] = $data; the routine completed with an average of 18.03699 seconds with gc_collect_cycles() commented out.

I've reached the conclusion that although @num8er's method is faster for smaller files, for larger ones my method works out quicker.


Answer 1:


Reading a big file into memory with file() (a first pass, as the file is read), then iterating over the lines with array_map (a second pass, after every line has been read into the array), and then running a foreach over the result (a third pass) is a bad idea when you're looking for performance.

You're iterating three times, so 100K records means 300K iterations. The most performant way is to do the grouping while reading the file, so there is only one iteration: reading lines (100K records == 100K iterations):

ini_set('memory_limit', '1024M');
set_time_limit(0);

$fh = fopen('file.csv', 'r');

$records = array();
while($data = fgetcsv($fh)) {
  // group by the first column while reading, in a single pass over the file
  $key = $data[0];
  if(!isset($records[$key])) {
    $records[$key] = array();
  }

  $records[$key][] = array(0 => $data[1],
                           1 => $data[2],
                           2 => $data[3]);
  gc_collect_cycles(); // optional; calling this per row is costly (see EDIT #3 above)
}

fclose($fh);


And here is parent -> child processing (forking one worker per file chunk) for huge files:

<?php

ini_set('memory_limit', '1024M');
set_time_limit(0);

function child_main($file)
{
    $my_pid = getmypid();
    print "Starting child pid: $my_pid\n";

    /**
     * OUR ROUTINE
     */

    $fh = fopen($file, 'r');
    $records = array();
    while($data = fgetcsv($fh)) {
        $key = $data[0];
        if(!isset($records[$key])) {
            $records[$key] = array();
        }

        $records[$key][] = array(0 => $data[1],
            1 => $data[2],
            2 => $data[3]);
        gc_collect_cycles();
    }
    fclose($fh);

    // $file still holds the path, so the processed chunk can be removed
    unlink($file);

    return 1;
}


$file = __DIR__."/file.csv";
$files = glob(__DIR__.'/part_*');
if(sizeof($files)==0) {
    exec('split -l 1000 '.$file.' part_'); 
    $files = glob(__DIR__.'/part_*');
}

$children = array();
foreach($files AS $file) {
    if(($pid = pcntl_fork()) == 0) {
        exit(child_main($file));
    }
    else {
        $children[] = $pid;
    }
}

foreach($children as $pid) {
    $pid = pcntl_wait($status);
    if(pcntl_wifexited($status)) {
        $code = pcntl_wexitstatus($status);
        print "pid $pid returned exit code: $code\n";
    }
    else {
        print "$pid was unnaturally terminated\n";
    }
}

?>



Answer 2:


If you're only looking for some clean code:

$array   = array_map('str_getcsv', file('file.csv'));

$storage = array();
foreach ($array as $values) {
    $key             = array_shift($values);
    $storage[$key][] = $values;
}
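Run against the sample records from the question (in place of the parsed CSV), this produces exactly the grouped structure asked for, with each group's rows re-indexed from 0:

```php
<?php
// Sample records from the question, standing in for the parsed CSV.
$array = array(
    array('Pears',   'Green',  'Box',    '20'),
    array('Pears',   'Yellow', 'Packet', '4'),
    array('Peaches', 'Orange', 'Packet', '4'),
    array('Apples',  'Red',    'Box',    '20'),
);

$storage = array();
foreach ($array as $values) {
    $key             = array_shift($values); // first column becomes the group key
    $storage[$key][] = $values;              // remaining columns become the row
}

print_r($storage);
// $storage['Pears'] holds two rows, keyed 0 and 1.
```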

Unless you have hundreds of thousands of array entries, speed shouldn't be a concern either.



Source: https://stackoverflow.com/questions/32191069/php-fastest-way-to-convert-a-2d-array-into-a-3d-array-that-is-grouped-by-a-spe
