Sorting file names by numeric value

Submitted by 大兔子大兔子 on 2019-12-24 06:16:07

Question


Preamble: I hate to ask questions like this, but I'm stuck and just learning Perl. It seems like an easy task, but I don't know where to look.

I have a folder with lots of XML files that are all named "<number>.xml". I need to process those files in numeric order, so "9123.xml" should come before "2384747.xml". I have successfully sorted the list alphabetically with this:

opendir(XMLDIR, $xmldirname) or die "Cannot open $xmldirname: $!";
my @files = sort {$a cmp $b} readdir(XMLDIR);

but this isn't what I need.

I also tried

my @files = sort {$a <=> $b} readdir(XMLDIR);

which obviously fails because the filenames contain ".xml" and are not numeric as a whole.

Could someone open their heart and save me a week of browsing the Perl manuals?


Answer 1:


Despite your claim, sort { $a <=> $b } readdir(XMLDIR) works. When Perl treats the string 2384747.xml as a number (as <=> does), it is treated as having the value 2384747.

$ perl -wE'say 0+"2384747.xml"'
Argument "2384747.xml" isn't numeric in addition (+) at -e line 1.
2384747

Of course, those warnings are a problem. The solution you accepted tries to remove them, but fails to remove all of them because it doesn't take into account that readdir also returns . and .. (the current and parent directory entries). You have to filter out the entries you don't want first.

Here are two simple solutions:

my @files =
   sort { no warnings 'numeric'; $a <=> $b }
      grep { /^\d+\.xml\z/ }
         readdir(XMLDIR);

my @files =
   sort { ( $a =~ /(\d+)/ )[0] <=> ( $b =~ /(\d+)/ )[0] }
      grep { /^\d+\.xml\z/ }
         readdir(XMLDIR);
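To check the second solution against a real directory, including the . and .. entries that readdir returns, you can build a throwaway directory with the core File::Temp module. This is a minimal sketch: the file names are invented for the example, and a lexical directory handle stands in for the XMLDIR bareword:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway directory, removed at exit; the file names are made up.
my $dir = tempdir(CLEANUP => 1);
for my $name ('2384747.xml', '9123.xml', '15.xml', 'notes.txt') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $dir/$name: $!";
    close($fh);
}

# readdir also returns '.' and '..', so keep only "<digits>.xml" names
# before sorting on the extracted number.
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @files =
   sort { ( $a =~ /(\d+)/ )[0] <=> ( $b =~ /(\d+)/ )[0] }
      grep { /^\d+\.xml\z/ }
         readdir($dh);
closedir($dh);

print "$_\n" for @files;    # 15.xml, 9123.xml, 2384747.xml (one per line)
```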

In this particular case, you can optimize your code:

my @files =
   map { "$_.xml" }                     # Recreate the file name.
      sort { $a <=> $b }                # Compare the numbers.
         map { /^(\d+)\.xml\z/ }        # Extract the number from desired files.
            readdir(XMLDIR);
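The gain from the map version comes from extracting each number once per file instead of twice per comparison. One rough way to see this is to count extractions. The sketch below uses a hypothetical key_of helper and 1000 invented names (a deterministic permutation of 1.xml to 1000.xml, plus . and ..):

```perl
use strict;
use warnings;

# Invented names: 7919 is coprime to 1000, so this is a permutation of 1..1000.
my @entries = ('.', '..', map { ((($_ * 7919) % 1000) + 1) . '.xml' } 0 .. 999);

my $extractions = 0;
sub key_of { $extractions++; return ( $_[0] =~ /(\d+)/ )[0] }    # hypothetical helper

# Variant 1: extract inside the comparison (twice per comparison).
my @sorted_in_block =
   sort { key_of($a) <=> key_of($b) }
      grep { /^\d+\.xml\z/ } @entries;
my $in_block = $extractions;

# Variant 2: extract once per entry, sort plain numbers, rebuild the names.
$extractions = 0;
my @sorted_mapped =
   map { "$_.xml" }
      sort { $a <=> $b }
         map { $extractions++; /^(\d+)\.xml\z/ } @entries;
my $mapped = $extractions;

print "in-block extractions: $in_block, map extractions: $mapped\n";
```

Any sort of n items needs at least n - 1 comparisons, so the in-block count is always at least 2(n - 1), while the map version does exactly one extraction per directory entry.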

The simplest and fastest solution, however, is to use a natural sort.

use Sort::Key::Natural qw( natsort );

my @files = natsort grep !/^\.\.?\z/, readdir(XMLDIR);
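Sort::Key::Natural is a CPAN module, not part of core Perl. If it is not installed, a hand-written natural comparison is a rough stand-in. This is a simpler technique than whatever the module does internally, and the natural_cmp name is hypothetical:

```perl
use strict;
use warnings;

# Rough natural comparison: split each name into runs of digits and
# non-digits, compare digit runs numerically and everything else as strings.
# Very long digit runs may lose precision under <=>, depending on the perl build.
sub natural_cmp {
    my @x = $_[0] =~ /(\d+|\D+)/g;
    my @y = $_[1] =~ /(\d+|\D+)/g;
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $c = ($p =~ /^\d/ && $q =~ /^\d/) ? $p <=> $q : $p cmp $q;
        return $c if $c;
    }
    return @x <=> @y;    # the name with fewer runs left sorts first
}

# Made-up names standing in for readdir output.
my @names  = ('file10.xml', '2384747.xml', '.', 'file2.xml', '9123.xml', '..', 'file1.xml');
my @sorted = sort { natural_cmp($a, $b) } grep { !/^\.\.?\z/ } @names;

print "@sorted\n";    # 9123.xml 2384747.xml file1.xml file2.xml file10.xml
```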



Answer 2:


You are actually pretty close. Just strip off the ".xml" inside your comparison:

opendir(XMLDIR,$xmldirname);
my @files = sort {substr($a, 0, index($a, '.')) <=> substr($b, 0, index($b, '.'))} readdir(XMLDIR);
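Two details of the index approach are worth knowing: the . and .. entries from readdir yield an empty key (a numeric warning under use warnings), and for a name with no dot index returns -1, so substr drops the last character instead of returning the whole name. A minimal sketch that guards against both, using a hypothetical before_first_dot helper and made-up entries:

```perl
use strict;
use warnings;

# Hypothetical helper: the part of the name before the first '.', or the
# whole name when there is no dot at all.
sub before_first_dot {
    my ($name) = @_;
    my $dot = index($name, '.');
    return $dot == -1 ? $name : substr($name, 0, $dot);
}

# Without the guard, index returns -1 and substr drops the last character.
my $plain   = substr('README', 0, index('README', '.'));   # 'READM'
my $guarded = before_first_dot('README');                  # 'README'

# Keep only "<digits>.xml" names so '.' and '..' never reach the comparison.
my @entries = ('.', '..', '2384747.xml', 'README', '9123.xml', '15.xml');
my @files =
   sort { before_first_dot($a) <=> before_first_dot($b) }
      grep { /^\d+\.xml\z/ } @entries;

print "$plain $guarded @files\n";    # READM README 15.xml 9123.xml 2384747.xml
```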



Answer 3:


The problem is that <=> expects numbers. A string that is not entirely numeric is compared by its leading digits, but if you use warnings; you get a message similar to this at run time:

Argument "11139.xml" isn't numeric in sort at testsort.pl line 9.

What you can do is separate the file name from the extension, sort numerically on the file name, and then re-attach the extensions. This is fairly straightforward with a Schwartzian transform:

use strict;
use warnings; 

use Data::Dumper; 

# get all of the XML files
my @xml_files = glob("*.xml");

print 'Unsorted: ' . Dumper \@xml_files; 
@xml_files = map  { join '.', @$_ }              # join filename and extension
             sort { $a->[0] <=> $b->[0] }        # sort against filename
             map  { [split /\./] } @xml_files;   # split on '.'
print 'Sorted: ' . Dumper \@xml_files; 

__END__
Unsorted: $VAR1 = [
          '11139.xml',
          '18136.xml',
          '28715.xml',
          '6810.xml',
          '9698.xml'
        ];
Sorted: $VAR1 = [
          '6810.xml',
          '9698.xml',
          '11139.xml',
          '18136.xml',
          '28715.xml'
        ];
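A common variant of the Schwartzian transform caches the numeric key next to the untouched original name, so nothing has to be re-joined afterwards. With split, trailing empty fields are dropped, so a name ending in a dot would not round-trip. A minimal sketch with invented names; giving names without leading digits a key of 0 is an assumption of this example:

```perl
use strict;
use warnings;

# Invented names; one has an extra dot to show only the leading digits matter.
my @xml_files = ('11139.xml', '6810.backup.xml', '28715.xml', '9698.xml');

my @sorted =
   map  { $_->[1] }                        # 3. keep only the original name
   sort { $a->[0] <=> $b->[0] }            # 2. sort on the cached number
   map  { [ /^(\d+)/ ? $1 : 0, $_ ] }      # 1. pair [number, name]; no digits -> 0
   @xml_files;

print "$_\n" for @sorted;    # 6810.backup.xml, 9698.xml, 11139.xml, 28715.xml
```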



Answer 4:


my @files =  sort {
    my ($x) = split /\./, $a;
    my ($y) = split /\./, $b;
    $x <=> $y
} readdir(XMLDIR);

Or without the temporary variables:

my @files =  sort {(split /\./, $a)[0] <=> (split /\./, $b)[0]} readdir(XMLDIR);
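When two names have the same numeric value (for example 007.xml and 7.xml), the comparison returns 0 and their relative order is left to sort. Adding a string comparison as a tie-breaker makes the order fully deterministic. A minimal sketch with made-up names:

```perl
use strict;
use warnings;

# Invented names: 007.xml and 7.xml have the same numeric value.
my @names = ('7.xml', '12.xml', '007.xml', '3.xml');

my @files = sort {
    my ($x) = split /\./, $a;
    my ($y) = split /\./, $b;
    $x <=> $y || $a cmp $b    # fall back to a string compare on numeric ties
} @names;

print "@files\n";    # 3.xml 007.xml 7.xml 12.xml
```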


Source: https://stackoverflow.com/questions/25206041/sorting-file-names-by-numeric-value
