How can I search CSS with Perl?

Submitted by 可紊 on 2019-12-24 12:53:53

Question


First question from a long-time user.

I'm writing a Perl script that will go through a number of HTML files, search them line by line for instances of "color:" or "background-color:" (the CSS properties) and print the entire line when it comes across one of these instances. This is fairly straightforward.

Now I'll admit I'm still a beginning programmer, so this next part may be extremely obvious, but that's why I came here :).

When it finds an instance of "color:" or "background-color:", I want it to trace back, find the name of the element the rule belongs to, and print that as well. For example:

If my document contained the following CSS:

.css_class {
    font-size: 18px;
    font-weight: bold;
    color: #FFEFA1;
    font-family: Arial, Helvetica, sans-serif;
}

I would want the script to output something like:

css_class,#FFEFA1

Ideally it would output this as a text file.

I would greatly appreciate any advice that could be given to me regarding this!

Here is my script in full thus far:

$color = "color:";


open (FILE, "index.html");  
@document = <FILE>;
close (FILE);  

foreach $line (@document){  
    if($line =~ /$color/){  
        print $line;  
    }  
}   

Answer 1:


Since you asked for advice (and this isn't a coding service) I'll offer just that.

Always use strictures and warnings:

use strict;
use warnings;

Always check the return value of open calls:

open(FILE, 'filename') or die "Can't read file 'filename' [$!]\n";

Use the three-arg form of open and lexical filehandles instead of globs:

open(my $fh, '<', 'filename') or die "Can't read file 'filename' [$!]\n";

Don't slurp when line-by-line processing will do:

while (my $line = <$fh>) {
    # do something with $line
}

Use backreferences to retrieve data from regex matches:

if ($line =~ /color *: *(#[0-9a-fA-F]{6})/) {
    # color value is in $1
}

Save the class name in a temporary variable so that you have it when you match a color:

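# declare "my $class;" before the read loop so the value survives across lines
if ($line =~ /^\.(\w+) *\{/) {    # the leading dot must be escaped
    $class = $1;
}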
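
Put together, the pieces above might look like the following minimal sketch. The sub name extract_colors and the output file name colors.txt are illustrative assumptions, not part of the question. It also assumes each rule starts with a single class selector of the form ".name {" on its own line and that colors are six-digit hex values, as in the example.

```perl
use strict;
use warnings;

# Hypothetical helper: reads a filehandle line by line and returns
# a list of "class,color" strings.
sub extract_colors {
    my ($fh) = @_;
    my ($class, @found);
    while (my $line = <$fh>) {
        # Remember the most recent class name, e.g. ".css_class {"
        if ($line =~ /^\s*\.([\w-]+) *\{/) {
            $class = $1;
        }
        # Matches both "color:" and "background-color:"
        if (defined $class && $line =~ /color *: *(#[0-9a-fA-F]{6})/) {
            push @found, "$class,$1";
        }
        # A closing brace ends the current rule
        $class = undef if $line =~ /\}/;
    }
    return @found;
}

# Usage: perl script.pl index.html other.html
# Results go to colors.txt (an assumed file name).
if (@ARGV) {
    open(my $out, '>', 'colors.txt') or die "Can't write 'colors.txt' [$!]\n";
    for my $file (@ARGV) {
        open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
        print {$out} "$_\n" for extract_colors($fh);
        close($fh);
    }
    close($out);
}
```

Keeping the matching logic in a sub that takes a filehandle makes it easy to try on a string first, since Perl can open a filehandle on a scalar reference.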



Answer 2:


Well, this is not as simple as it seems.

CSS classes can be defined in many ways. For example,

    .classy {
         color: black;
    }

Good luck using a line-by-line approach for parsing that.

Actually, my first approach would be searching CPAN. This looks promising:

CSS - Object oriented access to Cascading Style Sheets (CSS)

Edit:

I installed HTML::TreeBuilder and CSS modules from CPAN and concocted the following aberration:

use strict;
use warnings;
use HTML::TreeBuilder;
use CSS;

foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new; # empty tree
    $tree->parse_file($file_name);

    # In list context find() returns every matching element,
    # so this loops over all <style> blocks (or none).
    foreach my $style ($tree->find('style')) {
        # This is an insane hack, not guaranteed
        # to work in the future.
        my $css = CSS->new;
        $css->read_string(join "\n", @{$style->{_content}});

        print $css->output;
    }
    $tree = $tree->delete;
}

This only prints all the CSS selectors from a list of HTML files, but they are nicely formatted, so you should be able to continue from here.




Answer 3:


For yet another way to do it, you can ask Perl to read from the file in sections other than lines, for example by using "}" as the record separator.

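my $color = "color:";

open(my $fh, '<', "index.html") or die "Can't open file: $!";

{
    local $/ = "}";    # read one "}"-terminated section at a time
    while (my $section = <$fh>) {
        if ($section =~ /$color\s*([^;}]*)/) {
            my $value = $1;    # save before the next match resets $1
            my ($selector) = $section =~ /(.*?)\s*\{/;
            print "$selector, $value\n" if defined $selector;
        }
    }
}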

Untested! Also, this of course assumes that your CSS neatly ends its sections with a } on a line of its own.




Answer 4:


I'm not having problems with the regex's but rather with the capture of data. Since CSS elements are typically multi-line, I need to figure out how to create an array between the { and } with each linebreak as a delimiter for list items.

No, you don't.

For the problem as stated, the only lines of interest will be those containing either a class name or a color definition, and possibly also lines containing } to mark the end of a class. All other lines can be ignored, so there's no need to put them into an array.

Since class specifications cannot be nested[1], the last seen set of class names will always be the active set of classes. Therefore, you need only record the last seen set of class names and, when a color specification is encountered, print those class names.

There are still some potential difficulties handling cases in which a specification block is shared by multiple classes (.foo, .bar, .baz { ... }), which may or may not be spread across multiple lines, or if multiple attributes are defined on the same line, but dealing with those should follow fairly easily from what I've already laid out. Depending on your input data, you may also need to include a basic state engine to keep track of whether you're in comments or not.

[1] i.e., although you can have semantically nested classes, such as .foo and .foo .bar, they have to be specified in the CSS file as

.foo {
  ...
}
.foo .bar {
  ...
}

and cannot be

.foo {
  ...
  .bar {
    ...
  }
}
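
The approach described in this answer (remember the last seen set of class names, report them when a color turns up, and track whether you are inside a comment) could be sketched roughly as follows. The sub name colors_by_class is a hypothetical choice, and the sketch assumes the input is plain CSS text, such as the contents of a style element, rather than a whole HTML page.

```perl
use strict;
use warnings;

# Hypothetical helper: reads CSS line by line from a filehandle and returns
# one "class,color" string per class in the active selector list.
sub colors_by_class {
    my ($fh) = @_;
    my $in_comment = 0;     # inside a /* ... */ comment?
    my $in_block   = 0;     # between { and } ?
    my $pending    = '';    # selector text collected since the last rule
    my (@classes, @found);

    while (my $line = <$fh>) {
        # Finish a comment that was opened on an earlier line
        if ($in_comment) {
            next unless $line =~ s{^.*?\*/}{};
            $in_comment = 0;
        }
        $line =~ s{/\*.*?\*/}{}g;                  # comments closed on this line
        $in_comment = 1 if $line =~ s{/\*.*$}{};   # comment left open

        # Split at braces and semicolons, keeping them as separate pieces
        for my $piece (split /([{};])/, $line) {
            if ($piece eq '{') {
                # The selector list is complete: record every class in it
                @classes  = $pending =~ /\.([\w-]+)/g;
                $pending  = '';
                $in_block = 1;
            }
            elsif ($piece eq '}') {
                @classes  = ();
                $in_block = 0;
            }
            elsif ($piece eq ';') {
                $pending = '' unless $in_block;    # e.g. after an @import
            }
            elsif (!$in_block) {
                $pending .= $piece;                # selectors may span lines
            }
            elsif ($piece =~ /(?:^|\s)(?:background-)?color\s*:\s*(\S.*?)\s*$/) {
                my $value = $1;
                push @found, map { "$_,$value" } @classes;
            }
        }
    }
    return @found;
}

# Usage: perl script.pl styles.css
if (@ARGV) {
    for my $file (@ARGV) {
        open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
        print "$_\n" for colors_by_class($fh);
        close($fh);
    }
}
```

Splitting each line at braces and semicolons is what lets several properties share a line, and accumulating the selector text until the opening brace covers selector lists that are spread over multiple lines.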



Answer 5:


I have not tested the code below, but something like this should work, provided the whole rule sits on a single line:

if ($line =~ m/\.(.*?) \{(.*?)color:(.*?);(.*)/) {
 print "$1,$3\n";
}

You should invest some time learning regular expressions for Perl.
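
For rules that span several lines, as in the question's example, one option is to read the whole text into a string and let the regex cross line breaks. The following is only a sketch along those lines: the sub name class_colors is hypothetical, and it assumes the class name appears immediately before the opening brace and that rule bodies contain no nested braces.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "class,color" for each color found in a CSS string.
sub class_colors {
    my ($css) = @_;
    my @found;
    # /g walks over every ".name { ... }" rule; [^}]* also matches newlines
    while ($css =~ /\.([\w-]+)\s*\{([^}]*)\}/g) {
        my ($class, $body) = ($1, $2);
        # The lookbehind stops names such as "border-color" from matching
        while ($body =~ /(?<![\w-])(?:background-)?color\s*:\s*([^;\s][^;]*?)\s*(?:;|$)/g) {
            push @found, "$class,$1";
        }
    }
    return @found;
}

# Usage: perl script.pl index.html
if (@ARGV) {
    for my $file (@ARGV) {
        open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
        my $text = do { local $/; <$fh> };    # slurp the whole file
        close($fh);
        print "$_\n" for class_colors($text);
    }
}
```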



Source: https://stackoverflow.com/questions/952283/how-can-i-search-css-with-perl
