Automating a Job at Work: Importing PowerPoint Bullet Text into an Excel Sheet

Submitted by ∥☆過路亽.° on 2019-12-04 09:39:30

Here is a sample script using Win32::OLE.

By the way, once you have converted the slides into a format you can process, you can use Spreadsheet::WriteExcel to write the output, even on non-MS systems. I would therefore recommend two programs: one to transform the PowerPoint documents and another to generate the Excel files.
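
As a rough illustration of the second program, here is a minimal sketch that reads the text dump printed by the sample script in this answer and writes it to an .xls file. The sub names (parse_slide_dump, write_xls), the three-column layout, and the sheet name are assumptions made for this example; it also assumes that the bullet text itself contains no tab characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the dump ("=== Begin Slide N ===" ... "=== End Slide N ===")
# into rows of [slide number, marker, text]. Hypothetical row layout.
sub parse_slide_dump {
    my ($fh) = @_;
    my ( @rows, $slide );
    while ( my $line = <$fh> ) {
        $line =~ s/[\r\n]+\z//;
        if    ( $line =~ /^=== Begin Slide (\d+) ===$/ ) { $slide = $1 }
        elsif ( $line =~ /^=== End Slide \d+ ===$/ )     { undef $slide }
        elsif ( defined $slide and length $line ) {
            # Bulleted lines look like "<marker>\t<text>";
            # lines without a tab are treated as unbulleted text.
            my ( $marker, $text ) =
                $line =~ /^([^\t]*)\t(.*)$/ ? ( $1, $2 ) : ( '', $line );
            push @rows, [ $slide, $marker, $text ];
        }
    }
    return \@rows;
}

# Write the rows with Spreadsheet::WriteExcel. The module is loaded
# lazily so that the parser can be used on its own.
sub write_xls {
    my ( $rows, $xls_file ) = @_;
    require Spreadsheet::WriteExcel;
    my $workbook = Spreadsheet::WriteExcel->new($xls_file)
        or die "Cannot create $xls_file: $!\n";
    my $worksheet = $workbook->add_worksheet('Bullets');

    my @header = ( 'Slide', 'Marker', 'Text' );
    $worksheet->write_string( 0, $_, $header[$_] ) for 0 .. $#header;

    my $r = 1;
    for my $row (@$rows) {
        my ( $slide, $marker, $text ) = @$row;
        $worksheet->write_number( $r, 0, $slide );
        # write_string, not write: the generic write() would treat
        # a marker such as "=" as the start of a formula.
        $worksheet->write_string( $r, 1, $marker );
        $worksheet->write_string( $r, 2, $text );
        $r++;
    }
    $workbook->close;
}

# Usage: perl dump2xls.pl slides.txt bullets.xls
if ( @ARGV == 2 ) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    write_xls( parse_slide_dump($in), $ARGV[1] );
}
```

Keeping the parser separate from the writer means the extraction step has to run on Windows with PowerPoint installed, while the Excel step can run anywhere Perl and Spreadsheet::WriteExcel are available.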

Note that an excellent source of information for Microsoft Office applications is the Object Browser. You can access it via Tools → Macro → Visual Basic Editor. Once you are in the editor, hit F2 to browse the interfaces, methods, and properties provided by Microsoft Office applications.

#!/usr/bin/perl

use strict;
use warnings;

use FindBin qw( $Bin );
use File::Spec::Functions qw( catfile );

use Win32::OLE;
use Win32::OLE::Const 'Microsoft PowerPoint';
# Die with a full error message on any OLE error
$Win32::OLE::Warn = 3;

my $ppt = get_ppt();
$ppt->{Visible} = 1;

my $ppt_file = catfile $Bin, 'test.ppt';
my $doc = $ppt->Presentations->open( $ppt_file );
my $slides = $doc->Slides;
my $num_slides = $slides->Count;

for my $slide_idx (1 .. $num_slides) {
    print "=== Begin Slide $slide_idx ===\n";

    my $slide = $slides->Item( $slide_idx );
    my $shapes = $slide->Shapes;
    my $num_shapes = $shapes->Count;

    for my $shape_idx (1 .. $num_shapes) {
        my $shape = $shapes->Item($shape_idx);
        # Skip pictures, lines and other shapes that cannot hold text
        next unless $shape->HasTextFrame;

        my $pars = $shape->TextFrame->TextRange->Paragraphs;
        my $num_pars = $pars->Count;
        for my $par_idx (1 .. $num_pars) {
            my $par = $pars->Paragraphs($par_idx,1);
            print_par( $par );
        }
    }

    print "=== End Slide $slide_idx ===\n\n";
}

sub print_par {
    my ($par) = @_;
    my @bullets = qw( - * > + = @ );

    my $bullet_format = $par->ParagraphFormat->Bullet;
    my $bullet_type = $bullet_format->Type;

    my $bullet_char = '';

    if ($bullet_type == ppBulletNumbered) {
        $bullet_char = $bullet_format->Number . "\t";
    }
    elsif( $bullet_type == ppBulletUnnumbered ) {
        # Need a Unicode => ASCII mapping if you want to use
        # $bullet_format->Character
        # IndentLevel is 1-based, so subtract 1 to start at $bullets[0]
        my $indent = ($par->IndentLevel - 1) % scalar @bullets;
        $bullet_char = $bullets[$indent] . "\t";
    }

    my $text = $par->Text;
    $text =~ s/\s+$//;

    print $bullet_char, $text,"\n";
}

sub get_ppt {
    my $app;
    eval {
        $app = Win32::OLE->GetActiveObject('PowerPoint.Application');
    };

    die "$@\n" if $@;

    unless($app) {
        $app = Win32::OLE->new(
            'PowerPoint.Application', sub { $_[0]->Quit }
        ) or die "Oops, cannot start PowerPoint: ",
                 Win32::OLE->LastError, "\n";
    }
    return $app;
}
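
The comment in print_par mentions a Unicode => ASCII mapping for $bullet_format->Character. A minimal sketch of such a mapping could look like the following. The table entries, the sub name ascii_bullet, and the fallback character are all assumptions chosen for illustration. Symbol fonts such as Wingdings use font-specific codes, so a real table would have to be built from the presentations you actually process.

```perl
use strict;
use warnings;

# Hypothetical lookup table: Unicode code point => ASCII stand-in.
my %ascii_for = (
    0x2022 => '*',    # BULLET
    0x25CF => '*',    # BLACK CIRCLE
    0x25CB => 'o',    # WHITE CIRCLE
    0x25A0 => '+',    # BLACK SQUARE
    0x25AA => '+',    # BLACK SMALL SQUARE
    0x2013 => '-',    # EN DASH
    0x2014 => '-',    # EM DASH
    0x2192 => '>',    # RIGHTWARDS ARROW
);

# Map a bullet code point to one ASCII character.
# Unknown or undefined codes fall back to $fallback (default '*').
sub ascii_bullet {
    my ( $code, $fallback ) = @_;
    $fallback = '*' unless defined $fallback;
    return $fallback unless defined $code;

    # Printable ASCII is passed through unchanged.
    return chr $code if $code >= 0x21 && $code <= 0x7E;

    return exists $ascii_for{$code} ? $ascii_for{$code} : $fallback;
}
```

In print_par, the indent-based lookup could then be replaced with something like $bullet_char = ascii_bullet( $bullet_format->Character ) . "\t"; assuming Character returns the numeric code point of the bullet.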

It can be done with Perl. Pretty much anything you can do with VBA can be done with Perl via Win32::OLE. I've used the Win32::OLE module to work with MS Office documents, both extracting and creating content. It's been a while, though. Start at http://win32.perl.org/wiki/index.php?title=Win32_Perl_Modules, about the middle of the page.

The VBA documentation for each object is a useful reference for finding out which objects exist and which methods and properties they provide.

Puh... you'll have a hard time dealing with MS Office files using Perl, PHP or Java. When I had to automate MS Office files, I most often used VBA (Visual Basic for Applications). Take a look at it. For a lot of tasks you can just record a macro, look at the generated code, and learn how things are referenced. Then take pieces of the generated code and build your own VBA modules and tools. I could imagine this functionality fitting nicely as an add-in for PowerPoint.

Is Visual Basic for Applications not available to you? It is built into Office, and since you're going Office-to-Office it might be easier.

You could use OpenOffice.org's presentation app (Impress) to import the PowerPoint file. You could then save it in the native OpenOffice.org format (.odp), which is XML packed in a ZIP archive, with the slide text in content.xml. You should then be able to parse that plain-text XML with the language of your choice.

As other people have pointed out, if you want to work with the PowerPoint format directly, you really need to use a Microsoft language (VB, VBA, C#, etc.).

You may want to look into programs that convert PPT to CSV, possibly with PDF as an intermediate step. Once the data is in CSV format, you should be able to process it much more easily with PHP or Perl.

Doing this from scratch will be very time consuming because the Office document formats are very complicated in general.

If you have the Zend Framework available, it can help considerably. See here for helpful documentation. See here for writing to Excel files.

This sounds a lot like what I do at work, though I work mostly in Excel and Word. Your best bet would be to use VBA in PowerPoint to look at each slide and find the bullets. Then write them to a file in CSV format, one bullet per line; the file will open in Excel with each bullet on its own row.
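
Whichever language does the extraction, the CSV step needs a little care because bullet text often contains commas and quotes. Here is a minimal sketch in Perl, the language of the script earlier in this thread. The sub names csv_field and csv_line and the sample rows (slide number, indent level, text) are made up for illustration.

```perl
use strict;
use warnings;

# Quote one field using the usual CSV rules: wrap it in double quotes
# if it contains a comma, a quote or a line break, and double any
# embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ( $value =~ /[",\r\n]/ ) {
        $value =~ s/"/""/g;
        return qq{"$value"};
    }
    return $value;
}

# Join a list of fields into one CSV line.
sub csv_line {
    return join( ',', map { csv_field($_) } @_ ) . "\n";
}

# Hypothetical sample data: [slide number, indent level, bullet text]
my @rows = (
    [ 1, 1, 'Goals for Q3' ],
    [ 1, 2, 'Cut costs, raise quality' ],
    [ 2, 1, 'The "stretch" target' ],
);

print csv_line( 'Slide', 'Level', 'Text' );
print csv_line(@$_) for @rows;
```

Redirecting the output to a file with a .csv extension gives something Excel opens with one bullet per row. If the Text::CSV module is installed, it handles the same quoting rules and is the more robust choice.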

Finding what is a bullet and what isn't might be tricky. Try recording some macros while selecting, adding and deleting bullets, and maybe changing the level of a few. That should give you enough information about which objects to look for and how to work with them.
