Convert Word doc or docx files into text files?

难免孤独 2020-12-05 01:28

I need a way to convert .doc or .docx extensions to .txt without installing anything. I also don't want to have to manually open Wor

11 answers
  •  天涯浪人
    2020-12-05 01:53

    Sinan Ünür's method works well.
    However, it crashed on some of the files I was converting.

    Another method is to use Win32::OLE and Win32::Clipboard as follows:

    • Open the Word document
    • Select all the text
    • Copy it to the clipboard
    • Write the clipboard content to a txt file
    • Empty the clipboard and close the Word document

    Based on the script given by Sigvald Refsu in http://computer-programming-forum.com/53-perl/c44063de8613483b.htm, I came up with the following script.

    Note: I chose to save the txt file with the same basename as the .docx file and in the same folder, but this can easily be changed.

    ###########################################
    use strict;
    use warnings;
    use File::Spec::Functions qw( catfile );
    use FindBin '$Bin';
    use Win32::OLE qw(in with);
    use Win32::OLE::Const 'Microsoft Word';
    use Win32::Clipboard;

    my $monitor_word = 0; # set to 1 to watch MS Word being opened and closed

    sub docx2txt {
        ## Note: the path shall be in the form "C:\dir\ with\ space\file.docx"
        my $docx_file = shift;

        # MS Word object ('Quit' is called on Word when the object is destroyed)
        my $Word = Win32::OLE->new('Word.Application', 'Quit') or die "Couldn't run Word";
        # Monitor what happens in MS Word
        $Word->{Visible} = 1 if $monitor_word;

        # Open file
        my $Doc = $Word->Documents->Open($docx_file) or die "Couldn't open $docx_file";
        with ($Doc, ShowRevisions => 0); # Turn off revision marks

        # Select the complete document
        $Doc->Select();
        my $Range = $Word->Selection();
        with ($Range, ExtendMode => 1);
        $Range->SelectAll();

        # Copy selection to clipboard
        $Range->Copy();

        # Create txt file next to the source file (accepts .doc and .docx);
        # refuse other names so the source file is never overwritten
        my $txt_file = $docx_file;
        $txt_file =~ s/\.docx?$/.txt/i
            or die "Not a .doc or .docx file: $docx_file";
        open(my $out, '>', $txt_file)
            or die "Error while trying to write to $txt_file ($!)";
        print {$out} Win32::Clipboard::Get(), "\n";
        close $out;

        # Empty the clipboard (to prevent warning about "huge amount of data in clipboard")
        Win32::Clipboard::Set("");

        # Close Word file without saving
        $Doc->Close({SaveChanges => wdDoNotSaveChanges});

        # Disconnect OLE
        undef $Word;
    }
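
    On the note about changing where the txt file is saved: one possible approach, sketched here and not part of the script above, is to compute the output path in a small helper. The name txt_path_for and its optional $out_dir argument are hypothetical; it only uses the core modules File::Basename and File::Spec::Functions.

```perl
use strict;
use warnings;
use File::Basename qw( fileparse );
use File::Spec::Functions qw( catfile );

# Hypothetical helper (an assumption, not from the original script):
# build the .txt path for a .doc/.docx file, optionally in another folder.
sub txt_path_for {
    my ($doc_file, $out_dir) = @_;

    # Split into basename, directory and the matched .doc/.docx suffix
    my ($base, $dir, $ext) = fileparse($doc_file, qr/\.docx?/i);
    die "Not a .doc or .docx file: $doc_file" unless $ext;

    # Default to the folder that holds the source document
    $out_dir = $dir unless defined $out_dir;
    return catfile($out_dir, "$base.txt");
}
```

    Inside docx2txt, the result of such a helper could replace the substitution that derives $txt_file. Dying on any other extension keeps the source file from ever being used as the output path.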
    

    Hope this helps.
