Here is the process I ended up using. The main tool was Inkscape, which handled the text conversion acceptably.
- Used Adobe Acrobat Pro Actions with JavaScript to split up the PDF sheets
- Ran Inkscape Portable 0.48.5 from Windows Cmd to convert each PDF to SVG
- Used Windows Cmd and Windows PowerShell to make some manual edits to a particular SVG XML attribute that I was having issues with
Separate Pages: Adobe Acrobat Pro with JavaScript
Using Adobe Acrobat Pro Actions (formerly Batch Processing), create a custom action that extracts each PDF page into its own file. Alternatively, you may be able to split PDFs with GhostScript.
Acrobat JavaScript Action to split pages
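/* Extract Pages to Folder */
// Strip the folder path and the ".pdf" extension from the document path.
var re = /.*\/|\.pdf$/ig;
var filename = this.path.replace(re, "");
// Save each page as <filename>_sNNN.pdf (NNN = zero-padded, 1-based page number).
for (var i = 0; i < this.numPages; i++) {
    this.extractPages({
        nStart: i,
        nEnd: i,
        cPath: filename + "_s" + ("000000" + (i + 1)).slice(-3) + ".pdf"
    });
}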
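To make the naming scheme in the action above easier to check outside Acrobat, here is a minimal sketch in plain JavaScript. `pageFileName` is a hypothetical helper, and the sample paths are made up for illustration. Note that `slice(-3)` keeps only the last three digits, so a document with more than 999 pages would produce colliding names.

```javascript
// Hypothetical helper mirroring the naming logic of the Acrobat action.
function pageFileName(docPath, pageIndex) {
  // Strip everything up to the last "/" and a trailing ".pdf" (case-insensitive).
  var base = docPath.replace(/.*\/|\.pdf$/ig, "");
  // Left-pad the 1-based page number with zeros and keep the last three digits.
  var sheet = ("000000" + (pageIndex + 1)).slice(-3);
  return base + "_s" + sheet + ".pdf";
}

// Sample paths are assumptions, not taken from the original post.
console.log(pageFileName("/C/drawings/Plans.PDF", 0));  // Plans_s001.pdf
console.log(pageFileName("/C/drawings/Plans.pdf", 11)); // Plans_s012.pdf
```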
PDF to SVG Conversion: Inkscape with Windows CMD batch file
Using Windows Cmd, I created a batch file that loops through all PDF files in a folder and converts them to SVG.
Batch file to convert PDF to SVG in current folder
:: ===== SETUP =====
@echo off
CLS
echo Starting SVG conversion...
echo.
:: setup working directory (if different)
REM set "_work_dir=%~dp0"
set "_work_dir=%CD%"
:: setup counter
set "count=1"
:: setup file search and save string
set "_work_x1=pdf"
set "_work_x2=svg"
set "_work_file_str=*.%_work_x1%"
:: setup inkscape commands
set "_inkscape_path=D:\InkscapePortable\App\Inkscape\"
set "_inkscape_cmd=%_inkscape_path%inkscape.exe"
:: ===== FIND FILES IN WORKING DIRECTORY =====
:: Output from DIR last element is single carriage return character.
:: Carriage return characters are directly removed after percent expansion,
:: but not with delayed expansion.
pushd "%_work_dir%"
FOR /f "tokens=*" %%A IN ('DIR /A:-D /O:N /B %_work_file_str%') DO (
CALL :subroutine "%%A"
)
popd
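:: ===== CONVERT PDF TO SVG WITH INKSCAPE =====
:: After the loop, execution falls through into :subroutine once with no
:: argument, which prints "End of output" and exits via GOTO :eof.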
:subroutine
echo.
IF NOT [%1]==[] (
echo %count%:%1
set /A count+=1
start "" /D "%_work_dir%" /W "%_inkscape_cmd%" --without-gui --file="%~n1.%_work_x1%" --export-dpi=300 --export-plain-svg="%~n1.%_work_x2%"
) ELSE (
echo End of output
)
echo.
GOTO :eof
:: ===== INKSCAPE REFERENCE =====
:: print inkscape help
REM "%_inkscape_cmd%" --help > "%~dp0\inkscape_help.txt"
REM "%_inkscape_cmd%" --verb-list > "%~dp0\inkscape_verb_list.txt"
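For reference, the arguments that the batch file passes to Inkscape for each file can be sketched as follows. This is plain JavaScript for illustration only and does not launch Inkscape; `inkscapeArgs` is a hypothetical helper and the sample file name is made up. The flags are the Inkscape 0.48 ones used in the batch file; later major versions changed the command-line interface, so check `inkscape --help` for the version you have.

```javascript
// Hypothetical helper: build the per-file argument list used by the batch file.
function inkscapeArgs(pdfFile, dpi) {
  // Like %~n1 in the batch file: drop any folder part, then the last extension.
  var stem = pdfFile.replace(/^.*[\\\/]/, "").replace(/\.[^.]*$/, "");
  return [
    "--without-gui",
    "--file=" + stem + ".pdf",
    "--export-dpi=" + dpi,
    "--export-plain-svg=" + stem + ".svg"
  ];
}

// Sample file name is an assumption for illustration.
console.log(inkscapeArgs("Plans_s001.pdf", 300).join(" "));
```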
Cleanup attributes: Windows Cmd and PowerShell
I realize that brute-force editing of SVG or XML tags and attributes is not best practice, because the markup can vary, and that an XML parser should be used instead. However, my issues were simple: the stroke width in one drawing was very small, and in another the font family was identified incorrectly. So I modified the previous Windows Cmd batch script to do a simple find and replace. The only changes were to the search string definitions and to the conversion command, which now calls PowerShell. The PowerShell command performs the find and replace and saves the modified file with an added suffix.
I also found some references (listed below) that may be better suited to parsing or modifying the resulting SVG files if other minor cleanup is needed.
Modifications to manually find and replace SVG XML data
:: setup file search and save string
set "_work_x1=svg"
set "_work_x2=svg"
set "_work_s2=_mod"
set "_work_file_str=*.%_work_x1%"
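:: in :subroutine, replace the Inkscape "start" line with this PowerShell call
:: (note: -replace treats the search string as a regular expression)
powershell -Command "(Get-Content '%~n1.%_work_x1%') | ForEach-Object {$_ -replace 'stroke-width:0.06', 'stroke-width:1'} | ForEach-Object {$_ -replace 'font-family:Times Roman','font-family:Times New Roman'} | Set-Content '%~n1%_work_s2%.%_work_x2%'"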
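As a hedged alternative to the plain text replacement above, the sketch below matches whole `property:value` declarations inside `style="..."` attributes, so a value such as `0.0625` is left alone when only `0.06` should change. It is plain JavaScript rather than PowerShell, and it is still not a real XML parser. `patchStyles` and the `fixes` table are hypothetical names, and the code assumes double-quoted `style` attributes whose values contain no `;`.

```javascript
// Hypothetical helper: rewrite selected CSS declarations inside style="..." attributes.
// Assumes double-quoted style attributes and no ";" inside property values.
function patchStyles(svgText, fixes) {
  return svgText.replace(/style="([^"]*)"/g, function (match, style) {
    var patched = style.split(";").map(function (decl) {
      var idx = decl.indexOf(":");
      if (idx === -1) { return decl; } // not a property:value pair, keep as-is
      var name = decl.slice(0, idx).trim();
      var value = decl.slice(idx + 1).trim();
      var known = Object.prototype.hasOwnProperty.call(fixes, name);
      // Replace only when the whole value matches, not just a prefix of it.
      if (known && fixes[name].from === value) {
        return name + ":" + fixes[name].to;
      }
      return decl;
    }).join(";");
    return 'style="' + patched + '"';
  });
}

// The two fixes from the PowerShell command, expressed as a lookup table.
var fixes = {
  "stroke-width": { from: "0.06", to: "1" },
  "font-family": { from: "Times Roman", to: "Times New Roman" }
};

var sample = '<path style="fill:none;stroke-width:0.06"/><path style="stroke-width:0.0625"/>';
console.log(patchStyles(sample, fixes));
```

Only the first `path` in the sample is changed; the second keeps `stroke-width:0.0625` because its value is not an exact match.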
Hope this helps someone.
References
Adobe Acrobat Pro Actions and JavaScript references to Separate Pages
- How to automate extracting pages from a PDF...
- JavaScript for Acrobat API Reference - extractPages
- Extract pages to separate pdfs (something wrong with loop?)
- How can I create a Zerofilled value using JavaScript?
- How to output integers with leading zeros in JavaScript
GhostScript references to Separate Pages
- GhostScript noob help - Breaking a multipage PDF file...
- How to convert a multi-page PDF file...
- Splitting a PDF with Ghostscript
Inkscape Command Line references for PDF to SVG Conversion
- convert pdf to svg
- Convert PDF to clean SVG?
Windows Cmd Batch File Script references
- Hidden features of Windows batch files
- SS64.com - Index of the Windows CMD command line
- Why is the FOR /f loop in this batch script evaluating a blank line?
XML tag/attribute replacement research
- How can you find and replace text in a file using the Windows command-line environment?
- Changing tag data in an XML file using windows batch file
- update XML from the command line [windows]
- How to modify/create values in XML files using PowerShell?
- Editing XML Attributes using Powershell
- powershell change the value of XML Element attribute