How to extract table data from PDF as CSV from the command line?

生来不讨喜 2021-02-02 12:13

I want to extract all rows from here, ignoring the column headers as well as all page headers (i.e. "Supported Devices").

pdftotext -layout DAC06         


        
5 Answers
  •  自闭症患者
    2021-02-02 12:29

    This can be done easily with an IntelliGet (http://akribiatech.com/intelliget) script such as the one below:

    userVariables = brand, name, device, model;
    { start = Not(Or(Or(IsSubstring("Supported Devices",Line(0)),
                      IsSubstring("Retail Branding",Line(0))),
                    IsEqual(Length(Trim(Line(0))),0))); 
      brand = Trim(Substring(Line(0),10,44));
      name = Trim(Substring(Line(0),45,79));
      device = Trim(Substring(Line(0),80,114));
      model = Trim(Substring(Line(0),115,200));
      output = Concat(brand, ",", name, ",", device, ",", model);
    }
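
For readers without IntelliGet, the same fixed-width slicing can be sketched in Python. This is only a rough equivalent, not the script's exact behavior: it assumes the input is the text produced by `pdftotext -layout`, and it treats the column boundaries from the script above as 0-based, inclusive character positions, which may need shifting by one. The function names `layout_text_to_rows` and `rows_to_csv` are made up for this illustration.

```python
import csv
import io

# Column boundaries copied from the script above. Treating them as 0-based,
# inclusive character positions is an assumption; adjust if columns look shifted.
COLUMNS = [(10, 44), (45, 79), (80, 114), (115, 200)]

# Lines containing these strings are page headers or column headers.
SKIP_MARKERS = ("Supported Devices", "Retail Branding")


def layout_text_to_rows(lines):
    """Yield [brand, name, device, model] for every data line."""
    for line in lines:
        # Skip blank lines.
        if not line.strip():
            continue
        # Skip page headers and column headers.
        if any(marker in line for marker in SKIP_MARKERS):
            continue
        # Slice each fixed-width column and trim the padding.
        yield [line[start:end + 1].strip() for start, end in COLUMNS]


def rows_to_csv(rows):
    """Return the rows as CSV text, quoting fields that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

To use it as a command-line filter, one option is to write `rows_to_csv(layout_text_to_rows(sys.stdin))` to standard output and pipe the layout text into the script. Unlike the plain `Concat` above, the `csv` module quotes any field that itself contains a comma.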
    
