Question
I have some .tex files from which I want to extract the plain text, without any LaTeX markup such as \section{...} or \newpage.
Does anybody have an idea how to achieve this?
I also have the .pdf file, but when I copy the text from there, some words get concatenated, which makes the result hard to use.
Is there a tool for this?
Answer 1:
detex(1):
Please see the OpenDetex GitHub page for the latest version of OpenDetex. It is a more modern, derivative version of my original DeTeX.
My legacy DeTeX home page is available here.
If you just want the legacy detex-2.8.tar source, you can get it here.
Answer 2:
OpenDetex is available for both Windows and Linux.
Download OpenDetex from here:
http://opendetex.googlecode.com/files/opendetex-2.8.1.tar.bz2
http://code.google.com/p/opendetex/downloads/list
Usage: http://code.google.com/p/opendetex/wiki/Usage
Extract it to any directory of your choice, say the Downloads directory.
Optionally (but recommended), create another directory inside the extracted one, say "my_paper", and put your paper there. Say your paper is named project.tex.
Navigate to the OpenDetex directory:
cd ~/Downloads/opendetex
Run the command:
detex -n my_paper/project.tex > out.txt
The generic form is:
detex -n full_path_to_tex_file.tex > output_text_file.txt
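If you have several .tex files, the same command can be wrapped in a loop. The following is only a minimal sketch: the function name detex_dir and the .txt output naming are my own choices, not part of OpenDetex, and it assumes a detex binary is on your PATH (otherwise call it with its full path). The -n flag tells detex not to follow \input and \include commands, so each file is converted on its own.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenDetex): convert every .tex file
# in a directory to plain text, writing NAME.txt next to NAME.tex.
# Assumes a `detex` executable is available on PATH.
detex_dir() {
    dir=${1:-.}
    for tex in "$dir"/*.tex; do
        # If nothing matched, the pattern stays literal; skip it.
        [ -e "$tex" ] || continue
        # -n: do not follow \input and \include commands.
        detex -n "$tex" > "${tex%.tex}.txt"
    done
}
```

For example, detex_dir ~/Downloads/opendetex/my_paper would write project.txt next to project.tex.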
Answer 3:
Maybe not 100% what the OP requested, but it may be of some help.
There is pdftotext in poppler-utils. It converts a PDF file to a TXT file via
pdftotext yourPDF.pdf
Of course this means installing the package, but I think the overhead is negligible: if I remember correctly, Poppler is the standard library for rendering PDF on Linux, so if you have a PDF viewer installed (think Evince or Okular), it is likely to be installed already.
Find here some more instructions.
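Since the question mentions words running together when copying from the PDF, it may be worth trying pdftotext's -layout option, which asks it to preserve the physical layout of the page as far as possible. Whether that helps depends on the PDF, so treat this as a sketch to experiment with. The function name pdf_to_text and the default output name are my own assumptions; only the -layout flag and the "input, then output" argument order come from pdftotext itself.

```shell
#!/bin/sh
# Hypothetical wrapper: convert a PDF to text while keeping the page layout.
# Assumes `pdftotext` (from poppler-utils) is available on PATH.
pdf_to_text() {
    pdf=$1
    # Default output: same name as the PDF, with a .txt extension.
    out=${2:-${pdf%.pdf}.txt}
    # -layout: maintain the original physical layout as well as possible.
    pdftotext -layout "$pdf" "$out"
}
```

For example, pdf_to_text yourPDF.pdf would write yourPDF.txt, and pdf_to_text yourPDF.pdf out.txt would pick the output name explicitly.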
Source: https://stackoverflow.com/questions/829408/extract-text-from-tex-remove-latex-tags