How to replace a character INSIDE the text content of many files automatically?

Submitted by 和自甴很熟 on 2020-01-03 11:58:16

Question


I have a folder /myfolder containing many LaTeX tables.

I need to replace a character in each of them, namely replacing every minus sign - with an en dash –.

Just to be sure: we are replacing hyphens INSIDE all of the tex files in that folder. I don't care about the tex file names.

Doing that manually would be a nightmare (too many files, too many minuses). Is there a way to loop over the files automatically and do the replacement? A solution in Python/R would be great.

Thanks!


Answer 1:


sed -i -e 's/-/–/g' /myfolder/* should work.

The expression searches globally and replaces every - inside the files that the shell expands from /myfolder/* with –. sed makes the change in place, that is, it overwrites the original file (on macOS you need to explicitly specify a backup-file suffix; I can't remember the parameter though).

Absolutely no care is taken about whether the - is a verbatim hyphen or part of the LaTeX syntax. Be aware of that.
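The caveat above can be softened with a narrower pattern. The following is only a sketch in Python (which the question asked for), under an assumed heuristic that is not part of the answer: a - counts as a minus sign only when it directly precedes a number and does not follow a word character or another hyphen. The name replace_minus_signs is hypothetical.

```python
import re

EN_DASH = "\u2013"

# Heuristic (an assumption, not from the answer above): a "-" is a minus sign
# only if a digit (optionally after a decimal point) follows it and it is not
# preceded by a word character or another hyphen.
MINUS_BEFORE_NUMBER = re.compile(r"(?<![\w-])-(?=\.?\d)")

def replace_minus_signs(text):
    """Replace minus signs in front of numbers with en dashes."""
    return MINUS_BEFORE_NUMBER.sub(EN_DASH, text)

sample = r"x-axis & -1.5 & \cmidrule(lr){2-3} -- \\"
print(replace_minus_signs(sample))
```

This leaves x-axis, the 2-3 column range, and -- alone and changes only the -1.5. It is still a heuristic: a length such as [-1pt] would also be rewritten, so check the result on a copy of your files first.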




Answer 2:


To rename file names, use

rename 's/-/–/g' *

This replaces every hyphen in the file names with an en dash.

To replace the hyphens inside the file contents with en dashes, use

sed -i 's/-/–/g' *.tex



Answer 3:


Try with sed

find /home/milenko/pr -type f -exec \
sed -i 's/-/–/g' {} +

from the command line (if you are using Linux).

The -type f test restricts the search to regular files.

The find utility's -exec clause uses {} to represent the matched files.
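Since the question asked for Python, the find and sed combination above could be approximated as follows. This is a minimal sketch, assuming Python 3 and UTF-8 encoded files; the function name replace_in_tree and its parameters are hypothetical, and the .bak copy is an extra safety step that the sed command does not perform.

```python
import shutil
from pathlib import Path

def replace_in_tree(root, pattern="*.tex", old="-", new="\u2013"):
    """Replace `old` with `new` in every file under `root` matching `pattern`.

    Returns the list of files that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # newline="" keeps the original line endings untouched
        with path.open("r", encoding="utf-8", newline="") as f:
            text = f.read()
        if old not in text:
            continue
        # Keep a copy next to the original before overwriting it
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        with path.open("w", encoding="utf-8", newline="") as f:
            f.write(text.replace(old, new))
        changed.append(path)
    return changed

# Example call (the folder name comes from the question):
# replace_in_tree("/myfolder")
```

Like the sed command, this replaces every hyphen without regard to LaTeX syntax.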




Answer 4:


First, back all your files up before removing the ".bak" in the code. I don't want to cause you to lose something, or if my script misfires, I'd like you to be able to recreate what you have.

Second, this is probably not very good Python code, because I am not an expert. But it works, if you are editing in UTF-8. Because the en dash is not an ASCII character, a straight replace doesn't work. I confess I'm not quite sure what's going on here, so bigger Python experts may be able to sort out where I can do better.

# -*- coding: utf-8 -*-
# Written for Python 3

import glob
import shutil

def replace_file(file):
    print("Replacing " + file)
    # Read and write explicitly as UTF-8 so the en dash (U+2013) survives
    with open(file, "r", encoding="utf-8") as f, \
         open("temp", "w", encoding="utf-8") as temp:
        for line in f:
            temp.write(line.replace("-", "\u2013"))
    # Portable replacement for the Windows-only "copy" command.
    # The result goes to "<file>.bak"; remove the ".bak" only after backing up.
    shutil.copy("temp", file + ".bak")

for tex_file in glob.glob("*.tex"):
    replace_file(tex_file)
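The encoding issue this answer alludes to can be illustrated in isolation. The sketch below assumes Python 3 and UTF-8 files; the helper name replace_hyphens_in_bytes is hypothetical. It shows that the en dash has no single-byte ASCII form, so file contents have to be decoded to text, edited, and encoded again.

```python
EN_DASH = "\u2013"  # the en dash character

# In UTF-8 the en dash occupies three bytes; ASCII has no code for it at all
encoded = EN_DASH.encode("utf-8")
print(encoded)  # b'\xe2\x80\x93'

try:
    EN_DASH.encode("ascii")
except UnicodeEncodeError as exc:
    print("Not representable in ASCII:", exc.reason)

def replace_hyphens_in_bytes(raw, encoding="utf-8"):
    """Decode raw file bytes, replace hyphens with en dashes, and re-encode."""
    text = raw.decode(encoding)
    return text.replace("-", EN_DASH).encode(encoding)

print(replace_hyphens_in_bytes(b"a & -1"))
```

Opening files with an explicit encoding argument performs the same decode and encode steps for you.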



Answer 5:


Python Solution

import os

directory = os.getcwd()
for filename in os.listdir(directory):
    if "-" in filename:
        # Rename the file, replacing each hyphen in its name with an en dash (U+2013)
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, filename.replace("-", "\u2013")))

New solution to replace characters inside a file

U+2212 is the Unicode minus sign and U+2013 is the en dash (U+2014 is the em dash). If your files contain the ordinary keyboard hyphen-minus (U+002D) instead, replace "-" rather than "\u2212".

# Written for Python 3
import fnmatch
import os

directory = os.getcwd()

def _changefiletext(fileName):
    # Read and write as UTF-8 so the non-ASCII characters are handled correctly
    with open(fileName, 'r', encoding='utf-8') as f:
        text = f.read()
    # Replace the Unicode minus sign (U+2212) with an en dash (U+2013)
    text = text.replace("\u2212", "\u2013")
    with open(fileName, 'w', encoding='utf-8') as f:
        f.write(text)

# Filter the files on which you want to run the replace code (*.txt in this case)

matches = []
for root, dirnames, filenames in os.walk(directory):
    for filename in fnmatch.filter(filenames, '*.txt'):
        matches.append(os.path.join(root, filename))

for filename in matches:
    print("Converting file %s" % filename)
    _changefiletext(filename)


Source: https://stackoverflow.com/questions/44996829/how-to-replace-a-character-inside-the-text-content-of-many-files-automatically
