How can I convert tabs to spaces in every file of a directory (possibly recursively)?
Also, is there a way of setting the number of spaces per tab?
How can I convert tabs to spaces in every file of a directory (possibly recursively)?
This is usually not what you want.
Do you want to do this for png images? PDF files? The .git directory? Your
Makefile (which requires tabs)? A 5GB SQL dump?
You could, in theory, pass a whole lot of exlude options to find or whatever
else you're using; but this is fragile, and will break as soon as you add other
binary files.
What you want, is at least:
expand does this, sed
doesn't).As far as I know, there is no "standard" Unix utility that can do this, and it's not very easy to do with a shell one-liner, so a script is needed.
A while ago I created a little script called
sanitize_files which does exactly
that. It also fixes some other common stuff like replacing \r\n with \n,
adding a trailing \n, etc.
You can find a simplified script without the extra features and command-line arguments below, but I recommend you use the above script as it's more likely to receive bugfixes and other updated than this post.
I would also like to point out, in response to some of the other answers here,
that using shell globbing is not a robust way of doing this, because sooner
or later you'll end up with more files than will fit in ARG_MAX (on modern
Linux systems it's 128k, which may seem a lot, but sooner or later it's not
enough).
#!/usr/bin/env python
#
# http://code.arp242.net/sanitize_files
#
import os, re, sys
def is_binary(data):
return data.find(b'\000') >= 0
def should_ignore(path):
keep = [
# VCS systems
'.git/', '.hg/' '.svn/' 'CVS/',
# These files have significant whitespace/tabs, and cannot be edited
# safely
# TODO: there are probably more of these files..
'Makefile', 'BSDmakefile', 'GNUmakefile', 'Gemfile.lock'
]
for k in keep:
if '/%s' % k in path:
return True
return False
def run(files):
indent_find = b'\t'
indent_replace = b' ' * indent_width
for f in files:
if should_ignore(f):
print('Ignoring %s' % f)
continue
try:
size = os.stat(f).st_size
# Unresolvable symlink, just ignore those
except FileNotFoundError as exc:
print('%s is unresolvable, skipping (%s)' % (f, exc))
continue
if size == 0: continue
if size > 1024 ** 2:
print("Skipping `%s' because it's over 1MiB" % f)
continue
try:
data = open(f, 'rb').read()
except (OSError, PermissionError) as exc:
print("Error: Unable to read `%s': %s" % (f, exc))
continue
if is_binary(data):
print("Skipping `%s' because it looks binary" % f)
continue
data = data.split(b'\n')
fixed_indent = False
for i, line in enumerate(data):
# Fix indentation
repl_count = 0
while line.startswith(indent_find):
fixed_indent = True
repl_count += 1
line = line.replace(indent_find, b'', 1)
if repl_count > 0:
line = indent_replace * repl_count + line
data = list(filter(lambda x: x is not None, data))
try:
open(f, 'wb').write(b'\n'.join(data))
except (OSError, PermissionError) as exc:
print("Error: Unable to write to `%s': %s" % (f, exc))
if __name__ == '__main__':
allfiles = []
for root, dirs, files in os.walk(os.getcwd()):
for f in files:
p = '%s/%s' % (root, f)
if do_add:
allfiles.append(p)
run(allfiles)