How can I convert tabs to spaces in every file of a directory?

既然无缘 2020-12-02 03:48

How can I convert tabs to spaces in every file of a directory (possibly recursively)?

Also, is there a way of setting the number of spaces per tab?

19 answers
  •  暗喜 (original poster)
    2020-12-02 04:06

    How can I convert tabs to spaces in every file of a directory (possibly recursively)?

    This is usually not what you want.

    Do you want to do this for png images? PDF files? The .git directory? Your Makefile (which requires tabs)? A 5GB SQL dump?

    You could, in theory, pass a whole lot of exclude options to find or whatever else you're using, but this is fragile and will break as soon as you add other binary files.

    At the very least, you want to:

    1. Skip files over a certain size.
    2. Detect if a file is binary by checking for the presence of a NUL byte.
    3. Only replace tabs at the start of a line (GNU expand can do this with -i; a plain sed substitution doesn't).
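
    Points 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not a complete tool: the function names and the width parameter are made up for this example, and only lines whose indentation consists purely of tabs are handled (mixed space/tab indentation would need real tab-stop arithmetic). The width argument is one way to set the number of spaces per tab.

```python
def expand_leading_tabs(line: bytes, width: int = 4) -> bytes:
    """Replace each leading tab with `width` spaces; later tabs are kept."""
    stripped = line.lstrip(b"\t")
    leading = len(line) - len(stripped)  # number of tabs removed
    return b" " * (width * leading) + stripped


def looks_binary(data: bytes) -> bool:
    """Heuristic only: text files very rarely contain a NUL byte."""
    return b"\0" in data
```

    Working on bytes rather than str avoids having to guess each file's encoding.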

    As far as I know, there is no "standard" Unix utility that can do this, and it's not very easy to do with a shell one-liner, so a script is needed.

    A while ago I created a little script called sanitize_files which does exactly that. It also fixes some other common stuff like replacing \r\n with \n, adding a trailing \n, etc.

    You can find a simplified script without the extra features and command-line arguments below, but I recommend you use the above script, as it's more likely to receive bugfixes and other updates than this post.

    I would also like to point out, in response to some of the other answers here, that shell globbing is not a robust way of doing this: sooner or later you'll end up with more files than will fit in ARG_MAX (on Linux the limit is typically somewhere between 128 KiB and a few MiB, which may seem like a lot, but sooner or later it's not enough).
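
    One way to sidestep the limit is to never build a command line at all and walk the tree from inside the script. The sketch below is an illustration under that assumption; iter_files and arg_max are hypothetical helper names. os.sysconf is only available on Unix-like systems, so the lookup falls back to None elsewhere.

```python
import os


def iter_files(root: str):
    """Yield every file path under `root` lazily, one at a time."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def arg_max():
    """Return the system's ARG_MAX in bytes, or None if it cannot be queried."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None
```

    Because iter_files is a generator, the number of files is bounded by neither ARG_MAX nor available memory.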


    #!/usr/bin/env python3
    #
    # http://code.arp242.net/sanitize_files
    #
    
    import os
    
    
    def is_binary(data):
        return data.find(b'\000') >= 0
    
    
    def should_ignore(path):
        keep = [
            # VCS systems
            '.git/', '.hg/', '.svn/', 'CVS/',
    
            # These files have significant whitespace/tabs, and cannot be edited
            # safely
            # TODO: there are probably more of these files..
            'Makefile', 'BSDmakefile', 'GNUmakefile', 'Gemfile.lock'
        ]
    
        for k in keep:
            if '/%s' % k in path:
                return True
        return False
    
    
    def run(files, indent_width=4):
        indent_find = b'\t'
        # One leading tab becomes indent_width spaces
        indent_replace = b' ' * indent_width
    
        for f in files:
            if should_ignore(f):
                print('Ignoring %s' % f)
                continue
    
            try:
                size = os.stat(f).st_size
            # Unresolvable symlink, just ignore those
            except FileNotFoundError as exc:
                print('%s is unresolvable, skipping (%s)' % (f, exc))
                continue
    
            if size == 0: continue
            if size > 1024 ** 2:
                print("Skipping `%s' because it's over 1MiB" % f)
                continue
    
            try:
                # The with block closes the file even if read() fails
                with open(f, 'rb') as fp:
                    data = fp.read()
            except OSError as exc:
                print("Error: Unable to read `%s': %s" % (f, exc))
                continue
    
            if is_binary(data):
                print("Skipping `%s' because it looks binary" % f)
                continue
    
            data = data.split(b'\n')
    
            fixed_indent = False
            for i, line in enumerate(data):
                # Fix indentation: count the leading tabs and strip them
                stripped = line.lstrip(indent_find)
                repl_count = len(line) - len(stripped)
    
                if repl_count > 0:
                    fixed_indent = True
                    # Write the converted line back into the list
                    data[i] = indent_replace * repl_count + stripped
    
            # Files without leading tabs are left untouched
            if not fixed_indent:
                continue
    
            try:
                with open(f, 'wb') as fp:
                    fp.write(b'\n'.join(data))
            except OSError as exc:
                print("Error: Unable to write to `%s': %s" % (f, exc))
    
    
    if __name__ == '__main__':
        # Collect every file below the current directory, recursively
        allfiles = []
        for root, dirs, files in os.walk(os.getcwd()):
            for f in files:
                allfiles.append(os.path.join(root, f))
    
        run(allfiles)
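
    Since a script like this rewrites files in place, it can be worth listing what would change before running it. The following is a rough sketch of such a dry run, not part of sanitize_files: dry_run, count_leading_tab_lines, SKIP_DIRS and SKIP_FILES are hypothetical names, and the skip lists simply mirror the ones in should_ignore above.

```python
import os

# Assumed skip lists, mirroring should_ignore() in the script above.
SKIP_DIRS = {".git", ".hg", ".svn", "CVS"}
SKIP_FILES = {"Makefile", "BSDmakefile", "GNUmakefile", "Gemfile.lock"}


def count_leading_tab_lines(path: str, max_size: int = 1024 ** 2) -> int:
    """Return how many lines in `path` start with a tab (0 if skipped)."""
    try:
        if os.path.getsize(path) > max_size:
            return 0
        with open(path, "rb") as fp:
            data = fp.read()
    except OSError:
        return 0  # unreadable file or dangling symlink
    if b"\0" in data:  # same NUL-byte heuristic as is_binary()
        return 0
    return sum(1 for line in data.split(b"\n") if line.startswith(b"\t"))


def dry_run(root: str):
    """Yield (path, count) for each file a conversion would modify."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into VCS directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name in SKIP_FILES:
                continue
            path = os.path.join(dirpath, name)
            count = count_leading_tab_lines(path)
            if count:
                yield path, count
```

    Nothing is written here; the generator only reports files that contain tab-indented lines, for example via for path, n in dry_run('.'): print(path, n).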
    
