Python os.walk and symlinks

最后都变了 - submitted on 2019-12-10 19:37:40

Question


While fixing a user's answer on AskUbuntu, I discovered a small issue. The code itself is straightforward: it uses os.walk to recursively sum the sizes of all files in a directory.

But it breaks on symlinks:

$ python test_code2.py $HOME                                                                                          
Traceback (most recent call last):
  File "test_code2.py", line 8, in <module>
    space += os.stat(os.path.join(subdir, f)).st_size
OSError: [Errno 2] No such file or directory: '/home/xieerqi/.kde/socket-eagle'

The question, then: how do I tell Python to ignore those files and avoid summing them?

Solution:

As suggested in the comments, I added an os.path.isfile() check, and now the script works and reports the correct size for my home directory:

$> cat test_code2.py                                                          
#! /usr/bin/python
import os
import sys

space = 0  # a plain int suffices: Python 2 promotes to long automatically, and Python 3 rejects the 0L suffix
for subdir, dirs, files in os.walk(sys.argv[1]):
    for f in files:
        file_path = os.path.join(subdir, f)
        if os.path.isfile(file_path):
            space += os.stat(file_path).st_size

sys.stdout.write("Total: {:d}\n".format(space))
$> python test_code2.py  $HOME                                                
Total: 76763501905
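
To see why the os.path.isfile() check helps, here is a minimal Python 3 sketch (not part of the original script) that creates a dangling symlink in a temporary directory and reports what the os.path helpers return for it. The function name describe_dangling_link and the file names are made up for illustration, and the sketch assumes a platform where os.symlink is permitted.

```python
import os
import tempfile


def describe_dangling_link():
    """Create a symlink whose target is missing and report what os.path sees."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "dangling")  # hypothetical link name
        # The target is never created, so the link is broken from the start.
        os.symlink(os.path.join(tmp, "missing-target"), link)
        return {
            "islink": os.path.islink(link),  # inspects the link itself
            "isfile": os.path.isfile(link),  # follows the link to its target
            "exists": os.path.exists(link),  # also follows the link
        }
```

Because os.path.isfile() follows the link and returns False when the target is missing, the guarded script never calls os.stat on such a path. A symlink that points to an existing regular file still passes the check, so its target's size is counted.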

Answer 1:


As Antti Haapala already mentioned in a comment, the script does not break on symlinks in general, only on broken symlinks. One way to avoid that, taking the existing script as a starting point, is to use try/except:

#! /usr/bin/python2
import os
import sys

space = 0  # a plain int suffices: Python 2 promotes to long automatically, and Python 3 rejects the 0L suffix
for root, dirs, files in os.walk(sys.argv[1]):
    for f in files:
        fpath = os.path.join(root, f)
        try:
            space += os.stat(fpath).st_size
        except OSError:
            print("could not read "+fpath)

sys.stdout.write("Total: {:d}\n".format(space))

As a side effect, it prints every path that could not be read, which points you to possible broken links.
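
The same try/except idea can also be wrapped in a function that returns the unreadable paths instead of printing them. This is only a Python 3 sketch; total_size_with_report is a hypothetical name and not something taken from the answer.

```python
import os


def total_size_with_report(top):
    """Return (total_bytes, unreadable_paths) for all files below top."""
    total = 0
    unreadable = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                # os.stat follows symlinks, so a broken link raises OSError here.
                total += os.stat(path).st_size
            except OSError:
                unreadable.append(path)
    return total, unreadable
```

Calling total_size_with_report(sys.argv[1]) would give the same total as the script, plus a list to inspect afterwards. Note that OSError also covers other failures, such as permission errors, so the list is not limited to broken links.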




Answer 2:


Yes, os.path.isfile is the way to go. However, the following version may be more memory efficient:

space = 0
for subdir, dirs, files in os.walk(sys.argv[1]):
    paths = (os.path.join(subdir, f) for f in files)
    # accumulate with +=; a plain = would keep only the last directory's total
    space += sum(os.stat(path).st_size for path in paths if os.path.isfile(path))
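
One possible way to package this generator-based approach as a reusable Python 3 function is to build a single lazy stream of paths over the whole walk and sum once at the end. The name total_size is a hypothetical choice for this sketch.

```python
import os


def total_size(top):
    """Sum the sizes of all regular files below top, skipping broken symlinks."""
    # Generator expression: paths are produced one at a time, never stored in a list.
    paths = (
        os.path.join(subdir, name)
        for subdir, _dirs, files in os.walk(top)
        for name in files
    )
    # os.path.isfile follows symlinks and is False when the target is missing.
    return sum(os.stat(path).st_size for path in paths if os.path.isfile(path))
```

With a single sum over the whole tree there is no per-directory accumulator to reset or overwrite by accident.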


Source: https://stackoverflow.com/questions/36484470/python-os-walk-and-symlinks
