Better way to convert file sizes in Python

烈酒焚心 提交于 2019-11-28 16:04:01

There is hurry.filesize that will take the size in bytes and make a nice string out if it.

>>> from hurry.filesize import size
>>> size(11000)
'10K'
>>> size(198283722)
'189M'

Or if you want 1K == 1000 (which is what most users assume):

>>> from hurry.filesize import size, si
>>> size(11000, system=si)
'11K'
>>> size(198283722, system=si)
'198M'

It has IEC support as well (but that wasn't documented):

>>> from hurry.filesize import size, iec
>>> size(11000, system=iec)
'10Ki'
>>> size(198283722, system=iec)
'189Mi'

Because it's written by the Awesome Martijn Faassen, the code is small, clear and extensible. Writing your own systems is dead easy.

Here is one:

mysystem = [
    (1024 ** 5, ' Megamanys'),
    (1024 ** 4, ' Lotses'),
    (1024 ** 3, ' Tons'), 
    (1024 ** 2, ' Heaps'), 
    (1024 ** 1, ' Bunches'),
    (1024 ** 0, ' Thingies'),
    ]

Used like so:

>>> from hurry.filesize import size
>>> size(11000, system=mysystem)
'10 Bunches'
>>> size(198283722, system=mysystem)
'189 Heaps'
James Sapam

Here is what I use:

import math

def convert_size(size_bytes):
   if size_bytes == 0:
       return "0B"
   size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
   i = int(math.floor(math.log(size_bytes, 1024)))
   p = math.pow(1024, i)
   s = round(size_bytes / p, 2)
   return "%s %s" % (s, size_name[i])

NB : size should be sent in Bytes.

Instead of a size divisor of 1024 * 1024 you could use the << bitwise shifting operator, i.e. 1<<20 to get megabytes, 1<<30 to get gigabytes, etc.

In the simplest scenario you can have e.g. a constant MBFACTOR = float(1<<20) which can then be used with bytes, i.e.: megas = size_in_bytes/MBFACTOR.

Megabytes are usually all that you need, or otherwise something like this can be used:

# bytes pretty-printing
UNITS_MAPPING = [
    (1<<50, ' PB'),
    (1<<40, ' TB'),
    (1<<30, ' GB'),
    (1<<20, ' MB'),
    (1<<10, ' KB'),
    (1, (' byte', ' bytes')),
]


def pretty_size(bytes, units=UNITS_MAPPING):
    """Get human-readable file sizes.
    simplified version of https://pypi.python.org/pypi/hurry.filesize/
    """
    for factor, suffix in units:
        if bytes >= factor:
            break
    amount = int(bytes / factor)

    if isinstance(suffix, tuple):
        singular, multiple = suffix
        if amount == 1:
            suffix = singular
        else:
            suffix = multiple
    return str(amount) + suffix

print(pretty_size(1))
print(pretty_size(42))
print(pretty_size(4096))
print(pretty_size(238048577))
print(pretty_size(334073741824))
print(pretty_size(96995116277763))
print(pretty_size(3125899904842624))

## [Out] ###########################
1 byte
42 bytes
4 KB
227 MB
311 GB
88 TB
2 PB
Pavan Gupta

Here is the compact function to calculate size

def GetHumanReadable(size,precision=2):
    suffixes=['B','KB','MB','GB','TB']
    suffixIndex = 0
    while size > 1024 and suffixIndex < 4:
        suffixIndex += 1 #increment the index of the suffix
        size = size/1024.0 #apply the division
    return "%.*f%s"%(precision,size,suffixes[suffixIndex])

For more detailed output and vice versa operation please refer: http://code.activestate.com/recipes/578019-bytes-to-human-human-to-bytes-converter/

Just in case anyone's searching for the reverse of this problem (as I sure did) here's what works for me:

def get_bytes(size, suffix):
    size = int(float(size))
    suffix = suffix.lower()

    if suffix == 'kb' or suffix == 'kib':
        return size << 10
    elif suffix == 'mb' or suffix == 'mib':
        return size << 20
    elif suffix == 'gb' or suffix == 'gib':
        return size << 30

    return False

See below for a quick and relatively easy-to-read way to print file sizes in a single line of code if you already know what you want. These one-liners combine the great answer by @ccpizza above with some handy formatting tricks I read here How to print number with commas as thousands separators?.

Bytes

print ('{:,.0f}'.format(os.path.getsize(filepath))+" B")

Kilobits

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<7))+" kb")

Kilobytes

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<10))+" KB")

Megabits

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<17))+" mb")

Megabytes

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<20))+" MB")

Gigabits

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<27))+" gb")

Gigabytes

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<30))+" GB")

Terabytes

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<40))+" TB")

Obviously they assume you know roughly what size you're going to be dealing with at the outset, which in my case (video editor at South West London TV) is MB and occasionally GB for video clips.


UPDATE USING PATHLIB In reply to Hildy's comment, here's my suggestion for a compact pair of functions (keeping things 'atomic' rather than merging them) using just the Python standard library:

from pathlib import Path    

def get_size(path = Path('.')):
    """ Gets file size, or total directory size """
    if path.is_file():
        size = path.stat().st_size
    elif path.is_dir():
        size = sum(file.stat().st_size for file in path.glob('*.*'))
    return size

def format_size(path, unit="MB"):
    """ Converts integers to common size units used in computing """
    bit_shift = {"B": 0,
            "kb": 7,
            "KB": 10,
            "mb": 17,
            "MB": 20,
            "gb": 27,
            "GB": 30,
            "TB": 40,}
    return "{:,.0f}".format(get_size(path) / float(1 << bit_shift[unit])) + " " + unit

# Tests and test results
>>> get_size("d:\\media\\bags of fun.avi")
'38 MB'
>>> get_size("d:\\media\\bags of fun.avi","KB")
'38,763 KB'
>>> get_size("d:\\media\\bags of fun.avi","kb")
'310,104 kb'

Here my two cents, which permits casting up and down, and adds customizable precision:

def convertFloatToDecimal(f=0.0, precision=2):
    '''
    Convert a float to string of decimal.
    precision: by default 2.
    If no arg provided, return "0.00".
    '''
    return ("%." + str(precision) + "f") % f

def formatFileSize(size, sizeIn, sizeOut, precision=0):
    '''
    Convert file size to a string representing its value in B, KB, MB and GB.
    The convention is based on sizeIn as original unit and sizeOut
    as final unit. 
    '''
    assert sizeIn.upper() in {"B", "KB", "MB", "GB"}, "sizeIn type error"
    assert sizeOut.upper() in {"B", "KB", "MB", "GB"}, "sizeOut type error"
    if sizeIn == "B":
        if sizeOut == "KB":
            return convertFloatToDecimal((size/1024.0), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size/1024.0**2), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0**3), precision)
    elif sizeIn == "KB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size/1024.0), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0**2), precision)
    elif sizeIn == "MB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0**2), precision)
        elif sizeOut == "KB":
            return convertFloatToDecimal((size*1024.0), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0), precision)
    elif sizeIn == "GB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0**3), precision)
        elif sizeOut == "KB":
            return convertFloatToDecimal((size*1024.0**2), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size*1024.0), precision)

Add TB, etc, as you wish.

Here's a version that matches the output of ls -lh.

def human_size(num: int) -> str:
    base = 1
    for unit in ['B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']:
        n = num / base
        if n < 9.95 and unit != 'B':
            # Less than 10 then keep 1 decimal place
            value = "{:.1f}{}".format(n, unit)
            return value
        if round(n) < 1000:
            # Less than 4 digits so use this
            value = "{}{}".format(round(n), unit)
            return value
        base *= 1024
    value = "{}{}".format(round(n), unit)
    return value

Here is my implementation:

from bisect import bisect

def to_filesize(bytes_num, si=True):
    decade = 1000 if si else 1024
    partitions = tuple(decade ** n for n in range(1, 6))
    suffixes = tuple('BKMGTP')

    i = bisect(partitions, bytes_num)
    s = suffixes[i]

    for n in range(i):
        bytes_num /= decade

    f = '{:.3f}'.format(bytes_num)

    return '{}{}'.format(f.rstrip('0').rstrip('.'), s)

It will print up to three decimals and it strips trailing zeros and periods. The boolean parameter si will toggle usage of 10-based vs. 2-based size magnitude.

This is its counterpart. It allows to write clean configuration files like {'maximum_filesize': from_filesize('10M'). It returns an integer that approximates the intended filesize. I am not using bit shifting because the source value is a floating point number (it will accept from_filesize('2.15M') just fine). Converting it to an integer/decimal would work but makes the code more complicated and it already works as it is.

def from_filesize(spec, si=True):
    decade = 1000 if si else 1024
    suffixes = tuple('BKMGTP')

    num = float(spec[:-1])
    s = spec[-1]
    i = suffixes.index(s)

    for n in range(i):
        num *= decade

    return int(num)

Here's another version of @romeo's reverse implementation that handles a single input string.

import re

def get_bytes(size_string):
    try:
        size_string = size_string.lower().replace(',', '')
        size = re.search('^(\d+)[a-z]i?b$', size_string).groups()[0]
        suffix = re.search('^\d+([kmgtp])i?b$', size_string).groups()[0]
    except AttributeError:
        raise ValueError("Invalid Input")
    shft = suffix.translate(str.maketrans('kmgtp', '12345')) + '0'
    return int(size) << int(shft)

Similar to Aaron Duke's reply but more "pythonic" ;)

import re


RE_SIZE = re.compile(r'^(\d+)([a-z])i?b?$')

def to_bytes(s):
    parts = RE_SIZE.search(s.lower().replace(',', ''))
    if not parts:
        raise ValueError("Invalid Input")
    size = parts.group(1)
    suffix = parts.group(2)
    shift = suffix.translate(str.maketrans('kmgtp', '12345')) + '0'
    return int(size) << int(shift)

I'm new to programming. I came up with this following function that converts a given file size into readable format.

def file_size_converter(size):
    magic = lambda x: str(round(size/round(x/1024), 2))
    size_in_int = [int(1 << 10), int(1 << 20), int(1 << 30), int(1 << 40), int(1 << 50)]
    size_in_text = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']
    for i in size_in_int:
        if size < i:
            g = size_in_int.index(i)
            position = int((1024 % i) / 1024 * g)
            ss = magic(i)
            return ss + ' ' + size_in_text[position]

This work correctly for all file sizes:

import math
from os.path import getsize

def convert_size(size):
   if (size == 0):
       return '0B'
   size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
   i = int(math.floor(math.log(size,1024)))
   p = math.pow(1024,i)
   s = round(size/p,2)
   return '%s %s' % (s,size_name[i])

print(convert_size(getsize('file_name.zip')))
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!