Improper use of __new__ to generate classes?

我寻月下人不归 2020-11-29 04:05

I'm creating some classes for dealing with filenames in various types of file shares (nfs, afp, s3, local disk, etc.). I get as user input a string that identifies the data

3 answers
  •  無奈伤痛
    2020-11-29 04:38

    Edit [BLUF]: There is no problem with the answer provided by @martineau; this post merely follows up, for completeness, on a potential error encountered when a class definition uses additional keywords that are not handled by the metaclass.

    I'd like to supply some additional information on the use of __init_subclass__ in conjunction with using __new__ as a factory. The answer that @martineau has posted is very useful, and I have implemented an altered version of it in my own programs, as I prefer using the class creation sequence over adding a factory method to the namespace; this is very similar to how pathlib.Path is implemented.

    To follow up on a comment trail with @martineau, I have taken the following snippet from his answer:

    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
        _registry = {}  # Registered subclasses.
    
        @classmethod
        def __init_subclass__(cls, /, **kwargs):
            path_prefix = kwargs.pop('path_prefix', None)
            super().__init_subclass__(**kwargs)
            cls._registry[path_prefix] = cls  # Add class to registry.
    
        @classmethod
        def _get_prefix(cls, s):
            """ Extract any file system prefix at beginning of string s and
                return a lowercase version of it or None when there isn't one.
            """
            match = cls._PATH_PREFIX_PATTERN.match(s)
            return match.group(1).lower() if match else None
    
        def __new__(cls, path):
            """ Create instance of appropriate subclass. """
            path_prefix = cls._get_prefix(path)
            subclass = FileSystem._registry.get(path_prefix)
            if subclass:
                # Using "object" base class method avoids recursion here.
                return object.__new__(subclass)
            else:  # No subclass with matching prefix found (and no default).
                raise FileSystem.Unknown(
                    f'path "{path}" has no known file system prefix')
    
        def count_files(self):
            raise NotImplementedError
    
    
    class Nfs(FileSystem, path_prefix='nfs'):
        def __init__(self, path):
            pass
    
        def count_files(self):
            pass
    
    
    class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
        def __init__(self, path):
            if not os.access(path, os.R_OK):
                raise FileSystem.NoAccess('Cannot read directory')
            self.path = path
    
        def count_files(self):
            return sum(os.path.isfile(os.path.join(self.path, filename))
                         for filename in os.listdir(self.path))
    
    
    if __name__ == '__main__':
    
        data1 = FileSystem('nfs://192.168.1.18')
        data2 = FileSystem('c:/')  # Change as necessary for testing.
    
        print(type(data1).__name__)  # -> Nfs
        print(type(data2).__name__)  # -> LocalDrive
    
        print(data2.count_files())  # -> number of files in that directory
    
        try:
            data3 = FileSystem('foobar://42')  # Unregistered path prefix.
        except FileSystem.Unknown as exc:
            print(str(exc), '- raised as expected')
        else:
            raise RuntimeError(
                  "Unregistered path prefix should have raised Exception!")
    

    This answer works as written, but I wish to address a few items (potential pitfalls) that others may run into through inexperience or because of the codebase standards their team requires.

    Firstly, for the decorator on __init_subclass__, per PEP 487:

    One could require the explicit use of @classmethod on the __init_subclass__ decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.

    Not a problem, since it's already implied, and the Zen tells us "Explicit is better than implicit"; nevertheless, if you are abiding by the PEP, there you go (and the rationale is further explained there).
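
    To see the implicit conversion in action, here is a minimal sketch. The class names Base and Child are made up for illustration and are not part of the original answer. It defines __init_subclass__ without the decorator and then checks how the hook ended up stored in the class namespace:

```python
class Base:
    seen = []  # Names of subclasses recorded by the hook.

    # No @classmethod here: class creation wraps this hook automatically.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Base.seen.append(cls.__name__)


class Child(Base):
    pass


# The plain function was stored as a classmethod in the class namespace.
is_implicit_classmethod = isinstance(
    Base.__dict__['__init_subclass__'], classmethod)

print(is_implicit_classmethod)  # -> True
print(Base.seen)                # -> ['Child']
```

    So adding @classmethod yourself is a matter of style rather than behavior.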

    In my own implementation of a similar solution, subclasses are not defined with an additional keyword argument, as @martineau does here:

    class Nfs(FileSystem, path_prefix='nfs'): ...
    class LocalDrive(FileSystem, path_prefix=None): ...
    

    When browsing through the PEP:

    As a second change, the new type.__init__ just ignores keyword arguments. Currently, it insists that no keyword arguments are given. This leads to a (wanted) error if one gives keyword arguments to a class declaration if the metaclass does not process them. Metaclass authors that do want to accept keyword arguments must filter them out by overriding __init__.

    Why is this (potentially) problematic? Well, there are several questions (notably this) describing the problems that arise when a class definition combines additional keyword arguments, a metaclass (and hence the metaclass= keyword), and an overridden __init_subclass__. However, that doesn't explain why the currently given solution works. The answer: kwargs.pop().

    If we look at the following:

    # code in CPython 3.7
    
    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
        _registry = {}  # Registered subclasses.
    
        def __init_subclass__(cls, **kwargs):
            path_prefix = kwargs.pop('path_prefix', None)
            super().__init_subclass__(**kwargs)
            cls._registry[path_prefix] = cls  # Add class to registry.
    
        ...
    
    class Nfs(FileSystem, path_prefix='nfs'): ...
    

    This will still run without issue, but if we remove the kwargs.pop():

        def __init_subclass__(cls, **kwargs):
            # path_prefix is still in kwargs, so the default implementation raises TypeError.
            super().__init_subclass__(**kwargs)
            cls._registry[path_prefix] = cls  # Never reached (path_prefix is no longer defined).
    

    The error thrown is already known and described in the PEP:

    In the new code, it is not __init__ that complains about keyword arguments, but __init_subclass__, whose default implementation takes no arguments. In a classical inheritance scheme using the method resolution order, each __init_subclass__ may take out it's keyword arguments until none are left, which is checked by the default implementation of __init_subclass__.

    What is happening is that the path_prefix= keyword is "popped" off of kwargs, not just read, so **kwargs is empty by the time it is passed up the MRO and therefore complies with the default implementation (which accepts no keyword arguments).
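
    The difference between popping and merely forwarding can be reproduced in isolation. The following is a minimal sketch with hypothetical class names (PoppingBase, ForwardingBase, Ok, Broken) rather than the FileSystem classes themselves:

```python
class PoppingBase:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # Consume our keyword so nothing unexpected travels up the MRO.
        prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls.registry[prefix] = cls


class Ok(PoppingBase, path_prefix='nfs'):
    pass


class ForwardingBase:
    def __init_subclass__(cls, **kwargs):
        # Forwards path_prefix untouched to object.__init_subclass__.
        super().__init_subclass__(**kwargs)


try:
    class Broken(ForwardingBase, path_prefix='nfs'):
        pass
except TypeError as exc:
    error = exc  # Raised while the class statement is being executed.
else:
    error = None

print(PoppingBase.registry)    # Ok is registered under 'nfs'
print(type(error).__name__)    # -> TypeError
```

    Note that the failure happens at class definition time, not when an instance is created. The exact wording of the error message differs between Python versions, so it is not shown here.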

    To avoid this entirely, I propose not relying on kwargs and instead using what is already present in the call to __init_subclass__, namely the cls reference:

    # code in CPython 3.7
    
    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
        _registry = {}  # Registered subclasses.
    
        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            cls._registry[cls._path_prefix] = cls  # Add class to registry.
    
        ...
    
    class Nfs(FileSystem):
        _path_prefix = 'nfs'
    
        ...
    

    Storing the former keyword as a class attribute also makes it available to later methods that need to refer back to the particular prefix used by the subclass (via self._path_prefix). To my knowledge, you cannot refer back to keywords supplied in the class definition (without some complexity), and this seemed trivial and useful.
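
    For completeness, here is a trimmed, runnable sketch of the class-attribute variant. Two things in it are my own assumptions and not part of the snippets above: the base class defines a default _path_prefix = None so that a subclass may omit the attribute, and describe() is a hypothetical method added only to show self._path_prefix being read later.

```python
import re


class FileSystem:
    class Unknown(Exception):
        pass

    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}       # Registered subclasses, keyed by prefix.
    _path_prefix = None  # Assumed default so subclasses may omit it.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[cls._path_prefix] = cls

    def __new__(cls, path):
        match = cls._PATH_PREFIX_PATTERN.match(path)
        prefix = match.group(1).lower() if match else None
        subclass = FileSystem._registry.get(prefix)
        if subclass is None:
            raise FileSystem.Unknown(
                f'path "{path}" has no known file system prefix')
        return object.__new__(subclass)

    def __init__(self, path):
        self.path = path

    def describe(self):  # Hypothetical helper, for illustration only.
        return f'{type(self).__name__} handles prefix {self._path_prefix!r}'


class Nfs(FileSystem):
    _path_prefix = 'nfs'


class LocalDrive(FileSystem):
    pass  # Inherits _path_prefix = None, i.e. the default file system.


print(FileSystem('nfs://192.168.1.18').describe())  # -> Nfs handles prefix 'nfs'
print(FileSystem('/tmp').describe())  # -> LocalDrive handles prefix None
```

    One caveat with this design: a subclass of Nfs that does not set its own _path_prefix inherits 'nfs' and replaces Nfs in the registry, so deeper hierarchies should set the attribute explicitly.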

    So, @martineau, I apologize if my comments seemed confusing; there is only so much space to type them, and as shown here, the full explanation needed more detail.
