Python print isn't using __repr__, __unicode__ or __str__ for unicode subclass?

面向向阳花 2021-01-02 07:19

Python print isn't using __repr__, __unicode__ or __str__ for my unicode subclass when printing. Any clues as to what I am doing wrong?

2 Answers
  • 2021-01-02 07:45

    You are subclassing unicode.

    It'll never call __unicode__ because it already is unicode. What happens here instead is that the object is encoded to the stdout encoding:

    >>> s.encode('utf8')
    'HI'
    

    except that it'll use direct C calls instead of the .encode() method. This is the default behaviour for print for unicode objects.

    The print statement calls PyFile_WriteObject, which in turn calls PyUnicode_AsEncodedString when handling a unicode object. The latter then defers to an encoding function for the current encoding, and these use the Unicode C macros to access the data structures directly. You cannot intercept this from Python.

    What you are looking for is something like an __encode__ hook, I guess, and no such hook exists. Since the object is already a unicode subclass, print only needs to encode it; it does not convert it to unicode again, and it cannot turn it into a byte string without encoding it explicitly. You'd have to take this up with the Python core developers to see whether an __encode__ hook makes sense.

  • 2021-01-02 07:55

    The problem is that print doesn't respect __str__ on unicode subclasses.

    From PyFile_WriteObject, used by print:

    int
    PyFile_WriteObject(PyObject *v, PyObject *f, int flags)
    {
    ...
        if ((flags & Py_PRINT_RAW) &&
            PyUnicode_Check(v) && enc != Py_None) {
            char *cenc = PyString_AS_STRING(enc);
            char *errors = fobj->f_errors == Py_None ?
              "strict" : PyString_AS_STRING(fobj->f_errors);
            value = PyUnicode_AsEncodedString(v, cenc, errors);
            if (value == NULL)
                return -1;
    

    PyUnicode_Check(v) returns true if v's type is unicode or a subclass. This code therefore writes unicode objects directly, without consulting __str__.
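
    As a rough Python-level model of that branch (a sketch only, not the actual CPython code), the logic looks something like the following. The helper name bytes_print_would_write and the class MyText are hypothetical, and text_type = type(u"") is used only so the snippet runs on both Python 2 and Python 3. The unbound call text_type.encode(value, ...) stands in for the direct C call, which is why overriding methods on the subclass has no effect.

```python
# Hypothetical model of the unicode branch in PyFile_WriteObject.
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


def bytes_print_would_write(value, encoding="utf-8", errors="strict"):
    """Return the bytes this model says `print value` would send to the file."""
    if isinstance(value, text_type):
        # Mirrors PyUnicode_Check: true for unicode and for any subclass.
        # Calling the base-class method directly bypasses subclass overrides,
        # much like the C-level encoding step does.
        return text_type.encode(value, encoding, errors)
    # Any other object is converted through its __str__ first.
    text = str(value)
    return text if isinstance(text, bytes) else text.encode(encoding, errors)


class MyText(text_type):
    """Hypothetical subclass that tries to intercept printing."""

    def __str__(self):
        return "str"

    def encode(self, *args, **kwargs):
        return b"intercepted"


# Neither __str__ nor the overridden encode is consulted by the model.
result = bytes_print_would_write(MyText(u"HI"))
```

    In this model the subclass's own text is what gets written, regardless of the overrides, which matches the behaviour described in the question.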

    Note that subclassing str and overriding __str__ works as expected:

    >>> class mystr(str):
    ...     def __str__(self): return "str"
    ...     def __repr__(self): return "repr"
    ... 
    >>> print mystr()
    str
    

    as does calling str or unicode explicitly:

    >>> class myuni(unicode):
    ...     def __str__(self): return "str"
    ...     def __repr__(self): return "repr"
    ...     def __unicode__(self): return "unicode"
    ... 
    >>> print myuni()
    
    >>> str(myuni())
    'str'
    >>> unicode(myuni())
    u'unicode'
    

    I believe this could be construed as a bug in Python as currently implemented.
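
    One possible workaround (a sketch, and an assumption on my part rather than something the code above requires) is to wrap the text instead of subclassing unicode, so that print goes through the ordinary __str__ path. The class name Label and its attribute text are hypothetical.

```python
class Label(object):
    """Wraps a text value instead of inheriting from unicode (hypothetical)."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # print consults this because Label is not a unicode subclass.
        return "str"

    def __repr__(self):
        return "repr"

    def __getattr__(self, name):
        # Forward everything else (upper, split, ...) to the wrapped text.
        return getattr(self.text, name)


label = Label(u"hi")
shown = str(label)     # the text print would display
upper = label.upper()  # delegated to the wrapped text
```

    The trade-off is that isinstance(label, unicode) is no longer true, so code that checks for unicode explicitly will need label.text instead.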
