ZyXI January 28, 2015 at 22:23

Pushing non-ASCII into places not meant for it

I was sitting at home one evening, wondering what to do. Aha! Python has a debugger, but its input prompt is downright ugly. Let me bolt powerline onto it. The task seems utterly trivial: just create your own subclass of pdb.Pdb with its own prompt property, right?

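# PDBPowerline is powerline's pdb-specific Powerline class (defined elsewhere)
def use_powerline_prompt(cls):
    '''Decorator that installs the powerline prompt into the class
    '''
    @property
    def prompt(self):
        # Create and set up the PDBPowerline instance lazily, on first access
        try:
            powerline = self.powerline
        except AttributeError:
            powerline = PDBPowerline()
            powerline.setup(self)
            self.powerline = powerline
        return powerline.render(side='left')
    @prompt.setter
    def prompt(self, _):
        # pdb.Pdb.__init__ assigns self.prompt; ignore that assignment so
        # the property keeps rendering the powerline prompt
        pass
    cls.prompt = prompt
    return cls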
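The property-with-a-no-op-setter trick can be tried on Python 3 with nothing but the standard library. In this sketch, render_prompt is a hypothetical stand-in for PDBPowerline().render(side='left') and PromptPdb is an illustrative name; the point it demonstrates is that pdb.Pdb.__init__ itself assigns self.prompt, so the property needs a setter that swallows that assignment.

```python
import pdb


def render_prompt(debugger):
    """Hypothetical stand-in for PDBPowerline().render(side='left')."""
    return "\u2192 pdb> "


def use_custom_prompt(cls):
    """Install a computed prompt property on a pdb.Pdb subclass."""

    @property
    def prompt(self):
        return render_prompt(self)

    @prompt.setter
    def prompt(self, _value):
        # pdb.Pdb.__init__ does `self.prompt = '(Pdb) '`; without a setter
        # that assignment would raise AttributeError, so just ignore it.
        pass

    cls.prompt = prompt
    return cls


@use_custom_prompt
class PromptPdb(pdb.Pdb):
    pass


# readrc=False keeps the example independent of any ~/.pdbrc
debugger = PromptPdb(readrc=False)
```

Without the setter, constructing PromptPdb would fail, because a property that has no setter cannot be assigned to.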

Not quite. On Python 3 this code can still work, but on Python 2 we run into a problem right away: to output the prompt, the Unicode string has to be turned into bytes, which requires an encoding. Well, that's easy:

encoding = get_preferred_output_encoding()
def prompt(self):
    …
    ret = powerline.render(side='left')
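    if not isinstance(ret, str):
        # Python 2: render() returned unicode, but the prompt must be
        # a byte string (str), so encode it for the terminal
        ret = ret.encode(encoding)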
    return ret

It is simple, and it works... until the user installs pdbpp. Then we are greeted by a series of errors, because pdbpp may use pyrepl, and pyrepl does not work with Unicode (moreover, whether pyrepl is used at all somehow depends on the value of $TERM¹). Errors caused by someone not wanting to see Unicode in the prompt are nothing new: IPython also tried to disable Unicode in the rewrite prompt². But here things are much worse: pyrepl uses from __future__ import unicode_literals and performs various operations on the prompt string using ordinary string literals (which this import turns into unicode), while the prompt itself may be explicitly converted to str at the very beginning.
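Python 3 no longer coerces between bytes and text implicitly, so the failure pdbpp/pyrepl hits on Python 2 has to be spelled out to be reproduced. This sketch assumes a UTF-8 terminal, uses an illustrative prompt string, and relies on hypothetical helpers py2_coerce and py2_contains that perform by hand the ASCII decode Python 2 applies when a unicode literal meets a byte string.

```python
def py2_coerce(value, default_encoding="ascii"):
    """Hypothetical helper: decode bytes the way Python 2 did implicitly.

    Python 2 converted a byte string to unicode with the default encoding
    (normally 'ascii') whenever it met a unicode operand.
    """
    if isinstance(value, bytes):
        return value.decode(default_encoding)
    return value


def py2_contains(needle, haystack):
    """Emulate Python 2's `needle in haystack` for a unicode needle."""
    return py2_coerce(needle) in py2_coerce(haystack)


# Assumes a UTF-8 terminal, so prompt() returned UTF-8 bytes on Python 2
ascii_prompt = b"(Pdb) "
encoded_prompt = "\u2192 pdb> ".encode("utf-8")

print(py2_contains("\n", ascii_prompt))  # False: pure ASCII decodes fine
try:
    py2_contains("\n", encoded_prompt)
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)  # ordinal not in range(128)
```

A plain ASCII prompt survives the implicit decode, which is why the original pdb prompt never caused trouble; the first non-ASCII byte breaks it.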

So, here is what we need:

A subclass of unicode that converts to str without throwing errors on non-ASCII characters (the conversion is done simply as str(prompt)). This part is very simple: you need to override __str__ and __new__ (you could in principle do without the latter, but it is more convenient when converting to this class from the next one, and it allows the encoding to be specified explicitly).
A subclass of str into which the previous class is converted. Here, overriding two methods is nowhere near enough:
1. __new__ is needed to store the encoding conveniently and to avoid having to convert unicode → str explicitly.
2. __contains__ and several other methods must behave, for unicode arguments, as if the current class were unicode (nothing needs to change for non-unicode arguments). The thing is, under unicode_literals, '\n' in prompt raises an exception if prompt is a byte string containing non-ASCII characters, because Python tries to coerce prompt to unicode, not the other way round.
3. find and similar methods must treat unicode arguments as if they were byte strings in the current encoding. This is needed so that they return correct indices (byte offsets, which is what slicing the byte string expects) and do not fail on converting the byte string to unicode (and why isn't the conversion done the other way round here?).
4. __len__ must return the length of the string in Unicode code points. This is needed so that pyrepl, which computes where the prompt ends (and positions the cursor accordingly), does not get it wrong and leave a huge gap between the prompt and the cursor. I suspect that what is really needed is not the number of code points but the display width of the string in screen cells (which is what, for example, strdisplaywidth() computes in Vim).
5. __add__ must return our first class, the unicode subclass, when added to a unicode string; __radd__ must do the same. Adding byte strings must yield our str subclass. More details in the next item.
6. Finally, __getslice__ (note: overriding __getitem__ is not enough, because str uses the deprecated __getslice__ for slices) must return an object of the same class, because at the very end pyrepl adds together an empty unicode string, a slice of the current object, and another slice of it. If we skip this part, we again get some UnicodeError.
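Items 3 and 4 hinge on three different notions of position in the same prompt: byte offsets (what find and slicing on the byte string use), code points (what the overridden __len__ reports), and screen cells (what Vim's strdisplaywidth() measures). This Python 3 sketch, with hypothetical helper names, contrasts them for a prompt containing CJK characters; display_width is only an approximation based on unicodedata.east_asian_width that counts wide (W) and fullwidth (F) characters as two cells and skips combining marks.

```python
import unicodedata


def byte_find(data, needle, encoding="utf-8"):
    """find() that accepts text needles by encoding them (item 3)."""
    if isinstance(needle, str):
        needle = needle.encode(encoding)
    return data.find(needle)


def codepoint_len(data, encoding="utf-8"):
    """Length of an encoded prompt in code points (item 4)."""
    return len(data.decode(encoding))


def display_width(text):
    """Rough terminal width: wide/fullwidth chars take two cells."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no cell of their own
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


prompt = "中文> ".encode("utf-8")
print(len(prompt))                            # 8 bytes
print(codepoint_len(prompt))                  # 4 code points
print(display_width(prompt.decode("utf-8")))  # 6 screen cells
print(byte_find(prompt, ">"))                 # 6: a byte offset, usable for slicing
```

The gap between 4 and 6 is exactly the kind of discrepancy the remark about strdisplaywidth() hints at: counting code points still misplaces the cursor for wide characters.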

The result is the following two freaks:

class PowerlineRenderBytesResult(bytes):
    def __new__(cls, s, encoding=None):
        # Remember the encoding; accept either unicode or bytes input
        encoding = encoding or s.encoding
        self = bytes.__new__(cls, s.encode(encoding) if isinstance(s, unicode) else s)
        self.encoding = encoding
        return self
    # Methods that, given a unicode argument, must behave as if self were unicode
    for meth in (
        '__contains__',
        'partition', 'rpartition',
        'split', 'rsplit',
        'count', 'join',
    ):
        exec((
            'def {0}(self, *args):\n'
            '   if any((isinstance(arg, unicode) for arg in args)):\n'
            '       return self.__unicode__().{0}(*args)\n'
            '   else:\n'
            '       return bytes.{0}(self, *args)'
        ).format(meth))
    # Methods returning indices: encode unicode arguments so that the
    # result is a byte offset into self
    for meth in (
        'find', 'rfind',
        'index', 'rindex',
    ):
        exec((
            'def {0}(self, *args):\n'
            '   if any((isinstance(arg, unicode) for arg in args)):\n'
            '       args = [arg.encode(self.encoding) if isinstance(arg, unicode) else arg for arg in args]\n'
            '   return bytes.{0}(self, *args)'
        ).format(meth))
    def __len__(self):
        # Length in code points, so that pyrepl places the cursor correctly
        return len(self.decode(self.encoding))
    def __getitem__(self, *args):
        return PowerlineRenderBytesResult(bytes.__getitem__(self, *args), encoding=self.encoding)
    def __getslice__(self, *args):
        # Python 2 str slicing goes through the deprecated __getslice__
        return PowerlineRenderBytesResult(bytes.__getslice__(self, *args), encoding=self.encoding)
    @staticmethod
    def add(encoding, *args):
        # Mixed with unicode: decode everything and return our unicode subclass;
        # bytes only: stay a PowerlineRenderBytesResult
        if any((isinstance(arg, unicode) for arg in args)):
            return PowerlineRenderResult(u''.join((
                arg
                if isinstance(arg, unicode)
                else arg.decode(encoding)
                for arg in args
            )), encoding=encoding)
        else:
            return PowerlineRenderBytesResult(b''.join(args), encoding=encoding)
    def __add__(self, other):
        return self.add(self.encoding, self, other)
    def __radd__(self, other):
        return self.add(self.encoding, other, self)
    def __unicode__(self):
        return PowerlineRenderResult(self)

class PowerlineRenderResult(unicode):
    def __new__(cls, s, encoding=None):
        # Encoding priority: explicit argument, then the source object's,
        # then the preferred output encoding
        encoding = (
            encoding
            or getattr(s, 'encoding', None)
            or get_preferred_output_encoding()
        )
        if isinstance(s, unicode):
            self = unicode.__new__(cls, s)
        else:
            # Decode bytes, replacing undecodable sequences instead of failing
            self = unicode.__new__(cls, s, encoding, 'replace')
        self.encoding = encoding
        return self
    def __str__(self):
        # str(prompt) yields encoded bytes instead of raising UnicodeEncodeError
        return PowerlineRenderBytesResult(self)

(In Python 2, bytes is str.)
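On Python 3 the same idea can be expressed without exec. This sketch is not powerline code: EncodedBytes and the two factory helpers are hypothetical names, it assumes UTF-8 as the default encoding instead of calling get_preferred_output_encoding(), and it handles slices in __getitem__ because Python 3 has no __getslice__. It keeps the two method groups of the class above: one delegates to the decoded text when any argument is a str, the other encodes str arguments so that indices remain byte offsets.

```python
def _delegate_to_text(name):
    """Method that runs on the decoded text if any argument is a str."""
    def method(self, *args):
        if any(isinstance(arg, str) for arg in args):
            return getattr(self.decode(self.encoding), name)(*args)
        return getattr(bytes, name)(self, *args)
    method.__name__ = name
    return method


def _encode_text_args(name):
    """Method that encodes str arguments, so returned indices are byte offsets."""
    def method(self, *args):
        args = [a.encode(self.encoding) if isinstance(a, str) else a for a in args]
        return getattr(bytes, name)(self, *args)
    method.__name__ = name
    return method


class EncodedBytes(bytes):
    """Bytes that remember their encoding and tolerate str operands (sketch)."""

    def __new__(cls, s, encoding="utf-8"):  # assumption: UTF-8 by default
        data = s.encode(encoding) if isinstance(s, str) else s
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def __len__(self):
        # Length in code points, not bytes
        return len(self.decode(self.encoding))

    def __getitem__(self, key):
        result = bytes.__getitem__(self, key)
        # Slices keep the class and encoding; single indices stay ints
        if isinstance(key, slice):
            return EncodedBytes(result, self.encoding)
        return result

    def __add__(self, other):
        if isinstance(other, str):
            return self.decode(self.encoding) + other
        return EncodedBytes(bytes.__add__(self, other), self.encoding)

    def __radd__(self, other):
        if isinstance(other, str):
            return other + self.decode(self.encoding)
        return EncodedBytes(other + bytes(self), self.encoding)


for _name in ("__contains__", "partition", "rpartition", "split", "rsplit", "count"):
    setattr(EncodedBytes, _name, _delegate_to_text(_name))
for _name in ("find", "rfind", "index", "rindex"):
    setattr(EncodedBytes, _name, _encode_text_args(_name))
```

As in the Python 2 version, __len__ decodes strictly, so a slice that cuts a multi-byte character in half would raise UnicodeDecodeError when measured; with UTF-8, offsets obtained from find always land on character boundaries.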

For now the result is on GitHub only in my branch; later it will land in the develop branch of the main repository.
Of course, this is not limited to pyrepl: the approach can be applied anywhere you are not supposed to slip in a non-ASCII string but really want to.

¹ With TERM=xterm-256color I get errors from pyrepl; with TERM= (empty) or TERM=konsole-256color there are none and everything works fine.
² This is what you see if you enable autocall in IPython and type int 42 (look at the bottom line):
[Screenshot: Powerline IPython in and rewrite prompt]
