How to convert a string to utf-8 in Python

In Python 2

>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)

^ This is the difference between a byte string (plain_string) and a Unicode string (unicode_string).

>>> s = "Hello!"
>>> u = unicode(s, "utf-8")

^ Converting to unicode and specifying the encoding.

In Python 3

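All strings are Unicode, and the unicode() function no longer exists. To get UTF-8 bytes, call .encode("utf-8") on the string; to turn UTF-8 bytes back into a string, call .decode("utf-8") on the bytes. See the answer from @Noumenon.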
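A minimal sketch of the Python 3 round trip, assuming the text is already held in a str. The variable names are illustrative and not taken from the original answer:

```python
# Python 3: str holds Unicode text, bytes holds encoded data.
text = "H\u00e9llo!"  # "Héllo!", written with an escape to avoid source-encoding issues

# Encode the text to UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Decode the UTF-8 bytes back into text.
round_trip = utf8_bytes.decode("utf-8")

print(type(text).__name__, type(utf8_bytes).__name__)
print(round_trip == text)
```

The non-ASCII character takes two bytes in UTF-8, so utf8_bytes is one byte longer than the string has characters.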

If the methods above don't work, you can also tell Python to skip the parts of a byte string that it can't decode as UTF-8:

stringnamehere.decode("utf-8", "ignore")
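As a rough illustration of what the "ignore" handler does, here is the same call in Python 3, where decode lives on bytes objects. The sample bytes are made up for the example, and "replace" is shown only for comparison:

```python
# \xff can never appear in valid UTF-8, so it triggers the error handler.
raw = b"Hi\xff!"

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD (the replacement character) instead.
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```

Note that "ignore" discards data without any warning, so "replace" may be the safer choice when you need to see where decoding failed.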

This might be a bit overkill, but when I work with ASCII and Unicode in the same files, repeating decode can be a pain, so this is what I use (Python 2):

def make_unicode(inp):
    # Python 2 only: decode byte strings, pass unicode objects through unchanged.
    if not isinstance(inp, unicode):
        inp = inp.decode("utf-8")
    return inp
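For Python 3, a comparable helper might look like the sketch below. The name make_text and the encoding parameter are hypothetical and not part of the original answer; it assumes the input is either bytes-like or already a str:

```python
def make_text(inp, encoding="utf-8"):
    """Hypothetical Python 3 counterpart of make_unicode."""
    # Decode bytes-like input; return anything else (e.g. str) unchanged.
    if isinstance(inp, (bytes, bytearray)):
        return inp.decode(encoding)
    return inp


print(make_text(b"Hi!"))
print(make_text("Hi!"))
```

Both calls print the same text, since the helper only decodes when it is given bytes.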
