How to convert a string to utf-8 in Python
In Python 2
>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)
^ plain_string is a byte string (str); unicode_string is a Unicode string (unicode).
>>> s = "Hello!"
>>> u = unicode(s, "utf-8")
^ Decoding the byte string to unicode, specifying that its bytes are UTF-8 encoded.
In Python 3
All strings are Unicode, and the unicode function no longer exists. See the answer from @Noumenon.
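As a rough illustration of the Python 3 situation, the sketch below encodes a str to UTF-8 bytes and decodes it back. The variable names and the sample text are only placeholders, not something from the answer above.

```python
# Python 3: str is already Unicode text; "converting to UTF-8" means
# encoding it to a bytes object.
text = "Héllo!"

# str.encode returns bytes; "utf-8" is also the default encoding.
utf8_bytes = text.encode("utf-8")

# bytes.decode turns UTF-8 encoded bytes back into a str.
round_trip = utf8_bytes.decode("utf-8")

print(type(text), type(utf8_bytes))
print(utf8_bytes)
print(round_trip == text)
```

Note that "é" takes two bytes in UTF-8, so the bytes object is one element longer than the string.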
If the methods above don't work, you can also tell Python to ignore the portions of a string that it can't convert to UTF-8:
stringnamehere.decode('utf-8', 'ignore')
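In Python 3 only bytes objects have a decode method, so the same idea would apply to bytes rather than str. A minimal sketch, assuming some made-up input bytes that contain one invalid byte (0xFF):

```python
# Hypothetical input: valid UTF-8 for "café", then a stray 0xFF byte
# that is not valid UTF-8, then more valid text.
raw = b"caf\xc3\xa9 \xff ok"

# errors="ignore" silently drops the bytes that cannot be decoded.
cleaned = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD instead, which keeps the
# data loss visible.
replaced = raw.decode("utf-8", errors="replace")

print(repr(cleaned))
print(repr(replaced))
```

Dropping bytes silently can hide real encoding problems, so "replace" may be the safer choice while debugging.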
This might be a bit overkill, but when I work with ASCII and Unicode in the same files, repeating decode can be a pain. This is what I use (Python 2):
def make_unicode(inp):
    # Decode only byte strings; unicode objects pass through unchanged.
    if not isinstance(inp, unicode):
        inp = inp.decode('utf-8')
    return inp
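For Python 3, where unicode is gone, a comparable helper might look like the sketch below. The name make_text and its encoding parameter are hypothetical, chosen only for illustration.

```python
def make_text(inp, encoding="utf-8"):
    """Return inp as str, decoding it first if it is a bytes object.

    Hypothetical Python 3 counterpart of make_unicode.
    """
    if isinstance(inp, bytes):
        return inp.decode(encoding)
    return inp


# Both calls yield the same str, whatever the input type was.
print(make_text(b"Hi!"))
print(make_text("Hi!"))
```

Checking for bytes rather than "not str" leaves other types untouched instead of failing on a missing decode method.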