Advertisement
Advertisement


How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte"


Question

as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
File "/usr/local/bin/wok", line 4, in
Engine()
File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 104, in init
self.load_pages()
File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 238, in load_pages
p = Page.from_file(os.path.join(root, f), self.options, self, renderer)
File "/usr/local/lib/python2.7/site-packages/wok/page.py", line 111, in from_file
page.meta['content'] = page.renderer.render(page.original)
File "/usr/local/lib/python2.7/site-packages/wok/renderers.py", line 46, in render
return markdown(plain, Markdown.plugins)
File "/usr/local/lib/python2.7/site-packages/markdown/init.py", line 419, in markdown
return md.convert(text)
File "/usr/local/lib/python2.7/site-packages/markdown/init.py", line 281, in convert
source = unicode(source)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 1: ordinal not in range(128). -- Note: Markdown only accepts unicode input!

How to fix it?

In some other python-based static blog apps, Chinese post can be published successfully. Such as this app: http://github.com/vrypan/bucket3. In my site http://bc3.brite.biz/, Chinese post can be published successfully.

2017/11/12
1
468
11/12/2017 10:43:05 AM


Finally I got it:

as3:/usr/local/lib/python2.7/site-packages# cat sitecustomize.py
# encoding=utf8  
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

Let me check:

as3:~/ngokevin-site# python
Python 2.7.6 (default, Dec  6 2013, 14:49:02)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.getdefaultencoding()
'utf8'
>>>

The above shows the default encoding of python is utf8. Then the error is no more.

2016/01/04

This is the classic "unicode issue". I believe that explaining this is beyond the scope of a StackOverflow answer to completely explain what is happening.

It is well explained here.

In very brief summary, you have passed something that is being interpreted as a string of bytes to something that needs to decode it into Unicode characters, but the default codec (ascii) is failing.

The presentation I pointed you to provides advice for avoiding this. Make your code a "unicode sandwich". In Python 2, the use of from __future__ import unicode_literals helps.

Update: how can the code be fixed:

OK - in your variable "source" you have some bytes. It is not clear from your question how they got in there - maybe you read them from a web form? In any case, they are not encoded with ascii, but python is trying to convert them to unicode assuming that they are. You need to explicitly tell it what the encoding is. This means that you need to know what the encoding is! That is not always easy, and it depends entirely on where this string came from. You could experiment with some common encodings - for example UTF-8. You tell unicode() the encoding as a second parameter:

source = unicode(source, 'utf-8')
2019/09/04

In some cases, when you check your default encoding (print sys.getdefaultencoding()), it returns that you are using ASCII. If you change to UTF-8, it doesn't work, depending on the content of your variable. I found another way:

import sys
reload(sys)  
sys.setdefaultencoding('Cp1252')
2014/11/05

I was searching to solve the following error message:

unicodedecodeerror: 'ascii' codec can't decode byte 0xe2 in position 5454: ordinal not in range(128)

I finally got it fixed by specifying 'encoding':

f = open('../glove/glove.6B.100d.txt', encoding="utf-8")

Wish it could help you too.

2018/03/06

"UnicodeDecodeError: 'ascii' codec can't decode byte"

Cause of this error: input_string must be unicode but str was given

"TypeError: Decoding Unicode is not supported"

Cause of this error: trying to convert unicode input_string into unicode


So first check that your input_string is str and convert to unicode if necessary:

if isinstance(input_string, str):
   input_string = unicode(input_string, 'utf-8')

Secondly, the above just changes the type but does not remove non ascii characters. If you want to remove non-ascii characters:

if isinstance(input_string, str):
   input_string = input_string.decode('ascii', 'ignore').encode('ascii') #note: this removes the character and encodes back to string.

elif isinstance(input_string, unicode):
   input_string = input_string.encode('ascii', 'ignore')