LPTHW - Exercise 23 (UTF-8 Error)

For anyone getting errors similar to the one below:

python ex23.py utf-8 strict
Traceback (most recent call last):
  File "ex23.py", line 23, in <module>
    main(languages, input_encoding, error)
  File "ex23.py", line 6, in main
    line = language_file.readline()
  File "C:\Users\Aahrvenos\AppData\Local\Programs\Python\Python37-32\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 11-12: invalid continuation byte

The issue is that you have to actually download the text file (CTRL-S on the web page where the PDF directs you). I had copied the text, pasted it into Notepad, and saved it as languages.txt. Not sure why that doesn’t work, but yeah :slight_smile:


Ahhhhhhhh yes, that would most likely save it in your computer’s default encoding, which would really only work in the US and UK. Everyone else would be in trouble.
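For anyone who wants to see this failure in isolation, here is a minimal sketch. It assumes Notepad saved the pasted text in the Windows cp1252 code page (I can't know what it actually used on the original poster's machine), and the sample word is just a stand-in for a line of languages.txt.

```python
# Assumption: the pasted text was saved as cp1252 (Windows "ANSI"), not UTF-8.
text = "Fran\u00e7ais"               # stand-in for one line of languages.txt
saved_bytes = text.encode("cp1252")  # what an "ANSI" save would write to disk

try:
    # The exercise then tries to read those bytes as UTF-8.
    saved_bytes.decode("utf-8")
    message = "decoded without error"
except UnicodeDecodeError as exc:
    # 0xE7 looks like the start of a multi-byte UTF-8 sequence,
    # but the byte after it is plain ASCII, so decoding fails.
    message = exc.reason

print(message)
```

Plain ASCII lines survive this kind of mismatch, which is why the script only blows up once it reaches a line with accented or non-Latin characters.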

Now that’s a handy bit of info. I happen to live in the U.S. and so I can only assume there’s something funky going on with my computer. Either way, happy for the resolution, and the extra tidbit! :grinning:

Yes, either you’re using something like utf-16 or some other encoding. PowerShell is weird.

Gotcha :+1:

Some time ago someone on the forum recommended using Cmder to execute the code. I can confirm it works flawlessly with Exercise 23.

link to Cmder: https://cmder.net/

I’ve since moved on from this exercise, but I did check to see what encoding my PC was using. I checked with the PowerShell command:

[System.Text.Encoding]::Default

… and it says it is using BodyName (iso-8859-1) and CodePage (1252), which according to Wikipedia is an 8-bit encoding… So I can also confirm… PowerShell is weird… Why it wouldn’t display with copy/paste is once again beyond me :expressionless:
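The same kind of check can be done from Python itself. This is only a rough sketch: it reports what Python would use when open() is called without an encoding argument, which may or may not match what PowerShell reports on a given machine.

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= argument is given.
default_file_encoding = locale.getpreferredencoding(False)

print("open() default:", default_file_encoding)
print("stdout encoding:", sys.stdout.encoding)
```

On a Windows machine set up like the one above I would expect the first line to show cp1252, but that is a guess, which is why passing encoding="utf-8" explicitly is the safer habit.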

Yeah, Cmder is great. The installer is terrible, but once you get past that it works.

import sys

script, input_encoding, error = sys.argv

def main(language_file, encoding, errors):
    line = language_file.readline()

    if line:
        print_line(line, encoding, errors)
        return main(language_file, encoding, errors)

def print_line(line, encoding, errors):
    next_lang = line.strip()
    raw_bytes = next_lang.encode(encoding, errors=errors)
    cooked_bytes = raw_bytes.decode(encoding, errors=errors)

    print(raw_bytes, "<===>", cooked_bytes)

languages = open("languages.txt", encoding="utf-16")

main(languages, input_encoding, error)

and it gives me output with the following error. What’s the issue here?

lpthw>python ex23.py utf-16 languages.txt
Traceback (most recent call last):
  File "ex23.py", line 21, in <module>
    main(languages, input_encoding, error)
  File "ex23.py", line 6, in main
    line=language_file.readline()
  File "C:\Program Files (x86)\Python37-32\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "C:\Program Files (x86)\Python37-32\lib\encodings\utf_16.py", line 61, in _buffer_decode
    codecs.utf_16_ex_decode(input, errors, 0, final)
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 812-813: illegal UTF-16 surrogate

Hi @littleman, in the future can you please start new threads instead of hijacking old ones? Your issue is different enough that it needs a separate thread. Also, do this with your code:

[code]
# your code here
[/code]

That way it’s formatted well.

Answer

You seem to have added encoding="utf-16" but it should be encoding="utf-8". Can you explain why you made that change? Does it work when you change it back?
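To illustrate the difference, here is a small sketch that writes a UTF-8 file and reads it back both ways. The sample text and the read_with helper are made up for this post (they are not from the book), and I use utf-16-le explicitly because that is the codec named in the traceback.

```python
import os
import tempfile

# Hypothetical stand-in for the book's languages.txt, saved as UTF-8.
sample_text = "Arabic\n\u0627\u0644\n"

def read_with(path, encoding):
    """Return the file's text, or the UnicodeDecodeError raised while reading."""
    try:
        with open(path, encoding=encoding) as language_file:
            return language_file.read()
    except UnicodeDecodeError as exc:
        return exc

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "languages.txt")
    # newline="" keeps the bytes identical on Windows and elsewhere.
    with open(path, "w", encoding="utf-8", newline="") as out:
        out.write(sample_text)

    wrong = read_with(path, "utf-16-le")  # mismatched, as in the traceback
    right = read_with(path, "utf-8")      # matches how the file was saved

print(type(wrong).__name__)
print(right == sample_text)
```

Read as UTF-16, pairs of UTF-8 bytes get glued into single 16-bit units, and some of those pairs are not legal, so the decoder gives up partway through the file. Separately, note that the third command-line argument in this exercise is an error handler such as strict (as in the command at the top of the thread), not the file name.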