When I run the exercise 23 code in Jupyter Notebook, the encoding and decoding look wrong, and the bytes between the b'…' are different from Zed's. Do you know if this might be a problem caused by Jupyter? It should be able to handle Unicode since it is based on a browser, but somehow it seems to go wrong.
(I know I should not use Jupyter to do the exercises, but I am kind of forced to, as we use Anaconda at my work. And it has actually taught me a lot, since I have had to debug and rewrite code to make it work in Jupyter.)
Here is my code and output:
    import sys

    # Jupyter has no command line, so the script arguments are set by hand.
    sys.argv = ['script', 'utf-8', 'strict']

    def main(language_file, encoding, errors):
        line = language_file.readline()

        if line:
            print_line(line, encoding, errors)
            return main(language_file, encoding, errors)

    def print_line(line, encoding, errors):
        next_lang = line.strip()
        raw_bytes = next_lang.encode(encoding, errors=errors)
        cooked_string = raw_bytes.decode(encoding, errors=errors)

        print(raw_bytes, "<===>", cooked_string)

    languages = open("languages.txt", encoding="utf-8")

    # Pass the encoding name and the error mode, not the whole argv list.
    main(languages, sys.argv[1], sys.argv[2])
And then the output (I’ll just show you the top two lines).
b'\xef\xbb\xbfAfrikaans' <===> Afrikaans
b'\xc3\xa1\xc5\xa0 \xc3\xa1\xcb\x86\xe2\x80\xba\xc3\xa1\xcb\x86\xc2\xad\xc3\xa1\xc5\xa0\xe2\x80\xba' <===> áŠ áˆ›áˆáŠ›
The “error” is not major, and my script runs. But the decoding/encoding becomes incorrect, and it bugs the hell out of me.
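For anyone who wants to reproduce the symptoms without my file, here is a minimal sketch. It rests on an unconfirmed assumption: that languages.txt was at some point read as cp1252 and saved again as UTF-8 with a byte order mark. The variable names are made up for illustration, and the Amharic name is written with escapes so the snippet survives copy and paste.

```python
import codecs

# Symptom 1: a UTF-8 byte order mark (BOM) at the start of the file.
# Decoding with "utf-8" keeps it as U+FEFF; "utf-8-sig" strips it.
bom_bytes = codecs.BOM_UTF8 + "Afrikaans".encode("utf-8")
bom_kept = bom_bytes.decode("utf-8")
bom_stripped = bom_bytes.decode("utf-8-sig")
print(bom_kept.encode("utf-8"), "vs", bom_stripped.encode("utf-8"))

# Symptom 2: text that was encoded twice.
# The Amharic language name, spelled with escapes (U+12A0 U+121B U+122D U+129B).
original = "\u12a0\u121b\u122d\u129b"
utf8_bytes = original.encode("utf-8")

# Assumed corruption step: the UTF-8 bytes are misread as cp1252 ...
misread = utf8_bytes.decode("cp1252")
# ... and the misread text is then saved as UTF-8 again.
double_encoded = misread.encode("utf-8")
print(double_encoded)

# If that is what happened, reversing the two steps recovers the text.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")
print(repaired == original)
```

If the assumption holds, the problem would be in the file itself rather than in Jupyter, and downloading a fresh copy of languages.txt would be the simplest thing to try.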
Hope someone knows what went wrong,