Ex23: trying to write output to a file

ttx002000 · January 6, 2020, 6:32am

Here is my code:

from sys import argv
script, errors, encoding = argv

def main(file, encoding, errors):
    line = file.readline()
    if line:
        print_a_line(line, encoding, "ignore")
        main(file, encoding, errors)

def print_a_line(line, encoding, errors):
    newline = line.strip()
    raw_byte = newline.encode(encoding, errors="ignore")
    cooked_string = raw_byte.decode(encoding, errors='ignore')
    outputfile.write(str(raw_byte))
    outputfile.write("<<<<<>>>>>>")
    outputfile.write(cooked_string)

language = open("languages.txt", encoding="utf-8")
outputfile = open("test.txt", "w")
main(language, encoding, 'ignore')

If I use the write function, I get this error: "UnicodeEncodeError: 'gbk' codec can't encode character '\u12a0' in position 0: illegal multibyte sequence". (Basically, I am trying to write the result to a new file instead of showing it in the terminal.)

However, if I replace the last three write calls with print, the program runs just fine. What's going on here? Thanks.

florian · January 6, 2020, 9:27am

Are you sure this depends on whether you use write? With gbk and big5 I get the same error regardless of where I direct the output.

I just looked, and Zed actually shows the very same error message at the end of the exercise. Maybe reread that; it's in the Encodings Deep Dive section.
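For reference, the encoding failure can be reproduced without any file or terminal involved. This is only a small sketch using the standard codecs; the variable names are made up for illustration, and the sample text is the Amharic word whose UTF-8 bytes appear later in this thread:

```python
# The four characters U+12A0 U+121B U+122D U+129B (Ethiopic script).
text = "\u12a0\u121b\u122d\u129b"

# With the default errors="strict", gbk has no mapping for these
# characters, so encode() raises UnicodeEncodeError.
try:
    text.encode("gbk")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With errors="ignore", the unencodable characters are silently dropped.
ignored = text.encode("gbk", errors="ignore")
```

So whether the error shows up depends on the error handler in effect at the point where the text is actually encoded.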

ttx002000 · January 6, 2020, 9:53am

I think there is output in the output file, but just one line, something like "b'Afrikaans'========Afrikaansb'\xe1\x8a\xa0\xe1\x88\x9b\xe1\x88\xad\xe1\x8a\x9b'========", and no more. But if I use print, I get the full list, just as Zed shows in his book.

zedshaw · January 10, 2020, 6:43pm

Hmmm, I think you should check the languages.txt file and make sure it really is utf-8. Your computer might be set to gbk, and then when you view the file in your browser it gets converted. Delete that file, then use save as… in the browser so you get the real one.
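One more thing that may be worth checking: the error names the 'gbk' codec, and in Python 3 open() without an encoding argument uses the platform's preferred encoding for text files, which would be gbk on a machine set to that locale. Here is a minimal sketch, assuming that is where the error comes from. The helper name write_line, the separator, and the two sample lines are made up for illustration, and the file goes to the temp directory just to keep the example self-contained:

```python
import os
import tempfile

def write_line(outputfile, line, encoding, errors="ignore"):
    """Write the raw bytes and the round-tripped string for one line."""
    raw_bytes = line.strip().encode(encoding, errors=errors)
    cooked_string = raw_bytes.decode(encoding, errors=errors)
    outputfile.write(f"{raw_bytes!r} <===> {cooked_string}\n")

# Hypothetical stand-ins for two lines of languages.txt.
samples = ["Afrikaans\n", "\u12a0\u121b\u122d\u129b\n"]
out_path = os.path.join(tempfile.gettempdir(), "test.txt")

# Naming the encoding explicitly means the write no longer depends
# on the locale default.
with open(out_path, "w", encoding="utf-8") as outputfile:
    for line in samples:
        write_line(outputfile, line, "utf-8")

# Read the file back with the same encoding to confirm both lines arrived.
with open(out_path, encoding="utf-8") as check:
    written = check.read()
```

If the output file still stops after the first line with encoding="utf-8" set on it, then the problem is somewhere else, and checking the input file as described above is the next step.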