[Solved?] LPTHW3 Exercise 23 Extra Credit 3 -- Reverse Script


#1

My Google-fu isn’t good enough for this one.

For Learn Python3 The Hard Way Exercise 23, under Breaking It: 3) I sort of got that bit to work, but I had to cheat and put the actual byte string in the script and break the loop. Like so:

Some Terrible Code

import sys
script, input_encoding, error = sys.argv

def main(language_file, encoding, errors):
    print(">>>> Entering main function")
    line = language_file  # .readline() - I'm not reading a line right now

    if line:
        print_line(line, encoding, errors)
        # return main(language_file, encoding, errors)
    print("<<<< Exiting main function")

def print_line(line, encoding, errors):
    print(">> Entering print_line function")
    next_lang = line.strip()
    print(next_lang)
    cooked_string = next_lang.decode(encoding, errors=errors)
    raw_bytes = cooked_string.encode(encoding, errors=errors)

    print(cooked_string, "<====>", raw_bytes)
    print("<< Exiting print_line function")

languages = b'\xd0\x90\xd2\xa7\xd1\x81\xd1\x88\xd3\x99\xd0\xb0'  # or another byte string

# languages = open('bytes.txt')
# languages = open("bytes.txt", 'r', 'b')
# languages = open("bytes.txt", encoding="utf-8")
# none of these lets python read the contents of "bytes.txt" as a byte string

main(languages, input_encoding, error)

That works fine, but if I have a text file with a few lines of typed-out byte strings, I can’t figure out how to get Python to read them as byte strings and not as strings.

I tried changing the parameters for opening the file away from encoding="utf-8": I tried a default open with no parameters, and with 'r' and 'b' as parameters. I also tried futzing with the actual input encoding (not displayed here), but I’m not sure that makes a difference. Regardless, no matter what I change, whenever I print next_lang I get, rather than a byte string, something like

“b’\\xd0\\x90\\xd2\\xa7\\xd1\\x81\\xd1\\x88\\xd3\\x99\\xd0\\xb0’”

Which does make total sense with the double slashes if that was a string, but I don’t want Python to read the lines as a string.

I’m sure I’m missing some obvious way to Google this, but I’m not sure what I should be precisely Googling, which usually isn’t something I run into. A hint or slight point in the right direction is all I really need here.


#2

For the record, the forum broke my indenting.

I slept on it, made some progress.

languages = open("bytes.txt", 'rb')

That seems to have better luck actually having readline do anything with the file.
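
The difference the 'rb' mode makes can be sketched as follows. This is only an illustration: the sample file name and its two lines are made up for the example and are not the real languages.txt.

```python
import os
import tempfile

# Hypothetical stand-in for languages.txt: two UTF-8 encoded lines.
path = os.path.join(tempfile.mkdtemp(), "sample_languages.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("Català\nČeština\n")

# Text mode: readline() returns str, already decoded by open().
with open(path, encoding="utf-8") as text_file:
    text_line = text_file.readline()

# Binary mode: readline() returns bytes, nothing has been decoded yet.
with open(path, "rb") as binary_file:
    binary_line = binary_file.readline()

print(type(text_line), repr(text_line))
print(type(binary_line), repr(binary_line))
```

Only the binary-mode line has a .decode() method, which is what print_line in the script relies on.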


The line or next_lang variables still aren’t quite how I’d want it, but hopefully I’m on the right track.


#3

Day Three on Exercise 23:

Spoilers

I’ve been thinking about this too hard. The languages.txt file doesn’t have to change and you don’t have to physically type the byte strings for something to be opened in binary mode (‘rb’). Python, in fact, does that work. I created a file called “bytes.txt”, but really, in the end I just copied a small part of languages.txt just to see if it would work. It does. I should rename the file “binary_mode.txt” or something more accurate.

Maybe there still is a way to have a text file with actual b’’ strings typed out inside the file and have python work with that, but for the purposes of Ex23, I’m going to answer the questions and slowly walk away for now. Maybe I’ll revisit it later.
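
For anyone who does want to revisit the typed-out b'' idea, one possible approach (an assumption on my part, not something the exercise asks for) is the standard library's ast.literal_eval, which parses the text of a Python literal, including bytes literals, without running arbitrary code. The sample lines and the helper name parse_bytes_literal below are made up for the sketch.

```python
import ast

# Lines as a text-mode read would return them from a file that
# contains typed-out bytes literals (sample data, not the real file).
typed_lines = [
    "b'\\xd0\\x90\\xd2\\xa7\\xd1\\x81\\xd1\\x88\\xd3\\x99\\xd0\\xb0'\n",
    "b'Catal\\xc3\\xa0'\n",
]

def parse_bytes_literal(line):
    """Turn the text of a bytes literal into a real bytes object."""
    value = ast.literal_eval(line.strip())
    if not isinstance(value, bytes):
        raise ValueError(f"not a bytes literal: {line!r}")
    return value

decoded = [parse_bytes_literal(line).decode("utf-8") for line in typed_lines]
print(decoded)
```

The isinstance check is there because literal_eval happily parses other literals too (numbers, lists, plain strings), and those have no business being decoded.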


#4

Yeahhhhh, that seems about right. That challenge is kind of open ended so looking at what you described and your code I think that’s probably good enough. It’s all about being able to convert from a bytes to a string and a string to a bytes without having to think on it too much.
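
As a small illustration of that round trip, here is a minimal sketch that uses one language name from the file and assumes UTF-8 with strict error handling:

```python
cooked_string = "Català"

# Encode strings: str -> bytes.
raw_bytes = cooked_string.encode("utf-8", errors="strict")

# Decode bytes: bytes -> str.
round_trip = raw_bytes.decode("utf-8", errors="strict")

print(raw_bytes)
print(round_trip == cooked_string)
```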


#5

I am having a similar issue. I added this bit of code to create a Bytes file:

cooked_string = raw_bytes.decode(encoding, errors=errors)
example = raw_bytes              # added
target.write(str(example))       # added: str() writes the text b'...', not the bytes
target.write("\n")               # added
print(raw_bytes, "<===>", cooked_string)

I rewrote the program to ‘reverse’ and ‘decode the bytes’ but now they are viewed as strings.

def print_line(line, encoding, errors):
    next_lang = line.strip()
    next_lang2 = next_lang[2:-1]
    cooked_string = next_lang2.encode(encoding, errors=errors)
    raw_bytes = cooked_string.decode(encoding, errors=errors)
    print(cooked_string, "<===>", raw_bytes)

I get this when I print:
ex:
b’\xc4\x8ce\xc5\xa1tina’ <===> \xc4\x8ce\xc5\xa1tina

What am I doing wrong?
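
A sketch of what is probably happening here, using the same word as the output above: str() on a bytes object produces its printed form, so each \xc4 in the file is four ordinary characters (a backslash, an x and two hex digits) rather than one byte. Stripping the b'' and calling encode() just encodes those characters. The ast.literal_eval call at the end is one possible way back, not part of the original script.

```python
import ast

raw_bytes = "Čeština".encode("utf-8")

written = str(raw_bytes)     # the text that target.write(str(example)) stores
stripped = written[2:-1]     # drops the leading b' and the closing quote

# The backslashes are now plain characters, so encoding the text gives
# the bytes of those characters, not the original bytes.
reencoded = stripped.encode("utf-8")

print(len(raw_bytes), len(reencoded))
print(reencoded == raw_bytes)

# Parsing the stored text as a literal recovers the real bytes.
recovered = ast.literal_eval(written)
print(recovered.decode("utf-8"))
```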


#6

That’s pretty much all you have to do. I’d say you’ve done this one, so move on. Also, you can edit your post and put [code] [/code] around your code so it displays better.


#7

Thanks Zed…sorry I’m about to nerd out, but I’m pretty stoked that you replied. Love the book so far.


#8

I edited for you to show you what to do with the code blocks. I have no idea why it’s so strict but your code has to go on a new line to work, and be formatted like that.


#9

Ok, I’m glad I found this thread. Can someone tell me what’s wrong with my (not yet) solution? Here is what I have done so far:

Step 1 - a conversion script that gives me a .txt file with the b’[language characters]’ using the original languages.txt file


import sys 
script, encoding, error, from_file, to_file = sys.argv #added from_file and to_file


def main(language_file, encoding, errors):
    line = language_file.readline()
    
    if line:
        next_lang = line.strip()
        raw_bytes = next_lang.encode(encoding, errors=errors) #DBES - decode bytes, encode strings. What does encode() do? It encodes the string from next_lang into bytes
        raw_bytes_file = open(to_file, 'a') #APPEND MODE!!!!!!! saving raw_bytes to to_file which was given in terminal as an argument...
        raw_bytes_file.write(str(raw_bytes)) #must apparently be a string according to the error message
        raw_bytes_file.write("\n") #if I omit this all the utf-8 stuff is in the same line = no newlines for readability
        print("This should be written to the file defined in the to_file argument: ", raw_bytes)
        return main(language_file, encoding, errors)
    
    
languages = open(from_file, encoding="utf-8") #changed "languages.txt" to argument from_file

main(languages, encoding, error)
#in terminal: python3 ex23_txt_conversion.py utf-8 strict languages.txt lang_bytes_only.txt

Step 2 - checking in the terminal whether the file was created and contains what I want:

ls
cat lang_bytes_only.txt

Step 3 - the reversing script:

import sys 
script, bytesfile = sys.argv


def main(language_file):
    line = language_file.readline()
    
    if line:
        print_line(line)
        return main(language_file)
    
    
def print_line(line):
    next_lang = line.strip()
    string = next_lang.decode()
    print(string)
    

languages = open(bytesfile)

main(languages)

#in terminal: python3 ex23_reversed.py lang_bytes_only.txt

Now it says in the terminal that

AttributeError: 'str' object has no attribute 'decode'

What I have figured out so far:

For Python, my lang_bytes_only.txt does not contain b’[utf-8 encoded characters]’ but “b’[utf-8 encoded characters]’” with “double quotes” at the beginning and the end of each line which makes it a string.

After googling a bit I found out that I could change my ex23_txt_conversion.py. open() not only has append mode ‘a’ but also byte mode ‘b’, or a combination, in my case I used ‘ab’. I also removed

raw_bytes_file.write("\n")

from ex23_txt_conversion.py for this experiment because I couldn’t figure out how to get a byte-newline.

Then

cat lang_bytes_only.txt

gave me this:

中文ייִדיש吴语文言VõroTiếng ViệtاردوУкраїнськаTürkçeТоҷикӣతెలుగుТатарча/tatarçaTaqbaylitภาษาไทยதமிழ்TagalogSvenskaSuomiСрпски / srpskiکوردیی ناوەندیSlovenčinaSimple EnglishShqipSeelterskРусскийRomaniRomânăPortuguêsPolskiPlattdüütschپښتوپنجابیਪੰਜਾਬੀOʻzbekcha/ўзбекчаOccitanNouormandNorsk bokmål日本語नेपाल भाषाNederlandsМонголBahasa MelayuمازِرونیმარგალურიमराठीMaltiМакедонскиMagyarLietuviųLëtzebuergeschLatviešuLatinaLatgaļuKreyòl ayisyenҚазақшаქართულიKapampanganಕನ್ನಡעבריתItalianoInterlinguaIdoHrvatskiहिन्दीՀայերեն한국어GalegoGàidhligGaelgFryskFrançaisفارسیEsperantoEspañolΕλληνικάEestiDeutschDanskCymraegČeštinaЧӑвашлаCatalàБуряадBosanskiBoarischБългарскиБеларускаяBân-lâm-gúবাংলাBamanankanAzərbaycancaArpetanAragonésالعربيةАҧсшәаአማርኛAfrikaans

which doesn’t look like what I want.
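
On the byte-newline problem mentioned above: a newline can be written as the bytes literal b"\n". The sketch below assumes a small made-up sample of languages and a temporary file instead of the real lang_bytes_only.txt. Note that cat showing readable names does not mean the file holds strings: the file contains the raw UTF-8 bytes and the terminal decodes them for display.

```python
import os
import tempfile

languages = ["Afrikaans", "Català", "中文"]   # small sample, not the full file
path = os.path.join(tempfile.mkdtemp(), "lang_bytes_only.bin")

with open(path, "ab") as raw_bytes_file:
    for language in languages:
        raw_bytes = language.encode("utf-8")
        raw_bytes_file.write(raw_bytes + b"\n")   # b"\n" is a one-byte newline

# Reading back in binary mode gives real bytes objects, one per line.
with open(path, "rb") as raw_bytes_file:
    lines = [line.strip() for line in raw_bytes_file]

print(lines)
print([line.decode("utf-8") for line in lines])
```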

I gave up with the approach of getting a lang_bytes_only.txt with real bytes in it and tried to convert the strings from my original lang_bytes_only.txt when I work with the file in ex23_reversed.py but I cannot figure out how to do this right. pydoc is still a little bit too complicated for me and what I get from Google didn’t help either.

Can someone give me a hint?
Also, can someone tell me why my lang_bytes_only.txt contains the languages in reverse order?

Thanks a lot :slight_smile:

EDIT: Wait…did my ex23_txt_conversion.py do what was meant with “reversing the script”?
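
On the reverse-order question, a likely explanation (assuming CPython and its buffered file writes) is that every recursive call opens its own file object and never closes it. Each buffered line is only flushed when its call returns and the file object is discarded, and the calls return innermost first, so the last language lands in the file first. A sketch that avoids this opens the output once and closes it with a with block. It swaps the recursion for a plain loop, and the function name convert and the sample file are made up for the example.

```python
import os
import tempfile

def convert(language_file, out_file, encoding, errors):
    """Write one bytes repr per line, in the same order as the input."""
    for line in language_file:
        raw_bytes = line.strip().encode(encoding, errors=errors)
        out_file.write(str(raw_bytes) + "\n")

workdir = tempfile.mkdtemp()
from_file = os.path.join(workdir, "sample_languages.txt")   # hypothetical sample
to_file = os.path.join(workdir, "lang_bytes_only.txt")

with open(from_file, "w", encoding="utf-8") as f:
    f.write("Afrikaans\nCatalà\n")

# Both files are opened once and closed when the with block ends.
with open(from_file, encoding="utf-8") as src, open(to_file, "w") as dst:
    convert(src, dst, "utf-8", "strict")

with open(to_file) as f:
    result = f.read().splitlines()
print(result)
```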


#10

Actually, just skip that part. I should remove that as it’s not a very good extra credit. I declare you now done with this exercise and you should move on.


#11

YAY! Thanks :smiley:


#12

I used languages = open("languages.txt", 'rb') to “reverse the script”. Seems to work, but I’m not sure that was what was intended.

Also not sure what was meant by breaking the bytes and removing some.

Anyway, here’s my code:

import sys
script, encoding, error = sys.argv

def main(language_file, encoding, errors):
    line = language_file.readline()

    if line:
        print_line(line, encoding, errors)
        return main(language_file, encoding, errors)

def print_line(line, encoding, errors):
    next_lang = line.strip()
    cooked_string = next_lang.decode(encoding, errors=errors)
    raw_bytes = cooked_string.encode(encoding, errors=errors)

    print(cooked_string, "<===>", raw_bytes)

languages = open("languages.txt", 'rb')

main(languages, encoding, error)
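
On breaking the bytes and removing some: one reading of that suggestion (an assumption, since the exercise text is not quoted in this thread) is to cut a multi-byte character in half and see how each errors setting reacts. A minimal sketch:

```python
raw_bytes = b"Catal\xc3\xa0"
broken = raw_bytes[:-1]      # cuts the two-byte sequence for "à" in half

try:
    broken.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict:", exc.reason)

replaced = broken.decode("utf-8", errors="replace")
ignored = broken.decode("utf-8", errors="ignore")

print("replace:", replaced)
print("ignore: ", ignored)
```

With strict the decode raises, with replace the damaged character becomes the U+FFFD replacement character, and with ignore it is dropped.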

#13

Yep, that’s good enough. It’s honestly not a very good extra credit since it’s just getting you to try to do the reverse, which ends up doing mostly the same thing.


#14

Thanks. Great book BTW.


#15

Hi,

I think my question fits into this thread. I had some extra fun with this exercise. I created a document where I encoded all the languages and wrote down all the byte strings. Then I wanted to read in the byte strings to create the original document again (decode the strings), but python reads the byte strings as a str, even though it looks like this in the document: b’Catal\xc3\xa0’

I will add my code below, that is the code where I am reading the document (that has all the languages as string bytes as the string above) and decode them again.

def decoding(open_file):
    line = open_file.readline()

    if line:
        stripped_line = line.strip()
        decoded_lang = stripped_line.decode()
        open_file_decoded.write(f"{decoded_lang} \n")
        decoding(open_file)
    else:
        open_file_encoded.close()
        open_file_decoded.close()
        print("nothing else to process")


open_file_encoded = open("languages_encoded.txt")
open_file_decoded = open("languages_decoded.txt", 'w')
decoding(open_file_encoded)

My question is, how do I make sure Python knows they’re byte strings, and not strings? This is the error message that I’m getting with the code above:

Traceback (most recent call last):
  File "ex23e.py", line 17, in <module>
    decoding(open_file_encoded)
  File "ex23e.py", line 6, in decoding
    decoded_lang = stripped_line.decode()
AttributeError: 'str' object has no attribute 'decode'

#16

Hi Malin,
I think the problem is that the contents of the file you are trying to read are returned as a string, not as raw bytes.
I’d try to see exactly what the file returns. I’d use type.
I’d do: print(type(line)) right after the line declaration in your function.
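
A sketch of that check, using in-memory stand-ins from the io module so it runs without languages_encoded.txt (the sample contents are made up):

```python
import io

# io.StringIO behaves like a file opened in text mode,
# io.BytesIO like a file opened with 'rb'.
text_file = io.StringIO("b'Catal\\xc3\\xa0'\n")
binary_file = io.BytesIO("Català\n".encode("utf-8"))

results = []
for open_file in (text_file, binary_file):
    line = open_file.readline()
    results.append((type(line).__name__, hasattr(line, "decode")))
    print(type(line), hasattr(line, "decode"))
```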


#17

Hi @io_io
Yes, exactly, that’s the problem. Sorry that I wasn’t clear about that. I am getting a string from the file (I have done the test you suggested to confirm this) and I need to make Python understand that I want to read it as raw bytes. Do you know how I can do that?