[Solved?] LPTHW3 Exercise 23 Extra Credit 3 -- Reverse Script


#1

My Google-fu isn’t good enough for this one.

For Learn Python3 The Hard Way Exercise 23, under Breaking It: 3) I sort of got that bit to work, but I had to cheat and put the actual byte string in the script and break the loop. Like so:

Some Terrible Code

import sys
script, input_encoding, error = sys.argv

def main(language_file, encoding, errors):
    print(">>>> Entering main function")
    line = language_file  # .readline() - I'm not reading a line right now

    if line:
        print_line(line, encoding, errors)
        # return main(language_file, encoding, errors)
    print("<<<< Exiting main function")

def print_line(line, encoding, errors):
    print(">> Entering print_line function")
    next_lang = line.strip()
    print(next_lang)
    cooked_string = next_lang.decode(encoding, errors=errors)
    raw_bytes = cooked_string.encode(encoding, errors=errors)

    print(cooked_string, "<====>", raw_bytes)
    print("<< Exiting print_line function")

languages = b'\xd0\x90\xd2\xa7\xd1\x81\xd1\x88\xd3\x99\xd0\xb0'  # or another byte string

# languages = open('bytes.txt')
# languages = open("bytes.txt", 'r', 'b')
# languages = open("bytes.txt", encoding="utf-8")
# none of these lets Python read the contents of "bytes.txt" as a byte string

main(languages, input_encoding, error)

That works fine, but if I have a text file with a few lines of typed-out byte strings, I can’t figure out how to get Python to read them as byte strings and not as strings.

I tried changing the parameters of opening the file away from encoding=“utf-8”, I tried just a default open with no parameters and with ‘r’ and ‘b’ as parameters. I also, not displayed here, tried futzing with the actual input-encoding, but I’m not sure if that makes a difference. Regardless, no matter what change I’ve made whenever I print next_lang rather than a byte string it prints out something like

"b'\\xd0\\x90\\xd2\\xa7\\xd1\\x81\\xd1\\x88\\xd3\\x99\\xd0\\xb0'"

Which does make total sense with the double slashes if that was a string, but I don’t want Python to read the lines as a string.
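A small sketch of why the doubled backslashes show up, assuming a hypothetical one-line file typed by hand (the temporary path is made up for illustration). In text mode Python hands back the characters exactly as typed, so each backslash is an ordinary character and `repr` escapes it:

```python
import os
import tempfile

# Hypothetical file holding a hand-typed bytes literal (plain characters).
path = os.path.join(tempfile.mkdtemp(), "bytes.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(r"b'\xd0\x90'" + "\n")

with open(path, encoding="utf-8") as f:
    line = f.readline().strip()

print(type(line))   # a str, not bytes
print(repr(line))   # backslashes appear doubled because they are literal characters
print(len(line))    # 11 characters: b ' \ x d 0 \ x 9 0 '
```

The file holds eleven characters of text that merely look like a bytes literal, not the two bytes the literal would stand for in source code.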

I’m sure I’m missing some obvious way to Google this, but I’m not sure what I should be precisely Googling, which usually isn’t something I run into. A hint or slight point in the right direction is all I really need here.


#2

For the record, the forum broke my indenting.

I slept on it, made some progress.

languages = open("bytes.txt", 'rb')

That seems to have better luck actually having readline do anything with the file.

Terminal Picture

The line or next_lang variables still aren’t quite how I’d want it, but hopefully I’m on the right track.


#3

Day Three on Exercise 23:

Spoilers

I’ve been thinking about this too hard. The languages.txt file doesn’t have to change and you don’t have to physically type the byte strings for something to be opened in binary mode (‘rb’). Python, in fact, does that work. I created a file called “bytes.txt”, but really, in the end I just copied a small part of languages.txt just to see if it would work. It does. I should rename the file “binary_mode.txt” or something more accurate.
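What this post describes can be sketched roughly as follows. The sample file and its two lines are stand-ins for a piece of languages.txt, and the loop replaces the book's recursion for brevity, so treat it as an illustration rather than the exercise's official answer:

```python
import os
import tempfile

# Hypothetical stand-in for a small piece of languages.txt, saved as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "languages_sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Afrikaans\nČeština\n")

def print_line(line, encoding, errors):
    raw_bytes = line.strip()                                   # already bytes in "rb" mode
    cooked_string = raw_bytes.decode(encoding, errors=errors)  # bytes -> str
    print(raw_bytes, "<===>", cooked_string)
    return raw_bytes, cooked_string

results = []
with open(path, "rb") as languages:   # binary mode: each line comes back as bytes
    for line in languages:
        results.append(print_line(line, "utf-8", "strict"))
```

Nothing in the file has to be typed as `b'...'`; binary mode alone is what makes `readline` return bytes.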

Maybe there still is a way to have a text file with actual b’’ strings typed out inside the file and have python work with that, but for the purposes of Ex23, I’m going to answer the questions and slowly walk away for now. Maybe I’ll revisit it later.


#4

Yeahhhhh, that seems about right. That challenge is kind of open ended so looking at what you described and your code I think that’s probably good enough. It’s all about being able to convert from bytes to a string and from a string to bytes without having to think on it too much.


#5

I am having a similar issue. I added this bit of code to create a Bytes file:

cooked_string = raw_bytes.decode(encoding, errors=errors)
example = raw_bytes  # added
target.write(str(example))  # added
target.write("\n")  # added
print(raw_bytes, "<===>", cooked_string)

I rewrote the program to ‘reverse’ and ‘decode the bytes’ but now they are viewed as strings.

def print_line(line, encoding, errors):
    next_lang = line.strip()
    next_lang2 = next_lang[2:-1]
    cooked_string = next_lang2.encode(encoding, errors=errors)
    raw_bytes = cooked_string.decode(encoding, errors=errors)
    print(cooked_string, "<===>", raw_bytes)

I get this when I print:
ex:
b'\xc4\x8ce\xc5\xa1tina' <===> \xc4\x8ce\xc5\xa1tina

What am I doing wrong?
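One possible explanation: `str(raw_bytes)` writes the text of a bytes literal, so slicing off `b'` and `'` and calling `.encode()` only encodes the backslash characters themselves. A minimal sketch of one way back, using the standard library's `ast.literal_eval` to parse that text into real bytes (the sample line is taken from the output above; this is not something the book prescribes):

```python
import ast

# What str(raw_bytes) put in the file: text that looks like a bytes literal.
line = str(b"\xc4\x8ce\xc5\xa1tina") + "\n"

next_lang = line.strip()
raw_bytes = ast.literal_eval(next_lang)   # parse the literal text into real bytes
cooked_string = raw_bytes.decode("utf-8", errors="strict")

print(raw_bytes, "<===>", cooked_string)
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for text read from a file.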


#6

That’s pretty much all you have to do. I’d say move on and you did this. Also, you can edit your post and put [code] [/code] around your code so it displays better.


#7

Thanks Zed…sorry I’m about to nerd out, but I’m pretty stoked that you replied. Love the book so far.


#8

I edited for you to show you what to do with the code blocks. I have no idea why it’s so strict but your code has to go on a new line to work, and be formatted like that.


#9

Ok, I’m glad I found this thread. Can someone tell me what’s wrong with my (not yet) solution? Here is what I have done so far:

Step 1 - a conversion script that gives me a .txt file with the b’[language characters]’ using the original languages.txt file


import sys 
script, encoding, error, from_file, to_file = sys.argv #added from_file and to_file


def main(language_file, encoding, errors):
    line = language_file.readline()
    
    if line:
        next_lang = line.strip()
        raw_bytes = next_lang.encode(encoding, errors=errors) #DBES - decode bytes, encode strings. What does encode() do? It encodes the string from next_lang into bytes
        raw_bytes_file = open(to_file, 'a') #APPEND MODE!!!!!!! saving raw_bytes to to_file which was given in terminal as an argument...
        raw_bytes_file.write(str(raw_bytes)) #must apparently be a string according to the error message
        raw_bytes_file.write("\n") #if I omit this all the utf-8 stuff is in the same line = no newlines for readability
        print("This should be written to the file defined in the to_file argument: ", raw_bytes)
        return main(language_file, encoding, errors)
    
    
languages = open(from_file, encoding="utf-8") #changed "languages.txt" to argument from_file

main(languages, encoding, error)
#in terminal: python3 ex23_txt_conversion.py utf-8 strict languages.txt lang_bytes_only.txt

Step 2 - checking in the terminal whether the file was created and contains what I want:

ls
cat lang_bytes_only.txt

Step 3 - the reversing script:

import sys 
script, bytesfile = sys.argv


def main(language_file):
    line = language_file.readline()
    
    if line:
        print_line(line)
        return main(language_file)
    
    
def print_line(line):
    next_lang = line.strip()
    string = next_lang.decode()
    print(string)
    

languages = open(bytesfile)

main(languages)

#in terminal: python3 ex23_reversed.py lang_bytes_only.txt

Now it says in the terminal that

AttributeError: 'str' object has no attribute 'decode'

What I have figured out so far:

For Python, my lang_bytes_only.txt does not contain b’[utf-8 encoded characters]’ but “b’[utf-8 encoded characters]’” with “double quotes” at the beginning and the end of each line which makes it a string.

After googling a bit I found out that I could change my ex23_txt_conversion.py: open() has not only append mode ‘a’ but also binary mode ‘b’, and the two can be combined. In my case I used ‘ab’.

raw_bytes_file.write("\n")

from ex23_txt_conversion.py for this experiment because I couldn’t figure out how to get a byte-newline.

Then

cat lang_bytes_only.txt

gave me this:

中文ייִדיש吴语文言VõroTiếng ViệtاردوУкраїнськаTürkçeТоҷикӣతెలుగుТатарча/tatarçaTaqbaylitภาษาไทยதமிழ்TagalogSvenskaSuomiСрпски / srpskiکوردیی ناوەندیSlovenčinaSimple EnglishShqipSeelterskРусскийRomaniRomânăPortuguêsPolskiPlattdüütschپښتوپنجابیਪੰਜਾਬੀOʻzbekcha/ўзбекчаOccitanNouormandNorsk bokmål日本語नेपाल भाषाNederlandsМонголBahasa MelayuمازِرونیმარგალურიमराठीMaltiМакедонскиMagyarLietuviųLëtzebuergeschLatviešuLatinaLatgaļuKreyòl ayisyenҚазақшаქართულიKapampanganಕನ್ನಡעבריתItalianoInterlinguaIdoHrvatskiहिन्दीՀայերեն한국어GalegoGàidhligGaelgFryskFrançaisفارسیEsperantoEspañolΕλληνικάEestiDeutschDanskCymraegČeštinaЧӑвашлаCatalàБуряадBosanskiBoarischБългарскиБеларускаяBân-lâm-gúবাংলাBamanankanAzərbaycancaArpetanAragonésالعربيةАҧсшәаአማርኛAfrikaans

which doesn’t look like what I want.
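A hedged sketch of the binary-append experiment with the newline kept: a newline as bytes is simply `b"\n"`. The output path and the two sample names are assumptions for illustration:

```python
import os
import tempfile

# Hypothetical output path standing in for lang_bytes_only.txt.
to_file = os.path.join(tempfile.mkdtemp(), "lang_bytes_only.txt")

with open(to_file, "ab") as raw_bytes_file:      # append + binary mode
    for next_lang in ["Afrikaans", "Dansk"]:
        raw_bytes = next_lang.encode("utf-8", errors="strict")
        raw_bytes_file.write(raw_bytes)          # bytes go in unchanged, no str() needed
        raw_bytes_file.write(b"\n")              # a newline written as bytes

with open(to_file, "rb") as f:
    lines = f.read().splitlines()
print(lines)
```

A file written this way holds the raw UTF-8 bytes, which is the same thing the original languages.txt holds. A terminal that decodes UTF-8 will therefore show readable text under `cat`, not `b'...'` literals; reading the file back in `'rb'` mode is one way to see the bytes.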

I gave up with the approach of getting a lang_bytes_only.txt with real bytes in it and tried to convert the strings from my original lang_bytes_only.txt when I work with the file in ex23_reversed.py but I cannot figure out how to do this right. pydoc is still a little bit too complicated for me and what I get from Google didn’t help either.

Can someone give me a hint?
Also, can someone tell me why my lang_bytes_only.txt contains the languages in reverse order?
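A plausible explanation for the reverse order, assuming CPython's usual behavior: every recursive call opens to_file again and never closes it, so each small write waits in its own buffer until that call returns. The deepest call returns first, so the last language reaches the disk first. A sketch that opens the output once and loops instead of recursing (the `convert` helper and the two-line input are made up for illustration, and it uses `'w'` where the original used `'a'`):

```python
import os
import tempfile

def convert(from_file, to_file, encoding="utf-8", errors="strict"):
    """Write one bytes literal per line, in the same order as the input."""
    with open(from_file, encoding="utf-8") as languages, \
         open(to_file, "w", encoding="utf-8") as raw_bytes_file:   # opened once
        for line in languages:
            raw_bytes = line.strip().encode(encoding, errors=errors)
            raw_bytes_file.write(str(raw_bytes) + "\n")

# Hypothetical two-line input standing in for languages.txt.
folder = tempfile.mkdtemp()
from_file = os.path.join(folder, "languages.txt")
to_file = os.path.join(folder, "lang_bytes_only.txt")
with open(from_file, "w", encoding="utf-8") as f:
    f.write("Afrikaans\nDansk\n")

convert(from_file, to_file)
with open(to_file, encoding="utf-8") as f:
    print(f.read())
```

The `with` block closes the file when the loop ends, so the lines are flushed in the order they were written.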

Thanks a lot :slight_smile:

EDIT: Wait…did my ex23_txt_conversion.py do what was meant with “reversing the script”?


#10

Actually, just skip that part. I should remove that as it’s not a very good extra credit. I declare you now done with this exercise and you should move on.


#11

YAY! Thanks :smiley: