I enjoyed Ex 23. I got the script pretty much right away (and commented the hell out of it to ram it home) so I moved on to the extra credit stuff. I hit a brick wall with #3 and after hours of trying to research this online, I’ve thrown in the towel and come here.
I know you can reverse the process by opening the languages.txt file in binary mode and doing your decode/encode that way. But I read the exercise literally, so I considered that method a bit of a cheat. To my mind, reversing the process should really mean starting from a text file full of the binary string outputs of languages.txt and working back from there.
It was relatively easy to write a small script that took languages.txt, encoded each line, and wrote the results out to a file which I called binaries.txt.
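For anyone who wants to reproduce this, here is a minimal sketch of that kind of script. It assumes languages.txt is UTF-8 with one language per line; the function name write_binaries and its parameters are made up for illustration and are not the exact code I ran.

```python
def write_binaries(src_path, dst_path, encoding="utf-8"):
    """Encode each line of src_path and write its b'...' text form to dst_path."""
    with open(src_path, encoding=encoding) as src, \
         open(dst_path, "w", encoding="ascii") as dst:
        for line in src:
            # Drop the trailing newline, then turn the text into bytes.
            raw_bytes = line.strip().encode(encoding)
            # repr() of a bytes object is its b'...' form, which is plain ASCII.
            dst.write(repr(raw_bytes) + "\n")
```

Calling write_binaries("languages.txt", "binaries.txt") should then give a file with one b'...' line per language.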
binaries.txt looked like this when you opened it (this is just a snippet):
b'Afrikaans'
b'\xe1\x8a\xa0\xe1\x88\x9b\xe1\x88\xad\xe1\x8a\x9b'
b'\xd0\x90\xd2\xa7\xd1\x81\xd1\x88\xd3\x99\xd0\xb0'
b'\xd8\xa7\xd9\x84\xd8\xb9\xd8\xb1\xd8\xa8\xd9\x8a\xd8\xa9'
b'Aragon\xc3\xa9s'
b'Arpetan'
b'Az\xc9\x99rbaycanca'
b'Bamanankan'
b'\xe0\xa6\xac\xe0\xa6\xbe\xe0\xa6\x82\xe0\xa6\xb2\xe0\xa6\xbe'
The hard part, and what has caused me to draw a blank despite numerous attempts at solving it, is getting each line to format correctly as binary.
The problem is if I do this…
binaries = open("binaries.txt", encoding="utf-8")
The opened file’s lines will come in as strings. At that point, if you try to convert a line to binary, a whole bunch of escape characters will be added, and instead of seeing this
b'\xe1\x8a\xa0\xe1\x88\x9b\xe1\x88\xad\xe1\x8a\x9b'
I end up with this…
b"b'\\xe1\\x8a\\xa0\\xe1\\x88\\x9b\\xe1\\x88\\xad\\xe1\\x8a\\x9b'"
And of course there’s no way to decode that into its expected utf-8 string text counterpart.
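To make the symptom concrete, here is a small reproduction that doesn't need the file at all. The raw string stands in for one line read from binaries.txt, and the variable names are just for illustration.

```python
# One line of binaries.txt as it arrives when the file is opened as text.
# These are literal backslash, 'x', 'e', '1' characters, not the byte 0xe1.
line = r"b'\xe1\x8a\xa0\xe1\x88\x9b\xe1\x88\xad\xe1\x8a\x9b'"

# Encoding the text only produces the bytes of those characters.
as_bytes = line.encode("utf-8")

# The printed repr doubles every backslash, as in the output above.
print(as_bytes)

# Decoding just round-trips to the same text, not to the original word.
print(as_bytes.decode("utf-8") == line)
```

So the bytes object holds the text of the literal rather than the bytes the literal describes, which is why decoding it gets nowhere.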
The same thing happens if you try to open the file with "r+b":
binaries = open("binaries.txt", "r+b")
Again, you’ll get a bunch of escape characters added to your line.
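A quick sketch of the binary-mode case, assuming a throwaway temp file that holds a single line in the same form as binaries.txt. It uses "rb" rather than "r+b" because it only reads; the path and variable names are illustrative.

```python
import os
import tempfile

# Write one line in the same text form binaries.txt uses.
path = os.path.join(tempfile.mkdtemp(), "binaries.txt")
with open(path, "w", encoding="ascii") as f:
    f.write(r"b'Aragon\xc3\xa9s'" + "\n")

# Binary mode hands back the raw bytes of the file, which are the ASCII
# characters b, ', A, r, ... including the literal backslashes.
with open(path, "rb") as f:
    raw_line = f.readline().strip()

print(raw_line)       # the repr shows each literal backslash doubled
print(len(raw_line))  # one byte per character of text in the file
```

Binary mode skips the text decoding step, but the bytes in the file are still the characters of the b'...' literal, so the result is the same.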
Now, after researching this, I totally understand what’s going on here and why it’s doing that and it makes perfect sense.
But it doesn’t get me any closer to solving my problem. Surely there must be a way to read from a file what are essentially binary strings (albeit in text string format) into Python and flip them over to binary strings without corrupting the line with a bunch of escape characters. But for the life of me I haven’t found it and I’m all out of ideas (and have hit the red vino to boot).
Maybe I’m overthinking this, and going with the first option at the top is obviously the easiest way out. But now that I’ve gotten this far and spent this much time on it, I really want to know the solution, or at least understand what it would be.
So if anyone can throw me a clue, that would be great. In the meantime I’m off to Ex 24 because I’ve done everything I can with Ex 23 for now.