Opened 18 years ago
Closed 18 years ago
#4917 closed defect (fixed)
ofdb.py: charset issues
| Reported by: | | Owned by: | Anduin Withers |
|---|---|---|---|
| Priority: | minor | Milestone: | 0.21.1 |
| Component: | mythtv | Version: | unknown |
| Severity: | medium | Keywords: | |
| Cc: | | Ticket locked: | no |
Description
ofdb.py has some charset issues.
When searching for a movie ID using ofdb.py -M, the query is sent encoded as UTF-8 and the server does not return any hits. It seems to expect ISO-8859-15 instead.
I tried to recode the query to ISO-8859-15 inside ofdb.py, but urllib complained because it was expecting ASCII. I'm not sure where to go from here.
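For illustration, here is a minimal sketch of percent-encoding a search term in a specific charset, so that only ASCII reaches the URL layer. It is written for Python 3 (an assumption; the ticket's ofdb.py ran on Python 2.5), and `encode_query` is a hypothetical helper, not a function from ofdb.py.

```python
from urllib.parse import quote


def encode_query(title, charset="iso-8859-15"):
    """Percent-encode a search title using the given charset.

    The result contains only ASCII characters, so it can be placed
    into a URL without the URL library complaining.
    """
    return quote(title, encoding=charset, errors="strict")


# The same title encoded both ways; the server described in this
# ticket appeared to expect the first form.
print(encode_query("identität"))                   # identit%E4t
print(encode_query("identität", charset="utf-8"))  # identit%C3%A4t
```

The point is that the charset conversion and the percent-encoding happen in one step, so no raw non-ASCII bytes are handed to the request code.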
The movie metadata returned by ofdb.py doesn't look right in the video manager either. It looks like UTF-8 being displayed as Latin-1, e.g., you get two weird characters instead of one umlaut. The same happens in the console, but it goes away if I comment out
content = unicode(content, charset)
in line 105.
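The "two weird characters instead of one umlaut" symptom can be reproduced in isolation. The following Python 3 sketch (an assumption; the script in this ticket is Python 2) uses two hypothetical helpers to show how UTF-8 bytes read as Latin-1 turn into mojibake, and how the mistake can be reversed when nothing was lost.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled):
    """Undo the mistake above by reversing both steps."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = misread_as_latin1("identität")
print(garbled)                   # identitÃ¤t
print(repair_mojibake(garbled))  # identität
```

Each non-ASCII character occupies two bytes in UTF-8, and Latin-1 maps every byte to its own character, which is why one umlaut shows up as two characters.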
Here's a simple test case: ofdb.py -M 'identität'
My terminal and my environment are both set to UTF-8.
Change History (5)
comment:1 by , 18 years ago
comment:2 by , 18 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
comment:3 by , 18 years ago
| Milestone: | unknown → 0.21.1 |
|---|
comment:4 by , 18 years ago
comment:5 by , 18 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Merges [16988], [16995], [17012], and [17051] from trunk.
- Fixes character encoding issues (page reports UTF-8 but contains ISO-8859/UTF-8).
- Fixes HTML parser error caused by bad meta/script tags.
- Modifies the module search path to include oldxml if the directory exists. The previous workaround did not work because the xml modules were not reloaded. This is for Ubuntu Hardy, which is removing python-xml.
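One common way to cope with a page that declares UTF-8 but may actually contain ISO-8859 bytes is to try the declared charset first and fall back on failure. The sketch below is in Python 3 and is only an assumption about the general technique; it is not taken from the merged changesets, and `decode_page` is a hypothetical name.

```python
def decode_page(raw, declared="utf-8", fallback="iso-8859-15"):
    """Decode raw bytes, preferring the declared charset.

    If the bytes are not valid in the declared charset, decode them
    with the fallback instead. ISO-8859-15 assigns a character to
    every byte value, so the fallback cannot fail.
    """
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        return raw.decode(fallback)


print(decode_page("Identität".encode("utf-8")))        # Identität
print(decode_page("Identität".encode("iso-8859-15")))  # Identität
```

This works because ISO-8859 text containing umlauts is almost never valid UTF-8, so the first attempt fails cleanly rather than producing garbage.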
Thanks to Michael Haas (laga) for finding these issues, and for finding them again when my initial fix attempts failed.
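The search-path change mentioned in the merge notes can be sketched roughly as follows. This is a Python 3 illustration under stated assumptions: `add_legacy_module_dir` is a hypothetical helper, and the oldxml location shown is a guess at where the distribution places the directory, not a path confirmed by this ticket.

```python
import os
import sys


def add_legacy_module_dir(path):
    """Prepend path to sys.path if it is an existing directory.

    Returns True if the path was added, False otherwise.
    """
    if os.path.isdir(path) and path not in sys.path:
        sys.path.insert(0, path)
        return True
    return False


# Assumed location; adjust to wherever oldxml actually lives.
add_legacy_module_dir("/usr/lib/python2.5/site-packages/oldxml")
```

Modules that were imported before the path changed keep pointing at their old location, so a call like this has to run before the affected imports, or those modules have to be reloaded afterwards.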

I've created a small wrapper script which converts all input to ISO-8859-15:
When using this script and commenting out line 105 as shown above, ofdb.py almost works. Searching for titles works fine and umlauts will show up correctly in MythVideo. For some IDs, ofdb.py will give me errors, though:
    laga@prometheus:~$ /usr/share/mythtv/mythvideo/scripts/ofdb.py -D "35767,Fluch-der-Karibik"
    # Traceback (most recent call last):
    #   File "/usr/share/mythtv/mythvideo/scripts/ofdb.py", line 313, in search_data
    #     doc = reader.fromString(content)
    #   File "/usr/lib/python2.5/site-packages/_xmlplus/dom/ext/reader/HtmlLib.py", line 69, in fromString
    #     return self.fromStream(stream, ownerDoc, charset)
    #   File "/usr/lib/python2.5/site-packages/_xmlplus/dom/ext/reader/HtmlLib.py", line 27, in fromStream
    #     self.parser.parse(stream)
    #   File "/usr/lib/python2.5/site-packages/_xmlplus/dom/ext/reader/Sgmlop.py", line 57, in parse
    #     self._parser.parse(stream.read())
    #   File "/usr/lib/python2.5/site-packages/_xmlplus/dom/ext/reader/Sgmlop.py", line 160, in finish_starttag
    #     unicode(value, self._charset))
    #   File "/usr/lib/python2.5/site-packages/_xmlplus/dom/Element.py", line 170, in setAttributeNS
    #     raise InvalidCharacterErr()
    # Traceback (most recent call last):
    #   File "/usr/share/mythtv/mythvideo/scripts/ofdb.py", line 458, in <module>
    #     main()
    #   File "/usr/share/mythtv/mythvideo/scripts/ofdb.py", line 444, in main
    #     search_data(options.data_search, options.ratings_from)
    #   File "/usr/share/mythtv/mythvideo/scripts/ofdb.py", line 357, in search_data
    #     print_exception(traceback.format_exc())
    #   File "/usr/share/mythtv/mythvideo/scripts/ofdb.py", line 53, in print_exception
    #     comment_out(line)
    #   File "/usr/share/mythtv/mythvideo/scripts/ofdb.py", line 41, in comment_out
    #     print("# %s" % (str,))
    #   File "/usr/lib/python2.5/codecs.py", line 303, in write
    #     data, consumed = self.encode(object, self.errors)
    # UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 26: ordinal not in range(128)

This is without the wrapper script but with my patch from ticket #4916.
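The final UnicodeDecodeError comes from writing a byte string containing 0xfc through an encoding writer, which first tries to decode it as ASCII. A hedged sketch of a more tolerant version of such a helper is shown below, in Python 3. It borrows the name `comment_out` from the traceback for readability, but the body is an assumption and not the code from ofdb.py.

```python
import io
import sys


def comment_out(line, stream=None, fallback="iso-8859-15"):
    """Write line to stream prefixed with '# '.

    Byte strings are decoded explicitly (UTF-8 first, then the
    fallback charset) so the writer never has to guess an encoding.
    """
    if stream is None:
        stream = sys.stdout
    if isinstance(line, bytes):
        try:
            line = line.decode("utf-8")
        except UnicodeDecodeError:
            line = line.decode(fallback, errors="replace")
    stream.write("# %s\n" % line)


# 0xfc is not valid UTF-8 on its own, so the fallback charset is used.
buf = io.StringIO()
comment_out(b"f\xfcr", buf)
print(buf.getvalue(), end="")  # # für
```

Decoding at the boundary, before any formatting or printing, keeps error reporting from failing on the same non-ASCII data that caused the original error.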