[Pykaraoke-discuss] Filename encoding on Linux
Matthew Schulkind
2010-12-05 21:25:59 UTC
I've got an ISO-8859-1 encoded filename on Linux that has characters
above 127. When pykaraoke tries to scan it, it borks because the
filename can't be decoded using Python's default filesystem encoding.

Basically, the problem comes down to a combination of:
1) I believe wxPython can be installed either as a unicode build or as
a non-unicode build. I have the unicode build, so all wx* functions
return unicode objects.
2) Due to #1, the directories to be scanned are unicode objects, and
when listdir() is called on a unicode object, it attempts to decode all
the filenames it returns using the filesystem's default encoding. For
my filename this raises an exception, which currently goes unhandled
and crashes pykaraoke.
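
The decoding failure in point 2 can be reproduced without touching the
filesystem. This is only a minimal sketch in Python 3 syntax (the thread
itself concerns Python 2), and it assumes the filesystem encoding is
UTF-8, which the post does not state. The helper name can_decode and the
filename are made up for illustration.

```python
def can_decode(raw_name, encoding):
    """Return True if the raw filename bytes decode cleanly in `encoding`."""
    try:
        raw_name.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical filename: "e acute" is the single byte 0xE9 in ISO-8859-1,
# which is above 127 and is not a valid UTF-8 sequence on its own.
raw_name = "caf\u00e9.kar".encode("iso-8859-1")

print(can_decode(raw_name, "iso-8859-1"))  # True
print(can_decode(raw_name, "utf-8"))       # False (assumed filesystem encoding)
```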

I have attached a patch which converts all directory names to str
objects before scanning them. It makes things work here. I have not
tested it out on other platforms, but I believe since I remain with
str objects, all encoding issues should be effectively bypassed, so it
should work everywhere.
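
The idea of never decoding names during the scan might look roughly like
the sketch below. This is not the attached patch, which is not shown in
the thread. It is written in Python 3 syntax, where bytes plays the role
that str plays in Python 2, and the function name scan_bytes is
hypothetical.

```python
import os


def scan_bytes(directory):
    """List a directory using byte strings only, so no filename is decoded."""
    # os.fsencode accepts either text or bytes and returns bytes.
    directory = os.fsencode(directory)
    # Given a bytes path, os.listdir returns the entry names as raw bytes.
    return [os.path.join(directory, name) for name in os.listdir(directory)]
```

Because the names are never decoded, a filename in an unexpected encoding
does not raise during the scan. The names would still need decoding
before they could be displayed.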

-Matt
Kelvin Lawson
2010-12-16 21:45:41 UTC
Hi Matt,

Thanks for taking the time to submit a patch.
Post by Matthew Schulkind
I have attached a patch which converts all directory names to str
objects before scanning them. It makes things work here. I have not
tested it out on other platforms, but I believe since I remain with
str objects, all encoding issues should be effectively bypassed, so it
should work everywhere.
This is actually how PyKaraoke used to behave, but users with unicode
directory names found that the str() conversion made the scan fail.
A diff between the previous implementation (the same as yours) and the
new one can be seen here:
http://pykaraoke.cvs.sourceforge.net/viewvc/pykaraoke/pykaraoke/pykdb.py?r1=1.35&r2=1.36

That change fixed scanning of folders with unicode names, and I
confirmed that it works on my machine, so I cannot reinstate the str()
conversion as is. Is there an alternative patch that would work in both
environments? Could it be, for example, that the path needs to be
encoded using the filesystem encoding
(fileList[i].encode(sys.getfilesystemencoding()))?
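
A rough sketch of that suggestion, in Python 3 syntax and with a
hypothetical helper name (to_fs_bytes), could look like this. It is
untested against PyKaraoke itself and only shows the encoding step, not
the surrounding scan code.

```python
import sys


def to_fs_bytes(path):
    """Return `path` as bytes, encoding text with the filesystem encoding."""
    if isinstance(path, bytes):
        # Already raw bytes: pass through untouched.
        return path
    # Text path: encode explicitly instead of relying on the implicit
    # default conversion that str() performs in Python 2.
    return path.encode(sys.getfilesystemencoding())
```

Unlike a bare str() call, this does not depend on the default ASCII
codec, so a unicode directory name is encoded with the codec that
sys.getfilesystemencoding() reports.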

Thanks again for spending the time to dig into the sources, much appreciated.

Kelvin.