Translating Character Codes from iTunes Playlist Exports for Android

I have a Perl script that takes an iTunes playlist export text file and turns it into a DOS batch script for copying music to my Android phone’s microSDHC card, and also creates an M3U playlist.  This may sound like a strange way to do things, but I run my Perl scripts from an Ubuntu instance under VMware on Windows Vista, and it’s easier to copy my music files from Vista when my Android phone is mounted.  That makes the auto-generated DOS script a simple way to do things.

In order to make sure that files with non-ASCII characters in their names (character codes that need more than 7 bits) get copied and played correctly, I had to do some interesting things.  There may be easier or smarter ways to do this, but this works well enough by the 80/20 rule of time spent investigating versus results. =)

It looks like the iTunes playlist export files are encoded as UCS-2, which for these purposes is the same as UTF-16.  The upside is that each character has a two-byte representation, so the characters I care about show up with an extra 0x00 byte each.  I strip the nulls and end up with one byte per character, which is effectively ISO-8859-1 (Latin-1) rather than plain ASCII, since ASCII only covers seven bits.  This shortcut only works for characters below U+0100; anything higher would be mangled.  For the M3U playlist, I convert the characters with more than seven bits to their corresponding two-byte UTF-8 representation.  For example, a lower-case “e” with an acute accent is 0xE9 coming from the iTunes file, after having its null stripped.  Its UTF-8 representation is the two bytes 0xC3 0xA9.  After this conversion, programs under Android that read the file names in the M3U file and compare them to file names in the file system work just fine.
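
The null-stripping and UTF-8 steps above can be sketched roughly as follows.  This is a Python illustration, not the original Perl script; the function names and the sample file name are made up for the example, and it assumes the export is little-endian with an optional byte-order mark.

```python
# Illustrative sketch only; strip_nulls and to_utf8 are hypothetical names.

def strip_nulls(raw: bytes) -> bytes:
    """Drop a leading UTF-16LE byte-order mark (assumed), then remove 0x00 bytes."""
    if raw.startswith(b"\xff\xfe"):
        raw = raw[2:]
    return raw.replace(b"\x00", b"")

def to_utf8(single_byte_text: bytes) -> bytes:
    """Treat each byte as a Latin-1 code point and re-encode it as UTF-8."""
    return single_byte_text.decode("latin-1").encode("utf-8")

# A made-up file name containing a lower-case e with an acute accent (U+00E9).
sample = "caf\u00e9.mp3".encode("utf-16-le")

stripped = strip_nulls(sample)   # one byte per character, 0xE9 for the accented e
utf8_name = to_utf8(stripped)    # 0xE9 becomes the two bytes 0xC3 0xA9

# Decoding the export directly gives the same bytes here, and would also
# cope with characters above U+00FF that null-stripping cannot handle.
direct = sample.decode("utf-16-le").encode("utf-8")
```

For names limited to Latin-1 characters the two routes agree, so the direct decode is probably the safer choice if the playlist ever contains anything outside that range.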

Now, to copy the files from my hard drive to the microSDHC card via a DOS batch script, I had to convert the 8-bit character codes to what I have found is the CP437 character set, one of the code pages often loosely called Extended ASCII.  That’s because the DOS shell uses it for the file names in the batch script.  So, in order to get comparisons and file moves to work via the batch script, the 8-bit character codes in file names had to be converted to it.  In the case of our lower-case “e” with an acute accent, 0xE9 becomes 0x82.  The file names for the files as they were copied to the microSDHC card were left in this format.  The files then are seen just fine by the Android OS and programs running under it.
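
A rough Python sketch of the CP437 step, again not the original Perl: the helper names and the paths are hypothetical, and it assumes the input bytes are the null-stripped, one-byte-per-character names described earlier.

```python
# Illustrative sketch only; helper names and paths are hypothetical.

def latin1_to_cp437(name: bytes) -> bytes:
    """Re-encode one-byte-per-character (Latin-1) text into code page 437.

    Raises UnicodeEncodeError for characters CP437 does not contain.
    """
    return name.decode("latin-1").encode("cp437")

def batch_copy_line(src: bytes, dest_dir: bytes) -> bytes:
    """Build one 'copy' line for the batch script, with CRLF line ending."""
    return b'copy "' + latin1_to_cp437(src) + b'" "' + dest_dir + b'"\r\n'

# 0xE9 (e with acute in Latin-1) maps to 0x82 in CP437.
converted = latin1_to_cp437(b"caf\xe9.mp3")

# Made-up source and destination paths for the example.
line = batch_copy_line(b"C:\\Music\\caf\xe9.mp3", b"E:\\Music")
```

One thing to watch for: CP437 does not include every Latin-1 character, so a name containing one of the missing ones would need separate handling.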

I’m writing an Android program that uses these M3U playlist files to access music files I’ve transferred as described above.  As I investigated the character sets and representations, I found that the Android SDK’s Java hands file names back as Unicode strings, and that my M3U file was being read as UTF-8.  When my M3U playlist contained raw 8-bit (Latin-1) characters, my file names would not match the file names as retrieved by Java.  In the Eclipse debugger, 8-bit characters from the M3U file were shown as a small rectangle, meaning Java didn’t recognize them.  Using the String method getBytes() to dump a string to a byte array showed the character represented as 0xEF 0xBF 0xBD, which I’ve found is the UTF-8 encoding of U+FFFD, the replacement character, or “I have no idea what this character is.” =)

Using the debugger in this way also led me to learn that Java doesn’t treat these bytes as unsigned values the way I would with an unsigned char in C, which is what I’m most familiar with.  Java’s byte type is always signed, and there is no unsigned variant of it.  Thus, 0xEF 0xBF 0xBD showed up as three consecutive bytes with decimal displays of -17 -65 -67.  The bytes are in 2’s-complement, so subtracting their absolute values from 256 results in decimal 239 191 189, which in hex are the codes described above.
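
The replacement character and the signed-byte arithmetic can be reproduced outside Java.  The following Python sketch mimics what the debugger showed; the sample file name is made up, and the signed conversion is written out by hand because Python's bytes are unsigned.

```python
# Illustrative sketch only; the sample name is hypothetical.

# A Latin-1 byte 0xE9 followed by "." is not a valid UTF-8 sequence,
# so a lenient UTF-8 decoder substitutes U+FFFD for it.
bad_name = b"caf\xe9.mp3"
decoded = bad_name.decode("utf-8", errors="replace")

# U+FFFD encodes to the three UTF-8 bytes 0xEF 0xBF 0xBD.
replacement_bytes = "\ufffd".encode("utf-8")

# Show the same bytes the way a signed 8-bit type displays them.
signed = [b - 256 if b > 127 else b for b in replacement_bytes]

# Masking with 0xFF recovers the unsigned values; this matches
# subtracting each absolute value from 256.
unsigned = [s & 0xFF for s in signed]
by_subtraction = [256 - abs(s) for s in signed]
```

In Java the same recovery is usually written as `b & 0xFF` on each byte, which avoids doing the subtraction by hand.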
