Emflix – Part Nine – XML – Foreign titles

I want to start off this post by acknowledging that I haven’t really mentioned genres. Wasn’t part of the whole point of this project to display and index the films by more than a single genre? WASN’T IT?


And I will talk about them! The order of these posts isn’t actually the order in which things ‘happened’. I was working on everything simultaneously, and making genre decisions (and revising those decisions) all the time. But the genre stuff is so self-contained and yet so involved that I want to talk about it on its own, without getting bogged down by the other pieces I pulled in.

Okay, so today — foreign titles!

While every movie that gets released in a foreign country (and here I mean foreign to the country of origin) may get another title, I didn’t necessarily want to spend the time tracking down everything that ‘Alien’ was called. My rule of thumb was:

  • If it was produced and released simultaneously in more than one country (and under different titles), give each title.
  • If it was a foreign (to the US) film but was primarily known in the US by an English title, give the original and its English title.

Here are some examples:

   <media id="1737472" dateCreated="2015-04-10" lastModified="2015-04-10">
       <title xml:lang="cmn" type="foreign">Chuntian de kuangxiang</title>
       <title>Rhapsody of Spring</title>
   </media>
   <media id="1616575" dateCreated="2014-11-09">
       <title>Pelle the Conqueror</title>
       <title type="foreign" xml:lang="da">Pelle Erobreren</title>
       <title xml:lang="sv" type="foreign">Pelle erovraren</title>
   </media>

Notice that I included an attribute, type="foreign", on all non-English titles. This was my way of differentiating them from the primary display title. It’s also fairly other-ing, and I regret having done it this way. Of course, these are not ‘foreign titles’. They are the native titles, and it’s the English title that is foreign.
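To make the role of that attribute concrete, here is a minimal Python sketch of how a display layer might separate the primary title from the others. It uses the standard library's ElementTree on the second record above (with a closing tag so it parses); the function name `split_titles` is made up for illustration and is not how Emflix itself did it.

```python
import xml.etree.ElementTree as ET

# Record copied from the examples above, closed so that it parses.
RECORD = """
<media id="1616575" dateCreated="2014-11-09">
    <title>Pelle the Conqueror</title>
    <title type="foreign" xml:lang="da">Pelle Erobreren</title>
    <title xml:lang="sv" type="foreign">Pelle erovraren</title>
</media>
"""


def split_titles(media):
    """Return (primary_title, other_titles) for one <media> element.

    Hypothetical helper: the primary title is the first <title>
    that does not carry type="foreign".
    """
    primary = None
    others = []
    for title in media.findall("title"):
        if title.get("type") == "foreign":
            others.append(title.text)
        elif primary is None:
            primary = title.text
    return primary, others


primary, others = split_titles(ET.fromstring(RECORD))
print(primary)  # Pelle the Conqueror
print(others)   # ['Pelle Erobreren', 'Pelle erovraren']
```

Attribute order does not matter here, which is why the Danish and Swedish titles are picked up the same way even though their attributes are written in a different order.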

I also included a piece of ACTUAL STANDARD data! I know, what a concept! ‘xml:lang’ is defined by the XML spec to identify the language of an element’s content. Its values are language tags whose subtags come from the Internet Assigned Numbers Authority (IANA) Language Subtag Registry. I never ended up doing anything that used that data… but there it is.
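For anyone who does want to use that data, here is a rough sketch of reading ‘xml:lang’ in Python. One wrinkle worth knowing: ElementTree exposes the attribute under the full XML namespace URI rather than the ‘xml:’ prefix. The constant name `XML_LANG` is just a label chosen for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Record copied from the examples above, closed so that it parses.
RECORD = """
<media id="1737472" dateCreated="2015-04-10" lastModified="2015-04-10">
    <title xml:lang="cmn" type="foreign">Chuntian de kuangxiang</title>
    <title>Rhapsody of Spring</title>
</media>
"""

media = ET.fromstring(RECORD)

# Pair each title with its language tag (None when no tag was recorded).
titles = [(t.get(XML_LANG), t.text) for t in media.findall("title")]
print(titles)
# [('cmn', 'Chuntian de kuangxiang'), (None, 'Rhapsody of Spring')]
```

The untagged English title comes back as None, so a display layer that wanted to label languages would have to decide what to do with records that never declared one.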

Learn From My Mistakes

All the titles were searchable, but only the first ‘foreign’ title was displayed. This means that you’d get hits without seeing why they matched. This is very bad information retrieval.

For example, a search for ‘khamas’ gives you a hit because the Arabic title ‘Khamas Kamirat Muhattamah’ is contained in the record but not displayed to the user. Not great.
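One way to avoid this, sketched here in Python under some assumptions, is to have the search return the titles that actually matched alongside the display title, so the interface can show them. The `<catalog>` wrapper element and the `search_titles` function are hypothetical; the two records are the ones from earlier in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical <catalog> wrapper around the two records shown earlier.
CATALOG = """
<catalog>
    <media id="1737472" dateCreated="2015-04-10" lastModified="2015-04-10">
        <title xml:lang="cmn" type="foreign">Chuntian de kuangxiang</title>
        <title>Rhapsody of Spring</title>
    </media>
    <media id="1616575" dateCreated="2014-11-09">
        <title>Pelle the Conqueror</title>
        <title type="foreign" xml:lang="da">Pelle Erobreren</title>
        <title xml:lang="sv" type="foreign">Pelle erovraren</title>
    </media>
</catalog>
"""


def search_titles(root, query):
    """Return a list of (display_title, matched_titles), one per hit."""
    needle = query.casefold()
    hits = []
    for media in root.iter("media"):
        titles = media.findall("title")
        # Keep every title that contains the query, not just the first.
        matched = [t.text for t in titles if needle in (t.text or "").casefold()]
        if not matched:
            continue
        # Display title: first title not marked type="foreign",
        # falling back to the first title in the record.
        display = next(
            (t.text for t in titles if t.get("type") != "foreign"),
            titles[0].text,
        )
        hits.append((display, matched))
    return hits


root = ET.fromstring(CATALOG)
print(search_titles(root, "erovraren"))
# [('Pelle the Conqueror', ['Pelle erovraren'])]
```

Here the Swedish title is the second ‘foreign’ title in its record, which is exactly the kind of title that was searchable but never shown. Returning it with the hit lets the results page explain itself.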


Check back for part 10, when I talk about TV shows! (Mostly similar, but with some weirdnesses of their own.)


