Emflix – Part Eight – XML – Years, Boxes, Metametadata

Link to Part Seven

Today I’ll be discussing years, box-sets and some meta-metadata (the date created/updated).

I chose to use a single year for films: the earliest release year I could find. I handled TV shows differently, but that’ll come up later.

Heading13 contained every 500 field mushed together. As anyone who’s cataloged DVDs knows, we usually type some phrase like “DVD of the original motion picture released in 1999”, transcribed from the back of the box. Bearing that in mind, I applied this template:

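    <xsl:template match="Heading13">
        <!-- All the 500 notes, with whitespace normalized -->
        <xsl:variable name="Heading13Data" select="normalize-space(.)"/>
        <!-- Trigger phrases that usually precede a release year -->
        <xsl:variable name="yearFind" as="xs:string*"
            select="('motion picture in ','release of the ','released in ')"/>
        <year>
            <!-- For each phrase, take the four characters that follow it;
                 a phrase that doesn't occur contributes an empty string -->
            <xsl:variable name="yearFound">
                <xsl:for-each select="1 to count($yearFind)">
                    <xsl:variable name="x" select="."/>
                    <xsl:value-of
                        select="substring(substring-after($Heading13Data, $yearFind[$x]),1,4)"/>
                </xsl:for-each>
            </xsl:variable>

            <xsl:choose>
                <!-- Anchored with ^ and $ so that two matching phrases
                     (e.g. "19992001") fall through to the fixed-field date -->
                <xsl:when test="matches($yearFound, '^(19|20)\d{2}$')">
                    <xsl:value-of select="$yearFound"/>
                </xsl:when>
                <xsl:otherwise>
                    <!-- Heading8: the second date in the fixed field -->
                    <xsl:value-of select="normalize-space(../../Heading8)"/>
                </xsl:otherwise>
            </xsl:choose>
        </year>
    </xsl:template>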

First I assign all the data in Heading13 (with its whitespace normalized) to a variable. Next I create a variable holding the trigger phrases that often indicate a year. As with the writer/director templates, I added these as I encountered them. Then, counting from 1 to the number of trigger phrases (three in this case), the template takes the four characters found after each trigger phrase and collects them in a variable called ‘yearFound’. A phrase that doesn’t appear contributes an empty string, so when only one phrase matches, ‘yearFound’ holds just those four characters.

The choose statement then tests whether those four characters match the regex for 19 or 20 followed by two digits. If they do, that is the year it outputs; if not, it outputs whatever was in Heading8 (the second of the two date fields in the fixed field).
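To make the loop and the fallback concrete, here’s the same logic sketched in Python. It’s an illustration, not the stylesheet itself: the function name `find_year` and passing Heading13 and Heading8 in as plain strings are assumptions, and phrase matching is case-sensitive, as in the XSLT. The sketch uses `re.fullmatch`, which requires the whole collected string to be a single year. XPath `matches()` is unanchored unless the pattern includes `^` and `$`, so with an unanchored pattern a string like “19992001” (two trigger phrases matched) would pass the test.

```python
import re

# Trigger phrases copied from the Heading13 template.
YEAR_FIND = ("motion picture in ", "release of the ", "released in ")

# 19 or 20 followed by two digits; fullmatch anchors it to the whole string.
YEAR_PATTERN = re.compile(r"(19|20)\d{2}")


def find_year(heading13: str, heading8: str) -> str:
    """Return a four-digit year from the 500 notes, else the Heading8 date."""
    data = " ".join(heading13.split())  # roughly what normalize-space() does
    found = ""
    for phrase in YEAR_FIND:
        # Like substring-after(): text after the first occurrence, or "" if absent.
        after = data.partition(phrase)[2]
        # Like substring(..., 1, 4); results are concatenated, as in the for-each.
        found += after[:4]
    if YEAR_PATTERN.fullmatch(found):
        return found
    # Fallback: the second date in the fixed field, whitespace-normalized.
    return " ".join(heading8.split())


print(find_year("DVD of the original motion picture released in 1999.", "2003"))
# 1999
```

Note that a note containing two trigger phrases (“motion picture in 1999; released in 2001”) falls back to Heading8 rather than guessing which year to keep.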


Box-sets were a real bugaboo for me. They were much more work, and they ended up (or at least, the way I handled them did) breaking a data rule. I had decided to break up box-sets and give equal access and treatment to every movie contained therein. Remember way back in part three when I said I couldn’t use unique IDs after a while? This is why. I had been using the bib ID from the catalog as the unique ID, but of course a box-set that had been cataloged as such would have a single ID. That meant that every movie in the set would have the same ID. Bad move, Ganin.

It also meant that I had to spend loads more time manually inputting data because, as anyone who’s cataloged a box-set knows, you end up cramming a lot into a single field. Most of my clever little templates would either give me a single entry for the first movie in the set, or not even that. I had to hop over to Wikipedia or another source and find the directors/writers/years/titles for all the other movies in the set. Sigh. This is one of the few things where I’m not super sure what I would’ve done differently if I were starting over. I think it was ultimately beneficial (particularly because this was a visual display) to have the individual movies separated, which I think just has to mean more work on my part.
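Back to the shared bib ID problem: once a box-set is split into separate media records, every record inherits the set’s single ID. A quick check like this Python sketch counts the `id` attributes and reports the ones that repeat. The sample XML, its root element name, the titles and the IDs are all made up for illustration; only the `media` element and its `id` attribute come from this series.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: a box-set split into three <media> records that all
# inherited the set's single bib ID, plus one standalone film.
SAMPLE = """
<collection>
  <media id="b1001"><title>Film A</title></media>
  <media id="b1001"><title>Film B</title></media>
  <media id="b1001"><title>Film C</title></media>
  <media id="b2002"><title>Film D</title></media>
</collection>
"""


def duplicate_ids(xml_text: str) -> dict[str, int]:
    """Return each media id that appears more than once, with its count."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter(m.get("id") for m in root.iter("media"))
    return {media_id: n for media_id, n in counts.items() if n > 1}


print(duplicate_ids(SAMPLE))  # {'b1001': 3}
```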


As for that tasty bit of meta-metadata (not that I ever actually used it…): each media element had two attributes, “dateCreated” and “lastModified”, seen here:

    <media id="{Heading16}" dateCreated="{$date}" lastModified="{$date}">

They were initially set to the same value, and I would then update the lastModified date whenever I changed something manually. Both take their value from a globally scoped variable, seen here:
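I updated lastModified by hand, but the rule itself (leave dateCreated alone, restamp lastModified on every change) is simple to express in code. Here’s a hypothetical Python helper using ElementTree. The `edit_title` name and the assumption that a media record has a `title` child element are illustrative, not taken from the actual schema, and the dates are sample values.

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Optional


def edit_title(media: ET.Element, new_title: str,
               today: Optional[datetime.date] = None) -> None:
    """Hypothetical helper: change a media record's <title> and restamp it.

    dateCreated is never touched; lastModified gets today's date as
    YYYY-MM-DD, the same shape as the $date variable.
    """
    title = media.find("title")  # assumed child element, not a confirmed schema
    if title is None:
        title = ET.SubElement(media, "title")
    title.text = new_title
    if today is None:
        today = datetime.date.today()
    media.set("lastModified", today.isoformat())


# Usage: a record built on one date and corrected later.
record = ET.fromstring(
    '<media id="b1001" dateCreated="2014-05-01" lastModified="2014-05-01">'
    "<title>Old title</title></media>"
)
edit_title(record, "Corrected title", datetime.date(2014, 6, 2))
print(record.get("dateCreated"), record.get("lastModified"))
# 2014-05-01 2014-06-02
```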

    <xsl:variable name="date" select="substring(string(current-date()), 1, 10)"/>

The weirdness is just because I only wanted YYYY-MM-DD, but current-date() gives you more than that: its string value can also carry a timezone offset (something like 2014-06-02-04:00). So I first converted it into a string and kept only the first 10 characters.
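Here’s what that truncation does, sketched in Python with hypothetical sample values. An xs:date string can end in a timezone suffix such as `-04:00` or `Z`, and for ordinary four-digit years the first 10 characters are the YYYY-MM-DD part. The `date.fromisoformat` check is an addition for the sketch, not something the XSLT does.

```python
from datetime import date


def date_stamp(xsd_date: str) -> str:
    """Mirror substring(string(current-date()), 1, 10).

    Drops any timezone suffix ("-04:00", "Z") and keeps YYYY-MM-DD.
    """
    stamp = xsd_date[:10]
    date.fromisoformat(stamp)  # sketch-only check: raises ValueError if not a real date
    return stamp


for sample in ("2014-06-02-04:00", "2014-06-02Z", "2014-06-02"):
    print(sample, "->", date_stamp(sample))
```

In XSLT 2.0, `format-date(current-date(), '[Y0001]-[M01]-[D01]')` produces the same YYYY-MM-DD string without relying on string slicing.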


Next time I’ll talk about foreign titles.
