Emflix – Part Five – XML – Titles

Link to Part Four

Here’s where I introduce my big fancy proudest XSLT that I wrote for this project (and I wrote many). I’ve been talking a big game about how I wanted to transform the data from the catalog into the data of my structure, with as little copying/pasting as possible. There would inevitably still be lots of copying/pasting (all the genres taken from Netflix, for instance) — but I wanted to save myself as much time as possible.

So here goes, let’s take it a bit at a time.

Let’s talk titles! As we all know, in a MARC 245 field, we capitalize the first letter of the first word, and nothing else (save for proper nouns). That’s not how I wanted my interface to look; it’s not how titles are really presented anywhere besides a library catalog.

Learn From My Mistakes

There was no reason to store the data this way! If I wanted to present the data with a more traditional title capitalization, I could’ve left that to the XSLT which turned my structure into the user-visible interface.


 <title>
     <xsl:apply-templates select="Heading3"/>
 </title>

This instruction is going to put whatever data is stored in “Heading3” in a “title” element, so let’s look at the template instruction to see what happens to it.

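 <xsl:template match="Heading3">
    <!-- Normalize whitespace, then strip a trailing ' /' and a trailing '.' -->
    <xsl:variable name="title"
    select="functx:substring-before-if-ends-with(functx:substring-before-if-ends-with((normalize-space(.)),' /'), '.')"/>
    <!-- Title-case each word, rejoin with spaces, then capitalize the first letter -->
    <xsl:sequence
     select="functx:capitalize-first(string-join(for $x in tokenize($title,'\s') return functx:titleCase($x),' '))"/>
 </xsl:template>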

Follow me here…using some functions from http://www.xqueryfunctions.com/, the template first creates a variable named “title”. It normalizes the whitespace of the full title, strips a trailing ‘ /’, then strips a trailing period, and stores the result in that variable. Then it tokenizes the title by word, runs each token through my own little function called “titleCase”, joins the tokens back together, and capitalizes the first letter of the result.
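To make that first step concrete, here’s a minimal, self-contained sketch of just the stripping part. Some assumptions to flag: the `functx:substring-before-if-ends-with` definition below is a local stand-in that does what its name says (the real FunctX version may differ in detail), the `main` template is a hypothetical entry point so the stylesheet can run without an input document (for instance with Saxon’s `-it:main` option), and the sample titles are made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:functx="http://www.functx.com"
    exclude-result-prefixes="xs functx">

    <xsl:output method="text"/>

    <!-- Stand-in for the FunctX function (assumption): remove $suffix
         only when $arg actually ends with it, otherwise return $arg as is. -->
    <xsl:function name="functx:substring-before-if-ends-with" as="xs:string?">
        <xsl:param name="arg" as="xs:string?"/>
        <xsl:param name="suffix" as="xs:string"/>
        <xsl:sequence select="
            if (ends-with($arg, $suffix))
            then substring($arg, 1, string-length($arg) - string-length($suffix))
            else $arg"/>
    </xsl:function>

    <!-- Hypothetical entry point: run the same nested call the Heading3
         template uses over three made-up catalog titles. -->
    <xsl:template name="main">
        <xsl:for-each select="('The  wizard of Oz /', 'Casablanca.', 'Jaws')">
            <xsl:value-of select="functx:substring-before-if-ends-with(
                functx:substring-before-if-ends-with(normalize-space(.), ' /'), '.')"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

The three lines of output should be “The wizard of Oz”, “Casablanca” and “Jaws”: the doubled space is collapsed by `normalize-space`, the trailing ‘ /’ and period are dropped, and a title with neither passes through untouched.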

“Heading3”, I believe, was created from MARC 245 subfields ‘a’ and ‘b’. Again, this is what happens when the person creating the conversion isn’t the same one who exported the data from the catalog. I was never 100% sure where things were coming from and had to make best guesses. Why I had to remove a trailing ‘ /’ but not ‘ : ’ I can’t tell you.

Here’s the function I wrote to convert each token into my desired title case:

     <xsl:function name="functx:titleCase" as="xs:string">
         <xsl:param name="s" as="xs:string"/>
         <xsl:choose>
             <xsl:when
                 test="lower-case($s)=('a','aboard','about','above','absent','across','after','against','alongside','amid','amidst','among','amongst','an','and','around','as','aslant','astride','at','athwart','atop','barring','before','behind','below','beneath','beside','besides','between','beyond','but','by','despite','down','during','except','failing','following','for','from','in','inside','into','like','mid','minus','near','next','nor','notwithstanding','of','off','on','onto','opposite','or','out','outside','over','past','per','plus','regarding','round','save','since','so','than','the','through','throughout','till','times','to','toward','towards','under','underneath','unlike','until','up','upon','via','vs.','when','with','within','without','worth','yet')">
                 <xsl:value-of select="lower-case($s)"/>
             </xsl:when>
             <xsl:otherwise>
                 <xsl:value-of
                     select="concat(upper-case(substring($s, 1, 1)), lower-case(substring($s, 2)))"/>
             </xsl:otherwise>
         </xsl:choose>
     </xsl:function>

It checks whether each token, once converted to lower case, matches one of the strings that I wanted to stay lower case: the set of little words which shouldn’t be capitalized in a title. I actually can’t, for the life of me, remember where the heck I got this list…that’s bad documentation on my part.

If it matches one of those strings, the function returns the lower case value of that string, and the next token gets processed. If it doesn’t match, it falls back on capitalizing the first letter of the string and lower-casing the rest (using some native sub-string functions).
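Here’s a small end-to-end sketch of the word-by-word step, if you want to try it in isolation. It is not the stylesheet from the project: the small-word list is cut down to a handful of entries for readability, `functx:titleCase` is rewritten as a single `xsl:sequence`, `functx:capitalize-first` is a local stand-in (assumed to upper-case only the first character), and `main` is a hypothetical entry point with made-up titles.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:functx="http://www.functx.com"
    exclude-result-prefixes="xs functx">

    <xsl:output method="text"/>

    <!-- Stand-in for the FunctX function (assumption): upper-case the
         first character and leave the remainder unchanged. -->
    <xsl:function name="functx:capitalize-first" as="xs:string?">
        <xsl:param name="arg" as="xs:string?"/>
        <xsl:sequence select="concat(upper-case(substring($arg, 1, 1)), substring($arg, 2))"/>
    </xsl:function>

    <!-- Compact variant of titleCase with a shortened small-word list. -->
    <xsl:function name="functx:titleCase" as="xs:string">
        <xsl:param name="s" as="xs:string"/>
        <xsl:sequence select="
            if (lower-case($s) = ('a', 'an', 'and', 'in', 'of', 'the', 'to', 'with'))
            then lower-case($s)
            else concat(upper-case(substring($s, 1, 1)), lower-case(substring($s, 2)))"/>
    </xsl:function>

    <!-- Hypothetical entry point: tokenize, title-case each word, rejoin,
         then capitalize the first letter of the whole title. -->
    <xsl:template name="main">
        <xsl:for-each select="('The wizard of Oz', 'Gone with the wind', 'A tale of the FBI')">
            <xsl:value-of select="functx:capitalize-first(
                string-join(for $x in tokenize(., '\s') return functx:titleCase($x), ' '))"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

The output should be “The Wizard of Oz”, “Gone with the Wind” and “A Tale of the Fbi”. The first two show why the final `capitalize-first` call is needed (a leading “the” would otherwise stay lower case), and the third shows a side effect worth knowing about: acronyms get flattened.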

That’s what happens to every title! Pretty neat, huh? I would then add a ‘sort’ attribute to the titles, essentially duplicating the MARC second indicator for a 245. I had to add 1 to it, though, because XSLT counts the first character as position 1, not position 0 as is more common in computer programming. Come to think of it…I really could’ve written a quick thing to add those ‘sort’ attributes automatically: just some kind of check for whether the first token is “the”, “a”, or “an”. Oh well! You live and learn. That’s why I’m writing this up, to talk through what I did, and realize what I could’ve/should’ve done.
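For what it’s worth, that “quick thing” might look something like the sketch below. Everything in it is hypothetical: the `local` namespace URI, the `local:sort-position` function name, the `titles` wrapper element, and the sample titles are invented for illustration, and the check only knows the three English articles, so it would get a title like “A is for Alibi” wrong.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:emflix-sketch"
    exclude-result-prefixes="xs local">

    <xsl:output method="xml" indent="yes"/>

    <!-- Hypothetical helper: return the 1-based position of the first
         character that should count for sorting. -->
    <xsl:function name="local:sort-position" as="xs:integer">
        <xsl:param name="title" as="xs:string"/>
        <xsl:variable name="tokens" select="tokenize(normalize-space($title), '\s')"/>
        <xsl:variable name="first" select="lower-case($tokens[1])"/>
        <!-- Skip the article plus the space after it, but only when
             something actually follows the article. -->
        <xsl:sequence select="
            if (count($tokens) gt 1 and $first = ('a', 'an', 'the'))
            then string-length($first) + 2
            else 1"/>
    </xsl:function>

    <!-- Hypothetical entry point with made-up titles. -->
    <xsl:template name="main">
        <titles>
            <xsl:for-each select="('The Wizard of Oz', 'An American in Paris', 'Casablanca')">
                <title sort="{local:sort-position(.)}">
                    <xsl:value-of select="."/>
                </title>
            </xsl:for-each>
        </titles>
    </xsl:template>

</xsl:stylesheet>
```

This should give sort values of 5, 4 and 1 for the three titles. “The ” is four nonfiling characters, which would be a MARC second indicator of 4, and adding 1 for XSLT’s 1-based positions gives 5.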

That’s titles. Next time, we’ll talk about the programmatic conversion of directors/writers.

 
