A Long Overdue Post

So many things get lost to the edges. Too many commitments, too little motivation, etc. etc.

For many a moon I’ve been meaning to write up how Alex and I turned my QueerLCSH into a LibGuide. I’m finally doing it!


The very first thing, of course, was the QueerLCSH itself, which you can read ALL about over at that link. (Scroll past the updates for the original post and a link to the headings themselves.)

Then I was alerted by Jessica Colbert that she had made a very cool LibGuide at her institution using my headings! Clicking any of the headings brings the user directly into the library catalog, performing a subject search.

Naturally I thought, “hey, why not do that here too?”

I talked to Alex about the idea and they were totally on board and excited. The question was: how do we turn a looooooong list of headings into a LibGuide without it being totally tedious?

As it usually is (to me), the answer was XSLT. Remember that I made my QueerLCSH by painstakingly (and tediously) downloading records 1-by-1 from id.loc.gov as RDF/XML (MADS and SKOS). That means that I still have all the raw material to work with and transform however I want.

Alex pointed out that even if I could generate the links easily, we’d have to load those links into the LibGuide one at a time, thereby returning us to tedium land. I reached out to Springshare Support, and learned that if you upload a set of links as a database, they can then flip them to be link assets. I don’t know why that power isn’t given to users of LibGuide software, but I was glad they were willing to do it for us.

At this point we were ready for me to generate the links. I wrote the following XSLT transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ng="http://www.netanelganin.com"
    exclude-result-prefixes="xs" version="2.0">

    <xsl:function name="ng:norm" as="xs:string">
        <xsl:param name="arg" as="xs:string"/>
        <xsl:sequence select="normalize-space(lower-case(translate($arg, '-.,', '   ')))"/>
    </xsl:function>

    <xsl:template match="/rdf:RDF">
        <xsl:apply-templates
            select="madsrdf:Topic/madsrdf:authoritativeLabel | madsrdf:ComplexSubject/madsrdf:authoritativeLabel">
            <xsl:sort select="ng:norm(.)"/>
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="madsrdf:authoritativeLabel">
        <!-- the deep-link URL goes here, with ng:norm(.) spliced into the middle (see the sketch below) -->
    </xsl:template>

</xsl:stylesheet>

It’s pretty straightforward: there’s a function which I use to normalize the headings; it turns commas, periods, and dashes into spaces. Then all the headings get sorted and tossed into a big list. The most important part is the URL in that last template, what’s called a ‘deep link’. I learned while doing this that you can’t just perform a subject search in Alma and then create a link from the URL it gives you; that link will eventually decay. You need to build the subject search using this deep link thing. Here’s some Ex Libris documentation on deep links.

As you can see, there’s an “ng:norm(.)” buried in the middle of the deep link; that’s where the heading slides in. This particular link is a subject search, but it could just as easily be a browse search.
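To give a sense of the shape, here’s a sketch of that final template with a placeholder URL (the real base address, institution, and view codes come from your own Primo/Alma configuration and the Ex Libris deep-link documentation linked above, so treat every part of this URL as an assumption):

    <xsl:template match="madsrdf:authoritativeLabel">
        <!-- placeholder host, institution, and view; 'sub,exact,' asks for a subject search -->
        <xsl:value-of
            select="concat('https://YOUR-PRIMO-HOST/primo_library/libweb/action/dlSearch.do?institution=YOUR_INST&amp;vid=YOUR_VIEW&amp;query=sub,exact,',
            encode-for-uri(ng:norm(.)))"/>
    </xsl:template>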

So this stylesheet processed all the headings and turned them into links which bring a user directly into our catalog. We tossed ’em all into an Excel spreadsheet, uploaded it to the LibGuide software, and then Springshare turned them into link assets!

Pretty cool, right?

Feel free to snag this XSLT and do the same for your institution, or if you’re interested in having something similar but aren’t sure how — let me know and I’ll try to help you out!

Some interesting things to consider:

  • We made no attempt to guarantee that any of the subject searches would actually return results. That does mean patrons are presented with topics we don’t actually have any resources for, but we couldn’t think of a sustainable way to keep such a list pruned as new books entered our catalog. How would we know if new material matched a topic which we had previously removed?
  • We didn’t add any of the subject headings which were only added to LCSH for validation purposes, e.g. “African American gays--Fiction”, “Gay couples--Legal status, laws, etc.” (in truth I’ve never really understood why they do that…)
  • We didn’t add any subdivisions unless the LGBT aspect was in the subdivision itself, e.g. “World War, 1939-1945--Participation, Gay” but not “Bisexuality--Religious aspects”. We felt that performing the subject search on the main topical term by itself, without the subdivisions, would probably be sufficient.
  • If you do decide to include subdivisions, consider the pattern headings, e.g. “Gay rights--Religious aspects--Baptists, [Catholic Church, etc.]”; searching that subject string as-is would be unfruitful because that isn’t how pattern headings look in the wild.
  • We opted not to include any of the Library of Congress Genre/Form Terms or Library of Congress Demographic Group Terms because in our ILS it is not possible for users to actually target a search to those fields. If it becomes possible, we’ll add ’em in!
  • Scope notes: keep ’em or not?



After the initial transformations were done, I stepped back from the project and Alex now maintains the LibGuide. They check the New LCSH each month and add any new relevant terms themselves.


Emflix – Part Eight – XML – Years, Boxes, Metametadata

Link to Part Seven

Today I’ll be discussing years, box-sets, and some meta-metadata: the dates the records were created and updated.

I chose to use a single year for films: the earliest release year I could find. I handled TV shows differently, but that’ll come up later.

Heading13 contained every 500 field mushed together. As anyone who’s cataloged DVDs knows, we usually type some phrase like “DVD of the original motion picture released in 1999,” transcribed from the back of the box. Bearing that in mind, I applied this template:

   <xsl:template match="Heading13">
         <xsl:variable name="Heading13Data" select="normalize-space(.)"/>
         <xsl:variable name="yearFind" as="xs:string*"
             select="('motion picture in ','release of the ','released in ')"/>
             <xsl:variable name="yearFound">
                 <xsl:for-each select="1 to count($yearFind)">
                     <xsl:variable name="x" select="."/>
                         select="substring(substring-after($Heading13Data, $yearFind[$x]),1,4)"/>
                 <xsl:when test="matches($yearFound, '(19|20)\d{2}')">
                     <xsl:value-of select="$yearFound"/>
                     <xsl:value-of select="normalize-space(../../Heading8)"/>

First I assign all the data in Heading13 (normalizing the space) to a variable. Next, a variable is created holding the trigger phrases which often indicate a year; as with the writer/director templates, I added these as I encountered them. Then, counting from 1 to the number of trigger phrases (3 in this case), it takes the four characters found after each trigger phrase and assigns them to a variable called ‘yearFound’.

The choose statement then tests whether those four characters satisfy a regex match of 19 or 20 followed by two digits. If they match, that is the year it outputs; if they don’t, it outputs whatever was in Heading8 (which is the second of the two date fields in the fixed field).
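For example (the note below is invented), given Heading13 data containing ‘Originally released as a motion picture in 1999.’, the ‘motion picture in ’ trigger fires and the pieces fall out like this:

    <!-- substring-after($Heading13Data, 'motion picture in ')  =>  '1999.'  -->
    <!-- substring('1999.', 1, 4)                               =>  '1999'   -->
    <!-- matches('1999', '(19|20)\d{2}')                        =>  true()   -->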

Box-sets were a real bugaboo for me. They were much more work, and they ended up (or at least the way I handled them did) breaking a data rule. I had decided to break up box-sets and give equal access and treatment to every movie contained therein. Remember way back in part three when I said I couldn’t use unique IDs after a while? This is why. I had been using the bib ID from the catalog as the unique ID, but of course a box set that had been cataloged as such would have a single ID. That meant that every movie in the set would have the same ID. Bad move, Ganin.

It also meant that I had to spend loads more time manually inputting data because, as anyone who’s cataloged a box-set knows…you end up cramming a lot into a single field. Most of my clever little templates would either give me a single entry for the first movie in the set, or not even that. I had to hop over to Wikipedia or another source and find the directors/writers/years/titles for all the other movies in the set. Sigh. This is one of the few things where I’m not super sure what I would’ve done differently if I were starting over. I think ultimately it was beneficial (particularly because this was a visual display) to have the individual movies separated, which I think would just have to mean more work on my part.

As for that tasty bit of meta-metadata (not that I ever actually used it…): each media element had two attributes, “dateCreated” and “lastModified”, seen here:

<media id="{Heading16}" dateCreated="{$date}" lastModified="{$date}">

They were initially set to the same value, and then I would update the lastModified date when I changed something manually. There’s a globally scoped variable giving their data, seen here:

    <xsl:variable name="date" select="substring(string(current-date()), 1, 10)"/>

The weirdness is just because I only wanted YYYY-MM-DD, but the current-date() function gives you LOTS more than that, so I first converted it into a string and stripped anything after the first 10 characters.
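To illustrate (the date itself is made up): the raw value carries the implicit timezone, and the substring keeps the first ten characters. If I were doing it again, XSLT 2.0’s format-date() would get the same result a bit more directly:

    <!-- string(current-date())                    e.g.  '2016-05-12-05:00' -->
    <!-- substring(string(current-date()), 1, 10)  e.g.  '2016-05-12'       -->
    <xsl:variable name="date" select="format-date(current-date(), '[Y0001]-[M01]-[D01]')"/>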

Next time I’ll talk about foreign titles.

Emflix – Part Seven – XML – LC Special Topics, Language

Link to Part Six


I’m grouping today’s topics, the LC Special Topics and the language, together, because they both rely on the same thing — a set list of known possibilities.

                 <xsl:apply-templates select="Heading1"/>
                 <xsl:apply-templates select="Heading11"/>

The instructions are much simpler than the ones we’ve seen previously: they just apply the data right from the XML, with no special filtering or processing. Heading1 was the “Display Call No.” field (it doesn’t map to MARC, and I’m not actually sure where it was drawn from, maybe the 852 from MARC Holdings?) and Heading11 was the language code from the fixed field.

So here’re the matching templates (excerpted…because they’re long)

<xsl:template match="Heading1">
             <xsl:when test="contains(., '.A26 ')">Acting. Auditions</xsl:when>
             <xsl:when test="contains(., '.A3 ')">Adventure films</xsl:when>
             <xsl:when test="contains(., '.A43 ')">Africa</xsl:when>
             <xsl:when test="contains(., '.A45 ')">Alcoholism</xsl:when>
             <xsl:when test="contains(., '.A5 ')">Animals</xsl:when>
             <xsl:when test="contains(., '.A54 ')">Animation</xsl:when>
             <xsl:when test="contains(., '.A72 ')">Armed Forces</xsl:when>
             <xsl:when test="contains(., '.A73 ')">Art and the arts</xsl:when>
             <xsl:when test="contains(., '.A77 ')">Asian Americans</xsl:when>    
            <xsl:when test="contains(., '.W3 ')">War</xsl:when>
             <xsl:when test="contains(., '.W4 ')">Western films</xsl:when>
             <xsl:when test="contains(., '.W6 ')">Women</xsl:when>
             <xsl:when test="contains(., '.Y6 ')">Youth</xsl:when>
     <xsl:template match="Heading11" mode="language">
             <xsl:when test=". = 'ara'">Arabic</xsl:when>
             <xsl:when test=". = 'arc'">Aramaic</xsl:when>
             <xsl:when test=". = 'arm'">Armenian</xsl:when>
             <xsl:when test=". = 'art'">Artificial (Other)</xsl:when>
             <xsl:when test=". = 'und'">Undetermined</xsl:when>
             <xsl:when test=". = 'urd'">Urdu</xsl:when>
             <xsl:when test=". = 'wol'">Wolof</xsl:when>
             <xsl:when test=". = 'zul'">Zulu</xsl:when>
             <xsl:when test=". = 'zxx'">No linguistic content</xsl:when>
                 <xsl:value-of select="."/>
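For a made-up row, the two templates behave like this:

    <!-- Heading1  = 'PN1995.9.A3 R35 2001'  =>  contains(., '.A3 ') is true  =>  'Adventure films' -->
    <!-- Heading11 = 'ara'                   =>  'Arabic'                                            -->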

Get the idea? Just a simple ‘choose’ statement with a LOT of options. Drawn from Class Web (for the cutters) and from the MARC code list of languages (for the langs). Now I know what you’re thinking: but Netanel, aren’t you just Jurassic Park-ing?

Here I stop to tell a tale that, for whatever reason, has stuck with me over the years. In the novel Jurassic Park, but not the movie, there’s an important scene where Ian Malcolm demonstrates that the animals must be breeding. The InGen scientists are sure the dinos are not breeding because their cameras are equipped with fancy counting technology and count the animals in the park on a regular basis. The count always matches their pre-determined correct number of animals, ergo: no breeding.

See, but Malcolm realizes that the camera-counting system is only counting up to the number it assumes will be there. When he has them count for higher and higher numbers of animals, and then eventually any number, they get wildly different results. The takeaway lesson (for me anyway) has always been: if you assume you know what the results will be, you’ll find those results. You must remember to account for the ones you couldn’t have known would be in the mix.

So here’s how I ensured that I wasn’t Jurassic Park-ing. The first thing I did was spit out a list of every cutter currently in use by the circulating DVD collection; then I looked each one up in Class Web to get its value and added them all to that choose/when statement. But I also added them to this little number in my “error checker”.

<xsl:variable name="LCSpecialTopics"
       <xsl:for-each-group select="/root/row"
          group-by="substring-before(substring-after(Heading1,'1995.9.'),' ')">
      <xsl:if test="not(index-of($LCSpecialTopics,current-grouping-key()))">
             <xsl:value-of select="concat(current-grouping-key(),' isnt currently in my Circulation Stylesheet!')"/>

I wrote a similar one for the language codes. By running it each time I was given new movies by my boss, I could see if any of them had a cutter/lang which wasn’t accounted for in my pre-existing set!
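The language-code version looked something like this (a reconstruction of the idea, not the exact code; the variable here only lists the codes excerpted above, while the real one held every code from the Heading11 template):

    <xsl:variable name="languageCodes" as="xs:string*"
        select="('ara','arc','arm','art','und','urd','wol','zul','zxx')"/>
    <xsl:for-each-group select="/root/row" group-by="normalize-space(Heading11)">
        <xsl:if test="not(index-of($languageCodes, current-grouping-key()))">
            <xsl:value-of select="concat(current-grouping-key(),' isnt currently in my Circulation Stylesheet!')"/>
        </xsl:if>
    </xsl:for-each-group>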

As an aside, doing this did help me identify (and then correct) many invalid cutters, so that’s a bonus.

 Well that’ll do it for those two. Up next we’ll be talking:

  • Years
  • Box-Sets
  • Dates (of the records, i.e. meta-metadata)

Emflix – Part Six – XML – Directors, Writers

Link to Part Five


So, directors and writers, the people of the movie biz! (yes this would later extend to actors, but in my initial creation of Emflix I didn’t include actors, and had never intended to)

This posed a real challenge for me, as I didn’t have any of the 700 fields in the exports I was given. All I really had to go on was the 245, subfield c.

The instruction was:

<xsl:apply-templates select="Heading2" mode="director"/>
<xsl:apply-templates select="Heading2" mode="writer"/>

and the matching templates are as follows:

<xsl:template match="Heading2" mode="writer">   
             select="functx:extract(normalize-space(.),('written, produced, directed by ','screen play, ','writer and director, ','written for the screen by ','written for the screen, produced and directed by ','produced, written, and directed by ','written &amp; produced by ','written for the screen &amp; directed by ','written, directed and edited by ','script by ','screenplay and directed by ','written and edited by ','screen story by ','screen story &amp; dialogue by ',' written, edited &amp; directed by ','writers and directors, ','writer, ','written, produced &amp; directed by ','screenplay &amp; dialogues, ','writers, ','screenplay by ','written by ','written and directed by ','written for the screen and directed by ','screen play and dialogue by ','screenplay writer, ','screenplay, ','written, produced, and directed by ','written, produced and directed by ','written, directed and produced by ','screen play by ','written and produced by ','written &amp; directed by ','writer-director, ','script, ','screenplay, music, and direction by '),'writer')"
     <xsl:template match="Heading2" mode="director">
             select="functx:extract(normalize-space(.),('written, directed and edited by ','director/writer, ','directed &amp; produced by','directors, ','direction by ','directed by ','director, ','directed and written by ','directed and produced by ','directed, written and produced by ','direction, '),'director')"

Explanation: Because I didn’t have the 700 fields, I had to somehow get the data out of the 245, but I only wanted the names of writers or directors. I wrote a function which accepted three parameters:

  1. The 245 field
  2. A series of 1 or more strings
  3. The name of the element I wanted to create (notice that the final parameter in the director template is ‘director’ and the final parameter in the writer template is ‘writer’.)

This is that function:

<xsl:function name="functx:extract" as="element()*">
         <xsl:param name="input" as="xs:string"/>
         <xsl:param name="markers" as="xs:string*"/>
         <xsl:param name="element-name" as="xs:string"/>
         <xsl:analyze-string select="$input" regex="({string-join($markers, '|')})([^;]+)(\s;|$)">
                 <xsl:analyze-string select="regex-group(2)"
                         <xsl:element name="{$element-name}">
                             <xsl:if test="matches(regex-group(1),'^\w+\s\w+$')">
                                 <xsl:attribute name="sort">
                                     <xsl:value-of select="string-length(substring-before(regex-group(1),' ')) + 2"/>
                 <xsl:element name="{$element-name}"/>

Complicated, right?! Lemme break it down.

The function works in discrete steps:

  1. It joins together all the markers passed in the function call (I added each one as I encountered it in an actual movie record), separated by a vertical bar (the boolean ‘or’ in regex), as group 1; group 2 is one or more characters that are not a semicolon; and group 3 is either a space followed by a semicolon or the end of the string.
  2. It selects group 2 (which ideally is a person or people’s names!) up until “space semicolon” (the separator used in MARC 245s to separate roles)
  3. Of that string, it extracts everything up until the first comma, or first space-‘and’, or space-& or end of string. (Yes it repeats for multiple people’s names)
  4. Then it constructs an element named for whatever the third parameter was, and it also constructs the all-important sort attribute, but only for names that match the pattern characters-space-characters. I.e. it doesn’t build one for people’s names that are 3 words; those I had to add manually.
  5. Last, it removes a trailing period, if there is one, using a function I wrote that is so heavily based on this one that I’m leaving its creation as an exercise to the reader (jk, if you want it I’d be happy to provide it).
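To make those steps concrete, here’s an invented 245 ‡c and what the two modes pull out of it:

    <!-- input:          'written by Jane Doe ; directed by John Q. Filmmaker'                      -->
    <!-- writer mode:    'written by ' fires; group 2 is 'Jane Doe'; two words, so it gets          -->
    <!--                 sort = string-length('Jane') + 2 = 6:   <writer sort="6">Jane Doe</writer> -->
    <!-- director mode:  'directed by ' fires; group 2 is 'John Q. Filmmaker'; three "words",       -->
    <!--                 so no sort attribute (added by hand):   <director>John Q. Filmmaker</director> -->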


Some notes:

I started the markers with ones I just wrote myself: “written by, ” “directed by, ”, etc. But I found it was inefficient to try to make them up, so I just added them as I encountered them. Eventually I had so many, covering the vast majority of things that’d appear in a 245, that I didn’t really need to worry about adding any more.

The reason I only added the sort attribute for two-word names is that, with a three-word (or more) name, I couldn’t be sure which part would be the sort name! So those I had to do manually, after looking them up.

I know you’re probably wondering why I removed trailing periods in that final step rather than just excluding the period from the matching strings in the regex! Well, at first I had been doing that…until I encountered names with periods in them: Samuel L. Jackson, D.L. Hughley, etc. It ended up being better to just grab the whole name and, if it had a period at the end, strip it there. Of course, the thing about names is that you never really can predict every variation or situation. I always recommend visually inspecting the work before moving on.
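For the curious, a minimal version of that stripping step might look something like this (a sketch with a made-up name, not the exact function I used):

    <xsl:function name="ng:strip-final-period" as="xs:string">
        <xsl:param name="name" as="xs:string"/>
        <!-- drop a single trailing period; interior periods (initials) are left alone -->
        <xsl:sequence
            select="if (ends-with($name, '.')) then substring($name, 1, string-length($name) - 1) else $name"/>
    </xsl:function>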


Whew! That was a complex one! Next time we’re going to look at the LC Special Topics and the Language. Stay tuned!

Up to date Headings

Original Post

I saw this tweet a few days ago, and thought to myself — challenge accepted!

Here’s a very up-to-date and comprehensive subject heading guide for queer folks in the LCSH/LCGFT/LCDGT

If all you want is the link, have at it. If you’re interested in a bit more of the ways and means and hows — here’s more of that:

  1. I went to id.loc.gov and searched the following terms:
    1. Sexual minorities (and minority), queer, lesbian, gay, gender, orientation, intersex, transgender, transexual, and bisexual.
  2. Downloaded each relevant record (some hits with those terms were not relevant) as RDF/XML (MADS and SKOS)
  3. Created a master RDF/XML file, available for your perusal, here
  4. Wrote an XSLT to make this (Class Web inspired) visual display, available for your perusal, here


Emflix – Part Five – XML – Titles

Link to Part Four

Here’s where I introduce my big fancy proudest XSLT that I wrote for this project (and I wrote many). I’ve been talking a big game about how I wanted to transform the data from the catalog into the data of my structure, with as little copying/pasting as possible. There would inevitably still be lots of copying/pasting (all the genres taken from Netflix, for instance) — but I wanted to save myself as much time as possible.

So here goes, let’s take it a bit at a time.

Let’s talk titles! As we all know, in a MARC 245 field we capitalize the first letter of the first word and nothing else (save for proper nouns). That’s not how I wanted my interface to look; it’s not how titles are really presented anywhere besides a library catalog.

Learn From My Mistakes

There was no reason to store the data this way! If I wanted to present the data with a more traditional title capitalization, I could’ve left that to the XSLT which turned my structure into the user-visible interface.

     <xsl:apply-templates select="Heading3"/>

This instruction is going to put whatever data is stored in “Heading3” into a “title” element, so let’s look at the template instruction to see what happens to it.

 <xsl:template match="Heading3">
    <xsl:variable name="title"
    select="functx:substring-before-if-ends-with(functx:substring-befo    re-if-ends-with((normalize-space(.)),' /'), '.')"/>
     select="functx:capitalize-first(string-join(for $x in tokenize($title,'\s') return functx:titleCase($x),' '))"/>

Follow me here…using some functions from http://www.xqueryfunctions.com/, the template first creates a variable named “title” and stores the full value of the title (after normalizing the space) in that variable, stripping a trailing ‘ /’ and then a trailing period if they are the terminal characters. Then it tokenizes the title by word, runs each token through my own little function called “titleCase”, joins the tokens back together, and then capitalizes the first letter.
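For example, with an invented Heading3 value (and assuming ‘the’ and ‘of’ are on the little-words list in the function below):

    <!-- normalize-space(.)                                    =>  'The return of the king /'  -->
    <!-- after the two substring-before-if-ends-with calls     =>  'The return of the king'    -->
    <!-- tokenize by space, titleCase each token, string-join  =>  'the Return of the King'    -->
    <!-- functx:capitalize-first(...)                          =>  'The Return of the King'    -->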

“Heading3”, I believe, was created from MARC 245 subfields ‘a’ and ‘b’. Again, this is what happens when the person creating the conversion isn’t the same one who exported the data from the catalog. I was never 100% sure where things were coming from and had to make best guesses. Why I had to remove a trailing ‘ /’ but not ‘ : ’, I can’t tell you.

Here’s the function I wrote to convert each token into my desired title case:

     <xsl:function name="functx:titleCase" as="xs:string">
         <xsl:param name="s" as="xs:string"/>
                 <xsl:value-of select="lower-case($s)"/>
                     select="concat(upper-case(substring($s, 1, 1)), lower-case(substring($s, 2)))"/>

It checks whether each token, converted to lower case, matches a string that I wanted to be lower case: a set of the little words which shouldn’t be capitalized in a title. I actually can’t, for the life of me, remember where the heck I got this list…that’s bad documentation on my part.

If it matches one of those strings, it returns the lower-case value of that string and moves on to the next token. If it doesn’t match, it falls back on capitalizing the first letter of the string (using some native substring functions).

That’s what happens to every title! Pretty neat, huh? I would then add a ‘sort’ attribute to the titles, essentially duplicating the MARC 245 second indicator. I had to add 1 to it though; XSLT counts the first character as position 1, not position 0 as is more common in computer programming. Come to think of it, I really could’ve written a quick thing to add those ‘sort’ attributes automatically, just some kind of check for a first token of “the”, “a”, or “an”. Oh well! You live and learn. That’s why I’m writing this up: to talk through what I did, and realize what I could’ve/should’ve done.
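If I were doing it today, that quick thing might be a little function like this (a sketch with a made-up name, using the same add-one logic described above):

    <xsl:function name="ng:titleSort" as="xs:integer">
        <xsl:param name="title" as="xs:string"/>
        <xsl:variable name="first" select="lower-case(tokenize(normalize-space($title), '\s')[1])"/>
        <!-- 'The ' is 4 nonfiling characters in MARC terms, i.e. position 5 in XSLT terms -->
        <xsl:sequence select="if ($first = ('a', 'an', 'the')) then string-length($first) + 2 else 1"/>
    </xsl:function>

Then the attribute would just be <xsl:attribute name="sort" select="ng:titleSort($title)"/> on each title.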

That’s titles! When next we return, we’ll talk about the programmatic conversion of directors/writers.