Emflix – Part Eight – XML – Years, Boxes, Metametadata

Link to Part Seven

Today I’ll be discussing years, box-sets and some meta-metadata: the date created/updated.

I chose to use a single year for films: the earliest year I could find that it was released. I handled tv shows differently, but that’ll come up later.

Heading13 contained every 500 field mushed together. As anyone who’s cataloged DVDs knows, we usually type some phrase like “DVD of the original motion picture released in 1999”, transcribed from the back of the box. Bearing that in mind, I applied this template:

   <xsl:template match="Heading13">
         <xsl:variable name="Heading13Data" select="normalize-space(.)"/>
         <xsl:variable name="yearFind" as="xs:string*"
             select="('motion picture in ','release of the ','released in ')"/>
         <year>
             <xsl:variable name="yearFound">
                 <xsl:for-each select="1 to count($yearFind)">
                     <xsl:variable name="x" select="."/>
                     <xsl:value-of
                         select="substring(substring-after($Heading13Data, $yearFind[$x]),1,4)"/>
                 </xsl:for-each>
             </xsl:variable>
 
             <xsl:choose>
                 <xsl:when test="matches($yearFound, '(19|20)\d{2}')">
                     <xsl:value-of select="$yearFound"/>
                 </xsl:when>
                 <xsl:otherwise>
                     <xsl:value-of select="normalize-space(../../Heading8)"/>
                 </xsl:otherwise>
             </xsl:choose>
         </year>
     </xsl:template>

First I assign all the data in Heading13 (normalizing the space) to a variable. Next I create a variable holding the trigger phrases which often indicate a year. As with the writer/director templates, I added these as I encountered them. Then, counting from 1 to the number of trigger phrases (3 in this case), it takes the four characters found after each trigger phrase and adds them to a variable called ‘yearFound’.

The choose statement then tests whether those characters satisfy a regex match of 19 or 20 followed by two digits. If they match, that is the year it outputs; if they don’t, it outputs whatever was in Heading8 (which is the second of the two date fields in the fixed field).
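
One caveat worth flagging: the for-each in the template concatenates the four characters found after every trigger phrase, so a note containing two different triggers would yield eight characters, and the unanchored regex would still accept them. A minimal sketch of a tighter variant, assuming the same Heading13/Heading8 layout (the variable name yearCandidates is made up for illustration), keeps only candidates that are exactly 19xx or 20xx and outputs the first one:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

    <xsl:template match="Heading13">
        <xsl:variable name="Heading13Data" select="normalize-space(.)"/>
        <xsl:variable name="yearFind" as="xs:string*"
            select="('motion picture in ','release of the ','released in ')"/>
        <!-- yearCandidates (hypothetical name): the four characters after each
             trigger phrase, kept only when they are exactly 19xx or 20xx -->
        <xsl:variable name="yearCandidates" as="xs:string*"
            select="for $phrase in $yearFind
                    return substring(substring-after($Heading13Data, $phrase), 1, 4)
                           [matches(., '^(19|20)\d{2}$')]"/>
        <year>
            <!-- first usable candidate, otherwise fall back to Heading8 as before -->
            <xsl:value-of select="if (exists($yearCandidates))
                                  then $yearCandidates[1]
                                  else normalize-space(../../Heading8)"/>
        </year>
    </xsl:template>
</xsl:stylesheet>
```

The fallback to Heading8 is unchanged; the only difference is that a stray match from a second trigger phrase can no longer leak into the output.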


Box-sets were a real bugaboo for me. They were much more work and they ended up (or at least, the way I handled them) breaking a data rule. I had decided to break up box-sets and give equal access and treatment to every movie contained therein. Remember way back in part three when I said I couldn’t use unique IDs after a while? This is why. I had been using the bib ID from the catalog as the unique ID, but of course a box set that had been cataloged as such would have a single ID. That meant that every movie in the set would have the same ID. Bad move, Ganin.

It also meant that I had to spend loads more time manually inputting data because as anyone who’s cataloged a box-set knows…you end up cramming a lot into a single field. Most of my clever little templates would either give me a single entry for the first movie in the set, or not even that. I had to hop over to Wikipedia or another source and find the directors/writers/years/titles for all the other movies in the set. Sigh. This is one of the few things where I’m not super sure what I would’ve done differently if I were starting over. I think ultimately, it was beneficial (particularly because this was a visual display) to have the individual movies separated, which I think would just have to mean more work on my part.


As for that tasty bit of meta-metadata (not that I ever actually used it…)  — each media element had two attributes, “dateCreated” and “lastModified” seen here:

<media id="{Heading16}" dateCreated="{$date}" lastModified="{$date}">

They were initially set to the same value and then I would update the lastModified date when I changed something manually. There’s a globally scoped variable supplying their value, seen here:

    <xsl:variable name="date" select="substring(string(current-date()), 1, 10)"/>

The weirdness is just because I only wanted YYYY-MM-DD, but the current-date() function gives you more than that (a timezone offset gets tacked on the end), so I first converted it into a string and stripped anything after the first 10 characters.
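
For comparison, XSLT 2.0’s format-date() function can produce the same ten characters without any string slicing. A minimal sketch (the media element is cut down to just the two date attributes, since the id depends on Heading16):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Picture string: 4-digit year, 2-digit month, 2-digit day -->
    <xsl:variable name="date"
        select="format-date(current-date(), '[Y0001]-[M01]-[D01]')"/>

    <xsl:template match="/">
        <media dateCreated="{$date}" lastModified="{$date}"/>
    </xsl:template>
</xsl:stylesheet>
```

Either approach gives the same value; the picture string just makes the intended format visible in the code.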


Next time I’ll talk about foreign titles.


Updating Headings

For those hoping that this post has something to do with LC itself updating their headings…sorry, it doesn’t.

This is about how I try to update the local headings in my catalog.

Maybe in the Future-Of-Libraries-Era, all headings in all local catalogs will cascadingly update when the term is updated in LC’s database. That day is not today. What that means is that as our records sit and sit and sit, statically unchanging, their headings can end up being painfully out of date, which causes two big problems:

  1. They can be pretty offensive (though of course even newer ‘correct’ headings can be as well)
  2. They won’t co-locate with other resources of the same subject matter because they aren’t using the most up-to-date term.

Example:

World War, 1939-1945—Art and the war
UF World War, 1939-1945, in art [Former heading]

If people find the term “World War, 1939-1945, in art” and click on it to find other resources, they won’t find anything that has been produced since the heading has been changed, because they’ll all have the new heading.


 

Okay, so that establishes why these are an issue. What am I doing about it?

First you have to find them!

There’s no “master list of every former heading” (that I know of) but here are some technical ways to dig ’em up.

  • If you have the MADS of an LCSH, look for “<madsrdf:hasEarlierEstablishedForm>”
  • If you’re working from MARC21 or MARCXML, look for field 450 with a subfield w with a value of ‘nne’
  • Every heading list that comes out tells you if one has been changed, so that can be a good place to find the changed headings
  • Just guess and browse around!
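
The MARCXML route can be sketched as a small stylesheet. This is a minimal sketch under assumptions: the input is a file of MARCXML authority records in the standard MARC21 slim namespace, only topical headings (150/450) are handled, and subdivisions are simply joined with a double hyphen for display.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:for-each select="//marc:record">
            <!-- The currently authorized heading for this record -->
            <xsl:variable name="current"
                select="string-join(marc:datafield[@tag = '150']/marc:subfield, '--')"/>
            <!-- Every 450 flagged in subfield w as a former heading -->
            <xsl:for-each
                select="marc:datafield[@tag = '450'][marc:subfield[@code = 'w'] = 'nne']">
                <xsl:value-of select="concat(
                    string-join(marc:subfield[@code != 'w'], '--'),
                    ' [Former heading] USE ', $current, '&#10;')"/>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

The output is one line per former heading, which makes a handy search list to run against a local catalog.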

It’s the guess-and-browse method that I’ve been using lately. I figured that terms for people would be changing often, and also wanted to ensure that my local catalog didn’t retain any particularly offensive terms.

The terms I started with were

  • Afro-Americans/Negroes
  • Gypsies

Which have been changed to

  • African Americans
  • Romanies

respectively.

I searched our local catalog for LC subjects containing ‘Afro-Americans’ and found 1,885 records. I downloaded them as MARCXML and opened them in oXygen. I knew that I had to be careful because a simple “find/replace” would be way way overkill, and end up corrupting the data. If ‘Afro-Americans’ were in a title, or any other transcription field, it shouldn’t be overwritten. If the term appeared in a 600/610/611/630, it may or may not need to be overwritten because the name of the person/organization/meeting/title may not have changed.

So I first gathered together any LC subject heading which began with the term ‘Afro-Americans’:

datafield[@tag = '650'][@ind2 = '0']/subfield[starts-with(., 'Afro-Americans')]

I had to make sure that before I replaced all these, I wouldn’t screw up the new heading. If you look at the list in Class Web that used to begin ‘Afro-Americans’ you’ll notice that most of them can be swapped out for ‘African Americans’ but not all:

Afro-Americans in the press
USE African Americans—Press coverage

Afro-Americans in business
USE African American businesspeople

Afro-Americans as consumers
USE African American consumers

(just as examples) should not be programmatically changed by swapping just the terms, or the new heading won’t be correct!

There are also instances of ‘Afro-Americans’ as a subdivision that I wanted to change, and I had to check Class Web again for those to be sure that those hadn’t changed as well.
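
The careful version of the swap can be sketched as an identity transform with one guarded template. This is an illustration only: it assumes namespaced MARCXML bibliographic records, handles only subfield a, and the manualReview list holds just the three examples above (a real list, checked against Class Web, would be longer). The variable names are made up.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    exclude-result-prefixes="xs">

    <!-- Headings whose replacement is NOT a plain term swap (illustrative list) -->
    <xsl:variable name="manualReview" as="xs:string*"
        select="('Afro-Americans in the press',
                 'Afro-Americans in business',
                 'Afro-Americans as consumers')"/>

    <!-- Identity template: titles, 600s and everything else pass through untouched -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only subfield a of an LCSH 650 that begins with the old term -->
    <xsl:template match="marc:datafield[@tag = '650'][@ind2 = '0']/marc:subfield[@code = 'a'][starts-with(., 'Afro-Americans')]">
        <!-- Drop any trailing period before comparing against the review list -->
        <xsl:variable name="heading" select="replace(normalize-space(.), '\.$', '')"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:choose>
                <xsl:when test="$heading = $manualReview">
                    <!-- Leave it alone and flag it for a human -->
                    <xsl:message select="concat('Check by hand: ', $heading)"/>
                    <xsl:value-of select="."/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of
                        select="concat('African Americans', substring-after(., 'Afro-Americans'))"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Because the match pattern is limited to 650 second-indicator 0, a title or a 600/610/611/630 containing the old term is copied through unchanged.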

Overall, I managed to change a LOT of headings, but it took a lot more care and consideration than I had originally anticipated. It’s much more complex than “find/replace”. But I’m glad that my local catalog will have better co-location and more up-to-date headings.

Emflix – Part Seven – XML – LC Special Topics, Language

Link to Part Six

 

I’m grouping today’s topics, the LC Special Topics and the language, together, because they both rely on the same thing — a set list of known possibilities.

      <LCSpecialTopics>
          <xsl:apply-templates select="Heading1"/>
      </LCSpecialTopics>
      <language>
          <xsl:apply-templates select="Heading11" mode="language"/>
      </language>

The instructions are much simpler than the ones we’ve seen previously, just applying the data right from the XML, no special filtering or processing. Heading 1 was the “Display Call No.” field (doesn’t map to MARC, not actually sure where it was drawn from, maybe the 852 from MARC-Holdings?) and Heading11 is from the fixed field.

So here’re the matching templates (excerpted…because they’re long)

<xsl:template match="Heading1">
         <xsl:choose>
             <xsl:when test="contains(., '.A26 ')">Acting. Auditions</xsl:when>
             <xsl:when test="contains(., '.A3 ')">Adventure films</xsl:when>
             <xsl:when test="contains(., '.A43 ')">Africa</xsl:when>
             <xsl:when test="contains(., '.A45 ')">Alcoholism</xsl:when>
             <xsl:when test="contains(., '.A5 ')">Animals</xsl:when>
             <xsl:when test="contains(., '.A54 ')">Animation</xsl:when>
             <xsl:when test="contains(., '.A72 ')">Armed Forces</xsl:when>
             <xsl:when test="contains(., '.A73 ')">Art and the arts</xsl:when>
             <xsl:when test="contains(., '.A77 ')">Asian Americans</xsl:when>    
.......
            <xsl:when test="contains(., '.W3 ')">War</xsl:when>
             <xsl:when test="contains(., '.W4 ')">Western films</xsl:when>
             <xsl:when test="contains(., '.W6 ')">Women</xsl:when>
             <xsl:when test="contains(., '.Y6 ')">Youth</xsl:when>
         </xsl:choose>
     </xsl:template>
 
 
     <xsl:template match="Heading11" mode="language">
         <xsl:choose>
             <xsl:when test=". = 'ara'">Arabic</xsl:when>
             <xsl:when test=". = 'arc'">Aramaic</xsl:when>
             <xsl:when test=". = 'arm'">Armenian</xsl:when>
             <xsl:when test=". = 'art'">Artificial (Other)</xsl:when>
........
             <xsl:when test=". = 'und'">Undetermined</xsl:when>
             <xsl:when test=". = 'urd'">Urdu</xsl:when>
             <xsl:when test=". = 'wol'">Wolof</xsl:when>
             <xsl:when test=". = 'zul'">Zulu</xsl:when>
             <xsl:when test=". = 'zxx'">No linguistic content</xsl:when>
             <xsl:otherwise>
                 <xsl:value-of select="."/>
             </xsl:otherwise>
         </xsl:choose>
     </xsl:template>

Get the idea? Just a simple ‘choose’ statement with a LOT of options. Drawn from Class Web (for the cutters) and from the MARC code list of languages (for the langs). Now I know what you’re thinking: but Netanel, aren’t you just Jurassic Park-ing?

Here I stop to tell a tale that for whatever reason…has stuck with me over the years. In the novel Jurassic Park, but not the movie, there’s an important scene where Ian Malcolm demonstrates that the animals must be breeding. The InGen scientists are sure that the dinos are not breeding because their cameras are equipped with fancy counting technology and on a regular basis count the animals in the park. It always matches their pre-determined correct number of animals, ergo: no breeding.

See, but Malcolm realizes that the camera-counting thing is only counting the number that it assumes will be there. When he has them try to count for higher and higher numbers of animals, and then eventually any number, they get wildly different results. The take away lesson (for me anyway) has always been: if you assume you know what the results will be, you’ll find those results. You must remember to account for the ones you couldn’t have known would be in the mix.

So here’s how I ensured that I wasn’t Jurassic Park-ing. The first thing I did was spit out a list of every cutter currently in use by the circulating DVD collection, then I looked each one up in Class Web to get the value and added them all to that Choose/When statement. But I also added them to this little number in my “error checker”.

<xsl:variable name="LCSpecialTopics"
  select="('A1','A26','A3','A43','A45','A5','A54','A72','A73','A77','B53','B55','B76','B87','C39','C45','C5113','C512','C513','C543','C55','C58','C85','D37','D4','D55','D6','D68','D78','E44','E79','E96','F35','F36','F54','F56','F67','G26','G3','G57','H3','H34','H5','H53','H55','H6','I48','I55','I72','J3','J6','J8','J87','L28','L37','L5','L6','M27','M3','M45','M46','M463','M6','M65','M86','N36','N4','O8','P44','P57','P6','P67','P68','P7','R4','R63','S24','S26','S284','S297','S45','S55','S557','S6','S62','S655','S67','S68','S74','S76','S8','S85','S87','T4','T46','T5','T69','T75','V3','V44','W3','W4','W6','Y6')"/>
    <ul>
       <xsl:for-each-group select="/root/row"
          group-by="substring-before(substring-after(Heading1,'1995.9.'),' ')">
          <xsl:if test="not(index-of($LCSpecialTopics,current-grouping-key()))">
             <li>
                <xsl:value-of select="concat(current-grouping-key(),' isnt currently in my Circulation Stylesheet!')"/>
             </li>
          </xsl:if>
       </xsl:for-each-group>
    </ul>

I wrote a similar one for the language codes. By running it each time I was given new movies by my boss, I could see if any of them had a cutter/lang which wasn’t accounted for in my pre-existing set!
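
A sketch of what the language-code version might look like. Several things here are assumptions for illustration: that Heading11 sits directly under each /root/row the way Heading1 does, the variable name languageCodes, and the heavily abbreviated code list (only the codes visible in the excerpt above).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

    <!-- Abbreviated: the real list would hold every code in the choose statement -->
    <xsl:variable name="languageCodes" as="xs:string*"
        select="('ara','arc','arm','art','und','urd','wol','zul','zxx')"/>

    <xsl:template match="/">
        <ul>
            <!-- One group per distinct language code found in the export -->
            <xsl:for-each-group select="/root/row" group-by="normalize-space(Heading11)">
                <xsl:if test="not(current-grouping-key() = $languageCodes)">
                    <li>
                        <xsl:value-of select="concat(current-grouping-key(),
                            ' isnt currently in my language list!')"/>
                    </li>
                </xsl:if>
            </xsl:for-each-group>
        </ul>
    </xsl:template>
</xsl:stylesheet>
```

The general comparison (=) against a sequence is true if the key equals any item in it, so it does the same job as index-of here.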

As an aside, doing this did help me identify (and then correct) many invalid cutters, so that’s a bonus.

 Well that’ll do it for those two. Up next we’ll be talking:

  • Years
  • Box-Sets
  • Dates (of the records, i.e. meta-metadata)

Emflix – Part Six – XML – Directors, Writers

Link to Part Five

 

So, directors and writers, the people of the movie biz! (yes this would later extend to actors, but in my initial creation of Emflix I didn’t include actors, and had never intended to)

This posed a real challenge for me, as I didn’t have any of the 700 fields in the exports I was given. All I really had to go on was the 245, subfield c.

The instruction was:

<xsl:apply-templates select="Heading2" mode="director"/>
<xsl:apply-templates select="Heading2" mode="writer"/>

and the matching templates are as follows:

<xsl:template match="Heading2" mode="writer">   
         <xsl:sequence
             select="functx:extract(normalize-space(.),('written, produced, directed by ','screen play, ','writer and director, ','written for the screen by ','written for the screen, produced and directed by ','produced, written, and directed by ','written &amp; produced by ','written for the screen &amp; directed by ','written, directed and edited by ','script by ','screenplay and directed by ','written and edited by ','screen story by ','screen story &amp; dialogue by ',' written, edited &amp; directed by ','writers and directors, ','writer, ','written, produced &amp; directed by ','screenplay &amp; dialogues, ','writers, ','screenplay by ','written by ','written and directed by ','written for the screen and directed by ','screen play and dialogue by ','screenplay writer, ','screenplay, ','written, produced, and directed by ','written, produced and directed by ','written, directed and produced by ','screen play by ','written and produced by ','written &amp; directed by ','writer-director, ','script, ','screenplay, music, and direction by '),'writer')"
         />
     </xsl:template>
 
     <xsl:template match="Heading2" mode="director">
         <xsl:sequence
             select="functx:extract(normalize-space(.),('written, directed and edited by ','director/writer, ','directed &amp; produced by','directors, ','direction by ','directed by ','director, ','directed and written by ','directed and produced by ','directed, written and produced by ','direction, '),'director')"
         />
     </xsl:template>

Explanation: Because I didn’t have the 700 fields, I had to somehow get the data out of the 245, but I only wanted the names of writers or directors. I wrote a function which accepted three parameters

  1. The 245 field
  2. A series of 1 or more strings
  3. The name of the element I wanted to create (notice that the final parameter in the director template is ‘director’ and the final parameter in the writer template is ‘writer’.)

This is that function:

<xsl:function name="functx:extract" as="element()*">
         <xsl:param name="input" as="xs:string"/>
         <xsl:param name="markers" as="xs:string*"/>
         <xsl:param name="element-name" as="xs:string"/>
         <xsl:analyze-string select="$input" regex="({string-join($markers, '|')})([^;]+)(\s;|$)">
             <xsl:matching-substring>
                 <xsl:analyze-string select="regex-group(2)"
                     regex="([^,]+?)(((\sand|,|\s&amp;)\s)|$)">
                     <xsl:matching-substring>
                         <xsl:element name="{$element-name}">
                             <xsl:if test="matches(regex-group(1),'^\w+\s\w+$')">
                                 <xsl:attribute name="sort">
                                     <xsl:value-of select="string-length(substring-before(regex-group(1),' ')) + 2"/>
                                 </xsl:attribute>
                             </xsl:if>
                             <xsl:value-of
                                 select="functx:substring-before-if-ends-with(normalize-space(regex-group(1)),'.')"/>
                         </xsl:element>
                     </xsl:matching-substring>
                 </xsl:analyze-string>
             </xsl:matching-substring>
             <xsl:fallback>
                 <xsl:element name="{$element-name}"/>
             </xsl:fallback>
         </xsl:analyze-string>
     </xsl:function>

Complicated, right?! Lemme break it down.

The function works in discrete steps:

  1. It joins together all those markers passed in the function call (I added each one as I encountered it in an actual movie record), separated by a vertical bar (which is the boolean ‘or’ in regex), as group 1. Group 2 is one or more characters that are not a semicolon, and group 3 is either a space followed by a semicolon or the end of the string.
  2. It selects group 2 (which ideally is a person or people’s names!) up until “space semicolon” (the separator used in MARC 245s to separate roles)
  3. Of that string, it extracts everything up until the first comma, or first space-‘and’, or space-& or end of string. (Yes it repeats for multiple people’s names)
  4. Then it constructs an element named for whatever the third parameter passed was, and in addition it constructs the all-important sort attribute for names that match the pattern of characters-space-characters. I.e. it doesn’t match any people’s names that are 3 words. Those I had to add manually.
  5. Then finally it removes a trailing period, if there is one, using a function I wrote that is so heavily based on this one, that I’m leaving its creation as an exercise to the reader (jk if you want it I’d be happy to provide)
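
For anyone who does want to try the exercise, here is a minimal sketch of what such a helper could look like. This is a guess at the shape, not the original function: it strips the delimiter only when the string ends with it, and otherwise returns the string unchanged. The parameter names and the small demo template are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:functx="http://www.functx.com"
    exclude-result-prefixes="xs functx">

    <xsl:function name="functx:substring-before-if-ends-with" as="xs:string">
        <xsl:param name="arg" as="xs:string?"/>
        <xsl:param name="delim" as="xs:string"/>
        <!-- Cut the delimiter off the end if it is there; otherwise pass through -->
        <xsl:sequence select="if (ends-with($arg, $delim))
                              then substring($arg, 1, string-length($arg) - string-length($delim))
                              else string($arg)"/>
    </xsl:function>

    <!-- Demo: only a trailing period is removed, periods inside the name stay -->
    <xsl:template match="/">
        <names>
            <name><xsl:value-of
                select="functx:substring-before-if-ends-with('Quentin Tarantino.', '.')"/></name>
            <name><xsl:value-of
                select="functx:substring-before-if-ends-with('Samuel L. Jackson', '.')"/></name>
        </names>
    </xsl:template>
</xsl:stylesheet>
```

The first name comes out as “Quentin Tarantino” and the second is left as “Samuel L. Jackson”, which is the behavior step 5 needs.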

 

Some notes:

I started the markers with ones I just wrote myself (“written by ”, “directed by ”, etc.). But I found it was inefficient to try to make them up and just added them as I encountered them. Eventually I had so many, representing the vast majority of things that’d appear in a 245, that I didn’t really need to worry about adding any more.

The reason I only had the sort attribute added for two-word names is I couldn’t be sure in a three-word (or more) name which word would be the sort name! So those I had to do manually, after looking them up.

I know you’re probably wondering why I removed a trailing period as the final step rather than just excluding it from the matching strings in the regex! Well at first I had been doing that…until I encountered someone’s name with a period in it. Samuel L. Jackson, D.L. Hughley, etc. So it ended up being better to just grab the whole name, and if it had a period at the end, strip it there. Of course, the thing about names is that you never really can predict every variation or situation. I always recommend visually inspecting the work before moving on.

 

Whew! That was a complex one! Next time we’re going to look at the LC Special Topics and the Language. Stay tuned!

Inconsistency in LGBTQ Terms

As you probably saw in the last post, I’ve spent a bunch of time recently working with all the LCSH that are about, or related to, queer folks. There are many terms that aren’t accurate, in-use by people, or even available.

But despite those oversights, willful or otherwise, an oddity that I noticed was an inconsistency about the use of the term ‘gay’.

There’s a term, ‘Gays’ (whose preferred heading used to be ‘Homosexuals’), which is a BT of ‘Gay men’ as well as ‘Lesbians’ (which itself has ‘Gay women’ as a UF). So it would seem that under LCSH ‘Gays’ is a non-gender-specific term (though binary, as LCSH has no terms as of yet to express any people outside of the gender binary) meant to encompass all gay people.

This usage isn’t unheard of at all outside LCSH, indeed I used to hear ‘gay marriage’ pretty often until ‘same-sex marriage’ became more prevalent. I assume a healthy dose of societal-misogyny and sexism worked to elevate the term traditionally used for men, to be the term used for both men and women.

But the problem is how inconsistent LCSH is about this — consider the following:

Gays–Nazi persecution (May Subd Geog)
UF   Gay Holocaust
Gay men–Nazi persecution
Holocaust, Gay
Nazi persecution of gay men
Nazi persecution of gays

‘Gays’ here is standing in for ‘Gay men’!

There are also several headings beginning with ‘Gay and lesbian’, e.g.:

‘Gay and lesbian studies’

‘Gay and lesbian dance parties’

So which is it, LCSH? Does ‘gay’ encompass both gay men, and lesbians? Or is ‘gay’ a shorthand for ‘gay men’?

Up to date Headings

Original Post

I saw this tweet a few days ago, and thought to myself — challenge accepted!

Here’s a very up-to-date and comprehensive subject heading guide for queer folks in the LCSH/LCGFT/LCDGT

If all you want is the link, have at it. If you’re interested in a bit more of the ways and means and hows — here’s more of that:

  1. I went to id.loc.gov and searched the following terms:
    1. Sexual minorities (and minority), queer, lesbian, gay, gender, orientation, intersex, transgender, transexual, and bisexual.
  2. Downloaded each relevant record (some hits with those terms were not) as RDF/XML (MADS and SKOS)
  3. Created a master RDF/XML file, available for your perusal, here
  4. Wrote an XSLT to make this (Class Web inspired) visual display, available for your perusal, here

Update 2017-06-30

  • Male homosexuality in the theater

Came from new LCSH list 1705


Update 2017-03-18

  • Intersex people [LCDGT]
  • Parents of transgender people [LCDGT]
  • Conversion therapy patients [LCDGT]

So these three are actually older LCDGT headings, but id.loc.gov hadn’t been updating for months and months! I emailed Janis Young and she got it sorted out, so I was finally able to add these back-catalog headings. Many thanks to her.


Update 2017-01-03

  • Museums and sexual minorities
  • Sexual minorities’ writings, Australian
  • Gay actors–United States

First two came from new LCSH: List 1611, the third I just noticed!


Update 2016-10-27

  • Neopagan gays
  • Stonewall National Monument (New York, N.Y.)
  • Hispanic American gay men
  • Hispanic American bisexual men
  • Bisexuality and education
  • Bisexual men–Relations with women

all came from new LCSH: List 1609


Update 2016-09-20

  • Gay musicologists

Natch, came from new LCSH: List 1608


Update 2016-08-28

  • Discrimination against intersex people
  • Female-to-male transsexuals in art
  • Intersex people [updated]

That last one, Intersex people, had already been in LCSH, but they added a couple new UFs, so I updated mine.

As per uzh, they came from new LCSH: List 1607


Update 2016-07-21

  • Asexual people
  • Asexuality (Sexual orientation)
  • Gay detectives

All three came from the newest LCSH: List 1606


Update 2016-06-11

  • Cisgender people
  • African American sexual minorities
  • African American bisexual women

All three came from the newest LCSH: List 1605


Update 2016-05-23

  • Parents of transgender children
  • Same-sex marriage (Islamic law)
  • Sexual minorities (Islamic law)
  • Drag shows

Drag shows is another Alex found, and the other three are from the newest LCSH: List 1604


Update 2016-05-13

Look, I made it ma! These are newly added headings that I found myself — totally missed ’em, but they’re in there now.

  • Conflict of laws–Same-sex marriage
  • Children of same-sex parents
  • Same-sex marriage in literature
  • Same-sex parents
  • Same-sex marriage in art
  • Same-sex marriage–Law and legislation–United States
  • Same-sex marriage–Religious aspects
  • Same-sex marriage–Law and legislation
  • Same-sex marriage–United States
  • Same-sex marriage–Religious aspects–Buddhism, [Christianity, etc.]
  • Same-sex marriage–Religious aspects–Baptists, [Catholic Church, etc.]

Update 2016-05-06

Alex pointed me towards ‘Polari’, another good term I was missing, but I then realized that it was actually an NT of a term already on my list, ‘Gay men–Language’. So I added to my error checkers a little code to spit out any NT that isn’t on my list. By my reckoning, a given BT may not necessarily need to be on the list itself (e.g. ‘Bisexual parents’ has a BT of ‘Parents’) but every NT should be on the list.
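
A rough sketch of that kind of NT check, under assumptions about the master file: every heading is a direct child of rdf:RDF carrying an rdf:about, and each madsrdf:hasNarrowerAuthority either points at its target with rdf:resource or wraps an element that has rdf:about. The variable name onList is made up.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#"
    exclude-result-prefixes="rdf madsrdf">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <!-- URIs of every heading already in the master file -->
        <xsl:variable name="onList" select="/rdf:RDF/*/@rdf:about"/>
        <ul>
            <!-- Each distinct NT target, whichever way it is expressed -->
            <xsl:for-each-group
                select="/rdf:RDF/*/madsrdf:hasNarrowerAuthority/(@rdf:resource | */@rdf:about)"
                group-by=".">
                <xsl:if test="not(current-grouping-key() = $onList)">
                    <li>
                        <xsl:value-of select="concat(current-grouping-key(),
                            ' is an NT that isnt on my list!')"/>
                    </li>
                </xsl:if>
            </xsl:for-each-group>
        </ul>
    </xsl:template>
</xsl:stylesheet>
```

It reports URIs rather than labels, which is enough to go look the term up at id.loc.gov.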

So to that end, I also ended up adding:

  • Polari
  • African American bisexuals
  • Pacific Islander American bisexuals
  • Asian American bisexuals
  • Transvestites
  • Leather bars

Update 2016-05-04

Frankly, everyone who isn’t Alex is doing a real bad job of identifying ones I’ve missed!

  • Handkerchief codes
  • Male impersonators
  • Male impersonators in motion pictures
  • Female impersonators
  • Female impersonators in motion pictures
  • Female impersonators on television
  • Homomonument (Amsterdam, Netherlands)
  • Sexual orientation in art
  • Stonewall Riots, New York, N.Y., 1969
  • Androgyny (Psychology) in literature
  • Androgyny (Psychology)
  • Androgyny (Psychology) in art
  • Androgyny (Psychology)–Religious aspects–Buddhism, [Christianity, etc.]
  • Androgyny (Psychology)–Religious aspects

Update 2016-05-03

Another set of missed headings, courtesy of trusty Alex

  • Homophobia in high schools
  • Homophobia in anthropology
  • Homophobia in psychoanalysis
  • Homophobia in children
  • Homophobia in sports
  • Homophobia in art
  • Homophobia in literature
  • Homophobia in schools
  • Homophobia in medicine
  • Homophobia in social work
  • Homophobia in child welfare
  • Homophobia in higher education
  • Homophobia in medical care
  • Homophobia in physical education
  • Homophobia in the military
  • Homophobia in the workplace
  • Homophobia in gerontology
  • Homophobia–United States
  • Homophobia–Religious aspects–Buddhism, [Christianity, etc.]
  • Homophobia–Religious aspects
  • Homophobia–Religious aspects–Baptists, [Catholic Church, etc.]
  • Homophobia–Law and legislation
  • Homophobia–Press coverage

Update 2016-04-30

Thanks again to Alex for their continued noticing of headings I’ve missed! Let’s be real — I should’ve been using truncation when I did my initial searches. Rookie mistake, Ganin, rookie mistake.

I’ve now added:

  • Intersexuality in literature
  • Intersexuality in children
  • Intersexuality in art
  • Intersexuality–Mythology

Update 2016-04-22

Added “LGBT History Month” as it was in the most recent New LCSH


Update 2016-04-20

Big update!

So first off — a tremendous thank-you to my wonderful and observant colleague Alex who noticed something missing from my headings: ‘Astrology and homosexuality’. How could it be?! I’d been so careful! So let’s revisit my original process:

I searched Sexual minorities (and minority), queer, lesbian, gay, gender, orientation, intersex, transgender, transexual, and bisexual at id.loc.gov and grabbed every relevant term from the search. If any of those words appeared in a 150 or 450, I nabbed it.

But did you see what I didn’t search? That’s right: ‘Homosexuality’. In information retrieval terms this is precision and recall. I visually inspected each term to make sure they weren’t false positives (precision) but I wasn’t wide enough with my initial search terms to get perfect recall. I missed many relevant results.

Because this actually goes deeper than not having searched ‘homosexuality’. I searched ‘Homosexual’ and got still more that I needed. I searched ‘Bisexuality‘ and came up with one that wasn’t retrieved on a search for ‘Bisexual’. I searched ‘Lesbianism’ and found still more. So what’s important to note (for both me and others) is that these aren’t ALL the terms, it’s just the 862 I could find.

So if you’ve been maintaining your own list somewhere — and need to know which are the new terms, here they are:

  • Homosexuality
  • Homosexuality on radio
  • Homosexuality and art
  • Homosexuality and dance
  • Homosexuality and literature
  • Homosexuality and motion pictures
  • Homosexuality and the arts
  • Homosexuality and popular music
  • Homosexuality and theater
  • Homosexuality and music
  • Astrology and homosexuality
  • Psychoanalysis and homosexuality
  • Male homosexuality in art
  • Socialism and homosexuality
  • Male homosexuality in literature
  • Homosexuality and architecture
  • Homosexuality and television
  • Homosexuality and education
  • Children and homosexuality
  • National socialism and homosexuality
  • Male homosexuality
  • Male homosexuality in motion pictures
  • Bible and homosexuality
  • Male homosexuality in music
  • Homosexuality in art
  • Homosexuality in music
  • Homosexuality in motion pictures
  • Homosexuality in dance
  • Homosexuality in video games
  • Homosexuality in opera
  • Homosexuality in literature
  • Homosexuality in the Bible
  • Homosexuality (Canon law)
  • Homosexuality in the theater
  • Homosexuality in animals
  • Homosexuality in the workplace
  • Homosexuality–Bibliography
  • Homosexuality–Netherlands
  • Male homosexuality–Religious aspects
  • Homosexuality–Philosophy
  • Homosexuality–Religious aspects–Judaism
  • Homosexuality–Religious aspects–Buddhism, [Christianity, etc.]
  • Homosexuality–Genetic aspects
  • Homosexuality and literature–Great Britain–History–19th century
  • Homosexuality–Literary collections
  • Homosexuality–Psychological aspects
  • Homosexuality–Social aspects
  • Homosexuality–Fiction
  • Homosexuality and literature–English-speaking countries
  • Homosexuality and literature–United States–History–20th century
  • Male homosexuality–Psychological aspects
  • Homosexuality–United States
  • Homosexuality–Biblical teaching
  • Homosexuality–Law and legislation–United States
  • Homosexuality–Religious aspects–Catholic Church
  • Male homosexuality–Mythology
  • Homosexuality–History
  • Homosexuality–Periodicals
  • Male homosexuality–Religious aspects–Buddhism, [Christianity, etc.]
  • Homosexuality–Religious aspects–Baptists, [Catholic Church, etc.]
  • Homosexuality–Moral and ethical aspects
  • Homosexuality and literature–France
  • Homosexuality–Great Britain
  • Homosexuality–Moral and ethical aspects
  • Homosexuality and education–United States
  • Homosexuality and literature–United States
  • Homosexuality–Religious aspects–Christianity
  • Male homosexuality–United States
  • Homosexuality–Law and legislation
  • Homosexuality–Religious aspects
  • Homosexuality–Mythology
  • Homosexuality–Folklore
  • Male prostitution
  • Bisexuality
  • Lesbianism in motion pictures
  • Lesbianism in literature
  • Lesbianism on television
  • Lesbianism in art
  • Lesbianism in opera
  • Lesbianism–History
  • Lesbianism–Religious aspects–Baptists, [Catholic Church, etc.]
  • Lesbianism–United States
  • Lesbianism–History–To 500
  • Lesbianism–Religious aspects
  • Lesbianism–Religious aspects–Buddhism, [Christianity, etc.]

Update 2016-03-10

Added “Two-spirit people in literature” as it was in the most recent New LCSH


Update 2016-02-20

  • Added two more terms (as they were in the most recent New LCSH)
    • Sexual minority veterans
    • Gay veterans
  • Updated the BT for transgender veterans, to Sexual minority veterans (as per New LCSH)

In so doing, I discovered a mistake! About 25 subject headings did not have an @rdf:about attribute on the madsrdf:Topic inside madsrdf:hasBroaderAuthority. Without this attribute, the link constructed in the BT wouldn’t work. I manually went through and added those attributes, so you may cease your panicking.


Update 2016-01-01

Changed the links to point to their permanent home at my new website


Update 2015-11-21

  • I finished all the additional 10 pages (about 200 terms!) under “Gay” – and also added any term which had an RT in this list, but the term itself wasn’t on the list. (if that made sense…)
  • I added the ‘May Subd Geog’ to each term that can be, and also ‘Former heading’ to those UFs which are formerly authorized headings, rather than standard variant labels.
  • I added a bit of statistical info to the beginning

Couple other notes [from original posting]

  • I decided to code it so that if the BT/NTs were in this list, they’d be anchor links, but if they weren’t it’d take you to their record at id.loc.gov
  • When you click an anchor, I wrote in a yellow highlighting effect. I like it, what do you think?
  • I didn’t display the “Subd Geog” status, do folks want that? It’s easy enough to add in, all the data is still there
  • I made the un-authorized forms italic to help them stand back a bit, whereas Class Web has them normal-style. Thoughts?
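
The anchor-or-external-link decision from the first note above can be sketched in XSLT roughly as follows. This is only a sketch, not the stylesheet actually used: it assumes every heading in the list is a madsrdf:Topic directly under the root element, that each one is rendered elsewhere with an HTML id equal to the last segment of its URI, and that the label lives in madsrdf:authoritativeLabel. The key name `inList` is made up for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#"
    exclude-result-prefixes="rdf madsrdf">

    <!-- Hypothetical key: index every heading in the list by its URI.
         Assumes the headings are direct children of the root element. -->
    <xsl:key name="inList" match="/*/madsrdf:Topic" use="@rdf:about"/>

    <!-- One broader term (BT) of a heading -->
    <xsl:template match="madsrdf:hasBroaderAuthority/madsrdf:Topic">
        <xsl:variable name="uri" select="string(@rdf:about)"/>
        <a>
            <!-- In the list: link to the anchor on this page.
                 Not in the list: link out to the heading's own URI. -->
            <xsl:attribute name="href"
                select="if (exists(key('inList', $uri)))
                        then concat('#', tokenize($uri, '/')[last()])
                        else $uri"/>
            <xsl:value-of select="madsrdf:authoritativeLabel"/>
        </a>
    </xsl:template>
</xsl:stylesheet>
```

This also shows why a missing @rdf:about breaks the BT link: without it, `$uri` is empty and neither branch has anything to point at.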

I’d love to hear feedback!

  • Is this useful for anyone?
  • Any terms I missed?
  • Any terms I should’ve avoided? (while obviously side-stepping the fact that many of these terms aren’t very good, they’re the ones in the vocabulary at the moment)

N.B. I’m still missing about 10 pages of various Gay [profession] but I’ll finish them up soon (there are TONS of these…). I think that there’s enough here that you’ll get the idea

Emflix – Part Five – XML – Titles

Link to Part Four

Here’s where I introduce my big fancy proudest XSLT that I wrote for this project (and I wrote many). I’ve been talking a big game about how I wanted to transform the data from the catalog into the data of my structure, with as little copying/pasting as possible. There would inevitably still be lots of copying/pasting (all the genres taken from Netflix, for instance) — but I wanted to save myself as much time as possible.

So here goes, let’s take it a bit at a time.

Let’s talk titles! As we all know, in a MARC 245 field, we capitalize the first letter of the first word, and nothing else (save for proper nouns). That’s not how I wanted my interface to look, and it’s not how titles are really presented anywhere besides a library catalog.

Learn From My Mistakes

There was no reason to store the data this way! If I wanted to present the data with a more traditional title capitalization, I could’ve left that to the XSLT which turned my structure into the user-visible interface.


 <title>
     <xsl:apply-templates select="Heading3"/>
 </title>

This instruction is going to put whatever data is stored in “Heading3” in a “title” element, so let’s look at the template instruction to see what happens to it.

 <xsl:template match="Heading3">
    <xsl:variable name="title"
    select="functx:substring-before-if-ends-with(functx:substring-before-if-ends-with((normalize-space(.)),' /'), '.')"/>
    <xsl:sequence
     select="functx:capitalize-first(string-join(for $x in tokenize($title,'\s') return functx:titleCase($x),' '))"/>
 </xsl:template>

Follow me here…using some functions from http://www.xqueryfunctions.com/, the template first creates a variable named “title” and stores the full value (after normalizing the space) of the title in that variable but stripping a ‘ /’ and then a period, if they are the terminal character. Then it tokenizes the title by word, runs each token through my own little function called “titleCase”, joins the tokens back together, and then capitalizes the first letter.

“Heading3”, I believe, was created from MARC 245 subfields ‘a’ and ‘b’. Again, this is what happens when the person creating the conversion isn’t the same one who exported the data from the catalog. I was never 100% sure where things were coming from and had to make best guesses. Why I had to remove trailing ‘ /’ but not ‘ : ‘ I can’t tell you.

Here’s the function I wrote to convert each token into my desired title case:

     <xsl:function name="functx:titleCase" as="xs:string">
         <xsl:param name="s" as="xs:string"/>
         <xsl:choose>
             <xsl:when
                 test="lower-case($s)=('a','aboard','about','above','absent','across','after','against','alongside','amid','amidst','among','amongst','an','and','around','as','aslant','astride','at','athwart','atop','barring','before','behind','below','beneath','beside','besides','between','beyond','but','by','despite','down','during','except','failing','following','for','from','in','inside','into','like','mid','minus','near','next','nor','notwithstanding','of','off','on','onto','opposite','or','out','outside','over','past','per','plus','regarding','round','save','since','so','than','the','through','throughout','till','times','to','toward','towards','under','underneath','unlike','until','up','upon','via','vs.','when','with','within','without','worth','yet')">
                 <xsl:value-of select="lower-case($s)"/>
             </xsl:when>
             <xsl:otherwise>
                 <xsl:value-of
                     select="concat(upper-case(substring($s, 1, 1)), lower-case(substring($s, 2)))"/>
             </xsl:otherwise>
         </xsl:choose>
     </xsl:function>

It checks if each token (once converted to lower case) matches a string that I wanted to be lower case, a set of the little words which shouldn’t be capitalized in a title. I actually can’t, for the life of me, remember where the heck I got this list…that’s bad documentation on my part.

If it matches one of those strings, it returns the lower case value of that string, and begins processing the next token. If it doesn’t match, it falls back on the option of capitalizing the first letter (using some native sub-string functions) of the string.

That’s what happens to every title! Pretty neat, huh? I would then add a ‘sort’ attribute to the titles, essentially duplicating the MARC second indicator for a 245. I had to add 1 to it though, because XSLT counts the first character as position “1”, not position “0” as is more common in computer programming. Come to think of it…I really could’ve written a quick thing to add those ‘sort’ attributes automatically: just some kind of check for a first token of “the”, “a”, or “an”. Oh well! You live and learn. That’s why I’m writing this up, to talk through what I did, and realize what I could’ve/should’ve done.
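
For the curious, that “quick thing” might look something like the sketch below. It was never part of the project, so treat it as an illustration under assumptions: the function name `local:sortPosition` and its namespace are invented, it only knows the three English articles, and it emits the normalized title without the title-casing step so that it stands on its own.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:emflix-sketch"
    exclude-result-prefixes="xs local">

    <!-- Hypothetical helper: 1-based position of the first character to sort on -->
    <xsl:function name="local:sortPosition" as="xs:integer">
        <xsl:param name="title" as="xs:string"/>
        <xsl:variable name="clean" select="normalize-space($title)"/>
        <!-- First whitespace-delimited token, lower-cased, e.g. 'the' -->
        <xsl:variable name="first" select="lower-case(tokenize($clean, '\s')[1])"/>
        <!-- Skip the article plus the space after it; a title that is
             nothing but an article sorts from its first character -->
        <xsl:sequence
            select="if ($first = ('a', 'an', 'the') and contains($clean, ' '))
                    then string-length($first) + 2
                    else 1"/>
    </xsl:function>

    <xsl:template match="Heading3">
        <title sort="{local:sortPosition(.)}">
            <xsl:value-of select="normalize-space(.)"/>
        </title>
    </xsl:template>
</xsl:stylesheet>
```

A title beginning with “The ” gets sort="5" (three letters, one space, then the first filing character), which is the 245 second indicator of 4 plus the 1 mentioned above. It would still get fooled by titles where the leading “A” isn’t an article, so a human glance is still needed.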

That’s titles. When next we return, we’ll talk about the programmatic conversion of directors/writers.

 

Emflix – Part Four – XML – Structure

Link to Part Three

So I’d now received data from my boss, it was the 3000+ movies/tv shows in Emerson’s media collection in an Excel spreadsheet, which I then saved as XML and began to explore.

Here’s an example of what it looked like as XML:

<row>
 <Heading0>PN 19959 C 55 S 68 2002</Heading0>
 <Heading1>[DVD] PN1995.9.C55 S68 2002</Heading1>
 <Heading2>1941 [videorecording] / Universal Pictures and Columbia Pictures ; A-Team production ; screenplay by Robert Zemeckis and Bob Gale ; story by Robert Zemeckis, Bob Gale and John Milius ; produced by Buzz Feitshans ; directed by Steven Spielberg.</Heading2>
 <Heading3>1941</Heading3>
 <Heading4></Heading4>
 <Heading5></Heading5>
 <Heading6>Universal Home Video,</Heading6>
 <Heading7>2002</Heading7>
 <Heading8>1979</Heading8>
 <Heading9>2002-1979</Heading9>
 <Heading10>[2002]</Heading10>
 <Heading11>eng</Heading11>
 <Heading12>English dialogue, optional subtitles in French and Spanish; closed-captioned.</Heading12>
 <Heading13>Originally released as a motion picture in 1979.
Special features: restored footage not included in the original theatrical release; an original documentary on The making of 1941, including new video interviews with Steven Spielberg, Bob Gale, John Miliu</Heading13>
 <Heading14>Dan Aykroyd, Ned Beatty, John Belushi, Lorraine Gary, Murray Hamilton, Christopher Lee, Tim Matheson, Toshiro Mifune, Warren Oates, Robert Stack, Treat Williams.</Heading14>
 <Heading15>This comedy is set in Los Angeles days after the attack on Pearl Harbor, when the fear of a Japanese invasion hung over the city.</Heading15>
 <Heading16>1336944</Heading16>
 <Heading17>1628403</Heading17>
 <Heading18>1461057</Heading18>
 <Heading19>0113503089311</Heading19>
 </row>

This is where it started getting really fun. I dug deeper into what I’d learned in my XML class, further refining the structure of what each movie element or tv element should look like, and I decided to control it with an .xsd file, an XML schema.


    <xs:complexType name="mediaListType">
         <xs:sequence>
             <xs:element name="movie" type="mediaType" minOccurs="1" maxOccurs="unbounded"/>
             <xs:element name="tvShow" type="mediaType" minOccurs="1" maxOccurs="unbounded"/>
         </xs:sequence>
     </xs:complexType>
    <xs:complexType name="mediaType">
         <xs:sequence>
             <xs:element name="title" type="titleType" minOccurs="1" maxOccurs="unbounded"/>
             <xs:element name="director" type="personType" minOccurs="0" maxOccurs="unbounded"/>
             <xs:element name="genreWrap" type="genreWrapType" minOccurs="1" maxOccurs="unbounded"/>
             <xs:element name="writer" type="personType" minOccurs="0" maxOccurs="unbounded"/>
             <xs:element name="screenplay" type="linkType" minOccurs="0" maxOccurs="unbounded"/>
             <xs:element name="language" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
             <xs:element name="year" type="xs:gYear" minOccurs="0" maxOccurs="1"/>
             <xs:element name="callNumber" type="linkType" minOccurs="1" maxOccurs="1"/>
         </xs:sequence>
         <xs:attributeGroup ref="mediaAttributeGroup"/>
     </xs:complexType>
    <xs:complexType name="personType">
         <xs:simpleContent>
             <xs:extension base="xs:string">
                 <xs:attribute ref="differentiator" use="optional"/>
                 <xs:attribute name="sort" type="xs:positiveInteger"/>
             </xs:extension>
         </xs:simpleContent>
     </xs:complexType>

I’m not going to include the entire .xsd file here, but if anyone is super interested, I’d be happy to do that and discuss it at length. One thing I want to talk about is that “personType”.


Learn From My Mistakes

If you’re going to want to display names of people in alphabetical order, store them that way! Atomic, atomic, atomic!

<personName>
   <lastName>Ganin</lastName>
   <firstName>Netanel</firstName>
</personName>

That way you never have to do what I did which was adding a “sort” attribute to a name stored as a single element. What happened was this: I wanted to display the names in direct order to the patrons, as I think patrons prefer to see them that way, but because I didn’t realize (or trust my own skills?) that I’d be able to un-invert them in the display, I ended up storing them in direct order and recording the character on which I wanted to sort the name:

<personName sort="9">Netanel Ganin</personName>
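
For comparison, here is roughly what un-inverting the atomic version at display time could look like. It is a minimal sketch, not anything from the actual project: the `personList` wrapper element is invented for the example, and the names are assumed to be stored as lastName/firstName children as in the atomic example above.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- 'personList' is a hypothetical wrapper around the personName elements -->
    <xsl:template match="personList">
        <ul>
            <xsl:for-each select="personName">
                <!-- Sort on the inverted form... -->
                <xsl:sort select="lastName"/>
                <xsl:sort select="firstName"/>
                <!-- ...but display in direct order. string-join copes with
                     a missing firstName without leaving a stray space. -->
                <li>
                    <xsl:value-of select="string-join((firstName, lastName), ' ')"/>
                </li>
            </xsl:for-each>
        </ul>
    </xsl:template>
</xsl:stylesheet>
```

Two xsl:sort instructions and one string-join, and no character counting anywhere.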

There’s another attribute you may’ve noticed there, “differentiator”. Who wants to guess what I was re-inventing? That’s right, authority control! Do NOT reinvent the wheel! This is the overarching theme of this project’s retrospective. Authority control has already been done for tens of thousands (millions?) of names. But here came Past-Me all smug that he could do it better!

When two people had the same name (which to be fair, in such a small set of items [and considering this was all showbiz, the various guilds already prevented some of this for me!] there wasn’t much) I added a differentiator attribute to be displayed with the person’s name. I think the only doubles were Harrison Ford (1884) & (1942) and Thomas Lennon (1970) & (1951).


I also added a screenplay element which would contain a permalink to the screenplay (if we had it) in the Emerson catalog. This involved searching each movie individually as I entered it into my structure to see if we had the screenplay. It meant more work, but I’m actually really proud of including that. Build the links! These are related works, and if I can give myself a moment of praise here — I was bringing related works together.

Until this point, everything was still displayed as a table — there was one which functioned as an index of writers and directors with that person’s filmography listed in the cells to the right and below. Another table was an index of the films by title, but I still didn’t know what to do with these genres! I was adding them, but I hadn’t figured out how a movie could be assigned multiple genres/subgenres/subsubgenres and work that into something patrons could use.

I’m still not getting to that yet though, next time I’ll discuss a piece (and then another piece, and then another…) of my XSLT catalog-to-Netanel’s-structure converter.

Emflix – Part Three – Beginnings

Link to Part Two

My original vision was a table.

[Screenshot: my original table]

This was part of my actual presentation to the higher-ups to get the greenlight. Pretty sophisticated, right? I actually planned to type these all into my table by hand. That was the whole “grand new interface”.

Then I took my XML class and I had a new idea. If I stored the movie data as XML, I could use XSLT to make it look like anything I wanted. It began with a very simple structure:

<mediaList>

<movie>
<title>1941</title>
<director>Steven Spielberg</director>
<writer>Robert Zemeckis</writer>
<writer>Bob Gale</writer>
<callNumber>PN1995.9.C55 S68 2002</callNumber>
</movie>

<movie>
<title>Duel</title>
<director>Steven Spielberg</director>
<writer>Richard Matheson</writer>
<callNumber>PN1995.8.S87 S65 2004</callNumber>
</movie>

</mediaList>
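
A stylesheet that turns a structure like this into a table can be very short. The sketch below is an illustration rather than the original stylesheet, and it assumes well-formed input with exactly the element names shown above; the column choices and the sort by title are my own guesses at what such a table would want.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" indent="yes"/>

    <!-- One table for the whole list, one row per movie -->
    <xsl:template match="/mediaList">
        <table>
            <tr>
                <th>Title</th>
                <th>Director</th>
                <th>Writer</th>
                <th>Call number</th>
            </tr>
            <xsl:apply-templates select="movie">
                <xsl:sort select="title"/>
            </xsl:apply-templates>
        </table>
    </xsl:template>

    <xsl:template match="movie">
        <tr>
            <td><xsl:value-of select="title"/></td>
            <!-- separator joins repeated elements, e.g. two writers -->
            <td><xsl:value-of select="director" separator=", "/></td>
            <td><xsl:value-of select="writer" separator=", "/></td>
            <td><xsl:value-of select="callNumber"/></td>
        </tr>
    </xsl:template>
</xsl:stylesheet>
```
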


So yes, initially — I was just typing them into my XML structure, and I was using XSLT to transform that XML into the same darn table. My little homegrown structure grew though, it grew and grew.

  1. The first thing I added was an id attribute, which I copied from the bibliographic id assigned to every item by Voyager, Emerson’s ILS. I figured having a unique id which wouldn’t change would be the best way to identify items. …This ended up not working out, more on that later!
  2. The second was a date attribute for when I created and when I last modified the movie element.
  3. I added elements for language, and for year of release.
  4. Most importantly — genres

The above example, of “1941” now looked something like this:

 <movie id="1336944" dateCreated="2014-07-18">
 <title>1941</title>
 <director>Steven Spielberg</director>
 <genreWrap>
 <genre>Comedy</genre>
 <subGenre>Spoofs and Satire</subGenre>
 </genreWrap>
 <writer>Robert Zemeckis</writer>
 <writer>Bob Gale</writer>
 <language>English</language>
 <year>1979</year>
 <callNumber href="http://endeavor.flo.org/vwebv/holdingsInfo?bibId=1336944">[DVD] PN1995.9 .C55 S68 2002</callNumber>
 </movie>

I knew that the catalog had all the data I could want in terms of descriptive data, but where could I turn for genres? I really didn’t relish the idea of performing my own genre analysis of some 3000 movies/tv shows.

A little old juggernaut called Netflix.

I found at http://dvd.netflix.com/AllGenres a very clearly laid out three-layers-deep structure of genres. Was it perfect? Of course not. But it was a beginning. I’d type in a movie into the Netflix search (DVD rental, not streaming) and copy the genres, subgenres, and sub-subgenres found there.

Then, my boss told me that she could deliver me all the data at once, exported from Microsoft Access to an Excel spreadsheet. I jumped at this, because now I’d be able to transform that data into my new structure.

That’ll bring us to part 4!


Learn From My Mistakes

Do not create your own XML data structure. Standards exist, they exist all over the place. Use Dublin Core, use MODS, use some third or fourth thing — do not just make up your own, because that is the opposite of shareable.

 

 

Strap on Your Chinderwear, We’re Off to the Memory Bank

First off, there is no cataloging here — so if you’re only reading my blog for Library/Cataloging things, sorry! It’s my blog, sometimes I will blog about things that interest me.


I love MST3K.

I was first introduced to the show by my mother who had VHS tapes given to her by her colleague at work. From the first, I enjoyed it — the adorable robots, the movies themselves, the barrage of riffs. But as I grew older, and as the show became more “mine”, it meant more to me than simply 90 minutes of laughter.

See I was a bit of a nerd growing up, a little isolated, a little weird, and a little shy. But when I put on MST3K, from the moment I first heard “In the not too distant future…” and that grey-purple spaghetti ball filled the screen, I was with friends. Because that’s how it felt — like I was watching a movie with my friends. I know, big insight, right?

The shadowrama made them immediate, made them feel more present. I learned to look not just for jokes, but for Tom turning his head or leaning over to quip something directly to Mike. Those weird moments when Crow would turn his head just so and suddenly the Necker-cube effect would kick in and I’d think he was facing me. When Joel would stand up and actually point at the screen, it was so much more than bad-movie-fun-making, they were here in my living room!

Their use of callbacks (and the occasional call-forward, see: Gerry and Sylvia) increased the intimacy by rewarding consistent-watching and close attention. I should mention here that I didn’t participate in any fandom, as that’s never been my jam. I didn’t prefer Mike to Joel, Trace to Bill, or Pearl to Dr. Forrester (though TV’s Frank is obviously the best). While I read mst3kinfo.com often, I never posted. I would share a short or two with a friend, but rarely a full movie. I found that in small doses many people could enjoy it, but that it could grow long and tiresome in bulk. It remained a solitary passion for me in my life, but shared with the friends in the show itself.

That’s fine. This isn’t some post about like, my superior MST3K fanboy skills or anything — fuck that. I love the show, and I’m musing on why, but I would never say that if someone didn’t like it, they just “didn’t get it”. I 100% know and believe that people can “get the show” and not be into it.

Here’s the rub for me. Here’s what made it so special.

People often say “they watch bad movies and make fun of them!” And sure, they do that. But they could’ve just done that, and that’d be a show. Bad movie plays, they crack wise, everyone laughs. But that wasn’t the show, for me. Obviously I love the movies, and their jokes over the movies. I’m not saying that doesn’t matter, but if it was JUST that, I probably wouldn’t have found the connection that I did.

They were characters! They had little skits! There were robots for no reason. Again, the show could’ve been “Joel Hodgson, Kevin Murphy, and Trace Beaulieu goof on films”. There’s no reason to have robots, and the Satellite of Love, and the Mads, and Deep 13, and the invention exchange, and movie sign, and Gypsy, and Cambot, and Tom singing songs, and putting letters up on stillstore. All of that was the show to me. It was making fake radio shows with my sister, it was shooting silly movies with a friend in his basement, it was skits I’d write in my bedroom, and it was all with my robot pals. It was terrifyingly low-budget and it was wonderful. The care they put into the non-movie parts of the show translated directly into the movie parts: made with love.

Like a Christopher Guest film, they mocked with love. These are cheesy movies we’d be watching anyway, just enjoying them more, together. And when it got a little mean, it was less funny. When I rewatch, and I still do, I cringe at the occasional joke at the expense of femininity, or “Brain-Guy could you be any more gay” and the many fat jokes. “Mitchell” is not one I put on often… But it usually wasn’t mean, they cared about the movies and it showed.

So I guess that’s it? I’ve explained and mused and rambled enough for one night. I love this show, and I’m supporting it getting a few more episodes over on Kickstarter, even though I know it won’t be the same. RiffTrax/Cinematic Titanic aren’t the same, but hey, nothing ever is.

Aw heck, you know I can’t resist: here’s a little cataloging for y’all

630 0 0 Mystery science theater 3000

650 _ 0 B films $v Humor

655 _ 7 Puppet films. $2 lcgft