This is a follow-up to my previous post on faceting class of people/ethnic groups in LCSH. If you haven’t read that one, go back and read it first!
After I promoted that post a bit on Twitter, I got into a lively discussion there with MARCinaColdClimate and Ethan Fenichel. Part of that discussion centered on our subject indexes: if we were to create new headings from atomic units of headings (i.e. “Catalogers” + “Jews” becomes “Catalogers Jews”), someone searching a subject index for ‘Jews’ wouldn’t find the term.
Two responses to that problem:
- Our indexes are already out of whack because we are inconsistent in our ‘ordering’ of identity facets in LCSH.
‘Gay Men, White’ will file in the ‘Gs’ but ‘African American gay men’ will file in the ‘As’.
- A solution! During index generation, permute the 650s (i.e. 3 terms become 6 entries in the index).
While obviously I have no idea how your (or my!) ILS generates its index, I figured I’d at least give it a shot with some MARCXML/XSLT.
I wanted to try two different examples: one in which the three terms are already in a single 650 (as demonstrated in L 410 Section 2: Option 1.a), here as repeated $a subfields, and one in which each term is in its own 650, as expected.
Given this input:
<?xml version="1.0" encoding="UTF-8"?>
<slim:collection xmlns:slim="http://www.loc.gov/MARC21/slim">
    <slim:record>
        <slim:datafield tag="650">
            <slim:subfield code="a">African Americans</slim:subfield>
            <slim:subfield code="a">Gays</slim:subfield>
            <slim:subfield code="a">Jews</slim:subfield>
        </slim:datafield>
    </slim:record>
    <slim:record>
        <slim:datafield tag="650">
            <slim:subfield code="a">Catalogers</slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Lesbians</slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Older people</slim:subfield>
        </slim:datafield>
    </slim:record>
</slim:collection>
This transformation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:slim="http://www.loc.gov/MARC21/slim"
    xmlns:ng="http://example.com/ng" exclude-result-prefixes="xs ng" version="2.0">
    <xsl:output indent="yes"/>
    <!-- Recursive worker: $head holds the terms already placed, $tail the terms still to place. -->
    <xsl:function name="ng:permute" as="item()*">
        <xsl:param name="head" as="item()*"/>
        <xsl:param name="tail" as="item()*"/>
        <xsl:choose>
            <!-- One term left: emit a 650 with the terms in this order. -->
            <xsl:when test="count($tail) eq 1">
                <slim:datafield tag="650">
                    <slim:subfield code="a">
                        <xsl:for-each select="$head, $tail">
                            <xsl:value-of select="concat(normalize-space(.), ' ')"/>
                        </xsl:for-each>
                    </slim:subfield>
                </slim:datafield>
            </xsl:when>
            <!-- Otherwise move each remaining term to the end of $head in turn and recurse. -->
            <xsl:otherwise>
                <xsl:sequence
                    select="
                        for $pos in (1 to count($tail))
                        return
                            ng:permute(($head, $tail[$pos]), $tail[position() ne $pos])"
                />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>
    <!-- Entry point: wrap every ordering of $input in a single record. -->
    <xsl:function name="ng:permutations" as="element()">
        <xsl:param name="input" as="item()*"/>
        <slim:record>
            <xsl:sequence
                select="
                    for $pos in (1 to count($input))
                    return
                        ng:permute($input[$pos], $input[position() ne $pos])"
            />
        </slim:record>
    </xsl:function>
    <!-- Demo shortcut: only the first datafield of each record is dispatched,
         so every datafield in the sample records is assumed to be a 650. -->
    <xsl:template match="/">
        <slim:collection>
            <xsl:apply-templates select="slim:collection/slim:record/slim:datafield[1]"/>
        </slim:collection>
    </xsl:template>
    <!-- Case 1: one 650 with several $a subfields. Sort the subfields, then permute them. -->
    <xsl:template match="slim:datafield[count(slim:subfield) > 1]">
        <xsl:variable name="sorted-subfields" as="element()*">
            <xsl:perform-sort select="slim:subfield[@code = 'a']">
                <xsl:sort select="."/>
            </xsl:perform-sort>
        </xsl:variable>
        <xsl:copy-of select="ng:permutations($sorted-subfields)"/>
    </xsl:template>
    <!-- Case 2: one term per 650. Sort all of the record's datafields by term, then permute them. -->
    <xsl:template match="slim:datafield[count(slim:subfield) eq 1]">
        <xsl:variable name="sorted-subfields" as="element()*">
            <xsl:perform-sort select="../slim:datafield">
                <xsl:sort select="slim:subfield"/>
            </xsl:perform-sort>
        </xsl:variable>
        <xsl:copy-of select="ng:permutations($sorted-subfields)"/>
    </xsl:template>
</xsl:stylesheet>
Produces this result:
<?xml version="1.0" encoding="UTF-8"?>
<slim:collection xmlns:slim="http://www.loc.gov/MARC21/slim">
    <slim:record>
        <slim:datafield tag="650">
            <slim:subfield code="a">African Americans Gays Jews </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">African Americans Jews Gays </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Gays African Americans Jews </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Gays Jews African Americans </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Jews African Americans Gays </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Jews Gays African Americans </slim:subfield>
        </slim:datafield>
    </slim:record>
    <slim:record>
        <slim:datafield tag="650">
            <slim:subfield code="a">Catalogers Lesbians Older people </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Catalogers Older people Lesbians </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Lesbians Catalogers Older people </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Lesbians Older people Catalogers </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Older people Catalogers Lesbians </slim:subfield>
        </slim:datafield>
        <slim:datafield tag="650">
            <slim:subfield code="a">Older people Lesbians Catalogers </slim:subfield>
        </slim:datafield>
    </slim:record>
</slim:collection>
The permutation functions can take any number of terms (until memory runs out, I guess…) and the templates even sort ’em alphabetically first! Pretty neat, huh?
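To put a rough number on “until memory runs out”: n terms yield n! orderings, so the index grows very quickly. Here is a minimal Python sketch of the same idea, separate from the XSLT above. The name `index_entries` is made up for illustration, and it joins terms with single spaces rather than reproducing the trailing space the stylesheet emits.

```python
from itertools import permutations
from math import factorial


def index_entries(terms):
    """Return every ordering of the sorted terms, each joined with single spaces."""
    return [" ".join(p) for p in permutations(sorted(terms))]


entries = index_entries(["Jews", "Gays", "African Americans"])
print(len(entries))  # 3 terms -> 3! = 6 entries
print(entries[0])    # African Americans Gays Jews

# The number of index entries grows factorially with the number of terms.
for n in range(1, 9):
    print(n, "terms ->", factorial(n), "entries")
```

Three or four facets are manageable; by eight terms a single record would already need 40,320 index entries.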
Of course looking at this…another problem jumped out at me:
Grammatical agreement!
Take the first example: when ‘Jews’ is the terminal word, it’s fine, but if it isn’t, it really needs to be ‘Jewish’.
In the second example, I can’t really imagine ever having ‘Catalogers’ first as a term. It’d have to be something like ‘Cataloging Lesbian Older people’, which sounds like a function, not a description.
Simply permuting terms would not be enough; the system would also have to be able to tweak them.
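One way to picture that tweak is a lookup of adjectival forms applied to every term except the last. This is only a rough Python sketch under that assumption: the `ADJECTIVAL` table and the `agree` function are invented for illustration and don't come from LCSH or any ILS.

```python
# Hypothetical lookup of adjectival forms; invented for illustration only.
ADJECTIVAL = {
    "Jews": "Jewish",
    "Lesbians": "Lesbian",
    "Gays": "Gay",
    "African Americans": "African American",
}


def agree(terms):
    """Use the adjectival form for every term but the last.

    Terms with no entry in the lookup pass through unchanged.
    """
    if not terms:
        return ""
    *modifiers, head = terms
    return " ".join([ADJECTIVAL.get(t, t) for t in modifiers] + [head])


print(agree(["Jews", "African Americans", "Gays"]))
# Jewish African American Gays
print(agree(["Catalogers", "Lesbians", "Older people"]))
# Catalogers Lesbian Older people
```

Notice that ‘Catalogers’ passes through unchanged because the table has no entry for it, which is exactly the case where a simple lookup falls short.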