This is a follow-up to my previous post on faceting classes of people/ethnic groups in LCSH. If you haven’t read that one, go back and read it first!
After I promoted that post a bit on Twitter, I got into a lively discussion there with MARCinaColdClimate and Ethan Fenichel. Part of that discussion centered on our subject indexes: if we were to create new headings from atomic units of headings (e.g. “Catalogers”+ “Jews” becomes “Catalogers Jews”), someone searching a subject index for ‘Jews’ wouldn’t find the term.
Two responses to that problem:
Our indexes are already out of whack because we are inconsistent in our ‘ordering’ of identity facets in LCSH already.
Gay Men, White will file in the ‘Gs’ but ‘African American gay men’ will file in the ‘As’
A solution! During index generation, permute the 650s (i.e. 3 terms become 6 in the index)
While obviously I have no idea how your (or my!) ILS generates its index, I figured I’d at least give it a shot with some MARCXML/XSLT.
I wanted to try two different examples, one in which the three terms are already in a single $a of a single 650 (as demonstrated in L 410 Section 2: Option 1.a) and one in which each term was in its own 650 as expected.
The permutation functions can take any number of arguments (until memory runs out I guess…) and even sorts ’em alphabetically! Pretty neat, huh?
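For anyone who wants to see the idea without the XSLT, here is a minimal Python sketch of the permutation step. It is not the stylesheet itself, and it says nothing about how any real ILS builds its index; the function name `permuted_index_entries` and the choice to join terms with a single space are assumptions made for illustration.

```python
from itertools import permutations


def permuted_index_entries(terms):
    """Return every ordering of the given terms as one index string each,
    sorted alphabetically (so 3 terms become 6 index entries)."""
    # Hypothetical helper: each ordering is joined with a space,
    # in the style of "Catalogers Jews". A set drops duplicate strings
    # if the same term was passed in twice.
    entries = {" ".join(order) for order in permutations(terms)}
    return sorted(entries)


if __name__ == "__main__":
    for entry in permuted_index_entries(["Catalogers", "Lesbians", "Older people"]):
        print(entry)
```

The number of entries grows factorially with the number of terms, which is fine for three or four identity facets but would get out of hand quickly beyond that.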
Of course looking at this…another problem jumped out at me:
Take the first example: when ‘Jews’ is the terminal word, it’s fine — but if it isn’t it really needs to be ‘Jewish’.
In the second example, I can’t really imagine ever having ‘Catalogers’ first as a term. It’d have to be something like ‘Cataloging Lesbian Older people’, which sounds like a function, not a description.
Simply permuting terms would not be enough; the system would also have to be able to tweak them.
Use as a form subdivision under names of individual religious and monastic orders and under individual religions, individual Christian denominations, classes of persons and ethnic groups for whose use the prayers are intended; under names of individual saints, deities, etc., to whom the devotions are directed; and under topical headings for prayers and devotions on those topics.
Use as a topical subdivision under names of individual religious and monastic orders and under individual religions, individual Christian denominations, classes of persons, and ethnic groups for works about prayers intended for their use; under names of individual saints, deities, etc., for works about devotions directed to them; and under topical headings for works about prayers and devotions on those topics.
That’s….an awful lot of things to divide under, right? If you made it to the end, you found “topical headings.” My error was thinking that a topical heading meant any heading, and therefore a class of person could also be a topical heading. Because after all, everything is a topic, right?
In LCSH terms, if you’re looking for something that’s useable under any heading then you want the word: “subject”
—Electronic information resources
Use as a topical subdivision under subjects for works about electronic information resources on those subjects.
Use as a form subdivision under subjects for lists of publications about the subject that provide information about their location, availability, etc.
Use as a form subdivision under subjects for computer games on those subjects.
Those can be used as subdivisions under anything in LCSH (assuming the resource actually fulfills the qualifications).
For the most part though, use of subdivisions is tightly controlled, as you saw in the –Prayers and devotions example.
Before beginning to subdivide any heading, it can be really useful to ask yourself what type of heading am I trying to subdivide? Is this a class of person? A corporate body? A war?
I mentioned use of subdivisions being controlled, but this is a bit of a misnomer. We talk about controlling headings all day: hit Shift+F11 in OCLC and control those headings! Turn ’em blue and feel good about yourself. But just because you’ve ‘controlled’ a heading doesn’t mean you’ve used it correctly. OCLC isn’t performing any validation beyond string matching.
Code for the subject category that is associated with the 1XX field in an established heading record or a node label record.
Code indicates the relative position of the heading in a particular hierarchical arrangement in the thesaurus specified by the value in the second indicator position or in subfield $2 (Code source). Field 072 is repeated for each location of the heading in a specific thesaurus and for multiple subject category codes when a heading is common to different thesauri.
The tl;dr version is this: it’s a code to indicate (to a machine) what category of subject this heading is.
As I spent some time praising LCDGT in my last post, let’s hop over to them again and show you what I mean.
See, those 072 fields tell us that ‘Jews’ belongs to the ‘eth’ category and the ‘rel’ category. Now because I know my LCDGT, I know that that means the Ethnic/cultural group and Religion group.
Now imagine a world where all the LCSH were coded in the 072s like that. Thanks to Galen Charlton (and his Evergreen wizardry), here’s an actual example:
072 7 $a H 1145.5 $2 lcsh
151 $a Big Round Lake (Polk County, Wis.)
H1145.5 is the memo governing “Bodies of Water”. There’s also a subdivision –Navigation which can be used as a topical subdivision under names of individual bodies of water, and it has an 073 field (the subdivision counterpart to 072):
073 $a H 1145.5 $z lcsh
180 $x Navigation
So the 072 field says to a machine, “hey I’m a body of water (or at least an H1145.5), if you’re going to subdivide me, make sure the subdivision is pH balanced for me!”
Then the 073 field would confirm or deny and the machine would deliver a helpful message to the user.
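Here is a rough Python sketch of what that confirm-or-deny step might look like. It assumes the simplest possible rule, that a subdivision is valid when its 073 shares at least one memo code with the heading's 072; real LCSH practice has plenty of exceptions this ignores. The function names and message wording are invented for illustration and are not anything OCLC or LC provides.

```python
def normalize_memo(code):
    """Normalize a memo number so 'H 1145.5' and 'H1145.5' compare equal."""
    return code.replace(" ", "").upper()


def check_subdivision(heading, heading_072, subdivision, subdivision_073):
    """Return (ok, message) for using `subdivision` under `heading`.

    heading_072 and subdivision_073 are lists of memo codes, taken from
    the $a of each 072 or 073 field in the respective authority record.
    ok is True, False, or None when there is nothing to validate against.
    """
    if not heading_072 or not subdivision_073:
        # Assumed behavior: with no codes on either side, we cannot say.
        return None, f"Cannot validate '{heading}--{subdivision}': missing 072 or 073."
    shared = {normalize_memo(c) for c in heading_072} & {
        normalize_memo(c) for c in subdivision_073
    }
    if shared:
        memos = ", ".join(sorted(shared))
        return True, f"'{heading}--{subdivision}' is allowed under {memos}."
    return False, f"'{subdivision}' is not established for use under '{heading}'."


if __name__ == "__main__":
    print(check_subdivision(
        "Big Round Lake (Polk County, Wis.)", ["H 1145.5"],
        "Navigation", ["H 1145.5"],
    )[1])
```

Returning `None` for the unknown case matters, because a missing 072 is not the same thing as a failed check.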
Now I know what you’re going to say, “Wow Netanel, that sounds great, and also you’re very handsome! Why doesn’t OCLC already do this?”
That’s a great question, and also I know.
One obstacle is the vast majority of LCSH headings do not have an 072, and same goes for subdivisions and 073.
I wish there were a crowd-sourcing program whereby LC has a little interface: a term pops up, you select from the list of “category types” and maybe if three independent people select the same thing, it goes into the record. I know I’d waste time helping myself and all future catalogers out by doing that.
This feels very do-able, so let’s make it happen! I want to live in the future where we say, “I controlled the headings” and it means controlled and validated.
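The "three independent people" rule is easy to sketch. The following Python is purely hypothetical: no such LC interface exists as far as I know, and the vote format, the `agreed_category` name, and the default threshold of 3 are all assumptions.

```python
from collections import Counter


def agreed_category(votes, threshold=3):
    """Return the category chosen by at least `threshold` different voters,
    or None if no category has reached that level of agreement.

    votes is a list of (voter_id, category) pairs. Only each voter's most
    recent vote counts, so one person cannot reach the threshold alone.
    """
    latest_vote_per_voter = dict(votes)  # later pairs overwrite earlier ones
    counts = Counter(latest_vote_per_voter.values())
    if not counts:
        return None
    category, count = counts.most_common(1)[0]
    return category if count >= threshold else None


if __name__ == "__main__":
    votes = [
        ("cataloger_a", "H 1100"),
        ("cataloger_b", "H 1100"),
        ("cataloger_c", "H 1103"),
        ("cataloger_d", "H 1100"),
    ]
    print(agreed_category(votes))
```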
(Ever-resourceful Galen pointed me towards this document about the PSD tackling this very task…from 2012 — the short of it is, so many of the types of headings have exceptions that they’d need human intervention for most of the headings. (my personal favorite “make sure you remember this” exception is that ‘African Americans’ is an ‘Ethnic Group (H1103)’ and ‘African American dentists’ is a ‘Class of Persons (H1100)’ and same goes for all analogous headings.) They were only going to focus on:
H 1120 Names of families
H 1145.5 Bodies of water
H 1151.5 Types of educational institutions
H 1185 Religions
H 1187 Christian denominations
H 1195 Land vehicles
H 1200 Wars
I’d be very curious whether they have any updates on the project, and I’ll contact email@example.com to see.)
I want to return to something I alluded to at the top of the post and perhaps danced around.
In order to make fully effective use of 072/073s, we’ll need to enumerate all possible values.
So here’s my question: are there types besides:
H 1100 Classes of Persons
H 1103 Ethnic Groups
H 1105 Corporate Bodies
H 1110 Names of Persons
H 1120 Names of Families
H 1140 Names of Places
H 1145.5 Bodies of Water
H 1147 Animals
H 1148 Art
H 1149 Chemicals
H 1149.5 Colonies
H 1150 Diseases
H 1151 Individual Educational Institutions
H 1151.5 Types of Educational Institutions
H 1153 Industries
H 1154 Languages
H 1154.5 Legal Topics
H 1155 Legislative Bodies
H 1155.2 Groups of Literary Authors
H 1155.6 Literary Works Entered Under Author
H 1155.8 Literary Works Entered Under Title
H 1156 Literatures
H 1158 Materials
H 1159 Military Services
H 1160 Musical Compositions
H 1161 Musical Instruments
H 1164 Organs and Regions of the Body
H 1180 Plants and Crops
H 1185 Religions
H 1186 Religious and Monastic Orders
H 1187 Christian Denominations
H 1188 Sacred Works
H 1195 Land Vehicles
H 1200 Wars
Absolutely there are!
$x Access control (May Subd Geog)
Use under types of archives, records, computers, computer networks, and statistical and data-gathering services
$x Air conditioning (May Subd Geog)
Use under types of buildings, vehicles, and other constructions.
$x Digitization (May Subd Geog)
Use under types of library materials.
Therefore as part of this, we’d have to figure out what codes to use in the 072/073 if there is no specific memo for that type of heading.
Finally my biggest question of all:
What is a topical heading? Is a topical heading anything coded a 150? No, because it could be a Class of Person, or a War (for instance). It certainly can’t be a heading coded 100, 110, 111, 148, 151 or 155, so 150 is necessary but not sufficient. Is it anything which isn’t one of the other types of headings? Because if that’s the case, we really need to work on the full enumeration of what types of headings there are!
I know what you’re thinking: ‘Faceted LCSH’? Netanel! We already have that. It’s called FAST, and its main purpose is to confuse new catalogers who encounter seemingly duplicate headings in their copy-cataloging work.
(I kid, I kid, OCLC–we all love FAST please don’t come for me)
This is not a proposal to replace LCSH with a faceted subject terminology, rather it’s me musing on something that’s been in the back of my head since my deep dive into the LCDGT, and then came even farther towards the front of my head through increased tangling with LCSH proposals.
So what the heck am I talking about?
Well the other day, I noticed that there was no ‘African American sexual minorities’ as an LCSH term. So I wrote up the proposal and submitted it.
V. int. that there is no Afr. Am. sexual minorities in #LCSH only: Alaska Native Arab Am Asian Am Hispanic Am Pacific Islander Am
But I started thinking about LCDGT again. This is an excerpt from the manual, L 485 (emphasis mine)
If a creator self-identifies as belonging to a group that includes several discrete elements, assign a separate term for each element that will be useful for discovery purposes. Example:
An author who self-identifies as a lesbian teenager. Assign the terms Lesbians and Teenagers.
Now in our old friend LCSH, we of course have the specific term “Lesbian teenagers” as well as the more general terms “Lesbians” and “Teenagers”. But I started thinking about the flexibility we’d gain if our cataloging rules let us assign terms this way in subject settings.
Because here’s the thing. Sometimes the resource in hand is about “Gay flight attendants“, well great — LCSH has you covered! But what if your resource is about African American flight attendants? No dice, that’s not a term…yet. You’d have to spend the time (and have had the training) to create it, wait for months to hear back, and even then maybe you messed it up and need to resubmit.
The fact is LCSH suffers from some bloat in that the ‘classes of person’, ‘occupations’, and ‘age groups’ (yes the very same groups identified by LCDGT) end up crossing and recrossing with each other to create new terms and each term requires its own authority record and research for proposals and time for the PSD to approve.
Another advantage of allowing us to assign terms like the LCDGT is that it would eliminate some of the inconsistencies (every cataloger’s bane) found in LCSH.
Take “Gay youth” and “Young gay men” for example — one uses ‘youth’ the other uses ‘young’. Under this modification one would be the array of:
and the other assigned:
This way you drop the inconsistency over “youth” vs. “young” as well as the inconsistent ordering of terms.
Anyway — this isn’t a formal proposal, because I haven’t thought it all the way through to the inevitable problems, just something to think about.
I know you knew this was coming. How could I post about the LCGFT manual and not post about the Library of Congress Demographic Group Terms manual? I actually heard Janis Young (from the Library of Congress Policy and Standards Division dontchaknow) give her talk on the LCDGT and the demonym conundrum three times at ALA Midwinter. It got better every time I heard it.
Wait, the what manual?
Okay, so this hasn’t been around as long as LCGFT, and there’s a legitimate chance that you aren’t really sure what I’m even talking about here, so let me do a quick recap on what this vocab is even for. (For the official deets on the LCDGT’s purpose and creation, see the formal document)
Briefly, the LCDGT is a vocabulary consisting of people-characteristics. Where they’re from, what they do, their age, etc. There are 11 categories:
Age group
Educational level group
Ethnic/cultural group
Gender group
Language group
Medical, psychological, and disability group
National/regional group
Occupational/field of activity group
Religion group
Sexual orientation group
Social group
Okay. That’s a lot of groups, and it is fraught. I know it’s fraught, and I know you know it’s fraught. Heck, I emailed LC on May 12th, 2015 to express some concerns I had.
This is a primer though, so I’m not digging deep. Check out the full biz for yourselves at Class Web, or if you don’t have access to that, here are some other options:
You can browse it at id.loc.gov, though last I checked it’s a bit janky. When you click on a given group…it doesn’t actually display every term in that group. Not sure why, but I’ve seen it posted on a listserv and I know LC knows about it.
You can download datasets through this Class Web download — unfortunately though, those aren’t in RDF, just .mrc files
You could always use the one I built here *plug plug* — because I had to transform MARCXML rather than RDF, I couldn’t just use the XSLT I’d built for my QueerLCSH project, but I was able to re-use chunks of that code. (and I included some RDFa data (SKOS) in the html, so that’s cool)
So what’s the point of the LCDGT? Why do we need yet another vocab?
LCSH has a lot of headings which secretly have contributor/creator data or intended audience data embedded in them. That’s not great. So much of what the library-data community has been working on in the last…..aeons is to UN-mix our data. Get your title info out of my non title field! Stop putting your expression biz into my manifestation baz!
So it makes sense that we’d want to keep our subject headings subjects and keep our people info somewhere else.
Fantasy fiction, Korean
Revolutionary poetry, Lithuanian
See how these LCSH are smashing together aspects of their subject and creator/audience characteristics? This is what we want to fix.
Another reason is that patrons often select their resources based on these characteristics and we want to make that easier for them. If a patron wants to find a resource on “Finance for Spanish speaking women who are doctors, written by French speaking accountants”, our fancy new ILS-of-the-future will help us help them find it. (I assume it’ll be very good at limiting by facets.)
MARC 385 field is where we’re putting intended audience. Use no indicators, and no ending punctuation (unless term ends with punctuation). Toss a $2 lcdgt on it, and call it a day. It’s repeatable, and LC’s practice is to place each term used in a separate 385, though you can double up (or more!) if you want.
Remember, you can use these in bibliographic records as well as authority records for works.
245 00 $a Canadian Bates’ guide to health assessment for nurses.
385 ## $a Nurses $2 lcdgt
100 1# $a Blume, Judy.
245 10 $a It’s not the end of the world.
520 ## $a When her parents divorce, a sixth grader struggles to understand
that sometimes people are unable to live together.
385 ## $a Children of divorced parents $2 lcdgt
385 ## $a Preteens $2 lcdgt
385 ## $a Middle school students $2 lcdgt
385 ## $a Junior high school students $2 lcdgt
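The one-term-per-field practice shown in these examples is mechanical enough to sketch in Python. This just builds display strings in the same notation used above (`##` for blank indicators); it does not use a real MARC library, and the function name `demographic_fields` is made up for illustration. The same pattern works for the 386, which is built identically.

```python
def demographic_fields(tag, terms, source="lcdgt"):
    """Build one 385 or 386 display line per term, with blank indicators,
    no added ending punctuation, and a $2 naming the source vocabulary."""
    if tag not in ("385", "386"):
        raise ValueError("tag must be '385' (audience) or '386' (creator/contributor)")
    return [f"{tag} ## $a {term} $2 {source}" for term in terms]


if __name__ == "__main__":
    for line in demographic_fields("385", ["Preteens", "Middle school students"]):
        print(line)
```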
MARC 386 is where we’re putting creator/contributor information. Just as above in the 385, use no indicators, and no ending punctuation (unless term ends with punctuation). Toss a $2 lcdgt on it, and call it a day. It’s repeatable, and LC’s practice is to place each term used in a separate 386, though you can double up (or more!) if you want.
I want to highlight here how often the words ‘self-identifies’ appear in this memo (and throughout the manual). I think that’s a great decision on LC’s part to make sure it’s clear that we are to listen to the people whose work we are describing. They are the final (and best) arbiters of their own lives.
Interestingly, unlike the increased use of $3 in the LCGFT manual to indicate different genres for different pieces, there is no use of $3 to distinguish which LCDGT terms are to apply to which creator/contributor (in the common case of multiple). We just code all the LCDGT terms for all the creators/contributors. I’m curious to see how that shakes out in practice.
Just as with the 385, you can use these in bibliographic records as well as authority records for works.
100 1# $a Russell, Rachel Renée.
245 10 $a Dork diaries : $b tales from a not-so-fabulous life / $c
Rachel Renée Russell.
386 ## $a Virginians $2 lcdgt
386 ## $a Lawyers $2 lcdgt
[“Rachel Renée Russell is an attorney. … Rachel lives in Chantilly, Virginia”–Author blurb.]
100 1# $a Sadler, Matthew.
245 $a Tips for young players / $c Matthew Sadler.
386 ## $a Chess players $2 lcdgt
386 ## $a Britons $2 lcdgt
[“Britain’s No. 3 ranked player Grandmaster Sadler answers key questions…”–Page 4 of cover.]
This isn’t the hill I want to die on, but this memo happened to alert me to the fact that Marines is an NT of Soldiers. The marines hate that. My understanding is that they take ‘Soldiers’ to mean members of the U.S. Army specifically, not just any member of the U.S. Armed Forces. Like I said, it’s not my hill nor battle, but there it is.
An interesting example given is:
Children of gay men
not BT Children
[Although Children of gay men appears to refer to people under thirteen years of age, the term refers to any child of gay men, including those who have reached adulthood. The BT Children is therefore not appropriate.]
It’s a fine line and perhaps one that should have been avoided, because essentially this means that the vocabulary is using the same word ‘Children’ in two different ways. One of those ways is an absolute technical definition, “someone who is the child of someone else.” The other is dependent on the subject’s age. So for example, while for now any books authored by li’l Harper Harris-Burtka could be given 386s of
386 ## $a Children $2 lcdgt
386 ## $a Children of gay men $2 lcdgt
As soon as he hits adulthood, his future works can only be given the second term.
There’s a general rule about assigning multiple BTs when the term is intrinsically part of two or more groups. The BTs may be from different categories.
BT Information scientists
BT Library employees
I quibble here again that I know many librarians who are not library employees, thus seeming to violate the ‘intrinsic’ aspect of the definition. To some, a librarian is anyone with an ML[I]S; to others, it’s someone who works in a library in a professional capacity.
Another interesting thing I thought was that no terms are made for language groups. That is, German speakers and English speakers share no BT with some term like, ‘Germanic language speakers’.
8. Pejorative or outdated terminology. Generally avoid making UFs for pejorative or long disused terminology.
Words or phrases that were formerly pejorative but are not any longer may be provided as UFs. For example, research indicates that use of the word Cheeseheads to refer to people from Wisconsin was pejorative but is now considered acceptable, so it may be a UF to Wisconsinites.
This was the rule I cited way back when to request that LC remove a term from their UFs for the LCDGT. I also just really like the idea of people doing intense research to figure out if ‘Cheeseheads’ was offensive or not.
Again, much weight is placed on the self-identification of the creators/contributors. I especially like the phrase:
Avoid assigning terms based on a photograph or picture of the creator, or based on the creator’s name, because they can be misleading regarding age, ethnicity, gender, etc.
That’s a great caveat to include. Another warning is under the edition section
Demographic terms assigned to earlier or later editions of the same resource may be reused with caution in the new cataloging record. Creators may self-identify with different demographic groups over the course of their lifetimes.
I appreciate that LC is aware of these realities and provides guidance in the memos.
One thing that I want to highlight is the ‘Overlapping terms’ instruction in both this and the previous memo. There are many terms which may seem to overlap with each other and thus adding both would be redundant. An example given is:
Here’s the thing though, terms from different categories are not overlapping terms, within the LCDGT. Remember the 11 groups above — African Americans is an ethnic/cultural group, whereas Americans is a national/regional group. Though outside LCDGT-land, all African Americans are obviously also American, within this sphere, it isn’t redundant because they’re part of different groups.
Each group has its own memo, most of which have special provisions in addition to the general rules given in the broader memos. (The Gender category, Medical, Psychological and Disability category, and Sexual Orientation categories have no additional provisions so I didn’t note anything about them)
Reciprocal instruction of the above instruction re: Age/Education.
If the resource is American use terms reflecting the American educational system, but if it isn’t then you’re welcome to propose (or use, if already accepted) new terms for the non-American educational system.
The memo includes special instructions on proposing such terms.
There are a lot of instructions here on creating and assigning demonyms. It’s all well and good if we stay at or above the level of first-order administrative divisions. Unfortunately SOMEONE suggested to the Policy and Standards Division that they create demonyms for below that level.
There’s a note in the background section that members of religious orders (e.g. Benedictines) are in the Social category, not this one. Why aren’t members of religious orders in the Religion category?
Common-sense note not to apply terms to creators when it’s redundant with the work being cataloged, e.g. don’t add “Authors” as a creator characteristic when cataloging a book.
No BTs may be constructed from an Occupation/Field of activity term to a Gender, Religion, or Sexual orientation category. This is a good rule, as it allows that any given occupation is not limited to a single gender, religion or sexual orientation — acknowledgement of the great variance in humanity is a good thing.
The social group is the catch-all for things that don’t really fit in the other groups.
No BT references are made to indicate age/gender/sexual orientation (good job, again)
In terms of “members of a specific organization” the only allowed terms are worldwide scouting organizations and political parties. Otherwise they prefer you to construct a term based on the identity of that organization.
i.e. rather than adding a term, “American Humanist Association members”, add “Humanists”
That’s all! Go forth and start applying those terms. Write to LC, write to Janis, post stuff on your blog, yell at me on twitter. Do what yer gonna do.
One last thing….
Here’s something cool: unwashed randos, just like you and me (i.e. non-SACO members), can submit LCDGT proposals through this SurveyMonkey for a limited time! Before you click through though, let me save you some time: they’re going to want you to have AT THE READY several memos from the manual. But for some reason, they didn’t actually link them. I am a kinder, gentler person and will provide those links for your LCDGT-creation convenience.
*Yes! It must be noted that Children’s X is actually one of the stated exceptions in the LCGFT manual (See memo J 270), but the fact that they had to make an exception for it, demonstrates that they’re aware it breaks the rules. I provided it here to see if you were paying attention.
So these three are actually older LCDGT headings, but id.loc.gov hadn’t been updating for months and months! I emailed Janis Young and she got it sorted out, so I was finally able to add these back-catalog headings. Many thanks to her.
Alex pointed me towards ‘Polari’, another good term I was missing, but I then realized that it was actually an NT of a term already on my list, ‘Gay men–Language’. So I added a little code to my error checkers to spit out any NT that isn’t on my list. By my reckoning, a given BT may not necessarily need to be on the list itself (e.g. ‘Bisexual parents’ has a BT of ‘Parents’), but every NT should be on the list.
So to that end, I also ended up adding:
African American bisexuals
Pacific Islander American bisexuals
Asian American bisexuals
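The NT check described above can be sketched in a few lines of Python. My actual checker works on the transformed LCSH data, so treat this as an illustration only: the dictionary shape (heading label mapped to its list of NT labels) and the function name `missing_narrower_terms` are assumptions.

```python
def missing_narrower_terms(headings):
    """Return, sorted, every NT that is referenced by a heading on the list
    but is not itself a heading on the list.

    headings maps each heading label to a list of its NT labels.
    """
    on_list = set(headings)
    missing = set()
    for narrower_terms in headings.values():
        missing.update(nt for nt in narrower_terms if nt not in on_list)
    return sorted(missing)


if __name__ == "__main__":
    sample = {
        "Gay men--Language": ["Polari"],
        "Bisexuals": ["African American bisexuals"],
        "African American bisexuals": [],
    }
    print(missing_narrower_terms(sample))
```

BTs are deliberately not checked, since a BT like ‘Parents’ does not need to be on the list.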
Frankly, everyone who isn’t Alex is doing a real bad job of identifying ones I’ve missed!
Thanks again to Alex for their continued noticing of headings I’ve missed! Let’s be real — I should’ve been using truncation when I did my initial searches. Rookie mistake, Ganin, rookie mistake.
I’ve now added:
Intersexuality in literature
Intersexuality in children
Intersexuality in art
Added “LGBT History Month” as it was in the most recent New LCSH
So first off — a tremendous thank-you to my wonderful and observant colleague Alex who noticed something missing from my headings: ‘Astrology and homosexuality’. How could it be?! I’d been so careful! So let’s revisit my original process:
I searched Sexual minorities (and minority), queer, lesbian, gay, gender, orientation, intersex, transgender, transexual, and bisexual at id.loc.gov and grabbed every relevant term from the search. If any of those words appeared in a 150 or 450, I nabbed it.
But did you see what I didn’t search? That’s right: ‘Homosexuality’. In information retrieval terms this is precision and recall. I visually inspected each term to make sure there were no false positives (precision), but my initial search terms weren’t wide enough to get perfect recall. I missed many relevant results.
Because this actually goes deeper than not having searched ‘homosexuality’. I searched ‘Homosexual’ and got still more that I needed. I searched ‘Bisexuality‘ and came up with one that wasn’t retrieved on a search for ‘Bisexual’. I searched ‘Lesbianism’ and found still more. So what’s important to note (for both me and others) is that these aren’t ALL the terms, it’s just the 862 I could find.
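To put numbers on the two measures, here is a small Python sketch. Precision is the share of retrieved terms that are relevant; recall is the share of relevant terms that were retrieved. The sets in the example are toy data made up for illustration, not my actual search results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items
    measured against the full set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy example: everything retrieved is relevant, but half of the
    # relevant headings were never retrieved.
    retrieved = {"Gay youth", "Lesbian teenagers"}
    relevant = {
        "Gay youth",
        "Lesbian teenagers",
        "Astrology and homosexuality",
        "Lesbianism",
    }
    print(precision_recall(retrieved, relevant))
```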
So if you’ve been maintaining your own list somewhere — and need to know which are the new terms, here they are:
Added “Two-spirit people in literature” as it was in the most recent New LCSH
Added two more terms (as they were in the most recent New LCSH)
Sexual minority veterans
Updated the BT for transgender veterans, to Sexual minority veterans (as per New LCSH)
In so doing, I discovered a mistake! There were about 25 or so subject headings which did not have an attribute of @rdf:about in the madsrdf:Topic of madsrdf:hasBroaderAuthority. Without this attribute, the link constructed in the BT wouldn’t work. I manually went through and added those attributes, so you may cease your panicking.
Changed the links to its permanent home at my new website
I finished all the additional 10 pages (about 200 terms!) under “Gay” – and also added any term which had an RT in this list, but the term itself wasn’t on the list. (if that made sense…)
I added the ‘May Subd Geog’ to each term that can be, and also ‘Former heading’ to those UFs which are formerly authorized headings, rather than standard variant labels.
I added a bit of statistical info to the beginning
Couple other notes [from original posting]
I decided to code it so that if the BT/NTs were in this list, they’d be anchor links, but if they weren’t it’d take you to their record at id.loc.gov
I wrote in a yellow highlighting effect for when you click an anchor. I like it; what do you think?
I didn’t display the “Subd Geog” status, do folks want that? It’s easy enough to add in, all the data is still there
I made the un-authorized forms italics to help them stand back a bit, whereas Class Web has them normal-style. Thoughts?
I’d love to hear feedback!
Is this useful for anyone?
Any terms I missed?
Any terms I should’ve avoided? (while obviously side-stepping the fact that many of these terms aren’t very good, they’re the ones in the vocabulary at the moment)
N.B. I’m still missing about 10 pages of various Gay [profession] but I’ll finish them up soon (there are TONS of these…). I think that there’s enough here that you’ll get the idea