So I’ve been flirting with the idea of working on a project for a certain someone out there in Library Land (#Cryptic), and the project involves subject headings: our old friend LCSH, to be specific. In my past XML projects, both at Simmons and at Emerson, I never used “real” encoding standards. Here’s an example of a ‘media’ element from Emflix, the best thing I ever did and my single proudest achievement in my whole life? I guess?
<media id="732373" dateCreated="2015-05-12" lastModified="2015-05-12">
  <title>Pirates of the Caribbean: The Curse of the Black Pearl</title>
  <director sort="6">Gore Verbinski</director>
  <actor sort="8">Johnny Depp</actor>
  <actor sort="10">Geoffrey Rush</actor>
  <actor sort="9">Orlando Bloom</actor>
  <actor sort="7">Keira Knightley</actor>
  <actor sort="6">Jack Davenport</actor>
  <actor sort="10">Jonathan Pryce</actor>
  <actor sort="5">Lee Arenberg</actor>
  <actor sort="11">Mackenzie Crook</actor>
  <actor sort="8">Damian O’Hare</actor>
  <genre>Action and Adventure</genre>
  <genre>Children and Family</genre>
  <summary>When a young swain recruits rascally, charismatic pirate Capt. Jack Sparrow to help rescue a maiden from rival buccaneers, he and his motley crew soon find themselves up against frightening supernatural forces and an ancient curse.</summary>
  <writer sort="5">Ted Elliott</writer>
  <writer sort="7">Terry Rossio</writer>
  <screenplay href="http://endeavor.flo.org/vwebv/holdingsInfo?bibId=731245">PN1997 .P49 2002</screenplay>
  <callNumber href="http://endeavor.flo.org/vwebv/holdingsInfo?bibId=732373">[DVD] PN1995.9 .A3 V47 2003</callNumber>
</media>
Pretty wild, right? There are only about a zillion encoding standards I could’ve used to describe each movie and TV show, any of which would’ve made the project shareable and the data easily transformable. Instead, I ploughed ahead with some home-grown, wheel-reinventing mess.
Don’t get me wrong, my project is awesome, and I learned so much along the way (one of those things being: DON’T REINVENT THE WHEEL).
So on this new project, and on any future projects, I want to stick to what’s already out there with regard to encoding standards and content standards. To that end, I’ve been playing around with MARCXML. Now on some level, sure, I understand it. Just looking at it, the datafields, subfields, tags, and codes all pretty clearly map 1:1 to MARC as it appears in our OPACs and in OCLC.
I decided to download the XML schema (the .xsd file) to make sure all my data was valid and conforming. What I found shocked me.
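For a sense of that 1:1 mapping, here is a rough sketch of how a few of the Emflix elements might land in MARCXML. It is illustrative, not a cataloged record: the tag choices (245 for the title, 520 for the summary, 700 for the director) are my assumptions, the leader and control fields are left out, and the ISBD punctuation is approximate. Each MARC field becomes a `datafield` with `tag`, `ind1`, and `ind2` attributes, and each subfield becomes a `subfield` with a `code`.

```xml
<!-- Illustrative sketch only: field choices and punctuation are assumptions, not a cataloged record. -->
<record xmlns="http://www.loc.gov/MARC21/slim">
  <!-- Emflix <title>. 245 ind1="0": no 1XX main entry; ind2="0": no nonfiling characters -->
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">Pirates of the Caribbean :</subfield>
    <subfield code="b">the curse of the Black Pearl.</subfield>
  </datafield>
  <!-- Emflix <summary>. 520 with a blank first indicator means "Summary" -->
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">When a young swain recruits rascally, charismatic pirate Capt. Jack Sparrow to help rescue a maiden from rival buccaneers, he and his motley crew soon find themselves up against frightening supernatural forces and an ancient curse.</subfield>
  </datafield>
  <!-- Emflix <director>. 700 ind1="1": personal name entered surname first (inverted) -->
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Verbinski, Gore,</subfield>
    <subfield code="e">film director.</subfield>
  </datafield>
</record>
```

One nice side effect: because MARC stores personal names inverted ("Verbinski, Gore"), the heading itself seems to do the job the Emflix `sort` attribute was doing.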
Yes, it was that dramatic.
On the official MARCXML site there’s a link to the .xsd file, and it is just utterly inadequate! A good schema validates good data that matches it and rejects bad data that doesn’t. But this schema doesn’t know the bad from the good!! What’s even the point!? You could just parp a 100 with a second indicator of 4 and it will validate! It shouldn’t, because the 100 has no defined second indicator, so a 4 there is meaningless noise in the MARC standard; but the schema has no specifications for individual fields at all. It made me sad to see it.
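To make that concrete, here is a minimal record of the kind described: a 100 with a second indicator of 4. The `xsi:schemaLocation` points at the schema URL the Library of Congress publishes for MARCXML (assuming that is the copy you downloaded), and the name is placeholder data. As far as I can tell, the schema only checks that each indicator is a single character from a small allowed set, so a validating parser should accept this record even though MARC defines no second indicator for field 100.

```xml
<!-- Sketch: structurally fine for the generic MARCXML schema, meaningless as MARC. -->
<record xmlns="http://www.loc.gov/MARC21/slim"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <!-- 24-character leader: new record, projected medium (g), monograph (m) -->
  <leader>00000ngm a2200000 i 4500</leader>
  <!-- ind2="4" is undefined for field 100, but the schema has no per-field rules to catch it -->
  <datafield tag="100" ind1="1" ind2="4">
    <subfield code="a">Verbinski, Gore.</subfield>
  </datafield>
</record>
```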
Now when we upload records to Alma, or create them in OCLC, these systems get cranky when indicators are invalid or when subfields have the wrong punctuation, so there are clearly some better validators out there. Does anybody know where they live? Are there better MARCXML schemas out there in some other form that I just didn’t find, or do I have to write my own…
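If it does come down to writing my own, one route (my suggestion, not something the official MARCXML page offers as far as I know) is to keep the LoC .xsd for structure and layer field-level rules on top in Schematron, the ISO standard for rule-based XML validation. XSD 1.0 has no straightforward way to say “when the tag is 100, ind2 must be blank,” because the allowed value of one attribute depends on another; Schematron’s XPath assertions express that directly. Below is a minimal sketch covering field 100 only, assuming the standard MARCXML namespace. The file name and pattern id are made up, and a real rule set would need rules for every field you care about.

```xml
<!-- Hypothetical rules file (e.g. marc-100.sch): a sketch covering field 100 only. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="marc" uri="http://www.loc.gov/MARC21/slim"/>
  <sch:pattern id="field-100">
    <!-- Fires once for every 100 datafield in the record being checked -->
    <sch:rule context="marc:datafield[@tag = '100']">
      <!-- First indicator: 0 = forename, 1 = surname, 3 = family name -->
      <sch:assert test="@ind1 = '0' or @ind1 = '1' or @ind1 = '3'">
        100: first indicator must be 0, 1, or 3 (found "<sch:value-of select="@ind1"/>").
      </sch:assert>
      <!-- Second indicator is undefined for 100, so it must be blank -->
      <sch:assert test="@ind2 = ' '">
        100: second indicator is undefined and should be blank (found "<sch:value-of select="@ind2"/>").
      </sch:assert>
      <!-- $a is non-repeatable; requiring exactly one is an assumption of this sketch -->
      <sch:assert test="count(marc:subfield[@code = 'a']) = 1">
        100: expected exactly one $a.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Run against a record whose 100 carries a second indicator of 4, the second assertion should fail and report the offending value. Schematron is usually compiled to XSLT before it runs (SchXslt is one open-source implementation), and XML editors such as oXygen can apply it alongside an XSD.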