MARCXML, Part Three

Alright — So the leader:

Despite the spec for every type of MARC saying that the leader should be 24 characters (positions 00-23), it is abundantly clear that in the records I’m testing, multiple spaces in a row get normalized to a single space in MARCXML. Rather than have my assertions fail on every single MARC record I test… I modified the assertions to allow that. *shrug* Gotta deal with the reality of the situation, and not just the spec.
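For anyone who wants to poke at the same idea outside of a schema, here is a minimal Python sketch of a "tolerant" leader length check. It is not my actual assertion, just the logic of it: accept a full 24-character leader, or a shorter one that looks like its spaces were collapsed. The function names and the sample leader are made up for illustration, and the check is deliberately loose (it says nothing about what is in each position).

```python
import re

LEADER_LENGTH = 24  # MARC leader positions 00-23


def collapse_spaces(text: str) -> str:
    """Mimic the normalization described above: a run of spaces becomes one space."""
    return re.sub(r" {2,}", " ", text)


def leader_length_ok(leader: str) -> bool:
    """Accept a full-length leader, or a shorter one that looks collapsed."""
    if len(leader) == LEADER_LENGTH:
        return True
    # A collapsed leader is shorter than 24 characters and, by definition,
    # can no longer contain two spaces in a row.
    return 0 < len(leader) < LEADER_LENGTH and "  " not in leader


if __name__ == "__main__":
    # Hypothetical leader, invented for this example (not from a real record).
    full = "01142cam  2200301   4500"
    collapsed = collapse_spaces(full)
    print(repr(collapsed), leader_length_ok(full), leader_length_ok(collapsed))
```

The catch, of course, is that once the spaces are collapsed you can no longer trust the positions, so a check like this only tells you the leader is plausible, not that it is right.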

Weirdness though!! So I’m using the oXygen XML editor. I cannot guarantee that all y’all can replicate this problem in your editors. When I add an error to my main MARCXML record to test my validator… suddenly I pop 10,000+ errors! Then I run it again (and any subsequent times) and get just the single error I added. Fix the error, and same thing: I have over ten thousand errors, then I run it a second time (or more) and it’s clean and validated. Weird. At first it was making me nervous that I’d royally screwed something up. Glad that wasn’t the case (I hope).

So I had been adding assertions one by one, field by field, which became a bit tedious, and I realized it was rather inefficient. So many fields share the same indicators that it’d be more efficient to first categorize the fields by their indicators and then create the assertions en masse.

To that end, I made this: a Google Sheet of indicators. It’s every single field possible in bibliographic, authority, holdings, and classification MARC, organized by indicators. I didn’t do community MARC because 1. I was super sick of doing it by that point, and 2. I’ve still never seen an example of community MARC! Is it in use? Anywhere?
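To show what "categorize first, then assert en masse" could look like, here is a rough Python sketch. It groups tags that share the same pair of allowed indicator values and then builds one XPath-style test string per group. Big caveats: the little table is only a hand-picked sample (double-check it against the LoC pages), the function names are invented, and the shape of the generated test is just one guess at how an assertion might be written, not what is in my .xsd.

```python
from collections import defaultdict

# Illustrative sample only: tag -> (allowed first indicators, allowed second indicators).
# A single space means the indicator is undefined (blank).
INDICATORS = {
    "010": (" ", " "),
    "020": (" ", " "),
    "300": (" ", " "),
    "100": ("013", " "),
    "110": ("012", " "),
    "111": ("012", " "),
}


def group_by_indicators(table):
    """Collect the tags that share an identical (ind1, ind2) signature."""
    groups = defaultdict(list)
    for tag, signature in table.items():
        groups[signature].append(tag)
    return {signature: sorted(tags) for signature, tags in groups.items()}


def xpath_list(values):
    """Render values as an XPath 2.0 sequence of string literals."""
    return "(" + ", ".join(f"'{v}'" for v in values) + ")"


def assertion_for(signature, tags):
    """Build one test covering every tag in the group (hypothetical shape)."""
    ind1, ind2 = signature
    return (
        f"if (@tag = {xpath_list(tags)}) "
        f"then (@ind1 = {xpath_list(ind1)} and @ind2 = {xpath_list(ind2)}) "
        "else true()"
    )


if __name__ == "__main__":
    for signature, tags in group_by_indicators(INDICATORS).items():
        print(assertion_for(signature, tags))
```

With this sample, six fields boil down to three tests, which is the whole point: the number of assertions tracks the number of distinct indicator patterns, not the number of fields.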

The next step is then to add the assertions back into my .xsd file and test the heck out of the darn thing.


4 thoughts on “MARCXML, Part Three”

  1. I’m not sure you meant this, but not all MARCXML collapses spaces in the leader – how do you interpret the leader if this has happened?

    If of any interest I wrote a scraper for the LoC Bib MARC pages – data and code available from Sadly I didn’t include scraping the indicators in that, but it does include information on valid subfields and repeatability if that is of any interest.


  2. I will revisit this again — and reply back with my thoughts when I get a sec, super busy with NACO right now…but I really appreciate you taking a look!

    Thanks for pointing me to your scraper. I’m afraid I don’t understand what it is or what it does, or how to use it, or even how to post or use github…so I have many many more miles to go ahead of me!


    1. No problem. In terms of the scraper I hadn’t realised you had to sign up to github to download the data – apologies for that. If you’d prefer not to create a github a/c I could post the data somewhere else if you think it would be useful.

      The scraper retrieves the web page for each MARC field from the LoC MARC Bibliographic Format pages ( and captures each subfield that is mentioned on the page and records:

      Subfield code + Subfield label
      MARC Field (to which the subfield belongs)
      Repeatability of the Subfield
      URL from which the scraper retrieved the information
      MARC Field label

      I guess I was thinking the details of which subfields are valid for each MARC field, and the repeatability of those subfields might enable you to incorporate this into your xsd (although I should say I know little about writing xsds). I’d be happy to look at expanding the scraper to collect information about other MARC formats and to collect additional information from the LoC web pages.

      No pressure and no hurry of course! But drop me a line here or on Twitter (@ostephens) if you are interested.


      1. oh my glob!!! I have to learn how to write that kind of thing!! I finally finished getting every indicator possibility…but it took forever, I opened each page and wrote them down. I will eventually get all this Git-Biz figured out, so no need to transform it into any other form, but I appreciate the offer!

