My original vision was a table.
This was part of my actual presentation to the higher-ups to get the greenlight. Pretty sophisticated, right? I actually planned to type these all into my table by hand. That was the whole “grand new interface”.
Then I took my XML class and I had a new idea. If I stored the movie data as XML, I could use XSLT to make it look like anything I wanted. It began with a very simple structure:
<mediaList>
  <movie>
    <title>1941</title>
    <director>Steven Spielberg</director>
    <writer>Robert Zemeckis</writer>
    <writer>Bob Gale</writer>
    <callNumber>PN1995.9.C55 S68 2002</callNumber>
  </movie>
  <movie>
    <title>Duel</title>
    <director>Steven Spielberg</director>
    <writer>Richard Matheson</writer>
    <callNumber>PN1995.8.S87 S65 2004</callNumber>
  </movie>
</mediaList>
So yes, initially I was just typing them into my XML structure and using XSLT to transform that XML into the same darn table. My little homegrown structure grew, though. It grew and grew.
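To make that XML-to-table step concrete, here is a minimal sketch. It is not the XSLT I actually used: it swaps in Python's standard-library ElementTree to do the same kind of transformation, turning each movie element into one table row. The column list and the function name media_to_table are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# The two-movie sample from above, with well-formed closing tags.
MEDIA_XML = """
<mediaList>
  <movie>
    <title>1941</title>
    <director>Steven Spielberg</director>
    <writer>Robert Zemeckis</writer>
    <writer>Bob Gale</writer>
    <callNumber>PN1995.9.C55 S68 2002</callNumber>
  </movie>
  <movie>
    <title>Duel</title>
    <director>Steven Spielberg</director>
    <writer>Richard Matheson</writer>
    <callNumber>PN1995.8.S87 S65 2004</callNumber>
  </movie>
</mediaList>
"""

# Assumed column order for the table; one column per element name.
COLUMNS = ["title", "director", "writer", "callNumber"]


def media_to_table(xml_text):
    """Render every <movie> as one HTML table row."""
    root = ET.fromstring(xml_text)
    header = "".join(f"<th>{escape(col)}</th>" for col in COLUMNS)
    rows = [f"<tr>{header}</tr>"]
    for movie in root.findall("movie"):
        cells = []
        for col in COLUMNS:
            # Repeated elements (two writers, say) share a single cell.
            values = [el.text or "" for el in movie.findall(col)]
            cells.append(f"<td>{escape(', '.join(values))}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


table_html = media_to_table(MEDIA_XML)
print(table_html)
```

The point is the same one that sold me on XSLT: the data lives in one place, and the table is just one of many possible views generated from it.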
- The first thing I added was an id attribute, which I copied from the bibliographic id assigned to every item by Voyager, Emerson’s ILS. I figured having a unique id which wouldn’t change would be the best way to identify items. This ended up not working out; more on that later!
- The second was a pair of date attributes: one for when I created the movie element and one for when I last modified it.
- I added elements for language, and for year of release.
- Most importantly — genres
The above example, “1941”, now looked something like this:
<movie id="1336944" dateCreated="2014-07-18">
  <title>1941</title>
  <director>Steven Spielberg</director>
  <genreWrap>
    <genre>Comedy</genre>
    <subGenre>Spoofs and Satire</subGenre>
  </genreWrap>
  <writer>Robert Zemeckis</writer>
  <writer>Bob Gale</writer>
  <language>English</language>
  <year>1979</year>
  <callNumber href="http://endeavor.flo.org/vwebv/holdingsInfo?bibId=1336944">[DVD] PN1995.9 .C55 S68 2002</callNumber>
</movie>
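Those extra attributes and the genreWrap element are what make the record useful for browsing. As a rough sketch (again in Python's ElementTree rather than my actual XSLT, and with a hypothetical function name, index_by_genre), here is how the richer record could be read back and grouped by genre. I am assuming the movie still sits inside a mediaList root, as in the first version.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RECORD_XML = """
<mediaList>
  <movie id="1336944" dateCreated="2014-07-18">
    <title>1941</title>
    <director>Steven Spielberg</director>
    <genreWrap>
      <genre>Comedy</genre>
      <subGenre>Spoofs and Satire</subGenre>
    </genreWrap>
    <writer>Robert Zemeckis</writer>
    <writer>Bob Gale</writer>
    <language>English</language>
    <year>1979</year>
    <callNumber href="http://endeavor.flo.org/vwebv/holdingsInfo?bibId=1336944">[DVD] PN1995.9 .C55 S68 2002</callNumber>
  </movie>
</mediaList>
"""


def index_by_genre(xml_text):
    """Map (genre, subGenre) pairs to the movies filed under them."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for movie in root.findall("movie"):
        call_number = movie.find("callNumber")
        entry = {
            "id": movie.get("id"),  # the bibliographic id attribute
            "title": movie.findtext("title"),
            "year": movie.findtext("year"),
            # The catalog link lives in the href attribute of callNumber.
            "link": call_number.get("href") if call_number is not None else None,
        }
        # A movie may carry more than one genreWrap; file it under each.
        for wrap in movie.findall("genreWrap"):
            key = (wrap.findtext("genre"), wrap.findtext("subGenre"))
            index[key].append(entry)
    return dict(index)


by_genre = index_by_genre(RECORD_XML)
for (genre, sub_genre), movies in by_genre.items():
    titles = ", ".join(m["title"] for m in movies)
    print(f"{genre} / {sub_genre}: {titles}")
```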
I knew the catalog had all the descriptive data I could want, but where could I turn for genres? I really didn’t relish the idea of performing my own genre analysis of some 3000 movies/tv shows.
A little old juggernaut called Netflix.
At http://dvd.netflix.com/AllGenres I found a very clearly laid out, three-layers-deep structure of genres. Was it perfect? Of course not. But it was a beginning. I’d type a movie into the Netflix search (DVD rental, not streaming) and copy the genres, subgenres, and sub-subgenres found there.
Then my boss told me that she could deliver me all the data at once, exported from Microsoft Access to an Excel spreadsheet. I jumped at this, because now I’d be able to transform that data into my new structure.
That’ll bring us to part 4!
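To give a flavor of what that kind of transformation involves, here is a small sketch, not the process I actually followed. It assumes the spreadsheet has been saved as CSV, and the column names (Title, Director, Writers, CallNumber) and the semicolon separator for multiple writers are invented for the example; the real export looked different.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV layout; the values are the two movies from earlier.
SAMPLE_CSV = """Title,Director,Writers,CallNumber
1941,Steven Spielberg,Robert Zemeckis; Bob Gale,PN1995.9.C55 S68 2002
Duel,Steven Spielberg,Richard Matheson,PN1995.8.S87 S65 2004
"""


def csv_to_media_list(csv_text):
    """Build a <mediaList> element with one <movie> per CSV row."""
    root = ET.Element("mediaList")
    for row in csv.DictReader(io.StringIO(csv_text)):
        movie = ET.SubElement(root, "movie")
        ET.SubElement(movie, "title").text = row["Title"].strip()
        ET.SubElement(movie, "director").text = row["Director"].strip()
        # Assumed convention: several writers share one cell, split on ";".
        for name in row["Writers"].split(";"):
            if name.strip():
                ET.SubElement(movie, "writer").text = name.strip()
        ET.SubElement(movie, "callNumber").text = row["CallNumber"].strip()
    return root


media_list = csv_to_media_list(SAMPLE_CSV)
ET.indent(media_list)  # pretty-printing; needs Python 3.9 or later
print(ET.tostring(media_list, encoding="unicode"))
```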
Learn From My Mistakes
Do not create your own XML data structure. Standards exist, they exist all over the place. Use Dublin Core, use MODS, use some third or fourth thing — do not just make up your own, because that is the opposite of shareable.
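For a sense of what the standards route could look like, here is a rough sketch that re-expresses my homegrown “1941” record using the fifteen-element Dublin Core set. The element mapping is my own guess for illustration (director to creator, writer to contributor, genres to subject), not an official crosswalk, and the un-namespaced record wrapper is made up. A real project would follow an established application profile.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

HOMEGROWN_XML = """
<movie id="1336944" dateCreated="2014-07-18">
  <title>1941</title>
  <director>Steven Spielberg</director>
  <genreWrap>
    <genre>Comedy</genre>
    <subGenre>Spoofs and Satire</subGenre>
  </genreWrap>
  <writer>Robert Zemeckis</writer>
  <writer>Bob Gale</writer>
  <language>English</language>
  <year>1979</year>
</movie>
"""


def dc(tag):
    """Return a namespace-qualified Dublin Core tag name."""
    return f"{{{DC_NS}}}{tag}"


def movie_to_dublin_core(movie):
    """Map one homegrown <movie> onto Dublin Core elements (assumed mapping)."""
    record = ET.Element("record")  # hypothetical wrapper element

    def add(tag, text):
        if text:
            ET.SubElement(record, dc(tag)).text = text

    add("identifier", movie.get("id"))
    add("title", movie.findtext("title"))
    for director in movie.findall("director"):
        add("creator", director.text)
    for writer in movie.findall("writer"):
        add("contributor", writer.text)
    for wrap in movie.findall("genreWrap"):
        add("subject", wrap.findtext("genre"))
        add("subject", wrap.findtext("subGenre"))
    # Passed through as-is; a language code would be the more usual value.
    add("language", movie.findtext("language"))
    add("date", movie.findtext("year"))
    add("type", "MovingImage")  # term from the DCMI Type Vocabulary
    return record


record = movie_to_dublin_core(ET.fromstring(HOMEGROWN_XML))
xml_out = ET.tostring(record, encoding="unicode")
print(xml_out)
```

Nothing about the data changes here, only the vocabulary, which is exactly why starting with a shared standard would have saved me the trouble.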