Emflix – Part Four – XML – Structure

Link to Part Three

So I’d now received data from my boss: the 3000+ movies and TV shows in Emerson’s media collection, in an Excel spreadsheet, which I saved as XML and began to explore.

Here’s an example of what it looked like as XML:

<row>
 <Heading0>PN 19959 C 55 S 68 2002</Heading0>
 <Heading1>[DVD] PN1995.9.C55 S68 2002</Heading1>
 <Heading2>1941 [videorecording] / Universal Pictures and Columbia Pictures ; A-Team production ; screenplay by Robert Zemeckis and Bob Gale ; story by Robert Zemeckis, Bob Gale and John Milius ; produced by Buzz Feitshans ; directed by Steven Spielberg.</Heading2>
 <Heading3>1941</Heading3>
 <Heading4></Heading4>
 <Heading5></Heading5>
 <Heading6>Universal Home Video,</Heading6>
 <Heading7>2002</Heading7>
 <Heading8>1979</Heading8>
 <Heading9>2002-1979</Heading9>
 <Heading10>[2002]</Heading10>
 <Heading11>eng</Heading11>
 <Heading12>English dialogue, optional subtitles in French and Spanish; closed-captioned.</Heading12>
 <Heading13>Originally released as a motion picture in 1979.
Special features: restored footage not included in the original theatrical release; an original documentary on The making of 1941, including new video interviews with Steven Spielberg, Bob Gale, John Miliu</Heading13>
 <Heading14>Dan Aykroyd, Ned Beatty, John Belushi, Lorraine Gary, Murray Hamilton, Christopher Lee, Tim Matheson, Toshiro Mifune, Warren Oates, Robert Stack, Treat Williams.</Heading14>
 <Heading15>This comedy is set in Los Angeles days after the attack on Pearl Harbor, when the fear of a Japanese invasion hung over the city.</Heading15>
 <Heading16>1336944</Heading16>
 <Heading17>1628403</Heading17>
 <Heading18>1461057</Heading18>
 <Heading19>0113503089311</Heading19>
 </row>
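A quick way to explore an export like this is to tally which of the generic HeadingN columns are actually filled in. Here’s a minimal Python sketch of that idea, not the tooling I actually used. The `<rows>` wrapper, the second row, and the function name `count_filled_columns` are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical two-row export. The <rows> wrapper and the second row are
# assumptions; the post only shows a single <row>.
SAMPLE = """
<rows>
  <row>
    <Heading3>1941</Heading3>
    <Heading4></Heading4>
    <Heading8>1979</Heading8>
    <Heading11>eng</Heading11>
  </row>
  <row>
    <Heading3>Example title</Heading3>
    <Heading4>Example subtitle</Heading4>
    <Heading8></Heading8>
    <Heading11>eng</Heading11>
  </row>
</rows>
"""


def count_filled_columns(xml_text):
    """Count, per column tag, how many rows carry non-empty text."""
    root = ET.fromstring(xml_text)
    filled = Counter()
    for row in root.iter("row"):
        for cell in row:
            if cell.text and cell.text.strip():
                filled[cell.tag] += 1
    return filled


counts = count_filled_columns(SAMPLE)
for tag in sorted(counts):
    print(tag, counts[tag])
```

Columns that are almost always empty are candidates to drop, and the well-populated ones are the ones worth giving real names in the new structure.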

This is where it started getting really fun. I dug deeper into what I’d learned in my XML class, further refining the structure of what each movie element or tv element should look like, and I decided to control it with an .xsd file, an XML schema.


    <!-- The whole list: all movie elements first, then all tvShow elements (at least one of each). -->
    <xs:complexType name="mediaListType">
        <xs:sequence>
            <xs:element name="movie" type="mediaType" minOccurs="1" maxOccurs="unbounded"/>
            <xs:element name="tvShow" type="mediaType" minOccurs="1" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
    <!-- One movie or TV show; child elements must appear in this order. -->
    <xs:complexType name="mediaType">
        <xs:sequence>
            <xs:element name="title" type="titleType" minOccurs="1" maxOccurs="unbounded"/>
            <xs:element name="director" type="personType" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="genreWrap" type="genreWrapType" minOccurs="1" maxOccurs="unbounded"/>
            <xs:element name="writer" type="personType" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="screenplay" type="linkType" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="language" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
            <xs:element name="year" type="xs:gYear" minOccurs="0" maxOccurs="1"/>
            <xs:element name="callNumber" type="linkType" minOccurs="1" maxOccurs="1"/>
        </xs:sequence>
        <xs:attributeGroup ref="mediaAttributeGroup"/>
    </xs:complexType>
    <!-- A person's name as plain text, plus two attributes discussed below. -->
    <xs:complexType name="personType">
        <xs:simpleContent>
            <xs:extension base="xs:string">
                <!-- Tells apart two people who share a name. -->
                <xs:attribute ref="differentiator" use="optional"/>
                <!-- Position of the character to sort the name on. -->
                <xs:attribute name="sort" type="xs:positiveInteger"/>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>

I’m not going to include the entire .xsd file here, but if anyone is super interested, I’d be happy to do that and discuss it at length. One thing I want to talk about is that “personType”.


Learn From My Mistakes

If you’re going to want to display names of people in alphabetical order, store them that way! Atomic, atomic, atomic!

<personName>
   <lastName>Ganin</lastName>
   <firstName>Netanel</firstName>
</personName>
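To show why the atomic version pays off, here’s a minimal Python sketch that sorts on last name and still displays names in direct order. It’s an illustration, not code from the project. The `<people>` wrapper and the function name `display_names` are assumptions, and the two extra names are borrowed from the sample row above.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the <people> wrapper is an assumption. Only the
# personName/lastName/firstName shape comes from the post.
SAMPLE = """
<people>
  <personName><lastName>Spielberg</lastName><firstName>Steven</firstName></personName>
  <personName><lastName>Ganin</lastName><firstName>Netanel</firstName></personName>
  <personName><lastName>Gale</lastName><firstName>Bob</firstName></personName>
</people>
"""


def display_names(xml_text):
    """Sort people by last name, then first name; return 'First Last' strings."""
    root = ET.fromstring(xml_text)
    people = [
        (p.findtext("lastName", ""), p.findtext("firstName", ""))
        for p in root.iter("personName")
    ]
    # Case-insensitive sort on the stored (atomic) parts.
    people.sort(key=lambda pair: (pair[0].casefold(), pair[1].casefold()))
    # Direct order is rebuilt only at display time.
    return [f"{first} {last}" for last, first in people]


names = display_names(SAMPLE)
print(names)
```

The sort order and the display order are both derived from the same two fields, so nothing extra has to be recorded by hand.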

That way you never have to do what I did, which was to add a “sort” attribute to a name stored as a single element. What happened was this: I wanted to display the names in direct order to the patrons, as I think patrons prefer to see them that way. But because I didn’t realize (or didn’t trust my own skills?) that I’d be able to un-invert them in the display, I ended up storing them in direct order and recording the position of the character on which I wanted to sort the name:

<personName sort="9">Netanel Ganin</personName>
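For the curious, here’s roughly how a “sort” attribute like that can drive the ordering. This is a Python sketch rather than my actual code, and it assumes the value is a 1-based character position (the G in “Netanel Ganin” is character 9). The `<people>` wrapper, the extra names, and the helper names `sort_key` and `sorted_display_names` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: names stored in direct order, with sort = the 1-based
# position of the first character of the surname (an assumed reading).
SAMPLE = """
<people>
  <personName sort="8">Steven Spielberg</personName>
  <personName sort="9">Netanel Ganin</personName>
  <personName sort="5">Bob Gale</personName>
</people>
"""


def sort_key(person):
    """Return the part of the name starting at the recorded position."""
    start = int(person.get("sort", "1")) - 1  # convert 1-based to 0-based
    return (person.text or "")[start:].casefold()


def sorted_display_names(xml_text):
    """Return names in direct order, sorted by the substring from @sort."""
    root = ET.fromstring(xml_text)
    return [p.text for p in sorted(root.iter("personName"), key=sort_key)]


ordered = sorted_display_names(SAMPLE)
print(ordered)
```

It works, but every name needs a hand-counted number, and one miscount silently puts a person in the wrong place in the index.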

There’s another attribute you may’ve noticed there, “differentiator”. Who wants to guess what I was re-inventing? That’s right, authority control! Do NOT reinvent the wheel! This is the overarching theme of this project’s retrospective. Authority control has already been done for tens of thousands (millions?) of names. But here came Past-Me all smug that he could do it better!

When two people had the same name, I added a differentiator attribute to be displayed with the person’s name. To be fair, there wasn’t much of this in such a small set of items, and since this was all showbiz, the various guilds had already prevented some of it for me! I think the only doubles were Harrison Ford (1884) & (1942) and Thomas Lennon (1970) & (1951).
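A small Python sketch of how a differentiator could be shown next to a name. The parenthesized format, the attribute values, the `<people>` wrapper, and the helper name `display_name` are all assumptions for illustration, not a record of what my display code did.

```python
import xml.etree.ElementTree as ET


def display_name(person):
    """Append the differentiator in parentheses when one is present."""
    name = (person.text or "").strip()
    differentiator = person.get("differentiator")
    return f"{name} ({differentiator})" if differentiator else name


# Hypothetical data: attribute values are assumed to be the years shown above.
people = ET.fromstring("""
<people>
  <personName differentiator="1884">Harrison Ford</personName>
  <personName differentiator="1942">Harrison Ford</personName>
  <personName>Steven Spielberg</personName>
</people>
""")

labels = [display_name(p) for p in people]
print(labels)
```

Names without the attribute display unchanged, so the differentiator only shows up where a clash exists.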


I also added a screenplay element which would contain a permalink to the screenplay (if we had it) in the Emerson catalog. This involved searching each movie individually as I entered it into my structure to see if we had the screenplay. It meant more work, but I’m actually really proud of including that. Build the links! These are related works, and if I can give myself a moment of praise here — I was bringing related works together.

Until this point, everything was still displayed as a table — there was one which functioned as an index of writers and directors with that person’s filmography listed in the cells to the right and below. Another table was an index of the films by title, but I still didn’t know what to do with these genres! I was adding them, but I hadn’t figured out how a movie could be assigned multiple genres/subgenres/subsubgenres and work that into something patrons could use.

I’m still not getting to that yet, though. Next time I’ll discuss a piece (and then another piece, and then another…) of my XSLT catalog-to-Netanel’s-structure converter.
