Genres, subgenres, and sub-subgenres, oh my!
This was one of the main reasons I built the darn thing, and it involved many, many, many revisions, decisions, re-decisions, and changes. So bear with me as I try to remember the process.
First things first, the XML structure of genres was like this:
<genreWrap> <genre>Some Genre</genre> <subGenre>great subgenre</subGenre> <subGenre>another subgenre</subGenre> <subSubGenre>wow! another level!</subSubGenre> </genreWrap>
Each <media> element could have as many <genreWrap> elements as necessary (in any order), but each <genreWrap> element had to have one (and only one) <genre> element.
Each <genreWrap> element could have as many <subGenre> elements as necessary (in any order), but each <subSubGenre> element had to follow directly after the <subGenre> element to which it belonged.
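For readers who think in schema, the rules above could be expressed roughly like this. It's a minimal sketch, not the actual Emflix schema: it assumes <genre> always comes first (as in the example), that a <subGenre> may be followed by any number of <subSubGenre> elements, and it uses xs:string as a stand-in where the real schema would point at enumerated types such as genreType.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="genreWrap">
    <xs:complexType>
      <xs:sequence>
        <!-- Exactly one main genre: minOccurs and maxOccurs both default to 1.
             xs:string is a stand-in; the real schema would use genreType. -->
        <xs:element name="genre" type="xs:string"/>
        <!-- Zero or more subgenres, each optionally followed by its own sub-subgenres -->
        <xs:sequence minOccurs="0" maxOccurs="unbounded">
          <xs:element name="subGenre" type="xs:string"/>
          <xs:element name="subSubGenre" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that a content model like this only enforces order. Whether a given <subSubGenre> really "belongs" to the <subGenre> in front of it is something a plain sequence can't check.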
I used XSD enumerations to control every possible value of a <genre>, <subGenre>, and <subSubGenre> element.
Why three? How did I decide on three levels of hierarchy for genre-analysis? Easy! Remember, I didn’t want to have to analyze 3000+ films myself, and because I was going to be using Netflix for the genre analysis, I chose three levels because that’s their maximum.
Check out the full list of Netflix’s genres.
Here is the declaration of my genreType.
<xs:simpleType name="genreType"> <xs:restriction base="xs:string"> <!--=============================== FILM GENRE ENUMERATION --> <xs:enumeration value="Action and Adventure"/> <xs:enumeration value="Anime and Animation"/> <xs:enumeration value="Children and Family"/> <xs:enumeration value="Classics"/> <xs:enumeration value="Comedy"/> <xs:enumeration value="Documentary"/> <xs:enumeration value="Drama"/> <xs:enumeration value="Faith and Spirituality"/> <xs:enumeration value="Foreign"/> <xs:enumeration value="Horror"/> <xs:enumeration value="Independent"/> <xs:enumeration value="LGBTQ"/> <xs:enumeration value="Music and Musicals"/> <xs:enumeration value="Romance"/> <xs:enumeration value="Sci-Fi and Fantasy"/> <xs:enumeration value="Sports"/> <xs:enumeration value="Special Interest"/> <xs:enumeration value="Thrillers"/> <!--=============================== TV GENRE ENUMERATION --> <xs:enumeration value="Television"/> </xs:restriction> </xs:simpleType>
Careful comparers will notice some differences in the main genres between Netflix and Emflix.
- No ampersands
  - XML doesn’t like ampersands, and they have to be written as the entity ‘&amp;’. I chose to just change them all to ‘and’. (I’m sure a better computer coder person could’ve kept them all as entities and still handled it well on the other end with the HTML transformation and PHP, but I just ignored the issue.)
- ‘Gay & Lesbian’ changed to ‘LGBTQ’
  - ‘Gay & Lesbian’ is not a particularly inclusive category term. Frankly, Netflix is actually treating it as more inclusive than the term implies: take a look at its subgenres, or NTs if you will. Bisexual is listed there. While I’m glad that they’re including such films and giving them metadata, it’s not great to subsume bisexuality as a subset of gay and lesbian sexualities. That’s the kind of shit bi folks have to put up with all day, every day, everywhere else, and I didn’t want Emflix to continue the same practice.
  - There’s also the matter of films about trans folks or otherwise gender-nonconforming people. Our collection had a bunch of films which would fall into this type of category, and I didn’t feel comfortable including them under ‘Gay & Lesbian’, as again, that term doesn’t cover gender identity. I wanted to be explicit.
  - I actually met with Emerson’s director of diversity education and human relations about terminology. While Emerson doesn’t have a house style guide, I thought it’d be good to run my choices by someone who worked professionally in the field of knowing things about inclusion.
- ‘Sports & Fitness’ changed to ‘Sports’
  - The dropping of ‘Fitness’ was a lot less fraught. While Netflix’s collection has LOTS of workout videos and exercise videos…our collection didn’t. Zero of them. So…that’s it.
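On the ampersand point in the list above: keeping the ampersands would have meant escaping them in both the schema and the data. Here is a hedged sketch of what that could look like, shown as two separate fragments rather than a complete document, using a Netflix-style genre name purely as an illustration:

```xml
<!-- In the schema: the parser reads the escaped ampersand as a literal & -->
<xs:enumeration value="Action &amp; Adventure"/>

<!-- In the data: this element's text is "Action & Adventure", so it matches the value above -->
<genre>Action &amp; Adventure</genre>
```

The trade-off is that every hand-typed genre would need the entity, which is one more thing to get wrong. Swapping in ‘and’ sidesteps that entirely.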
Okay, so that explains the main genres. A note here about potential for errors.
I had a file (that I’ll probably be referring back to from time to time) called “ErrorChecker.xsl”. Good name, I know. As I would encounter an error that I couldn’t control through the .xsd file (either through ignorance of how to do it, or because schema just can’t do all that I want it to do), I would add testers to my error checker, and then run it periodically.
Two that are relevant to main genres are the following:
<xsl:apply-templates select="mediaList/media[count(genreWrap[genre = 'Action and Adventure']) > 1]" mode="repeated"/>
This one (yes, I had 18 additional versions of it, one for every other main genre) checks to see if a single <media> element had more than one <genreWrap> element with the same term as its primary genre. I was trying to prevent something like this:
<media> <title>Cool Action Movie</title> <genreWrap> <genre>Action and Adventure</genre> <subGenre>Action Sci-Fi and Fantasy</subGenre> </genreWrap> <genreWrap> <genre>Action and Adventure</genre> <subGenre>Action Thrillers</subGenre> </genreWrap> </media>
If I remember correctly, it didn’t actually screw anything up in the sorting, but it did screw up my statistical data. I couldn’t find a way to control that in the .xsd, so I just had to do it the way I did. Though now that I’m revisiting it, of course, I think that rather than 19 versions of that apply-templates command, I could’ve done a for-each-group on each media element to see if it had a repeated genre element!
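The for-each-group idea might look something like the following. This is only a sketch in XSLT 2.0, not code from ErrorChecker.xsl: it assumes the mediaList/media/genreWrap/genre structure shown above, and the plain-text report format is made up for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="mediaList/media">
      <!-- Take the first title so concat() always receives a single item -->
      <xsl:variable name="title" select="title[1]"/>
      <!-- Group this media element's genreWrap children by their main genre -->
      <xsl:for-each-group select="genreWrap" group-by="genre">
        <!-- A group with more than one member means that genre is repeated -->
        <xsl:if test="count(current-group()) &gt; 1">
          <xsl:value-of select="concat($title, ': repeated genre ', current-grouping-key(), '&#10;')"/>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

One pass like this covers every main genre at once, so adding a genre to the enumeration wouldn't mean adding another tester. An xs:unique identity constraint on <media> (selector genreWrap, field genre) is another option that may be worth trying on the schema side. It is untested here.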
<xsl:apply-templates select="mediaList/media/genreWrap[count(genre) > 1]/../title" mode="multGenre"/>
This second tester made sure that a single <genreWrap> element had only one <genre> element. The errors caused by this DID screw up the sorting as the XSLT which created the browsing interface was only looking for one <genre> element per <genreWrap>.
Check back for part 12 when we talk subgenres (or some anyway, probably not ALL of them at one go)!