This research project concerns the depiction of gender in historical English-language novels, exploring how authors of various backgrounds and experiences described gender in their works.
So far, we have analyzed a corpus of over 4,200 books from Project Gutenberg, an online book repository, using programming methods we developed. Our analyses measure the ratio of male pronouns to female pronouns, the most common words that follow male and female pronouns, and the distance between repetitions of male and female pronouns.
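As a rough illustration of these three measurements, here is a minimal, standalone sketch in Python. It does not use the project's own code: the pronoun sets, the regex tokenizer, and the function names (tokenize, pronoun_ratio, words_after, distances) are assumptions made for this example, and the project's methods may define and count pronouns differently.

```python
import re
from collections import Counter

# Assumed pronoun sets for illustration; the project may use different ones.
MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_ratio(tokens):
    """Ratio of male to female pronoun counts (inf if no female pronouns)."""
    male = sum(1 for t in tokens if t in MALE)
    female = sum(1 for t in tokens if t in FEMALE)
    return male / female if female else float("inf")


def words_after(tokens, pronouns):
    """Count the words that immediately follow any pronoun in the given set."""
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur in pronouns)


def distances(tokens, pronouns):
    """Token distances between consecutive occurrences of the given pronouns."""
    positions = [i for i, t in enumerate(tokens) if t in pronouns]
    return [b - a for a, b in zip(positions, positions[1:])]


sample = "He said she was late. She laughed, and he smiled at her."
tokens = tokenize(sample)
print(pronoun_ratio(tokens))
print(words_after(tokens, MALE).most_common(5))
print(distances(tokens, FEMALE))
```

On a full novel you would read the text from a file and pass it through the same functions; the distances are measured in tokens, which is only one of several reasonable choices.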
This MIT Digital Humanities Lab project is part of the MIT/SHASS Programs in Digital Humanities funded by the Mellon Foundation.
To use our tools or contribute to the project, please see our guide to contributing, CONTRIBUTING.md. It includes information on how to install the tools we used as well as style guidelines for adding code. We are open to contributions and would love to see other people’s ideas, thoughts, and additions to this project, so feel free to leave comments or make a pull request!
For anybody who wants to use our code, here’s a little outline of where everything is.
In the gender_novels/gender_novels folder, there are six folders:
analysis: programming files focused on textual analysis and research write-ups, including data visualizations and conclusions
corpora: metadata for each book (author, title, publication year, etc.), along with sample data sets and instructions for generating a Gutenberg mirror
deployment: programming files and assets related to the Gender / Novels Flask website
pickle_data: pickled data for various analyses, stored to avoid rerunning time-consuming computations
testing: files for code tests
tutorials: tutorials used by the lab to learn about various technical subjects needed to complete this project
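To show the idea behind storing pickled results, here is a minimal sketch of a load-or-compute cache in Python. It is not the project's implementation: the helper name load_or_compute, the file path, and the example computation are all hypothetical, and the files in pickle_data may be organized differently.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the pickled result at cache_path, computing and saving it on a miss.

    `compute` is any zero-argument function that produces a picklable result.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip the expensive computation entirely.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result


def count_pronouns():
    """Stand-in for a slow analysis over the whole corpus (hypothetical)."""
    return {"he": 10, "she": 7}


if __name__ == "__main__":
    # Hypothetical cache location, used only for this example.
    counts = load_or_compute("example_cache/pronoun_counts.pickle", count_pronouns)
    print(counts)
```

The first call runs the computation and writes the file; later calls read the file instead. Only unpickle files you trust, since loading a pickle can execute arbitrary code.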
If you need readily available methods for analyzing documents, the files you’ll most likely want are corpus.py and novel.py. These include methods used for loading and analyzing texts from the corpora. If you’d like to generate your own corpus rather than use the one provided in the repo, you’ll want to use corpus_gen.py. If you’d only like a specific part of our corpus, the method get_subcorpus() may be useful.
This document was prepared by the MIT Digital Humanities Lab.
Copyright © 2018, MIT Programs in Digital Humanities. Released under the BSD license. Some included texts might not be out of copyright in all jurisdictions of the world.