Variation in Creativity and Imagination

Phylogenies of Literature

Research team

Oleg Sobchuk, Principal Investigator
Angela Chira, Co-Investigator
Artjoms Šeļa, Co-Investigator
Mason Youngblood, Co-Investigator
Olivier Morin, Collaborator
Ted Underwood, Collaborator

Key Information

Full title: Phylogenies of Literature
Host institution: Max Planck Institute for Evolutionary Anthropology (Leipzig, Germany)
Research location: Germany

This project is one of three additional Seedcorn awards.

Project overview 

Most people read fictional stories, which are extremely diverse in their topics, styles, and genres. Some genres are modern, such as cyberpunk science fiction, true crime, and urban fantasy. Others were popular centuries ago and are now read only by literary historians: the silver-fork novel, the picaresque, invasion literature… The stunning diversity of literary fiction, past and present, resembles the diversity of the biological world, with its millions of species. But, unlike biologists, literary historians do not have an overarching view of this diversity. Biologists use phylogenetic trees to reconstruct the large-scale relationships between species; literary scholars know a lot about individual authors or books, but they do not have an equivalent “map” of literary evolution. Why? Because, until recently, making such a map would have required reading and comparing tens of thousands of books in a single study, an impossible task.

This image was generated with the Midjourney text-to-image model (version 5.1).

This project aims to draw such a macro-map of literary evolution. For this, we use tools and techniques that have become available only recently: large digital libraries (containing tens of thousands of books), advanced “text mining” techniques (that is, “reading” the texts with machine-learning algorithms), and new methods for building “rooted phylogenetic networks”, which differ from traditional phylogenies but are probably more suitable for cultural phenomena such as literary fiction. This combination of new methods will allow us to do what seemed impossible just a few years ago: build a phylogenetic network of literary evolution over three hundred years.

Project contact

If you would like to contact the project team, please email the grant management team in the first instance, at

Oleg Sobchuk: Twitter
Angela Chira: Twitter
Artjoms Šeļa: Twitter
Mason Youngblood: Twitter
Olivier Morin: Twitter
Ted Underwood: Twitter