Lovecraft with NLP. No, not the dodgy cultic ‘neuro-linguistic programming’. NLP as in proper hardcore computer programming, in the form of ‘Natural Language Processing’ for digital humanities work. Towards Data Science currently has a series of long articles showing exactly how to have a computer crunch the Lovecraft fiction corpus, and thus help to answer questions such as…
Are the stories as negative as we thought? What are the most used adjectives, are they “horrible” and “unknown” and “ancient”?
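The adjective question can at least be gestured at without any libraries. A minimal, dependency-free sketch: tokenise the text and count hits against a hand-picked adjective list (a real analysis would use a proper part-of-speech tagger such as NLTK’s `pos_tag` rather than a fixed lexicon, and the sample sentence here is invented):

```python
import re
from collections import Counter

# Hand-picked adjective lexicon; a stand-in for real POS tagging.
ADJECTIVES = {"horrible", "unknown", "ancient", "hideous", "eldritch"}

def adjective_counts(text):
    """Count occurrences of the listed adjectives in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t in ADJECTIVES)

sample = "The ancient house hid an unknown and ancient horror, horrible to see."
print(adjective_counts(sample).most_common())
# → [('ancient', 2), ('unknown', 1), ('horrible', 1)]
```

Swapping the lexicon filter for a tagger is the obvious upgrade, but the counting machinery stays the same.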
Ideally the corpus would first be carefully chunked: split into distinct sections corresponding to his creative phases and places. Each chunk would then be probed separately, and the corpus is probably big enough to allow it. Otherwise you’d get a bit of a smushy answer to such questions. “The Quest of Iranon” (1921) is not the same beastie as “The Shadow out of Time” (1935), and so on.
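The chunking could be as simple as bucketing stories by year before running any analysis. A sketch, with made-up cut-off years and placeholder texts (the story-to-phase mapping here is illustrative, not a claim about Lovecraft scholarship):

```python
from collections import defaultdict

# Hypothetical metadata: (title, year, text) — texts are placeholders.
stories = [
    ("The Quest of Iranon", 1921, "…text…"),
    ("The Shadow out of Time", 1935, "…text…"),
]

def phase(year):
    """Crude period buckets; the cut-off years are illustrative only."""
    if year <= 1923:
        return "early / Dunsanian"
    if year <= 1930:
        return "middle"
    return "late / cosmic"

# Group story texts by phase, ready to be probed separately.
chunks = defaultdict(list)
for title, year, text in stories:
    chunks[phase(year)].append(text)

for name, texts in chunks.items():
    print(name, len(texts))
```

Each bucket can then be fed through the same pipeline, so an early-phase answer never gets smushed together with a late-phase one.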
Lovecraft with NLP: Part 1: Rule-Based Sentiment Analysis
Lovecraft with NLP: Part 2: Tokenisation and Word Counts
It looks like more parts are planned.
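Part 2’s tokenisation and word counts can be done with nothing but the standard library. A bare-bones version (the sample text and stop-word list are invented here, not taken from the article):

```python
import re
from collections import Counter

# A tiny stop-word list; real pipelines use a fuller one.
STOP = {"the", "and", "of", "a", "an", "in", "to"}

def word_counts(text):
    """Tokenise on letters/apostrophes, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP)

counts = word_counts("The nameless city. The nameless fear.")
print(counts.most_common(2))
# → [('nameless', 2), ('city', 1)]
```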
Update: Lovecraft with NLP: Part 3: TF-IDF and K-Means Clustering. At which point, having seen two articles, you hit the paywall.
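For those who hit the paywall: the Part 3 pipeline is, in scikit-learn terms, a TF-IDF vectorisation followed by K-means. A compact sketch with toy documents standing in for per-story texts (the article’s exact parameters are unknown to me, so the vectoriser settings and `n_clusters` are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for per-story texts: two "cosmic" docs, two "domestic" ones.
docs = [
    "ancient city beneath the sea",
    "ancient ruins beneath the ice",
    "a quiet village and a letter",
    "a letter from a quiet friend",
]

# TF-IDF turns each document into a weighted term vector...
vectoriser = TfidfVectorizer(stop_words="english")
X = vectoriser.fit_transform(docs)

# ...and K-means groups those vectors into clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)
```

With real story-length chunks the interest lies in which stories land in the same cluster, and which high-TF-IDF terms define each one.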
Update: Lovecraft with NLP: Part 4: Latent Semantic Analysis.
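Part 4’s latent semantic analysis amounts to a truncated SVD over the TF-IDF matrix. A minimal sketch of the shape of it, again with toy documents and an illustrative two components (not the article’s own settings):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents: two dreamlands-flavoured, two laboratory-flavoured.
docs = [
    "dream quest of the silver key",
    "dream city of the silver gods",
    "laboratory notes on strange chemistry",
    "strange chemistry in the laboratory",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Reduce the term space to 2 latent "topics"; each row is a document's
# coordinates in that topic space.
svd = TruncatedSVD(n_components=2, random_state=0)
topic_space = svd.fit_transform(X)
print(topic_space.shape)
# → (4, 2)
```

Documents that share latent topics end up near each other in the reduced space, even where their surface vocabularies only partly overlap.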
gbsteve said:
If you chunk the data, there’s often not enough of it to get reliable answers, but digital humanities is a big area of development for data science.