Text Mining: Definitions


Roger Bilisoly, Practical Text Mining with Perl, (Hoboken, N.J.: Wiley 2008).

Michael S. Evans, “A Computational Approach to Qualitative Analysis in Large Textual Datasets”, in: PLoS ONE, 9:2 (2014).

Ronen Feldman, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, (Cambridge: Cambridge University Press 2007).

Louise Francis, “Taming Text: An Introduction to Text Mining”, in: Casualty Actuarial Society Forum, (2006).

H. E. Green, “Under the Workbench: An Analysis of the Use and Preservation of MONK Text Mining Research Software”, in: Literary and Linguistic Computing, (9 April 2013), fqt014.

Sholom Weiss et al., Text Mining Predictive Methods for Analyzing Unstructured Information, (New York: Springer 2004).

Ian H. Witten, “Text Mining”, in: Munindar P. Singh (ed.), The Practical Handbook of Internet Computing, (Boca Raton; London: Chapman and Hall/CRC 1999), p. 198.

Text Mining: Critical reflections on possibilities and limitations


Jean-Baptiste Michel et al., “Quantitative Analysis of Culture Using Millions of Digitized Books”, in: Science, 331:6014 (14 January 2011), pp. 176–182.

John Bradley, “What you (fore)see is what you get: Thinking about usage paradigms for computer assisted text analysis”, in: Text Technology, 14:2 (2005), pp. 1–19.

P. Gooding, M. Terras & C. Warwick, “The Myth of the New: Mass Digitization, Distant Reading, and the Future of the Book”, in: Literary and Linguistic Computing, 28:4 (13 August 2013), pp. 629–639.

Stephen Marche, “Literature Is Not Data: Against Digital Humanities”, in: The Los Angeles Review of Books, 28 October 2012 (accessed 22 June 2014).

Matthew Kirschenbaum, “The Remaking of Reading: Data Mining and the Digital Humanities”.

Martin Mueller, “Digital Shakespeare, or towards a Literary Informatics”, in: Shakespeare, 4:3 (September 2008), pp. 284–301.

Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism, (Urbana: University of Illinois Press 2011).

J. Berenike Herrmann, Karina van Dalen-Oskam & Christof Schöch, “Revisiting Style, a Key Concept in Literary Studies”, in: Journal of Literary Theory, 9:1 (2015), pp. 25–52.

Franco Moretti, “The Slaughterhouse of Literature”, in: Distant Reading, (London: Verso 2013), pp. 207–227.

Matt Erlin & Lynne Tatlock, “Introduction: “Distant Reading” and the Historiography of Nineteenth-Century German Literature”, in: Matt Erlin & Lynne Tatlock (eds.), Distant Readings: Topologies of German Culture in the Long Nineteenth Century, (Rochester, New York: Camden House 2014).

Franco Moretti, “Conjectures on World Literature”, in: Distant Reading, (London: Verso 2013).

Adam Kirsch, “Technology Is Taking Over English Departments: The False Promise of the Digital Humanities”, in: New Republic, (2 May 2014).

Franco Moretti, Distant Reading, (London: Verso 2013).

D. L. Hoover, J. Culpeper & K. O’Halloran, Digital Literary Studies: Corpus Approaches to Poetry, Prose, and Drama, (2014).

Martin Mueller, “Stanley Fish and the Digital Humanities”, 2012.

David M. Berry, “The Computational Turn: Thinking about the Digital Humanities”, in: Culture Machine, 12 (2011), pp. 1–22.

Shawna Ross, “In Praise of Overstating the Case: A Review of Franco Moretti, Distant Reading (London: Verso, 2013)”, in: Digital Humanities Quarterly, 8:1 (2014).

Katie Trumpener, “Critical Response: I. Paratext and Genre System: A Response to Franco Moretti”, in: Critical Inquiry, 36:1 (2009), pp. 159–171.

Stefan Gradmann & Jan Christoph Meister, “Digital Document and Interpretation: Re-Thinking “text” and Scholarship in Electronic Settings”, in: Poiesis & Praxis, 5 (17 January 2008), pp. 139–153.



Tools criticism


Chris Anderson, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, in: Wired Magazine, 16:7 (2008).

Jean Bauer, “Who You Calling Untheoretical?”, in: Journal of Digital Humanities, 1:1 (2011).

David Berry, Critical Theory and the Digital, (London: Bloomsbury Publishing, 2014).

Johanna Drucker, “Theory as Praxis: The Poetics of Electronic Textuality”, in: Digital Poetics, (Tuscaloosa: University of Alabama Press 2002).

Tom Eyers, “The Perils of the “Digital Humanities”: New Positivisms and the Fate of Literary Theory”, in: Postmodern Culture, 23:2 (2013).

A. Galey & S. Ruecker, “How a Prototype Argues”, in: Literary and Linguistic Computing, 25:4 (27 October 2010), pp. 405–424.

Mark Sample, “When Does Service Become Scholarship?”.

Visualization


Jessica Hullman & Nicholas Diakopoulos, “Visualization Rhetoric: Framing Effects in Narrative Visualization”, in: IEEE Transactions on Visualization and Computer Graphics, (2011).

Stephen Ramsay, “In Praise of Pattern”, in: TEXT Technology, 14:2 (2005), pp. 177–190.

Mario Petrucci, “Scientific Visualizations: Bridge-Building between the Sciences and the Humanities via Visual Analogy”, in: Interdisciplinary Science Reviews, 36:4 (1 December 2011), pp. 276–300.

Stéfan Sinclair, Stan Ruecker & Milena Radzikowska, “Information Visualization for Humanities Scholars”, in: Literary Studies in the Digital Age, (Modern Language Association of America 2013).

Edward Tufte, The Visual Display of Quantitative Information, (Cheshire: Graphics Press 1983).

Stefan Bertschi et al., “What is knowledge visualization? Perspectives on an emerging discipline”, in: Proceedings of the International Conference on Information Visualisation, (2011), pp. 329–336.

Lev Manovich, “What Is Visualization?”, in: Visual Studies, (2011).

Johanna Drucker, “Humanities Approaches to Graphical Display”, in: Digital Humanities Quarterly, 5:1 (2011).

Jacques Bertin, Semiology of Graphics, (Madison: University of Wisconsin Press 1983).

Stephen Few, “Data Visualization for Human Perception”, in: Mads Soegaard & Rikke Friis Dam (eds.), The Encyclopedia of Human-Computer Interaction, (Aarhus: The Interaction Design Foundation 2014).

Leland Wilkinson, The Grammar of Graphics, (New York: Springer 2005).

Michelle A. Borkin et al., “What Makes a Visualization Memorable”, in: IEEE Transactions on Visualization and Computer Graphics, 19:12 (2013), pp. 2306–2315.

Ben Fry, Visualizing Data, (Cambridge: O’Reilly Media Inc. 2008).

Maureen Stone, “Information Visualization: Challenge for the Humanities”, in: Working Together or Apart : Promoting the next Generation of Digital Scholarship : Report of a Workshop Cosponsored by the Council on Library and Information Resources and the National Endowment for the Humanities, (Washington D.C.: 2009), pp. 43–57.

Research projects based on text mining

Dawn Archer, Jonathan Culpeper & Paul Rayson, “Love – “a Familiar or a Devil”? An Exploration of Key Domains in Shakespeare’s Comedies and Tragedies”, in: Dawn Archer (ed.), What’s in a Word-List?: Investigating Word Frequency and Keyword Extraction, (Farnham: Ashgate 2009), pp. 137–158.

Shlomo Argamon et al., “Vive La Différence! Text Mining Gender Difference in French Literature”, in: Digital Humanities Quarterly, 3:2 (2009).

Shlomo Argamon & Mark Olsen, “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis”, in: Digital Humanities Quarterly, 3:2 (2009).

C. Brierley & E. Atwell, “Holy Smoke: Vocalic Precursors of Phrase Breaks in Milton’s Paradise Lost”, in: Literary and Linguistic Computing, 25:2 (14 April 2010), pp. 137–151.

J. Burrows, “Delta: A Measure of Stylistic Difference and a Guide to Likely Authorship”, in: Literary and Linguistic Computing, 17:3 (1 September 2002), pp. 267–287.

T. E. Clement, ““A Thing Not Beginning and Not Ending”: Using Digital Tools to Distant-Read Gertrude Stein’s The Making of Americans”, in: Literary and Linguistic Computing, 23:3 (5 September 2008), pp. 361–381.

T. Clement et al., “Distant Listening to Gertrude Stein’s “Melanctha”: Using Similarity Analysis in a Discovery Paradigm to Analyze Prosody and Author Influence”, in: Literary and Linguistic Computing, 28:4 (23 July 2013), pp. 582–602.

N. Coffee et al., “The Tesserae Project: Intertextual Analysis of Latin Poetry”, in: Literary and Linguistic Computing, 28:2 (20 July 2012), pp. 221–228.

Neil Coffee et al., “Modelling the Interpretation of Literary Allusion with Machine Learning Techniques”, in: Digital Humanities 2013, (Nebraska–Lincoln: 2013).

M. Eder, “Does Size Matter? Authorship Attribution, Small Samples, Big Problem”, in: Literary and Linguistic Computing, (14 November 2013), fqt066.

W. E. Y. Elliott & R. J. Valenza, “Two Tough Nuts to Crack: Did Shakespeare Write the “Shakespeare” Portions of Sir Thomas More and Edward III? Part I”, in: Literary and Linguistic Computing, 25:1 (14 August 2009), pp. 67–83.

C. W. Forstall, S. L. Jacobson & W. J. Scheirer, “Evidence of Intertextuality: Investigating Paul the Deacon’s Angustae Vitae”, in: Literary and Linguistic Computing, 26:3 (30 May 2011), pp. 285–296.

Richard S. Forsyth, “Stylochronometry with Substrings, or: A Poet Young and Old”, in: Literary and Linguistic Computing, 14:4 (1999), pp. 467–477.

R. Hohl Trillini & S. Quassdorf, “A “Key to All Quotations”? A Corpus-Based Parameter Model of Intertextuality”, in: Literary and Linguistic Computing, 25:3 (18 May 2010), pp. 269–286.

D. I. Holmes & D. W. Crofts, “The Diary of a Public Man: A Case Study in Traditional and Non-Traditional Authorship Attribution”, in: Literary and Linguistic Computing, 25:2 (14 April 2010), pp. 179–197.

Sonia Howell et al., “A Digital Humanities Approach to Narrative Voice in The Secret Scripture: Proposing a New Research Method”, in: Digital Humanities Quarterly, 8:2 (2014).

M. L. Jockers, “Testing Authorship in the Personal Writings of Joseph Smith Using NSC Classification”, in: Literary and Linguistic Computing, 28:3 (26 October 2012), pp. 371–381.

M. L. Jockers & D. M. Witten, “A Comparative Study of Machine Learning Methods for Authorship Attribution”, in: Literary and Linguistic Computing, 25:2 (12 April 2010), pp. 215–223.

D. Li, C. Zhang & K. Liu, “Translation Style and Ideology: A Corpus-Assisted Analysis of Two English Translations of Hongloumeng”, in: Literary and Linguistic Computing, 26:2 (3 March 2011), pp. 153–166.

K. Luyckx & W. Daelemans, “The Effect of Author Set Size and Data Size in Authorship Attribution”, in: Literary and Linguistic Computing, 26:1 (16 August 2010), pp. 35–55.

K. Mahowald, “A Naive Bayes Classifier for Shakespeare’s Second-Person Pronoun”, in: Literary and Linguistic Computing, 27:1 (10 November 2011), pp. 17–23.

E. C. Papakitsos, “Computerized Scansion of Ancient Greek Hexameter”, in: Literary and Linguistic Computing, 26:1 (17 September 2010), pp. 57–69.

L. Pearl & M. Steyvers, “Detecting Authorship Deception: A Supervised Machine Learning Approach Using Author Writeprints”, in: Literary and Linguistic Computing, 27:2 (7 March 2012), pp. 183–196.

J. Rybicki & M. Eder, “Deeper Delta across Genres and Languages: Do We Really Need the Most Frequent Words?”, in: Literary and Linguistic Computing, 26:3 (14 July 2011), pp. 315–321.

J. Rybicki & M. Heydel, “The Stylistics and Stylometry of Collaborative Translation: Woolf’s Night and Day in Polish”, in: Literary and Linguistic Computing, 28:4 (27 May 2013), pp. 708–717.

G. B. Schaalje et al., “Extended Nearest Shrunken Centroid Classification: A New Method for Open-Set Authorship Attribution of Texts of Varying Sizes”, in: Literary and Linguistic Computing, 26:1 (18 January 2011), pp. 71–88.

Martha Nell Smith et al., ““Undiscovered Public Knowledge”: Mining for Patterns of Erotic Language in Emily Dickinson’s Correspondence with Susan Huntington (Gilbert) Dickinson”, in: Digital Humanities, (2006), pp. 252–255.

T. Suzuki et al., “Co-Occurrence-Based Indicators for Authorship Analysis”, in: Literary and Linguistic Computing, 27:2 (15 April 2012), pp. 197–214.

Andreas van Cranenburgh & Rens Bod, “A Data-Oriented Model of Literary Language”, (12 January 2017).

Q. Wang & D. Li, “Looking for Translator’s Fingerprints: A Corpus-Based Study on Chinese Translations of Ulysses”, in: Literary and Linguistic Computing, 27:1 (3 November 2011), pp. 81–93.

W. Wiersma, J. Nerbonne & T. Lauttamus, “Automatically Extracting Typical Syntactic Differences from Corpora”, in: Literary and Linguistic Computing, 26:1 (11 October 2010), pp. 107–124.

A. Wilson, “The Regressive Imagery Dictionary: A Test of Its Concurrent Validity in English, German, Latin, and Portuguese”, in: Literary and Linguistic Computing, 26:1 (20 December 2010), pp. 125–135.