Text Mining: Definitions
Roger Bilisoly, Practical Text Mining with Perl, (Hoboken, N.J.: Wiley 2008).
Michael S. Evans, “A Computational Approach to Qualitative Analysis in Large Textual Datasets”, in: PLoS ONE, 9:2 (2014).
Ronen Feldman, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, (Cambridge: Cambridge University Press 2007).
Louise Francis, “Taming Text: An Introduction to Text Mining”, in: Casualty Actuarial Society Forum, (2006), <http://www.casact.net/pubs/forum/06wforum/06w55.pdf>.
H. E. Green, “Under the Workbench: An Analysis of the Use and Preservation of MONK Text Mining Research Software”, in: Literary and Linguistic Computing, (9 April 2013), p. fqt014, <http://llc.oxfordjournals.org/content/early/2013/04/08/llc.fqt014.full>.
Sholom Weiss et al., Text Mining: Predictive Methods for Analyzing Unstructured Information, (New York: Springer 2004).
Ian H. Witten, “Text Mining”, in: Munindar P. Singh (ed.), The Practical Handbook of Internet Computing, (Boca Raton; London: Chapman and Hall/CRC 1999), p. 198.
Text Mining: Critical reflections on possibilities and limitations
Jean-Baptiste Michel et al., “Quantitative Analysis of Culture Using Millions of Digitized Books”, in: Science (New York, N.Y.), 331:6014 (14 January 2011), pp. 176–82, <http://www.sciencemag.org/content/331/6014/176.full>.
John Bradley, “What you (fore)see is what you get: Thinking about usage paradigms for computer assisted text analysis”, in: Text Technology, 14:2 (2005), pp. 1–19.
P. Gooding, M. Terras & C. Warwick, “The Myth of the New: Mass Digitization, Distant Reading, and the Future of the Book”, in: Literary and Linguistic Computing, 28:4 (13 August 2013), pp. 629–639, <http://dx.doi.org/10.1093/llc/fqt051>.
Stephen Marche, “Literature Is Not Data: Against Digital Humanities”, in: The Los Angeles Review of Books, 28 October 2012, <http://lareviewofbooks.org/essay/literature-is-not-data-against-digital-humanities> (22 June 2014).
Matthew Kirschenbaum, “The Remaking of Reading: Data Mining and the Digital Humanities”.
Martin Mueller, “Digital Shakespeare, or towards a Literary Informatics”, in: Shakespeare, 4:3 (September 2008), pp. 284–301, <http://www.tandfonline.com/doi/abs/10.1080/17450910802295179>.
Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism, (Urbana: University of Illinois Press 2011).
J. Berenike Herrmann, Karina van Dalen-Oskam & Christof Schöch, “Revisiting Style, a Key Concept in Literary Studies”, in: Journal of Literary Theory, 9:1 (2015), pp. 25–52, <http://www.jltonline.de/index.php/articles/article/view/757/1764>.
Franco Moretti, “The Slaughterhouse of Literature”, in: Distant Reading, (London: Verso 2013), pp. 207–227.
Matt Erlin & Lynne Tatlock, “Introduction: “Distant Reading” and the Historiography of Nineteenth-Century German Literature”, in: Matt Erlin & Lynne Tatlock (eds.), Distant Readings: Topologies of German Culture in the Long Nineteenth Century, (Rochester, New York: Camden House 2014).
Franco Moretti, “Conjectures on World Literature”, in: Distant Reading, (London: Verso 2013).
Adam Kirsch, “Technology Is Taking Over English Departments: The False Promise of the Digital Humanities”, in: New Republic, (2 May 2014), <https://newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch>.
Franco Moretti, Distant Reading, (London: Verso 2013).
D. L. Hoover, J. Culpeper & K. O’Halloran, Digital Literary Studies: Corpus Approaches to Poetry, Prose, and Drama, (2014).
Martin Mueller, “Stanley Fish and the Digital Humanities”, 2012, <http://cscdc.northwestern.edu/blog/?p=332>.
David M. Berry, “The Computational Turn: Thinking about the Digital Humanities”, 12 (2011), pp. 1–22.
Shawna Ross, “In Praise of Overstating the Case: A Review of Franco Moretti, Distant Reading (London: Verso, 2013)”, in: Digital Humanities Quarterly, 8:1 (2014).
Katie Trumpener, “Critical Response: I. Paratext and Genre System: A Response to Franco Moretti”, in: Critical Inquiry, 36:1, pp. 159–71.
Stefan Gradmann & Jan Christoph Meister, “Digital Document and Interpretation: Re-Thinking “text” and Scholarship in Electronic Settings”, in: Poiesis & Praxis, 5 (17 January 2008), pp. 139–153.
Chris Anderson, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, in: Wired Magazine, 16:7 (2008), <http://www.wired.com/science/discoveries/magazine/16-07/pb_theory>.
Jean Bauer, “Who You Calling Untheoretical?”, in: Journal of Digital Humanities, 1:1 (2011), <http://journalofdigitalhumanities.org/1-1/who-you-calling-untheoretical-by-jean-bauer/>.
David Berry, Critical Theory and the Digital, (London: Bloomsbury Publishing, 2014).
Johanna Drucker, “Theory as Praxis: The Poetics of Electronic Textuality”, in: Digital Poetics, (Tuscaloosa: University of Alabama Press 2002).
Tom Eyers, “The Perils of the “Digital Humanities”: New Positivisms and the Fate of Literary Theory”, in: Postmodern Culture, 23:2 (2013).
A. Galey & S. Ruecker, “How a Prototype Argues”, in: Literary and Linguistic Computing, 25:4 (27 October 2010), pp. 405–424, <http://dx.doi.org/10.1093/llc/fqq021>.
Mark Sample, “When Does Service Become Scholarship?”, <http://www.samplereality.com/2013/02/08/when-does-service-become-scholarship/>.
Jessica Hullman & Nicholas Diakopoulos, “Visualization Rhetoric: Framing Effects in Narrative Visualization”, in: IEEE Transactions on Visualization and Computer Graphics, (2011).
Stephen Ramsay, “In Praise of Pattern”, in: TEXT Technology, 14:2 (2005), pp. 177–190, <http://digitalcommons.unl.edu/englishfacpubs/57/>.
Mario Petrucci, “Scientific Visualizations: Bridge-Building between the Sciences and the Humanities via Visual Analogy”, in: Interdisciplinary Science Reviews, 36:4 (1 December 2011), pp. 276–300, <http://openurl.ingenta.com/content/xref?genre=article&issn=0308-0188&volume=36&issue=4&spage=276>.
Stéfan Sinclair, Stan Ruecker & Milena Radzikowska, “Information Visualization for Humanities Scholars”, in: Literary Studies in the Digital Age, (Modern Language Association of America 2013).
Edward Tufte, The Visual Display of Quantitative Information, (Cheshire: Graphics Press 1983).
Stefan Bertschi et al., “What is knowledge visualization? Perspectives on an emerging discipline”, in: Proceedings of the International Conference on Information Visualisation, (2011), pp. 329–336.
Lev Manovich, “What Is Visualization?”, in: Visual Studies, (2011).
Johanna Drucker, “Humanities Approaches to Graphical Display”, in: Digital Humanities Quarterly, 5:1 (2011), <http://digitalhumanities.org:8080/dhq/vol/5/1/000091/000091.html>.
Jacques Bertin, Semiology of Graphics, (Madison: University of Wisconsin Press 1983).
Stephen Few, “Data Visualization for Human Perception”, in: Mads Soegaard & Rikke Friis Dam (eds.), The Encyclopedia of Human-Computer Interaction, (Aarhus: The Interaction Design Foundation 2014).
Leland Wilkinson, The Grammar of Graphics, (New York: Springer 2005).
Michelle A. Borkin et al., “What Makes a Visualization Memorable”, in: IEEE Transactions on Visualization and Computer Graphics, 19:12 (2013), pp. 2306–2315.
Ben Fry, Visualizing Data, (Cambridge: O’Reilly Media Inc. 2008).
Maureen Stone, “Information Visualization: Challenge for the Humanities”, in: Working Together or Apart: Promoting the next Generation of Digital Scholarship: Report of a Workshop Cosponsored by the Council on Library and Information Resources and the National Endowment for the Humanities, (Washington D.C.: 2009), pp. 43–57.
Research projects based on text mining
Dawn Archer, Jonathan Culpeper & Paul Rayson, “Love – “a Familiar or a Devil”? An Exploration of Key Domains in Shakespeare’s Comedies and Tragedies”, in: Dawn Archer (ed.), What’s in a Word-List?: Investigating Word Frequency and Keyword Extraction, (Farnham: Ashgate 2009), pp. 137–158.
Shlomo Argamon et al., “Vive La Différence! Text Mining Gender Difference in French Literature”, in: Digital Humanities Quarterly, 3:2 (2009).
Shlomo Argamon & Mark Olsen, “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis”, in: Digital Humanities Quarterly, 3:2 (2009).
C. Brierley & E. Atwell, “Holy Smoke: Vocalic Precursors of Phrase Breaks in Milton’s Paradise Lost”, in: Literary and Linguistic Computing, 25:2 (14 April 2010), pp. 137–151.
J. Burrows, “Delta: A Measure of Stylistic Difference and a Guide to Likely Authorship”, in: Literary and Linguistic Computing, 17:3 (1 September 2002), pp. 267–287, <http://dx.doi.org/10.1093/llc/17.3.267>.
T. E. Clement, ““A Thing Not Beginning and Not Ending”: Using Digital Tools to Distant-Read Gertrude Stein’s The Making of Americans”, in: Literary and Linguistic Computing, 23:3 (5 September 2008), pp. 361–381.
T. Clement et al., “Distant Listening to Gertrude Stein’s “Melanctha”: Using Similarity Analysis in a Discovery Paradigm to Analyze Prosody and Author Influence”, in: Literary and Linguistic Computing, 28:4 (23 July 2013), pp. 582–602, <http://llc.oxfordjournals.org/content/28/4/582.abstract>.
N. Coffee et al., “The Tesserae Project: Intertextual Analysis of Latin Poetry”, in: Literary and Linguistic Computing, 28:2 (20 July 2012), pp. 221–228, <http://dx.doi.org/10.1093/llc/fqs033>.
Neil Coffee et al., “Modelling the Interpretation of Literary Allusion with Machine Learning Techniques”, in: Digital Humanities 2013, (Nebraska–Lincoln: 2013).
M. Eder, “Does Size Matter? Authorship Attribution, Small Samples, Big Problem”, in: Literary and Linguistic Computing, (14 November 2013), p. fqt066, <http://dx.doi.org/10.1093/llc/fqt066>.
W. E. Y. Elliott & R. J. Valenza, “Two Tough Nuts to Crack: Did Shakespeare Write the “Shakespeare” Portions of Sir Thomas More and Edward III? Part I”, in: Literary and Linguistic Computing, 25:1 (14 August 2009), pp. 67–83, <http://llc.oxfordjournals.org/content/25/1/67.abstract>.
C. W. Forstall, S. L. Jacobson & W. J. Scheirer, “Evidence of Intertextuality: Investigating Paul the Deacon’s Angustae Vitae”, in: Literary and Linguistic Computing, 26:3 (30 May 2011), pp. 285–296, <http://llc.oxfordjournals.org/content/26/3/285.abstract>.
Richard S. Forsyth, “Stylochronometry with Substrings, or: A Poet Young and Old”, in: Literary and Linguistic Computing, 14:4 (1999), pp. 467–477, <http://dx.doi.org/10.1093/llc/14.4.467>.
R. Hohl Trillini & S. Quassdorf, “A “Key to All Quotations”? A Corpus-Based Parameter Model of Intertextuality”, in: Literary and Linguistic Computing, 25:3 (18 May 2010), pp. 269–286, <http://llc.oxfordjournals.org/content/25/3/269.abstract>.
D. I. Holmes & D. W. Crofts, “The Diary of a Public Man: A Case Study in Traditional and Non-Traditional Authorship Attribution”, in: Literary and Linguistic Computing, 25:2 (14 April 2010), pp. 179–197, <http://llc.oxfordjournals.org/content/25/2/179.abstract>.
Sonia Howell et al., “A Digital Humanities Approach to Narrative Voice in The Secret Scripture: Proposing a New Research Method”, in: Digital Humanities Quarterly, 8:2 (2014).
M. L. Jockers, “Testing Authorship in the Personal Writings of Joseph Smith Using NSC Classification”, in: Literary and Linguistic Computing, 28:3 (26 October 2012), pp. 371–381, <http://llc.oxfordjournals.org/content/28/3/371.full>.
M. L. Jockers & D. M. Witten, “A Comparative Study of Machine Learning Methods for Authorship Attribution”, in: Literary and Linguistic Computing, 25:2 (12 April 2010), pp. 215–223, <http://llc.oxfordjournals.org/content/25/2/215.abstract>.
D. Li, C. Zhang & K. Liu, “Translation Style and Ideology: A Corpus-Assisted Analysis of Two English Translations of Hongloumeng”, in: Literary and Linguistic Computing, 26:2 (3 March 2011), pp. 153–166, <http://llc.oxfordjournals.org/content/26/2/153.abstract>.
K. Luyckx & W. Daelemans, “The Effect of Author Set Size and Data Size in Authorship Attribution”, in: Literary and Linguistic Computing, 26:1 (16 August 2010), pp. 35–55, <http://llc.oxfordjournals.org/content/26/1/35.abstract>.
K. Mahowald, “A Naive Bayes Classifier for Shakespeare’s Second-Person Pronoun”, in: Literary and Linguistic Computing, 27:1 (10 November 2011), pp. 17–23.
E. C. Papakitsos, “Computerized Scansion of Ancient Greek Hexameter”, in: Literary and Linguistic Computing, 26:1 (17 September 2010), pp. 57–69, <http://llc.oxfordjournals.org/content/26/1/57.abstract>.
L. Pearl & M. Steyvers, “Detecting Authorship Deception: A Supervised Machine Learning Approach Using Author Writeprints”, in: Literary and Linguistic Computing, 27:2 (7 March 2012), pp. 183–196, <http://llc.oxfordjournals.org/content/27/2/183.abstract>.
J. Rybicki & M. Eder, “Deeper Delta across Genres and Languages: Do We Really Need the Most Frequent Words?”, in: Literary and Linguistic Computing, 26:3 (14 July 2011), pp. 315–321, <http://dx.doi.org/10.1093/llc/fqr031>.
J. Rybicki & M. Heydel, “The Stylistics and Stylometry of Collaborative Translation: Woolf’s Night and Day in Polish”, in: Literary and Linguistic Computing, 28:4 (27 May 2013), pp. 708–717, <http://llc.oxfordjournals.org/content/28/4/708.full>.
G. B. Schaalje et al., “Extended Nearest Shrunken Centroid Classification: A New Method for Open-Set Authorship Attribution of Texts of Varying Sizes”, in: Literary and Linguistic Computing, 26:1 (18 January 2011), pp. 71–88, <http://llc.oxfordjournals.org/content/26/1/71.abstract>.
Martha Nell Smith et al., ““Undiscovered Public Knowledge”: Mining for Patterns of Erotic Language in Emily Dickinson’s Correspondence with Susan Huntington (Gilbert) Dickinson”, in: Digital Humanities, (2006), pp. 252–255.
T. Suzuki et al., “Co-Occurrence-Based Indicators for Authorship Analysis”, in: Literary and Linguistic Computing, 27:2 (15 April 2012), pp. 197–214, <http://llc.oxfordjournals.org/content/27/2/197.abstract>.
Andreas van Cranenburgh & Rens Bod, “A Data-Oriented Model of Literary Language”, (12 January 2017), <http://arxiv.org/abs/1701.03329>.
Q. Wang & D. Li, “Looking for Translator’s Fingerprints: A Corpus-Based Study on Chinese Translations of Ulysses”, in: Literary and Linguistic Computing, 27:1 (3 November 2011), pp. 81–93, <http://llc.oxfordjournals.org/content/27/1/81.abstract>.
W. Wiersma, J. Nerbonne & T. Lauttamus, “Automatically Extracting Typical Syntactic Differences from Corpora”, in: Literary and Linguistic Computing, 26:1 (11 October 2010), pp. 107–124, <http://llc.oxfordjournals.org/content/26/1/107.abstract>.
A. Wilson, “The Regressive Imagery Dictionary: A Test of Its Concurrent Validity in English, German, Latin, and Portuguese”, in: Literary and Linguistic Computing, 26:1 (20 December 2010), pp. 125–135, <http://llc.oxfordjournals.org/content/26/1/125.abstract>.