Week 1. 31 January 2017
- Course objectives
- What is Text and Data Mining?
- Introduction to the Perl programming language
- Getting started
- Perl basics:
- Printing texts
- Making calculations
- Reading a text file and writing to a text file
Week 2. 7 February 2017
- Submit the solution to Coding Challenge 1. Send by mail to email@example.com before Monday 6 February, 18:00.
- Read the tutorial on Perl Basics, pp. 1–7 (including the section about regular expressions).
- Create a text corpus for your individual research project. Texts may be taken from existing research corpora. Your own corpus should consist of at least ten texts of 5000 words or more each. The texts need to be saved in plain text (.txt) format.
- Read Martin Mueller, ‘Digital Shakespeare, or towards a literary informatics’, in: Shakespeare, 4, 3, September 2008, pp. 284–301. URL
- Regular expressions
- Iteration and selection
Week 3. 15 February 2017
This class will take place on Wednesday 15 February, from 10:00 to 13:00, in Lipsius 126-A. Apologies for any inconvenience.
- Read the full tutorial on Perl Basics. Present any questions that you have about this text in class.
- Submit the solution to Coding Challenge 2. Send by mail to firstname.lastname@example.org before Monday 13 February, 18:00.
- Read Kathryn Schulz, ‘What is Distant Reading?’, in: New York Times, 24 June 2011. URL
- Read Shawna Ross, ‘In Praise of Overstating the Case: A review of Franco Moretti, Distant Reading (London: Verso, 2013)’, in: Digital Humanities Quarterly, 8, 1, 2014. URL
- Tokenisation and frequency lists
- Copyright issues connected to research based on Text Mining
- Source Criticism
- Arrays and hashes
- Exercises on tokenisation and frequency lists
- Survey of existing text analysis tools
- Franco Moretti, ‘Conjectures on World Literature’, in: New Left Review, 1, 2000. URL
- Franco Moretti, ‘The Slaughterhouse of Literature’, in: Modern Language Quarterly, 61, 1, 2000, pp. 207–227. URL
Students who have questions about the topics explained during the first three weeks of DTDP are very welcome to attend a remedial class, which has been scheduled for Monday 20 February at 13:30. The location is Eyckhof 1 / 003A. No new material will be covered in this class.
Week 4. 21 February 2017
- Begin to define your research project by formulating an initial research question. You may use this list of possible topics as inspiration.
- Submit the solution to Coding Challenge 3. Send by mail to email@example.com before Monday 20 February, 18:00.
- Working with stopwords
- Distribution graphs
- Type-token ratios
Week 5. 28 February 2017
- Introduction to the R statistical package
- Variables and data structures in R
- Data visualisation
- Introduction to ggplot2
Week 6. 7 March 2017
- Submit the solution to Coding Challenge 4. Send by mail to firstname.lastname@example.org before Monday 6 March, 18:00.
- Natural Language Processing
- Write a brief text (max. 500 words) about your individual research project. Answer the following questions: (1) Which texts have you selected for your corpus? (2) Which research question do you intend to answer? (3) Which types of analyses will be most useful for your research question?
- The text about the final assignment for this course suggests a number of topics that you can focus on in the theoretical section of your essay. If you want to focus on a topic which is not listed, give a brief explanation of the question that you want to answer instead.
- Send this text by mail to email@example.com before Monday 13 March, 18:00.
Week 7. 21 March 2017
- Submit the solution to Coding Challenge 5. Send by mail to firstname.lastname@example.org before Monday 20 March, 18:00.
- Read the full R Tutorial
- Read Stephen Ramsay & Geoffrey Rockwell, “Developing Things: Notes Towards an Epistemology of Building in the Digital Humanities”, in: Matthew K. Gold (ed.), Debates in the Digital Humanities (Minneapolis: University of Minnesota Press, 2012), pp. 75–84.
- Building tools as a theoretical activity
- Sentence segmentation and readability metrics
- Topic Modelling
- Principal Component Analysis
Week 8. 28 March 2017
- Requirements for final essay
- Other topics in Text and Data Mining (e.g. network analysis, mapping in R)
- Demonstration of TDM analyses on the basis of a case study
The final essay for DTDP needs to be submitted before Friday 28 April 2017, 18:00. Send by mail to email@example.com.
The essay needs to consist of two sections:
- A description of your individual research project (2000 words)
- A critical reflection on digital humanities research (2000 words). You may choose a topic from the list that is provided, but you are also free to focus on another topic.