Week 1. 6 February 2018
- Course objectives
- What is Text and Data Mining?
- Introduction to the Python Programming language
- Getting started
- Python basics:
- Working with the command line
- String functions
Week 2. 13 February 2018
- Submit the solution to Coding Challenge 1. Send by mail to firstname.lastname@example.org before Monday 12 February, 18:00. N.B. Do NOT spend more than two hours on this exercise!!
- Read Part 1 of the tutorial, on Python Basics, and Part 2, on Tokenisation
- Write a comment on the discussion forum about your experiences in learning about these topics, or about anything that is still unclear.
- Create a text corpus for your individual research project. Texts may be taken from existing research corpora. Your own corpus should consist of at least ten texts, of 5000 words or more. The texts need to be saved in the .txt (or plain text) format.
- Martin Mueller, ‘Digital Shakespeare, or towards a literary informatics’, in: Shakespeare, 4, 3, September 2008, pp. 284–301. URL
- Types and tokens
- Authorship recognition
- Reading a file
- Frequency counts
Week 3. 20 February 2018
- Read Part 3 of the tutorial, on Regular Expressions
- Franco Moretti, ‘The Slaughterhouse of Literature’, in: Modern Language Quarterly, 61, 1, 2000, pp. 207–227. URL
- Read Shawna Ross, “In Praise of Overstating the Case: A review of Franco Moretti, Distant Reading (London: Verso, 2013)”, in: Digital Humanities Quarterly, 8, 1, 2014. URL
- Tokenisation and frequency lists
- Source Criticism
- Regular expressions: HTML or download Jupyter Notebook
- Tokenisation: HTML or download Jupyter Notebook
- Franco Moretti, ‘Conjectures on World Literature’, in: New Left Review, 1, 2000. URL
- Read Kathryn Schulz, ‘What Is Distant Reading?’, in: The New York Times, 24 June 2011. URL
- Jupyter Notebook Tutorial: The Definitive Guide on DataCamp
Week 4. 27 February 2018
- Begin to define your research project by formulating an initial research question. You may use this list of possible topics as inspiration.
- Submit the solution to Coding Challenge 2. Send by mail to email@example.com before Monday 26 February, 18:00.
- Natural Language Processing
- Working with stopwords
- Distribution graphs
- Type-token ratios
Week 5. 6 March 2018
- Introduction to the R statistical package
- Variables and data structures in R
- Data visualisation
- Introduction to ggplot2
Week 6. 13 March 2018
- Submit the solution to Coding Challenge 4. Send by mail to firstname.lastname@example.org before Monday 6 March, 18:00.
- Natural Language Processing
- Write a brief text (max. 500 words) about your individual research project. Answer the following questions: (1) Which texts have you selected for your corpus? (2) Which research question do you intend to answer? (3) Which types of analyses will be most useful for your research question?
- The text about the final assignment for this course suggests a number of topics that you can focus on in the theoretical section of your essay. If you want to focus on a topic which is not listed, give a brief explanation of the question that you want to answer instead.
- Send by mail to email@example.com before Monday 13 March, 18:00.
Week 7. 27 March 2018
- Submit the solution to Coding Challenge 5. Send by mail to firstname.lastname@example.org before Monday 20 March, 18:00.
- Read the full R Tutorial
- Read Stephen Ramsay & Geoffrey Rockwell, “Developing Things: Notes Towards an Epistemology of Building in the Digital Humanities”, in: Matthew K. Gold (ed.), Debates in the Digital Humanities (Minneapolis: University of Minnesota Press, 2012), pp. 75–84.
- Building tools as a theoretical activity
- Sentence segmentation and readability metrics
- Topic Modelling
- Principal Component Analysis
Week 8. 3 April 2018
- Requirements for final essay
- Other topics in Text and Data Mining (e.g. network analysis, mapping in R)
- Demonstration of TDM analyses on the basis of a case study
The final essay for DTDP needs to be submitted before Friday 28 April 2017, 18:00. Send by mail to email@example.com.
The essay needs to consist of two sections:
- A description of your individual research project (2000 words)
- A critical reflection on digital humanities research (2000 words). You may choose a topic from the list that is provided, but you are also free to focus on another topic.