Tokenisation

Exercise 1
The list below contains the titles of the books which were in the New York Times list of bestsellers on 7 January, 2018.

 

bestsellers = ["Origin", "The Rooster Bar", "The Sun and her Flowers", "The People vs Alex Cross", "Milk and Honey", "Darker", "The Midnight Line", "Artemis", "Year One", "Ready Player One"]

Write a program which can count the number of books in this list.

Also print the title of the book which is in the number one position.

Finally, add the following five titles to this list, and print the full list in alphabetical order.
Little Fires Everywhere
End Game
Tom Clancy Power and Empire
Sleeping Beauties
Hardcore Twenty-Four
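
As a hint, the list operations this exercise calls for can be sketched on a smaller, made-up list. The names `shelf` and `extra` and the titles in them are placeholders chosen for illustration, not the bestseller data.

```python
# Hypothetical example data, not the bestseller list from the exercise.
shelf = ["Emma", "Dracula", "Ulysses"]

# len() returns the number of items in a list.
count = len(shelf)
print(count)

# Indexing starts at 0, so the item in first position is shelf[0].
first = shelf[0]
print(first)

# extend() appends every item of another list to the end.
extra = ["Beloved", "Carrie"]
shelf.extend(extra)

# sorted() returns a new, alphabetically ordered list;
# the original list keeps its order.
ordered = sorted(shelf)
print(ordered)
```

Using sorted() rather than shelf.sort() leaves the original ranking order intact, which may be useful if the list is needed again later.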

 

Exercise 2
Using the list that you have created for exercise 1, print a list of the five bestselling books in the first week of 2018. Use the keyword ‘while’ to create this list.
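
A minimal sketch of a while loop that collects the first few items of a list, again on placeholder data (`shelf`, `top` and the limit of 3 are assumptions made for the example):

```python
# Hypothetical example data, not the bestseller list from the exercise.
shelf = ["Emma", "Dracula", "Ulysses", "Beloved", "Carrie"]

top = []
i = 0
# The loop runs as long as the condition is true;
# the counter must be increased inside the loop, or it never stops.
while i < 3:
    top.append(shelf[i])
    i += 1

print(top)
```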

 

Exercise 3
Use the code below to create a dictionary that contains information about European countries and their capitals.

 eu = dict()
 eu["Italy"] = "Rome"
 eu["Luxembourg"] = "Luxembourg"
 eu["Belgium"] = "Brussels"
 eu["Denmark"] = "Copenhagen"
 eu["Finland"] = "Helsinki"
 eu["France"] = "Paris"
 eu["Slovakia"] = "Bratislava"
 eu["Slovenia"] = "Ljubljana"
 eu["Germany"] = "Berlin"
 eu["Greece"] = "Athens"
 eu["Ireland"] = "Dublin"
 eu["Netherlands"] = "Amsterdam"
 eu["Portugal"] = "Lisbon"
 eu["Spain"] = "Madrid"
 eu["Sweden"] = "Stockholm"
 eu["United Kingdom"] = "London"
 eu["Cyprus"] = "Nicosia"
 eu["Lithuania"] = "Vilnius"
 eu["Czech Republic"] = "Prague"
 eu["Estonia"] = "Tallinn"
 eu["Hungary"] = "Budapest"
 eu["Latvia"] = "Riga"
 eu["Malta"] = "Valletta"
 eu["Austria"] = "Vienna"
 eu["Poland"] = "Warsaw"
 eu["Croatia"] = "Zagreb"
 eu["Romania"] = "Bucharest"
 eu["Bulgaria"] = "Sofia"

Print a sentence which gives information about the current number of countries in the EU.

Secondly, print the sentence “the capital of X is Y” for each item in the dictionary. The sentences should be sorted alphabetically by the names of the countries.
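
One possible approach, sketched on a small placeholder dictionary (the name `capitals` and its three entries are assumptions for the example): len() gives the number of keys, and sorted() applied to a dictionary returns its keys in alphabetical order.

```python
# Hypothetical example data, not the dictionary from the exercise.
capitals = {"Norway": "Oslo", "Iceland": "Reykjavik", "Switzerland": "Bern"}

# len() on a dictionary counts its keys.
summary = f"This dictionary contains {len(capitals)} countries."
print(summary)

sentences = []
# sorted() on a dictionary yields its keys in alphabetical order.
for country in sorted(capitals):
    sentences.append(f"The capital of {country} is {capitals[country]}.")

for sentence in sentences:
    print(sentence)
```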

 

A Jupyter Notebook containing the answers to exercises 1 to 3 can be downloaded here.

 

Exercise 4
Create a Python application which can read the text file “prufrock.txt” (right-click (Windows) or use CTRL + Click (Mac) to download the file). It is a full-text version of “The Love Song of J. Alfred Prufrock” by T.S. Eliot. The file is also provided in the DTDP file repository.

Calculate the total number of lines in this poem, but ignore all empty lines (lines in which the number of characters is zero).
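
The sketch below shows one way to read a file line by line and skip empty lines. So that it runs anywhere, it first writes a small stand-in file; the file name `sample.txt` and its contents are assumptions, and in the exercise you would open “prufrock.txt” instead.

```python
import os
import tempfile

# Create a small stand-in file (hypothetical content, not the poem).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n\nsecond line\n\nthird line\n")

line_count = 0
with open(path, encoding="utf-8") as f:
    for line in f:
        # A line read from a file still ends in "\n";
        # remove it before checking the length.
        if len(line.rstrip("\n")) > 0:
            line_count += 1

print(line_count)
```

Note that under this definition a line containing only spaces still counts as non-empty; using line.strip() instead would treat such lines as empty too.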

 

Exercise 5
Calculate the average number of characters per line in “The Love Song of J. Alfred Prufrock” by T.S. Eliot (N.B. this is the total number of characters divided by the total number of lines).
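
The calculation can be sketched on a few placeholder lines. This example assumes that empty lines are left out of the count, as in Exercise 4; the exercise text does not say so explicitly, so treat that as one possible reading. The names `lines`, `non_empty` and `average` are chosen for illustration.

```python
# Hypothetical example lines, already stripped of their newline characters.
lines = ["first line", "", "second line", "third"]

# Keep only the lines that contain at least one character.
non_empty = [line for line in lines if len(line) > 0]

# Add up the length of every remaining line.
total_chars = sum(len(line) for line in non_empty)

# Total number of characters divided by total number of lines.
average = total_chars / len(non_empty)
print(round(average, 2))
```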

 

 

Exercise 6
Download the poem “A Coat” by W.B. Yeats. What is the total number of words in this poem?

Additionally, create a frequency list of the words used in this poem. Print a list which shows the 10 most frequent words only.
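
A minimal sketch of tokenising a text and building a frequency list, using a made-up sentence rather than the poem. It assumes that a word is any run of letters, digits or underscores (the regular expression `\w+`) and that upper and lower case should be treated as the same word; other definitions of a word are equally defensible. Counter comes from Python's standard collections module.

```python
import re
from collections import Counter

# Hypothetical example text, not the poem from the exercise.
text = "The cat sat on the mat. The mat was flat."

# Lower-case the text, then extract every run of word characters.
# Note that \w+ splits a word such as "don't" into "don" and "t".
words = re.findall(r"\w+", text.lower())
total = len(words)
print(total)

# Counter maps each word to the number of times it occurs.
freq = Counter(words)

# most_common(n) returns the n most frequent words with their counts.
top = freq.most_common(2)
print(top)
```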

 

Exercise 7
Create a variable representing the opening paragraph of Charles Dickens’ novel A Tale of Two Cities:

“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair”

Calculate the number of occurrences of the word “of”. To test whether two strings are equal, you can use the following code:

 

word = 'Charles'
test = 'Dickens'

if word == test:
    print("The string variables have the same value.")
else:
    print("These are different strings.")
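
The same comparison can be placed inside a loop to count matches. The sketch below uses a made-up sentence and the illustrative names `sentence` and `count`; it splits the text on whitespace, which is only one of several ways to tokenise.

```python
# Hypothetical example text, not the Dickens paragraph.
sentence = "the cat of the house of cards"

count = 0
# split() without arguments breaks the string on whitespace.
for word in sentence.split():
    # == is true only if the two strings are exactly the same.
    if word == "of":
        count += 1

print(count)
```

Because split() leaves punctuation attached to words, a token such as "times," is not equal to "times"; this matters whenever the word you are counting can be followed by a comma or full stop.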

A Jupyter Notebook containing these exercises can be downloaded here.
Answers to the exercises will be posted shortly.