Booktrade Correspondence Project:
Guidelines for the Transcription and Encoding of Primary Sources



1. Transcribe the text of the source as meticulously as possible (see General remarks about the transcription below).
2. Proofread the completed transcription against the source very carefully.
3. Provide editorial annotations (see Editorial annotations below).
4. Create a summary of the text.
5. If the source is not in English, create an English translation.
6. Create a file using the file naming conventions below.
7. Encode the transcription, using the instructions below for the TEI Header and the text of the letters. Note that these contain merely the briefest of abstracts, and only those most germane to the project, of the comprehensive instructions detailed in the fifth edition of the guidelines of the Text Encoding Initiative ("P5"). The following two chapters of the TEI Guidelines are the most directly relevant to the project: 11. Representation of Primary Sources (especially from 11.3 onwards), and 13. Names, Dates, People, and Places.

A. General remarks about the encoded transcription

B. The encoding

NB: by convention, in what follows attribute names are preceded by the @-sign.

The TEI Header

The file description (fileDesc)

In the title element within fileDesc, provide the title of the document in the following form:

[letter, draft letter, form letter (=Dutch 'circulaire'), postcard, picture postcard, prospectus, note, or telegram] from [Sender] to [Recipient], [date in English in the form dd Month yyyy]: a machine readable transcription

Personal names of senders and recipients are written using the following form: Firstnames, or initial(s) with full stop(s) not separated from each other by a space, followed by a space, followed by the surname.

In idno, under publicationStmt, record the ID of the file. Follow the instructions that are given in Appendix A.

The source description (sourceDesc)

The title element within sourceDesc contains a bibliographic description of the document that has been transcribed, in the following form:

[letter, draft letter, form letter (=Dutch 'circulaire'), postcard, picture postcard, prospectus, note, or telegram] from [Sender] to [Recipient], [date in English in the form dd Month yyyy]

This strongly resembles the title element within fileDesc. However, it omits the phrase 'a machine readable transcription', and it encodes the names of the correspondents using the orgName or the persName element, and the date using a date element with a @when attribute with the value "yyyy-mm-dd". Be careful to avoid leading or trailing spaces inside persName and orgName elements: they may be part of the sentence, but not of the name.

<title>Letter from <persName>K. Fuhri</persName> to <persName>A.C. Kruseman</persName>, <date when="1850-11-18">18 November 1850</date>.</title>

Under idno, record the shelfmark of the source: the library ID (UBA for University Library Amsterdam; UBL for University Library Leiden; KB for Koninklijke Bibliotheek, etc.) followed by the shelfmark. Examples:

<idno type="callNo">UBA Fu 8-25</idno>
<idno type="callNo">UBL Ltk 1795 8, nr. 1</idno>
<idno type="callNo">UBL BOH C17, fol. 231</idno>
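
The header instructions above can be combined into a skeleton such as the following. This is only a sketch: the wrapper elements titleStmt, publisher and bibl are assumptions taken from general TEI P5 practice and may differ from the project's own template, the file ID is derived from the rules in Appendix A rather than copied from an existing file, and the shelfmark is a placeholder borrowed from the examples above.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <!-- Title of the electronic file: plain text, no persName or date markup -->
      <title>Letter from K. Fuhri to A.C. Kruseman, 18 November 1850: a machine readable transcription</title>
    </titleStmt>
    <publicationStmt>
      <!-- Assumption: P5 expects an agency such as publisher before idno -->
      <publisher>Booktrade Correspondence Project</publisher>
      <!-- File ID: sender prefix plus date as YYYYMMDD (see Appendix A) -->
      <idno>KFUH18501118</idno>
    </publicationStmt>
    <sourceDesc>
      <!-- Assumption: bibl is used as the wrapper for the description of the source -->
      <bibl>
        <title>Letter from <persName>K. Fuhri</persName> to <persName>A.C. Kruseman</persName>, <date when="1850-11-18">18 November 1850</date></title>
        <!-- Placeholder shelfmark, not necessarily that of this letter -->
        <idno type="callNo">UBA Fu 8-25</idno>
      </bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```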

The text


Under front, include a short paragraph summing up the contents of the letter in a div element with a @type attribute with the value "summary". The summary must be in English.
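
A minimal sketch of such a summary is given here. The wording of the summary is invented for illustration and does not describe an actual letter in the collection.

```xml
<front>
  <div type="summary">
    <!-- Hypothetical summary; always written in English -->
    <p>The sender asks the publisher to post the printed sheets as soon as they come off the press.</p>
  </div>
</front>
```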


The opener

The opener contains all information about the recipient (name, address) and about when the letter was sent. Note that the recipient's name goes within a <persName> or <orgName> element within an <addrLine> element. The opener does not include the sender's name and address details (though the sender's placeName is part of the dateline); these go in the closer. The following example is of a letter sent by an Amsterdam bookseller to Sijthoff publishers in Leiden:

<date when="1883-01-19">19 Januari 1883</date>
<addrLine><persName>J. La Bree</persName></addrLine>
<addrLine>c/o <orgName key="ASYT">Sijthoff</orgName></addrLine>
<addrLine>Doezastraat 1, 3 &amp; 5</addrLine>
<salute>Mijne Heeren</salute>
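
For orientation, the same lines might sit inside a complete opener as sketched here. The dateline and address wrappers follow general TEI P5 practice and are assumptions about the project's template, and the placeName assumes that the sender wrote the place at the head of the letter.

```xml
<opener>
  <dateline>
    <!-- Assumption: the sender's place is written in the source -->
    <placeName>Amsterdam</placeName>
    <date when="1883-01-19">19 Januari 1883</date>
  </dateline>
  <!-- Assumption: addrLine elements are grouped in an address element -->
  <address>
    <addrLine><persName>J. La Bree</persName></addrLine>
    <addrLine>c/o <orgName key="ASYT">Sijthoff</orgName></addrLine>
    <addrLine>Doezastraat 1, 3 &amp; 5</addrLine>
  </address>
  <salute>Mijne Heeren</salute>
</opener>
```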

The closer

The closer contains information about the sender (name and address). The name of the sender must be encoded in a signed element. Use address to encode the address of the sender if present in the source.

If a closing salute consists of two parts, use two salute elements. Example:

<salute>Na vriendelijke groeten</salute>
<salute>Uw vriend</salute>

Abbreviated honorifics such as "ZEd", "Zed.", "Ued", "Yrs" are expanded using the abbr and expan elements (see below).

If the source is a preprinted letterhead, the text printed on the letterhead is indicated by means of {} surrounding the text in the appropriate elements. Example:

<placeName>{'s Gravenhage}</placeName>
<date when="1848-01-16">16 Januarij {184}8</date>

The name and address of the sender may or may not be present in the form of a preprinted letterhead. In the following example the letterhead only contains the address, but not the name:

<signed>R. de Tracy Gould</signed>
<address><addrLine>{4, Garden Court,}</addrLine>
<addrLine>{London. E.C.}</addrLine></address>

If a letter contains a postscript, it is encoded using a seg element. This element takes a @type attribute with the value "postscript":

<salute>your obedient servants</salute>
<signed>de Erven F. Bohn</signed>
<seg type="postscript">A sending by post of the sheets immediately after been printed would be much better to <lb/> us. </seg>

Page breaks and line breaks

Line and page breaks in the source document are indicated using the (empty) <lb/> and <pb/> elements. They are preceded by a space, except where a space would not normally occur if the <lb/> were not present, e.g. in the case of a hyphenated word. Note that no <lb/> is used immediately before the closing </p>, </addrLine>, </salute> tags, etc., as these already indicate that a line ends there.
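
The following sketch applies these rules to an invented paragraph (the wording is hypothetical). The first break falls inside a hyphenated word and therefore has no preceding space, the second break and the page break are preceded by a space, and there is no <lb/> before the closing </p>.

```xml
<!-- Hypothetical text, for illustration only -->
<p>We have received your let-<lb/>ter and shall reply <lb/>as soon as the sheets <pb/>have been printed.</p>
```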

Book titles and personal names

The titles of all books, journals or articles must be encoded using the <title> element.

All personal names must be encoded using the <persName> element. Names of organisations must be encoded using the <orgName> element.

Omitted and supplied text

When it is impossible to transcribe a section of the text because the handwriting is illegible or the text is damaged, use the <gap> element. This element must remain empty. The @reason attribute specifies the reason for the omission, e.g. "illegible" or "cancelled". The @extent attribute gives an indication of the number of words or characters that are omitted.

Dear sirs, in reaction to your quote of <gap reason="ink stain" extent="1 word"/> we are forced to ...

If new text has been supplied for any reason, use the supplied element. For example, if the text that is transcribed is a copy of a letter that would have been sent out on a sheet of preprinted letterhead, the preprinted information contained in the letterhead is absent from the copy. In such a case some of that information can be given in a supplied element, as follows:

<placeName><supplied reason="on preprinted letterhead missing from copybook">Haarlem</supplied></placeName>
<date when="1848-01-16">16 Januarij <supplied reason="on preprinted letterhead missing from copybook">184</supplied>8</date>

Unclear text

Use unclear to indicate a section which is difficult to read in the source. The @reason attribute indicates why the source is difficult to transcribe. For instance, the text may be damaged, the ink may be faded, or the author's handwriting may be hard to read. Example:

Tell <unclear reason="bad handwriting" cert="medium">Harmer</unclear> that he can come tomorrow.

Always use a @reason attribute for unclear, gap and supplied. In addition, unclear and supplied take a @cert attribute to indicate the encoder's degree of certainty that the text given is correct (possible degrees of certainty in attribute value: "high", "medium", "low", "unknown"). The gap takes an @extent attribute (value: any string of letters and numbers).

Graphically distinct text

Underlining and italics used for stress or emphasis, for linguistic or rhetorical effect, are rendered through the emph element, in combination with the @rend attribute. For example:

... of which he understands <emph rend="underlined">nothing</emph>

The following values can be used with the @rend attribute: "bold", "italic", "bold-italic", "underlined", "double-underlined".

If in the MS a book title is enclosed in quotation marks or underlined, this is not indicated in the encoding. That it is a book title is indicated solely by the <title> tag, which replaces any typographic indication in the source.

Correction, abbreviation, regularisation

Book titles are often shortened or otherwise referred to rather impressionistically. In such cases the written title is given in an orig element and the full title is given in a reg element. These two elements are together enclosed in turn within a choice element. Example:

<choice>
<orig>Haafner's reize</orig>
<reg>J. Haafner's reize te voet door Ceylon</reg>
</choice>

Abbreviations are expanded as follows:

Your <choice><abbr>obdt.</abbr><expan>obedient</expan></choice> servant

Uw <choice><abbr>dw.</abbr><expan>dienstwillige</expan></choice> dienaar

Misspellings (which occur frequently in letters not written by native speakers, as in the letters from Dutch publishers to their foreign correspondents) are corrected using the choice, sic and corr elements:

It would be very <choice><sic>agreable</sic><corr>agreeable</corr></choice> to us ...

Unusual spellings that were current at the time are maintained without adding <sic>.


Figures (e.g. money) are always given with a full stop as a decimal marker. Sums of money may be preceded by a currency sign: for Dutch guilders the guilder sign (ƒ, represented as the character reference &#x192;); for English pounds the pound sign (£, represented as the character reference &#xA3;).
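
As a brief illustration, sums of money might be encoded as follows. The sentences and amounts are invented, and the space between the currency sign and the figure is an assumption rather than a project rule.

```xml
<!-- Hypothetical sentences; the amounts are invented -->
<p>De prijs bedraagt &#x192; 12.50 per exemplaar.</p>
<p>We enclose a cheque for &#xA3; 25.</p>
```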

Foreign words

Use <foreign> to identify a word or phrase as belonging to some language other than that of the surrounding text. Identify the language used by means of the codes of ISO 639-2 (note that 3-letter codes are in lower case; 2-letter codes are in capitals). E.g.:

Wij hebben inmiddels de <foreign xml:lang="eng">proof sheets</foreign> uit Londen ontvangen


Dates are standardised with the @when attribute, using the ISO 8601 standard, which represents the date in the form YYYY-MM-DD. In the event of uncertainty, digits can be left off from the end. For example, if a letter is known to date from 1854, while the day and month are unknown, the date would be given as "1854".

<date when="1848-01-16">16 Januarij {184}8</date>
<date when="1864-08-06">6 Aug. '64</date>
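
Incomplete dates can be sketched in the same way. The transcribed text in these two lines is hypothetical; the year-only value follows the rule stated above, and the year-and-month value assumes that the same rule of dropping digits from the end applies when only the day is unknown.

```xml
<!-- Only the year is known -->
<date when="1854">1854</date>
<!-- Year and month are known, the day is not -->
<date when="1854-03">Maart 1854</date>
```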


Names encountered in the text should always be marked up according to their type. The available elements are persName (for persons), placeName (for geographical names) and orgName (for organisations):

<persName>K. Fuhri</persName>
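
A single sentence may combine several of these elements with the title element. The sentence below is invented for illustration and only reuses names that occur elsewhere in these guidelines.

```xml
<!-- Hypothetical sentence showing the name elements side by side -->
<p><persName>A.C. Kruseman</persName> wrote from <placeName>Haarlem</placeName> to <orgName>De Erven F. Bohn</orgName> about the <title>Nederlands tijdschrift voor geneeskunde</title>.</p>
```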

Editorial annotations

Editorial annotations can be used to provide more contextual information on the text that is transcribed. When deciding what information to annotate, think of what is typically annotated in a printed edition of correspondence. For example, identify bibliographic details of publications mentioned (author, full title, place of publication, publisher, and year), and identify people and organisations.

For your research, use the Leiden library OPAC (and other OPACs depending on the nature of the bibliographic item), Google, Wikipedia, biographical dictionaries and other reference works intelligently. For example: in BOH C107 fol. 1. the author (J.G.C. Schlenker) asks De Erven F. Bohn:

would you be so kind as to send us the 125 copies of the Tijdschrift, the ones for the exchange journals, to Professor Van Haren Noman [...]

The search for 'tijdschrift bohn' in the Leiden OPAC yields 79 results, but Wikipedia has an article on Van Haren Noman, which mentions a Tijdschrift twice: 'Tijdschrift der Nederlandse dierkundige vereniging', and 'Ned. Tijds. v. Geneesk.'. The query 'tijdschrift bohn dierkundige' yields no result in the Leiden OPAC, but 'tijdschrift bohn geneeskunde' does. A Google search for 'tijdschrift geneeskunde bohn "Haren Noman"' yields a number of relevant results, including an article in Dutch. So the Tijdschrift in question must be the Nederlands tijdschrift voor geneeskunde.

Include the note element directly after the text the note applies to. The text of the annotation must be given in a p element. A note may contain several paragraphs.

<p> ... the calendering <note><p>Calendering, or satinizing, is a process of treating paper.</p></note>
is carried out by the foreman ...</p>

In the case of letters written in Dutch, French or German, an English translation is given in the first note. Such a note is given a @type attribute with the value "translation". Include such a note at the beginning of the text, directly after the first opening <p> tag.

<p><note type="translation"><p>In response to your postcard…</p></note>En rèponse à votre carte postale…</p>

Appendix A: File names and IDs

Each letter that is transcribed for the Booktrade Correspondence Project must be given an ID. This ID is created as follows. Choose the initials and/or first letters of the sender's company name or surname as a prefix, followed by the letter's date in the form YYYYMMDD. A list of existing IDs can be found through the links below (note that the lists are not sorted alphabetically):

The ID must also be used in the file name. Examples:

Sender                    Date                ID             File name
William Blackwood         7 Sept. 1892        WBLA18920907   WBLA18920907.xml
Erven F. Bohn             25 maart '85        EFBO18850325   EFBO18850325.xml
K. Fuhri                  28 Januari [186]4   KFUH18640128   KFUH18640128.xml
A.C. Kruseman             31 mei '58          ACKR18580531   ACKR18580531.xml
Lord Lytton               6th Nov. [1876]     LYTT18761106   LYTT18761106.xml
Richard Bentley and Son   23rd October 1888   RBEN18881023   RBEN18881023.xml
Ouida                     July 8th, '84       OUID18840708   OUID18840708.xml

When a correspondent has sent more than one letter on the same day, append lowercase letters to distinguish the files. Example: EFBO18760613a.xml and EFBO18760613b.xml were written by De Erven F. Bohn on the same day.


Written by: Adriaan van der Weel, Andrew Stevens and Peter Verhaar.
Last updated: 01 October 2014 (FP).