This eBook is for the use of anyone anywhere at no cost and
with almost no restrictions whatsoever. You may copy it, give it
away or re-use it under the terms of the Project Gutenberg
License online at www.gutenberg.org/license This document does not represent any official PG standard. It
describes the The Gnutenberg Press will convert any TEI conformant text, but texts
marked up according to this guide will look better and contain all the
necessary headers and footers for posting on PG. To Basically you say: this (point in the text) is the start of a
paragraph; this is the end of a paragraph; this is the start of a
chapter; this is the end, etc. In TEI-speak a text component is referred to as
To mark a text region as element you have to insert an
Here's an example of how you would mark up a paragraph: Don't worry about the line breaks, the text will get
reformatted anyway. The formatter knows where a paragraph ends by
the Let's do some more markup. In TEI the In TEI the Every opening tag needs a corresponding closing tag. Opening
and closing tags must always nest like parentheses in a
mathematical equation. This is right: and this is wrong: Most elements can take In TEI the The attribute name must be followed by an = and the attribute
value must be put in quotes. An element can have zero or more
attributes but every attribute must have a different name. In TEI you can specify characters you don't have on your
keyboard with In TEI the entity Entities start with an ampersand (&) and end with a
semicolon (;). You can find a list of supported TEI entities in
Chapter 18. You can and should mark up a text incrementally. That is:
make more than one pass over the whole text and in each pass mark
up a subset of elements. You may start marking only the most prominent text features like
chapters and paragraphs. Later you make a second pass marking all
italicized text. If you still want to do more, make another pass
replacing all quotation marks with the TODO: a PG working group needs to codify different
Most probably you will start with a TEI text automatically generated
by some program from the plain vanilla etext. Your task will then be
to proof the tags inserted by the program. If you cannot state with confidence the reason why a text passage is
highlighted, use the generic If you encounter a passage in a foreign language unknown to you just
use the bare You can insert comments any place you want. These will stay in
the TEI text but not show up in the formatted output. By using
the word FIXME you can mark positions that require further
inspection. A comment starts with <!-- and ends with -->. Later it will be easy to search for all FIXME comments in the text
and fix them. One of the advantages of XML is that a program can check the markup
for you. To do this you need a validator and the You can get XML validators from here: And here is the
For all of you who don't want to install a validator on your own PC
there is an
As primary source of information refer to A still smaller subset of TEI is described in:
The complete TEI markup language The homepage of the Language Codes: The rest of this guide explains the implementation details and
limitations of the pg-press system and shows more examples.
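Two of the rules from the introduction lend themselves to a mechanical check: tags must nest properly, and FIXME comments should be easy to find later. The following is a minimal sketch using only the Python standard library. It tests well-formedness only (a real validator would also check the text against the DTD), and the function names and sample fragments are illustrative assumptions, not part of the Gnutenberg Press.

```python
import re
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML (tags closed and properly nested)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def find_fixmes(xml_text):
    """Return the text of every XML comment that mentions FIXME."""
    comments = re.findall(r"<!--(.*?)-->", xml_text, flags=re.DOTALL)
    return [c.strip() for c in comments if "FIXME" in c]


# Hypothetical sample fragments, not taken from a real PG text.
right = "<p>Some <hi>highlighted</hi> text.</p>"
wrong = "<p>Some <hi>highlighted</p> text.</hi>"
draft = "<p>Un sot <!-- FIXME: which language is this? --> ...</p>"

print(is_well_formed(right))
print(is_well_formed(wrong))
print(find_fixmes(draft))
```

The regular expression is used for the comment search because the standard parser discards comments by default; this is a shortcut for proofreading, not a full XML comment parser.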
Numbered headers refer to the corresponding section in the These are examples for the official header and footer in a PGTEI
text. The This eBook is for the use of anyone anywhere at no cost and
with almost no restrictions whatsoever. You may copy it, give it
away or re-use it under the terms of the Project Gutenberg
License online at www.gutenberg.org/license And this is the footer: Composite texts are not supported. Unsupported Unsupported. On a block element the attribute This block is left-adjusted. Sed ut perspiciatis unde omnis iste natus error
sit voluptatem accusantium doloremque laudantium, totam rem aperiam,
eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae
vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas
sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores
eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est,
qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit,
sed quia non numquam eius modi tempora incidunt ut labore et dolore
magnam aliquam quaerat voluptatem. This block is centered. Sed ut perspiciatis unde omnis iste natus error
sit voluptatem accusantium doloremque laudantium, totam rem aperiam,
eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae
vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas
sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores
eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est,
qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit,
sed quia non numquam eius modi tempora incidunt ut labore et dolore
magnam aliquam quaerat voluptatem. This block is right-adjusted. Sed ut perspiciatis unde omnis iste natus error
sit voluptatem accusantium doloremque laudantium, totam rem aperiam,
eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae
vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas
sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores
eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est,
qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit,
sed quia non numquam eius modi tempora incidunt ut labore et dolore
magnam aliquam quaerat voluptatem. This block is left- and right-justified. Sed ut perspiciatis unde omnis iste natus error
sit voluptatem accusantium doloremque laudantium, totam rem aperiam,
eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae
vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas
sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores
eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est,
qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit,
sed quia non numquam eius modi tempora incidunt ut labore et dolore
magnam aliquam quaerat voluptatem. This entity is rendered as a block and has wider margins. Sed ut perspiciatis unde omnis iste natus error
sit voluptatem accusantium doloremque laudantium, totam rem aperiam,
eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae
vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas
sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores
eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est,
qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit,
sed quia non numquam eius modi tempora incidunt ut labore et dolore
magnam aliquam quaerat voluptatem. Use for examples of code. This block is rendered in a
monospaced font. Line breaks are preserved. This block gets indented by n em-spaces. n may be negative. Sed ut perspiciatis unde omnis iste natus error
sit voluptatem accusantium doloremque laudantium, totam rem aperiam,
eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae
vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas
sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores
eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est,
qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit,
sed quia non numquam eius modi tempora incidunt ut labore et dolore
magnam aliquam quaerat voluptatem. This element starts a new page.
... This element starts a new right-hand page.
... Floats the division to the left or right margin (HTML mode)
or to the top or bottom of the page or to a special page (PDF mode).
Valid value is a string composed of one or more of the option letters:
The division is floated to the left margin. HTML mode only.
The division is floated to the right margin. HTML mode only.
The floated division may stay here if there is enough room left on this page. PDF mode only.
The division may float to the top of the current page if there is enough room for both it and the previous text. If this is not the case, it is added at the top of the next page. The subsequent text continues on the current page. PDF mode only.
The division may float to the bottom of the current page. The subsequent text continues until the room left on the current page is just enough for the float. If there is already insufficient room, the float will be put at the bottom of the next page. PDF mode only.
The division may float to a special page containing only floats. PDF mode only.

The picture in the next example will float to the left margin
in HTML mode. In PDF mode it will appear at this point in
the text if there is enough room left on the page, else it will
float to the top of the next page.
You may also use one of the following shortcuts:
Shortcut for Shortcut for Shortcut for Shortcut for Shortcut for Attribute This paragraph will not have its first line indented. If you try to mark up an embedded letter (piece of correspondence)
you'll be surprised to find that the simple approach doesn't validate.
Use this approach instead:
His book on animals and plants ...
...
Oh, bless you, it doesn't matter in the least.
If the man is caught, it will be
Oh, bless you, it doesn't matter in the least.
If the man is caught, it will be
Un sot trouve toujours un plus sot qui
l'admire.
All Block Elements
:
His book on animals and plants ...
...
Use
In the year 1878 I took my degree of Doctor ...
The campaign brought honours and promotion to many, ...
...We met next day as he had arranged, and inspected the rooms at No. 221b, Baker Street, of which he had spoken at our meeting. ...
... Attribute
Example:
Will be rendered as:
Will be rendered as:
This tag has different semantics than in TEI: without an
You can generate thought-breaks if you set the
These are the supported values for the generates a thought-break consisting of n stars (asterisks). generates a horizontal rule that is n % the width of the text.
These are the supported values for the
for text in italics
for bold text
for underlined text
for text in Small Capitals
for superscript text
for subscript text
for expanded text
for strikeout text
for smaller text
for small text
for large text
for larger text
for tty-type text
where x is a font family name:
Times New Roman,
Courier or
Zapf Chancery.
Note that display depends also on the fonts actually
available on the user's machine.
where x is a percentage value:
50%
75%
100%
150%
200%
where x is a value between 100 and 900:
400
700
900.
Note that display depends also on the fonts actually
available on the user's machine.
The default rendering for italic.
Attribute
This quote is rendered as a displayed paragraph.
This quote has the opening mark only.
This quote has only the closing mark.
This quote has no quotation marks.
The first thing that put us out was that advertisement. Spaulding, he came down into the office just this day eight weeks, with this very paper in his hand, and he says:
I wish to the Lord, Mr. Wilson, that I was a
red-headed man.
Will be rendered as:
The first thing that put us out was that
advertisement. Spaulding, he came down into the office just this
day eight weeks, with this very paper in his hand, and he
says:
I wish to the Lord, Mr. Wilson, that I was a
red-headed man.
This note ref should not display in the toc.
The
The
The note text should always be enclosed in paragraphs.
Hannibal, Missouri.
Will be rendered as:
When I was a boy, there was but one permanent ambition among
my comrades in our village Hannibal, Missouri.
The handling of footnotes depends on the output format: if the format has facilities for pagination (PDF) the footnote appears at the bottom of the current page. If the format has no such facilities (HTML, TXT, PDB) the footnote appears at the end of the text. In HTML the footnote marker will be linked to the footnote text.
The endnote is less intrusive than the footnote and you should use it for any notes you add to the text yourself.
In the PDF format the endnotes get listed in the back matter with
the page number. Because the user can only see the page number and not
the exact position where the note is attached, you should insert a
short
Will be rendered as:
Today about three o'clock the proofs of this paper arrived
from the printers. The exercise consists of half a chapter of
Thucydides
Note: links work only in the HTML and PDF formats.
Use these for internal links.
Attribute
Wouldn't you like to know? ...
...Use these for external links.
Attribute
New attribute
Attribute
Always put one or more paragraphs (
Attribute Use this to give the table rules around every cell. PDF output only. Use to give TeX hints about the table columns. The table is
implemented using the LaTeX longtable environment.
TXT output only. Use to give nroff hints about the table columns. The table is
implemented using the
Attribute
Only PNG and JPEG formats are supported at present.
Attribute
New attribute
Attribute rend
Use for examples. This is a block element, with line breaks preserved. In HTML it is also rendered as a shaded box.
Will be rendered:
Attribute In PDF output mode this will pipe the contents of the
In HTML output mode the contents of the In all other output modes it will be ignored. In HTML output mode the contents of the In all other output modes it will be ignored. In HTML and PDF output modes the SVG contents of the In all other output modes it will be ignored.
Example:
Will display as:
This is an inlined formula:
Example:
Note the use of a CDATA section to avoid having to replace all &s with &amp;. A CDATA section starts with <![CDATA[ and ends with ]]>.
Will display as:
An embedded SVG image.
Will display as:
Attribute
Attribute
Generates a standard title page
from the
Generates a colophon from
the Credits
.
Generates a table of contents from
Contents
.
Generates a standard PG header appropriate for the output format.
Generates a standard PG footer appropriate for the output format.
Generates a footnotes
section. This section is automatically populated with the contents of
the Notes
.
Here is an example for the front matter:
And this is an example for the back matter:
Attributes
Attribute
The table of contents.
The bookmarks section of a PDF file.
Note that no special characters (like
The bookmarks section of a PDB file. Note that PDB can accommodate a maximum of 15 characters per bookmark. Strings exceeding this length will be truncated.
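Before posting, it can help to see which headings will be affected by the 15-character limit. This is a small sketch under two assumptions that the guide does not state: that truncation keeps the first 15 characters, and that the limit counts characters rather than bytes. The function name and the sample headings are hypothetical.

```python
PDB_BOOKMARK_LIMIT = 15  # maximum bookmark length stated in this guide


def truncated_bookmarks(headings, limit=PDB_BOOKMARK_LIMIT):
    """Return (heading, truncated form) pairs for headings longer than limit."""
    # Assumption: truncation simply keeps the leading characters.
    return [(h, h[:limit]) for h in headings if len(h) > limit]


# Hypothetical chapter headings.
headings = ["Preface", "A Scandal in Bohemia", "Notes"]
for full, short in truncated_bookmarks(headings):
    print(f"{full!r} would become {short!r}")
```

Headings flagged this way are candidates for a shorter alternate bookmark text.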
Element
You should use a Unicode-capable editor to edit your files and save them in UTF-8 encoding. If you cannot do that, you'll have to choose a different encoding and enter all characters your encoding cannot handle with XML entities. To do that, you'll have to find out the Unicode code point of the character first.
You
If you use the ISO-8859-1 encoding to save your TEI file, you will not be able to enter these characters directly. You can still get them if you write:
If you are using a Unicode-capable editor, you can just enter the characters directly.
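If you are stuck with ISO-8859-1, looking up code points and writing the replacements can be automated. The sketch below uses Python's built-in `xmlcharrefreplace` error handler, which writes each unencodable character as a decimal numeric character reference. The function names and the sample string are illustrative assumptions, and the guide's own examples may use named entities instead.

```python
def to_latin1_with_entities(text):
    """Replace every character outside ISO-8859-1 with a numeric character reference."""
    return text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")


def code_point(char):
    """Return the Unicode code point of a single character in U+XXXX notation."""
    return f"U+{ord(char):04X}"


sample = "Monte Video \u2014 Maldonado"  # \u2014 is the em dash
print(code_point("\u2014"))
print(to_latin1_with_entities(sample))
```

Characters that ISO-8859-1 can represent, such as é, pass through unchanged, so the function is safe to run over a whole file.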
I pity the man who can travel from Dan to Beersheba, and say 'Tis all barren; and so is all the world to him who will not cultivate the fruits it offers.
An epigraph contains a quotation, anonymous or attributed, appearing at the start of a section or chapter, or on a title page. An epigraph is rendered in smaller type and right adjusted.
Monte Video — Maldonado — Excursion to R Polanco — Lazo and Bolas — Partridges — Absence of Trees — Deer — Capybara, or River Hog — Tucutuco — Molothrus, cuckoo-like habits — Tyrant Flycatcher — Mocking-bird — Carrion Hawks — Tubes formed by Lightning — House struck
A formal list or prose description of the topics addressed by a subdivision of a text.
Experimental
Partially supported through the
You should not try to build a conformant header by yourself (unless you are smarter than I am) but just copy the provided header template and modify the appropriate entries.
Used to insert conditional text.
Attribute
Test if the text requires a footnote section.
Only paginated output formats like PDF can place the footnotes at the foot of the page. Other formats like HTML don't know pages at all, so we have to place the footnotes at the end of the whole text. (PDF too can have endnotes — notes that appear at the end of the book instead of at the foot of the page.)
This example creates a
Attribute
Test if the output format is HTML.
Test if the output format is TeX. TeX is presently used for PDF generation.
Test if the output format is NROFF. NROFF is presently used for TXT and PDB generation.
If you use this feature, your text will need revision to accommodate any change in the TEI processing system. For instance, it is not guaranteed that PDF output will always be generated by TeX, nor that TXT will always go through NROFF.
Will be rendered as (if you are viewing the PDF file you will see
true mirrored text Technical information: You may wonder why we don't use the
convert formula to image feature here to generate the
reflected text in HTML. Actually \reflectbox is a command of the
pdflatex driver. To convert formulas into images we use the dvips
driver because of its higher output quality.
The
This is a diagram showing how the conversion is done.
The XSLT stylesheets do the bulk of the work. The Perl script calls XSLT at the right moments and fixes up things that are just too difficult to get right with XSLT, like the correct placement of newlines, which is crucial to TeX and nroff.
nroff is called twice with slightly different parameters: with the latin1 device and line breaking on for TXT, and with a custom PDB device and line breaking off for PDB. The PDB device is customized for the special Palm OS character set.
The Gnutenberg Press is released under the
You may
To use the Gnutenberg Press you need these tools:
Get libxml2 and libxslt from the The Pathologically Eclectic Rubbish Lister by Larry
Wall in a version >= 5.8.0. Get Perl from the The typesetting system invented by Donald Knuth. Get TeX
from the Get You need a patched version of
If you are running a fairly recent Linux distribution you
should already have got most of them. If you are on Windows
you'll have to sweat some to get them all, but, if you run
Windows, you
If you have non-ISO-8859-1 characters in the headings, the PDF
conversion will choke. You'll have to use the
In this example the PDF converter would choke on the mdash character in the heading. Thus you have to provide an alternate heading for the PDF bookmark section.
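Headings that need an alternate bookmark text can be found ahead of time by testing whether they fit into ISO-8859-1. This is a minimal sketch; the function name and the sample headings are hypothetical, and it only detects the problem, it does not write the alternate heading.

```python
def needs_alternate_heading(heading):
    """Return True if heading contains a character outside ISO-8859-1."""
    try:
        heading.encode("latin-1")
        return False
    except UnicodeEncodeError:
        return True


# Hypothetical headings; \u2014 is the em dash (the mdash entity).
headings = ["Chapter I", "Monte Video \u2014 Maldonado"]
flagged = [h for h in headings if needs_alternate_heading(h)]
print(flagged)
```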