Archimedes Project Digital Library









4. The technology applied in the Archimedes Project

The way in which the Archimedes Project deals with the digitization of historical sources and their analysis is determined by the necessary interplay between technical and scholarly work. This interplay is reflected in the development and use of some key instruments of the project. Indeed, the first phase of the project was, apart from the continuation of data entry, mainly dedicated to the development of these instruments and their implementation in the project's production line.

The production line follows a number of well-defined steps:

  • text selection and the data entry of sources,
  • automatic minimal XML tagging,
  • interactive tagging of formal source structures such as chapter divisions,
  • automatic generation of metadata such as the morphological analysis of words and the establishment of links to an integrated system of sources and metadata, such as dictionaries and bibliographic references,
  • interactive creation of scholarly metadata within the content-based access system.
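The second step of the production line, automatic minimal XML tagging, might look roughly like the following sketch. It is an illustration only and not the project's actual tool: the element names `text` and `p`, the attribute `n`, and the convention that paragraphs are separated by blank lines are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def minimal_tagging(raw_text):
    """Wrap each blank-line-separated paragraph in a numbered <p> element.

    The element and attribute names are hypothetical, chosen for illustration.
    """
    root = ET.Element("text")
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        p = ET.SubElement(root, "p", n=str(number))
        # Collapse line breaks and repeated spaces left over from data entry.
        p.text = " ".join(paragraph.split())
    return root

# Placeholder input standing in for a transcribed source.
sample = "First paragraph of\na transcription.\n\nSecond paragraph."
xml_string = ET.tostring(minimal_tagging(sample), encoding="unicode")
print(xml_string)
```

The result is a well-formed XML skeleton to which the later interactive steps could add structural tags such as chapter divisions.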

All of these steps require tools which are continuously being refined in tandem with the improvement of the production line. The aim of the project is to make all the tools openly available as soon as they have reached an appropriate state of development.

Three key instruments will briefly be described here:

  • "Digilib," an image server for which pilot installations with some 10,000 images exist on two servers, one in Germany, the other in Switzerland,
  • an image and text display environment for which a test implementation exists on the internal Archimedes server in Berlin,
  • "Arboreal," a working environment for content analysis allowing the production of scholarly metadata; although it is still under development, a release with basic functions can be downloaded from the Harvard Archimedes server.

In addition to these three key instruments, a content-based access system is currently under design. It is intended to allow collaborative scholarly work to be performed on the web and to turn the results of such work into navigation devices for the Archimedes sources.

4.1 The image server "Digilib"

"Digilib" is an image server with annotation facilities. It has been developed in close cooperation with the University of Bern, where Gerd Grasshoff, a former member of the research group at the Max Planck Institute for the History of Science, now holds the chair for the philosophy of science.


The image server enables scholars to collaborate via the web on a distributed collection of images. High-speed transmission to a local site is made possible by pre-scaling the images before they are transferred. Apart from consulting a scaled image, the user can zoom in on the high-resolution image stored on the server and place up to eight marks per page. These marks may be used as references for linking commentaries to locations on the image via URLs provided by the image server. The image server furthermore generates thumbnails of the images on the fly, which assist orientation in lengthy texts.
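The two ideas in this paragraph, pre-scaling and marks addressed through URLs, can be sketched as follows. This is not Digilib's actual interface: the base URL, the query parameters `page` and `marks`, and the choice of relative coordinates between 0 and 1 are assumptions made for this example. Only the limit of eight marks per page is taken from the text.

```python
from urllib.parse import urlencode

MAX_MARKS = 8  # the text states that up to eight marks can be placed per page

def scale_factor(image_w, image_h, window_w, window_h):
    """Largest factor (never above 1.0) at which the image fits the window."""
    return min(window_w / image_w, window_h / image_h, 1.0)

def reference_url(base_url, page, marks):
    """Build a URL that records a page and its marks.

    `marks` is a list of (x, y) pairs in relative coordinates (0.0 to 1.0),
    so that they stay valid at any scale. Parameter names are hypothetical.
    """
    if len(marks) > MAX_MARKS:
        raise ValueError("at most eight marks per page")
    mark_string = ";".join(f"{x:.3f},{y:.3f}" for x, y in marks)
    return base_url + "?" + urlencode({"page": page, "marks": mark_string})

# A 3000 x 4000 pixel scan shown in a 600 x 800 pixel window.
factor = scale_factor(3000, 4000, 600, 800)
url = reference_url("http://example.org/viewer", 12, [(0.25, 0.5), (0.75, 0.1)])
print(factor, url)
```

Storing marks in relative rather than pixel coordinates is one possible design choice: the same reference then points to the same spot whether the reader sees the scaled image or the zoomed high-resolution one.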

4.2 The basic internet display environment

While the image server allows sources to be dealt with as soon as digital reproductions are available, that is, in an early phase of the production line, the Archimedes basic display environment makes it possible to combine the display of digital facsimiles with the transcribed Archimedes texts, which in turn are enhanced by language technology. Even in its provisional form it allows for browsing with the help of thumbnails, page images, and text retrieval, and also allows for connection to high-resolution images of a given page.


The language tools, which are based on an implementation and further development of the Perseus technology, hinge on the idea of combining the morphological analysis of a word with a link to one or several freely available dictionaries. Following up on earlier work of the Perseus Project on ancient Greek and Latin, and in close collaboration with the Perseus Project, the Archimedes Project is building analogous environments for Italian (which is available in a provisional form), as well as Arabic, Dutch, English, and German (which are under development).
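The principle of combining a morphological analysis with a dictionary link can be illustrated with a toy lookup. The sketch below does not reproduce the Perseus technology or its data: the two tables are tiny hand-made stand-ins, and the function name `look_up` is hypothetical.

```python
# Toy morphological table: inflected form -> possible (lemma, analysis) pairs.
# Hand-made illustrative data, not taken from any project resource.
MORPHOLOGY = {
    "ponderis": [("pondus", "noun, genitive singular")],
    "librae": [
        ("libra", "noun, genitive singular"),
        ("libra", "noun, nominative plural"),
    ],
}

# Toy dictionary: lemma -> short English gloss.
DICTIONARY = {
    "pondus": "weight",
    "libra": "balance, pair of scales",
}

def look_up(word):
    """Return every analysis of `word`, each linked to a dictionary entry."""
    results = []
    for lemma, analysis in MORPHOLOGY.get(word.lower(), []):
        results.append({
            "form": word,
            "lemma": lemma,
            "analysis": analysis,
            "entry": DICTIONARY.get(lemma),  # None if no dictionary covers it
        })
    return results

for result in look_up("librae"):
    print(result)
```

Because the dictionary is keyed by lemma rather than by inflected form, any form that the morphological step can analyze leads to the same entry.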

The crucial point of this technology is the choice and the unrestricted availability of appropriate dictionaries. At present, a modern English-Italian dictionary has been made available, as well as two historical dictionaries from the early modern period, one for Italian-English, the other for Latin-English. A major Arabic-English dictionary is currently being digitized.

4.3 The XML tool "Arboreal"

True content-based access requires a tight integration of scholarly work and technology. Based on research in the history of mechanical knowledge at the Max Planck Institute for the History of Science, the structures documenting this knowledge in the historical sources can be systematically identified with the help of formal and linguistic clues in these sources. Among these structures are concepts expressed in technical terminology; mental models shaping, for instance, ideas about how a balance works (typically reflected by aggregations of technical terms); and larger chunks of content that are more or less coherently transmitted in history and represented by entire texts or larger parts of texts dealing, for instance, with the so-called "simple machines." The Archimedes Project has therefore been developing a working environment for content analysis which helps not only to identify these structures but also to generate the scholarly metadata, harvesting the results of such an analysis in the form of structured annotations to an XML source text. The idea of the "Arboreal" software, conceived specifically for this purpose, is to create an editor that supports the identification of technical terms, mental models, and chunks of content, and that also allows for a comparison between different text editions and translations.

An example may elucidate this idea. One of the key sources in the Archimedes collection is a treatise on mechanics published in 1577 in Latin by Galileo Galilei's patron Guidobaldo del Monte. This text represents the first comprehensive early modern treatise on mechanics synthesizing both contemporary and ancient knowledge. Four years later Guidobaldo published an Italian translation of this text prepared by Pigafetta.


Assume now that the XML file of Guidobaldo del Monte's original Latin version has been loaded into "Arboreal" as a "master text" together with Pigafetta's Italian translation and with a modern English translation as "slave texts," matched to the Latin version by way of the XML structure. Chunks of these three texts automatically appear - rendered via Unicode - in separate windows of Arboreal as soon as they are selected in the main window of the XML master text. In order to facilitate browsing, Arboreal comprises links to figures in the text and also allows page images to be loaded.
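Matching texts "by way of the XML structure" can be pictured as looking up the same structural identifier in each file. The following sketch assumes, purely for illustration, that every version marks its chunks with a `chunk` element carrying a shared `id`; neither the element name nor the placeholder contents come from the actual Archimedes files.

```python
import xml.etree.ElementTree as ET

# Placeholder documents; the bracketed strings stand in for real text.
MASTER_XML = (
    '<text lang="la"><chunk id="c1">[Latin chunk 1]</chunk>'
    '<chunk id="c2">[Latin chunk 2]</chunk></text>'
)
SLAVE_XMLS = {
    "italian": (
        '<text lang="it"><chunk id="c1">[Italian chunk 1]</chunk>'
        '<chunk id="c2">[Italian chunk 2]</chunk></text>'
    ),
    "english": (
        '<text lang="en"><chunk id="c1">[English chunk 1]</chunk>'
        '<chunk id="c2">[English chunk 2]</chunk></text>'
    ),
}

def index_chunks(xml_string):
    """Map each chunk id to the text of that chunk."""
    root = ET.fromstring(xml_string)
    return {c.get("id"): c.text or "" for c in root.iter("chunk")}

def aligned_view(master_xml, slave_xmls, chunk_id):
    """Collect the selected master chunk and its counterparts in each slave text."""
    view = {"master": index_chunks(master_xml)[chunk_id]}
    for name, xml_string in slave_xmls.items():
        # A slave text may lack the chunk; show an empty string in that case.
        view[name] = index_chunks(xml_string).get(chunk_id, "")
    return view

view = aligned_view(MASTER_XML, SLAVE_XMLS, "c2")
print(view)
```

Selecting a chunk in the master text then amounts to one lookup per slave text, which is what allows the parallel windows to update together.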

For an illustration of the functionalities that "Arboreal" provides for analyzing the text, take the example of a study on the role of Aristotelean natural philosophy in early modern mechanics as represented by Guidobaldo's text. Did Guidobaldo still make use of the Aristotelean distinction between natural and violent motion? Or does a concept such as that of a "neutral motion" - in some sense a predecessor of the classical notion of inertia - already make its appearance in this central text of pre-classical mechanics?

With the sophisticated search facilities of Arboreal, which cover morphological forms and regular expressions, the user can easily search for candidates for the term "neutral motion" and have them displayed in a separate window as well as highlighted in the master text window. Working in this way on the terminology for motion, the user can create a term list and attach instances to it. Such lists can be saved and reloaded, and even used for seeding the terms to other texts.
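The workflow described here (search by regular expression, attach the hits to a term list, save and reload the list) can be sketched in a few lines. This is not Arboreal's implementation: the function `search_term`, the chunk identifiers, the sample sentences, and the JSON storage format are all assumptions made for this example, and the regular expression only approximates a morphological search by allowing the plural form.

```python
import json
import re

def search_term(chunks, pattern):
    """Return (chunk_id, matched_text) for every match of `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (chunk_id, match.group(0))
        for chunk_id, text in chunks.items()
        for match in regex.finditer(text)
    ]

# Invented English sample sentences, not quotations from any source.
chunks = {
    "c1": "A natural motion goes downward.",
    "c2": "Neutral motions are neither natural nor violent.",
}

# Term list: each term keeps its search pattern and the instances found.
pattern = r"\bneutral\s+motions?\b"
term_list = {
    "neutral motion": {
        "pattern": pattern,
        "instances": search_term(chunks, pattern),
    }
}

# Saving and reloading; JSON turns the (chunk_id, text) tuples into lists.
saved = json.dumps(term_list)
reloaded = json.loads(saved)

# Seeding: reuse the stored pattern on another text.
other_text = {"d1": "Here a neutral motion is discussed."}
seeded = search_term(other_text, reloaded["neutral motion"]["pattern"])
print(reloaded, seeded)
```

Keeping the pattern together with the instances is what makes seeding possible in this sketch: the saved list carries enough information to repeat the search on a text it has never seen.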

The particular list which can easily be generated in this way makes it immediately evident that Guidobaldo still used the traditional Aristotelean distinctions, but also that he introduced the term "neutral motion" which played such a prominent role in the later work of Galileo. Each of the terms in this list is linked to instances in the master text that have been identified as pertaining to the relevant term. The instances already identified can now actually be used to navigate further through the text, thus providing a powerful example of content-based access.