Behind the Curtains of DDI, Why Not a Relational Metadatabase?

Publication Type: Presentation
Year of Publication: 2001
Authors: Hadorn R
Abstract

The initiative taken in 1995 to develop a new format for metadata was an important one, and the scheme we work with at present, in its XML form, is very promising given the development of so many new concepts around XML. The co-operative work behind that format makes it look like the coming standard. Nevertheless, every day more data producers are looking for the metadata model they lack, so the discussion about which metadata structure is appropriate for given data and specific uses is far from over.

The same can be said about the language in which the metadata structure is expressed. In fact, the XML DDI defines a metadata structure that could have been expressed in other languages, for example as a data schema made of tables and relations between tables. This would have been as general a choice as an SGML or XML DTD. The basic format is as simple and general as XML files, insofar as tables can be exported to tab-delimited text files at any time. The advantage of a relational DB structure is that it can be managed with many RDBMSs and their advanced tools. RDBMSs are management tools that make it easy to enforce all kinds of consistency checks on the data. In addition, the relational approach encourages making explicit the possible interrelations between datasets, whereas the SGML/XML approach starts from individual documents. The advantage we can see now in the DDI format, thanks to the creation of XML and the conversion from SGML to XML, lies in communication: an XML DTD is a good vehicle for carrying structured information from one application to another. At present, however, we lack the general management tools that would make it a good preservation and management format.
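To make the relational alternative concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The three tables (dataset, variable, category) and their columns are hypothetical and are not meant to reproduce any archive's actual schema. The sketch illustrates the two claims made above: the database itself rejects incoherent metadata (a value label pointing at a variable that does not exist), and any table can be exported to tab-delimited text at any time.

```python
import csv
import io
import sqlite3

# Hypothetical schema for variable-level metadata; names are illustrative only.
SCHEMA = """
CREATE TABLE dataset  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE variable (id INTEGER PRIMARY KEY,
                       dataset_id INTEGER NOT NULL REFERENCES dataset(id),
                       name  TEXT NOT NULL,
                       label TEXT,
                       UNIQUE (dataset_id, name));
CREATE TABLE category (variable_id INTEGER NOT NULL REFERENCES variable(id),
                       value TEXT NOT NULL,
                       label TEXT NOT NULL,
                       PRIMARY KEY (variable_id, value));
"""
TABLES = {"dataset", "variable", "category"}


def open_db():
    """Open an in-memory database with foreign-key checks on and the schema created."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def export_tab_delimited(conn, table):
    """Dump one table as tab-delimited text, header row first."""
    if table not in TABLES:  # table names cannot be bound as SQL parameters
        raise ValueError(f"unknown table: {table}")
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur.fetchall())
    return buf.getvalue()


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO dataset VALUES (1, 'Example survey')")
    conn.execute("INSERT INTO variable VALUES (10, 1, 'q1', 'Life satisfaction')")
    conn.execute("INSERT INTO category VALUES (10, '1', 'Satisfied')")
    try:  # variable 99 does not exist, so the database refuses the row
        conn.execute("INSERT INTO category VALUES (99, '1', 'Orphan')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
    print(export_tab_delimited(conn, "category"), end="")
```

The coherence check lives in the schema rather than in each editing tool, so every interface built on top of the database benefits from it.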

The DDI is first and foremost a model for metadata structure; that this structure takes the form of an XML DTD is secondary. Insofar as the potentials of a relational and an XML structure are complementary, we may well need both. We do not yet know what XML can do and are somewhat stuck between glamorous promises and very few concrete applications. So the question of the best way to keep metadata for long-term preservation and management is still legitimate. The SGML/XML DDI was planned as an exchange format; the development of the various XML facets suggests that one day we will be able to do much more with it, but for now it could be just as wise to manage the metadata in some other kind of repository.

The Data Documentation Initiative has opted for SGML/XML; that much is settled. When it comes to producing these files, however, procedures differ greatly from one archive to another, depending on the tools already available and on each archive's plans. Some use dedicated XML editors and keep the metadata in this form. NSD relies on the NSDStat format for several operations, and others probably use some kind of database. Unfortunately, the current competition for the best online data access tends to push work on these alternatives into the archives' back kitchens, if not their cellars, as if the DDI were expected to solve all our documentation problems.

These questions converge with another preoccupation. Our first problem as data archivists is to find structured and detailed metadata to put into the DDI. Where will we get them from? In most cases, we are happy when some kind of codebook, containing question wordings and frequency distributions, is deposited with the data. Its format rarely allows more than browsing it on screen or printing it out. Wouldn't it be nice if researchers had a tool that invited them to structure their metadata in a way that made other uses possible? Could a relational DB with nice interfaces not do this?

At SIDOS we consider that a relational database offers AT THE MOMENT the best flexibility and the widest range of opportunities:

  • to manage the metadata in an economical way
  • to program interfaces for metadata capture
  • to program interfaces for metadata editing
  • to program various output formats for different uses of the metadata (including XML files for Nesstar)
  • to update changing information or correct errors
  • to offer the researchers themselves a tool for metadata management
  • to do all this in a container which allows for various relations among datasets.
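The output-format and update items in the list above can be sketched as follows: metadata are kept once in relational tables and an XML rendering is regenerated from them whenever needed, so a correction made in one row reaches every output. This is a minimal Python sketch under stated assumptions: the two-table schema is hypothetical, and although the element names (dataDscr, var, labl, qstn/qstnLit, catgry/catValu) are taken from the DDI Codebook, the fragment is simplified, not validated against the DTD, and does not model Nesstar's actual import requirements.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical minimal schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE variable (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       label TEXT, question TEXT);
CREATE TABLE category (variable_id INTEGER NOT NULL REFERENCES variable(id),
                       value TEXT NOT NULL, label TEXT NOT NULL,
                       PRIMARY KEY (variable_id, value));
"""


def to_ddi_fragment(conn):
    """Render all variables as a simplified DDI Codebook <dataDscr> fragment."""
    data_dscr = ET.Element("dataDscr")
    rows = conn.execute("SELECT id, name, label, question FROM variable ORDER BY id")
    for var_id, name, label, question in rows.fetchall():
        var = ET.SubElement(data_dscr, "var", ID=f"V{var_id}", name=name)
        if label:
            ET.SubElement(var, "labl").text = label
        if question:
            qstn = ET.SubElement(var, "qstn")
            ET.SubElement(qstn, "qstnLit").text = question
        # Value labels in insertion order (rowid), one <catgry> per category.
        cats = conn.execute(
            "SELECT value, label FROM category WHERE variable_id = ? ORDER BY rowid",
            (var_id,))
        for value, cat_label in cats.fetchall():
            catgry = ET.SubElement(var, "catgry")
            ET.SubElement(catgry, "catValu").text = value
            ET.SubElement(catgry, "labl").text = cat_label
    return ET.tostring(data_dscr, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO variable VALUES "
                 "(1, 'q1', 'Life satisfaction', 'How satisfied are you with your life?')")
    conn.executemany("INSERT INTO category VALUES (1, ?, ?)",
                     [("1", "Satisfied"), ("2", "Not satisfied")])
    # Correct a label once in the database, then regenerate the XML.
    conn.execute("UPDATE category SET label = 'Very satisfied' "
                 "WHERE variable_id = 1 AND value = '1'")
    print(to_ddi_fragment(conn))
```

The same tables could feed other renderers (a printed codebook, a capture form) in the same way, which is the economy the list above refers to.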

So SIDOS started to develop a relational database for variable-level metadata, as a complement to its existing database that holds research project and dataset descriptions for the catalogue. We want it:

  • to encourage metadata creation during research
  • to facilitate the publication of a coherent set of metadata by the researcher him/herself
  • to make the transmission of structured metadata to the archive at the time of deposit manageable
  • to serve as a tool to edit metadata for XML DDI production
  • to feed some kind of online question database.
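The last aim in the list above, feeding an online question database, is where relations among datasets pay off: if each question wording is stored once and linked to the variables that use it, a single query finds every dataset in which it was asked. The sketch below, in Python with sqlite3, assumes a hypothetical three-table schema (dataset, question, variable) and hypothetical dataset titles and variable names; it is not the design of any particular archive.

```python
import sqlite3

# Hypothetical schema: one row per question wording, reused by variables
# in several datasets (for example, successive waves of a panel).
SCHEMA = """
CREATE TABLE dataset  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE question (id INTEGER PRIMARY KEY, wording TEXT NOT NULL UNIQUE);
CREATE TABLE variable (dataset_id  INTEGER NOT NULL REFERENCES dataset(id),
                       question_id INTEGER NOT NULL REFERENCES question(id),
                       name TEXT NOT NULL,
                       PRIMARY KEY (dataset_id, name));
"""


def find_question(conn, term):
    """Return (wording, dataset title, variable name) for every use of a matching question.

    SQLite's LIKE is case-insensitive for ASCII letters; '%' and '_' in
    `term` act as wildcards.
    """
    return conn.execute(
        """
        SELECT q.wording, d.title, v.name
        FROM question AS q
        JOIN variable AS v ON v.question_id = q.id
        JOIN dataset  AS d ON d.id = v.dataset_id
        WHERE q.wording LIKE '%' || ? || '%'
        ORDER BY d.id, v.name
        """,
        (term,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dataset VALUES (?, ?)",
                     [(1, "Panel wave 1"), (2, "Panel wave 2")])
    conn.execute("INSERT INTO question VALUES (1, 'How satisfied are you with your health?')")
    conn.executemany("INSERT INTO variable VALUES (?, 1, ?)",
                     [(1, "health_w1"), (2, "health_w2")])
    for row in find_question(conn, "health"):
        print(row)
```

Because the wording is stored once, correcting it also corrects every dataset that reuses it, which a set of independent per-study documents cannot guarantee.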

The presentation will show the position of the database in the production process of SIDOS and highlight some of its features. Examples will be taken from the documentation of the newly created Swiss Household Panel and from a comparative project of the ICRC, People on War.

URL: http://iassist.dans.knaw.nl/us/ia2001/list/hadorn.htm