DDI Alliance, 2020-04-12
This document provides an overview of what the work products of the DDI Alliance are, what purpose they serve, and how they are maintained by the Alliance.
The overarching aim of the DDI Alliance is to “create an international standard for describing data from the social, behavioral, and economic sciences”. Since its inception in 1995, the development of the DDI Alliance work products has moved in step with the evolving needs of the user communities seeking to document the data and the processes around its collection.
The work products have also evolved alongside developments in technology and in the computing infrastructure that supports them. XML [1], in which the current work products are expressed, has itself changed and developed. The standards with which the work products are aligned, such as Dublin Core [2], ISO/IEC 11179 [3], and most recently GSIM [4], have also evolved, and the technologies available to implement the work products have matured.
Such broad aims necessitate a level of complexity, but there is an underlying logic to what has been produced and to how the products relate. This document clarifies the purpose of each work product and explains why there is sometimes overlap, but rarely duplication.
This overview will help users to determine which of the work products to implement, and to know what to expect in light of future developments. New development work continues and is described on the Developing Products page.
The document is aimed at the wide range of audiences that use, or envisage using, the work products of the DDI Alliance. These include data archives, national statistical agencies, individual researchers and research networks, data managers, and intergovernmental organizations.
There are several DDI Alliance work products, each of which exists for a different purpose. Each is discussed separately. As a standards organization, the DDI Alliance does not require users to implement a particular work product or a particular version of a work product. It offers each work product for an intended purpose, in a form that is useful to some members of the DDI user community. These work products are simply tools to be used – there is no inherent value in upgrading to a newer version of any given DDI work product or specification unless doing so benefits the implementer (unlike software, where upgrading to the latest version is often recommended).
Because of this, all existing DDI work products will be maintained over time, to support the users who have implemented that work product. Current work products include:
- DDI-Codebook – DDI-Codebook is an XML structure for describing codebooks (or data dictionaries), for a single study.
- DDI-Lifecycle – DDI-Lifecycle (also expressed in XML) expands on the coverage of a single study along the data lifecycle and can describe several waves of data collection, and even ad hoc collections of datasets grouped for the purposes of comparison. It is very useful when dealing with serial data collection as is often seen in data production within statistical offices and long-standing research projects.
- XKOS – The Extended Knowledge Organization System (XKOS) builds on the widely used Simple Knowledge Organization System (SKOS) to manage statistical classifications and concept management systems. XKOS adds the extensions needed to meet the requirements of the statistical community.
- Controlled Vocabularies – The DDI Controlled Vocabularies are recommended sets of terms and definitions. They are used to describe various types of activities or artefacts which exist within the DDI-Codebook, DDI-Lifecycle, and DDI-RDF Discovery structures.
A. DDI-Codebook
The DDI-Codebook specification is the original work product of the Alliance, and has gone through several version changes. The latest version is DDI-Codebook version 2.5.1. DDI-Codebook was not always referred to as such – the “codebook” part of the title was added when DDI-Lifecycle was developed, to distinguish between the two. DDI-Codebook is an XML structure for describing codebooks (or data dictionaries) for a single study (a study in DDI parlance is a single wave of data collection). DDI-Codebook is designed to be used after the fact: it assumes that a data file exists as the result of a collection, and that file is then described using the XML structure provided by the specification.
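As a rough illustration of this after-the-fact description, the sketch below reads a tiny, hand-written codebook for one hypothetical study and lists its variables. The element names (`codeBook`, `stdyDscr`, `dataDscr`, `var`, `labl`, `catgry`, `catValu`) and the `ddi:codebook:2_5` namespace are assumed to follow the DDI-Codebook 2.5 schema; the sample content and the `summarize_codebook` helper are invented for this example and are not part of the specification.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for DDI-Codebook 2.5 instances; verify against your own files.
NS = {"ddi": "ddi:codebook:2_5"}

# A minimal, hand-written codebook for one hypothetical study.
SAMPLE = """\
<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example Household Survey, Wave 1</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var ID="V1" name="AGE">
      <labl>Age of respondent</labl>
    </var>
    <var ID="V2" name="SEX">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def summarize_codebook(xml_text):
    """Return the study title and a list of (name, label, categories) tuples."""
    root = ET.fromstring(xml_text)
    title = root.findtext(
        "ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl", namespaces=NS
    )
    variables = []
    for var in root.findall("ddi:dataDscr/ddi:var", NS):
        # Only the variable's own label, not the labels of its categories.
        label = var.findtext("ddi:labl", default="", namespaces=NS)
        categories = {
            cat.findtext("ddi:catValu", namespaces=NS): cat.findtext(
                "ddi:labl", namespaces=NS
            )
            for cat in var.findall("ddi:catgry", NS)
        }
        variables.append((var.get("name"), label, categories))
    return title, variables


title, variables = summarize_codebook(SAMPLE)
print(title)
for name, label, categories in variables:
    print(name, "-", label, categories)
```

A real codebook carries far more detail (universe, question text, summary statistics, file descriptions), so this only shows the general shape of a single-study description.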
DDI-Codebook is heavily used, being the structure behind such tools as Nesstar [5], which is used by data archives within CESSDA [6] and among Canadian academic institutions. The largest single user base is probably the International Household Survey Network (IHSN [7]), which provides tools for documenting studies conducted by statistical agencies in the developing world, some of which are based on the Nesstar tools.
DDI-Codebook is – and will be – maintained within the limited scope of its specified coverage. Any bugs will be corrected in future releases, and minor changes may be made to align DDI-Codebook with the other work products. For example, when DDI-Codebook 2.5 was released, the identification system was updated to allow DDI-Lifecycle identifiers to be included, so that metadata which overlapped between the two work products could be expressed using either the Codebook or the Lifecycle specification. In addition, content was added to cover more GSIM descriptive objects, which were also being added to DDI-Lifecycle.
B. DDI-Lifecycle
DDI-Lifecycle is the result of a more demanding set of requirements emerging from the use of DDI-Codebook. It is again a complex metadata structure expressed using XML, but it is designed for different purposes. Lifecycle can describe not only the results of data collection but also the metadata produced throughout the data collection process, from the initial conceptualization through to the archiving of the resulting data. It can describe several versions of the data and metadata as they change across the data lifecycle, hence the name “DDI-Lifecycle”.
DDI-Lifecycle can describe several waves of data collection, and even ad hoc collections of datasets grouped for the purposes of comparison. It is very useful when dealing with serial data collection as is often seen in data production within statistical offices and long-standing research projects.
DDI-Lifecycle was first released as DDI version 3.0, and is now in version 3.3.
It should be noted that, because of the purpose of DDI-Lifecycle (which includes after-the-fact data description for single studies, as a necessary part of its broader scope), any metadata which can be expressed using DDI-Codebook can also be described using DDI-Lifecycle. DDI-Lifecycle is – again, because of the purpose it was designed to serve – a more complex structure than DDI-Codebook, and requires a more complex computing infrastructure.
DDI-Lifecycle can also be used to document specific sets of metadata outside of the description of a single study or set of studies. For example, areas of commonly shared metadata such as concepts, statistical classification, or geographic structures can be described and referenced by any number of studies. Processing activities can be described and used to support a metadata-driven approach to the collection, processing and publication of data.
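The idea of describing shared metadata once and referencing it from any number of studies can be sketched in a few lines. This is a simplified model, not DDI-Lifecycle XML: the `ItemRef` and `Study` classes, the `studies_using` helper, the `org.example` agency, and the `urn:ddi:<agency>:<id>:<version>` layout are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemRef:
    """Versioned identifier for a reusable metadata item (hypothetical class)."""
    agency: str
    item_id: str
    version: str

    def urn(self) -> str:
        # Assumed URN layout: urn:ddi:<agency>:<id>:<version>
        return f"urn:ddi:{self.agency}:{self.item_id}:{self.version}"


@dataclass
class Study:
    """One wave of data collection that points at shared items by reference."""
    title: str
    concept_refs: list = field(default_factory=list)


def studies_using(studies, ref):
    """List the titles of all studies that reference the given shared item."""
    return [s.title for s in studies if ref in s.concept_refs]


# A shared concept, described once and maintained outside any single study.
employment = ItemRef("org.example", "concept-employment", "1.0.0")
shared_concepts = {employment: "Employment status"}

wave1 = Study("Labour Survey, Wave 1", [employment])
wave2 = Study("Labour Survey, Wave 2", [employment])
other = Study("Health Survey", [])

print(employment.urn())
print(studies_using([wave1, wave2, other], employment))
```

Because each study holds only a reference, a new version of the concept can be published under a new version number without altering the studies that cite the earlier one.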
C. XKOS -- Extended Knowledge Organization System
XKOS extends the Simple Knowledge Organization System (SKOS) to meet the needs of statistical classifications. It does so in two main directions. First, it defines a number of terms that enable the representation of statistical classifications with their structure and textual properties, as well as the relations between classifications. Second, it refines SKOS semantic properties to allow the use of more specific relations between concepts. Those specific relations can be used for the representation of classifications or for any other case where SKOS is employed. XKOS adds the extensions that are desirable to meet the requirements of the statistical community.
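The second direction, refining SKOS semantic properties, can be illustrated with plain subject-predicate-object tuples. The sketch assumes that `xkos:specializes` and `xkos:isPartOf` refine `skos:broader`, and that `xkos:generalizes` and `xkos:hasPart` refine `skos:narrower`; check this mapping against the XKOS specification before relying on it. The `http://example.org/classification/` namespace, the two classification items, and the `add_skos_fallback` helper are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
XKOS = "http://rdf-vocabulary.ddialliance.org/xkos#"
EX = "http://example.org/classification/"  # hypothetical namespace

# Assumed refinements: each XKOS relation maps to the generic SKOS relation
# it specializes, so SKOS-only consumers still see a usable hierarchy.
REFINES = {
    XKOS + "specializes": SKOS + "broader",
    XKOS + "isPartOf": SKOS + "broader",
    XKOS + "generalizes": SKOS + "narrower",
    XKOS + "hasPart": SKOS + "narrower",
}


def add_skos_fallback(triples):
    """Return the triples plus the generic SKOS triple implied by each XKOS one."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        generic = REFINES.get(predicate)
        if generic is not None:
            inferred.add((subject, generic, obj))
    return inferred


# Two items of a hypothetical classification: A01 is a specialization of A.
triples = {
    (EX + "A01", XKOS + "specializes", EX + "A"),
    (EX + "A", SKOS + "prefLabel", "Agriculture"),
}

result = add_skos_fallback(triples)
for triple in sorted(result):
    print(triple)
```

The more specific XKOS statement is kept alongside the inferred SKOS one, so tools that understand XKOS can still distinguish generic from partitive hierarchies.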
D. Controlled Vocabularies
The Controlled Vocabularies produced by the DDI Alliance are sets of terms and definitions used to describe various types of activities or artefacts which exist within the DDI-Codebook, DDI-Lifecycle, and DDI-RDF Discovery structures. They can be used with either XML specification (or indeed with any system that needs such terminologies). They are developed and maintained independently of the other DDI work products, and are more a supporting work product than one intended to be used in its own right. The Controlled Vocabularies are moving to a production and maintenance process run in cooperation with CESSDA. This will change the primary expression of the Controlled Vocabularies from Genericode (XML) to SKOS. Additional formats will be added as required.
The DDI Controlled Vocabularies are recommended sets of terms and definitions – organizations are free to use modified or entirely different vocabularies in their place if that better meets their organizations’ needs.
[5] System for the dissemination of statistical information and related documentation, http://www.nesstar.com/
[6] Consortium of European Social Science Data Archives, https://www.cessda.eu/