
The Data Documentation Initiative (DDI) is a suite of products for describing metadata about both quantitative and qualitative research data in the social, behavioral, economic, and health sciences. The DDI suite is a set of free standards for documenting and managing the different stages of the research data lifecycle, including conceptualization, collection, processing, distribution, discovery, and archiving.

DDI covers the following content areas:

  • Conceptual objects: concept, unit, unit type, universe, population, geographic structures, and representation
  • Methodological objects: approaches to sample selection, data capture, weighting, quality control, and process management
  • Processing: data capture, data processing, analysis, and data management
  • Quantitative and qualitative data objects: concept, universe, representation, usage, data type, record, record relationships, storage, access, and descriptive statistics
  • Data management: ownership, access, rights management, restrictions, quality standards, organization, agent management, relationship between products, versioning, and provenance

Products within the DDI suite differ in terms of their area of coverage within DDI, supported activities, and required level of infrastructure. From simple descriptive content for human understanding to structures that support metadata-driven statistics production and analysis, DDI addresses a broad area of data management needs. As a suite of standards, DDI provides a common means of identification for information objects, support for common cross-product content, and an informed means of transforming content between products.

Current DDI Products

[DDI also has several products under development. Descriptions of those products are found here.]

DDI-Codebook - Structured, descriptive documentation of the content, meaning, provenance, and access for a single data set.
DDI-Lifecycle - Expands on the idea of DDI-Codebook in terms of content coverage, depth, metadata management over time, reusable metadata, and support for the planning, capture, processing, storage, discovery and dissemination of data. It allows grouping and comparing related studies or series of studies.
Controlled Vocabularies - A set of controlled vocabularies commonly used in social science and other disciplines to support systems designed to identify, locate, and access data for research purposes.
XKOS - Extended Knowledge Organization System (XKOS) leverages the Simple Knowledge Organization System (SKOS) for managing statistical classifications and concept management systems. XKOS adds the extensions that are needed to meet the requirements of the statistical community. 
SDTL - Structured Data Transformation Language (SDTL) is an independent intermediate language for representing data transformation commands.

 

The following entries describe each product, the activities it supports, its points of contact with other DDI products, and the available metadata syntax representations.

DDI-Codebook

Description: Originally developed as an XML DTD, Codebook retains the hierarchical structure of a DTD in describing the contents of a descriptive codebook for a data set, including identification, authorship, ownership, purpose, background methodologies, source information, provenance, quality control, access, physical file structures, variables/variable groupings, and related materials. Extensive information is found within the variable description, covering the data source, derivation activity, representation, data typing, variable role, and restrictions.

Content Coverage: Codebook covers all major content areas but is, in general, limited to descriptive narrative.

Supports Activities:
  • Descriptive documentation of the content, meaning, provenance, and access for a single data set
  • Archival preservation of descriptive content
  • Input basis for more complex descriptions
  • Input content for discovery and exchange of data at the study, data file, variable, and question level
  • Input content for a structured human-readable codebook for the data set as a whole
  • Populate variable and question banks to explore available data and question structures for reuse in new surveys

Points of Contact with other DDI Products:
  • DDI-Lifecycle: supports all of the descriptive content of DDI-Codebook and retains the content relationships expressed as hierarchies in DDI-Codebook.
  • Controlled Vocabularies: a Controlled Vocabulary can be assigned to any object (element or attribute) in DDI-Codebook. DDI Controlled Vocabularies indicate the intended use of the vocabulary by DDI-Codebook.
  • XKOS: a Controlled Vocabulary assigned to a set of categories within a variable can link to an XKOS expression of a statistical classification.
  • SDTL: supports the use of SDTL content in derivations and processing information.

Available metadata syntax representations:
  • XML Schema
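
As a rough illustration of the hierarchical, variable-centered structure described for DDI-Codebook, the sketch below reads variable names, labels, and categories from a small hand-written XML fragment. It is a minimal sketch under stated assumptions: the fragment is simplified and has not been validated against the DDI-Codebook schema, the `ddi:codebook:2_5` namespace and the element names are assumed to match DDI-Codebook 2.5, and `list_variables` is a hypothetical helper, not part of any DDI tool.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for DDI-Codebook 2.5; adjust for other versions.
NS = {"ddi": "ddi:codebook:2_5"}

# Simplified, hand-written fragment; not validated against the schema.
CODEBOOK_XML = """\
<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example Household Survey</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var ID="V1" name="AGE">
      <labl>Age of respondent</labl>
    </var>
    <var ID="V2" name="SEX">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def list_variables(xml_text):
    """Return one dict per variable: name, label, and a code -> label map."""
    root = ET.fromstring(xml_text)
    variables = []
    for var in root.findall("ddi:dataDscr/ddi:var", NS):
        # Only the direct child <labl> is the variable label;
        # category labels are nested inside <catgry>.
        label = var.findtext("ddi:labl", default="", namespaces=NS)
        categories = {
            cat.findtext("ddi:catValu", namespaces=NS):
                cat.findtext("ddi:labl", namespaces=NS)
            for cat in var.findall("ddi:catgry", NS)
        }
        variables.append(
            {"name": var.get("name"), "label": label, "categories": categories}
        )
    return variables


if __name__ == "__main__":
    for variable in list_variables(CODEBOOK_XML):
        print(variable)
```

A listing like this is the kind of input a variable bank or a human-readable codebook generator might start from.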

DDI-Lifecycle

Description: DDI-Lifecycle expands on the idea of DDI-Codebook in terms of content coverage, depth, metadata management over time, reusable metadata, and support for the planning, capture, processing, storage, discovery, and dissemination of research data. DDI-Lifecycle is the most comprehensive of the DDI products, covering conceptual and methodological objects, processing, quantitative and qualitative data objects, and data management. Lifecycle is appropriate for longitudinal, linked, and other complex datasets.

Supports Activities:
  • Descriptive documentation of the content, meaning, provenance, and access for a single data set or series
  • Archival preservation of descriptive and production content
  • Metadata-driven statistical systems
  • Input content for discovery and exchange of data at the study, data file, variable, and question level
  • Input content for a structured human-readable codebook for the data set as a whole
  • Populate variable and question banks to explore available data and question structures for reuse in new surveys
  • Metadata reuse for quality control and consistency
  • Reuse of metadata within and between studies
  • Defining intended processes for data capture, processing, preservation, and access
  • Management of single or multi-mode data capture, including generation of data capture forms and instrument content

Points of Contact with other DDI Products:
  • DDI-Codebook: all of the descriptive content of DDI-Codebook, as well as its hierarchical content relationships, can be expressed in DDI-Lifecycle.
  • Controlled Vocabularies: the object Code Value Type is used in over 125 locations in DDI-Lifecycle to support the use of externally defined controlled vocabularies.
  • XKOS: an External Classification can link to an XKOS expression of a statistical classification for use as a variable representation or question response domain.
  • SDTL: supports the use of SDTL content in all command code locations.

Available metadata syntax representations:
  • XML Schema

Controlled Vocabularies

Description: A set of controlled vocabularies commonly used in social science research. It reflects uses of controlled vocabularies to support systems designed to identify, locate, and access data for research purposes. Content coverage is driven by the needs of the DDI community, but use is not limited to this community.

Supports Activities:
  • Common terms across multiple social science data sets
  • Common search and classification terms across data sets
  • Supports the use of common controlled vocabularies for the purpose of data discovery, filtering, and data management
  • Provides a source of commonly used controlled vocabularies that is easily accessible to related content areas (environment, land use, science, etc.) to support cross-disciplinary research and understanding

Points of Contact with other DDI Products:
  • DDI-Codebook: contains a number of specific locations where controlled vocabularies are commonly used; Controlled Vocabularies can be attached to any element or attribute by designating the XPath of the object that uses the identified vocabulary.
  • DDI-Lifecycle: contains over 125 locations supporting the use of a controlled vocabulary through the structure Code Value Type; all Identifiable elements support user-identified key-value pairs, which allow the use of controlled vocabularies for both the key and the value.

Available metadata syntax representations:
  • RDF: SKOS
  • XML: DDI Lifecycle Code List structure

XKOS

Description: XKOS extends the Simple Knowledge Organization System (SKOS) for the needs of statistical classifications. It does so in two main directions. First, it defines a number of terms that enable the representation of statistical classifications with their structure and textual properties, as well as the relations between classifications. Second, it refines SKOS semantic properties to allow the use of more specific relations between concepts. Those specific relations can be used for the representation of classifications or for any other case where SKOS is employed. XKOS adds the extensions that are desirable to meet the requirements of the statistical community.

Supports Activities:
  • Publication of statistical classifications for within- or between-agency reuse
  • Management of statistical classification terms, lists, families, and collections over time

Points of Contact with other DDI Products:
  • DDI-Lifecycle: maps to the Statistical Classification modeled in DDI Lifecycle 3.3; can be used as an External Classification for a variable representation or question response domain
  • DDI-Codebook: a Controlled Vocabulary assigned to a set of categories within a variable can link to an XKOS expression of a statistical classification

Available metadata syntax representations:
  • RDF/OWL
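
To make the first direction of XKOS (representing the structure of a classification on top of SKOS) more concrete, the sketch below turns a tiny, made-up classification into Turtle using plain string formatting. Everything in it is illustrative: the items, the `http://example.org/classification/` namespace, and the helper names are hypothetical, and the use of `xkos:ClassificationLevel`, `xkos:depth`, and `skos:member` reflects one reading of the XKOS vocabulary that should be checked against the specification before real use.

```python
# Hypothetical mini-classification: code -> (label, parent code or None).
ITEMS = {
    "A": ("Agriculture", None),
    "A01": ("Crop production", "A"),
    "A02": ("Forestry", "A"),
    "B": ("Mining", None),
}

BASE = "http://example.org/classification/"  # placeholder namespace

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix xkos: <http://rdf-vocabulary.ddialliance.org/xkos#> .\n"
    f"@prefix ex: <{BASE}> .\n"
)


def depth(code, items):
    """Depth of an item in the hierarchy; top-level items have depth 1."""
    d = 1
    parent = items[code][1]
    while parent is not None:
        d += 1
        parent = items[parent][1]
    return d


def to_turtle(items):
    """Emit SKOS concepts plus one XKOS classification level per depth."""
    lines = [PREFIXES, "ex:scheme a skos:ConceptScheme ."]
    for d in sorted({depth(code, items) for code in items}):
        lines.append(f"ex:level{d} a xkos:ClassificationLevel ; xkos:depth {d} .")
    for code, (label, parent) in items.items():
        parts = [
            f"ex:{code} a skos:Concept",
            "skos:inScheme ex:scheme",
            f'skos:notation "{code}"',
            f'skos:prefLabel "{label}"@en',
        ]
        if parent is not None:
            parts.append(f"skos:broader ex:{parent}")
        lines.append(" ;\n    ".join(parts) + " .")
        # Attach the item to the level that matches its depth.
        lines.append(f"ex:level{depth(code, items)} skos:member ex:{code} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(ITEMS))
```

The plain SKOS part (concepts, notations, labels, broader links) would work in any SKOS-aware system; the level information is what the XKOS terms add in this sketch.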

SDTL

Description: Structured Data Transformation Language (SDTL) is an independent intermediate language for representing data transformation commands. Statistical analysis packages (e.g., SPSS, Stata, SAS, and R) provide similar functionality, but each one has its own proprietary language. SDTL consists of JSON schemas for common operations, such as RECODE, MERGE FILES, and VARIABLE LABELS. SDTL provides machine-actionable descriptions of variable-level data transformation histories derived from any data transformation language. Provenance metadata represented in SDTL can be added to documentation in DDI and other metadata standards.

Supports Activities:
  • Generate key machine-actionable metadata on production processes for inclusion in DDI and other metadata standards
  • Capture transformation processes for provenance purposes
  • Capture a metadata life cycle that parallels the data life cycle
  • Capture processing information in a structure that can be used to create syntax for a range of statistical packages

Points of Contact with other DDI Products:
  • DDI-Codebook: SDTL content can be integrated into Codebook objects to provide clear, consistent descriptions of processing.
  • DDI-Lifecycle: SDTL content can be integrated into Lifecycle objects to provide clear, consistent descriptions of processing. Provides machine-actionable content to code content objects.

Available metadata syntax representations:
  • JSON Schema
  • XML Schema
  • RDF/OWL
  • UML/XMI
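
The idea of an intermediate, package-independent description of a transformation can be sketched as follows. This is not the SDTL schema: the JSON-style structure in `COMMAND`, its field names, and the two rendering functions are hypothetical simplifications, intended only to show how one neutral description of a recode could be rendered as syntax for more than one statistical package.

```python
import json

# Hypothetical, simplified description of a recode; NOT the actual SDTL schema.
COMMAND = {
    "command": "recode",
    "source": "age",
    "target": "agegrp",
    "rules": [
        {"from": 0, "to": 17, "value": 1},
        {"from": 18, "to": 64, "value": 2},
        {"from": 65, "to": 120, "value": 3},
    ],
}


def to_spss(cmd):
    """Render the neutral description as an SPSS-style RECODE command."""
    rules = " ".join(
        f"({r['from']} THRU {r['to']}={r['value']})" for r in cmd["rules"]
    )
    return f"RECODE {cmd['source']} {rules} INTO {cmd['target']}."


def to_stata(cmd):
    """Render the same description as a Stata-style recode command."""
    rules = " ".join(
        f"({r['from']}/{r['to']}={r['value']})" for r in cmd["rules"]
    )
    return f"recode {cmd['source']} {rules}, generate({cmd['target']})"


if __name__ == "__main__":
    # The description is plain JSON, so it can be stored alongside other metadata.
    print(json.dumps(COMMAND, indent=2))
    print(to_spss(COMMAND))
    print(to_stata(COMMAND))
```

Because the description is ordinary JSON, it could in principle be stored with the documentation of the derived variable and rendered on demand for whichever package a user works in.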