DDI Best Practice Definitions

This is a compilation of terms and definitions that appear in the set of DDI 3 Best Practices. All terms are defined in the specific context of DDI.


API

An API (or Application Programming Interface) is a language and message format used by an application program to communicate with the operating system or some other control program such as a database management system (DBMS) or communications protocol. APIs are implemented by writing function calls in the program, which provide the linkage to the required subroutine for execution.



Catalogue

Catalogues contain sets of metadata entities, identifiers, and descriptions of associated items included in a registry. Registries can be thought of as smart catalogues with enhanced functionality which allow for the classification of objects.



Being a contributing member of the full data life cycle and realizing that one is part of a bigger scientific picture.


Cleaning operations

Methods used to "clean" the data collection, e.g., consistency checking, wildcode checking, etc. The "agency" attribute permits specification of the agency doing the data cleaning.



A document that provides information on the structure, contents, and layout of a data file.



Community

In these best practices, the term community is used to identify any grouping of personal or organizational entities, at different levels of formal organization, that are considering or undertaking implementation of DDI. Examples: a national statistical service, a data producer, an archive, a consortium of data archives.



A piece of software with a specific purpose with a well-defined input and well-defined output.


Control operations

Methods to facilitate data control performed by the primary investigator or by the data archive. In this DDI element, any special programs used for such operations should be specified. The "agency" attribute may be used to refer to the agency that performed the control operation.


Controlled vocabulary (CV)

Broadly speaking, a CV can range from a short list of clearly defined, mutually exclusive, and exhaustive terms, which are the only choices for usage in a specific context (e.g., populating certain DDI elements or attributes) through a classification to something as complex as a thesaurus with thousands of terms and term relationships. A CV has also been described as "A set of subject terms, and rules for their use in assigning terms to materials for indexing and retrieval." (http://www.cs.cornell.edu/wya/diglib/MS1999/Glossary.html). In a CV, a term consists of one or more words used to represent a concept (example: "fear"; "females"; "child care"). Terms are selected from natural language for inclusion in a controlled vocabulary.
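
As a minimal sketch of the simplest case described above (a short, closed list of terms that are the only valid choices in one context), the following Python snippet checks a value against a CV before it is used. The vocabulary contents and the names TIME_METHOD_CV and is_valid_term are illustrative assumptions, not part of DDI.

```python
# Hypothetical controlled vocabulary: a short, closed list of allowed terms.
TIME_METHOD_CV = frozenset({"Longitudinal", "CrossSection", "TimeSeries"})


def is_valid_term(term: str, vocabulary: frozenset) -> bool:
    """Return True only if the term is one of the vocabulary's allowed choices."""
    return term in vocabulary


print(is_valid_term("Longitudinal", TIME_METHOD_CV))        # True
print(is_valid_term("longitudinal study", TIME_METHOD_CV))  # False
```

A classification or thesaurus would need a richer structure (hierarchy, term relationships) than this flat set.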



Crosswalk

A structured model of how one list of items maps into a related list of items. The maps in DDI Comparison are an example of a crosswalk.
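
The idea of mapping one list of items into a related list can be sketched with a plain lookup table. The two code lists below and the function name apply_crosswalk are invented for illustration; this is not how DDI Comparison represents its maps.

```python
from typing import Dict, List, Optional

# Hypothetical code lists: source codes on the left, target codes on the right.
SEX_CROSSWALK: Dict[str, str] = {"M": "1", "F": "2", "U": "9"}


def apply_crosswalk(values: List[str], crosswalk: Dict[str, str]) -> List[Optional[str]]:
    """Map each source item to its target item; unmapped items become None."""
    return [crosswalk.get(value) for value in values]


print(apply_crosswalk(["M", "F", "X"], SEX_CROSSWALK))  # ['1', '2', None]
```

Returning None for unmapped items makes gaps in the crosswalk visible instead of silently dropping them.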


Data and knowledge repository

Place (may be virtual) where the products (metadata, data, and other related information) of the data life cycle are located.


Data life cycle

The stages of the data component of the research process, from study conceptualization to data analysis and archiving, feeding back to earlier stages. This process has often been depicted as linear, but the diagram embedded in the Discovery and Dissemination best practice offers a different perspective on it, from the user's point of view.



DDI

When used without a version in these best practices, DDI refers to the latest DDI specification, currently version 3.0. When older versions are referenced, the version number will be explicitly specified.


DDI application

A software application that reads and/or writes DDI XML.


DDI instance

A DDI Instance is the top-level wrapper for any DDI document. It may contain a set of top-level elements, which generally correspond to the modular breakdown within DDI. Every DDI Instance will use this wrapper, regardless of its content.


DDI profile

A mechanism to describe an organization's selected subset of elements and attributes.


DDI scheme

Schemes are maintainable lists of metadata elements that may be published separately and reused by a number of studies. They are the basis for resources such as question banks, concept banks, and variable banks. The construction of schemes takes into consideration their potential reuse by others.


DDI scheme with "Relations"

One of a subgroup of DDI schemes that may refer to other DDI schemes.


DDI scheme with "Dependencies"

One of a subgroup of DDI schemes that require the existence of other DDI schemes.



Discovery: Strategies and processes used by the end user to locate and access products (metadata, data, and other related information) of the data life cycle.

Dissemination: Data distribution with the aim of access by the end user to the products (metadata, data, and other related information) of the data life cycle.



DNS

The Domain Name System (DNS) translates Internet domain and host names to IP addresses. It translates domain names meaningful to humans into the numerical (binary) identifiers associated with networking equipment for the purpose of locating and addressing these devices world-wide. An often-used analogy to explain the Domain Name System is that it serves as the "phone book" for the Internet by translating human-friendly computer hostnames into IP addresses. For example, www.example.com translates to



DTD

Document Type Definition (DTD) is one of several SGML and XML schema languages, and is also the term used to describe a document or portion thereof that is authored in the DTD language. A DTD is a formal expression of the structural constraints for a class of XML documents; it is written in its own declaration syntax rather than in XML. The DTD language constructs include element and attribute-list declarations.



A specification for a reference that requests the exact version noted.



Eclipse

Eclipse is an ongoing project in support of an open source integrated development environment (IDE). Eclipse provides a framework and a basic platform (called the Eclipse Platform) that allows an organization to build an integrated development environment from plug-in software components provided by Eclipse members.


End user

Person performing work in the data life cycle for whom DDI metadata is required. The end user will likely not even be aware of the DDI metadata in the application he or she is using. End users span the data life cycle. Examples include research councils/funding bodies, researchers, data producers, archivists, librarians, data analysts, registry managers, research analysts/authors.


External publication of DDI schemes

This refers to the publication of DDI schemes as resource packages for use by the broader community.


Federated search

Federated search is the simultaneous search of multiple online databases or Web resources and is an emerging feature of automated, Web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine. (http://en.wikipedia.org/wiki/Federated_search)



Genericode

Genericode defines a standard format for defining code lists (also known as enumerations or controlled vocabularies). Genericode aims to provide the following:

  • A standard model and XML representation for the contents of a code list
  • A standard model and XML representation for data associated with items in a code list
  • A standard model and XML representation for how new code lists are derived from existing code lists



LGPL

The GNU Lesser General Public License (formerly the GNU Library General Public License) is a free software license published by the Free Software Foundation.



Governance

The term governance is used here to refer to the procedures associated with the decision-making, control, and administration of DDI metadata sets.



GUI

A GUI (graphical user interface) is a graphical (rather than purely textual) user interface to a computer.



HTTP

Short for HyperText Transfer Protocol, the underlying protocol used by the World Wide Web. HTTP defines how messages are formatted and transmitted, and what actions Web servers and browsers should take in response to various commands. For example, when you enter a URL in your browser, this actually sends an HTTP command to the Web server directing it to fetch and transmit the requested Web page.



HTTPS

HTTPS stands for HyperText Transfer Protocol over SSL (Secure Sockets Layer). It is a TCP/IP protocol used by Web servers to transfer and display Web content securely. The data transferred are encrypted so that they cannot be read by anyone except the recipient.


Identifiable (in the context of DDI)

"Identifiables" are those elements in DDI that carry only the basic level of identification: a URN, ID, and Name.


Inclusion inline vs. by reference

Material is considered included inline when the content is explicitly included. Inclusion by reference means that the material is referenced by one document but published elsewhere.



Ingest

In OAIS terminology, the OAIS entity that contains the services and functions that accept Submission Information Packages from Producers, prepare Archival Information Packages for storage, and ensure that Archival Information Packages and their supporting Descriptive Information become established within the OAIS. Used in its verb form, ingest refers to the process of taking information into a repository.


Internal publication of DDI schemes

This refers to publication of DDI schemes as resource packages within a specified project, working group, or organization. Note that a specific project may involve more than one organization, e.g., the Eurobarometer project.



Internationalization

Internationalization is the process of planning and implementing products and services so that they can easily be adapted to specific local languages and cultures, a process called localization.



IP address

An Internet Protocol (IP) address is a numerical identification (logical address) that is assigned to devices participating in a computer network utilizing the Internet Protocol for communication between its nodes. Although IP addresses are stored as binary numbers, they are usually displayed in human-readable notations, such as dotted-decimal notation (for IPv4).



Java

Java is a programming language expressly designed for use in the distributed environment of the Internet. It was designed to have the "look and feel" of the C++ language, but it is simpler to use than C++ and enforces an object-oriented programming model.


Knowledge transfer

The act of sharing the knowledge gained throughout the data life cycle.



A specification for a reference that requests the most recent version available.


Logical record

A reference to a data record that is independent of its physical location. It may be physically stored in two or more locations.


Maintainable (in the context of DDI)

"Maintainables" are complex objects that can be maintained outside of a DDI Instance (published as separate entities). Their identification strings ensure that they are globally unique.


Maintenance agencies

These organizations own the metadata objects they maintain, and only they are allowed to make changes to those objects.


Major version

The definition of a major version varies according to what is being published. However, major versions are expressed by the digits to the left of the decimal point.



METS

Metadata Encoding and Transmission Standard (http://www.loc.gov/standards/mets/)



Middleware

In the context of the best practices, middleware refers to utilities that manage the interface between the DDI metadata model and application services or high-level end-user tools.



Migration

Migration in the DDI context refers to moving from a DTD to XML Schema in terms of document structure; and from DDI 2 to DDI 3 as well as from DDI 3 back to DDI 2 in terms of porting content.


Minor version

The definition and level of detail of a minor version varies according to what is being published. The minor version information is always located to the right of the first decimal and can be further subdivided at the discretion of the maintaining agency.
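
A minimal sketch of the convention described in the two version entries: everything left of the first decimal point is the major version, and everything to the right is the minor version, which may itself be subdivided. The function name split_version is an illustrative assumption.

```python
from typing import Tuple


def split_version(version: str) -> Tuple[int, str]:
    """Split a version string at the first decimal point.

    Returns the major version as an integer and the minor version as the
    remaining text, which a maintaining agency may subdivide further.
    """
    major, _, minor = version.partition(".")
    return int(major), minor


print(split_version("3.0"))    # (3, '0')
print(split_version("2.1.4"))  # (2, '1.4')
```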



NCube

An NCube describes the logical structure of an n-dimensional array, in which each coordinate intersects with every other dimension at a single point. The NCube has been designed for use in the markup of aggregate data.
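
To illustrate the n-dimensional structure, the sketch below builds the cells of a small three-dimensional table in Python. The dimension names, category values, and the stored count are all invented for the example; this shows the concept only, not DDI's NCube markup.

```python
from itertools import product

# Hypothetical dimensions of an aggregate table.
dimensions = {
    "sex": ["male", "female"],
    "age_group": ["0-17", "18-64", "65+"],
    "region": ["north", "south"],
}

# Every combination of one value per dimension addresses exactly one cell.
cells = {coordinate: None for coordinate in product(*dimensions.values())}

# Store an (invented) aggregate count in one cell.
cells[("female", "18-64", "south")] = 1250

print(len(cells))  # 12, i.e. 2 * 3 * 2 cells
```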


Open Archival Information System (OAIS)

A reference model of the space community that governs general archival activities and policies -- http://public.ccsds.org/publications/archive/650x0b1.pdf. Includes:

  • SIP: Submission Information Package
  • AIP: Archival Information Package
  • DIP: Dissemination Information Package



PREMIS

Preservation Metadata Implementation Strategies

PREMIS Working Group


Pre-coordinated/Post-coordinated controlled vocabularies (CVs)

In pre-coordinated CV systems, multiple concepts are brought together in one term. An illustrative example is the Library of Congress Subject Headings (LCSH), which yield entries such as: "Insurance, Unemployment --Switzerland --Statistics." This method allows for disambiguation of the relationship of the concepts in the term that might not be possible in post-coordinated systems, such as whether a term is a qualifier of another. In post-coordinated or faceted systems, concepts are kept broad and separate and selected and joined in the process of searching with Boolean operators. A representation of the above LCSH in this system could be "Insurance AND Statistics AND Switzerland AND Unemployment" – note that entry order in the query has no relevance here. An example of such a system is the American Psychological Association's Thesaurus of Psychological Index Terms.
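
The post-coordinated query above can be sketched as a Boolean AND over sets of concepts. The index contents and the function name search_and are hypothetical; the point is only that the order of terms in the query does not affect the result.

```python
from typing import Dict, List, Set

# Hypothetical index: each record is tagged with broad, separate concepts.
INDEX: Dict[str, Set[str]] = {
    "study-1": {"Insurance", "Unemployment", "Switzerland", "Statistics"},
    "study-2": {"Insurance", "Switzerland"},
    "study-3": {"Unemployment", "Statistics"},
}


def search_and(index: Dict[str, Set[str]], *terms: str) -> List[str]:
    """Return the IDs of records tagged with every one of the query terms."""
    wanted = set(terms)
    return sorted(rid for rid, concepts in index.items() if wanted <= concepts)


print(search_and(INDEX, "Insurance", "Statistics", "Switzerland", "Unemployment"))
print(search_and(INDEX, "Unemployment", "Switzerland", "Statistics", "Insurance"))
# Both calls print ['study-1']
```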


Published metadata

Published metadata is considered available for use outside of the community that created the original document. This broader audience may be internal to a project or organization or external. Metadata that is published must be wrapped in a DDI instance, versioned, and available for reuse or reference from outside of the instance. Packaging as a DDI instance does not necessarily mean packaging for publication. Metadata may be packaged for reasons other than publication during its internal development process. In these cases versioning is not required.



Register

A collection of elements and attributes that contain information on a particular subject that its authors wish to share with others. Registers require support of well-defined registration processes, and include provisions for dealing with provenance and auditing, versioning, and security enforcement. Registers are the basic components of registries.



Registry

A virtual, centralized and structured database or portal that allows one to list, do a structured search, and to identify and retrieve metadata and possibly data that are distributed around a network. Registries are places where various types of resources are indexed and made visible and available for use throughout a community; they do not include clustered servers or depend on harvesting approaches to access their contents. Some survey organizations register measures (question wording and response options, for example) in order to standardize the way they elicit information from respondents. The implication is that there is one correct way for an organization to measure, say, income. Examples of DDI registries could be question banks, concept banks, social science data survey catalogs, and variable banks.


Resource package

A resource package is a means of packaging any maintainable set of DDI metadata for referencing as part of a study unit or group. A resource package structures materials for publication that are intended to be reused by multiple studies, projects, or communities of users. A resource package uses the group module with an alternative top-level element called Resource Package that is used to describe maintainable modules or schemes that may be used by multiple study units outside of a group structure.


Service-level agreement

The agreement between a service provider and service consumer that describes how the service will be provided. Typically a contract.



The DDI specification.



Stewardship

In the context of the best practices, stewardship involves taking on custodial responsibilities for a stage in the data life cycle.



An activity that a person undertakes in order to create, edit, or view documentation about data.



TTL

Short for Time to Live, a field in the Internet Protocol (IP) that specifies how many more hops a packet can travel before being discarded or returned.



Unicode

Unicode is a computing industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems.



A Globally Unique Identifier is a special type of identifier used in applications to provide a reference number that is unique in any context.
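
As a small illustration, Python's standard uuid module can generate identifiers of this kind. Using random (version 4) identifiers here is an assumption made for the example; the best practices do not prescribe a particular generation method.

```python
import uuid

# Generate two random (version 4) identifiers.
first = uuid.uuid4()
second = uuid.uuid4()

print(first)            # 36-character text form; the value differs on every run
print(first != second)  # expected to be True: a collision is extremely unlikely
```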



URL

A URL (Uniform Resource Locator, previously Universal Resource Locator) is the unique address for a file that is accessible on the Internet. A common way to get to a Web site is to enter the URL of its home page file in a Web browser's address line. However, any file within that Web site can also be specified with a URL.
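
The parts of a URL can be inspected with Python's standard urllib.parse module. The address below is a made-up example on the reserved example.com domain.

```python
from urllib.parse import urlsplit

# Hypothetical address of a file within a Web site.
parts = urlsplit("http://www.example.com/docs/index.html")

print(parts.scheme)  # http
print(parts.netloc)  # www.example.com
print(parts.path)    # /docs/index.html
```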



URN

A URN (Uniform Resource Name) is an Internet resource with a name that, unlike a URL, has persistent significance -- that is, the owner of the URN can expect that someone else (or a program) will always be able to find the resource.


Version Rationale

This is an optional DDI element in all versionable elements and provides a location for indicating the reason for the change, e.g., correction of a typographical error or correction of inaccurate content that may affect analysis performed using earlier content.


Versionable (in the context of DDI)

"Versionables" comprise a subset of DDI "identifiable" elements. These are elements for which changes in content are important to note and thus additional attributes related to versioning are enabled.



The process of providing a unique identifier for an element or entity that changes over time. Versioned elements retain their original ID but their version number is incremented to reflect a difference in content. This allows a reference to persist through the ID while allowing for either the specified version or the most current version of the element to be obtained. What is versioned, maintained, and referenced in the DDI 3.0 is the metadata itself, rather than the XML which expresses that metadata. While this might seem like a minor distinction, it has major implications for how applications are developed.
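
A minimal sketch of how a reference might resolve either to a specified version or to the most current version of an element while the ID stays fixed. The store layout, the element ID, the question texts, and the function names are all assumptions made for illustration; they are not DDI structures.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '1.10' into (1, 10) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical store: element ID -> {version -> content}.
STORE: Dict[str, Dict[Version, str]] = {
    "question-42": {
        parse_version("1.0"): "What is your age?",
        parse_version("1.1"): "What is your age in years?",
        parse_version("2.0"): "How old were you on your last birthday?",
    }
}


def resolve(element_id: str, version: Optional[str] = None) -> str:
    """Return the requested version, or the most recent one if none is given."""
    versions = STORE[element_id]
    key = max(versions) if version is None else parse_version(version)
    return versions[key]


print(resolve("question-42", "1.1"))  # What is your age in years?
print(resolve("question-42"))         # How old were you on your last birthday?
```

Comparing versions as tuples of integers rather than as strings keeps 1.10 ordered after 1.9.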



The use of sampling procedures may make it necessary to apply weights to produce accurate statistical results. Describe here the criteria for using weights in analysis of a collection. If a weighting formula or coefficient was developed, provide this formula, define its elements, and indicate how the formula is applied to data.
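
As one common example of applying weights, the sketch below computes a weighted mean, where each value contributes in proportion to its weight. The values and weights are invented, and a real study may use a different formula; this is only meant to show what "how the formula is applied to data" can look like.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Invented data: the third respondent represents twice as many people.
values = [10.0, 20.0, 40.0]
weights = [1.0, 1.0, 2.0]

print(weighted_mean(values, weights))  # 27.5
```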


XML editing software, or XML "editors"

Applications that facilitate the creation of XML documents by providing prompts regarding the appropriate use of tags based on the XML schema, which can be pre-loaded into the software. XML editors also validate XML documents and assist in producing valid documents by pointing to existing errors and usually indicating how the errors might be corrected. Examples of commercial XML editors are XMLSpy, oXygen, and XMetaL. Free editors are also available.


XML Schema

The XML Schema Definition Language is an XML language for describing and constraining the content of XML documents. XML Schema is a W3C Recommendation.


XPath Syntax

An XML Path Language (XPath) expression uses a path notation, like those used in URLs, for addressing parts of an XML document. The expression is evaluated to yield an object of the node-set, Boolean, number, or string type. For example, the expression book/author will return a node-set of the <author> elements contained in the <book> elements, if such elements exist in the source XML document. In addition, an XPath expression can have predicates (filter expressions) or function calls. For example, the expression book[@type="Fiction"] refers to the <book> elements whose type attribute is set to "Fiction" (more information at http://msdn.microsoft.com/en-us/library/ms256471(VS.85).aspx).
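
The two expressions above can be tried with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The sample document, including the <catalog> root and the author names, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
catalog = ET.fromstring(
    """
    <catalog>
      <book type="Fiction"><author>A. Writer</author></book>
      <book type="Reference"><author>B. Editor</author></book>
    </catalog>
    """
)

# book/author: the <author> elements contained in <book> elements.
authors = [author.text for author in catalog.findall("book/author")]

# book[@type='Fiction']: the <book> elements whose type attribute is "Fiction".
fiction_books = catalog.findall("book[@type='Fiction']")

print(authors)             # ['A. Writer', 'B. Editor']
print(len(fiction_books))  # 1
```

Both paths are evaluated relative to the <catalog> root element, which is why they start at book.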