FAQ about DDI 4

Why a new version of DDI?
DDI has continuously evolved to meet the needs of its user community. The goal of DDI 4 is to produce a model-based specification that can be more easily managed, extended, and expressed in different representations such as RDF, JSON, and database schemas. Another goal of a new version of DDI is the ability to auto-generate DDI instead of creating it “by hand.”

Does a new version of DDI mean older versions will no longer be supported?
No! The Alliance continues to support, develop, and encourage the adoption of DDI-Codebook (DDI 2) and DDI-Lifecycle (DDI 3). The DDI 4 Prototype release is intended for public review and feedback on the model-based approach to developing a DDI specification.

If I’m using an earlier version of DDI, what new functionality will the DDI 4 Prototype provide?
From a content perspective, the DDI 4 Prototype provides the ability to describe data at a more atomistic level (i.e. down to the datum or individual observation) and to better document a wider range of research processes and automation. The DDI 4 Prototype also produces an “official” standard version of an RDF binding.

Is DDI 4 backwards compatible?
No. While the DDI 4 Prototype covers some of the same content areas, it has taken an approach that is distinct from Codebook and Lifecycle.

What happened to DDI “Model Driven”?
The DDI 4 Prototype has evolved over the past five years and has previously been referred to by many names, including DDI Model Driven, DDI Views, and Moving Forward.

How does this Prototype relate to previous releases of DDI 4?
Previous releases and reviews of DDI 4 (aka Views, MD, etc.) were concerned with how individual content areas were modeled (data, capture, agents, etc.). This Prototype release concerns how these individual areas integrate and how the model approach works as a whole.

How do I know if I should change versions?
While DDI 4 is in Prototype form, we encourage users and potential users to continue using DDI’s two production-ready versions: DDI Codebook (2.x) or DDI Lifecycle (3.x).

If I want to use DDI 4, how do I go about doing that? 
The DDI 4 Prototype is still undergoing development. While users are encouraged to review work on the Prototype, there is no production-ready specification yet and there is no clear transition path from Codebook or Lifecycle.

When will DDI 4 be ready for production?
The next steps in developing DDI 4 will be determined by the amount and degree of feedback about the Prototype. The DDI Alliance encourages individuals and organizations interested in participating in DDI 4 development to get involved. The Alliance encourages users and potential users to try DDI’s two production-ready versions: DDI Codebook (2.x) or DDI Lifecycle (3.x).

What kind of comments about the DDI 4 Prototype are being solicited?
The DDI Alliance is soliciting such feedback as: 

  • Does this Prototype provide functionality that you or your organization find useful?
  • If not, what do you see as barriers to implementing DDI 4?
  • Would your organization use the XML or RDF bindings?
  • How would you or your organization actually use and/or implement DDI 4?

Why a model?
A model is a simplification of a part of reality. It clarifies definitions and relationships between components. The concepts defined in a model are robust over time and form a sustainable basis for future development and extensions. The model is independent of the technical representations used. It can be represented in today’s established technical forms such as XML, RDF, and programming languages, and it can be used in future technical forms without any change to the model concepts.

A model from what?
The model represents acknowledged approaches for data capture, data description, and data processing.

What is model-driven in the context of DDI 4?
The architecture is driven by the DDI 4 model. The approach is to specify every important artefact of the DDI domain in the model. The representations of the DDI 4 model are derived by transformation rules, and the related syntax is generated automatically. This means that any change in the model is immediately available in the representations, which serve as the basis for developing software and instance documents.
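
As a minimal sketch of this idea, the following Python snippet keeps a toy model in a plain dictionary and derives two representations from it by rule. The class name Variable, its properties, the "ex" prefix, and both generator functions are hypothetical illustrations; they are not the actual DDI 4 model or its production tooling.

```python
# Toy platform-independent model: class name -> {property name: datatype}.
# The class and property names are invented for illustration only.
MODEL = {
    "Variable": {"name": "string", "label": "string"},
}

def to_xsd_fragment(model):
    """Derive a simplified XML Schema fragment from the model by rule."""
    lines = []
    for cls, props in model.items():
        lines.append(f'<xs:complexType name="{cls}">')
        lines.append("  <xs:sequence>")
        for prop, dtype in props.items():
            lines.append(f'    <xs:element name="{prop}" type="xs:{dtype}"/>')
        lines.append("  </xs:sequence>")
        lines.append("</xs:complexType>")
    return "\n".join(lines)

def to_turtle_fragment(model, prefix="ex"):
    """Derive a simplified RDF Schema (Turtle) fragment from the same model."""
    lines = []
    for cls, props in model.items():
        lines.append(f"{prefix}:{cls} a rdfs:Class .")
        for prop in props:
            lines.append(
                f"{prefix}:{prop} a rdf:Property ; rdfs:domain {prefix}:{cls} ."
            )
    return "\n".join(lines)

xsd_before = to_xsd_fragment(MODEL)

# A change in the model shows up in every representation on regeneration.
MODEL["Variable"]["unit"] = "string"
xsd_after = to_xsd_fragment(MODEL)
ttl_after = to_turtle_fragment(MODEL)
```

Because both fragments are generated from the same dictionary, adding the property once is enough to have it appear in the XML Schema and in the RDF output.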

What is UML?
From Wikipedia, Unified Modeling Language: “The Unified Modeling Language (UML) is a general-purpose, developmental, modeling language in the field of software engineering, that is intended to provide a standard way to visualize the design of a system.” It is based on diagrammatic representations of components and their relationships.

What is a UML class diagram?
From Wikipedia, Class diagram: “In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among objects.”

UML seems to be quite complex. How does this help?
For the DDI 4 model, only UML class diagrams are used, and even of those only a core subset of the full expressivity. The subset comprises 15 items; for details see [document: DDI 4 UML Class Diagram Subset]. Thanks to this restriction, class definitions and the relationships between classes can be shown conveniently in diagrams and related formal definitions.

What is XMI?
XMI is used as an exchange format between UML tools. “The XML Metadata Interchange (XMI) is an Object Management Group standard for exchanging metadata information via Extensible Markup Language (XML).” (From Wikipedia, XML Metadata Interchange).

What is Canonical XMI?
The DDI model in Canonical XMI can be understood as the portable DDI UML Class Library which can be imported by various UML tools for further processing. “Canonical XMI is an additional conformance point for the XMI specification that eliminates variability in generating the XMI for a model.” (From OMG Wiki).

What is platform-independence in the DDI context?
“A computing platform or digital platform is the environment in which a piece of software is executed.” (From Wikipedia, Computing platform). By analogy, for DDI as a model-based metadata specification, the platform is the technological environment in which DDI is applied, such as XML Schema or OWL/RDF-S.

What is a representation or binding?
“In computing, a binding from a programming language to a library or operating system service is an application programming interface (API) providing glue code to use that library or service in a given programming language.” (From Wikipedia, Language binding). Loosely based on this, for DDI as a metadata specification in UML, the binding can be understood as a representation of the DDI model in a formal language like XML or RDF. It represents the whole model or parts of it; it is not only glue code. The specific rules in a formal language can be defined by a schema, like XML Schema for XML and OWL/RDF-S for RDF. The formal language can be a markup language (like XML), a programming language (like Java), or a knowledge representation language (like OWL). A representation of the DDI model might need to be partly adapted to the expressiveness of a formal language. The transformation concepts are defined in rules.

What are the main representations or bindings of the DDI 4 model?
XML is the representation for preservation and exchange. XML is also used for DDI-Codebook and DDI-Lifecycle. The rules for XML are defined by an XML Schema. RDF is the binding for discovery purposes in the Semantic Web and in the Web of Linked Data. The rules for RDF are defined in an ontology expressed in OWL/RDF-S. DDI might also be represented as a library in a programming language such as Java, C#, or R, or in a data format such as JSON.
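
To illustrate what it means for one model object to have several representations, here is a minimal Python sketch. The Variable class, its fields, the element names, and the URN value are invented for this example and are far simpler than anything in the DDI 4 model; the point is only that the same content can be held in a programming-language object and written out as XML or as JSON.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict

@dataclass
class Variable:
    # Hypothetical, heavily simplified class; not the DDI 4 definition.
    urn: str
    name: str
    label: str

def to_xml(var):
    """Serialize the object as a flat XML element."""
    elem = ET.Element("Variable", {"urn": var.urn})
    ET.SubElement(elem, "name").text = var.name
    ET.SubElement(elem, "label").text = var.label
    return ET.tostring(elem, encoding="unicode")

def to_json(var):
    """Serialize the same object as a JSON object."""
    return json.dumps(asdict(var), sort_keys=True)

# The URN below is an invented example value.
age = Variable(urn="urn:ddi:example.org:var-age:1", name="age", label="Age in years")
xml_text = to_xml(age)
json_text = to_json(age)
```

Both outputs carry the same three pieces of information; only the syntax differs.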

What is the difference between an RDF vocabulary, an ontology, and the DDI representation in OWL?
In the context of DDI, these terms basically mean the same thing: a definition of a structure, that is, a schema.

What is an XML/RDF instance document?
An XML file that is structured according to the rules of an XML Schema and contains data (i.e. information) is called an XML instance document. An RDF file that is structured according to the rules of an OWL ontology (RDF vocabulary) and contains data (i.e. information) is called an RDF instance document.

There seem to be many levels regarding DDI 4. What are the relationships?

  • The model represents the acknowledged approaches for data description.
  • A functional view defines a subset of the DDI 4 class library that satisfies the requirements of a specific perspective defined by a common use case and/or a group of stakeholders.
  • A representation (or binding) is the functional view expressed in the syntax of a specific representation.
  • The software implementation of the representations of a functional view is a program for a specific purpose defined by the functional view. The program can read and/or write one or more representations of DDI (like XML).
  • An instance document (e.g. in XML or RDF) has a structure according to a specific functional view.

How can instance documents be validated?

  • An XML instance document can be validated by an XML parser (XML Schema aware) according to the rules defined in an XML Schema.
  • An RDF instance document can be validated by a ShEx validator according to the rules defined in a ShEx file. ShEx is the Shape Expressions language. A ShEx file is generated for the whole DDI 4 model containing expressions for data type definitions and multiplicities/cardinalities among other things.
  • JSON data structures can be validated according to a JSON Schema.

What is secondary-level validation?
An instance document in any representation (like XML) can only be validated according to the rules that are defined in the related schema (like XML Schema). Additional tests can only be defined in a secondary step, which can be implemented in an additional tool. First-level validation is sometimes only possible on the syntax level, as with programming languages.
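
As an illustration of such a secondary step, the Python sketch below checks that every reference in an instance document points to an object that is actually present. It assumes that the schema itself does not enforce this rule. The element names, the conceptRef attribute, and the URN values are hypothetical and are not taken from the DDI 4 XML binding.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; element and attribute names are invented.
DOC = """
<Instance>
  <Concept urn="urn:ddi:example.org:concept-age:1"/>
  <Variable urn="urn:ddi:example.org:var-age:1"
            conceptRef="urn:ddi:example.org:concept-age:1"/>
  <Variable urn="urn:ddi:example.org:var-sex:1"
            conceptRef="urn:ddi:example.org:concept-sex:1"/>
</Instance>
"""

def unresolved_references(xml_text):
    """Return conceptRef values that match no urn attribute in the document."""
    root = ET.fromstring(xml_text)
    known = {e.get("urn") for e in root.iter() if e.get("urn")}
    return [
        e.get("conceptRef")
        for e in root.iter("Variable")
        if e.get("conceptRef") is not None and e.get("conceptRef") not in known
    ]

problems = unresolved_references(DOC)
```

Here the second variable refers to a concept that is missing from the document, so its reference is reported even though the document is well-formed.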

What are multiplicities/cardinalities?
UML multiplicity defines how many objects of one class can be related to an object of another class (also known as cardinality). Example: one variable has one or many labels (e.g. for different purposes like languages).
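
The variable/label example can be sketched in Python as follows. The Variable class and the check_multiplicity helper are hypothetical; they only show how a lower and an upper bound (with "*" meaning unbounded) constrain the number of related objects.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    # Hypothetical class: one variable has one or many labels (1..*).
    name: str
    labels: list = field(default_factory=list)

def check_multiplicity(count, lower, upper=None):
    """Return True if count lies within lower..upper; upper=None means '*'."""
    if count < lower:
        return False
    return upper is None or count <= upper

ok_var = Variable("age", ["Age in years", "Alter in Jahren"])
bad_var = Variable("sex")  # no label at all, which violates 1..*

ok_result = check_multiplicity(len(ok_var.labels), lower=1)
bad_result = check_multiplicity(len(bad_var.labels), lower=1)
```

A multiplicity of 0..1 would be checked in the same way with lower=0 and upper=1.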

How portable is DDI 4 metadata?
It should be possible to transform DDI metadata without loss between different DDI representations and/or functional views.

What are views or functional views?
A functional view defines a subset of the DDI 4 class library that satisfies the requirements of a specific perspective defined by a common use case and/or a group of stakeholders.
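
One way to picture a functional view is as a selection from the class library. The Python sketch below uses an invented class library and an assumed selection rule (the chosen entry classes plus every class they refer to, directly or indirectly); how the DDI 4 views are actually assembled may differ.

```python
# Hypothetical class library: class name -> names of classes it refers to.
CLASS_LIBRARY = {
    "Variable": ["Concept", "Label"],
    "Concept": ["Label"],
    "Label": [],
    "Questionnaire": ["Question"],
    "Question": ["Label"],
    "Agent": [],
}

def functional_view(library, entry_classes):
    """Collect the entry classes plus every class reachable from them."""
    selected, todo = set(), list(entry_classes)
    while todo:
        cls = todo.pop()
        if cls not in selected:
            selected.add(cls)
            todo.extend(library[cls])
    # Return the subset in a stable, alphabetical order.
    return {cls: library[cls] for cls in sorted(selected)}

data_description_view = functional_view(CLASS_LIBRARY, ["Variable"])
```

The resulting view contains Variable, Concept, and Label, while classes that serve other perspectives (Questionnaire, Question, Agent) are left out.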

Don’t XML and RDF have incompatible concepts?
The RDF data model is a graph while the XML data model is a tree (from Introduction to: RDF vs XML). The XML flavor used avoids deep hierarchies and mainly uses the DDI persistent identifiers (DDI URN) to refer from one object to another. Against this background, it is easy to transform metadata from an XML instance document to an RDF instance document and vice versa. The transformation process is expected to be lossless.
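
The following Python sketch shows why a flat, identifier-based XML structure maps easily to a graph. The document, the element names, and the URN values are invented, and the triples are plain Python tuples rather than output of an RDF library; it is an illustration under these assumptions, not the DDI 4 transformation.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat instance document; names and URNs are invented.
DOC = """
<Instance>
  <Concept urn="urn:ddi:example.org:concept-age:1">
    <name>Age</name>
  </Concept>
  <Variable urn="urn:ddi:example.org:var-age:1">
    <name>age</name>
    <conceptRef>urn:ddi:example.org:concept-age:1</conceptRef>
  </Variable>
</Instance>
"""

def xml_to_triples(xml_text):
    """Turn each identified object into (subject, predicate, object) triples."""
    triples = []
    for obj in ET.fromstring(xml_text):
        subject = obj.get("urn")
        triples.append((subject, "rdf:type", obj.tag))
        for child in obj:
            triples.append((subject, child.tag, child.text))
    return triples

def triples_to_xml(triples):
    """Rebuild the flat XML structure from the triples.

    Assumes the rdf:type triple of a subject precedes its other triples,
    which is the order xml_to_triples produces.
    """
    root = ET.Element("Instance")
    objects = {}
    for subject, predicate, value in triples:
        if predicate == "rdf:type":
            objects[subject] = ET.SubElement(root, value, {"urn": subject})
        else:
            ET.SubElement(objects[subject], predicate).text = value
    return ET.tostring(root, encoding="unicode")

triples = xml_to_triples(DOC)
round_trip = xml_to_triples(triples_to_xml(triples))
```

Because objects refer to each other by URN instead of by nesting, each reference becomes one edge in the graph, and converting back and forth returns the same set of statements in this toy case.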

What is a datum and why is it important?
A datum can be understood as a single cell or as an event with which relevant metadata is associated. This concept is also used in data lakes. Based on these concepts, any other form of data representation can be described. DDI 4 data description provides a set of building blocks to describe traditional forms of data as well as current new ones, and can be used to describe future forms of data.
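
As a rough illustration of the single-cell idea, the Python sketch below breaks a small rectangular table into individual cells, each carrying the metadata that identifies it, and then reassembles the table from those cells. The Datum class, its fields, and the sample values are hypothetical and much simpler than the DDI 4 data description classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Datum:
    # Hypothetical structure: one value plus the metadata that identifies it.
    unit_id: str    # which observed unit (e.g. a respondent)
    variable: str   # which variable the value belongs to
    value: object

# A small rectangular table: one row per unit, one column per variable.
TABLE = [
    {"id": "p1", "age": 34, "sex": "f"},
    {"id": "p2", "age": 51, "sex": "m"},
]

def table_to_cells(rows, key="id"):
    """Decompose a wide table into individual cells."""
    return [
        Datum(row[key], var, val)
        for row in rows
        for var, val in row.items()
        if var != key
    ]

def cells_to_table(cells, key="id"):
    """Reassemble the wide table from the individual cells."""
    rows = {}
    for cell in cells:
        rows.setdefault(cell.unit_id, {key: cell.unit_id})[cell.variable] = cell.value
    return list(rows.values())

cells = table_to_cells(TABLE)
rebuilt = cells_to_table(cells)
```

Once data is described at the level of the individual cell, the same cells can be arranged into a wide table, a long table, or another layout without changing their meaning.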