|Title||DDI Cross Domain Integration for FAIR Data Sharing across Discipline Boundaries|
|Year of Publication||2021|
|Authors||Gregory A, Hodson S, Orten H, Wackerow J|
Introduction and Motivation
The Data Documentation Initiative (DDI) Alliance has been a leader in setting metadata standards for the social, behavioural, and economic sciences (SBE) for many years. It has provided specifications which support data collection, management, and dissemination with detailed descriptions of the data typical of those domains. As with many other branches of statistics and research, however, the type, volume, and sources of data have multiplied in the recent past. Many projects are now cross-disciplinary, involving data from different domains. At the same time, computational approaches to the analysis of data and to the reproduction and origination of research have evolved. These factors combine to highlight the need for an enhanced ability to integrate and understand data across domain boundaries, and to understand the provenance and processing of data, even as more and more of the work is performed programmatically by systems which leverage machine learning and other advanced technology approaches.
The DDI Alliance has recently published a new specification intended to fill this need for integrating data from disparate sources: DDI - Cross Domain Integration (DDI-CDI). Unlike earlier DDI work products, DDI-CDI is not domain-specific, but is designed to be used with research data from any domain. The specification provides a model for understanding and integrating data across a wide range of sources, including big data/NoSQL, event history and register data, traditional columnar data, and multi-dimensional data. Further, it provides a way of describing data provenance, with a focus not only on traditional linear processes, but also on the declarative "black box" processes employed by many modern systems. DDI-CDI is intended not to replace traditional domain models for data description, but to supplement them when data from different sources and of different types are being integrated. It is designed to work easily with many other popular standards and models, including semantic vocabularies and generic technology specifications for data processing, dissemination, and cataloguing.
With an expected production release at the end of 2021, the current draft of the specification is undergoing finalisation. This workshop will focus on issues of immediate importance leading up to implementation and subsequent revision of DDI-CDI. Experts in the standard and prospective implementers will be in attendance to help refine the development roadmap.
Interoperability, Sustainability, and Alignment with Other Standards
DDI-CDI is fundamentally a model which is intended to be implemented across a wide variety of technology platforms, and in combination with many other standards, models, and specifications. To support this use, it is formalized using a limited subset of the Unified Modelling Language (UML).
The platform-independence of the model makes it more easily applicable across a broad range of applications and helps ensure that it will be sustainable even as the technology landscape evolves. DDI-CDI builds on many other standard models and is aligned with them where appropriate.
The goal is that specific modules can be used flexibly: standalone, together with other DDI-CDI modules, or together with other specifications. The work will focus on identifying functional packages, defining the function of each package, establishing clear one-way dependencies between packages, and separating functional (core) packages/classes from supporting packages/classes.
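The requirement for one-way dependencies between packages can be checked mechanically. The following is a minimal sketch, assuming the package structure is available as a simple mapping from each package to the packages it depends on. The function name `find_cycle` and the package names in the usage note are hypothetical and are not taken from the DDI-CDI model.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of package names, or None.

    `deps` maps a package name to the packages it depends on.
    A result of None means all dependencies are one-way (acyclic).
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {pkg: UNSEEN for pkg in deps}
    path: List[str] = []  # packages on the current depth-first path

    def visit(pkg: str) -> Optional[List[str]]:
        state[pkg] = IN_PROGRESS
        path.append(pkg)
        for dep in deps.get(pkg, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path: a cycle is closed
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[pkg] = DONE
        return None

    for pkg in list(deps):
        if state[pkg] == UNSEEN:
            found = visit(pkg)
            if found:
                return found
    return None
```

For example, with hypothetical packages `{"Core": [], "DataDescription": ["Core"], "Process": ["Core", "DataDescription"]}` the function returns `None`, whereas adding a dependency from `Core` back to `Process` makes it return the offending cycle.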
Data structure components (toolkit)
Review an approach for building new data structure types (in addition to the existing traditional wide/rectangular data, long [event] data, multi-dimensional data, and NoSQL/key-value data). Possible additional data structure types include graphs, text, and structures that hold any object in a “cell” (tables, text, binary objects, arrays of arrays, etc.).
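To illustrate how the same observations can appear in two of the existing structure types, the sketch below converts wide (rectangular) records to a long layout and back. It is plain Python and does not use DDI-CDI classes; the function names and the `variable`/`value` column names are assumptions chosen for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def wide_to_long(rows: List[Row], id_key: str) -> List[Row]:
    """Turn one row per unit into one row per (unit, variable) pair."""
    long_rows: List[Row] = []
    for row in rows:
        for name, value in row.items():
            if name == id_key:
                continue  # the identifier is repeated on every long row
            long_rows.append(
                {id_key: row[id_key], "variable": name, "value": value}
            )
    return long_rows


def long_to_wide(rows: List[Row], id_key: str) -> List[Row]:
    """Rebuild one row per unit from (unit, variable, value) rows."""
    by_id: Dict[Any, Row] = {}
    for row in rows:
        record = by_id.setdefault(row[id_key], {id_key: row[id_key]})
        record[row["variable"]] = row["value"]
    return list(by_id.values())
```

A wide record such as `{"person": 1, "age": 34, "income": 52000}` becomes two long rows, one for `age` and one for `income`, and `long_to_wide` restores the original record. A metadata model that describes both layouts has to capture which columns play the identifier, variable-name, and value roles.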
UML class model interoperable subset (UCMIS)
The strict use of UCMIS enables a robust model which can be imported into many UML tools and expressed in object-oriented syntax representations. The focus here will be the relationship to other specifications (in the light of the modular approach) at the model level and at the syntax representation level. See the documentation and spreadsheet of the subset previously named “Practitioner's Subset for Data Modeling”.
Syntax representations of the model
Exploration of, and decisions on, OWL/RDF-S, JSON-LD, and ShEx (as a constraint language for RDF). The work will build on an existing mapping from UML to OWL/RDF-S.
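As a rough illustration of what a syntax representation of a model class might look like, the sketch below serialises a small Python dataclass (standing in for a UML class) as a JSON-LD document. The class `VariableSketch`, its attributes, and the `http://example.org/cdi-sketch#` namespace are placeholders invented for this example; they are not the DDI-CDI vocabulary or the Alliance's UML-to-RDF mapping.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

# Placeholder namespace for illustration only, not a real DDI-CDI IRI.
EXAMPLE_NS = "http://example.org/cdi-sketch#"


@dataclass
class VariableSketch:
    """Hypothetical stand-in for a class in a UML model."""
    identifier: str
    name: str
    datatype: str


def to_jsonld(var: VariableSketch) -> Dict[str, Any]:
    """Express one instance as a JSON-LD node object."""
    return {
        # @vocab lets the plain keys below expand to IRIs in EXAMPLE_NS
        "@context": {"@vocab": EXAMPLE_NS},
        "@id": EXAMPLE_NS + var.identifier,
        "@type": "VariableSketch",
        "name": var.name,
        "datatype": var.datatype,
    }


def serialise(var: VariableSketch) -> str:
    """Return the JSON-LD document as indented JSON text."""
    return json.dumps(to_jsonld(var), indent=2)
```

Calling `serialise(VariableSketch("var-age", "age", "integer"))` yields a JSON document that an RDF toolchain could expand into triples, which a ShEx schema could then constrain. The actual class names, properties, and context would come from the published specification.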
Identify the methodology by which a community of users will specify how it will employ the model in its own implementations, so that those implementations become more easily interoperable. The intersection with other machine-processable descriptions of data-sharing resources and methods within the community will be a focus.
Submitted by lyle on Sun, 2022-05-22 11:53