CODATA (the Committee on Data of the International Science Council) recently completed a European Open Science Cloud (EOSC) co-creation project which explored the potential uses and applications of the draft Data Documentation Initiative Cross-Domain Integration (DDI-CDI) specification for EOSC. The main output was a substantial report, The Role of DDI-CDI in EOSC: Possible Uses and Applications, which explores the challenges faced by EOSC and discusses a number of use cases and examples of the role that DDI-CDI can play.
Main report: The Role of DDI-CDI in EOSC: Possible Uses and Applications https://doi.org/10.5281/zenodo.4707263
The report was produced in close consultation with members of the DDI Alliance that developed the specification. A wide range of experts participated in workshops and intensive meetings, as detailed in the activity report produced for the EOSC Secretariat.
The Role of DDI-CDI in EOSC: Report on Activities https://doi.org/10.5281/zenodo.4707290
Documentation on DDI-CDI is available below. The DDI Alliance aim to publish the production release in June/July 2021.
Report Launch Workshop, 2 June
The report will be launched at a virtual workshop to be held from 13:00 to 15:00 UTC on Wednesday 2 June. This workshop will summarise the key findings and recommendations of the report; discuss the examples and case studies and how they may be further explored and tested; and identify next steps for trial implementations. The agenda and registration for the launch workshop will be announced in the near future. In the meantime, please save the date!
The Role of DDI-CDI in EOSC: Possible Uses and Applications
This report looks at the potential application of the Data Documentation Initiative Cross-Domain Integration (DDI-CDI) specification to the data-sharing requirements faced by EOSC. By analyzing real-world projects and implementations, and through discussion with those responsible for related metadata and infrastructure specifications, the potential role played by the DDI-CDI model in the overall EOSC system is envisioned, and recommendations are made for how to realize the identified opportunities for its use.
The challenges faced by EOSC can be broken into two main areas:
– Problems of Scale: The volume of data is growing exponentially and is coming from a wider range of sources. At the same time, the FAIR principles require an increased amount of metadata, especially when it comes to interoperability and reuse of that data. Current manual approaches are proving to be unsustainable. The automation of metadata collection – that is, harvesting metadata programmatically from systems which produce, manage, disseminate, and use data – offers a possible solution, but the necessary framework for such activities is not in place. Standard models and encoding for such metadata (a “lingua franca”) must be established for large-scale capture and exchange of metadata.
– Problems of Cross-Domain Use: In order for data to be shared across domain and institutional boundaries, it must be understood by its users at all levels. While increasing attention is paid to the semantic mapping of concepts across domains, there are other critical needs for such data sharing. Disparate data structures must be accommodated, based on the tools and formats used in specific domains, and the means of data collection and processing – the provenance of the data – must be understood. Use of specific domain models and vocabularies must be known, and they must be accessible in a machine-actionable form. Reusable crosswalks between domains are needed. All of these requirements point to the need for more granular metadata, so that data can be successfully re-arranged to be suitable for use outside its domain of origin. The path of a single observation, as it is reused and further processed, should be knowable.
DDI-CDI will not address all of these concerns; no single standard or technology will provide a complete answer. It has, however, been designed to fill important gaps in the range of standards, models, and technologies needed to meet these challenges. On the basis of an intensive series of meetings, conference sessions, workshops, and other discussions with a range of different groups, this report looks at use cases and the emerging FAIR ecosystem to understand the potential application of DDI-CDI, and the role it could play within a broader frame. The approach being taken by EOSC, as described in the EOSC Interoperability Framework and in other activities, is then assessed to show specifically where DDI-CDI would fit. Recommendations for further work are then made on that basis.
Specific implementation examples include a data integration using climate data, energy consumption data, and consumer questionnaire responses; an example of how a repository could facilitate automated capture of metadata, based on the Dataverse platform; a data integration example from the European Social Survey Multi-Level application; and an exploration of processing, provenance, and cross-domain requirements as seen in the ALPHA Network and INSPIRE applications for the integration of population and clinical data. An analysis of how DDI-CDI could be used in combination with DCAT is presented, and the role which DDI-CDI could play within the emerging FAIR ecosystem, in relation to FAIR Implementation Profiles, FAIR Data Points, and FAIR Digital Objects, is examined. Finally, the way in which DDI-CDI could be integrated into the emerging EOSC infrastructure is considered in light of the EOSC Interoperability Framework and the FAIRsFAIR vision of integrated metadata catalogues.
DDI-CDI offers a new type of specification which could help to realize the capture, interchange, and use of metadata throughout the EOSC data-sharing infrastructure, and could do so in ways which are scalable and machine-actionable. It operates at the needed level of granularity and would work to heighten the utility of semantic mapping and approaches to the full utilization of data. Our recommendations identify several concrete areas where this application of the model should be further explored.
Public review page: https://ddi-alliance.atlassian.net/wiki/x/IQBPMw
Complete download package: https://ddi-alliance.bitbucket.io/DDI-CDI/DDI-CDI_Public_Review_1.zip
Announcement at DDI Alliance website: https://ddialliance.org/announcement/public-review-ddi-cross-domain-integration-ddi-cdi
Recognition of funding for the EOSC Co-Creation Project
This work was supported by the EOSC Secretariat. EOSCsecretariat.eu has received funding from the European Union’s Horizon Programme call H2020-INFRAEOSC-2018-4, Grant Agreement number 831644.