SDTL - Structured Data Transformation Language - Version 1.0

Published: 2020-12-01

 

RDF Namespace: http://rdf-vocabulary.ddialliance.org/sdtl

Content

Description

Structured Data Transformation Language (SDTL) is an independent intermediate language for representing data transformation commands. Statistical analysis packages (e.g., SPSS, Stata, SAS, and R) provide similar functionality, but each one has its own proprietary language. SDTL consists of JSON schemas for common operations, such as RECODE, MERGE FILES, and VARIABLE LABELS. SDTL provides machine-actionable descriptions of variable-level data transformation histories derived from any data transformation language. Provenance metadata represented in SDTL can be added to documentation in DDI and other metadata standards.
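As a rough illustration of the idea, the sketch below describes a single recode operation as a language-neutral JSON object. The field names (`command`, `source_language`, `source_text`, `variable`, `target_variable`, `rules`) are hypothetical and simplified for illustration. They are not taken from the published SDTL schemas, which should be consulted for the actual property names.

```python
import json

# Hypothetical, simplified stand-in for an SDTL-style command description.
# The field names are assumptions for illustration, NOT the official SDTL schema.
recode_command = {
    "command": "Recode",
    "source_language": "SPSS",
    # The original statement is kept so the description can be traced to its script.
    "source_text": "RECODE age (0 thru 17=1) (18 thru 64=2) (ELSE=3) INTO age_group.",
    "variable": "age",
    "target_variable": "age_group",
    "rules": [
        {"from": [0, 17], "to": 1},
        {"from": [18, 64], "to": 2},
        {"from": "else", "to": 3},
    ],
}

# Serialize to JSON text and read it back, as a consuming tool might.
serialized = json.dumps(recode_command, indent=2)
restored = json.loads(serialized)
```

Because the description is plain JSON, a tool that has never seen the SPSS script can still read which variable was derived from which, and by what rule.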

Applications

SDTL greatly enhances the value of DDI because it is a key component of an automated metadata production process. Currently, DDI metadata is almost always created by data repositories, not by data producers. Even when data are born digital, data producers discard provenance information that could be carried into DDI, because they manage data and transform variables in statistical packages with minimal metadata capabilities. SDTL and the tools created by the C2Metadata Project are designed to create a metadata life cycle that parallels the data life cycle: the same scripts that are used to transform and manage data files can be used to update metadata files. As a result, data producers can create more accurate and complete DDI metadata, with less time and effort required from both producers and data repositories.
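The notion of metadata that follows the data through each transformation can be sketched as follows. This is a minimal sketch under stated assumptions: the command dictionaries, their field names, and the `apply_commands` helper are all hypothetical, and they do not reflect the C2Metadata tools or the SDTL schemas.

```python
from copy import deepcopy


def apply_commands(metadata, commands):
    """Return an updated copy of variable-level metadata.

    `metadata` maps variable names to dicts such as {"label": "..."}.
    `commands` is a list of hypothetical, simplified command descriptions.
    """
    updated = deepcopy(metadata)  # leave the caller's metadata untouched
    for cmd in commands:
        kind = cmd["command"]
        if kind == "Rename":
            updated[cmd["new_name"]] = updated.pop(cmd["old_name"])
            name = cmd["new_name"]
        elif kind == "SetVariableLabel":
            name = cmd["variable"]
            updated[name]["label"] = cmd["label"]
        else:
            continue  # this sketch ignores commands it does not model
        # Record the originating statement as variable-level provenance.
        updated[name].setdefault("provenance", []).append(cmd["source_text"])
    return updated


metadata = {"v1": {"label": "Age"}}
commands = [
    {"command": "Rename", "old_name": "v1", "new_name": "age",
     "source_text": "RENAME VARIABLES (v1 = age)."},
    {"command": "SetVariableLabel", "variable": "age", "label": "Age in years",
     "source_text": "VARIABLE LABELS age 'Age in years'."},
]
result = apply_commands(metadata, commands)
```

Here `result` describes the variable under its new name, with the updated label and a list of the statements that changed it, while the input `metadata` is left unmodified.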

Informational Documents

License

SDTL – Structured Data Transformation Language is free software; you can distribute and/or modify it under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Development Work

SDTL was created by the Continuous Capture of Metadata Project supported by the Data Infrastructure Building Blocks (DIBBs) program of the National Science Foundation through grant NSF ACI-1640575.

Future Work

SDTL is maintained and managed by the SDTL Working Group.

Selected Articles

  • Alter, George, Jack Gager, Pascal Heus, Carson Hunter, Sanda Ionescu, Jeremy Iverson, H. V. Jagadish, et al. “Capturing Data Provenance from Statistical Software.” International Journal of Digital Curation 16, no. 1 (2021): 14–14. https://doi.org/10.2218/ijdc.v16i1.763. [Best Paper Award IDCC 2021]
  • Alter, George, Darrell Donakowski, Jack Gager, Pascal Heus, Carson Hunter, Sanda Ionescu, Jeremy Iverson, et al. “Provenance Metadata for Statistical Data: An Introduction to Structured Data Transformation Language (SDTL).” IASSIST Quarterly 44, no. 4 (2020). https://doi.org/10.29173/iq983.
  • Song, Jie, H. V. Jagadish, and George Alter. “SDTA: An Algebra for Statistical Data Transformation.” Presented at the 33rd International Conference on Scientific and Statistical Database Management, Tampa, FL, USA, 2021. https://doi.org/10.1145/3468791.3468811. [Best Paper Award SSDBM 2021]
  • Song, Jie, George Alter, and H. V. Jagadish. “Structured Data Transformation Algebra (SDTA) and Its Applications.” Distributed and Parallel Databases, July 20, 2022. https://doi.org/10.1007/s10619-022-07418-6.