SDTL
Structured Data Transformation Language
Structured Data Transformation Language (SDTL) is an independent intermediate language for representing data transformation commands. Statistical analysis packages (e.g., SPSS, Stata, SAS, and R) provide similar functionality, but each uses its own command language. SDTL consists of JSON schemas for common operations, such as RECODE, MERGE FILES, and VARIABLE LABELS. It provides machine-actionable descriptions of variable-level data transformation histories that can be derived from scripts written in any data transformation language. Provenance metadata represented in SDTL can be added to documentation in DDI and other metadata standards.
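As a rough illustration of what a variable-level transformation history can look like as JSON, the Python sketch below records a recode and a variable label, then derives which source variables each new variable came from. The field names (`command`, `source`, `target`, `rules`, `label`) and the `lineage` helper are hypothetical simplifications chosen for this example. They are not the published SDTL schema, which is defined in the SDTL User Guide.

```python
import json

# Hypothetical, simplified stand-in for an SDTL-style command history.
# Field names are illustrative assumptions, NOT the published SDTL schema.
commands = [
    {
        "command": "Recode",
        "source": "age",
        "target": "age_group",
        "rules": [
            {"from": [18, 34], "to": 1},
            {"from": [35, 64], "to": 2},
            {"from": [65, 120], "to": 3},
        ],
    },
    {
        "command": "VariableLabels",
        "target": "age_group",
        "label": "Age group (3 categories)",
    },
]


def lineage(cmds):
    """Map each derived variable to the source variables it was computed from."""
    parents = {}
    for cmd in cmds:
        # Only commands that read a source variable contribute lineage;
        # label-only commands describe a variable without deriving it.
        if "source" in cmd:
            parents.setdefault(cmd["target"], []).append(cmd["source"])
    return parents


# Serialize the history so other tools (e.g., a DDI writer) can consume it.
print(json.dumps(commands, indent=2))
print(lineage(commands))  # {'age_group': ['age']}
```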
Supports Activities:
- Generate key machine-actionable metadata on production processes for inclusion in DDI and other metadata standards
- Capture transformation processes for provenance purposes
- Capture a metadata life cycle that parallels the data life cycle
- Capture processing information in a structure that can be used to create syntax for a range of statistical packages
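The last activity above can be sketched in Python: one package-neutral description of a recode is rendered as both SPSS and Stata syntax. The dictionary layout and the `to_spss` and `to_stata` functions are hypothetical illustrations, not part of SDTL or of any SDTL tool.

```python
# Hypothetical, simplified package-neutral recode (NOT the real SDTL schema).
# Each rule maps an inclusive (low, high) range to a new code.
recode = {
    "source": "age",
    "target": "age_group",
    "rules": [((18, 34), 1), ((35, 64), 2), ((65, 120), 3)],
}


def to_spss(r):
    """Render the recode as SPSS RECODE ... INTO syntax.

    SPSS sets values not covered by any rule to system-missing
    in the new variable, so no ELSE clause is needed here.
    """
    rules = " ".join(f"({lo} THRU {hi}={to})" for (lo, hi), to in r["rules"])
    return f"RECODE {r['source']} {rules} INTO {r['target']}.\nEXECUTE."


def to_stata(r):
    """Render the recode as a Stata recode command with generate().

    Stata copies unmatched values unchanged by default, so (else=.)
    is added to match the SPSS behavior of setting them to missing.
    """
    rules = " ".join(f"({lo}/{hi}={to})" for (lo, hi), to in r["rules"])
    return f"recode {r['source']} {rules} (else=.), generate({r['target']})"


print(to_spss(recode))
print(to_stata(recode))
```

The `(else=.)` clause matters: SPSS `RECODE ... INTO` sets values not covered by any rule to system-missing, whereas Stata's `recode ..., generate()` copies unmatched values unchanged unless told otherwise. A neutral representation has to pin down such semantics explicitly so that every generated script does the same thing.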
License
SDTL (Structured Data Transformation Language) is free software: you can redistribute and/or modify it under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Development Work
SDTL was created by the Continuous Capture of Metadata Project supported by the Data Infrastructure Building Blocks (DIBBs) program of the National Science Foundation through grant NSF ACI-1640575.
Future Work
SDTL is maintained and managed by the SDTL Working Group.
Selected Articles
- Alter, George, Jack Gager, Pascal Heus, Carson Hunter, Sanda Ionescu, Jeremy Iverson, H. V. Jagadish, et al. “Capturing Data Provenance from Statistical Software.” International Journal of Digital Curation 16, no. 1 (2021): 14. https://doi.org/10.2218/ijdc.v16i1.763. [Best Paper Award IDCC 2021]
- Alter, George, Darrell Donakowski, Jack Gager, Pascal Heus, Carson Hunter, Sanda Ionescu, Jeremy Iverson, et al. “Provenance Metadata for Statistical Data: An Introduction to Structured Data Transformation Language (SDTL).” IASSIST Quarterly 44, no. 4 (2020). https://doi.org/10.29173/iq983.
- Song, Jie, H. V. Jagadish, and George Alter. “SDTA: An Algebra for Statistical Data Transformation.” Presented at the 33rd International Conference on Scientific and Statistical Database Management, Tampa, FL, USA, 2021. https://doi.org/10.1145/3468791.3468811. [Best Paper Award SSDBM 2021]
- Song, Jie, George Alter, and H. V. Jagadish. “Structured Data Transformation Algebra (SDTA) and Its Applications.” Distributed and Parallel Databases, July 20, 2022. https://doi.org/10.1007/s10619-022-07418-6.
Version 1.0 [current version]
Publication date: 2020-12-01
- SDTL User Guide - primary documentation and usage source
- Overview Documents:
  - Introduction to SDTL
  - C2Metadata Project
  - Introduction to SDTL [Video] [Slides]