June 15, 2002
Committee Members Present: Bjorn Henrichsen (Chair), Micah Altman, Grant Blank, Bill Block (for Wendy Thomas), Ernie Boyko, Bill Bradley, Cavan Capps, Dan Gillman, Peter Granda, Ann Green, Mike Haarman, Marc Maynard, Ken Miller, Tom Piazza, Richard Rockwell, Merrill Shanks, Mary Vardigan
DDI Alliance, charter, membership issues, etc.
The focus of this discussion was a draft charter, written by Richard Rockwell, to create a DDI Alliance. The DDI Alliance concept is an attempt to establish a new membership structure and funding base that will provide support so that the initiative can continue. The charter document provides for an Expert Committee with representation from the DDI Alliance membership, with each member of the Committee having a vote and thus a say in the future of the DDI. Under this new structure, the two host institutions, ICPSR and the Roper Center, may seek additional external funding for specific DDI-related projects. There are also provisions for a Steering Committee, made up of representatives from the Host Institutions and Associations, and a Secretariat to staff the day-to-day activities of the Alliance.
Another feature of the DDI Alliance is an open review process for revisions to the evolving DDI specification. This review process, which is loosely modeled on that of the World Wide Web Consortium, will provide credibility, particularly with respect to the ISO community. It was pointed out that the charter needs to cover the situation in which members abstain from voting. Also, the criteria for moving a proposal forward to a Trial Review need to be simpler and could possibly consist of just a call for objections.
There was extensive discussion about whether institutions should be permitted to hold multiple memberships through different departments. This would create the potential for abuse, since large institutions would have more votes. It was reported that the standards community has wrestled with this issue and has decided that only the top level of an organization can have a vote. It was also pointed out that, as the charter is currently written, the Director of the Alliance can override votes of the Expert Committee.
At present, it looks as if there are potentially 16 Alliance members in the U.S. and Europe. It was decided that federal data producers should not be charged since the DDI needs their support and we want to encourage their participation. Rather, there should be a mechanism to appoint experts in a partnering role.
The charter needs a section on how to amend it (changes to the charter should arise in the Steering Committee) and also needs to deal with institutionalizing XML expertise in the membership structure. Richard will incorporate these and other changes discussed and produce a new draft charter as soon as possible. After that, we will proceed to send the charter to the host institutions and associations and to potential members to see how many we can enlist. While it would be optimal to announce in our October submission to NIH that the Alliance is in operation, letters of intent from members should suffice.
Funding opportunities, including the October NIH competition
It was reported that there is an intention to submit a proposal in October to NIH for a project involving the DDI and longitudinal data. Also, three DDI-related applications have been approved in Europe: the second round of the European Social Survey (involving 24 countries), a metadata system put forward by the ZA and others, and a portal to European data coordinated by NSD.
The issue of whether PIs applying for grants could request funding for the Alliance was raised. It's possible that there could be a subcontract to the Alliance for activities like workshops. The funds would come in to the Secretariat and then be disbursed as needed.
Working Group reports
ISO standards group. Bill Bradley reported that the team working on the new WebDAIS system have made a lot of progress and have developed a data model incorporating elements from ISO 11179. The committee may need to spend a full day discussing the model. Having the data model with the ISO elements will make moving the standards process forward much easier.
Dan Gillman reported that for the DDI to become an official ISO standard we would need the support of five countries, and we already have four. The ISO process has several steps and takes about one and a half years until the specification becomes an international standard. It is better to put the specification forward as an XML Schema, rather than a DTD.
Complex data files group. This group worked mainly through e-mail and came up with a solution to complex data files that covers the simplest situation, i.e., nested files with a guaranteed physical order. The solution does not cover other types of hierarchical and relational structures with multiple files but does make nested files machine-actionable.
The recommendation was to add "nested" to the list of options for 3.1.3 fileStrc (File Structure) and to add the following attributes to 18.104.22.168 recGrp (Record Group): rtypeloc (the starting column location of the record type indicator variable on each record of the data file); rtypewidth (the width of the record type indicator [default=1], for files with many different record types); and rtypevtype (the variable type of the indicator [default would be "numeric" but could also be "character"]).
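As an illustration only, the following Python sketch shows how an application might use the three proposed record-type attributes to make a nested file machine-actionable. The function names, the parameter names (rtype_loc, rtype_width, rtype_vtype), the sample records, and the assumption that column locations are 1-based are all hypothetical and are not part of the recommendation.

```python
def record_type(line, rtype_loc, rtype_width=1, rtype_vtype="numeric"):
    """Return the record type indicator found on one fixed-width record."""
    start = rtype_loc - 1  # assumption: column locations are counted from 1
    raw = line[start:start + rtype_width]
    # Default type is numeric; "character" keeps the indicator as text.
    return int(raw) if rtype_vtype == "numeric" else raw


def group_nested(lines, parent_type, **rtype_args):
    """Attach each child record to the most recent parent record above it.

    This relies on the guaranteed physical order of a nested file.
    """
    groups = []
    for line in lines:
        if record_type(line, **rtype_args) == parent_type:
            groups.append({"parent": line, "children": []})
        elif groups:
            groups[-1]["children"].append(line)
        else:
            raise ValueError("child record appears before any parent record")
    return groups


# Hypothetical file: indicator in column 1; 1 = household, 2 = person.
sample = ["1H001", "2P01", "2P02", "1H002", "2P01"]
households = group_nested(sample, parent_type=1, rtype_loc=1)
```

The sketch handles only the simplest case described above, one parent record type followed by its child records in physical order.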
Next, this group will address hierarchical and relational files more generally. What is needed is the ability to express a many-to-many relationship and keys for joining separate files. This should be part of DDI 1. We need to be able to describe both logical and physical stores.
Aggregate working group. This group had not met in person, but Wendy Thomas summarized the issues before this group in an e-mail. The Committee agreed that we should revise the DTD according to items 1-4 of Wendy's message, which basically are "clean-up" issues. Item 5 needs some consultation. The other items on the list will be delayed for now.
Geography working group. Working Group leader Ron Wilson provided a report on activities of this group, which held a meeting on March 16 at the University of Minnesota in Minneapolis and developed a list of recommendations for changes to the DTD to cover geographical elements and attributes. While adding these content models at this time would not cause any technical problems, it was recommended that no changes be made until the group meets again in August in Santa Barbara with the Center for Spatially Integrated Social Sciences (CSISS). Ron Wilson has requested funding from the CSISS for this meeting.
Report from DTD developer
Mike Haarman announced that he would be leaving his current position at the University of Minnesota but would like to continue to provide XML support to the Committee.
He also reported on providing a presentation layer for DDI and indicated that he would produce a set of default stylesheets and a readme file that would help implementers. He also discussed adding TEI equivalents of commonly used HTML formatting tags (p, strong, h1-h5, pre, ol, ul, li, span, div, and emph) to the DDI to provide for internal formatting. It was decided to include this content model for every element and revisit this issue later to determine which elements do not need such formatting. We also need an on/off switch for including the CALS table.
In terms of creating an XML Schema for DDI, it was mentioned that there is value in the data typing/binding provided by schemas and in the data model, but data typing can add a great deal of complexity and can be limiting. It was decided that Mike would develop a stub schema with text strings and no data binding for now. Ultimately, we will merge the Tag Library with the schema and use stylesheets for presentation of the package.
It was decided that ICPSR will be responsible for registering the DDI with OASIS.
Resolution of weights issue
This was a proposal for a revision to the DTD to cover the fact that statistics may be generated using any of a variety of weights in a data file. The proposed recommendation also makes the content models of 4.2 var (Variable), 4.2.14 sumStat (Summary Statistic), and 22.214.171.124 catStat (Category Statistic) parallel. This recommendation passed and will be implemented.
Status of Tag Library
Wendy has almost finished the modifications to the Tag Library involving the aggregate data extension. She needs to add examples and incorporate items 1-4 from her report.
Research for DDI 2: How to begin
The Committee developed a list of goals for DDI 2, including:
- Time series
- Aggregate data
- Object-oriented model
- Collection level, "families of datasets"
- Instructions for applications
- Best practices
- Support for multiple languages, if xml:lang doesn't cover this adequately
Versioning plan for the DDI
Typically, any backwards-incompatible version of an application requires a new version number. Another common convention is that stable versions are labeled with even numbers after the decimal point, while development versions are labeled with odd numbers. Accordingly, we should rename Version 1.01, our most recent stable version, as Version 1.2. Our development or "working" version 1.02.1 should move to Version 1.3. We will end up with a stable 1.4 by the end of the NSF grant in early 2003. These new conventions need to be spelled out on the DDI site.
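The even/odd convention can be checked mechanically. The short Python sketch below is illustrative only: the function names are hypothetical, and it assumes labels of the form "major.minor" under the new scheme. The renaming table simply restates the two renames agreed above.

```python
# Renames agreed at the meeting: old label -> new label.
RENAMED = {"1.01": "1.2", "1.02.1": "1.3"}


def parse_version(label):
    """Split a new-style label such as '1.2' into (major, minor) integers."""
    major, minor = label.split(".")[:2]
    return int(major), int(minor)


def is_stable(label):
    """Even number after the decimal point = stable; odd = development."""
    _, minor = parse_version(label)
    return minor % 2 == 0


# Classify the three labels named in the plan.
status = {v: ("stable" if is_stable(v) else "development")
          for v in ("1.2", "1.3", "1.4")}
```

The check applies only to the new labels; the old labels 1.01 and 1.02.1 predate the convention, which is why they are being renamed.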
Plan for moving forward, fall meeting date
We need to move ahead with another version of the draft charter, which can be circulated. Richard is heading this effort. Also, the following changes to the specification were approved and need to be implemented:
- Create stub schema and readme, with default stylesheet
- Incorporate Aggregate Working Group Report items 1-4
- Incorporate new content model for nested files
- Incorporate TEI equivalents for formatting
- Add new content models for weights
- Add on/off switch for CALS table model
Our next meeting will be held on October 7 in Washington, DC. This will be the final meeting of the DDI Committee as it is currently configured, before the Alliance formally comes into being.