Description:
Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
The "URI" attribute may be a URN or a URL that can be used to retrieve the file. The "sdatrefs" are summary data description references that record the ID values of all elements within the summary data description section of the Study Description that might apply to the file. These elements include: time period covered, date of collection, nation or country, geographic coverage, geographic unit, unit of analysis, universe, and kind of data. The "methrefs" are methodology and processing references that record the ID values of all elements within the study methodology and processing section of the Study Description that might apply to the file. These elements include information on data collection and data appraisal (e.g., sampling, sources, weighting, data cleaning, response rates, and sampling error estimates). The "pubrefs" attribute provides a link to publication/citation references and records the ID values of all citation elements within Other Study Description Materials or Other Study-Related Materials that pertain to this file. "Access" records the ID values of all elements in the Data Access section that describe access conditions for this file.
Remarks: When a codebook documents two different physical instantiations of a data file, e.g., logical record length (or OSIRIS) and card-image version, the Data File Description should be repeated to describe the two separate files. An ID should be assigned to each file so that in the Variable section the location of each variable on the two files can be distinguished using the unique file IDs.
Example(s):
<fileDscr ID="CARD-IMAGE" URI="www.icpsr.umich.edu/cgi-bin/archive.prl?path=ICPSR&amp;num=7728"/> <fileDscr ID="LRECL" URI="www.icpsr.umich.edu/cgi-bin/archive.prl?path=ICPSR&amp;num=7728"/>
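Remarks: A minimal sketch of how the two file IDs above might be referenced from the Variable section, assuming the variable's location element carries a "fileid" IDREF as in DDI 2.x. The variable ID and all column positions are hypothetical.
Example(s):
<var ID="V25" name="V25"> <location StartPos="55" EndPos="57" width="3" fileid="LRECL"/> <location StartPos="15" EndPos="17" width="3" RecSegNo="2" fileid="CARD-IMAGE"/> </var>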
Description: Information about the data file.
Description: Contains a short title that will be used to distinguish a particular file/part from other files/parts in the data collection.
Example(s):
<fileName ID="File1">Second-Generation Children Data </fileName>
Description: Abstract or description of the file. A summary describing the purpose, nature, and scope of the data file, special characteristics of its contents, major subject areas covered, and what questions the PIs attempted to answer when they created the file. A listing of major variables in the file is important here. In the case of multi-file collections, this uniquely describes the contents of each file.
Example(s):
<fileCont>Part 1 contains both edited and constructed variables describing demographic and family relationships, income, disability, employment, health insurance status, and utilization data for all of 1987. </fileCont>
Description: Type of file structure. The attribute "type" is used to indicate hierarchical, rectangular, relational, or nested (the default is rectangular). If the file is rectangular, the next relevant element is File Dimensions.
Description: Used to describe record groupings if the file is hierarchical or relational. The attribute "recGrp" allows a record group to indicate subsidiary record groups which nest underneath; this allows for the encoding of a hierarchical structure of record groups. The attribute "rectype" indicates the type of record, e.g., "A records" or "Household records." The attribute "keyvar" is an IDREF that provides the link to other record types. In a hierarchical study consisting of individual and household records, the "keyvar" on the person record will indicate the household to which it belongs. The attribute "rtypeloc" indicates the starting column location of the record type indicator variable on each record of the data file. The attribute "rtypewidth" specifies the width of that indicator variable, which is useful for files with many different record types. The attribute "rtypevtype" specifies the type of the indicator variable. The attribute "recidvar" indicates the variable that identifies the record group.
Example(s):
<fileStrc type="hierarchical"> <recGrp rectype="Person" keyvar="HHDID"> <labl>CPS 1999 Person-Level Record</labl> <recDimnsn> <varQnty>133</varQnty> <caseQnty>1500</caseQnty> <logRecL>852</logRecL> </recDimnsn> </recGrp> </fileStrc>
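Remarks: The example above shows a single record group. The following sketch, which uses hypothetical IDs, variable names, and column positions, illustrates how the "recGrp" attribute might nest a person record group beneath a household record group, with a one-column record type indicator assumed to start in column 1.
Example(s):
<fileStrc type="hierarchical"> <recGrp ID="RG-HH" recGrp="RG-PER" rectype="Household" recidvar="HHDID" rtypeloc="1" rtypewidth="1" rtypevtype="character"> <labl>Household-Level Record</labl> </recGrp> <recGrp ID="RG-PER" rectype="Person" keyvar="HHDID" rtypeloc="1" rtypewidth="1" rtypevtype="character"> <labl>Person-Level Record</labl> </recGrp> </fileStrc>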
Description: A short description of the parent element. In the variable label, the length of this phrase may depend on the statistical analysis system used (e.g., some versions of SAS permit 40-character labels, while some versions of SPSS permit 120 characters), although the DDI itself imposes no restrictions on the number of characters allowed. A "level" attribute is included to permit coding of the level to which the label applies, i.e. record group, variable group, variable, category group, category, nCube group, nCube, or other study-related materials. The "vendor" attribute was provided to allow for specification of different labels for use with different vendors' software. The attribute "country" allows for the denotation of country-specific labels. The "sdatrefs" attribute records the ID values of all elements within the Summary Data Description section of the Study Description that might apply to the label. These elements include: time period covered, date of collection, nation or country, geographic coverage, geographic unit, unit of analysis, universe, and kind of data.
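Remarks: As an illustration only, the same variable might carry a shorter label for one vendor's software and a longer label for another. The variable ID, label text, and vendor values below are hypothetical.
Example(s):
<var ID="V25" name="V25"> <labl level="variable" vendor="SAS">Census region of residence</labl> <labl level="variable" vendor="SPSS">Census region of residence, recoded from location of residence</labl> </var>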
Description: Information about the physical characteristics of the record. The "level" attribute on this element should be set to "record".
Description: Number of variables.
Example(s):
<varQnty>27</varQnty>
Description: Number of cases or observations.
Example(s):
<caseQnty>1011</caseQnty>
Description: Logical record length, i.e., number of characters of data in the record.
Example(s):
<logRecL>27</logRecL>
Description:
For clarifying information/annotation regarding the parent element.
The attributes for notes permit a controlled vocabulary to be developed ("type" and "subject"), indicate the "level" of the DDI to which the note applies (study, file, variable, etc.), and identify the author of the note ("resp").
Example(s):
<docDscr><verStmt><notes resp="Jane Smith">Additional information on derived variables has been added to this marked-up version of the documentation.</notes></verStmt></docDscr> <docDscr><citation><notes resp="Jane Smith">This citation was prepared by the archive based on information received from the markup authors.</notes></citation></docDscr> <docSrc><verStmt><notes resp="Jane Smith">The source codebook was produced from original hardcopy materials using Optical Character Recognition (OCR).</notes></verStmt></docSrc> <docSrc><notes>A machine-readable version of the source codebook was supplied by the Zentralarchiv.</notes></docSrc> <docDscr><notes>This Document Description, or header information, can be used within an electronic resource discovery environment.</notes></docDscr> <stdyDscr><verStmt><notes resp="Jane Smith">Data for 1998 have been added to this version of the data collection.</notes></verStmt></stdyDscr> <stdyDscr><citation><notes resp="Jane Smith">This citation was sent to ICPSR by the agency depositing the data.</notes></citation></stdyDscr> <stdyInfo><notes>Data on employment and income refer to the preceding year, although demographic data refer to the time of the survey.</notes></stdyInfo> <method><notes>Undocumented codes were found in this data collection. Missing data are represented by blanks.</notes></method> <method><notes>For this collection, which focuses on employment, unemployment, and gender equality, data from EUROBAROMETER 44.3: HEALTH CARE ISSUES AND PUBLIC SECURITY, FEBRUARY-APRIL 1996 (ICPSR 6752) were merged with an oversample.</notes></method> <setAvail><notes>Data from the Bureau of Labor Statistics used in the analyses for the final report are not provided as part of this collection.</notes></setAvail>
<dataAccs><notes>Users should note that this is a beta version of the data. The investigators therefore request that users who encounter any problems with the dataset contact them at the above address.</notes></dataAccs> <fileStrc><notes>The number of arrest records for an individual is dependent on the number of arrests an offender had.</notes></fileStrc> <fileTxt><verStmt><notes>Data for all previously-embargoed variables are now available in this version of the file.</notes></verStmt></fileTxt> <fileDscr><notes>There is a restricted version of this file containing confidential information, access to which is controlled by the principal investigator.</notes></fileDscr> <varGrp><notes>This variable group was created for the purpose of combining all derived variables.</notes></varGrp> <varGrp><notes source="archive" resp="John Data">This variable group and all other variable groups in this data file were organized according to a schema developed by the ad hoc advisory committee.</notes></varGrp> <nCubeGrp><notes>This nCube Group was created for the purpose of presenting a cross-tabulation between variables "Tenure" and "Age of householder."</notes></nCubeGrp> <valrng><notes subject="political party">Starting with Euro-Barometer 2 the coding of this variable has been standardized following an approximate ordering of each country's political parties along a "left" to "right" continuum in the first digit of the codes. Parties coded 01-39 are generally considered on the "left", those coded 40-49 in the "center", and those coded 60-89 on the "right" of the political spectrum. Parties coded 50-59 cannot be readily located in the traditional meaning of "left" and "right". The second digit of the codes is not significant to the "left-right" ordering. Codes 90-99 contain the response "other party" and various missing data responses. Users may modify these codings or part of these codings in order to suit their specific needs.</notes></valrng> <invalrng><notes>Codes 90-99 contain the response "other party" and various missing data responses.</notes></invalrng>
<var><verStmt><notes>The labels for categories 01 and 02 for this variable were inadvertently switched in the first version of this variable and have now been corrected.</notes></verStmt></var> <var><notes>This variable was created by recoding location of residence to Census regions.</notes></var> <nCube><verStmt><notes>The labels for categories 01 and 02 in dimension 1 were inadvertently switched in the first version of the cube, and have now been corrected.</notes></verStmt></nCube> <nCube><notes>This nCube was created to meet the needs of local low income programs in determining eligibility for federal funds.</notes></nCube> <dataDscr><notes>The variables in this study are identical to earlier waves.</notes></dataDscr> <otherMat><notes>Users should be aware that this questionnaire was modified during the CAI process.</notes></otherMat>
Description: Dimensions of the overall file.
Description: Number of cases or observations.
Example(s):
<caseQnty>1011</caseQnty>
Description: Number of variables.
Example(s):
<varQnty>27</varQnty>
Description: Logical record length, i.e., number of characters of data in the record.
Example(s):
<logRecL>27</logRecL>
Description: Records per case in the file. This element should be used for card-image data or other files in which there are multiple records per case.
Example(s):
<dimensns><recPrCas>5</recPrCas></dimensns>
Description: Overall record count in file. Particularly helpful in instances such as files with multiple cards/decks or records per case.
Example(s):
<dimensns><recNumTot>2400</recNumTot></dimensns>
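Remarks: The child elements of dimensns can be combined in a single block. In the sketch below all counts are hypothetical; with 1011 cases and 5 records per case, the overall record count would be 1011 x 5 = 5055, assuming every case has all of its records present.
Example(s):
<dimensns> <caseQnty>1011</caseQnty> <varQnty>27</varQnty> <logRecL>80</logRecL> <recPrCas>5</recPrCas> <recNumTot>5055</recNumTot> </dimensns>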
Description: Types of data files include raw data (ASCII, EBCDIC, etc.) and software-dependent files such as SAS datasets, SPSS export files, etc. If the data are of mixed types (e.g., ASCII and packed decimal), state that here. Note that the element varFormat permits specification of the data format at the variable level. The "charset" attribute allows one to specify the character set used in the file, e.g., US-ASCII, EBCDIC, UNICODE UTF-8, etc.
Example(s):
<fileType charset="US-ASCII">ASCII data file</fileType>
Description: Physical format of the data file: Logical record length format, card-image format (i.e., data with multiple records per case), delimited format, free format, etc.
Example(s):
<format>comma-delimited</format>
Description: Indicates whether the file was produced at an archive or produced elsewhere.
Example(s):
<filePlac>Washington, DC: United States Department of Commerce, Bureau of the Census</filePlac>
Description: Indicate here, at the file level, the types of checks and operations performed on the data file. A controlled vocabulary may be developed for this element in the future. The following examples are based on ICPSR's Extent of Processing scheme:
Example(s):
<dataChck>The archive produced a codebook for this collection.</dataChck> <dataChck>Consistency checks were performed by Data Producer/ Principal Investigator.</dataChck> <dataChck>Consistency checks performed by the archive.</dataChck> <dataChck>The archive generated SAS and/or SPSS data definition statements for this collection.</dataChck> <dataChck>Frequencies were provided by Data Producer/Principal Investigator.</dataChck> <dataChck>Frequencies provided by the archive.</dataChck> <dataChck>Missing data codes were standardized by Data Producer/ Principal Investigator.</dataChck> <dataChck>Missing data codes were standardized by the archive.</dataChck> <dataChck>The archive performed recodes and/or calculated derived variables. </dataChck> <dataChck>Data were reformatted by the archive.</dataChck> <dataChck>Checks for undocumented codes were performed by Data Producer/Principal Investigator.</dataChck> <dataChck>Checks for undocumented codes were performed by the archive.</dataChck>
Description: Processing status of the file. Some data producers and social science data archives employ data processing strategies that provide for release of data and documentation at various stages of processing.
Example(s):
<ProcStat>Available from the DDA. Being processed.</ProcStat> <ProcStat>The principal investigator notes that the data in Public Use Tape 5 are released prior to final cleaning and editing, in order to provide prompt access to the NMES data by the research and policy community.</ProcStat>
Description: This element can be used to give general information about missing data, e.g., that missing data have been standardized across the collection, missing data are present because of merging, etc.
Example(s):
<dataMsng>Missing data are represented by blanks.</dataMsng> <dataMsng>The codes "-1" and "-2" are used to represent missing data.</dataMsng>
Description: Software used to produce the work. A "version" attribute permits specification of the software version number. The "date" attribute is provided to enable specification of the date (if any) for the software release. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example(s):
<docDscr><citation><prodStmt><software version="1.0">MRDC Codebook Authoring Tool</software> </prodStmt></citation></docDscr> <docDscr><citation><prodStmt><software version="8.0">Arbortext Adept Editor</software> </prodStmt></citation></docDscr> <docDscr><docSrc><prodStmt><software version="4.0">PageMaker</software></prodStmt></docSrc></docDscr> <stdyDscr><citation><prodStmt><software version="6.12">SAS</software></prodStmt></citation></stdyDscr> <fileTxt><software version="6.12">The SAS transport file was generated by the SAS CPORT procedure. </software></fileTxt>
Description: Version statement for the work at the appropriate level: marked-up document; marked-up document source; study; study description, other material; other material for study. A version statement may also be included for a data file, a variable, or an nCube.
Example(s):
<verStmt><version type="version" date="1999-01-25">Second version</version></verStmt>
Description: Also known as release or edition. If there have been substantive changes in the data/documentation since their creation, this statement should be used at the appropriate level. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the "date" attribute.
Example(s):
<version type="edition" date="1999-01-25">Second ICPSR Edition</version> <var><verStmt><version type="version" date="1999-01-25">Second version of V25</version></verStmt> </var> <nCube><verStmt><version type="version" date="1999-01-25">Second version of N25</version></verStmt> </nCube>
Description: The organization or person responsible for the version of the work.
Example(s):
<verResp>Zentralarchiv fuer Empirische Sozialforschung</verResp> <verResp>Inter-university Consortium for Political and Social Research</verResp> <var><verStmt><verResp>Zentralarchiv fuer Empirische Sozialforschung</verResp></verStmt></var> <nCube><verStmt><verResp>Zentralarchiv fuer Empirische Sozialforschung</verResp></verStmt></nCube>
Description: This element maps individual data entries to one or more physical storage locations. It is used to describe the physical location of aggregate/tabular data in cases where the nCube model is employed.
Description: Identifies a physical storage location for an individual data entry, serving as a link between the physical location and the logical content description of each data item. The attribute "varRef" is an IDREF that points to a discrete variable description. If the data item is located within an nCube (aggregate data), use the attribute "nCubeRef" (IDREF) to point to the appropriate nCube and the element CubeCoord to identify the coordinates of the data item within the nCube.
Description: This is an empty element containing only the attributes listed below. It is used to identify the coordinates of the data item within a logical nCube describing aggregate data. CubeCoord is repeated for each dimension of the nCube, giving the coordinate number ("coordNo") and coordinate value ("coordVal"). Coordinate value reference ("coordValRef") is an ID reference to the variable that carries the coordinate value. Together, the attributes provide the complete coordinate location of a cell within the nCube.
Example(s):
<CubeCoord coordNo="1" coordVal="3"/> <CubeCoord coordNo="2" coordVal="7"/> <CubeCoord coordNo="3" coordVal="2"/>
Description:
This is an empty element containing only the attributes listed below. Attributes include "type" (type of file structure: rectangular, hierarchical, two-dimensional, relational), "recRef" (IDREF link to the appropriate file or recGrp element within a file), "startPos" (starting position of variable or data item), "endPos" (ending position of variable or data item), "width" (number of columns the variable/data item occupies), "RecSegNo" (the record segment number, deck or card number the variable or data item is located on), and "fileid" (an IDREF link to the fileDscr element for the file that includes this physical location).
Remarks: Where the same variable is coded in two different files, e.g., a fixed format file and a relational database file, simply repeat the physLoc element with the alternative location information. Note that if there is no width or ending position, then the starting position should be the ordinal position in the file, and the file would be described as free-format. New attributes will be added as other storage formats are described within the DDI.
Example(s):
<physLoc type="rectangular" recRef="R1" startPos="55" endPos="57" width="3"/> <physLoc type="hierarchical" recRef="R6" startPos="25" endPos="25" width="1"/>
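Remarks: A sketch of how locMap, dataItem, CubeCoord, and physLoc might fit together. The first dataItem points to a discrete variable, and the second locates one cell of a two-dimensional nCube. The IDs "V25", "N1", and "R1" and all positions are hypothetical.
Example(s):
<locMap> <dataItem varRef="V25"> <physLoc type="rectangular" recRef="R1" startPos="55" endPos="57" width="3"/> </dataItem> <dataItem nCubeRef="N1"> <CubeCoord coordNo="1" coordVal="3"/> <CubeCoord coordNo="2" coordVal="7"/> <physLoc type="rectangular" recRef="R1" startPos="58" endPos="65" width="8"/> </dataItem> </locMap>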