# DDI Controlled Vocabulary for Summary Statistic Type

## Description

Specifies the type of summary statistic. A summary statistic is a single number that represents a characteristic of a set of values.

## Details

| Field | Value |
| --- | --- |
| Short Name | SummaryStatisticType |
| Long Name | Summary Statistic Type |
| Version | 2.1 |
| Version Notes | The definitions for "Standard error of the mean", "Median", and "Standard deviation" were edited for clarity. Symbols were removed from those terms that had them, for consistency (symbols were not available for all terms). Edited terms are listed in "Version Changes". |
| Version Changes | DEFINITION REPHRASED: StandardErrorOfMean; Median; StandardDeviation. TERM REPHRASED: ArithmeticMean; Mode; Median; Variance; StandardDeviation; CoefficientOfVariation; AverageAbsoluteDeviation; MedianAbsoluteDeviation. |
| Canonical URI | urn:ddi-cv:SummaryStatisticType |
| Canonical URI of this version | urn:ddi-cv:SummaryStatisticType:2.1 |
| Location URI | http://www.ddialliance.org/Specification/DDI-CV/SummaryStatisticType_2.1_Genericode1.0_DDI-CVProfile1.0.xml |
| Alternate format location URI | http://www.ddialliance.org/Specification/DDI-CV/SummaryStatisticType_2.1.html |
| Alternate format location URI | http://www.ddialliance.org/Specification/DDI-CV/SummaryStatisticType_2.1_InputSheet_Excel2003.xls |
| Agency Name | DDI Alliance |
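
The two canonical URIs above differ only in a trailing version segment. The following is a minimal sketch of splitting such a URI into vocabulary name and version. The layout `urn:ddi-cv:<name>[:<version>]` is inferred from the two URIs listed here, not taken from a formal URN definition, and `CvUrn` and `parse_cv_urn` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class CvUrn(NamedTuple):
    """Hypothetical container for the parts of a DDI CV canonical URI."""
    name: str
    version: Optional[str]


def parse_cv_urn(urn: str) -> CvUrn:
    """Split 'urn:ddi-cv:<name>[:<version>]' into name and optional version.

    Assumption: the layout is inferred from the URIs in the details above.
    """
    parts = urn.split(":")
    if len(parts) not in (3, 4) or parts[0] != "urn" or parts[1] != "ddi-cv":
        raise ValueError(f"not a DDI CV canonical URI: {urn!r}")
    # A fourth segment, when present, is treated as the version.
    version = parts[3] if len(parts) == 4 else None
    return CvUrn(name=parts[2], version=version)


versioned = parse_cv_urn("urn:ddi-cv:SummaryStatisticType:2.1")
unversioned = parse_cv_urn("urn:ddi-cv:SummaryStatisticType")
```

Here `versioned.version` holds the string `"2.1"`, while `unversioned.version` is `None`, which mirrors the distinction between the two canonical URIs.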

## Code List

| Value of the Code | Descriptive Term of the Code | Definition of the Code |
| --- | --- | --- |
| ArithmeticMean | Arithmetic mean | Mathematical average of a set of values. The mean is calculated by adding up two or more values and dividing the total by their number. In social/political science, it is usually the sum of the measurements divided by the number of subjects, or cases. |
| GeometricMean | Geometric mean | Average value of all data if extracting the nth root of the product of all (n) values. Rarely used in social sciences. |
| HarmonicMean | Harmonic mean | Average value of all data if calculating the reciprocal of the arithmetic mean of the reciprocal of values. Rarely used in social sciences. |
| TrimmedMean | Trimmed mean | The (arithmetic) mean calculated after discarding given parts of observations at the high and low end (e.g., interquartile mean when the lowest 25% and the highest 25% are discarded, and the mean of the remaining values is calculated). |
| StandardErrorOfMean | Standard error of the mean | Provides an estimate of how much the sample mean differs from the actual mean of the population. |
| Mode | Mode | The most frequently observed data value (Statistics Canada). |
| Median | Median | The middle value in a set of values arranged in ascending order. Half of the values fall below the median, and half above it. If the set has an even number of values, the median will be the average of the two middle ones. |
| ValidCases | Valid Cases | Cases with observations which are considered to be valid, i.e., providing substantial information and to be included for calculation. |
| InvalidCases | Invalid cases | Cases which are considered/defined as "missing" (e.g., not ascertained, not applicable, etc.), usually excluded from calculation. |
| Minimum | Minimum | The lowest valid value in a variable. |
| Maximum | Maximum | The highest valid value in a variable. |
| Range | Range | The range of valid values, i.e., all values that fall between the lowest and highest valid values. |
| Sum | Sum | The sum or total of the values, across all valid cases. |
| Variance | Variance | The variance is the mean square deviation of the variable around the average value. It reflects the dispersion of a frequency distribution around its mean (OECD Glossary of Statistics). |
| StandardDeviation | Standard deviation | Indicates the degree to which individuals within the sample differ from the sample mean. |
| CoefficientOfVariation | Coefficient of variation | Standard deviation divided by the mean. |
| AverageAbsoluteDeviation | Average absolute deviation | The average of the absolute differences between each value and the overall mean. Measure of statistical dispersion around the mean, alternative to Standard Deviation. |
| MedianAbsoluteDeviation | Median absolute deviation | The median absolute deviation from the median. Measure of statistical dispersion around the median. |
| FirstQuartile | First quartile | The first of three values which separate the total frequency of a distribution into four equal parts. |
| SecondQuartile | Second quartile | The second of three values which separate the total frequency of a distribution into four equal parts (= median). |
| ThirdQuartile | Third quartile | The third of three values which separate the total frequency of a distribution into four equal parts. |
| InterquartileRange | Interquartile range | The range between the first and third quartile values. |
| FirstQuintile | First quintile | The first of four values which separate the total frequency of a distribution into five equal parts. |
| SecondQuintile | Second quintile | The second of four values which separate the total frequency of a distribution into five equal parts. |
| ThirdQuintile | Third quintile | The third of four values which separate the total frequency of a distribution into five equal parts. |
| FourthQuintile | Fourth quintile | The fourth of four values which separate the total frequency of a distribution into five equal parts. |
| InterquintileRange | Interquintile range | The range between the first and fourth quintile values. |
| FirstDecile | First decile | The first of nine values which separate the total frequency of a distribution into ten equal parts. |
| SecondDecile | Second decile | The second of nine values which separate the total frequency of a distribution into ten equal parts. |
| ThirdDecile | Third decile | The third of nine values which separate the total frequency of a distribution into ten equal parts. |
| FourthDecile | Fourth decile | The fourth of nine values which separate the total frequency of a distribution into ten equal parts. |
| FifthDecile | Fifth decile | The fifth of nine values which separate the total frequency of a distribution into ten equal parts (= median). |
| SixthDecile | Sixth decile | The sixth of nine values which separate the total frequency of a distribution into ten equal parts. |
| SeventhDecile | Seventh decile | The seventh of nine values which separate the total frequency of a distribution into ten equal parts. |
| EighthDecile | Eighth decile | The eighth of nine values which separate the total frequency of a distribution into ten equal parts. |
| NinthDecile | Ninth decile | The ninth of nine values which separate the total frequency of a distribution into ten equal parts. |
| InterdecileRange | Interdecile range | The range between the first and ninth decile values. |
| OtherPercentile | Other percentile | A percentile not covered by any of the other percentile terms. |
| Beta1 | Skewness | A measure for the asymmetry of a probability distribution of a variable. |
| Beta2 | Kurtosis | A measure for the "peakedness" of a probability distribution of a variable. |
| ShapiroWilk | Shapiro-Wilk | Normality test statistics. |
| PercentageOfValidCases | Percentage of valid cases | Indicates the percentage of valid cases of the total number of cases. |
| PercentageOfInvalidCases | Percentage of invalid cases | Indicates the percentage of invalid cases of the total number of cases. |
| Other | Other | Use if the summary statistic type is known, but not found in the list. |
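
To make some of these definitions concrete, the following sketch computes a handful of the statistics with Python's standard `statistics` module and keys each result by its code value. It is an illustration under stated assumptions, not part of the vocabulary: `summarize` is a hypothetical helper, `None` is assumed to mark an invalid case, and variance and standard deviation use the population form (mean square deviation), which is one reading of the definitions above. Quantile terms are left out because their values depend on the interpolation method chosen.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def summarize(values: Sequence[Optional[float]], trim: float = 0.25) -> Dict[str, float]:
    """Compute a few summary statistics, keyed by code value (hypothetical helper)."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    # Assumption: None marks an invalid ("missing") case.
    valid = sorted(v for v in values if v is not None)
    if not valid:
        raise ValueError("no valid cases")
    n_total, n_valid = len(values), len(valid)
    mean = statistics.fmean(valid)
    median = statistics.median(valid)
    # Trimmed mean: drop the lowest and highest `trim` share of observations.
    k = int(n_valid * trim)
    trimmed = valid[k:n_valid - k]
    stats: Dict[str, float] = {
        "ValidCases": n_valid,
        "InvalidCases": n_total - n_valid,
        "PercentageOfValidCases": 100 * n_valid / n_total,
        "PercentageOfInvalidCases": 100 * (n_total - n_valid) / n_total,
        "Minimum": valid[0],
        "Maximum": valid[-1],
        "Sum": math.fsum(valid),
        "ArithmeticMean": mean,
        "Median": median,
        "TrimmedMean": statistics.fmean(trimmed),
        # Population forms, i.e. dividing by the number of valid cases.
        "Variance": statistics.pvariance(valid),
        "StandardDeviation": statistics.pstdev(valid),
        "AverageAbsoluteDeviation": statistics.fmean([abs(v - mean) for v in valid]),
        "MedianAbsoluteDeviation": statistics.median([abs(v - median) for v in valid]),
    }
    if mean != 0:
        stats["CoefficientOfVariation"] = stats["StandardDeviation"] / mean
    # Geometric and harmonic means are only computed for strictly positive data.
    if all(v > 0 for v in valid):
        stats["GeometricMean"] = statistics.geometric_mean(valid)
        stats["HarmonicMean"] = statistics.harmonic_mean(valid)
    return stats


example = summarize([1.0, 2.0, 4.0, 8.0, None])
```

With the default `trim=0.25` and four valid values, one observation is discarded at each end, so `example["TrimmedMean"]` is the interquartile mean described in the TrimmedMean definition.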

## Usage

A classification of the type of summary statistic provided. Supports the use of an external controlled vocabulary. DDI strongly recommends the use of a widely shared controlled vocabulary to support interoperability.

| Module Name | Element Name |
| --- | --- |
| physicalinstance | TypeOfSummaryStatistic |

This vocabulary cannot be used with the element sumStat (4.3.14) in DDI 2.1, because that element already has a hard-coded controlled vocabulary in its "type" attribute. To use the vocabulary in DDI 2.5, set the "type" attribute to "other" and put the appropriate value from the external CV in the "otherType" attribute. Use the complex element controlledVocabUsed (in the docDscr section) to identify the controlled vocabulary to which the selected term belongs.
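
As an illustration of the DDI 2.5 pattern, the sketch below builds a `sumStat` element with `type="other"` and a code from this vocabulary in `otherType`, using Python's `xml.etree.ElementTree`. The helper name `build_sum_stat` is hypothetical, XML namespace handling is omitted, and the matching `controlledVocabUsed` declaration is not shown because its child elements should be taken from the DDI 2.5 schema rather than guessed here.

```python
import xml.etree.ElementTree as ET


def build_sum_stat(cv_code: str, value: float) -> ET.Element:
    """Build a bare sumStat element carrying an external CV code (hypothetical helper).

    Assumption: no XML namespace is applied; a real DDI 2.5 instance would
    place this element inside a variable description.
    """
    if not cv_code:
        raise ValueError("cv_code must be a non-empty code value")
    # type="other" signals that the term comes from outside the hard-coded list;
    # otherType carries the code value from the external vocabulary.
    elem = ET.Element("sumStat", {"type": "other", "otherType": cv_code})
    elem.text = str(value)
    return elem


sum_stat = build_sum_stat("TrimmedMean", 3.0)
xml_text = ET.tostring(sum_stat, encoding="unicode")
```

Code values such as `TrimmedMean` have no counterpart in the hard-coded "type" list, which is the case the `otherType` attribute is meant to cover.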

| Element Number in DDI 2.1 | Element/Attribute Name |
| --- | --- |
| 4.3.14 | sumStat |