Statistics is the science of collecting, summarising, presenting and interpreting data, and using them to estimate the magnitude of associations and test hypotheses.
B Kirkwood and J Sterne (Essential Medical Statistics)

A sample consists of observations or measurements. Any aspect of an individual that is measured or recorded is called a variable. Examples include age, gender, diagnosis, serum amylase and CD4 count.
It is often useful to define the types of variables, as different statistical methods are applicable to each.
There are two broad categories of variables: categorical and numerical.
Data can be presented in various forms depending on the type of data collected.
A frequency distribution is a table showing how often each value (or set of values) of the variable in question occurs in a data set.
A frequency table is used to summarise categorical or numerical data.
Frequencies can also be presented as relative frequencies, that is, as percentages of the total number of observations in the sample.
To summarise categorical data, count the number of observations in each category. These counts are called frequencies. In the following examples, the tabulations were produced in STATA using the dataset “famdata.dta”.
Example: A one-way frequency table:
| Gender | Frequency | Percent |
|---|---|---|
| Female | 118 | 48.96 |
| Male | 123 | 51.04 |
| Total | 241 | 100.00 |

STATA command: tab gender
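
The arithmetic behind a one-way table can be sketched in a few lines of Python. This is only an illustration, not the STATA procedure itself: the `gender` list is a hypothetical reconstruction built from the counts in the table (118 females, 123 males), because the raw records in “famdata.dta” are not reproduced here.

```python
from collections import Counter

# Hypothetical raw data rebuilt from the counts in the table:
# 118 females and 123 males (not read from famdata.dta).
gender = ["Female"] * 118 + ["Male"] * 123

# Frequency = number of observations in each category.
counts = Counter(gender)
total = sum(counts.values())

# Relative frequency = frequency / total, expressed as a percentage.
one_way = {
    category: (n, round(100 * n / total, 2))
    for category, n in sorted(counts.items())
}

for category, (n, pct) in one_way.items():
    print(f"{category:<8}{n:>6}{pct:>8.2f}")
print(f"{'Total':<8}{total:>6}{100:>8.2f}")
```

Each percentage is simply the category count divided by the sample size of 241, which is what the Percent column reports.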
Example: A two-way frequency table, also referred to as a 2x2 cross-tabulation or contingency table:
A frequency table with two categorical variables is called a contingency table because the figures found in the rows are contingent upon (dependent upon) those found in the columns.
| Smoke | Female | Male | Total |
|---|---|---|---|
| No | 56 (47.46%) | 36 (29.27%) | 92 |
| Yes | 62 (52.54%) | 87 (70.73%) | 149 |
| Total | 118 (100%) | 123 (100%) | 241 |

STATA command: tab smoke gender, col
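
The column percentages in the cross-tabulation divide each cell count by the total of its column (here, by the number of females or males). The Python sketch below shows that calculation under the assumption that the records can be rebuilt from the cell counts in the table; the `records` list and the helper `column_percent` are illustrative names, not part of STATA or of the dataset.

```python
from collections import Counter

# Hypothetical (smoke, gender) records rebuilt from the cell counts
# in the table (not read from famdata.dta).
records = (
    [("No", "Female")] * 56 + [("Yes", "Female")] * 62
    + [("No", "Male")] * 36 + [("Yes", "Male")] * 87
)

cell_counts = Counter(records)
column_totals = Counter(gender for _, gender in records)


def column_percent(smoke, gender):
    """Percentage of a gender column that falls in the given smoking row."""
    return round(100 * cell_counts[(smoke, gender)] / column_totals[gender], 2)


for smoke in ("No", "Yes"):
    cells = [
        f"{cell_counts[(smoke, g)]} ({column_percent(smoke, g):.2f}%)"
        for g in ("Female", "Male")
    ]
    print(smoke, *cells)
```

Because the denominators are the column totals, the percentages within each gender column add up to 100%, which makes it easy to compare smoking between females and males.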
A frequency distribution for a numerical variable is a table showing the number of observations at different values or within certain ranges.
For a discrete variable the frequencies may be tabulated either for each value of the variable or for groups of values. With continuous variables, groups have to be formed.
The cumulative percentage for a value is the percentage of observations less than or equal to that value.
Example: Frequency distribution of household size (discrete variable).
| Household size | Frequency | Percent | Cumulative percent |
|---|---|---|---|
| 1 | 6 | 2.5% | 2.5% |
| 2 | 37 | 15.4% | 17.9% |
| 3 | 101 | 41.9% | 59.8% * |
| 4 | 61 | 25.3% | 85.1% |
| 5 | 25 | 10.4% | 95.5% |
| 6 | 11 | 4.6% | 100% |

\* Approximately 60% of the sample have fewer than 4 household members
STATA command: tab household
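
A cumulative percentage is a running total of the frequencies divided by the sample size. The following Python sketch computes it directly from the household-size frequencies in the table; the variable names are illustrative only.

```python
from itertools import accumulate

# Frequencies by household size, copied from the table.
frequencies = {1: 6, 2: 37, 3: 101, 4: 61, 5: 25, 6: 11}

total = sum(frequencies.values())

# Running count of observations at or below each household size.
running_counts = list(accumulate(frequencies.values()))

# Cumulative percent = observations at or below the value / total.
cumulative_percent = {
    size: round(100 * running / total, 1)
    for size, running in zip(frequencies, running_counts)
}

for size, pct in cumulative_percent.items():
    print(size, frequencies[size], pct)
```

Computed from the raw counts in this way, a cumulative percentage can differ by 0.1 from the figure shown in the table (for example at household sizes 2 and 5), which seems to result from the table adding up percentages that were already rounded.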
Example: Frequency distribution of age in years (continuous variable).
| Age Group | Frequency | Percent | Cumulative percent |
|---|---|---|---|
| 15-19 | 44 | 18.26 | 18.26 |
| 20-29 | 60 | 24.90 | 43.15 * |
| 30-39 | 74 | 30.71 | 73.86 |
| 40-49 | 51 | 21.16 | 95.02 |
| 50-59 | 12 | 4.98 | 100.00 |
| Total | 241 | 100.00 | |

\* 43.2% of the farm workers are below 30 years of age
STATA command: tab agegroup
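
Before a continuous variable such as age can be tabulated, each observation has to be assigned to a group. The Python sketch below shows one way to do that with the age bands used in the table. The `ages` list is invented purely for illustration (it is not taken from “famdata.dta”), and `age_group` is a hypothetical helper, not a STATA function.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds and labels of the age bands used in the table.
lower_bounds = [15, 20, 30, 40, 50]
labels = ["15-19", "20-29", "30-39", "40-49", "50-59"]


def age_group(age):
    """Return the band label for an age of at least 15 and below 60."""
    if not 15 <= age < 60:
        raise ValueError(f"age {age} is outside the tabulated range")
    # bisect_right counts how many lower bounds are <= age.
    return labels[bisect_right(lower_bounds, age) - 1]


# Hypothetical ages, invented for illustration only.
ages = [17, 19, 20, 24, 29, 30, 35, 41, 48, 52]

group_counts = Counter(age_group(a) for a in ages)

for label in labels:
    print(label, group_counts[label])
```

Once every age has a group label, the grouped variable is tabulated exactly like a categorical one, and the cumulative percentages follow from the running totals of the group frequencies.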