Demographics of the donors in the 'Tests, symptoms and living situation' study
Introduction
In previous blogposts, we have already discussed the sociodemographics and spatial distribution of donors.
In this blogpost, we will again look at the sociodemographic composition and spatial distribution of donors. For the analysis, we look at two groups and compare each with the actual population composition (updated census figures, as of 12/31/2019). We use three variables: spatial distribution at the state level (classified using the first three digits of the given zip code), gender, and age (recorded in 10-year categories). The first group consists of all data donors, based on the self-reported data they gave at registration; the second group consists of the participants in the sub-study “Tests, symptoms and living situation”. We would like to investigate two questions:
- How much does the current sample of all individuals participating in the Data-Donation-App deviate from the true distribution in the population?
- How much do donors who participated in the “tests, symptoms, and living situation” sub-study differ from the overall donor sample?
Regional Distribution
In order to map the regional distribution of donors, we first look at the donors’ postal codes and assign each to the corresponding federal state. For reasons of data protection, we only collect the first three digits of the donor’s postal code (and thus no longer give a distribution at the county level). If we compare the distribution of donors across the federal states with the true distribution in the population (Figure 1, bar chart), we see different participation rates in the federal states. Donors from Hamburg and Berlin are disproportionately represented relative to the population distribution. For example, the value of 0.41 for Hamburg indicates that about 40 percent more people from Hamburg are represented in the data donation sample than would be expected from Hamburg’s share of the population. On the other hand, the states of Saxony-Anhalt, Thuringia, Mecklenburg-Western Pomerania and Saxony are clearly underrepresented. The same pattern appears when we look at all donors per 100,000 inhabitants per federal state (Figure 1, map of Germany).
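The relative representation value described above can be sketched in a few lines. This is a minimal illustration, assuming the value is the ratio of a state's share among donors to its share of the population, minus one; the post does not spell out the exact formula. The function name and the toy counts are hypothetical and are not the study's data.

```python
def relative_deviation(sample_counts, population_counts):
    """Return (share in sample / share in population) - 1 for each state.

    A value of 0.41 would mean the state is about 40 percent more common
    in the sample than its population share would suggest; negative
    values mean underrepresentation. The formula is an assumption.
    """
    sample_total = sum(sample_counts.values())
    population_total = sum(population_counts.values())
    result = {}
    for state, inhabitants in population_counts.items():
        sample_share = sample_counts.get(state, 0) / sample_total
        population_share = inhabitants / population_total
        result[state] = sample_share / population_share - 1
    return result


# Hypothetical toy counts for two made-up states, NOT the study's data.
toy_sample = {"A": 30, "B": 70}
toy_population = {"A": 200, "B": 800}
deviation = relative_deviation(toy_sample, toy_population)
```

In this toy example, state A makes up 30 percent of the sample but only 20 percent of the population, so it comes out as overrepresented, while state B comes out as underrepresented.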
We see similar results when comparing the participants in the sub-study with the distribution in the population (Figure 2, bar chart). Relative to the population, the distribution of sub-study participants across the states almost matches that of all data donation participants. Here, too, sub-study participants from Hamburg and Berlin are markedly overrepresented compared with the population, and the states of Saxony-Anhalt, Mecklenburg-Western Pomerania and Thuringia are most underrepresented, closely followed by Saxony. The same pattern appears when we look at sub-study participants per 100,000 inhabitants per state (Figure 2, map).
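The per-100,000 figures shown on the maps follow from a simple normalization, sketched here. The function name and the toy numbers are hypothetical; the actual counts and population figures behind the maps are not given in this post.

```python
def donors_per_100k(donor_counts, inhabitants):
    """Return the number of donors per 100,000 inhabitants for each state."""
    return {
        state: donor_counts.get(state, 0) * 100_000 / population
        for state, population in inhabitants.items()
    }


# Hypothetical toy numbers for two made-up states, NOT the study's data.
toy_donors = {"A": 30, "B": 70}
toy_inhabitants = {"A": 20_000, "B": 100_000}
rates = donors_per_100k(toy_donors, toy_inhabitants)
```

Normalizing by population makes states of very different sizes comparable: state B has more donors in absolute terms here, but state A has the higher rate.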
Gender Distribution
In the next step, we look at the gender distribution among all data donors and among the donors in the sub-study.
At first glance, it is noticeable that male data donors are overrepresented and female data donors underrepresented compared to the total population.
We see the same result for the participants of the sub-study, where the gender gap is even larger.
In our earlier blog post on the sociodemographics of donors, we had to make a correction to the gender distribution. Ever since the launch of the data donation app, more men than women have participated in the data donation.
Age Distribution
The age distribution in our data differs from the age distribution of the overall population.
Both the participants in the data donation and the participants in the sub-study are slightly underrepresented in the youngest age groups and strongly underrepresented in the older age groups.
Significantly overrepresented compared to the total population are the age groups from 30 to 59.
The comparison between all participants in the data donation and the participants in the sub-study shows only minor selection effects.
With regard to age, the participants in the sub-study are distributed much like the donors overall.
If we look at the age distribution differentiated by gender, we see greater differences between the participants of the data donation and the participants of the sub-study among women, especially in the two age groups covering 40 to 59 years.
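A comparison like the one above can be sketched as a difference in age-group shares within one gender. This is only an illustration under assumed inputs: the function names, the age-group labels, and the toy counts are hypothetical, and the post does not state how the differences in the figures were computed.

```python
def shares(counts):
    """Convert absolute counts per age group into shares that sum to 1."""
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def share_difference(all_donors, sub_study):
    """Return sub-study share minus all-donor share for each age group."""
    all_shares = shares(all_donors)
    sub_shares = shares(sub_study)
    return {
        group: sub_shares.get(group, 0.0) - all_shares[group]
        for group in all_shares
    }


# Hypothetical toy counts for women in three age groups, NOT the study's data.
women_all_donors = {"30-39": 40, "40-49": 30, "50-59": 30}
women_sub_study = {"30-39": 3, "40-49": 4, "50-59": 3}
difference = share_difference(women_all_donors, women_sub_study)
```

A positive value means the age group is more common among sub-study participants than among all donors of the same gender; values near zero indicate little selection.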
Outlook
In the next post, we will give you further insight into the socio-demographic and health situation of the data donors in the study “Tests, Symptoms and Living Situation”. We will present further comparisons, for example of self-assessed health against reference data (also called benchmarks) from representative health studies of the RKI. In this way, we are getting closer to one of the goals of the sub-study, namely to describe the sample composition more precisely.