Indicative data give you a sense of what’s happening and what people think. Such data can support decision-making by challenging conventional wisdom, identifying issues that had not been considered beforehand, and giving insight into the ‘word on the street’. With some thought about who the respondents are, indicative data can be made reasonably reflective of the population being surveyed. One way polling companies do this is by making sure the sample contains the same proportions of people (by age, sex, and location) as the wider population. They sometimes apply ‘weights’ after data collection to correct for the under- or over-representation of any one group. However, indicative data are not statistically representative and therefore not generalizable to the wider population.
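To make the idea of post-collection ‘weights’ concrete, here is a minimal sketch in Python. It assumes the simplest possible scheme, in which each group's weight is its population share divided by its sample share; the function name, group labels, and figures are all hypothetical, and real polling companies may use more elaborate methods.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per group: population share / sample share.

    sample_groups: list with one group label per respondent (hypothetical data).
    population_shares: dict mapping group label to its share of the population.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {
        group: population_shares[group] / (count / n)
        for group, count in counts.items()
    }


# Hypothetical sample that over-represents men (60%) relative to
# an assumed 50/50 split in the wider population.
sample = ["male"] * 60 + ["female"] * 40
population = {"male": 0.5, "female": 0.5}

weights = poststratification_weights(sample, population)
# Men are weighted down (below 1) and women are weighted up (above 1).
print(weights)
```

Each respondent's answers would then be multiplied by the weight for their group, so that the weighted sample matches the population on the chosen characteristic. This corrects the balance of known groups only; it does not make a non-random sample statistically representative.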
Representative data let you generalize findings to the wider population. If you want your research to be able to legitimately say “people in country x think this” (rather than “our research participants think this”), then you need a representative sample. Representative, generalizable results rely on random (or ‘probabilistic’) sampling. Here, random means that everyone in the population of interest has a known, non-zero chance (probability) of being selected; in the simplest case, that chance is equal for everyone.
A simple random sample is best, but it is only possible if there is a reliable list of the whole population (a sampling frame). Given that such a list rarely exists for populations of people, researchers have developed clever alternatives, from stratified to cluster to multistage sampling. These methods bring us closer to a representative sample, but none of them is perfect. And so there are formulas to work out how far results from the sample are likely to differ from those of the whole population (the ‘margin of error’, stated at a given level of ‘confidence’).
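As an illustration of one such formula, the sketch below computes the margin of error for a proportion, assuming the standard normal approximation for a simple random sample. The `design_effect` parameter is an assumption added here to show how the margin is commonly widened for cluster or multistage designs; the function name and the example figures are hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96, design_effect=1.0):
    """Approximate margin of error for a proportion.

    p: observed proportion (0.5 gives the widest, most cautious margin).
    n: number of completed responses.
    z: z-score for the confidence level (1.96 corresponds to about 95%).
    design_effect: inflation factor for complex designs (1.0 = simple random sample).
    """
    return z * math.sqrt(design_effect * p * (1 - p) / n)


# Hypothetical survey: 1,000 respondents, 50% give a particular answer.
moe = margin_of_error(0.5, 1000)
print(f"±{moe * 100:.1f} percentage points")  # prints "±3.1 percentage points"
```

Read this as: with 1,000 randomly selected respondents, a result of 50% would be reported as 50% plus or minus about 3 percentage points at roughly 95% confidence. A design effect above 1 widens that range.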
Representative samples are harder to generate with automated polling tools that rely on users to download applications, access certain (social media) websites, or own a phone, because those samples are naturally biased towards people who have access to these media. These subgroups are usually wealthier than the rest of the population. This matters less in countries where internet access is almost universal. It matters a whole lot more in countries where telecommunications remain very unevenly spread. Even in countries with relatively big economies like Nigeria, less than 50% of the population accesses the internet on a regular basis.
Traditional research isn’t perfect either. Traditional approaches might rely on ‘randomly’ selecting every fifth house in a village (random walk) or approaching every fifth person in a busy place. There are always ‘hidden’ parts of the population that are at least slightly less likely to be selected. Additionally, some people refuse to participate. Whatever approach you take, it’s important to record refusal rates for each group and, if necessary, apply corrective weights to the sample after data collection is complete.
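The two ideas in this paragraph, picking every fifth house and correcting for refusals, can be sketched as follows. This is a minimal illustration under stated assumptions: the walk starts at a random house within the first interval, and the corrective weight for a group is simply the number of people approached divided by the number who completed the survey. All function names, group labels, and counts are hypothetical.

```python
import random


def systematic_sample(units, interval, rng=None):
    """Select every `interval`-th unit, starting from a random position."""
    rng = rng or random.Random()
    start = rng.randrange(interval)  # random start within the first interval
    return units[start::interval]


def nonresponse_weights(approached, completed):
    """Weight per group = people approached / people who completed.

    Groups with no completed interviews are skipped, because no weight
    can recover a group that is entirely missing from the data.
    """
    return {
        group: approached[group] / completed[group]
        for group in approached
        if completed.get(group, 0) > 0
    }


# Hypothetical village of 100 houses, visiting every fifth one.
houses = list(range(1, 101))
selected = systematic_sample(houses, 5, random.Random(0))

# Hypothetical field record: men refused more often than women.
approached = {"women": 50, "men": 50}
completed = {"women": 40, "men": 25}
weights = nonresponse_weights(approached, completed)
print(len(selected), weights)
```

In this example, each man who did respond counts twice as much as his raw answer, which compensates for the higher refusal rate among men. The correction only works if those who responded resemble those who refused, which is itself an assumption worth stating in any report.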
Some approaches, such as that taken by Emani’s Rapid Survey product, seek to combine the best of tradition and technology in two different ways. The first leverages an extensive network, with a representative in every district ready to conduct random walks to build the sampling frame. Automated processes then randomly select respondents and collect responses. The second uses satellite data of known settlements, which are then randomly sampled (following a multistage cluster design). In both cases, confidence and precision reach levels seen in traditional random sample surveys.
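For readers unfamiliar with multistage cluster designs, the sketch below shows a generic two-stage version: first a random selection of settlements, then a random selection of households within each chosen settlement. It is not a description of how any particular product works; the function name, the settlement and household identifiers, and the sample sizes are all hypothetical.

```python
import random


def two_stage_sample(settlements, n_clusters, n_per_cluster, seed=None):
    """Generic two-stage cluster sample.

    settlements: dict mapping settlement name to a list of household IDs.
    Stage 1 picks `n_clusters` settlements at random.
    Stage 2 picks up to `n_per_cluster` households within each one.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(settlements), n_clusters)
    return {
        name: rng.sample(
            settlements[name], min(n_per_cluster, len(settlements[name]))
        )
        for name in chosen
    }


# Hypothetical frame: 10 settlements with 30 households each.
frame = {
    f"settlement_{i}": [f"s{i}_household_{j}" for j in range(30)]
    for i in range(10)
}

sample = two_stage_sample(frame, n_clusters=3, n_per_cluster=5, seed=1)
for name, households in sample.items():
    print(name, households)
```

One design note: when settlements differ in size, selecting them with equal probability and then taking a fixed number of households gives households in small settlements a higher chance of selection. Surveys typically handle this either by selecting settlements in proportion to their size or by applying weights afterwards.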