Harnessing the power of the “crowd” for data collection
Last month we talked about microsurveys as a possible solution for data collection in challenging contexts. This month our focus is on crowdsourced data collection.
Crowdsourced data collection is an innovative way to gather information from a huge user base. It rests on the idea that individuals are experts within their local environments. It’s a way to collect data on people’s opinions, observations, and experiences from a large number of individual users – with or without their knowledge. Scale, reach, speed and cost-effectiveness – enabled by technology and human traits of altruism, community, and reciprocity – make crowdsourcing powerful.
The idea of crowdsourcing has existed for at least a century, with government and non-profit bodies frequently enlisting volunteers to help them with information-based tasks that would be very expensive or simply unfeasible otherwise. Yet the term ‘crowdsourcing’ has been around for only a decade or so. Its popularity has coincided with the massive progress made with connectivity around the globe. It is the combination of crowdsourcing and automated data collection tools that has resulted in the proliferation of use cases, which range from market research to social research to mapping. Some examples are:
- Wikipedia relies on about 120,000 active volunteers to keep things in shape.
- Citizen Science is helping academics to get fast and accurate reads on everything from water quality to bird populations, the latter through the Christmas Bird Count, which has run for over a century.
- Google Maps harnesses the contributions of “on the ground” users who correct mistakes, leave reviews, and annotate their journeys.
- Rival OpenStreetMap is rated highly, has impressive coverage (check this map out), and apparently runs on less than $100,000 a year.
- The Missing Maps Project is helping humanitarian organisations to fill out unmapped areas in the developing world, with road names, buildings and other local information.
Crowdsourced vs. traditional data collection
Businesses, governments, and NGOs have long sought to engage with communities and other stakeholders. Digging deeper into local viewpoints, socio-political dynamics, and economic trends lets implementers and decision makers know what people need (and what they might oppose).
However, the traditional data collection methods that remain the norm among market research firms and in the humanitarian/development world can be time consuming and expensive, especially in more dangerous and difficult areas. They rely on intermediaries to find research participants, and these intermediaries need managing and may introduce bias. Even where access is possible, sampling and statistical robustness can be compromised.
Technology meets the power of the crowd
Crowd polling tools have proliferated, allowing you to ask questions of thousands of people you would not be able to listen to otherwise. Companies like Qriously and Qualtrics embed survey questions in social media adverts. Attest, Premise, and StreetBees build panels of respondents ready to respond via mobile applications. Others like Geopoll and Viamo reach people via SMS and automated phone calls, often in partnership with telecoms companies. Together, they bring powerful options to the table that let researchers reach thousands of users, consumers, beneficiaries, or stakeholders for a fraction of the price of traditional methods.
This opens up new possibilities for how we do research, especially in new and emerging markets. Traditional cross-sectional surveys (that happen once, at one point in time) can quickly become outdated. Automated options allow researchers to return to the field time and again with little extra expense, opening up more time series and longitudinal methods. Increased frequency of interaction over a longer time frame enhances rapport and understanding (especially about sensitive issues) between those who provide information and those who need it.
Automated methods also remove several intermediary steps, often allowing survey questions to be put directly to the end user. While this does not eliminate the risks of respondent bias, it does vastly reduce the uncertainties involved with hiring, training, and managing human teams.
Other benefits can include:
- Near real-time insights from large crowds highlighting concerns and needs before they become problematic.
- Elimination of travel time, which is especially useful for large-scale surveys.
- Access to groups that might not feel comfortable contributing to traditional face-to-face research processes.
- Less room for human error – less depends on enumerators understanding responses correctly and entering the data accurately.
But…there are no silver bullets
A key differentiator between crowdsourcing tools is the validity of data. This is an important question that needs interrogating each time a data collection exercise is planned. You should ask yourself whether you need indicative or representative data.
Indicative data gives you a sense of what’s happening and what people think, whereas representative data lets you generalise findings to the wider population. To better understand the difference between indicative and representative data, read this blog post written by our CEO/data geek. With crowdsourcing tools, indicative data is quicker and cheaper to generate than representative data.
As with any method, crowdsourcing relies on the willingness of people to participate and provide insights. Often people are motivated to participate because they believe in study objectives, as a result of financial or other incentives, or because of social obligation (i.e. politeness).
In automated polling, the limits of virtual and automated communication can make it more difficult to persuade people of the importance of taking part. Lack of human interaction weakens social obligation too. And so, the incentive factor becomes more important. This is not a problem in itself but must be monitored when considering the reliability of responses and possible attempts to game the system for personal gain.
Can we marry the best of automated and human approaches?
Crowdsourcing offers immense opportunities to reach more people, more often. However, as we have seen, there are real limits to what can be done with automated polling data.
One approach to tackling data validity issues while retaining the benefits of crowdsourcing is to ‘humanize’ the methodology. Emani’s crowdsourcing engine relies on purpose-built networks of individuals who were, in the first instance, recruited into the network by real, identifiable people. Although this takes a bit longer than reaching out by SMS or a downloadable app, it means that we are able to explain Emani’s social mission, the benefits of being part of the community, and our parallel incentive structure.
Network members grow the network on request, and follow quotas to seek out groups that might evade other sampling approaches. To prevent groups becoming too homogeneous, network members are limited in the number of people they can recruit. This approach draws on respondent-driven sampling methodologies, which have become pivotal in the public health and criminology fields.
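To make the quota and recruitment-cap idea concrete, here is a minimal sketch in Python. It is an illustration only, not Emani's actual engine: the `Network` class, the cap of three recruits per member, the group labels, and the quota numbers are all assumptions made up for this example.

```python
from collections import Counter
from dataclasses import dataclass, field

# Assumed cap for illustration; the article does not state a number.
MAX_RECRUITS_PER_MEMBER = 3


@dataclass
class Network:
    # Hypothetical structure: group label -> number of members still wanted.
    quotas: dict
    recruits_by_member: Counter = field(default_factory=Counter)
    members: dict = field(default_factory=dict)  # member id -> group label

    def add_seed(self, member_id, group):
        """Register a founding member recruited directly (no quota check)."""
        self.members[member_id] = group

    def recruit(self, recruiter_id, new_id, group):
        """Try to add new_id on recruiter_id's invitation; return True if accepted."""
        if recruiter_id not in self.members or new_id in self.members:
            return False  # unknown recruiter, or person is already a member
        if self.recruits_by_member[recruiter_id] >= MAX_RECRUITS_PER_MEMBER:
            return False  # cap reached: stops one person's circle dominating
        if self.quotas.get(group, 0) <= 0:
            return False  # quota for this group is already filled
        self.members[new_id] = group
        self.recruits_by_member[recruiter_id] += 1
        self.quotas[group] -= 1
        return True


network = Network(quotas={"rural women": 2, "urban youth": 1})
network.add_seed("seed-1", "urban youth")
results = [
    network.recruit("seed-1", "a", "rural women"),
    network.recruit("seed-1", "b", "rural women"),
    network.recruit("seed-1", "c", "rural women"),  # refused: quota filled
    network.recruit("seed-1", "d", "urban youth"),
    network.recruit("seed-1", "e", "urban youth"),  # refused: cap reached
]
print(results)
```

In this sketch the quotas steer recruiters towards under-represented groups, while the per-member cap forces the network to grow through many different people rather than through one well-connected recruiter.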
Emani is motivated to build inclusive networks that do not exclude the (digitally) illiterate or disconnected. When someone joins the network, we ask them their preferred means of communication (WhatsApp, SMS, online form, phone call, or in-person visit) so that survey requests can be made in the way that best suits them.
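As a rough sketch of how preferred channels might be used when sending out a survey request, the Python snippet below groups members by the channel they chose. The `Channel` enum mirrors the options listed above; the `group_by_channel` function and the member ids are hypothetical and only illustrate the idea.

```python
from enum import Enum


class Channel(Enum):
    WHATSAPP = "WhatsApp"
    SMS = "SMS"
    ONLINE_FORM = "online form"
    PHONE_CALL = "phone call"
    IN_PERSON = "in-person visit"


def group_by_channel(preferences):
    """Group member ids by preferred channel, dropping channels nobody chose."""
    batches = {channel: [] for channel in Channel}
    for member_id, channel in preferences.items():
        batches[channel].append(member_id)
    return {channel: ids for channel, ids in batches.items() if ids}


# Hypothetical preferences recorded when each member joined.
preferences = {"m1": Channel.SMS, "m2": Channel.IN_PERSON, "m3": Channel.SMS}
batches = group_by_channel(preferences)
for channel, member_ids in batches.items():
    print(f"{channel.value}: {member_ids}")
```

Each batch could then be handed to whatever delivers that channel, whether an SMS gateway or a person making a visit, so that members without smartphones or connectivity are still reached.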
It’s not perfect, but we believe it’s a step in the right direction. The combination of technology and human ingenuity has opened up vast potential for engaging people in generating data, and for contributing to social impact monitoring and decision-making in never-before-seen ways. Crowdsourcing with a human touch and attention to network mobilisation and maintenance helps us maximise this potential.