Written by Emma Samman, Research Fellow at Overseas Development Institute and José Manuel Roche, Head of Research at Save the Children UK.
If we are going to ‘get to zero’, we need a data revolution that permits us to assess progress in the circumstances of disadvantaged and marginalized groups, whatever their size. Often these groups are harder to reach and/or discriminated against, and therefore do not enjoy average societal progress. To inform decisions, countries need data that can be, as the High-level Panel on Post-2015 highlighted, “disaggregated by gender, geography, income, disability, and other categories, to make sure that no group is being left behind” (United Nations, 2013). Current data only partially allow such assessment. Data should identify groups not only by gender, age, disability status, location, ethnicity or class but also by combinations of these (e.g., children with disabilities, or older people who are ethnic minorities). We need a real data revolution that matches the level of our ambition.
Household surveys have been the workhorse of data collection efforts to monitor the Millennium Development Goals (MDGs) and this is likely to continue under a new set of global goals. But they are insufficient to collect enough granular and timely data on the circumstances of disadvantaged and marginalized groups.
Why is this so challenging?
-Disaggregation is constrained by group size: Take the Demographic and Health Survey (DHS), one of the international survey programs frequently used to monitor MDG progress worldwide. The sample for the DHS 2011 from Nepal consisted of an impressive 11,000 households containing over 47,000 people. Suppose we want to monitor malnutrition among children under five from rural areas in the Far-Western region: our sample size falls to 751 individuals. If we want to track performance among boys and girls separately, we are left with subsamples of about half that number – enough to be statistically representative, but with a relatively large error. If we add any further filter – say smaller geographical regions or socioeconomic groups – our samples dwindle to such small sizes that it becomes very challenging to make any reliable inferences. Increasing sample sizes may be possible in some cases – but the larger the sample, the more costly and difficult it becomes to secure high quality data. Pooling data across time is one possibility, but it is still rife with problems: it was estimated to take at least 8 years of survey data to obtain reportable estimates for some population subgroups in the US National Health Interview Survey. An alternative is to use census data – but these are only produced every decade and are therefore inadequate for monitoring.
-Questionnaire space is limited: The problem is not only the representativeness and frequency of data; household surveys often do not record all the information needed for disaggregation. For example, the DHS provides abundant information on health but does not capture information on income or consumption owing to length limitations; rather it provides a rough and ready wealth index based on asset ownership. In contrast, the World Bank’s Living Standards Measurement Study surveys measure income and consumption comprehensively, but at the expense of detailed information on other dimensions. Some characteristics, such as ethnicity or disability, are also difficult to measure. Short modules may provide a useful way of identifying particular groups – for example, the Washington Group’s short disability module or the experience of Brazil in measuring ethnicity in household surveys – but survey designers still need to make difficult choices about what to include.
-Oversampling particular groups is costly – and the costs are higher, the smaller the group in question. This too can lead to difficult choices. For example, white minorities were not included in the ‘ethnic boost’ sample in the UK Understanding Society survey that began in 2009.
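To make the group-size constraint concrete, the sketch below shows roughly how the margin of error on an estimated proportion widens as a sample is split into smaller subgroups. Only the figure of 751 children comes from the Nepal example above; the prevalence of 40%, the design effect of 2.0 and the further split into five wealth groups are illustrative assumptions, not DHS figures.

```python
import math

def margin_of_error(n, p=0.4, deff=2.0, z=1.96):
    """Approximate 95% margin of error for a proportion.

    n    -- subgroup sample size
    p    -- assumed prevalence (hypothetical value, not a survey result)
    deff -- assumed design effect for a clustered survey (hypothetical)
    z    -- normal critical value for a 95% confidence level
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(deff * p * (1 - p) / n)

# 751 is taken from the text; the other sizes are rough illustrative splits.
subsamples = {
    "rural under-fives, Far-Western region": 751,
    "boys or girls only (about half)": 375,
    "one of five wealth groups within one sex (hypothetical)": 75,
}

for label, n in subsamples.items():
    moe = 100 * margin_of_error(n)
    print(f"{label}: n={n}, +/- {moe:.1f} percentage points")
```

Because the margin of error shrinks only with the square root of the sample size, halving the error requires roughly four times as many respondents, which is why boosting samples for small groups becomes expensive so quickly.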
How could the data revolution help?
There is certainly a need to adjust standard household surveys, for instance, to offer new short modules, to ask more questions of people other than the household head, and to oversample certain population groups in conventional ways. But household surveys cannot be all things to all people – if we add too many new elements, then we risk making them very blunt instruments: they become too costly to conduct and data quality suffers.
We need to start thinking how to galvanize the collection of disaggregated data in light of the opportunities before us. Official statistics have their place – but increasingly, the distinction is not so much between official and unofficial data as between good and bad information, as Enrico Giovannini has noted. Possibilities include improving and expanding existing data sources and crucially, facilitating linkages between different types of data to capitalize on their value.
More data could be obtained from existing and emerging sources, for example:
-Using mobile technology to oversample small groups. The potential of mobile technology to enable people to report on their own experiences across a range of contexts – ranging from humanitarian relief situations to outbreaks of political violence – and in countries as diverse as Uganda, Brazil and Haiti is particularly exciting.
There is a need to explore further how these two approaches could complement household surveys in methodologically robust ways. Such approaches have the added transformative potential of making people agents of data collection themselves and enabling them to hold governments and others to account, though they need to be articulated with mechanisms to monitor responses to such initiatives.
At a systems level, promising directions for improving and linking data include:
-Renewed emphasis on building civil registration and vital statistics (CRVS) systems, and on linking administrative registries with household surveys, as in recent UK experiments. To protect anonymity and facilitate the task in contexts with more complex residential address systems, linking using small areas or groups might be most fruitful.
-Exploring the potential of Big Data. Data collected from sources as diverse as call detail records and satellite imagery hold promise – as in Abidjan, where mobile phone records led to detailed maps of poverty and of migration patterns. Such technologies could yield monetary measures of material wellbeing in small localities at a high frequency and low cost, but so far, the focus has been on monetary indicators, on the movement of people, and on spatial disaggregation.
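As a rough illustration of linking at the level of small areas rather than individuals, the sketch below aggregates survey records by an area code and attaches an area-level figure from an administrative registry. All area codes, field names and values are hypothetical, and this is only one simple way such a link could be made.

```python
from collections import defaultdict

# Hypothetical survey records: (area_code, child_is_stunted)
survey = [
    ("A01", True), ("A01", False),
    ("A02", False), ("A02", False),
    ("A03", True),
]

# Hypothetical administrative registry, already aggregated by area,
# so no individual identifiers are needed for the link.
registry = {
    "A01": {"births_registered": 120},
    "A02": {"births_registered": 95},
}

def link_by_area(survey_rows, registry_by_area):
    """Aggregate survey rows per area and attach registry totals."""
    totals = defaultdict(lambda: [0, 0])  # area -> [stunted, surveyed]
    for area, stunted in survey_rows:
        totals[area][0] += int(stunted)
        totals[area][1] += 1

    linked = {}
    for area, (stunted, surveyed) in totals.items():
        linked[area] = {
            "surveyed": surveyed,
            "stunting_rate": stunted / surveyed,
            # None marks an area missing from the registry instead of
            # silently dropping it from the linked output.
            "births_registered": registry_by_area.get(area, {}).get("births_registered"),
        }
    return linked

linked = link_by_area(survey, registry)
for area in sorted(linked):
    print(area, linked[area])
```

Keeping areas that have no registry entry, rather than discarding them, makes coverage gaps visible, which matters when the groups of interest are precisely those most likely to be missing from official records.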
If we find a way to make these connections effectively, then we can maximize the potential of the data revolution to advance an equity-based agenda – otherwise, we risk working in silos and not capitalizing on the full range of possibilities now open to us. In the worst scenario we may duplicate effort and waste resources without enough results.
Of course, such initiatives are not free from problems. Challenges are technical, ethical and financial, but also political – posed by conflicting interest groups and by questions around who is collecting what and why, and how all these pieces of the puzzle should fit together (and indeed, whether they should). Authorities may resist collecting data on particular groups – in 2011, India counted caste in its census for the first time since 1931 – while social stigma may render certain groups invisible. Many people have legitimate concerns about privacy. And some groups may have more access to new technologies, and more interest in reporting on their circumstances, than others. Nonetheless, we need to accept the scale of the challenge and be ambitious in our thinking.