Writing about a data revolution: between rhetoric and practical action

Written by Morten Jerven, Associate Professor at Simon Fraser University.

“A world that counts” is a cleverly crafted motivational manifesto. But it is not a practical roadmap towards applying a ‘data revolution’ to the sustainable development goals agenda. I identify the main weaknesses in the report as: conflating statistics with data; lacking a theory for how better data leads to better decisions; assuming non proven synergies between official record and big data; equating data demand with data supply; ignoring the costs of data. Finally, I warn against the belief that only what is counted counts.

1. Statistics ≠ Data.

Throughout the document we are told that “This is the data revolution”. We may be somewhere around a historical moment that might best be thought of as an information revolution or a document revolution. Literally, the word data means ‘what is given’, and there certainly are more fleetingly available traces of human activity or ‘observations’ that can be analyzed now than before. The combined ability to capture and analyze these moments may be seriously transformative. But that is a question for another day.

The report is not about diagnosing world history, nor is it an assessment of the relative knowledge impact of the invention of the printing press versus the appearance of the microprocessor. To envisage the sustainable development goals (SDGs) process as a data revolution is imaginative, but the event that the report is addressing is an important step in the evolution of development statistics; it is not a revolution. The name of this particular game is statistics. It is about numbers that can be shaped into targets and indicators. The bottom line is the proposal to launch 169 targets to monitor development. It is about reaching a compromise about how many resources donors and official institutions are willing to devote to an organizing framework for donors and developing country governments. The rest is hype.

Data sound bigger and better, but if this is about investing in an international statistical framework to monitor development in the next 15 years, the emphasis should really be on statistics, not data. Here’s a statistic for you: the first time the word ‘statistics’ is mentioned in the main text is on page 18. And it is only used 16 times in the entire report. By comparison, ‘data’ is used 504 times. Most times these words are not interchangeable. The report suggests that the UN should develop a ‘global consensus on data’. What is that supposed to mean? That statement is meaningless if you replace the word ‘data’ with ‘observations’, ‘knowledge’ or ‘evidence’. It can, however, make sense if you talk about ‘statistics’. International organizations do have a natural role when it comes to developing global standards for official statistics. Reaching a global consensus on how observations and evidence constitute knowledge is futile.

2. Data ≠ Decisions & Big Data ≠ Official Record

What is the link between data and decisions? We are told that “an explosion in the volume of data” will ultimately lead to “more empowered people, better policies, better decisions and greater participation and accountability, leading to better outcomes for people and the planet” (p. 6). This is not a theory of change.  It is a statement of belief and hope.

I have not seen much scholarly evidence proving the thesis that politicians make bad decisions because they do not have enough data. Nor has anyone walked me through the political economy models that predict how better data makes for better decisions. What has been convincingly shown in scholarly work is that political priorities mirror statistical priorities, and that the activities of states leave a fingerprint in the statistical record. In research on the use of global indices, there is some evidence of the “Hawthorne effect”, namely that actors change their behavior to meet external expectations when they are being observed. However, there is equally convincing evidence that global monitoring results in bad statistics, because producers of administrative statistics are incentivized to misreport in order to appear to reach targets.

In short, the causality runs predominantly from policy to statistics. Of course ‘numbers’ can be used for mobilization. Coming up with numbers on how big illicit finance or global slavery is may be considered good practice in some advocacy circles, but the SDGs are fundamentally about reforming aspects of the official record to facilitate monitoring between donors and country governments. Data for advocacy is a different animal, indeed sometimes several species removed from statistics for government policy.

The wielders of big data can certainly harness great powers. But when it comes to big data and development, the rewards have been meagre so far. There are clear benefits for researchers, but the link to statistics and policy is weak. So far bigger has not been better in development. Take the example of using satellite data on light emissions into space to capture economic growth not recorded by official sources. It may be a useful adjustment for scholars, but I doubt that a central bank will link interest rates to light emissions in space, or that the international financial institutions will rank countries from ‘lightest’ to ‘darkest’ anytime soon.

One needs to have a clear idea of what the policy circle looks like, and how this new list of development targets will affect that policy circle. Just advising more ‘data’ without a clear idea of what that word even means, let alone any theory of how it affects policy behavior, is not good enough. Granted, having 169 SDGs might be great for global monitoring, for donors and even for researchers, but the bottom line is that any SDG list will have a direct impact on the local statistical evolution. It will not simply make it bigger and better everywhere; there are perverse incentives and there are opportunity costs.

3. Demand ≠ Supply & Cost ≠ Investment

More data are only better data if they contain meaningful information and there are no opportunity costs to their supply. But most data do have a cost. In particular, survey data are expensive to collect, disseminate and analyze. How expensive? Take the MDGs, with 8 goals and 48 targets, and calculate the cost of supplying all those data on an annual basis for all measured countries. In the 1990-2015 MDG database there were more gaps than observations. The previous agenda suffered from a mismatch between ambition in monitoring and ability in measurement. I have suggested that the previous MDG agenda could have cost something in the vicinity of $27 billion, just in marginal survey cost. This does not take into account the needs for statistical capacity building or the cost of providing administrative data. The post-2015 agenda might end up being much more expensive.

The potential benefits of more data and better data should be weighed against the very real cost of providing statistics. This is not only a question of sheer financial cost. There are important opportunity costs. If resources at the statistical office are pulled from regular reporting to government institutions towards filling gaps in the global monitoring database, increases in the funds available for specific measurement projects may actually have unintended negative consequences for the country-level capacity to formulate and monitor independent policy. To believe that higher demand will automatically be met by high-quality supply is naïve.

The report only uses the word ‘cost’ eight times, and only once in specific relation to the cost of data. The question of who should bear the cost, and how big that cost will be, is almost successfully redefined as an ‘investment need’. The report recognizes that “the monitoring of the SDGs will require substantial additional investment” (p.14) but that “A huge increase in the capacity of many governments, institutions and individuals will be needed to deliver and use this data.” (p.4) How big is the cost of statistics for sustainable development? Are we ready for ‘huge’, or should we go for ‘moderate’, or even ‘substantial’? I did some ski jumping in my youth, and I was told that you should always aim for the horizon and jump as far as you can and then measure the distance afterwards. But that’s ski jumping. Statistical capacity is a step-by-step process, and if this process is about statistical outputs, then the focus needs to be on statistical production.

A statistics revolution?

“New technologies of data processing inspired contemporaries to believe that the economy might be most effectively controlled, not by manipulating national aggregates, but by a comprehensive system of individualized surveillance”

Those are the words of Adam Tooze (in his book, Statistics and the German State, p. 28). He was addressing the enthusiasm in Germany at the turn of the 20th century. Eventually, the active role of the statisticians subsided and official statistics moved to redefine itself as passively serving knowledge needs. The data scientists of the 21st century may be the new statisticians as we move to a data revolution.

The report says (p. 2-3): “whole groups of people are not being counted and important aspects of people’s lives and environmental conditions are still not measured” and then that “Never again should it be possible to say “we didn’t know”. No one should be invisible. This is the world we want – a world that counts.”

I understand the enthusiasm, but I want to warn against hubris. This is certainly not the world I want. I think it should always be possible to say ‘we didn’t know’. Numbers, or the act of counting, do not guarantee objectivity, nor is counting something the same as bettering it. It is a testament to the richness of life, and the poverty of numbers, that all things cannot be counted.

 

1 Comment on "Writing about a data revolution: between rhetoric and practical action"

  1. Reblogged this on Meanderings: scale, statistics, interdisciplinarity and commented:
    “The data revolution is giving the world powerful tools that can help usher in a more sustainable future,” claimed Ban Ki-Moon, the UN Secretary General (29/08/14), as he established a group to advise on the topic for the upcoming Sustainable Development Goals. The report that followed, titled ‘A world that counts’, champions the idea that “data are the lifeblood of decision-making,” in a world where “data are bigger, faster and more detailed than ever before” (p2).

    Yet the revolution of data is not the kind that is measured in RPMs. Data collection is not a mechanical rotation that just needs a higher frequency in order to produce a harmony of knowledge, aligned for the decision-making forces of development.

    Instead, data collection is, and has always been, the profligate offspring of the desire for artfully designed statistics. Data are the means, and statistics the politically agreeable end. As Morten Jerven observes in his thoughtful blog post, the game of statistics “is about numbers that can be shaped into targets and indicators”.

    Jerven points to 5 key flaws in the UN advisory report: “conflating statistics with data; lacking a theory for how better data leads to better decisions; assuming non proven synergies between official record and big data; equating data demand with data supply; ignoring the costs of data”. Following on from that, just one of the points that stands out from Jerven’s post is that “it should always be possible to say ‘we didn’t know’”.

    Much more could be, and has been, written about the political strategising of seemingly objective statistics. Perhaps if I had some more data to test, I might suspect that the quantity of development data has an insignificantly weak correlation with the quality of its use for human well-being, security and development.
