Predicting Disasters – a Mug’s Game or Sorry, I Had a Math Geek Moment

25 04 2009

“By 2015, on average over 375 million people per year are likely to be affected by climate-related disasters. This is over 50% more than have been affected in an average year during the last decade.”

Oxfam International have decided to use this figure prominently in the first paragraph of their latest paper, “The Right to Survive: The humanitarian challenge for the twenty-first century”, a 148-page document aimed at justifying a “step-change in the quantity of resources devoted to saving lives in emergencies and in the quality and nature of humanitarian response”. The same statistic also appears as their ‘killer fact’ on the last page of the document, and it’s already been picked up by the wires and quoted all over the place, including in the New Statesman, the Guardian, and even the News of the World (an English tabloid newspaper better known for celebrity sex scandals and the like).

The suggested 50% increase in the number of disaster-affected people over the next few years struck me as an enormous change in a very short period of time. I know we’re getting worried about global warming, but is it really likely to have that much effect so quickly?

While in general I think the world could and should devote significantly more resources to humanitarian intervention, there are plenty of convincing arguments for this based on the way the world is now, without having to make predictions about how it’s all about to get much worse. I decided to have a closer look at how this alarming figure was generated: if the demand for more resources relies so heavily on a projection like this, the researchers had better be sure that their calculations are sound and defensible.

Turns out this figure comes from a short paper written by Shamanthy Ganeshan and Wayne Diamond for Oxfam, called “Forecasting the numbers of people affected annually by natural disasters up to 2015”. Ganeshan and Diamond use figures from the EM-DAT database compiled by the Centre for Research on the Epidemiology of Disasters (CRED). They use the quarterly figures for 1980-2007 of people “affected” by natural disasters starting in each quarter. “Affected”, in CRED terms, means those “who suffered physical injuries or illness, as well as those made homeless or who otherwise required immediate assistance during a period of emergency.”

Ganeshan and Diamond separate out ‘climate-related’ disasters because “predicted changes in the global climate could be expected to increase the frequency and severity of these natural hazards”. They take the number of affected people in each quarter between 1980 and 2007, smooth the data using a method called “double exponential smoothing”, and then fit a linear regression, i.e. a best-fit straight line, which they project on to 2015 to get the headline 375 million figure, together with a “95% confidence interval” of 338m – 413m.
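As far as I can tell, that calculation looks something like the sketch below. This is my reading of the method rather than the authors’ own code: the smoothing constants are guesses (the paper reports only one of the two that double exponential smoothing needs, a point I come back to later), and the quarterly series is placeholder random data standing in for the real EM-DAT figures, so the number it prints will not match their 375 million.

```python
import numpy as np

def double_exponential_smoothing(y, alpha, beta):
    """Holt's linear (double exponential) smoothing: tracks a level and a trend."""
    level, trend = y[0], y[1] - y[0]
    smoothed = [level]
    for value in y[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smoothed.append(level)
    return np.array(smoothed)

# Quarterly totals of people affected by climate-related disasters, 1980 Q1 to
# 2007 Q4. Placeholder random numbers standing in for the EM-DAT series.
rng = np.random.default_rng(0)
quarters = np.arange(1980.0, 2008.0, 0.25)            # 112 quarters
affected = rng.gamma(shape=2.0, scale=25e6, size=quarters.size)

# Both smoothing constants are guesses; the paper reports only one of the two.
smoothed = double_exponential_smoothing(affected, alpha=0.3, beta=0.1)

# Straight line through the *smoothed* series, extrapolated to 2015 and summed
# over the four quarters of that year to give an annual figure.
slope, intercept = np.polyfit(quarters, smoothed, deg=1)
quarters_2015 = np.array([2015.0, 2015.25, 2015.5, 2015.75])
print(f"projected people affected in 2015: {np.sum(slope * quarters_2015 + intercept):,.0f}")
```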

Luckily for all of us, there are several reasons why this estimate isn’t to be taken at face value and we don’t need to break out the dinghy and the tins of Spam just yet.

Firstly, there are questions over the statistical methods used. According to the authors, double exponential smoothing is supposed to be “used when there is reason to believe there is an underlying trend in the time series”. An underlying trend, however, is generally handled using a method called simple exponential smoothing; double exponential smoothing is used when we think that the trend itself might be changing, i.e. following the authors’ argument, not just that disasters are increasing, but that the increase is actually accelerating. The authors do not try to justify this choice. They then fit a linear regression to their smoothed figures. The logic behind this is unclear: if there is a mechanism such as climate change that is increasing the number of disaster-affected people over time, there is no need to smooth the data before fitting a linear regression; doing so simply reduces the accuracy of the regression (and also makes their confidence interval calculation meaningless). I recalculated using their dataset and the simplest possible method, a best-fit line (linear regression), and came up with a projected figure of 306 million in 2015, a 28% increase. Still a sizeable figure, admittedly, but disturbingly different to their 375 million, or 50% increase. What this means is that a highly questionable choice of statistical method is responsible for a sizeable chunk of their ‘killer fact’.
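For comparison, my recalculation amounts to dropping the smoothing step entirely and fitting the straight line to the raw annual totals. A minimal sketch, again with placeholder data in place of the EM-DAT series, so it shows only the shape of the calculation rather than reproducing the 306 million figure:

```python
import numpy as np

# Annual totals of people affected by climate-related disasters, 1980-2007.
# Placeholder random numbers -- substitute the real EM-DAT annual series to
# reproduce the 306 million figure quoted above.
years = np.arange(1980, 2008)
affected = np.random.default_rng(1).gamma(shape=2.0, scale=1e8, size=years.size)

# Ordinary least-squares straight line through the raw, unsmoothed data,
# extrapolated to 2015.
slope, intercept = np.polyfit(years, affected, deg=1)
print(f"projected annual total for 2015: {slope * 2015 + intercept:,.0f}")
```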

Secondly, there are reasons to doubt the implicit model of an accelerating trend in the number of people affected by disasters. Although they do not try to show how this hypothesis fits the data, the mechanism Ganeshan and Diamond seem to be relying on is that an underlying climate change trend is increasing the severity and frequency of climate hazards (an increase that may be linear or exponential; it is unclear), and that other factors could potentially exacerbate this (they mention “population growth, including in vulnerable areas, and the range of factors that can make people more vulnerable to the disasters that occur”). The concept of accelerating climate change has begun to appear more frequently in the literature, and there is growing research into potential positive feedback mechanisms, but this is far from established science, and I haven’t seen anyone suggesting that an accelerating climate change has already been happening over the last 30 years. The authors present no model or evidence to justify this choice.

And while Ganeshan and Diamond mention some possible mechanisms that would increase the figure (population growth, and growth in the number of people living in areas such as urban slums that are particularly vulnerable to extreme climate events), they do not consider factors acting in the opposite direction. Such factors could include technological improvements (better communications with at-risk populations, better climate modelling and weather prediction); decreasing numbers of poor people and increased government capacity in the countries providing the bulk of the figures (unsurprisingly, China and India account for 77% of the affected populations in the chosen time period, and the number of people living below the poverty line in these two countries has nearly halved, from 1.2 billion to 663 million, according to World Bank figures for 1981 and 2005); and all of the work done by organisations like Oxfam in improving the resilience of at-risk populations. As Oxfam itself says in its “Right to Survive” paper:

“While Cyclone Sidr killed around 3,000 people in Bangladesh in 2007, this was a tiny fraction of the numbers killed by Cyclone Bhola in 1972 or even by Cyclone Gorky in 1991, despite the fact that these storms were similar in strength or weaker…because governments have taken action to prepare for disasters and reduce risks”.

One simple way to address this lack of clarity about which direction the number of affected people is actually heading in is to see which type of curve best fits the data we have. I used annual rather than quarterly data for 1980-2007 (CRED records each disaster in the quarter in which it starts, but most disasters go on for more than one quarter, so annual data may actually be more accurate, as well as being less ‘noisy’ and so easier to use when comparing different trends). I found that a logarithmic curve (i.e. an increase that flattens off) fits the data better than a straight-line increase (an R squared value of 0.1619 for the logarithmic curve against 0.1434 for the straight line; both pretty bad fits, but this is data with very high variance), and that a quadratic curve (a trend that peaks and then declines) fits better still, with an R squared of 0.1844. The implication of that best-fit curve? The average number of affected people peaked at around 225 million around the year 2000 and by 2015 would be around 148 million, a sizeable decrease. I’m not saying this is correct, and Ganeshan and Diamond do admit that “different forecasting models could lead to different results”. I am just saying there are reasons to doubt the model the authors have chosen, since there is no explicit logic behind it and it is not the model that best fits the data they have chosen to use, and therefore to doubt their claim that theirs “is a reasonable forecast”, especially when they undermine this reasonableness by drawing incorrect conclusions such as:
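The comparison itself is straightforward to redo. Here is a rough sketch of how the three fits can be compared; as before, the annual series is placeholder data standing in for the EM-DAT figures, so the R squared values it prints will not match the ones quoted above:

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination for a fitted curve."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Annual totals, 1980-2007 (placeholder data -- substitute the EM-DAT series).
years = np.arange(1980, 2008)
affected = np.random.default_rng(2).gamma(shape=2.0, scale=1e8, size=years.size)
t = years - 1979                      # years since 1979, so log(t) is defined

# Straight line, logarithmic curve (an increase that flattens off), and
# quadratic curve (which can peak and then decline).
linear    = np.polyval(np.polyfit(t, affected, 1), t)
logarithm = np.polyval(np.polyfit(np.log(t), affected, 1), np.log(t))
quadratic = np.polyval(np.polyfit(t, affected, 2), t)

for name, fit in [("linear", linear), ("logarithmic", logarithm), ("quadratic", quadratic)]:
    print(f"{name:12s} R^2 = {r_squared(affected, fit):.4f}")
```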

“we can be 95% confident that the number of people affected by climate-related natural disaster in 2015 will be between 336 million and 413 million in an average year”

This can only possibly be true if their model is correct, and there is a far higher than 5% chance that it is not.

Thirdly and finally, there is the choice of dataset. The authors do not say why they have chosen to use 1980-2007. Looking at the data, it is clear that we do not have accurate figures before the 1960s, and between 1960 and 1970 disasters seem to become more completely recorded. Under-recording in the earlier data could therefore explain some or all of the observed increasing trend. Although it may seem reasonable to assume that from 1980 onwards there would not have been too much change in the accuracy of disaster victim recording, there are also plausible arguments that, for example, during the Cold War disaster victims might have been under-reported for political reasons by both sides. So I looked at how dependent the results were on the years chosen.

I would have liked to compare the impact of changing datasets using the same methods as the authors, but they have not given enough information to replicate their smoothing, which requires two constants, not just the one they provide. So I have recalculated using a simple linear regression, which can be compared to the 306 million projection for 2015 above and to the average of the last ten years, 238 million, on which the authors base their comparison.

Ganeshan and Diamond seem to have excluded the latest 2008 data, which are available from CRED. Including these immediately reduces the projection for 2015 to 288 million. Now let’s take the Cold War hypothesis and use the dataset from 1989-2008: less data, but perhaps more reliable. This gives us 244 million, a very small increasing trend, scarcely higher than the 238 million average.
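Checking how sensitive the projection is to the window of years is easy to sketch. Again the annual series below is placeholder data, so the printed projections are illustrative only and will not reproduce the 288 and 244 million figures above:

```python
import numpy as np

# Annual totals keyed by year, 1980-2008 (placeholder data -- substitute the
# EM-DAT series to reproduce the figures quoted above).
years = np.arange(1980, 2009)
affected = np.random.default_rng(3).gamma(shape=2.0, scale=1e8, size=years.size)

def project_2015(start_year, end_year):
    """Fit a straight line to the chosen window of years and extrapolate to 2015."""
    mask = (years >= start_year) & (years <= end_year)
    slope, intercept = np.polyfit(years[mask], affected[mask], deg=1)
    return slope * 2015 + intercept

for start, end in [(1980, 2007), (1980, 2008), (1989, 2008), (1993, 2008)]:
    print(f"{start}-{end}: projected 2015 total = {project_2015(start, end):,.0f}")
```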

As an aside, you may wonder whether the increase in the number of affected people is significant, i.e. whether we can actually say statistically that there is an increasing trend. The answer, as far as I can see, is clearly no, at least not for most of the ways you can cut the data. But this in itself does not disqualify the premise of the paper, since we have a reasonably well-established scientific mechanism (i.e. climate change) by which these numbers probably will increase. It is therefore a reasonable working hypothesis that there is an increase, rather than assuming the null hypothesis and having to prove a statistically significant increase before doing any further projections; we simply don’t have the data for that.
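If you want to check this sort of claim for yourself, the standard test is whether the slope of the regression line is significantly different from zero. A quick sketch, once more with placeholder data in place of the EM-DAT series:

```python
import numpy as np
from scipy import stats

# Annual totals, 1980-2007 (placeholder data -- substitute the EM-DAT series).
years = np.arange(1980, 2008)
affected = np.random.default_rng(4).gamma(shape=2.0, scale=1e8, size=years.size)

# linregress reports the two-sided p-value for the null hypothesis that the
# slope of the trend line is zero.
result = stats.linregress(years, affected)
print(f"slope = {result.slope:,.0f} people per year, p-value = {result.pvalue:.3f}")
if result.pvalue < 0.05:
    print("trend is statistically significant at the 5% level")
else:
    print("no statistically significant trend")
```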

Going back to the datasets: the shorter the period we look back over, the smaller the increasing trend, and from 1993 onwards the best-fit straight line actually shows a decreasing trend, i.e. we would project a decreasing number of people being affected in the future. We might say that 15 years of data is not enough to get over the natural fluctuations in disasters, but it is not immediately obvious that the 27 years used by the authors is enough to be reliable either.

My general conclusion would be that you have to take all these figures with more than a pinch of salt. From looking at the available data in detail, I would conclude that it is just as easy to make the case that the number of people being affected by climate hazards is decreasing over time as that it is increasing, and that the data is far too incomplete and of questionable accuracy to accept the authors’ claim that their “headline figure” is “reasonable”. It is an interesting way to look at the data, and this type of evidence-based approach is often lacking from NGOs’ advocacy positions, but it needs to be done very carefully and very thoroughly if it is going to have the desired impact.

I think Oxfam would be better off relying on observables and generally accepted facts than risking a backlash and a loss of credibility through the use of questionable statistical methods. Few Oxfam supporters would question that global warming is happening and is likely to have a big impact on poor people, or that we should devote more resources to humanitarian issues, but relying on alarmist statistics like this will in the end only serve to weaken the underlying message.


One response

26 04 2009
Will Dwinnell

This is an excellent post. Thank you!
