Ben Fisher

Feb 17, 2015

Counting the Dead

As Jay Ulfelder pointed out in a recent blog post, getting data on political violence, especially in undeveloped countries, is hard. Violent events can go unreported, either due to government suppression or a lack of reporters in the region. The information that does get through is rarely going to be perfect. As a result, determining specific numbers for dead and wounded turns into a morbid game of educated guessing.

To illustrate Jay's point that numbers are difficult to get at, I've done a rough comparison of two datasets that record incidents of violence against civilians. The motivation behind this effort is to show that even when researchers are able to collect data, the results are far from consistent. The two datasets are the Worldwide Atrocities Dataset and the Armed Conflict Location and Event Data Project (ACLED). The former records violent acts against noncombatants that result in at least 5 fatalities - so anything from a suicide bomber to government troops firing on protestors - for all countries. The latter examines armed conflict in general, with a focus on African countries. For the purposes of this blog post, I only look at ACLED events that involved violence against civilians.

Now, to get a few of the gory preprocessing details out of the way: I look only at African countries from 1997 to 2012, I drop ACLED observations with fewer than 5 fatalities to better match the Atrocities coding rules, and I drop a couple of extreme outliers from each dataset.[1]
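For readers who want a concrete picture of those filtering steps, here is a minimal sketch in Python. It is not the replication code for this post. The record layout (dicts with "year" and "fatalities" keys), the function name filter_events, and the outlier_cap threshold are all assumptions made for illustration; in particular, the outliers in this post were removed by hand, not by a cutoff.

```python
def filter_events(events, min_fatalities=5, start_year=1997, end_year=2012,
                  outlier_cap=None):
    """Keep events inside the year window with at least `min_fatalities` deaths.

    `events` is a list of dicts with hypothetical keys "year" and "fatalities".
    `outlier_cap`, if given, drops events at or above that many fatalities.
    It is only a stand-in for removing specific outliers by hand.
    """
    kept = []
    for ev in events:
        # Restrict to the study window (inclusive on both ends).
        if not (start_year <= ev["year"] <= end_year):
            continue
        # Mirror the Atrocities rule of at least 5 fatalities per event.
        if ev["fatalities"] < min_fatalities:
            continue
        # Optionally drop extreme outliers.
        if outlier_cap is not None and ev["fatalities"] >= outlier_cap:
            continue
        kept.append(ev)
    return kept


# Made-up records, for illustration only.
sample = [
    {"year": 1996, "fatalities": 12},     # outside the window
    {"year": 2003, "fatalities": 3},      # below the fatality threshold
    {"year": 2003, "fatalities": 40},     # kept
    {"year": 2008, "fatalities": 25000},  # dropped only when a cap is set
]
filtered = filter_events(sample, outlier_cap=20000)
```

Applying the same thresholds to both datasets is what makes the later totals roughly comparable; the real data would also need a filter on country and, for ACLED, on event type.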

The two datasets record roughly the same number of observations - 1,009 in the Atrocities dataset and 1,037 in ACLED - but the total number of deaths recorded varies substantially. The Atrocities data for this sample records 83,913 fatalities, while ACLED records 78,164. The figure below is a plot of the total number of deaths per month recorded by each dataset. It's easy to see that the two datasets are giving very different death totals in some cases, particularly in the early 2000s. The statistical correlation between the two monthly series is about 0.25.

[Figure: total deaths per month recorded by each dataset]
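The monthly comparison described above can be sketched as follows. This is a rough illustration in Python under stated assumptions, not the code used for this post: the "year", "month", and "fatalities" keys and the function names are hypothetical, months in which a dataset records no events are treated as zero deaths, and the correlation is a plain Pearson coefficient computed by hand.

```python
from collections import defaultdict
from math import sqrt


def monthly_totals(events):
    """Sum fatalities per (year, month) for a list of event dicts."""
    totals = defaultdict(int)
    for ev in events:
        totals[(ev["year"], ev["month"])] += ev["fatalities"]
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def series_correlation(events_a, events_b):
    """Correlate two datasets' monthly death totals.

    Assumption: a month missing from one dataset counts as zero deaths.
    """
    a = monthly_totals(events_a)
    b = monthly_totals(events_b)
    months = sorted(set(a) | set(b))
    xs = [a.get(m, 0) for m in months]
    ys = [b.get(m, 0) for m in months]
    return pearson(xs, ys)
```

How missing months are handled matters: filling them with zeros, as here, penalizes a dataset for events it never recorded, while dropping them would compare only the months both sources cover.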

As another rough check, I did a comparison of deaths per month for just the Democratic Republic of the Congo. The DRC has a long history of civil conflict that has resulted in huge numbers of civilian casualties. It's a great example of a situation where we know violence against civilians is a major problem, but determining the scale is extremely difficult. As the figure below shows, the two datasets appear to capture the same major spikes in violence, but give very different numbers of dead in several cases.

[Figure: deaths per month in the Democratic Republic of the Congo, as recorded by each dataset]

Unfortunately for researchers, I don't see this problem ever really getting better. I think there will be some improvement in reporting accuracy as mobile phone use increases, but the numbers are remarkably inconsistent as recently as 2010. And even if future reporting improves, that won't resolve the fact that our historical data are all over the place. The 'ground truth' of political violence remains murky at best.


[1] The Atrocities dataset has a campaign observation for Sudan where 50,000 people were killed, and ACLED records a major massacre in the Congo where 25,000 were killed. Neither of these observations was matched in the other dataset. I discarded them to keep the figures readable.

Data, figures, and replication code for this post can be accessed here.