Deaths by Horse-kick and other Hazards in the Prussian Army

Antony Unwin and Bill Venables

2025-02-05

Introduction

In 1898 the Russian statistician Ladislaus von Bortkiewicz (1868-1931) used a dataset on deaths in Prussian Army Corps to illustrate what he called the Law of Small Numbers (von Bortkiewicz (1898)). At the time, statistics could be applied with confidence only to large datasets where, for example, Gaussian approximations might apply. Von Bortkiewicz wanted to show that methods could also be devised for datasets where such approximations clearly could not be used. He took annual data on deaths due to horse-kicks in 14 Army Corps over the 20 years from 1875 to 1894, as reported in the official state publication Preussische Statistik. At the start of the period a unified Germany had just been created, following the victory over France in the Franco-Prussian War of 1870-71. Germany was not involved in a war again until 1914, when the First World War began, although from the 1880s there were colonisation expeditions in Africa and the western Pacific.

In more recent times the preference in most intermediate to advanced textbooks has, for good reasons, shifted to larger and larger datasets. Nevertheless, the fascination with historically interesting datasets remains, even if they are small. One such dataset is the horse-kick data, which deserves a fresh look with modern tools, particularly at the level of detail at which it was originally reported rather than as a gross summary of about 5 or 6 frequencies.

The original dataset covered deaths by horse-kick only, for the 20 calendar years from 1875 to 1894. This package extends the data in two ways: firstly, it extends the series to 1907, giving 33 years rather than 20, and, secondly, it provides parallel data on two further accidental causes of death, falling from a horse and drowning. The bigger story is set out in detail in the article (Unwin and Venables (2025)) which this package is intended to accompany.

A glimpse at the data

The following table shows the first few rows of the dataset, called hkdeaths.

year  corps  regiments  NCOs  kick  drown  fall  vonB_kick
1875  G              8     0     0      3     0          0
1875  I              6     0     0      5     0          0
1875  II             4     0     0      6     0          0
1875  III            4     0     0      7     1          0
1875  IV             4     0     0      7     0          0
1875  V              4     0     0     11     0          0

Most of the column names are self-explanatory, but some need further explanation:

To get some idea of what is going on in the data, we can show how the annual deaths, aggregated over all corps, change over time for each of the three causes.

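library(dplyr)    # group_by(), summarise()
library(tidyr)    # pivot_longer()
library(ggplot2)  # plotting

# Total deaths per year over all corps, one column per cause
hazards <- hkdeaths |> 
  group_by(year) |> 
  summarise(horse_kicks = sum(kick),
            falls       = sum(fall),
            drownings   = sum(drown), .groups = "drop") 
# Reshape to long form: one row per year and cause
hazards_long <- hazards |> 
  pivot_longer(cols = c(horse_kicks, falls, drownings), 
               names_to = "cause", values_to = "deaths")
# One line per cause
ggplot(hazards_long) + aes(x = year, y = deaths, colour = cause) +
  geom_line(linewidth = 1.5) +
  scale_colour_viridis_d() + 
  labs(title = "Annual Deaths from three Causes", x = "Year", y = "Deaths")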

There is a surprisingly consistent fall in the death rate by drowning over the period, of about 2% per annum:

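library(knitr)  # kable()

# Poisson log-linear trend, with year centred at 1891, the middle of the series
drown_1 <- glm(drownings ~ I(year - 1891), data = hazards, family = poisson(link = log))
# Back-transform the coefficients from the log scale and tabulate them
exp(coef(drown_1)) |> rbind() |> as.data.frame(check.names = FALSE) |> kable(digits = 2)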
(Intercept) I(year - 1891)
56.89 0.98

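So the estimated mean number of deaths by drowning in the middle of the series, 1891, was 56.9. The second coefficient is the estimated year-on-year multiplier, shown as 0.98 after rounding to two digits, which corresponds to an estimated decline of 2.2% per annum.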
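The step from a fitted coefficient to a percentage can be made explicit. With a log link the model is log(mu) = b0 + b1 * (year - 1891), so exp(b0) is the expected count in 1891 and exp(b1) is the multiplier from one year to the next. The following is a minimal sketch under stated assumptions: it uses simulated counts rather than the package data, and the names sim, fit and annual_change_pct() are hypothetical, introduced only for illustration.

```r
# Illustrative only: simulated counts standing in for annual drowning totals.
set.seed(1)
sim <- data.frame(year = 1875:1907)
sim$deaths <- rpois(nrow(sim), lambda = 57 * 0.978^(sim$year - 1891))

fit <- glm(deaths ~ I(year - 1891), data = sim, family = poisson(link = "log"))

# Hypothetical helper: convert a log-scale slope to a percentage change per year.
annual_change_pct <- function(slope) 100 * (exp(slope) - 1)

mean_1891  <- unname(exp(coef(fit)[1]))                 # expected count in the centring year
pct_change <- unname(annual_change_pct(coef(fit)[2]))   # negative value = decline

# Check against the multiplier implied by the text (0.978 before rounding):
annual_change_pct(log(0.978))   # about -2.2, i.e. a 2.2% decline per annum
```

Centring the year at 1891 does not change the slope; it only makes the intercept interpretable as the expected count in that year.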

It is natural to ask whether this decline in the drowning death rate is also consistent across corps. This can be studied using a faceted plot.

# One panel per corps: observed annual drowning deaths with a fitted Poisson trend
ggplot(hkdeaths) + aes(x = year, y = drown) + 
  geom_line(colour = "#21908CFF", linewidth = 1.0) + facet_wrap(~ corps)  +
  # Same log-linear model as above, fitted separately within each panel
  geom_smooth(method = "glm", formula = y ~ I(x - 1891), method.args = list(family = poisson),
              linewidth = 1.0, colour = "#440154FF", se = FALSE) + 
  labs(x = "Year", y = "Deaths", title = "Annual Drowning Deaths per Corps")

Perhaps corps XIV is the one exception, showing a slight increase in drowning deaths per annum over the period. All other corps show a decline to some degree.
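The visual impression from the panels could be complemented by a formal comparison of a common-slope model with a separate-slope model. The following is only a sketch of one possible approach, not the analysis used in the accompanying article. It runs on simulated counts rather than hkdeaths, and the corps labels "A" to "D" and the object names are hypothetical.

```r
# Illustrative only: simulated per-corps counts, NOT the hkdeaths data.
set.seed(2)
sim <- expand.grid(year = 1875:1907, corps = c("A", "B", "C", "D"))
sim$drown <- rpois(nrow(sim), lambda = 4 * 0.978^(sim$year - 1891))

# One slope shared by all corps versus a separate slope for each corps.
common   <- glm(drown ~ corps + I(year - 1891), data = sim, family = poisson)
separate <- glm(drown ~ corps * I(year - 1891), data = sim, family = poisson)

# Analysis of deviance: a small p-value would suggest that the slopes differ.
dev_tab <- anova(common, separate, test = "Chisq")
dev_tab

# Annual multipliers per corps implied by the larger model.
cf    <- coef(separate)
base  <- cf["I(year - 1891)"]          # slope for the reference corps
inter <- cf[grep(":", names(cf))]      # slope differences for the other corps
multipliers <- exp(c(base, base + inter))
multipliers
```

With the real data, sim would be replaced by hkdeaths, which has the same year, corps and drown columns; a multiplier above 1 for a corps would correspond to an increasing trend.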

References

Unwin, Antony, and Bill Venables. 2025. “Horses for Courses: A Fresh Look at the Horse-Kick Data.” Significance. To appear.
von Bortkiewicz, Ladislaus. 1898. Das Gesetz der kleinen Zahlen. Leipzig: Teubner.