Decision tree machine learning applied to bovine tuberculosis risk factors to aid disease control decision making

Romero, M P; Chang, Y M; Brunton, L A; Parry, J; Prosser, A; Upton, P; Rees, E; Tearne, O; Arnold, M; Stevens, K; Drewe, J A




Identifying and understanding the risk factors for endemic bovine tuberculosis (TB) in cattle herds is critical for the control of this disease. Exploratory machine learning techniques can uncover complex non-linear relationships and interactions within disease causation webs, and enhance our knowledge of TB risk factors and how they are interrelated. Classification tree analysis was used to reveal associations between predictors of TB in England overall and in each of its three surveillance risk areas (High Risk, Edge, and Low Risk) in 2016, and to identify the highest-risk herds. The main classifying predictor for farms in England overall related to the TB prevalence in the 100 nearest cattle herds. In the High Risk and Edge areas it was the number of slaughterhouse destinations, and in the Low Risk area it was the number of cattle tested in surveillance tests. The time since the last confirmed incident was resolved was the most frequent classifier in the trees; a resolution within the previous two years led to the highest-risk group of herds in the High Risk and Low Risk areas. Having at least two different slaughterhouse destinations led to the highest-risk group of herds in England, whereas in the Edge area it was a combination of no contiguous low-risk neighbours (i.e. in a 1 km radius) and a minimum proportion of 6–23 month-old cattle in November. Prevalence above a threshold value in the 100 nearest neighbours increased the risk in all areas, although the threshold was specific to each area. Having low-risk contiguous neighbours reduced the risk in the Edge and High Risk areas, whereas high-risk ones increased the risk in England overall and in the Edge area specifically. The best classification tree models informed multivariable binomial logistic regression models in each area, adding statistical inference outputs. These two approaches showed similar predictive performance, although there were some disparities regarding what constituted high-risk predictors.
Decision tree machine learning approaches can identify risk factors from webs of causation: information which may then be used to inform decision making for disease control purposes.
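The area-specific thresholds described above come from the way a classification tree chooses its splits. The following is a minimal sketch of a single split search on one predictor using Gini impurity, written in plain Python. It is not the authors' implementation: the splitting criterion, the function names (`gini`, `best_threshold`), and the data values are all assumptions made for illustration, and the numbers are synthetic rather than taken from the study.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Find the split `value <= threshold` with the lowest weighted Gini impurity.

    Returns (threshold, weighted_impurity). If no split improves on the
    unsplit node, the threshold is NaN and the impurity is that of the node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        threshold = (low + high) / 2.0  # midpoint between neighbouring values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Synthetic, illustrative data only (hypothetical herds, not study data):
# TB prevalence in the 100 nearest herds, and whether the herd had a TB incident.
prevalence = [0.00, 0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.10]
incident = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, impurity = best_threshold(prevalence, incident)
print(f"split at prevalence <= {threshold:.3f}, weighted Gini = {impurity:.4f}")
```

A full tree repeats this search over every candidate predictor and then recurses into each resulting group. Running it separately on the herds of each risk area would be one way to see how thresholds can differ between areas.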


Romero, M. P., Chang, Y. M., Brunton, L. A., Parry, J., Prosser, A., Upton, P., …Drewe, J. A. (2019). Decision tree machine learning applied to bovine tuberculosis risk factors to aid disease control decision making. Preventive Veterinary Medicine,

Journal Article Type: Article
Acceptance Date: Nov 27, 2019
Publication Date: Nov 30, 2019
Deposit Date: Dec 6, 2019
Publicly Available Date: Dec 1, 2020
Print ISSN: 0167-5877
Publisher: Elsevier
Peer Reviewed: Peer Reviewed