
Research Repository


A comparison of the value of two machine learning predictive models to support bovine tuberculosis disease control in England

Romero, Maria Pilar; Pilar Romero, M.; Chang, Yu-Mei; Brunton, Lucy A.; Prosser, Alison; Upton, Paul; Rees, Eleanor; Tearne, Oliver; Arnold, Mark; Stevens, Kim; Drewe, Julian A.




Nearly a decade into Defra’s current eradication strategy (Defra, 2014, 2011), bovine tuberculosis (bTB) remains a serious animal health problem in England, with c.30,000 cattle slaughtered annually in the fight against this insidious disease. There is an urgent need to improve our understanding of bTB risk in order to enhance the current disease control policy. Machine learning approaches applied to big datasets offer a potential way to do this. Regularized regression and random forest machine learning methods were applied to 2016 herd-level data to build the best possible models for predicting a bTB incident in England and in each of its three surveillance risk areas (High-risk area [HRA], Edge area [EA] and Low-risk area [LRA]). The predictive performance of the two approaches was compared, and the best model in each area was used to characterize herds according to risk.
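The abstract does not say which software or solver was used for the regularized regression. As a rough illustration of what a LASSO model does for a binary outcome such as a bTB incident, the sketch below fits an L1-penalised logistic regression by proximal gradient descent (ISTA). This solver is swapped in only because it fits in a few lines of plain Python; it is not the authors' implementation, and the two features, the penalty values and the eight toy herds are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|w|: shrinks z towards zero by t, exactly to 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def fit_lasso_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
    step: float = 0.5,
    n_iter: int = 2000,
) -> Tuple[List[float], float]:
    """Minimise mean log-loss + lam * sum(|w_j|) by ISTA; the intercept b is not penalised."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        # Residuals: predicted incident probability minus observed outcome (0/1).
        r = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
             for row, yi in zip(X, y)]
        grad_w = [sum(ri * row[j] for ri, row in zip(r, X)) / n for j in range(d)]
        grad_b = sum(r) / n
        # Plain gradient step, then soft-thresholding sets weak coefficients to exactly 0.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Hypothetical herds: column 0 is an informative 0/1 risk factor,
# column 1 is unrelated to the outcome.
X = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w_strong, b_strong = fit_lasso_logistic(X, y, lam=10.0)  # heavy penalty
w_weak, b_weak = fit_lasso_logistic(X, y, lam=0.05)      # light penalty
print(w_strong, w_weak)
```

With the heavy penalty every coefficient is shrunk to zero; with the light one the informative factor keeps a clearly positive weight while the unrelated one stays near zero. This shrinkage is what lets a LASSO model double as a variable-selection step.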
While all models provided excellent discrimination, random forest models achieved the highest balanced accuracy (i.e. the average of sensitivity and specificity) in England, the HRA and the LRA, whereas the regularized regression LASSO model did so in the Edge area (EA). The time since the last confirmed incident was resolved was the only variable ranked in the top ten in all areas by both types of model, which highlights the importance of bTB history as a predictor of a new incident.
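Balanced accuracy matters when one outcome is much rarer than the other, as herds with a new incident are compared with herds without one. The short sketch below computes it from confusion-matrix counts and contrasts it with plain accuracy; the function names and all counts are hypothetical and chosen only for illustration.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of herds with an incident that were flagged
    specificity = tn / (tn + fp)  # share of herds without an incident that were not flagged
    return (sensitivity + specificity) / 2


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Share of all herds classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical test set: 100 herds with an incident, 1,000 without.
# A model that never predicts an incident scores well on plain accuracy...
print(accuracy(0, 100, 1000, 0))            # ~0.909
# ...but balanced accuracy exposes it as no better than chance.
print(balanced_accuracy(0, 100, 1000, 0))   # 0.5
# A model catching 90 of 100 incidents with 200 false alarms.
print(balanced_accuracy(90, 10, 800, 200))  # ~0.85
```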
Risk categorisation based on Receiver Operating Characteristic (ROC) analysis was carried out using the best predictive model in each area, with threshold values set at 99% sensitivity and 99% specificity (97% in the LRA). Thirteen percent of herds were classified as high-risk in England as a whole and in the HRA, 14% in the EA and 31% in the LRA. These herds could be selected for the deployment of additional disease control measures at national or area level. In this way, low-risk herds within the area considered would not be penalised unnecessarily by blanket control measures, and limited resources could be used more efficiently. The methodology presented in this paper demonstrates a way to accurately identify high-risk farms to inform a targeted disease control and prevention strategy in England that supplements existing population strategies.
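The abstract does not spell out how the sensitivity and specificity thresholds translate into risk categories. One plausible reading, used here purely as an assumption, places two cut-offs on a model's predicted incident probability: herds at or above the cut-off that gives 99% specificity are high-risk, herds below the cut-off that gives 99% sensitivity are low-risk, and the rest fall in between. The function names, the three-tier labels and the toy scores below are hypothetical; in practice the cut-offs would be read from the ROC curve of held-out herds.

```python
from typing import List, Sequence


def specificity_cutoff(scores: Sequence[float], labels: Sequence[int], target: float) -> float:
    """Lowest observed score t such that at least `target` of the herds WITHOUT
    an incident score below t; herds scoring >= t are then flagged high-risk."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    for t in sorted(set(scores)):
        if sum(s < t for s in negatives) / len(negatives) >= target:
            return t
    return float("inf")  # no observed cut-off reaches the target: flag nobody


def sensitivity_cutoff(scores: Sequence[float], labels: Sequence[int], target: float) -> float:
    """Highest observed score t such that at least `target` of the herds WITH
    an incident score >= t; herds scoring below t are then classed low-risk."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    for t in sorted(set(scores), reverse=True):
        if sum(s >= t for s in positives) / len(positives) >= target:
            return t
    return float("-inf")


def categorise(scores: Sequence[float], low_cut: float, high_cut: float) -> List[str]:
    """Assign each herd to a hypothetical low / medium / high risk tier."""
    tiers = []
    for s in scores:
        if s >= high_cut:
            tiers.append("high")
        elif s < low_cut:
            tiers.append("low")
        else:
            tiers.append("medium")
    return tiers


# Hypothetical predicted probabilities and observed outcomes for ten herds.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

high_cut = specificity_cutoff(scores, labels, 0.99)  # 0.70
low_cut = sensitivity_cutoff(scores, labels, 0.99)   # 0.30
print(categorise(scores, low_cut, high_cut))
```

With only ten herds a 99% target effectively demands 100%; on a national dataset the same functions would give cut-offs that tolerate a small share of misclassified herds. Passing 0.97 instead of 0.99 mirrors the lower threshold reported for the LRA.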


Romero, M. P., Pilar Romero, M., Chang, Y., Brunton, L. A., Prosser, A., Upton, P., …Drewe, J. A. (2021). A comparison of the value of two machine learning predictive models to support bovine tuberculosis disease control in England. Preventive Veterinary Medicine,

Journal Article Type Article
Acceptance Date Jan 7, 2021
Publication Date Jan 15, 2021
Deposit Date Jan 8, 2021
Publicly Available Date Jan 16, 2022
Print ISSN 0167-5877
Publisher Elsevier
Peer Reviewed Peer Reviewed

