Document Type : Original Article
Authors
1
Department of Animal Wealth Development (Biostatistics), Faculty of Veterinary Medicine, Benha University, Moshtohor, Toukh, 13736, Qalyubia, Egypt
2
Animal and poultry production, Department of Animal Wealth Development, Faculty of Veterinary Medicine, Benha University, Moshtohor, Toukh, 13736, Qalyubia, Egypt.
3
Ministry of Agriculture, Agricultural Research Center, Animal Production Research Institute, Dokki, Giza, 12619, Egypt.
4
Statistics Department, Faculty of Economics and Political Science, Cairo University, Giza, Egypt.
Abstract
Dairy farm records are a necessary element of good livestock business management. Record analysis allows a farm’s owner to make informed decisions based on complete records. Because incomplete records are less valuable for data analysis, it is critical to deal with missing values appropriately. This article compares different imputation methods for handling missing values in a raw dataset of dairy cattle comprising 997 records collected from 234 cows between 2012 and 2022. Records with missing values were identified and deleted, reducing the dataset to 858 observations from 200 cows. Missing values occurred in two variables, days in milk (DIM) and total milk yield (TOTM), with a missing percentage of 13.9%. Known values of DIM and TOTM were then removed at random from the complete dataset at the same missing-data percentages as in the original dataset.
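The random-removal step described above can be illustrated with a minimal sketch. It assumes values are hidden uniformly at random within a single variable; the function name `mask_at_random`, the seed, and the sample values are hypothetical and are not taken from the study.

```python
import random


def mask_at_random(values, missing_fraction, seed=0):
    """Return a copy of `values` with a random subset set to None,
    together with the sorted indices that were hidden."""
    rng = random.Random(seed)  # fixed seed so the masking is reproducible
    n_missing = round(len(values) * missing_fraction)
    masked_idx = sorted(rng.sample(range(len(values)), n_missing))
    masked = list(values)  # copy, so the complete data stay untouched
    for i in masked_idx:
        masked[i] = None
    return masked, masked_idx


# Hypothetical complete total milk yield values (illustrative only).
totm = [5200.0, 6100.0, 4800.0, 7000.0, 5600.0,
        6400.0, 5900.0, 6300.0, 5100.0, 6800.0]
masked_totm, hidden_idx = mask_at_random(totm, missing_fraction=0.2)
```

Because the true values at `hidden_idx` are retained in `totm`, any imputed values can later be compared against them.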
Five imputation methods were compared to identify the best technique for predicting missing values: mean imputation, median imputation, power regression imputation, multiple regression imputation and the expectation-maximization (EM) method. The five methods were evaluated using four performance metrics: the mean absolute deviation (MAD), the mean square error (MSE), Spearman’s rank correlation coefficient (rs) and the mean absolute percentage error (MAPE). The results showed that the EM method was overall the best imputation method for the data under study. It had the lowest MAD, the lowest MSE, the highest Spearman’s correlation coefficient and the second-lowest MAPE for predicting missing dairy cow DIM and TOTM data.
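As a rough illustration of the two simplest methods and the four metrics, the sketch below uses common textbook definitions, which are an assumption here because the abstract does not state the formulas: MAD as the mean of |actual − imputed|, MSE as the mean squared difference, MAPE as the mean of |actual − imputed| / |actual| times 100, and rs as the Pearson correlation of average ranks. All function names and numbers are hypothetical.

```python
import statistics


def impute_constant(values, statistic):
    """Fill None entries with a statistic (e.g. mean or median) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]


def average_ranks(xs):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(actual, imputed):
    """Spearman's r_s; returns None when either side has no rank variation."""
    ra, ri = average_ranks(actual), average_ranks(imputed)
    ma, mi = statistics.fmean(ra), statistics.fmean(ri)
    cov = sum((a - ma) * (b - mi) for a, b in zip(ra, ri))
    var_a = sum((a - ma) ** 2 for a in ra)
    var_i = sum((b - mi) ** 2 for b in ri)
    if var_a == 0 or var_i == 0:
        return None
    return cov / (var_a * var_i) ** 0.5


def evaluate(actual, imputed):
    """Compare imputed values with the true values that were hidden."""
    errors = [a - p for a, p in zip(actual, imputed)]
    return {
        "MAD": statistics.fmean([abs(e) for e in errors]),
        "MSE": statistics.fmean([e * e for e in errors]),
        "MAPE": 100 * statistics.fmean([abs(e) / abs(a) for e, a in zip(errors, actual)]),
        "rs": spearman(actual, imputed),
    }


# Hypothetical values, for illustration only.
mean_filled = impute_constant([100.0, None, 300.0, 800.0], statistics.mean)
median_filled = impute_constant([100.0, None, 300.0, 800.0], statistics.median)
scores = evaluate(actual=[100.0, 200.0, 400.0], imputed=[110.0, 180.0, 400.0])
```

Under these definitions, mean and median imputation assign the same value to every missing entry of a variable, so the imputed values carry no rank information and rs is undefined; the sketch returns `None` in that case.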
Keywords
Main Subjects