MSc Thesis Seminar

Name: Jihao You
Supervisor: Dr. Martin Zuidhof
Date: Thursday, November 26, 2020
Time: 9:00 am

Join Zoom Meeting

Email: for details

Title: Application of machine learning in the big data for broiler breeders recorded by a precision feeding system


A precision feeding (PF) system developed at the University of Alberta is an innovation in precise nutrition and management for broiler breeders. The PF system automatically feeds individual broiler breeders and records vast amounts of real-time data on their feeding activity, providing a valuable source of big data. Machine learning (ML) is an effective tool for big data analytics because it can reveal hidden patterns and correlations in big data. The current thesis aimed to apply ML approaches to extract information from the big data recorded by a PF system and to make predictions based on that information.

The first study investigated predicting daily oviposition events of individual broiler breeders with a random forest (RF) classification model. The raw dataset from the PF system was processed into 34 features related to the feeding activity and body weight (BW) change of individual breeders in one day. Important features were selected using the RF-recursive feature elimination method, and 28 features were retained to build the classification model. Overall accuracy of the model was 0.8482, and the out-of-bag score was 0.8510. Precision for the no-egg-laying and egg-laying classes was 0.8814 and 0.8090, and recall was 0.8520 and 0.8453, respectively. The Kappa coefficient of the model was 0.6931, indicating substantial agreement. This model was able to identify whether or not a free-run broiler breeder laid an egg on a given day during the laying period with around 85% accuracy.

The second study investigated detecting the anomalous real-time BW data that a PF system sometimes records for individual broiler breeders. A supervised learning approach was developed to detect anomalies by considering the data distribution and features of the feeding activity of individual birds recorded by the PF system. Based on a manually labelled dataset, 4 supervised learning algorithms were applied: RF, support vector machine, k-nearest neighbor, and artificial neural network (ANN). The results showed that RF was the best algorithm because it had the highest F1 score (0.9712) and area under the precision-recall curve (0.9948). Compared with common anomaly detection approaches including Z-scores, interquartile range (IQR), density-based spatial clustering of applications with noise (DBSCAN), and local outlier factor (LOF), RF had a higher average F1 score (0.9448), which indicated that RF was an effective solution for cleaning anomalous real-time BW data of individual broiler breeders fed by the PF system.

The third study investigated improving the prediction of daily oviposition events of individual broiler breeders from the first study. The model in the first study could only be used on the subsequent day to identify daily oviposition events, and its prediction outputs were binary labels. An ANN model was proposed to output the probability that a daily oviposition event occurred, using a specific time point in the day. The anchor point was newly defined as a specific time point in one day, and 26 features around the anchor point were created. The area under the receiver operating characteristic (ROC) curve was 0.9409, indicating that the model had outstanding classification performance. The ANN model could predict oviposition events on the current day, and its output was a probability indicating how likely it was that an individual breeder laid an egg that day. In situations where total egg production was known for a group, the ANN model could predict the probability of a daily oviposition event for every individual bird and then rank the birds to choose those most likely to have laid an egg. We concluded that ML approaches could extract meaningful information from the big data generated by a PF system for making predictions.
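
The ranking step described for the third study can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name select_likely_layers, the bird IDs, and the probability values are hypothetical and made up for illustration. It assumes that each bird already has a predicted probability of oviposition for the day and that the number of eggs collected from the group is known.

```python
from typing import Dict, List


def select_likely_layers(probabilities: Dict[str, float], eggs_collected: int) -> List[str]:
    """Return the IDs of the birds most likely to have laid an egg.

    probabilities: hypothetical mapping of bird ID -> predicted probability
        of an oviposition event on the day in question.
    eggs_collected: known number of eggs collected from the group that day.
    """
    if eggs_collected < 0:
        raise ValueError("eggs_collected must be non-negative")
    # Sort by descending probability; ties are broken by bird ID so the
    # result is deterministic.
    ranked = sorted(probabilities.items(), key=lambda item: (-item[1], item[0]))
    # Keep as many birds as there were eggs (slicing caps this at group size).
    return [bird_id for bird_id, _ in ranked[:eggs_collected]]


# Made-up probabilities, for illustration only (not results from the thesis).
example = {"bird_A": 0.91, "bird_B": 0.12, "bird_C": 0.67, "bird_D": 0.45}
print(select_likely_layers(example, 2))  # ['bird_A', 'bird_C']
```

Selecting the top-ranked birds rather than applying a fixed probability threshold keeps the number of birds flagged as having laid equal to the known group egg count, which is the situation the abstract describes.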