
Common Errors in Machine Learning Projects: A Second Look

1 Citation • 2023
Renato Magela Zimmermann, S. Allin, Lisa Zhang
Proceedings of the 23rd Koli Calling International Conference on Computing Education Research


Abstract

While machine learning (ML) has proved impactful in many disciplines, the design decisions involved in building ML models are difficult for novices to make, and mistakes can cause harm. Prior work by Skripchuk et al. [35] identified common errors made by ML students via qualitative analysis of open-ended ML assessments, but their sample was limited to a single institution, course, and assessment setting. Our work is an extended, conceptual replication of this work to understand the common errors made by machine learning students. We use a mixed-method approach to analyze errors in 30 final project reports in an undergraduate machine learning course. The final reports describe the model-building process for a classification task, where students build models on a complex data set with numerical, categorical, ordinal, and text features. Our choice to analyze project reports (rather than code) allows us to uncover design errors via how students justify their methodology. The project task is to achieve the best accuracy on an unseen test set; thus, to validate these common errors, we measure their association with the model's test accuracy. Common errors we find include those consistent with Skripchuk et al. [35], for example, issues with data processing, hyperparameter tuning, and model selection. In addition, our focus on design errors exposes other common errors, for example, cases where students use certain kinds of features (e.g., bag-of-words representations) only with particular models (e.g., Naive Bayes). We call these latter errors model misconceptions, and such errors are associated with lower test accuracy. Some of these errors are also present in work by practitioners. Others reflect students' difficulty in making correct connections between ML concepts and achieving the relational level of the SOLO taxonomy.
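Two of the error types above can be illustrated together. The sketch below (not from the paper; the data and model choices are illustrative assumptions) shows that bag-of-words features are not tied to Naive Bayes, pairing them with logistic regression instead, and uses a scikit-learn `Pipeline` so the vectorizer is fit only on the training split, which avoids the data-leakage error of preprocessing on the full data set:

```python
# Illustrative sketch (not the paper's method): bag-of-words features
# with a non-Naive-Bayes model, fit without data leakage.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy data; any labeled text corpus would do.
texts = ["good movie", "bad movie", "great film", "terrible film",
         "good plot", "bad plot", "great acting", "terrible acting"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0)

# The pipeline fits CountVectorizer on the training split only, so the
# test set's vocabulary never leaks into model fitting.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
```

Fitting the vectorizer on all of the data before splitting, by contrast, lets information from the test set influence the features, which is the leakage pattern the paper flags.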
We identify areas of opportunity to improve machine learning pedagogy, particularly related to data processing, data leakage, hyperparameters, nonsensical outputs, and disentangling data decisions from model decisions.
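On the hyperparameter point, a common remedy is to tune with cross-validation on the training split and touch the held-out test set only once. A minimal sketch, assuming a generic scikit-learn setup (the data set, model, and parameter grid are illustrative, not the paper's):

```python
# Illustrative sketch: hyperparameter tuning via cross-validation on the
# training split, keeping the test set unseen until the final estimate.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# GridSearchCV selects C using cross-validation folds drawn only from
# the training data; the test set plays no role in the selection.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# The held-out test set is used exactly once, for the final estimate.
test_acc = search.score(X_test, y_test)
```

Selecting hyperparameters by their test-set accuracy instead would turn the test set into a second training signal, inflating the reported performance.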