Breast Cancer Prediction

7 Citations • 2022

journal unavailable

The objective of this project is to train machine learning models to predict whether a breast cancer cell is Benign or Malignant, and to make a performance comparison between different machine learning algorithms in order to assess the correctness in classifying data with respect to efficiency and effectiveness.

Abstract

Breast cancer causes an alarming number of deaths each year. It is one of the most common cancers and a leading cause of cancer death among women worldwide. Advances in cancer prediction and detection are therefore critical, and high accuracy in cancer prognosis is essential for updating therapy and improving patient survival. Machine learning approaches have been shown to have a significant impact on breast cancer prediction and early diagnosis, and have become a research hotspot. On the Breast Cancer Wisconsin dataset, we used five machine learning algorithms: Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree (C4.5), and K-Nearest Neighbours (KNN). The objective of this project is to train machine learning models to predict whether a breast cancer cell is Benign ("B") or Malignant ("M"), using machine learning methods to extract the features of cancer cell nuclei and classify them. Data will be transformed and its dimensionality reduced to reveal patterns in the dataset and support a more robust analysis. The models applied in this report aim to produce a classifier that combines a high accuracy level with a low rate of false negatives (high sensitivity). The optimal model will be selected according to the resulting accuracy, sensitivity, and F1 score, among other factors; these metrics are defined later. The project compares the performance of the different algorithms in terms of accuracy, precision, sensitivity, and specificity, in order to assess the efficiency and effectiveness of each algorithm and find the best one for diagnosis.
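
The evaluation metrics named above can be illustrated with a minimal sketch. It assumes the malignant class ("M") is treated as the positive class, so that sensitivity measures how many malignant samples are caught. The function name `binary_metrics` and the example labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    sensitivity: float  # recall on the positive (malignant) class
    specificity: float
    f1: float


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive="M"):
    """Compute metrics from true and predicted labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # a low false-negative rate means high sensitivity
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=precision,
        sensitivity=sensitivity,
        specificity=_ratio(tn, tn + fp),
        f1=_ratio(2 * precision * sensitivity, precision + sensitivity),
    )


# Made-up labels for illustration only; these are not results from the paper.
y_true = ["M", "M", "M", "B", "B", "B", "B", "B"]
y_pred = ["M", "M", "B", "B", "B", "B", "M", "B"]
m = binary_metrics(y_true, y_pred)
print(m)
```

In this toy example one malignant sample is predicted as benign (a false negative), which lowers sensitivity while leaving specificity unaffected. This is why the two are reported separately alongside accuracy.
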
Diagnosis at an early stage is essential to facilitate the subsequent clinical management of patients and to increase the survival rate of breast cancer patients. The main models used and tested are supervised learning models (algorithms that learn from labelled data), which are the most commonly used for this kind of data analysis. The use of machine learning approaches in medical fields has proven productive, as such approaches can be of great assistance in the decision-making process of medical practitioners [...] images for diagnosis with an increasing number of breast cancer patients.

The convolutional neural network, a deep learning algorithm, provides significant results in classifying cancer and non-cancer tissue images but falls short in interpretability. The image classification problem is framed as a weakly supervised multiple instance learning problem, and attention on instances is used to localize the tumor and normal regions in an image. Attention-based multiple instance learning (A-MIL) is applied to the BreakHis and BACH datasets. The method used in this paper produces better localization results without compromising classification accuracy.

[...] attribute selection, target role selection, and feature extraction. Machine learning algorithms are built using the prepared data to predict breast cancer for a new set of measurements. To evaluate the algorithms' performance, we show the model new data for which we have labels. This is commonly accomplished by using the train-test split method to split the labelled data we have acquired into two parts. The training data, also known as the training set, accounts for 75% of the data and is used to develop our machine learning model. The test data, or test set, is the remaining 25% of the data and is used to see how well the model works. After testing the models, we compare the obtained results to select the one that provides the highest accuracy and to identify the most predictive algorithm for the detection of breast cancer.
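
The 75%/25% split and evaluation described above can be sketched as follows. This is a simplified illustration using only the Python standard library, not the authors' implementation: the `train_test_split` and `knn_predict` helpers are written here for the example, the two-feature data is synthetic rather than the Breast Cancer Wisconsin dataset, and KNN stands in for any of the five algorithms. In practice a library routine such as scikit-learn's `train_test_split` with `test_size=0.25` would typically be used.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle sample indices and hold out `test_fraction` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def knn_predict(x_train, y_train, x, k=3):
    """Predict by majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(zip(x_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic, well-separated two-feature samples (illustration only, not real data).
benign = [(i * 0.1, i * 0.05) for i in range(20)]
malignant = [(5 + i * 0.1, 5 + i * 0.05) for i in range(20)]
rows = benign + malignant
labels = ["B"] * 20 + ["M"] * 20

# 75% of the labelled data trains the model; the held-out 25% measures how well it works.
x_train, x_test, y_train, y_test = train_test_split(rows, labels, test_fraction=0.25)
predictions = [knn_predict(x_train, y_train, x) for x in x_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(len(x_train), len(x_test), accuracy)
```

Fixing the shuffle seed keeps the split reproducible, so that different algorithms can be compared on exactly the same held-out samples.
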