Session: ASME Undergraduate Student Design Expo
Paper Number: 172901
The Study of the Effect of Pre-Trained and Unbalanced Datasets for Brain Tumor Classification
Composed of tens of billions of neurons and supporting cells, the human brain is an intricate biological system that performs complex tasks. Yet the brain is not shielded from threats: abnormal growths known as tumors, which occur as a mass or lump, may impair vital brain function. These tumors can originate in different parts of the brain and are classified by their origin as glioma, meningioma, or pituitary tumors. Because the location of a tumor influences both the symptoms and the most effective approach to treatment, accurate classification is critical for timely diagnosis and treatment. Manual identification can be time-consuming and error-prone, which can reduce a patient’s chances of survival. Thus, this research proposes a comparative deep-learning framework for multiclass classification of brain tumors in publicly available MRI scans. The motivation for this work stems from the need for reliable and fast AI-powered tools that can supplement medical workflows and aid early detection. Furthermore, because the availability of MRI images differs across tumor types, MRI image datasets are imbalanced, so it is important to investigate the effect of data imbalance on AI model performance. Medical image data is also scarce compared with the vast quantities of labeled images available for general computer vision tasks such as object detection or facial recognition. To address this limitation, transfer-learning models pre-trained on large image datasets were used. Previous studies have shown that transfer-learning algorithms can achieve better accuracy with less data because of their pre-trained weights. However, the effect of these algorithms on MRI scans needs to be verified.
Thus, the research question developed for this study was, "Can a comparative analysis of multiple deep learning algorithms result in a highly accurate multiclass classification of brain tumors using MRI scans?" The two specific aims developed to answer the research question were: (a) to create a framework for multiclass classification of brain tumors for multiple deep learning algorithms, and (b) to study the effect of both balanced and unbalanced datasets in brain tumor classification. The four categories under consideration were: No tumor, Glioma, Pituitary, and Meningioma. The dataset was obtained from Kaggle and was already partitioned into training and testing datasets, with approximately 80% allocated for training and 20% for testing. It initially contained a total of 7023 images divided into four categories: 2000 No tumor, 1621 Glioma, 1757 Pituitary Tumor, and 1645 Meningioma images. Before training, the dataset was filtered and pre-processed to remove blurry, low-quality, and poor-contrast images. The filtering step also removed misclassified images from the dataset. After filtering, the refined dataset (6956 images in total) included 1970 No tumor, 1592 Glioma, 1754 Pituitary Tumor, and 1640 Meningioma MRI images. Three convolutional neural networks were explored through Python’s TensorFlow and Keras libraries: pre-trained VGG16, pre-trained InceptionV3, and a non-pretrained custom CNN algorithm. For Specific Aim 1, the number of epochs was tuned by training each algorithm for 50 to 600 epochs in 50-epoch increments, with 10 iterations per algorithm. For Specific Aim 2, the effects of class imbalances across the different tumor categories were studied. The top-performing algorithm from Specific Aim 1 was selected and trained for 150 epochs on five imbalanced datasets and one balanced dataset, using 10 iterations for each case.
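The dataset figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class counts and the 50-to-600 epoch range come from the abstract, while the helper name `summarize` and the max/min "imbalance ratio" are assumptions introduced here, not quantities the authors report.

```python
# Class counts as stated in the abstract (before and after filtering).
original = {"No tumor": 2000, "Glioma": 1621, "Pituitary": 1757, "Meningioma": 1645}
filtered = {"No tumor": 1970, "Glioma": 1592, "Pituitary": 1754, "Meningioma": 1640}


def summarize(counts):
    """Return (total, per-class share, max/min ratio) for a dict of class counts.

    The max/min ratio is an illustrative imbalance measure chosen for this
    sketch; the abstract does not say how imbalance was quantified.
    """
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return total, shares, imbalance_ratio


# Images removed per class by the filtering step.
removed = {name: original[name] - filtered[name] for name in original}

# Epoch settings described for Specific Aim 1: 50 to 600 in steps of 50.
epoch_grid = list(range(50, 601, 50))

if __name__ == "__main__":
    for label, counts in (("original", original), ("filtered", filtered)):
        total, shares, ratio = summarize(counts)
        print(f"{label}: total={total}, max/min ratio={ratio:.3f}")
        for name, share in shares.items():
            print(f"  {name}: {share:.1%}")
    print("removed per class:", removed)
    print("epoch settings:", epoch_grid)
```

Running the script prints the class shares and the number of images dropped from each category, which makes it easy to see how far the refined dataset is from an even four-way split.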
To evaluate performance for Specific Aims 1 and 2, six performance indicators were used: testing accuracy, training loss, validation loss, precision, recall, and F1-score. Each performance indicator was calculated as a weighted average over the values obtained from 10 iterations. Preliminary results demonstrate that InceptionV3 achieves superior performance, with testing accuracy exceeding 98% and minimal training loss. Furthermore, the class-imbalance investigation revealed that some tumor categories are sensitive to imbalances in the dataset.
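The abstract does not spell out how the weighted averages were formed. One common reading, assumed here, is support-weighted averaging across classes, where each class's precision, recall, and F1-score is weighted by its share of the true samples. The sketch below implements that reading in plain Python; the function name `weighted_metrics` and the toy confusion matrix are hypothetical and are not results from the study.

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall, and F1 from a confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision_w = recall_w = f1_w = 0.0
    for k in range(n):
        true_pos = confusion[k][k]
        support = sum(confusion[k])                         # true samples of class k
        predicted = sum(confusion[i][k] for i in range(n))  # samples predicted as k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / total
        precision_w += weight * precision
        recall_w += weight * recall
        f1_w += weight * f1
    accuracy = sum(confusion[k][k] for k in range(n)) / total
    return {"accuracy": accuracy, "precision": precision_w,
            "recall": recall_w, "f1": f1_w}


# Toy 4-class matrix with made-up counts, for illustration only.
# Rows and columns: No tumor, Glioma, Pituitary, Meningioma.
toy = [
    [9, 1, 0, 0],
    [0, 8, 2, 0],
    [0, 0, 10, 0],
    [1, 0, 0, 9],
]

if __name__ == "__main__":
    for name, value in weighted_metrics(toy).items():
        print(f"{name}: {value:.4f}")
```

Under this weighting, weighted recall always equals overall accuracy, so the two indicators only carry separate information if a different averaging scheme, such as macro averaging, is used.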
Presenting Author: Shweta Dabetwar, University of Arkansas at Little Rock
Presenting Author Biography: N/A
Authors:
Syed Azfar Rahman, University of Arkansas at Little Rock
Shweta Dabetwar, University of Arkansas at Little Rock
Paper Type
Undergraduate Expo