Predicting Lung Cancer Onset Using Segmentation and Classification

Abstract: Lung cancer claims 150,000 Americans yearly, yet only 15.7% of cases are diagnosed before metastasis. Current computer diagnostics focus on Bayesian networks using rule-based prediction. In this project, I developed a regular classification algorithm to identify potential lung nodules from CT scans and to classify whether they would become cancerous within 12 months. I worked with data from the Lung Image Database Consortium, using scans from 90 patients. I preprocessed the balanced data (45 positive/45 negative) by applying the provided masks indicating the location of lung tissue. To further isolate lung tissue, I used a Hounsfield filter with a 395-405 HU range and normalized the number of slices for each patient to 256. I randomly split the scans into a training set of 65 and a testing set of 25. Next, I used U-Net segmentation, a convolutional neural network well suited to small datasets, to predict nodule masks and to extract a feature set from cancerous slices during ground truth comparison. I designed six U-Net models with varied hyperparameters, including learning rate, optimizer, activation function, and number of layers, with the Dice coefficient as the objective function. Model 6 achieved the highest Dice coefficient (0.64) using a 0.001 learning rate, 9 layers, the Adam optimizer, and Leaky ReLU activation. I then used Model 6’s feature set to train a classifier that determined whether unknown nodules were cancerous. Using XGBoost classification, I varied learning rate, optimizer, activation function, and decay, with logarithmic loss as the objective function. The most effective XGBoost classifier achieved a loss of 0.59 and accuracies of 76% and 72% on the training and testing sets, respectively. Future work will focus on using stratified k-fold cross-validation to minimize sample bias and on using grid search to improve hyperparameter optimization.
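The abstract names two objective functions: the Dice coefficient for segmentation and logarithmic loss for classification. The sketch below only illustrates how those two metrics are defined. It is not the project's code: the function names `dice_coefficient` and `log_loss`, the `eps` smoothing terms, and the toy masks are all assumptions made for this example, and it uses plain Python lists where a real pipeline would more likely use array libraries.

```python
import math


def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 values (hypothetical input format).
    """
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    # eps (an assumption of this sketch) keeps the ratio defined
    # when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    # Clip probabilities so log(0) is never evaluated.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(labels, clipped)
    )
    return -total / len(labels)


# Toy 2x4 masks, flattened: 3 predicted pixels, 3 true pixels, 2 overlapping.
pred = [0, 1, 1, 0, 0, 1, 0, 0]
truth = [0, 1, 1, 1, 0, 0, 0, 0]
print(round(dice_coefficient(pred, truth), 2))  # 0.67
```

A Dice coefficient of 1 means the predicted and ground truth masks overlap perfectly, and 0 means they do not overlap at all. For logarithmic loss, lower is better; a classifier that always outputs a probability of 0.5 scores ln 2 ≈ 0.693 on any binary dataset.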

Project Presentation:


Click here to view my project poster.

Awards: I’ve been lucky to be recognized for my work on NodulePredict:

  • 2017 Intel ISEF 4th Place – Translational Medical Science
  • 2017 North Jersey Regional Science Fair
    • Grand Prize Winner / Intel ISEF Finalist
    • 1st Place Math + Computer Science
    • 1st Place ACM Computing
    • 1st Place Intel Computer Science Award
    • Innovation Award
    • Statistics Award – Honorable Mention
  • 2017 NYC World Maker Faire Keynote Speaker @ MAKE: Electronics stage
  • 2017 Envision TV New Thought Leader
  • 2018 Columbia Junior Science Journal publication
