The data relates to cancer diagnoses of different types. Each case includes information on the properties (radius, texture and perimeter) of the three most characteristic cell nuclei. In addition, the age of the person, the dates of diagnosis and treatment start, and the cancer type are available.
- Exploratory data analysis
- Here I have tried to answer the following questions by building a number of visualisations:
  - What abnormalities are there in the data?
  - Are there any interesting, perhaps unexpected, correlations to be found?
  - How should null values be treated?
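The null-value and correlation questions above can be explored with a few lines of code. The following is a minimal sketch, not the notebook's actual approach: the column names (`age`, `radius_0`, `perimeter_0`) and the toy rows are invented for illustration, and median imputation is only one possible way to treat nulls. It uses the standard library so that it runs anywhere; in a notebook, pandas would be the more usual choice.

```python
import math
import statistics

# Toy rows standing in for the real data; column names and values are invented.
rows = [
    {"age": 45, "radius_0": 12.1, "perimeter_0": 78.0},
    {"age": 52, "radius_0": None, "perimeter_0": 95.5},
    {"age": 61, "radius_0": 17.3, "perimeter_0": 112.4},
    {"age": 38, "radius_0": 10.9, "perimeter_0": None},
    {"age": 57, "radius_0": 15.2, "perimeter_0": 99.1},
]
columns = ["age", "radius_0", "perimeter_0"]


def count_missing(rows, columns):
    """Number of None entries per column."""
    return {c: sum(row[c] is None for row in rows) for c in columns}


def impute_median(rows, columns):
    """Replace None with the median of the observed values in that column."""
    medians = {
        c: statistics.median([row[c] for row in rows if row[c] is not None])
        for c in columns
    }
    return [
        {c: medians[c] if row[c] is None else row[c] for c in columns}
        for row in rows
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


missing = count_missing(rows, columns)
filled = impute_median(rows, columns)
corr = pearson(
    [row["radius_0"] for row in filled],
    [row["perimeter_0"] for row in filled],
)
print("missing per column:", missing)
print("radius/perimeter correlation:", round(corr, 3))
```

Counting the nulls per column first helps decide whether to impute or drop rows. Note that imputation itself changes the correlations, so it is worth comparing results before and after.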
- Modelling
I have selected logistic regression for modelling. You can try other algorithms such as RandomForest or XGBoost. The objective was to keep the model's complexity reasonable (number of features used, etc.).
- You can find a similar challenge on Kaggle.
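To show what the logistic regression choice involves, here is a minimal from-scratch sketch trained with plain gradient descent. It is an illustration only, not the notebook's code: the two features and all values are invented and assumed to be already standardised, and the helper names (`fit_logistic`, `predict`) are hypothetical. In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(y=1 | x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Class 1 if the predicted probability reaches the threshold, else 0."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
        for xi in X
    ]


# Invented, already standardised toy features (think: scaled radius, texture).
X = [[-1.2, -0.8], [-0.9, -1.1], [-0.5, -0.3], [0.4, 0.6], [1.0, 0.9], [1.3, 0.7]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
preds = predict(X, w, b)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```

Using only two features keeps the model small and easy to inspect, in line with the goal of reasonable complexity. Training accuracy on a toy set says nothing about generalisation, so the real data should be evaluated on a held-out split.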
Jupyter Notebook – Breast_cancer_prediction Challenge
Download the test and train data and play around with it.