Heart disease prediction using data mining and ML

Framingham Heart Study dataset

The dataset includes several demographic risk factors:

- sex: male or female.

- age: age of the patient.

- education: levels coded 1 for some high school, 2 for a high school diploma or GED, 3 for some college or vocational school, and 4 for a college degree.

The dataset also includes behavioral risk factors associated with smoking:

- currentSmoker: whether or not the patient is a current smoker

- cigsPerDay: the number of cigarettes that the person smoked on average in one day.

Medical history risk factors:


- BPMeds: whether or not the patient was on blood pressure medication.

- prevalentStroke: whether or not the patient had previously had a stroke.

- prevalentHyp: whether or not the patient was hypertensive.

- diabetes: whether or not the patient had diabetes.

Risk factors from the first physical examination of the patient.


- totChol: total cholesterol level.

- sysBP: systolic blood pressure.

- diaBP: diastolic blood pressure.

- BMI: Body Mass Index.

- heartRate: heart rate.

- glucose: glucose level.

- TenYearCHD: 10-year risk of coronary heart disease (CHD).

Information about the project's work environment:

- Anaconda 5.0.1.

- Programming language: Python 3.6.

- Work environment: Jupyter Notebook 5.

* We use Anaconda because it includes both the environment and Python.

link for download

All libraries used in our project:

Analyze and explore the dataset

Reduce the data

read the dataset: .heart_data.csv
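The code cell for this step is not preserved here. A minimal sketch of how the file could be loaded, assuming pandas; the rows below are illustrative stand-ins rather than real records, only the column names follow the feature list above, and the exact path of the CSV is an assumption:

```python
import io

import pandas as pd

# Illustrative rows standing in for the real CSV file; only the column
# names follow the feature description above.
SAMPLE_CSV = """male,age,education,currentSmoker,cigsPerDay,diabetes,totChol,BMI,glucose,TenYearCHD
1,40,4,0,0,0,190,27.0,78,0
0,45,2,0,0,0,248,28.5,75,0
1,50,1,1,20,0,240,25.5,71,0
0,62,3,1,30,0,228,28.0,104,1
"""

# With the real file this would be pd.read_csv("heart_data.csv")
# (path assumed); here the text above is read from memory instead.
heart = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Show the first rows as a quick sample of the data.
print(heart.head())
```

Reading from an in-memory string keeps the sketch runnable without the original file.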

Sample of the data in our project

Number of values in each column

Data details

std: standard deviation

mean: arithmetic mean

count: number of elements
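As a hedged illustration of where these figures come from, the sketch below assumes pandas and a tiny made-up frame (the values are not from the dataset). `count()` gives the number of non-missing values per column, and `describe()` adds the mean and standard deviation:

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one missing glucose value.
heart = pd.DataFrame({
    "age": [39, 46, 48, 61],
    "glucose": [77.0, 76.0, np.nan, 103.0],
})

# count() reports the non-missing entries per column.
counts = heart.count()

# describe() adds mean, std, min, quartiles and max per numeric column.
summary = heart.describe()

print(counts)
print(summary.loc[["count", "mean", "std"]])
```

A column whose count is lower than the others contains missing values.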


Number of target values in the last column, TenYearCHD

0 refers to the number of people who did not suffer from coronary heart disease within the ten-year period

1 refers to the number of people who suffered from coronary heart disease within the ten-year period
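Counting the two classes can be sketched as follows, assuming pandas; the short series below is made up and only mimics the TenYearCHD column:

```python
import pandas as pd

# Illustrative target column: 0 = no CHD, 1 = CHD.
target = pd.Series([0, 0, 1, 0, 1, 0], name="TenYearCHD")

# value_counts() returns how many rows fall into each class.
class_counts = target.value_counts()
print(class_counts)
```

Comparing the two counts shows how balanced the classes are.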




Sort and review the data

Remove all rows that contain incomplete information

Now every column has the same number of values, equal to the count of the smallest column
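One way to remove incomplete rows is pandas' `dropna()`. The sketch below uses made-up values and is an assumption about how the step could be done, not the notebook's own code:

```python
import numpy as np
import pandas as pd

# Illustrative frame with missing values in two different rows.
heart = pd.DataFrame({
    "age": [39, 46, 48, 61],
    "cigsPerDay": [0.0, np.nan, 20.0, 30.0],
    "glucose": [77.0, 76.0, np.nan, 103.0],
})

# Drop every row that has at least one missing value.
clean = heart.dropna()

# After dropping, every column has the same number of values.
print(clean.count())
```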




Study the social status of the samples

First: construct a table of social information:

Sample


Analyze the association between the sample data:

Association analysis studies the relationship between two variables. Its basic purpose is to determine how strongly the variables are related: the correlation coefficient ranges from -1 (perfect negative correlation) through 0 (no correlation) to 1 (perfect positive correlation).
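A minimal sketch of computing pairwise correlations, assuming pandas and its default Pearson coefficient; the numbers are made up for illustration:

```python
import pandas as pd

# Illustrative values: blood pressure rises with age, heart rate does not.
heart = pd.DataFrame({
    "age": [35, 42, 50, 58, 63],
    "sysBP": [110.0, 121.0, 130.0, 142.0, 155.0],
    "heartRate": [80, 72, 75, 70, 78],
})

# Pearson correlation between every pair of columns.
corr = heart.corr()
print(corr.round(2))
```

Each column correlates perfectly with itself, so the diagonal of the matrix is 1.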

more information  


Distribution of education level by age

Smoking behavior by gender

Number of cigarettes by age group

Education level by gender

Sample


Analyze the association between the medical data

Age distribution of patients with coronary heart disease within the ten-year period

Age distribution of diabetes patients.

Coronary heart disease by gender

To avoid prolonging the data review, we stop here and move on to machine learning.

Classification and machine learning algorithms

Decision Tree Classifier


A decision tree is a supervised learning method used for classification and regression. Its objective is to build a model that predicts the value of a target variable by learning simple decision rules extracted from the features. Classification applies a set of rules or conditions that determine a path starting at the root and ending at a leaf node, which represents the class assigned to the item. At every internal node a decision must be made about the path to the next node.

Source

First: a model to predict the probability of diabetes

Preparation of data

The data is split into a matrix 'x', which contains the features used for training, and a matrix 'y', which contains only the values of the 'diabetes' column, i.e. the target. That is, x holds the features of every person, and y is a single-column matrix in which each row holds 1 if the person has diabetes and 0 if not. The algorithm compares the feature values in each row of x with the corresponding value in y to find patterns that may explain why a person develops diabetes.

Features used in X

'male' , 'age' , 'cigsPerDay' , 'BMI' , 'glucose' , 'totChol'

Training and testing data

Each of the matrices x and y is divided into training data and test data. We use 80% of 'x' and 'y' for training and 20% for testing.
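The preparation and the 80/20 split could look like the sketch below. It assumes scikit-learn's `train_test_split` and uses randomly generated stand-in data. The labelling rule, the `random_state` value and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset (random values, not real records).
rng = np.random.default_rng(0)
n = 200
heart = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "age": rng.integers(32, 70, n),
    "cigsPerDay": rng.integers(0, 40, n),
    "BMI": rng.normal(26, 4, n),
    "glucose": rng.normal(82, 20, n),
    "totChol": rng.normal(236, 44, n),
})
# Hypothetical labelling rule so that the example has a target column.
heart["diabetes"] = (heart["glucose"] > 110).astype(int)

# X holds the features of every person, y the single target column.
features = ["male", "age", "cigsPerDay", "BMI", "glucose", "totChol"]
X = heart[features]
y = heart["diabetes"]

# Hold out 20% of the rows for testing; random_state is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible between runs.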

Decision Tree Classifier

Specify the maximum depth of the tree's branches as 10.

Feed the decision tree model with the training data

This process is called "training"

Test the effectiveness of the model

Now that the decision tree is ready, we can test its effectiveness ('score') on the training and test data to measure its accuracy.

The results showed that the decision tree correctly predicted 98% of the data set aside for testing, which indicates high quality.
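Training and scoring can be sketched as follows, assuming scikit-learn's `DecisionTreeClassifier` with `max_depth=10`. The data is synthetic and the labelling rule is hypothetical, so the printed accuracies say nothing about the 98% reported above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (random values, not the Framingham records).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "age": rng.integers(32, 70, n),
    "cigsPerDay": rng.integers(0, 40, n),
    "BMI": rng.normal(26, 4, n),
    "glucose": rng.normal(82, 20, n),
    "totChol": rng.normal(236, 44, n),
})
# Hypothetical labelling rule, used only so the sketch has a target.
y = (X["glucose"] > 110).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Limit the tree to depth 10, as described in the text, then train it.
tree = DecisionTreeClassifier(max_depth=10, random_state=0)
tree.fit(X_train, y_train)

# score() returns the fraction of correctly classified rows (accuracy).
train_score = tree.score(X_train, y_train)
test_score = tree.score(X_test, y_test)
print(f"train accuracy: {train_score:.2f}, test accuracy: {test_score:.2f}")
```

A large gap between the training and the test accuracy would suggest that the tree is overfitting.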

Use the model to predict diabetes

We enter the data of a new person to predict whether they have diabetes

Prediction result

From the new person's information, diabetes was predicted. The decision tree returns 1 in the case of diabetes and 0 otherwise.

Add another person to make sure the model works

The result is 0: unlike the previous person, diabetes was not predicted for this new person.
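Predicting for new people can be sketched as below, assuming scikit-learn and the same synthetic data as a stand-in. Both people and the labelling rule are hypothetical, so the outputs only illustrate the 1/0 convention described above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

features = ["male", "age", "cigsPerDay", "BMI", "glucose", "totChol"]

# Synthetic training data (random values, not real records).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "age": rng.integers(32, 70, n),
    "cigsPerDay": rng.integers(0, 40, n),
    "BMI": rng.normal(26, 4, n),
    "glucose": rng.normal(82, 20, n),
    "totChol": rng.normal(236, 44, n),
})[features]
# Hypothetical labelling rule: high glucose means diabetes.
y = (X["glucose"] > 110).astype(int)

tree = DecisionTreeClassifier(max_depth=10, random_state=0).fit(X, y)

# Two hypothetical new people, given in the same column order as X.
new_people = pd.DataFrame(
    [
        [1, 58, 10, 31.0, 150.0, 260.0],  # high glucose
        [0, 40, 0, 23.5, 80.0, 200.0],    # normal glucose
    ],
    columns=features,
)

# predict() returns one class label (1 or 0) per row.
predictions = tree.predict(new_people)
print(predictions)
```

Passing the new rows as a DataFrame with the training column names keeps the feature order explicit.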

Second: a model to predict coronary heart disease

Preparation of data

We repeat the steps used for the diabetes model, changing the target to TenYearCHD.

Decision Tree Classifier

Feed the decision tree model with the training data

This process is called "training"

Test the effectiveness of the model

The results showed that the decision tree correctly predicted 83% of the data set aside for testing, which is good.

Use the model to predict coronary heart disease

Data for a new person

Prediction result

Coronary heart disease was predicted.

Add data for a new person to make sure that the model works well:

The result is 0: unlike the previous person, coronary heart disease was not predicted for this new person.



 

More algorithms

Random Forest


The random forest algorithm is derived from the decision tree (Classification and Regression Trees). It is a machine learning method for building predictive models from data: the models are obtained by partitioning the data and fitting a simple prediction model within each partition, and a random forest combines the predictions of many such trees.
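A minimal sketch of a random forest for the TenYearCHD target, assuming scikit-learn's `RandomForestClassifier`. The data is synthetic, the noisy risk rule is hypothetical, and `n_estimators=100` is an assumed setting, so the accuracy printed here is unrelated to the figure reported for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (random values, not the Framingham records).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "age": rng.integers(32, 70, n),
    "cigsPerDay": rng.integers(0, 40, n),
    "BMI": rng.normal(26, 4, n),
    "glucose": rng.normal(82, 20, n),
    "totChol": rng.normal(236, 44, n),
})
# Hypothetical noisy rule: risk grows with age and cigarettes per day.
linear = 0.1 * (X["age"].to_numpy() - 55) + 0.05 * (X["cigsPerDay"].to_numpy() - 15)
risk = 1 / (1 + np.exp(-linear))
y = (rng.random(n) < risk).astype(int)  # stands in for TenYearCHD

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train 100 trees on bootstrap samples and combine their votes.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

test_score = forest.score(X_test, y_test)
print(f"test accuracy: {test_score:.2f}")
```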

Source

Apply the random forest to the prediction of coronary heart disease

Random forest classifier model

Test the effectiveness of the model:

The random forest correctly predicted 84% of the data set aside for testing, better than the decision tree.

Use the model to predict coronary heart disease

data for new person

prediction result

Gradient Boosting algorithm

The "Gradient Boosting" classifier generates many weak prediction trees and then combines them into a strong model.
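The idea can be sketched with scikit-learn's `GradientBoostingClassifier`, which by default adds 100 shallow trees one after another. The data and the risk rule below are synthetic and hypothetical, and the default hyperparameters are an assumption, not necessarily the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (random values, not the Framingham records).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "age": rng.integers(32, 70, n),
    "cigsPerDay": rng.integers(0, 40, n),
    "BMI": rng.normal(26, 4, n),
    "glucose": rng.normal(82, 20, n),
    "totChol": rng.normal(236, 44, n),
})
# Hypothetical noisy rule: risk grows with age and cigarettes per day.
linear = 0.1 * (X["age"].to_numpy() - 55) + 0.05 * (X["cigsPerDay"].to_numpy() - 15)
risk = 1 / (1 + np.exp(-linear))
y = (rng.random(n) < risk).astype(int)  # stands in for TenYearCHD

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each new shallow tree is fitted to correct the errors of the previous ones.
booster = GradientBoostingClassifier(random_state=0)
booster.fit(X_train, y_train)

test_score = booster.score(X_test, y_test)
print(f"test accuracy: {test_score:.2f}")
```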

Source

Test the effectiveness of the model:

Gradient boosting correctly predicted 86% of the data set aside for testing.

Use the model to predict coronary heart disease

data for new person

prediction result

Voting algorithm

The "Voting" algorithm applies voting across multiple models, such as the ones we built on this data. It runs more than one algorithm and selects the prediction with the most support.
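A hedged sketch of majority voting with scikit-learn's `VotingClassifier`. The text does not state which models were combined, so the three estimators, the hard-voting mode and the synthetic data below are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (random values, not the Framingham records).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "age": rng.integers(32, 70, n),
    "cigsPerDay": rng.integers(0, 40, n),
    "BMI": rng.normal(26, 4, n),
    "glucose": rng.normal(82, 20, n),
    "totChol": rng.normal(236, 44, n),
})
# Hypothetical noisy rule: risk grows with age and cigarettes per day.
linear = 0.1 * (X["age"].to_numpy() - 55) + 0.05 * (X["cigsPerDay"].to_numpy() - 15)
risk = 1 / (1 + np.exp(-linear))
y = (rng.random(n) < risk).astype(int)  # stands in for TenYearCHD

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hard voting: each model casts one vote and the majority class wins.
voter = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=10, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("boost", GradientBoostingClassifier(random_state=0)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

test_score = voter.score(X_test, y_test)
print(f"test accuracy: {test_score:.2f}")
```

Using an odd number of models avoids tied votes in a two-class problem.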

Test the effectiveness of the model:

The voting algorithm correctly predicted 86% of the data set aside for testing.

Use the model to predict coronary heart disease

data for new person

prediction result

Result of prediction:

Across the machine learning and data mining algorithms used in this work, the accuracy was 83% for the decision tree, 84% for the random forest, and 85% for the voting algorithm.