
Check data


Feature engineering


Extract the information from the categorical variables.

Remove original categorical features.

Remove Name_length because, much like PassengerId, it carries information that encourages overfitting.

Now the data has:

  • No nulls,
  • All numeric columns,
  • Relevant information extracted,
  • The original categorical columns dropped.
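The steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the toy frame and the title/embarkation mappings are assumptions based on the usual Titanic columns (`Name`, `Sex`, `Embarked`).

```python
import pandas as pd

# Toy frame standing in for the Titanic training data (column names assumed).
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John"],
    "Sex": ["male", "female"],
    "Embarked": ["S", "C"],
})

# Extract numeric information from the categorical columns.
df["Sex_code"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked_code"] = df["Embarked"].fillna("S").map({"S": 0, "C": 1, "Q": 2})
title = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
df["Title_code"] = title.map({"Mr": 0, "Mrs": 1, "Miss": 2}).fillna(3).astype(int)

# Drop the original categorical columns so everything left is numeric.
df = df.drop(columns=["Name", "Sex", "Embarked"])

assert df.isnull().sum().sum() == 0  # no nulls remain
assert all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)  # all numeric
```

The same pattern (map to codes, then drop the source column) applies to every categorical feature in the real dataset.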

Check generated features


Few feature pairs have high correlations, which indicates that each feature carries mostly unique information. Good.
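A check like that can be done by scanning the correlation matrix for highly correlated pairs. The frame below is a random stand-in for the engineered features, and the column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the engineered feature frame (names are hypothetical).
rng = np.random.default_rng(0)
feats = pd.DataFrame(rng.normal(size=(100, 4)),
                     columns=["Pclass", "Fare", "FamilySize", "IsAlone"])

corr = feats.corr()
# Collect feature pairs whose absolute correlation exceeds a threshold.
high_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > 0.8
]
# Independent random columns -> high_pairs should be empty.
```

In a notebook the same matrix is usually rendered as a heatmap (e.g. with seaborn) rather than scanned programmatically.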

Learn


Preparation


First-Level Models


Using 5 classifiers here:

  • Random Forest classifier
  • Extra Trees classifier
  • AdaBoost classifier
  • Gradient Boosting classifier
  • Support Vector Machine

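Instantiating the five base models with scikit-learn might look like this; the hyper-parameters here are illustrative placeholders, not the notebook's actual settings:

```python
from sklearn.ensemble import (
    RandomForestClassifier, ExtraTreesClassifier,
    AdaBoostClassifier, GradientBoostingClassifier,
)
from sklearn.svm import SVC

# Hyper-parameters below are assumptions for illustration only.
first_level = {
    "rf": RandomForestClassifier(n_estimators=200, max_depth=6, random_state=0),
    "et": ExtraTreesClassifier(n_estimators=200, max_depth=6, random_state=0),
    "ada": AdaBoostClassifier(n_estimators=200, learning_rate=0.75, random_state=0),
    "gb": GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0),
    "svc": SVC(kernel="linear", C=0.025, random_state=0),
}
```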
Train

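In a stacking setup the first-level models are usually trained with out-of-fold predictions, so that the second-level model never sees predictions made on a model's own training data. A minimal sketch on synthetic data (the `get_oof` helper name and the fold count are assumptions):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_test = X[:50]  # stand-in for a real held-out test set

def get_oof(model, X_train, y_train, X_test, n_splits=5):
    """Out-of-fold train predictions plus fold-averaged test predictions."""
    oof_train = np.zeros(len(X_train))
    oof_test = np.zeros((n_splits, len(X_test)))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for i, (tr, va) in enumerate(kf.split(X_train)):
        m = clone(model).fit(X_train[tr], y_train[tr])
        oof_train[va] = m.predict(X_train[va])  # predict only the held-out fold
        oof_test[i] = m.predict(X_test)         # predict the test set each fold
    return oof_train, oof_test.mean(axis=0)

oof_tr, oof_te = get_oof(
    RandomForestClassifier(n_estimators=50, random_state=0), X, y, X_test
)
```

Running `get_oof` once per base classifier yields one prediction column per model, which together form the training set for the second level.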

Show correlations of results


Quite a few articles and Kaggle competition winners' write-ups discuss how ensembles of trained models that are less correlated with one another tend to produce better scores.
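Measuring that correlation is just `DataFrame.corr()` applied to the first-level prediction columns. The predictions below are synthetic stand-ins constructed so that two models mostly agree and a third is independent:

```python
import numpy as np
import pandas as pd

# Hypothetical first-level predictions (0/1 labels) for three base models.
rng = np.random.default_rng(1)
base = rng.integers(0, 2, size=200)
preds = pd.DataFrame({
    "rf": base,
    "et": np.where(rng.random(200) < 0.9, base, 1 - base),  # agrees with rf ~90% of the time
    "svc": rng.integers(0, 2, size=200),                    # independent of rf
})
pred_corr = preds.corr()
# rf/et should show a strong correlation, rf/svc one near zero.
```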

Output of first level


Appendix: See importance of features

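All four tree-based classifiers expose a `feature_importances_` attribute after fitting; pairing it with the feature names gives a ranked view. A sketch on synthetic data, with hypothetical feature names:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
cols = ["Pclass", "Sex_code", "Age", "Fare", "FamilySize"]  # hypothetical names

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Importances are normalized to sum to 1; sort to rank the features.
importances = pd.Series(rf.feature_importances_, index=cols).sort_values(ascending=False)
```

Repeating this for the Extra Trees, AdaBoost, and Gradient Boosting models (SVC has no such attribute) gives one importance ranking per model, which the notebook plots.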

Second-Level Model


Use XGBoost

Generate submission file

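The Titanic competition expects a two-column CSV with `PassengerId` and `Survived`. A minimal sketch with pandas (the ids and predictions here are dummy values; writing to an in-memory buffer stands in for writing the real file):

```python
import io
import numpy as np
import pandas as pd

# Dummy passenger ids and final predictions for illustration.
passenger_id = np.arange(892, 897)
predictions = np.array([0, 1, 1, 0, 1])

submission = pd.DataFrame({"PassengerId": passenger_id, "Survived": predictions})
buf = io.StringIO()
# In the notebook this would be: submission.to_csv("submission.csv", index=False)
submission.to_csv(buf, index=False)
```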