Predicting Housing Prices in Iowa

My First Kaggle Competition: Final Score in the Top 30%

House Prices: Advanced Regression Techniques

Predict sales prices and practice feature engineering, RFs, and gradient boosting

Skills Shown:

Data Visualization
Machine Learning (Regression)
Feature Engineering

My Profile: https://www.kaggle.com/bpunturo

Kaggle Competition: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

Exploratory Data Analysis

The SalePrice distribution is right-skewed, so I will log-transform it before attempting my regressions.
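As a rough illustration of that transform, the sketch below measures skewness before and after applying `np.log1p`. The prices are synthetic stand-ins for the SalePrice column, and the `skewness` helper is a hypothetical name introduced here, not something taken from the notebook.

```python
import numpy as np


def skewness(values):
    """Biased sample skewness: the mean of the cubed z-scores."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)


# Assumption: synthetic right-skewed prices stand in for train["SalePrice"].
rng = np.random.default_rng(0)
sale_price = rng.lognormal(mean=12.0, sigma=0.4, size=1000)

log_price = np.log1p(sale_price)   # log(1 + x), well defined even at zero
recovered = np.expm1(log_price)    # inverse transform, back to the price scale

skew_before = skewness(sale_price)
skew_after = skewness(log_price)
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")
```

A skewness near zero after the transform suggests the target is closer to symmetric, which tends to suit linear models better. Predictions made on the log scale need `np.expm1` applied before they are reported as prices.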

Visualizing the relationships between all columns and SalePrice
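A numeric companion to per-column plots is to rank the numeric columns by their correlation with SalePrice. The sketch below does this on a small synthetic frame; the column names follow the competition's data dictionary, but the values are made up for illustration and the real notebook may have used a different approach.

```python
import numpy as np
import pandas as pd

# Assumption: a synthetic frame stands in for the competition's train.csv.
rng = np.random.default_rng(1)
n = 200
living_area = rng.normal(1500, 400, n)
train = pd.DataFrame({
    "GrLivArea": living_area,
    "OverallQual": rng.integers(1, 11, n),   # unrelated to price in this toy data
    "YrSold": rng.integers(2006, 2011, n),
    "SalePrice": 100 * living_area + rng.normal(0, 20000, n),
})

# Pearson correlation of every numeric column with the target,
# sorted by absolute strength so strong negative relationships surface too.
numeric = train.select_dtypes(include="number")
correlations = (
    numeric.corr()["SalePrice"]
    .drop("SalePrice")
    .sort_values(key=abs, ascending=False)
)
print(correlations)
```

Correlation only captures linear relationships, so it complements the plots rather than replacing them.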

Feature Engineering

This section took a lot of creativity. I do not have much domain knowledge of real estate, so I did a lot of research on which features of a house matter most.

This cell adds features for total square feet, house grade (another measure of quality), and square footage in a house.
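The original cell is not preserved here, so the following is only a sketch of how such features might be built. It assumes total square feet is the sum of the basement and both floor areas, and that house grade is the product of OverallQual and OverallCond; both definitions, along with the names `TotalSF`, `HouseGrade`, and `add_engineered_features`, are assumptions for illustration.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two illustrative engineered columns."""
    out = df.copy()
    # Assumed definition: basement area plus first- and second-floor area.
    out["TotalSF"] = out["TotalBsmtSF"] + out["1stFlrSF"] + out["2ndFlrSF"]
    # Assumed definition: overall quality scaled by overall condition.
    out["HouseGrade"] = out["OverallQual"] * out["OverallCond"]
    return out


# Two illustrative rows using column names from the competition's data dictionary.
houses = pd.DataFrame({
    "TotalBsmtSF": [856, 1262],
    "1stFlrSF": [856, 1262],
    "2ndFlrSF": [854, 0],
    "OverallQual": [7, 6],
    "OverallCond": [5, 8],
})
engineered = add_engineered_features(houses)
print(engineered[["TotalSF", "HouseGrade"]])
```

Working on a copy keeps the raw frame intact, which makes it easier to rerun the cell while experimenting with other feature definitions.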

Fitting the models

Attempting Feature Selection with Random Forests

I chose features based on their importance in a random forest model.
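A minimal sketch of importance-based selection with scikit-learn is shown below. The data is synthetic, and the cutoff (keep features whose importance exceeds the mean importance) is an assumption chosen for illustration; the notebook's actual threshold is not recorded.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumption: synthetic data with one informative column and two pure-noise columns.
rng = np.random.default_rng(2)
n = 300
X = pd.DataFrame({
    "informative": rng.normal(size=n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})
y = 3.0 * X["informative"] + rng.normal(scale=0.1, size=n)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances sum to 1 across all features.
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

# Hypothetical cutoff: keep features that are more important than average.
selected = importances[importances > importances.mean()].index.tolist()
print(importances)
print("selected:", selected)
```

Impurity-based importances can favor high-cardinality features, so it is worth checking whether a selection like this actually improves validation error before relying on it.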

I ended up not using the results from this feature selection technique.

XGBoost

Lasso, Ridge and ElasticNet Regression

Conclusion: After countless tweaks, I found that Lasso regression gives me the best results. Although I would have liked a better score, this is not bad for my first Kaggle competition.
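One way to make a comparison like this is to score each regularized model with the same cross-validation split. The sketch below does so on synthetic data with a sparse true signal; the alpha values and the data are assumptions for illustration, not the settings tuned in the notebook, and the ranking on real data may differ.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data where only 3 of 20 features carry signal.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 20))
true_coef = np.zeros(20)
true_coef[:3] = [1.5, -2.0, 1.0]
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Hypothetical hyperparameters; in practice these would be tuned.
models = {
    "lasso": Lasso(alpha=0.05),
    "ridge": Ridge(alpha=1.0),
    "elasticnet": ElasticNet(alpha=0.05, l1_ratio=0.5),
}

cv_rmse = {}
for name, model in models.items():
    # Scale inside the pipeline so each fold is standardized on its own training part.
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(
        pipeline, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = float(-scores.mean())

best_model = min(cv_rmse, key=cv_rmse.get)
print(cv_rmse, "best:", best_model)
```

Lasso's L1 penalty can set coefficients exactly to zero, which may help when many engineered or one-hot columns carry little signal.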

Future Work: Regression is an expansive field, and there are many ways to approach a regression problem. In future projects, I could choose different machine learning algorithms for the regression. Additionally, I could experiment with different feature selection techniques.

Generating my prediction
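Because the model is trained on log-transformed prices, its outputs have to be mapped back to the price scale before submission. The sketch below assumes the target was transformed with `np.log1p` and uses hypothetical Id values and predictions; the two-column `Id`, `SalePrice` layout follows the competition's sample submission.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the test set Ids and model outputs.
test_ids = [1461, 1462, 1463]
log_predictions = np.array([11.7, 12.0, 12.2])   # assumed to be on the log1p scale

submission = pd.DataFrame({
    "Id": test_ids,
    "SalePrice": np.expm1(log_predictions),      # undo log1p to get prices
})

# Rendered to a string here; pass a file path to to_csv to write the file instead.
csv_text = submission.to_csv(index=False)
print(csv_text)
```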
