This notebook uses the automobile data set, which contains missing values. The goal is to clean the data and use machine learning algorithms to predict the price.

Data Exploration

We read a CSV file and append each row of data to a list.
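
A minimal sketch of this step, assuming the standard library `csv` module. The in-memory sample, the name `alldata`, and the use of `?` as the missing-value marker are assumptions for illustration, not taken from the original notebook.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real CSV file.
# "?" as the missing-value marker is an assumption.
sample_csv = io.StringIO(
    "3,?,alfa-romero,gas,13495\n"
    "1,?,alfa-romero,gas,?\n"
    "2,164,audi,gas,13950\n"
)

alldata = []  # one list of strings per row
for row in csv.reader(sample_csv):
    alldata.append(row)

print(len(alldata), "rows,", len(alldata[0]), "columns")
```

With a real file, the `io.StringIO` object would be replaced by a file opened with `open(..., newline="")`.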

Use NumPy

We create a NumPy array named alldatanp. The code then prints the number of rows with missing values in the whole data set.
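
One way this could look, as a sketch only. The sample rows and the `?` missing-value marker are assumptions; only the name `alldatanp` comes from the text.

```python
import numpy as np

# Hypothetical sample rows; "?" marks a missing value (an assumption).
alldata = [
    ["3", "?", "gas", "13495"],
    ["1", "?", "gas", "?"],
    ["2", "164", "gas", "13950"],
    ["0", "158", "diesel", "17710"],
]
alldatanp = np.array(alldata)

# Boolean mask of missing cells, then count rows with at least one of them.
missing_mask = alldatanp == "?"
rows_with_missing = int(missing_mask.any(axis=1).sum())
print(rows_with_missing)
```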

Missing values

Here we count the total number of missing cells in the data set.
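
A short sketch of the cell count, again assuming `?` marks a missing value and using a hypothetical sample array in place of the real data.

```python
import numpy as np

# Hypothetical sample; "?" marks a missing value (an assumption).
alldatanp = np.array([
    ["3", "?", "gas", "13495"],
    ["1", "?", "gas", "?"],
    ["2", "164", "gas", "13950"],
])

# Every True in the comparison is one missing cell.
total_missing = int(np.count_nonzero(alldatanp == "?"))
print(total_missing)
```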

Row numbers with missing values

We print each row number together with the count of values missing in that row. To do this we build a dictionary in which the key is the row number and the value is the number of missing values in that row.
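
The dictionary described above might be built as follows. This is a sketch: the sample data, the `?` marker, and the name `missing_per_row` are assumptions.

```python
import numpy as np

# Hypothetical sample; "?" marks a missing value (an assumption).
alldatanp = np.array([
    ["3", "?", "gas", "13495"],
    ["1", "?", "gas", "?"],
    ["2", "164", "gas", "13950"],
])

missing_per_row = {}  # key: row number, value: missing cells in that row
for row_number, row in enumerate(alldatanp):
    count = int(np.count_nonzero(row == "?"))
    if count > 0:
        missing_per_row[row_number] = count

print(missing_per_row)
```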

Column numbers with missing values

We print each column number together with the count of missing values in it. For example, column number 1 has 41 missing values.
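
The per-column count can be sketched the same way by summing the missing-value mask along axis 0. The sample data, the `?` marker, and the name `missing_per_column` are assumptions.

```python
import numpy as np

# Hypothetical sample; "?" marks a missing value (an assumption).
alldatanp = np.array([
    ["3", "?", "gas", "13495"],
    ["1", "?", "gas", "?"],
    ["2", "164", "gas", "13950"],
])

# Summing the boolean mask down each column gives missing cells per column.
counts = (alldatanp == "?").sum(axis=0)
missing_per_column = {col: int(n) for col, n in enumerate(counts) if n > 0}
print(missing_per_column)
```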

#Printing-the-column-number-with-the-count-of-missing-values-in-it.-for-example-column-number-1-has-41-missing-values.

Printing the top 3 rows and all columns

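With a 2-D NumPy array, the first three rows and all columns can be selected with a slice. The sample array below is hypothetical.

```python
import numpy as np

# Hypothetical sample array standing in for the real data.
alldatanp = np.array([
    ["3", "?", "gas", "13495"],
    ["1", "?", "gas", "?"],
    ["2", "164", "gas", "13950"],
    ["0", "158", "diesel", "17710"],
])

top3 = alldatanp[:3, :]  # rows 0..2, every column
print(top3)
```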

There are 12 rows with missing values out of 205 rows in total, which is about 6% of the data, so we do not append these 12 rows to our final data set.
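
A sketch of the filtering step: only rows without a missing value are appended to the final list. The sample rows, the `?` marker, and the names `clean_rows` and `cleandata` are assumptions.

```python
import numpy as np

# Hypothetical sample; "?" marks a missing value (an assumption).
alldata = [
    ["3", "?", "gas", "13495"],
    ["1", "?", "gas", "?"],
    ["2", "164", "gas", "13950"],
    ["0", "158", "diesel", "17710"],
]

clean_rows = []
for row in alldata:
    if "?" not in row:  # keep only complete rows
        clean_rows.append(row)

cleandata = np.array(clean_rows)
print(cleandata.shape)
print(bool((cleandata == "?").any()))  # False when nothing is missing
```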

No more missing values

Replacing non-numerical variables (attributes)

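The notebook's exact replacement scheme is not shown here. One common approach, assumed for this sketch, is to map each distinct text label in a column to an integer code. The sample data and the helper `is_number` are hypothetical.

```python
import numpy as np

# Hypothetical sample with one non-numerical column (fuel type).
data = np.array([
    ["3", "gas", "13495"],
    ["2", "gas", "13950"],
    ["0", "diesel", "17710"],
])

def is_number(text):
    """Return True if the string can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

encoded = data.copy()
for col in range(data.shape[1]):
    column = data[:, col]
    if not all(is_number(v) for v in column):
        # np.unique sorts the distinct labels; return_inverse gives, for
        # each cell, the index of its label in that sorted list.
        labels, codes = np.unique(column, return_inverse=True)
        encoded[:, col] = codes.astype(str)

print(encoded)
```

Integer codes impose an arbitrary ordering on the labels, so one-hot encoding may be preferable for some models.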

All values are numerical now

Now we convert all the data to float.
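
Once every cell holds a numeric string, the conversion can be done in one call with `astype`. The sample array and the name `floatdata` are assumptions.

```python
import numpy as np

# Hypothetical array of numeric strings, as left by the previous step.
encoded = np.array([
    ["3", "1", "13495"],
    ["2", "1", "13950"],
    ["0", "0", "17710"],
])

floatdata = encoded.astype(float)  # raises ValueError on any non-numeric cell
print(floatdata.dtype)
```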

Split data for the machine learning algorithm

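A sketch of a random train/test split using NumPy only. The 80/20 ratio, the random sample data, and the choice of the last column as the price target are all assumptions; the notebook may have used a library helper instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numeric data; the last column is assumed to be the price.
data = rng.random((10, 4))
X = data[:, :-1]  # features
y = data[:, -1]   # target

# Shuffle the row indices, then hold out 20% of them for testing.
indices = rng.permutation(len(data))
n_test = int(0.2 * len(data))
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(X_train.shape, X_test.shape)
```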

Import Libraries

Finally, we achieve 87 percent accuracy.
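
The model and the metric behind this figure are not named here. For a price prediction task, a common score is the coefficient of determination (R²). The sketch below assumes ordinary least squares on synthetic data purely to illustrate how such a score is computed; it does not reproduce the 87 percent result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: a linear target with a little noise.
X = rng.random((50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + 0.01 * rng.standard_normal(50)

X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]

def add_intercept(features):
    """Prepend a column of ones so the fit includes an intercept term."""
    return np.column_stack([np.ones(len(features)), features])

# Ordinary least squares fit on the training rows.
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)
y_pred = add_intercept(X_test) @ coef

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(round(float(r2), 3))
```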