We need to measure an error


So we will be able to compare models by how well they fit the data.

• Initialize the parameter(s) with some random state (p = 42)
• Calculate the error/cost of the function with these parameters
• Change the parameters so that the error/cost decreases
• Repeat until the error/cost stops decreasing
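The steps above can be sketched in a few lines of NumPy. The model f(x) = w*x, the MSE cost, and the learning rate are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                 # toy data generated with the true w = 2

w = rng.normal()            # 1. initialize the parameter with a random state
lr = 0.01                   # learning rate (illustrative choice)
prev_cost = np.inf
for _ in range(1000):
    cost = np.mean((y - w * x) ** 2)        # 2. calculate the error/cost
    if prev_cost - cost < 1e-12:            # 4. stop when it stops decreasing
        break
    prev_cost = cost
    grad = np.mean(-2 * x * (y - w * x))    # gradient of the MSE w.r.t. w
    w -= lr * grad                          # 3. change w so the cost decreases
```

After convergence w is close to the true value 2.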

Minimizing an error function

Using regression for classification

• Use the same optimization approach
• Use a different error measure
• Map the regression output into a probability in [0, 1]
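The standard mapping is the sigmoid (logistic) function, which squashes any real-valued regression output into (0, 1):

```python
import numpy as np

def sigmoid(z):
    # maps any real number to (0, 1); sigmoid(0) = 0.5
    return 1.0 / (1.0 + np.exp(-z))
```

Large positive outputs map close to 1, large negative outputs close to 0, and the decision boundary sits at probability 0.5.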

What if the data is not easily separable?


So what is regularization?

• It's a penalty on the parameters
• Add the penalty to the cost function:

    Cost = MSE(y, f(w)) + 1/C * sum(w),

where f(x) = w*x
• Bigger parameters mean a larger cost
• The optimizer will therefore keep the parameters small
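As a sketch, the penalized cost can be computed like this (using the absolute value of the weights, i.e. the L1 form introduced next, for concreteness):

```python
import numpy as np

def regularized_cost(w, x, y, C=1.0):
    """MSE of the linear model f(x) = w * x plus a 1/C-weighted penalty.

    The penalty here is sum(|w|); a smaller C means a stronger penalty.
    """
    mse = np.mean((y - w * x) ** 2)
    return mse + (1.0 / C) * np.sum(np.abs(w))
```

With a perfect fit (zero MSE), the cost is just the penalty term, so the optimizer still has an incentive to shrink w.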

Two main types of regularization

• L1, or sum(|w|), will push parameters exactly to zero
• L2, or sum(w^2), will limit parameter growth
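A one-parameter toy problem (an assumption for illustration) makes the difference concrete: minimizing (w - 2)^2 plus each penalty has a closed-form solution.

```python
import numpy as np

lam = 1.0  # penalty strength (illustrative)

# L2: minimize (w - 2)^2 + lam * w^2  ->  w = 2 / (1 + lam)
# The weight shrinks but never reaches exactly zero.
w_l2 = 2 / (1 + lam)

# L1: minimize (w - 2)^2 + lam * |w|  ->  soft-thresholding
w_l1 = np.sign(2) * max(0.0, abs(2) - lam / 2)

# A strong enough L1 penalty (lam >= 4 here) sets the weight to exactly 0,
# which is why L1 performs feature selection:
w_l1_strong = np.sign(2) * max(0.0, abs(2) - 4.0 / 2)
```

So L2 only shrinks the weight, while L1 can zero it out entirely.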

Coming back to the dataset

• Let's use L1 regularization
• This will eliminate useless polynomial features
• Linear model
• Transforms the data into higher dimensions
• More efficient than a manual transform
• Optimizes the margin between classes
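A sketch with scikit-learn (the library choice and the synthetic dataset are assumptions): an L1-penalized logistic regression on polynomial features drives the weights of unhelpful features to exactly zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
# circular boundary: only the squared terms actually matter
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

# expand to degree-3 polynomial features, then fit with an L1 penalty
X_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X_poly, y)

# several coefficients (linear and cubic terms) come out exactly 0.0
print(clf.coef_)
```

Inspecting `clf.coef_` shows which of the nine polynomial features survived the penalty.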

Let's try a real problem

Area under the receiver operating characteristic curve (ROC curve)

Let's use logistic regression

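Putting the two slides together, a minimal end-to-end sketch (assuming scikit-learn and a synthetic dataset): fit a logistic regression and score it by the area under the ROC curve, which uses the predicted probabilities rather than hard labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(int)  # linearly separable toy labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]   # probability of the positive class
auc = roc_auc_score(y_te, proba)        # 1.0 = perfect ranking, 0.5 = random
print(auc)
```

AUC measures how well the model ranks positives above negatives, so it is insensitive to the 0.5 threshold and to class imbalance in a way plain accuracy is not.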