This lets us compare models by how well they fit the data.

• Initialize the parameter(s) with some arbitrary starting value (e.g. p = 42)
• Calculate the error/cost of the function with this parameter
• Change the parameter so that the error/cost decreases
• Repeat until the error/cost stops decreasing
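The four steps above can be sketched as plain gradient descent. This is a minimal illustration, not the method the slides necessarily used: the one-parameter model `f(x) = p * x`, the mean squared error cost, the `learning_rate`, and the `tolerance` are all assumptions chosen for the example.

```python
def cost(p, xs, ys):
    """Mean squared error of the one-parameter model f(x) = p * x."""
    return sum((p * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(p, xs, ys):
    """Derivative of the cost with respect to p."""
    return sum(2 * (p * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, p=42.0, learning_rate=0.01, tolerance=1e-12, max_steps=10_000):
    """Step against the gradient until the cost stops decreasing."""
    previous = cost(p, xs, ys)                 # step 2: cost at the starting value
    for _ in range(max_steps):
        candidate = p - learning_rate * gradient(p, xs, ys)  # step 3: change p
        current = cost(candidate, xs, ys)
        if previous - current < tolerance:     # step 4: no meaningful improvement
            break
        p, previous = candidate, current
    return p


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # toy data generated by y = 2 * x
p_fit = fit(xs, ys)         # starts at p = 42 and moves toward 2
```

The starting value barely matters here because this cost has a single minimum; for costs with several minima, different starting values can lead to different answers.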
• Uses the same optimization approach
• Uses a different error measure
• Maps the output of the regression to a probability in the range [0, 1]
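A small sketch of the mapping into a probability. It assumes the mapping is the logistic (sigmoid) function and that the "different error measure" is log loss (cross-entropy), which is the usual pairing for logistic regression; the slides do not name either explicitly. The parameters `w` and `b` are hypothetical values picked for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, p_pred):
    """Average cross-entropy for binary labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


w, b = 1.5, -3.0                      # hypothetical parameters of the linear part
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1, 1]

# The linear output w * x + b can be any real number; sigmoid turns it into a probability.
probabilities = [sigmoid(w * x + b) for x in xs]
loss = log_loss(labels, probabilities)
```

The same optimization loop as before can then minimize `loss` instead of the mean squared error.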
• Regularization is a penalty on the model parameters
• Adding penalty to the cost function:
    Cost = MSE(y, f(x)) + 1/C * penalty(w),
where f(x) = w*x and penalty(w) is sum(|w|) or sum(w^2)
• Larger parameters give a larger cost
• The optimizer will therefore keep the parameters small
• L1, or sum(|w|): pushes parameters to zero
• L2, or sum(w^2): limits parameter growth
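The penalized cost can be written out directly. This is an illustrative sketch: the function names, the toy data, and the choice of `C` values are assumptions, and `C` is taken to be the inverse regularization strength as in the formula above (smaller `C` means a stronger penalty).

```python
def mse(ys, predictions):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


def l1_penalty(weights):
    """L1 penalty: sum of absolute values."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """L2 penalty: sum of squares."""
    return sum(w ** 2 for w in weights)


def regularized_cost(weights, rows, ys, C=1.0, penalty=l2_penalty):
    """MSE of the linear model f(x) = w . x plus the penalty scaled by 1/C."""
    predictions = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    return mse(ys, predictions) + penalty(weights) / C


rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, 2.0, 3.0]
weights = [1.0, 2.0]          # fits the toy data exactly, so the MSE term is 0

cost_l1 = regularized_cost(weights, rows, ys, C=1.0, penalty=l1_penalty)
cost_l2 = regularized_cost(weights, rows, ys, C=1.0, penalty=l2_penalty)
cost_l2_weak = regularized_cost(weights, rows, ys, C=10.0, penalty=l2_penalty)
```

Even with a perfect fit the cost is nonzero, because the penalty charges for the size of the weights; raising `C` shrinks that charge.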
• Let's use L1 regularization
• This will eliminate polynomial features that are not useful
• Linear model
• Transforms the data into higher dimensions
• More efficient than a manual transformation
• Optimizes the margin between classes
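Why the implicit transform can be cheaper than a manual one is easiest to see with a kernel. The sketch below assumes these bullets describe a kernel support vector machine, which the slides do not state outright, and uses a degree-2 polynomial kernel purely as an example: the kernel value computed in the original 2-D space equals the dot product of the manually expanded 3-D features.

```python
import math


def explicit_features(v):
    """Manual degree-2 transform of a 2-D point into 3-D."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Ordinary dot product."""
    return sum(p * q for p, q in zip(a, b))


def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2, computed in the original space."""
    return dot(a, b) ** 2


a = (1.0, 2.0)
b = (3.0, -1.0)

manual = dot(explicit_features(a), explicit_features(b))  # transform first, then compare
via_kernel = poly_kernel(a, b)                            # never builds the 3-D vectors
```

Both values agree up to floating-point rounding. For higher degrees or more input features the explicit feature vector grows quickly, while the kernel still needs only one dot product in the original space.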
## Area under receiver operating characteristic curve (ROC curve)

• Know your data
• Visualize everything
• Start with simple models
• Choose the right metrics for evaluation
• Simple heuristics are usually a good start