The goal of this exercise is to train a linear classifier on text features that represent sequences of up to 3 consecutive characters, so as to recognize natural languages by using the frequencies of short character sequences as 'fingerprints'.
Author: Olivier Grisel firstname.lastname@example.org
License: Simplified BSD
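To make the 'fingerprint' idea concrete, here is a minimal standard-library sketch of the feature extraction step only: it counts every run of 1 to 3 consecutive characters in a string. The helper names `char_ngram_counts` and `char_ngram_frequencies` are hypothetical and chosen for illustration; they are not part of the exercise, and the exercise itself would normally hand this job to a library vectorizer before training the linear classifier.

```python
from collections import Counter


def char_ngram_counts(text, max_n=3):
    """Count every run of 1 to max_n consecutive characters in text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n over the text.
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return counts


def char_ngram_frequencies(text, max_n=3):
    """Turn raw counts into relative frequencies.

    Normalizing makes texts of different lengths comparable.
    """
    counts = char_ngram_counts(text, max_n)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Print the most frequent character sequences of a short sample.
    print(char_ngram_counts("the cat sat on the mat").most_common(5))
```

Each distinct character sequence becomes one feature, and its frequency is the feature value. Languages differ in which short sequences are common, which is what lets a linear model separate them.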
precision: Ability of the classifier not to label as positive a sample that is negative. The higher the number, the more sure we are that the positive labels are actually positive.
precision = true_positives / (true_positives + false_positives)
recall: Ability of the classifier to find all the positive samples. The higher the number, the more sure we are that we are not missing any positive labels.
recall = true_positives / (true_positives + false_negatives)
f1-score: Harmonic mean of precision and recall. It is high only when both precision and recall are high.
f1 = 2 * (precision * recall) / (precision + recall)
Equivalently, in terms of counts: f1 = 2.0 * true_positives / (2 * true_positives + false_positives + false_negatives)
support: The number of occurrences of each class in the true labels.
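The metric definitions above can be checked by hand with a small sketch. The function name `precision_recall_f1_support` and the sample labels are made up for illustration, and returning 0.0 when a denominator is zero is an assumption of this sketch, not something the definitions specify.

```python
def precision_recall_f1_support(y_true, y_pred, positive):
    """Compute the four report columns for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Assumption: report 0.0 instead of raising when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2.0 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    # Support counts how often the class occurs among the true labels.
    support = sum(1 for t in y_true if t == positive)
    return precision, recall, f1, support


if __name__ == "__main__":
    # Made-up labels for three languages.
    y_true = ["en", "en", "en", "fr", "fr", "de"]
    y_pred = ["en", "en", "fr", "fr", "en", "de"]
    for language in ("en", "fr", "de"):
        print(language, precision_recall_f1_support(y_true, y_pred, language))
```

For "en" in this sample there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and f1 all equal 2/3 and the support is 3.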