The goal of this exercise is to train a linear classifier on text features that represent sequences of up to 3 consecutive characters, so that it can recognize natural languages by using the frequencies of short character sequences as 'fingerprints'.

Author: Olivier Grisel <olivier.grisel@ensta.org>

License: Simplified BSD
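To make the 'fingerprint' idea concrete, here is a minimal pure-Python sketch that counts every run of 1 to 3 consecutive characters in a string and converts the counts into relative frequencies. The helper names `char_ngram_fingerprint` and `relative_frequencies` are hypothetical, chosen for illustration. In scikit-learn, `CountVectorizer` or `TfidfVectorizer` with `analyzer='char'` and `ngram_range=(1, 3)` extract comparable features that a linear model can consume; whether the exercise uses exactly those settings is an assumption.

```python
from collections import Counter


def char_ngram_fingerprint(text, max_n=3):
    """Count every substring of 1 to max_n consecutive characters (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n over the text.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def relative_frequencies(counts):
    """Turn raw n-gram counts into frequencies that sum to 1 (hypothetical helper)."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


fingerprint = char_ngram_fingerprint("the theme")
print(fingerprint["th"], fingerprint["the"], fingerprint["e"])  # 2 2 3
frequencies = relative_frequencies(fingerprint)
```

Texts in the same language tend to share frequent n-grams (for English, "th" and "the"), which is what lets a linear model separate languages using these frequencies.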


http://scikit-learn.org/stable/modules/model_evaluation.html#precision-recall-and-f-measures

https://en.wikipedia.org/wiki/Precision_and_recall#Precision

**precision**: The ability of the classifier not to label a negative sample as positive. The higher the precision, the more confident we can be that samples labeled positive really are positive.


`precision = true_positives / (true_positives + false_positives)`
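A minimal sketch of the precision formula for one class of a multiclass problem, treating that class as "positive" and every other class as "negative" (one-vs-rest). The function name `precision` and the toy language labels are hypothetical. Returning `0.0` when the class is never predicted is an assumed convention; scikit-learn's `precision_score` makes this choice configurable through its `zero_division` parameter.

```python
def precision(y_true, y_pred, positive):
    """Fraction of predictions of `positive` that are correct (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    true_positives = sum(1 for t, p in pairs if p == positive and t == positive)
    false_positives = sum(1 for t, p in pairs if p == positive and t != positive)
    predicted = true_positives + false_positives
    # Assumed convention: 0.0 when the class was never predicted.
    return true_positives / predicted if predicted else 0.0


y_true = ["en", "fr", "en", "de", "en", "fr"]
y_pred = ["en", "en", "en", "de", "fr", "fr"]
print(precision(y_true, y_pred, "en"))  # 2 true positives, 1 false positive -> 2/3
```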

**recall**: The ability of the classifier to find all the positive samples. The higher the recall, the more confident we can be that we are not missing any positive samples.


`recall = true_positives / (true_positives + false_negatives)`
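A matching one-vs-rest sketch for recall. The function name `recall` and the toy labels are hypothetical, and returning `0.0` when the class never occurs in the true labels is an assumed convention.

```python
def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were found (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    true_positives = sum(1 for t, p in pairs if t == positive and p == positive)
    false_negatives = sum(1 for t, p in pairs if t == positive and p != positive)
    actual = true_positives + false_negatives
    # Assumed convention: 0.0 when the class never occurs in y_true.
    return true_positives / actual if actual else 0.0


y_true = ["en", "en", "en", "en", "fr"]
y_pred = ["en", "en", "fr", "fr", "fr"]
print(recall(y_true, y_pred, "en"))  # 2 found out of 4 -> 0.5
```

In this toy example the classifier never mislabels another language as `"en"`, so its precision for `"en"` would be 1.0, yet it misses half of the English samples; recall is the metric that exposes this gap.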

**f1-score**: The harmonic mean of precision and recall, combining both into a single score.


`f1 = 2.0 * true_positives / (2 * true_positives + false_positives + false_negatives)`
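Writing F1 directly in terms of counts is convenient because it needs only one guard against division by zero, instead of one each for precision and recall. A minimal sketch, with the hypothetical helper `f1_from_counts` and the assumed convention of returning `0.0` when all three counts are zero:

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    denominator = 2 * tp + fp + fn
    # Assumed convention: 0.0 when there is nothing to score.
    return 2.0 * tp / denominator if denominator else 0.0


# A class predicted correctly twice, never wrongly, but missed twice:
# precision = 1.0, recall = 0.5, F1 = 4 / 6.
print(f1_from_counts(tp=2, fp=0, fn=2))
```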

**support**: The number of occurrences of each class in the true labels (`y_true`).
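Support depends only on the true labels, never on the predictions. A small sketch using `collections.Counter` on hypothetical toy labels:

```python
from collections import Counter

y_true = ["en", "fr", "en", "de", "en", "fr"]
support = Counter(y_true)  # class label -> number of true samples
print(support["en"], support["fr"], support["de"])  # 3 2 1
```

`sklearn.metrics.classification_report` shows this count in its `support` column next to each class's precision, recall and f1-score.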


Equivalently, the F1 score can be written in terms of precision and recall:

`F1 = 2 * (precision * recall) / (precision + recall)`
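The two F1 formulas in this section agree. Substituting the definitions of precision P and recall R in terms of true positives TP, false positives FP and false negatives FN gives the derivation below, which holds whenever TP > 0 (so that both P and R are nonzero):

```latex
\begin{aligned}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \\
2PR &= \frac{2\,TP^2}{(TP + FP)(TP + FN)} \\
P + R &= \frac{TP\,\big[(TP + FN) + (TP + FP)\big]}{(TP + FP)(TP + FN)}
       = \frac{TP\,(2TP + FP + FN)}{(TP + FP)(TP + FN)} \\
F_1 = \frac{2PR}{P + R} &= \frac{2\,TP^2}{TP\,(2TP + FP + FN)} = \frac{2\,TP}{2TP + FP + FN}
\end{aligned}
```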

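**confusion matrix** (AKA error matrix): A table that counts, for every pair of true and predicted classes, how many samples fall into that pair; correct predictions lie on its diagonal.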
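A minimal sketch of a confusion matrix built with plain lists. Following the convention of scikit-learn's `sklearn.metrics.confusion_matrix`, rows index the true label and columns the predicted label, in the order given by `labels`. The function name `build_confusion_matrix` and the toy data are hypothetical.

```python
def build_confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


labels = ["de", "en", "fr"]
y_true = ["en", "fr", "en", "de", "en", "fr"]
y_pred = ["en", "en", "en", "de", "fr", "fr"]
cm = build_confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
# de [1, 0, 0]
# en [0, 2, 1]
# fr [0, 1, 1]
```

Each row sums to that class's support, each column sums to the number of times the class was predicted, and the diagonal holds the true positives. A class's precision is therefore its diagonal entry divided by its column sum, and its recall is the diagonal entry divided by its row sum.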
