Here I run some first experiments with building a classifier that predicts whether a predicted price can be trusted. I have downloaded
estimates_for_sold from prod and combined it with the
market transactions database that I have locally.
Note that I have also taken some code from the change-io-method to ease the pre-processing steps, and that I have made this notebook stand-alone by adding code for
Note: The above processing only includes one estimate for each dwelling.
Here I have already defined the types for each column in
Now, get the median ape and the count of dwellings that have been sold nearby. Again, this is slightly cheating, because information derived from the ape is also being stored in the test data.
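One way to avoid that leakage is to compute the neighbourhood statistics from the training rows only and then join them onto both sets. The sketch below is not the processing used in this notebook. It assumes that "nearby" can be approximated by a shared area key, and the names `area_id`, `median_ape_nearby` and `n_sold_nearby` are hypothetical, as is the toy data.

```python
import pandas as pd

# Toy data; `area_id` is a hypothetical stand-in for "nearby",
# `ape` is assumed to hold the absolute percentage error per sale.
train = pd.DataFrame({
    "area_id": ["a", "a", "a", "b", "b"],
    "ape": [0.05, 0.10, 0.30, 0.02, 0.04],
})
test = pd.DataFrame({"area_id": ["a", "b", "c"], "ape": [0.20, 0.03, 0.15]})

# Aggregate on training rows only, so no test-set error enters the features.
area_stats = (
    train.groupby("area_id")["ape"]
    .agg(median_ape_nearby="median", n_sold_nearby="count")
    .reset_index()
)

# Join the same training-derived statistics onto both sets.
train_feat = train.merge(area_stats, on="area_id", how="left")
test_feat = test.merge(area_stats, on="area_id", how="left")

# Areas unseen in training get a count of 0 and keep a missing median.
test_feat["n_sold_nearby"] = test_feat["n_sold_nearby"].fillna(0).astype(int)

print(test_feat)
```

Note that each training row still contributes its own ape to its area's median, so a leave-one-out aggregate would be needed to remove that remaining self-reference.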
Check that each feature is distributed equally in both the training and test datasets.
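A simple way to do this check numerically is to put the mean and a few quantiles of every feature side by side for the two sets. This is only a sketch: the helper `compare_distributions`, the feature names and the random data are made up for illustration, and it assumes all features are numeric.

```python
import numpy as np
import pandas as pd

def compare_distributions(train, test, quantiles=(0.25, 0.5, 0.75)):
    """Per-feature mean and quantiles for train and test, side by side."""
    rows = {}
    for col in train.columns:
        row = {"train_mean": train[col].mean(), "test_mean": test[col].mean()}
        for q in quantiles:
            row[f"train_q{int(q * 100)}"] = train[col].quantile(q)
            row[f"test_q{int(q * 100)}"] = test[col].quantile(q)
        rows[col] = row
    return pd.DataFrame.from_dict(rows, orient="index")

# Made-up numeric features, split 80/20 into train and test.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "area_m2": rng.normal(80, 20, 1000),
    "build_year": rng.integers(1900, 2020, 1000).astype(float),
})
train, test = data.iloc[:800], data.iloc[800:]

summary = compare_distributions(train, test)
print(summary.round(1))
```

Large gaps between the train and test columns of a row would suggest that the split is not representative for that feature.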
First, let's check the accuracy when a sale still counts as correct if it was misplaced by one or by two categories.
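This tolerance-based accuracy can be sketched as follows, assuming the categories are ordinal integers so that the distance between two labels is meaningful. The function name `accuracy_within` and the labels are made up for illustration.

```python
import numpy as np

def accuracy_within(y_true, y_pred, tolerance):
    """Share of predictions at most `tolerance` categories away from the truth."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred) <= tolerance))

# Made-up ordinal labels, for illustration only.
y_true = np.array([1, 2, 3, 7, 5, 4])
y_pred = np.array([1, 3, 5, 7, 2, 4])

for tol in (0, 1, 2):
    print(f"tolerance {tol}: {accuracy_within(y_true, y_pred, tol):.3f}")
```

With `tolerance=0` this reduces to the ordinary accuracy, so the three numbers can be compared directly.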
There is no getting around that this is a really good result, and more than good enough for the intended usage. Let's have a closer look at how the classifications are spread.
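One way to inspect the spread is a confusion matrix, whose row and column sums give the true and predicted number of estimates per category. The sketch below assumes integer labels running from 1 to 7. The upper bound of 7 and the toy labels are assumptions for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_categories):
    """Rows are true categories, columns are predicted ones (labels 1..n_categories)."""
    matrix = np.zeros((n_categories, n_categories), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t - 1, p - 1] += 1
    return matrix

# Made-up labels, for illustration only.
y_true = [1, 1, 2, 3, 7, 7]
y_pred = [1, 1, 1, 2, 7, 7]

cm = confusion_matrix(y_true, y_pred, n_categories=7)  # 7 categories is an assumption
true_counts = cm.sum(axis=1)   # how many sales truly belong to each category
pred_counts = cm.sum(axis=0)   # how many sales were assigned to each category

print(cm)
print("true:     ", true_counts)
print("predicted:", pred_counts)
```

A category is overestimated when its predicted count exceeds its true count, and underestimated in the opposite case.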
From the above plot we can observe that the number of estimates in category 1 is overestimated, while the opposite is true for categories 2 and 3. Of note is also that dwellings with >25% error (category 7) are very well predicted. Now let's have a look at the original distribution of the data.