Both of these functions import datacube data using a query, and return an xarray dataset with multiple bands/variables and 'geo_transform' and 'proj' attributes. This format is required as an input to both randomforest_train and randomforest_classify, and ensures that both training and analysis data are consistent.
Uses randomforest_train to extract training data from one or more training shapefiles, and returns a trained classifier (and, optionally, the training label and training sample arrays).
Classifies and exports an analysis dataset using a previously trained random forest classifier, provided this dataset has the same number of bands/variables as the data used to train the classifier. Using the same data function (e.g. tc_import, hltc_import) that was used to train the classifier will ensure this is the case. Setting 'class_prob = True' additionally exports a GeoTIFF of predicted class probabilities alongside the classification output.
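As a rough illustration of the classification step described above, the sketch below applies a scikit-learn random forest to a small synthetic (bands, rows, columns) array and produces both a class map and per-class probability layers. It does not reproduce randomforest_classify itself: the array shapes, the synthetic data and every variable name here are assumptions, and the GeoTIFF export is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_bands, n_rows, n_cols = 4, 10, 12  # hypothetical dataset dimensions

# Synthetic stand-in for training samples (pixels x bands) and class labels
train_samples = rng.normal(size=(150, n_bands))
train_labels = rng.integers(0, 3, size=150)
classifier = RandomForestClassifier(n_estimators=50, random_state=0)
classifier.fit(train_samples, train_labels)

# Synthetic analysis data with the same number of bands as the training data
analysis_data = rng.normal(size=(n_bands, n_rows, n_cols))

# Flatten to (n_pixels, n_bands), the layout scikit-learn expects
pixels = analysis_data.reshape(n_bands, -1).T

# Predicted class per pixel, reshaped back to the raster grid
class_map = classifier.predict(pixels).reshape(n_rows, n_cols)

# Predicted probability of each class per pixel: (n_classes, n_rows, n_cols)
class_prob = classifier.predict_proba(pixels).T.reshape(-1, n_rows, n_cols)

print(class_map.shape, class_prob.shape)
```

If the analysis data had a different number of bands from the training data, `predict` would raise an error, which is why the same import function should be used for both.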
Extracts the classifier's estimates of the relative importance of each band/variable for training the classifier. Useful for selecting a subset of input bands/variables for model training/classification (i.e. optimising the feature space).
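A minimal sketch of how band importances can be read from a fitted scikit-learn random forest, assuming the trained classifier is a `RandomForestClassifier`. The band names and the synthetic training data are hypothetical and stand in for the arrays returned by the training step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for training samples: 200 pixels x 4 bands
rng = np.random.default_rng(0)
band_names = ["blue", "green", "red", "nir"]  # hypothetical band names
train_samples = rng.normal(size=(200, len(band_names)))

# Labels depend only on the last band, so it should rank as most important
train_labels = (train_samples[:, 3] > 0).astype(int)

classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(train_samples, train_labels)

# Impurity-based importances, one value per band, summing to 1
importances = classifier.feature_importances_

# Rank bands from most to least important
ranked = sorted(zip(band_names, importances), key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Bands with consistently low importance are candidates for removal when reducing the feature space, although impurity-based importances can be misleading when bands are strongly correlated.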
Export .png plots of each decision tree in the random forest ensemble. Useful for inspecting the splits used by the classifier to classify the data.
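One possible way to produce such plots is scikit-learn's `export_graphviz`, sketched below on synthetic data. This is an assumption about the approach rather than the notebook's actual code: the forest is kept deliberately small, the band names are hypothetical, and rendering the resulting DOT text to .png requires a separate Graphviz installation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

# Synthetic training data with four hypothetical bands
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
band_names = ["blue", "green", "red", "nir"]

# A small, shallow forest keeps the exported trees readable
forest = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0)
forest.fit(X, y)

# One Graphviz DOT description per tree in the ensemble
dot_sources = [
    export_graphviz(tree, out_file=None, feature_names=band_names, filled=True)
    for tree in forest.estimators_
]

print(dot_sources[0][:200])
```

Each DOT string can be written to a file such as `tree_0.dot` and rendered with the Graphviz command `dot -Tpng tree_0.dot -o tree_0.png`.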
Random forest classifiers contain many modifiable parameters that can strongly affect the performance of the model. This section evaluates the effect of these parameters by plotting out-of-bag (OOB) error for a set of classifier parameter scenarios, and exports the resulting plots to file.
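The comparison described above can be sketched with scikit-learn's built-in OOB scoring. The parameter scenarios, tree counts and synthetic data below are illustrative assumptions and not the scenarios used in this notebook; plotting is left out so the sketch only computes the error values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for training samples and labels
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

# Hypothetical parameter scenarios to compare
scenarios = {
    "max_features='sqrt'": {"max_features": "sqrt"},
    "max_features=None": {"max_features": None},
}
tree_counts = [25, 50, 100]

oob_errors = {}
for label, params in scenarios.items():
    errors = []
    for n_trees in tree_counts:
        clf = RandomForestClassifier(
            n_estimators=n_trees,
            oob_score=True,   # score each sample using only trees that did not see it
            bootstrap=True,   # OOB estimates require bootstrap sampling
            random_state=0,
            **params,
        )
        clf.fit(X, y)
        errors.append(1.0 - clf.oob_score_)  # OOB error = 1 - OOB accuracy
    oob_errors[label] = errors

for label, errors in oob_errors.items():
    print(label, [round(e, 3) for e in errors])
```

Each list in `oob_errors` can then be plotted against `tree_counts`, one line per scenario, and saved to file.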
Code to visualise the internal structure of the random forest ensemble using histograms of leaf depth and number of samples.
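A minimal sketch of how leaf depths and leaf sample counts could be collected from a fitted scikit-learn forest, assuming the `tree_` attributes `children_left`, `children_right` and `n_node_samples`. The helper `leaf_depths_and_samples` is hypothetical, the data are synthetic, and counting samples per leaf is one interpretation of "number of samples".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def leaf_depths_and_samples(tree):
    """Return (depths, sample_counts) for every leaf of a fitted tree_ object."""
    depths, samples = [], []
    stack = [(0, 0)]  # (node id, depth); node 0 is the root
    while stack:
        node, depth = stack.pop()
        left, right = tree.children_left[node], tree.children_right[node]
        if left == right:  # both children are -1 at a leaf
            depths.append(depth)
            samples.append(tree.n_node_samples[node])
        else:
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
    return np.array(depths), np.array(samples)

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Pool leaf statistics across every tree in the ensemble
per_tree = [leaf_depths_and_samples(est.tree_) for est in forest.estimators_]
all_depths = np.concatenate([d for d, _ in per_tree])
all_samples = np.concatenate([s for _, s in per_tree])

# Number of leaves found at each depth (index = depth)
depth_counts = np.bincount(all_depths)
print(depth_counts)
```

The pooled `all_depths` and `all_samples` arrays can be passed directly to a histogram function such as matplotlib's `plt.hist`.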
Not currently working; will need a method for incorporating validation data.