Set up analysis

Defines functions used for analysis.

Functions used to import training and analysis data

Both of these functions import datacube data using a query and return an xarray dataset with multiple bands/variables and 'geo_transform' and 'proj' attributes. This format is required as input to both randomforest_train and randomforest_classify, and ensures that the training and analysis data are consistent.
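As a rough illustration of that format, the sketch below builds a small synthetic xarray dataset carrying 'geo_transform' and 'proj' attributes, then stacks its bands into the (n_pixels, n_bands) array that a scikit-learn classifier expects. The band names, geotransform values and projection string are assumptions made up for the example, not values produced by the notebook's import functions.

```python
import numpy as np
import xarray as xr

# Hypothetical 3-band, 4 x 5 pixel scene standing in for a datacube query result
rng = np.random.default_rng(0)
bands = ["red", "green", "blue"]
ds = xr.Dataset(
    {name: (("y", "x"), rng.random((4, 5))) for name in bands},
    coords={"y": np.arange(4), "x": np.arange(5)},
)

# GDAL-style geotransform and a projection string (illustrative values only)
ds.attrs["geo_transform"] = (1500000.0, 25.0, 0.0, -4000000.0, 0.0, -25.0)
ds.attrs["proj"] = "EPSG:3577"

# Stack the bands into an (n_pixels, n_bands) array for scikit-learn
pixels = ds.to_array(dim="band").values.reshape(len(bands), -1).T
print(pixels.shape)
```

Keeping the spatial attributes on the dataset means they are still available later, when a classified array has to be written back out as a georeferenced raster.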

Import training data and fit model

Uses randomforest_train to extract training data from one or more training shapefiles and returns a trained classifier (and, optionally, the training label and training sample arrays).
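The fitting step itself can be sketched with scikit-learn directly. This is not the body of randomforest_train. It only assumes that the pixels beneath the training polygons have already been extracted into a sample array of shape (n_samples, n_bands) and a matching label array. The synthetic data and the variable names train_samples and train_labels are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in for pixels extracted beneath training polygons:
# 200 samples x 6 bands, two classes with different mean values
train_samples = np.vstack([
    rng.normal(0.2, 0.05, (100, 6)),
    rng.normal(0.6, 0.05, (100, 6)),
])
train_labels = np.repeat([1, 2], 100)

# Fit the random forest; random_state is fixed only for reproducibility
classifier = RandomForestClassifier(n_estimators=50, random_state=0)
classifier.fit(train_samples, train_labels)

print(classifier.classes_)
```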

Import analysis data and classify

Classifies and exports an analysis dataset using a previously trained random forest classifier, provided this dataset has the same number of bands/variables as the data used to train the classifier. Using the same data import function (e.g. tc_import, hltc_import) that was used to train the classifier will ensure this is the case. Setting 'class_prob = True' optionally exports a geotiff of predicted class probabilities in addition to the classification output.
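A minimal sketch of the classification step, assuming the analysis data are available as a (bands, rows, cols) NumPy array. It checks the band count against the trained classifier, predicts a class for every pixel, and computes per-class probabilities with predict_proba. The geotiff export is left out, and the array names are hypothetical rather than taken from randomforest_classify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_bands = 4

# Train a small classifier on synthetic two-class data
X_train = np.vstack([
    rng.normal(0.2, 0.05, (50, n_bands)),
    rng.normal(0.7, 0.05, (50, n_bands)),
])
y_train = np.repeat([1, 2], 50)
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_train, y_train)

# Hypothetical analysis image laid out as (bands, rows, cols)
image = rng.random((n_bands, 10, 12))
bands, rows, cols = image.shape

# The analysis data must have the same number of bands as the training data
if bands != clf.n_features_in_:
    raise ValueError(f"Expected {clf.n_features_in_} bands, got {bands}")

# Flatten to (n_pixels, n_bands), predict, then restore the image shape
flat = image.reshape(bands, -1).T
class_map = clf.predict(flat).reshape(rows, cols)

# One probability layer per class, analogous to the optional probability output
class_prob = clf.predict_proba(flat).reshape(rows, cols, -1)
```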

Feature/band/variable importance

Extracts the classifier's estimates of the relative importance of each band/variable for training the classifier. Useful for selecting a subset of input bands/variables for model training/classification (i.e. optimising the feature space).
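In scikit-learn these estimates are exposed as the feature_importances_ attribute of a fitted forest. The sketch below uses synthetic data in which only one band carries any signal, then ranks the bands by importance. The band names are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
band_names = ["band_1", "band_2", "band_3", "band_4"]  # placeholder names

# Synthetic data in which only band_3 determines the class
X = rng.random((300, 4))
y = (X[:, 2] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; sort from most to least important
ranked = sorted(
    zip(band_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Bands with consistently low importance are candidates for removal, although impurity-based importances can be misleading when bands are strongly correlated.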

Export tree diagrams

Export .png plots of each decision tree in the random forest ensemble. Useful for inspecting the splits used by the classifier to classify the data.
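One way to do this, sketched here under the assumption that Graphviz is used for rendering, is to generate DOT source for each tree with sklearn.tree.export_graphviz. The example keeps the DOT text in memory; converting it to .png requires the separate Graphviz `dot` program. The feature names and forest settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

# Small synthetic problem and a deliberately tiny forest so the trees stay readable
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
clf = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0).fit(X, y)

feature_names = ["band_1", "band_2", "band_3", "band_4"]  # placeholder names

# out_file=None makes export_graphviz return the DOT source as a string
dot_sources = [
    export_graphviz(tree, out_file=None, feature_names=feature_names, filled=True)
    for tree in clf.estimators_
]

# Each string can be saved as e.g. tree_0.dot and rendered with Graphviz:
#   dot -Tpng tree_0.dot -o tree_0.png
print(dot_sources[0][:60])
```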

Plot performance of model by parameter values

Random forest classifiers contain many modifiable parameters that can strongly affect the performance of the model. This section evaluates the effect of these parameters by plotting out-of-bag (OOB) error for a set of classifier parameter scenarios, and exports the resulting plots to file.
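The sketch below shows one common way to collect OOB error for several parameter scenarios: the forest is grown incrementally with warm_start=True, and 1 - oob_score_ is recorded at each size. The two max_features scenarios and the tree counts are assumptions chosen for illustration, and plotting is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Hypothetical parameter scenarios: label -> max_features setting
scenarios = {"max_features='sqrt'": "sqrt", "max_features=None": None}
tree_counts = range(30, 151, 30)

oob_error = {}
for label, max_features in scenarios.items():
    # warm_start=True adds trees to the existing forest instead of refitting
    clf = RandomForestClassifier(
        warm_start=True, oob_score=True, max_features=max_features, random_state=0
    )
    errors = []
    for n_trees in tree_counts:
        clf.set_params(n_estimators=n_trees)
        clf.fit(X, y)
        errors.append((n_trees, 1 - clf.oob_score_))
    oob_error[label] = errors

for label, errors in oob_error.items():
    print(label, [round(err, 3) for _, err in errors])
```

Each list of (n_trees, error) pairs can be passed to a plotting library to draw one line per scenario.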

Visualise random forest structure

Code to visualise the internal structure of the ensemble forest using histograms of leaf depth and number of samples per leaf.

Source: https://github.com/aysent/random-forest-leaf-visualization
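The underlying idea can be sketched independently of the repository above by walking each fitted tree's tree_ structure and recording the depth and sample count of every leaf. This is an illustrative reimplementation, not the linked code. The helper name leaf_depths_and_samples is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def leaf_depths_and_samples(tree):
    """Return (depths, n_samples) arrays for every leaf of a fitted sklearn Tree."""
    left, right = tree.children_left, tree.children_right
    depths, samples = [], []
    stack = [(0, 0)]  # (node id, depth), starting at the root
    while stack:
        node, depth = stack.pop()
        if left[node] == right[node]:  # both are -1 at a leaf
            depths.append(depth)
            samples.append(tree.n_node_samples[node])
        else:
            stack.append((left[node], depth + 1))
            stack.append((right[node], depth + 1))
    return np.array(depths), np.array(samples)


X, y = make_classification(n_samples=300, n_features=6, random_state=0)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Pool leaf depths across all trees and count the leaves found at each depth
all_depths = np.concatenate(
    [leaf_depths_and_samples(est.tree_)[0] for est in clf.estimators_]
)
depth_hist = np.bincount(all_depths)
print(depth_hist)
```

The same helper returns the per-leaf sample counts, which can be binned in the same way to produce the second histogram.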

Classification statistics (TBA)

Not currently working; will need a method for incorporating validation data.
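One possible approach, offered only as a sketch, is to hold back a portion of the labelled samples as validation data and summarise the predictions with a confusion matrix and overall accuracy. It assumes the labelled pixels are already available as arrays, and uses synthetic data in their place. It is not the notebook's intended method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labelled pixels
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# Hold back 25% of the labelled samples as validation data
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_val)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_val, y_pred)
overall_accuracy = accuracy_score(y_val, y_pred)
print(cm)
print(f"Overall accuracy: {overall_accuracy:.3f}")
```

For image classification, validation pixels drawn from the same polygons as the training pixels tend to overstate accuracy, so spatially separate validation polygons are generally preferable.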