First things first... We load all the necessary Python libraries.
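A minimal import sketch of the libraries the following steps rely on (the exact list in the notebook may differ; GDAL is assumed to be installed for the raster I/O steps):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# GDAL handles raster reading/writing; the import is guarded so the
# scikit-learn steps still run in environments without it.
try:
    from osgeo import gdal
except ImportError:
    gdal = None  # the raster loading/saving steps require GDAL
```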
All the concatenated feature maps (x) and the labeled image (y) are loaded. We then fetch information about the overall geometry (GeoTransform and Projection) that we want to preserve for the output, and store it in variables that will be reused at the end of the notebook.
In order to process both raster arrays, we first need to reduce their dimensions: the functions used later only accept 2D or 1D arrays. For large files, we can also reduce the data size by sampling half of the pixels.
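The reshaping step can be sketched as follows, with toy arrays standing in for the loaded rasters (the (bands, rows, cols) layout is an assumption based on GDAL's `ReadAsArray` convention):

```python
import numpy as np

# Toy stand-ins for the loaded rasters:
bands, rows, cols = 5, 4, 6
x = np.arange(bands * rows * cols, dtype=float).reshape(bands, rows, cols)
y = np.zeros((rows, cols), dtype=int)

# Flatten to the 2D/1D shapes scikit-learn expects:
x_flat = x.reshape(bands, rows * cols).T  # (n_pixels, n_features)
y_flat = y.ravel()                        # (n_pixels,)

# Optional subsampling for large files: keep every other pixel.
x_half = x_flat[::2]
y_half = y_flat[::2]
```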
We then create training and test sets so we can assess the classifier's performance later on. Both sets are created with a stratified sampling strategy, which keeps the class proportions balanced across the split; Random Forest models generally learn better from such balanced sets. We use a 50%/50% split.
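A sketch of the stratified 50/50 split using scikit-learn's `train_test_split` (the synthetic data below is only for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x_flat = rng.random((200, 5))             # hypothetical flattened features
y_flat = np.repeat([0, 1, 2, 3], 50)      # hypothetical labels, 4 classes

# stratify=y_flat preserves the class proportions in both halves.
x_train, x_test, y_train, y_test = train_test_split(
    x_flat, y_flat, test_size=0.5, stratify=y_flat, random_state=0)
```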
The data is then scaled, which lowers computation time and improves performance for both the PCA and Random Forest steps.
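Scaling is typically done with `StandardScaler`, fitted on the training set only so the test set sees the same transform (the arrays below are synthetic stand-ins):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x_train = rng.random((100, 5)) * 1000  # hypothetical unscaled features
x_test = rng.random((50, 5)) * 1000

# Fit on the training set, then apply the same transform to both sets.
scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)
```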
Optional: dimensionality reduction using PCA. This technique decorrelates the axes, which can reduce training time for the classifier.
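The PCA step can be sketched as below; the correlated synthetic features and the 99% variance threshold are assumptions, not the notebook's actual settings:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.random((200, 3))
# Build deliberately correlated features to mimic redundant feature maps:
x_scaled = np.hstack([base, base @ rng.random((3, 2))])

# A float n_components keeps enough axes to explain that fraction of variance.
pca = PCA(n_components=0.99)
x_pca = pca.fit_transform(x_scaled)
```

After the transform the retained axes are mutually uncorrelated, and redundant dimensions are dropped.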
We are ready to classify our data...
First we classify the training set, then the test set. We compare the accuracy metrics obtained on both, and finally classify the entire scene.
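The steps above can be sketched with a Random Forest on synthetic data (shapes, hyperparameters, and the toy target are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-ins for the flattened, scaled scene:
rng = np.random.default_rng(0)
rows, cols, bands = 20, 30, 5
scene = rng.random((rows * cols, bands))
labels = (scene[:, 0] > 0.5).astype(int)  # toy target, easy to learn

x_train, y_train = scene[::2], labels[::2]
x_test, y_test = scene[1::2], labels[1::2]

clf = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
clf.fit(x_train, y_train)

train_acc = accuracy_score(y_train, clf.predict(x_train))
test_acc = accuracy_score(y_test, clf.predict(x_test))
# A large gap between the two scores would indicate overfitting.

# Final classification of the entire scene, reshaped back to image geometry:
class_map = clf.predict(scene).reshape(rows, cols)
```

The resulting `class_map` can then be written out with the GeoTransform and Projection saved at the start of the notebook, so the output aligns with the input rasters.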