Open in Studio | Open in Colab
[ ]:
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

Custom Trainers

Written by: Caleb Robinson

In this tutorial, we demonstrate how to extend a TorchGeo “trainer class”. TorchGeo provides several trainer classes: pre-made PyTorch Lightning Modules designed to make it easy to train models on tasks such as semantic segmentation, classification, and change detection using TorchGeo’s prebuilt DataModules. While the trainers aim to provide sensible defaults and customization options for common tasks, they cannot cover every situation (e.g., researchers will likely want to implement and use their own architectures, loss functions, optimizers, and/or metrics in the training routine). If you run into such a situation, you can simply extend the trainer class you are interested in and write custom logic to override the default functionality.

This tutorial shows how to do exactly this to customize a learning rate schedule, logging, and model checkpointing for a semantic segmentation task using the LandCover.ai dataset.
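
To make the pattern concrete before diving in, here is a minimal sketch of what extending a trainer looks like: subclass the task and override only the method you want to change. The class name and optimizer choice here are purely illustrative, not part of the tutorial; the full, working version is developed below.

[ ]:
from torch.optim import SGD, Optimizer

from torchgeo.trainers import SemanticSegmentationTask


class MySegmentationTask(SemanticSegmentationTask):
    # A hypothetical subclass: override one method, inherit everything else.
    def configure_optimizers(self) -> Optimizer:
        # Swap the optimizer while keeping the parent's loss, metrics,
        # and training/validation steps unchanged.
        return SGD(self.parameters(), lr=self.hparams['lr'], momentum=0.9)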

It’s recommended to run this notebook on Google Colab if you don’t have your own GPU. Click the “Open in Colab” button above to get started.

Setup

As always, we install TorchGeo.

[ ]:
%pip install torchgeo

Imports

Next, we import TorchGeo and any other libraries we need.

[ ]:
from collections.abc import Sequence
from typing import Any

import lightning
import lightning.pytorch as pl
from lightning.pytorch.callbacks import ModelCheckpoint
from lightning.pytorch.callbacks.callback import Callback
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchmetrics import MetricCollection
from torchmetrics.classification import (
    Accuracy,
    FBetaScore,
    JaccardIndex,
    Precision,
    Recall,
)

from torchgeo.datamodules import LandCoverAI100DataModule
from torchgeo.trainers import SemanticSegmentationTask

Custom SemanticSegmentationTask

Now, we create a CustomSemanticSegmentationTask class that inherits from SemanticSegmentationTask and overrides a few methods:

  • __init__: We add two new parameters tmax and eta_min to control the learning rate scheduler

  • configure_optimizers: We use the CosineAnnealingLR learning rate scheduler instead of the default ReduceLROnPlateau

  • configure_metrics: We add a “MeanIoU” metric, which we will use to evaluate the model’s performance, along with a variety of other classification metrics

  • configure_callbacks: We demonstrate how to stack ModelCheckpoint callbacks to save the best checkpoint as well as periodic checkpoints

  • on_train_epoch_start: We log the learning rate at the start of each epoch so we can easily see how it decays over a training run

Overall these demonstrate how to customize the training routine to investigate specific research questions (e.g., the effect of the scheduler on test performance).

[ ]:
class CustomSemanticSegmentationTask(SemanticSegmentationTask):
    # any keywords we add here between *args and **kwargs will be found in self.hparams
    def __init__(
        self, *args: Any, tmax: int = 50, eta_min: float = 1e-6, **kwargs: Any
    ) -> None:
        super().__init__(*args, **kwargs)  # pass args and kwargs to the parent class

    def configure_optimizers(
        self,
    ) -> 'lightning.pytorch.utilities.types.OptimizerLRSchedulerConfig':
        """Initialize the optimizer and learning rate scheduler.

        Returns:
            Optimizer and learning rate scheduler.
        """
        tmax: int = self.hparams['tmax']
        eta_min: float = self.hparams['eta_min']

        optimizer = AdamW(self.parameters(), lr=self.hparams['lr'])
        scheduler = CosineAnnealingLR(optimizer, T_max=tmax, eta_min=eta_min)
        return {
            'optimizer': optimizer,
            'lr_scheduler': {'scheduler': scheduler, 'monitor': self.monitor},
        }

    def configure_metrics(self) -> None:
        """Initialize the performance metrics."""
        num_classes: int = self.hparams['num_classes']

        self.train_metrics = MetricCollection(
            {
                'OverallAccuracy': Accuracy(
                    task='multiclass', num_classes=num_classes, average='micro'
                ),
                'OverallPrecision': Precision(
                    task='multiclass', num_classes=num_classes, average='micro'
                ),
                'OverallRecall': Recall(
                    task='multiclass', num_classes=num_classes, average='micro'
                ),
                'OverallF1Score': FBetaScore(
                    task='multiclass',
                    num_classes=num_classes,
                    beta=1.0,
                    average='micro',
                ),
                'MeanIoU': JaccardIndex(
                    num_classes=num_classes, task='multiclass', average='macro'
                ),
            },
            prefix='train_',
        )
        self.val_metrics = self.train_metrics.clone(prefix='val_')
        self.test_metrics = self.train_metrics.clone(prefix='test_')

    def configure_callbacks(self) -> Sequence[Callback] | Callback:
        """Initialize callbacks for saving the best and latest models.

        Returns:
            List of callbacks to apply.
        """
        return [
            ModelCheckpoint(every_n_epochs=50, save_top_k=-1, save_last=True),
            ModelCheckpoint(monitor=self.monitor, mode=self.mode, save_top_k=5),
        ]

    def on_train_epoch_start(self) -> None:
        """Log the learning rate at the start of each training epoch."""
        optimizers = self.optimizers()
        if isinstance(optimizers, list):
            lr = optimizers[0].param_groups[0]['lr']
        else:
            lr = optimizers.param_groups[0]['lr']
        self.logger.experiment.add_scalar('lr', lr, self.current_epoch)
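
To see concretely what the scheduler does, the following standalone sketch (a dummy one-parameter model with illustrative values, not part of the tutorial pipeline) prints the learning rate for the first few epochs as CosineAnnealingLR decays it from lr toward eta_min over T_max epochs.

[ ]:
import torch

# Standalone sketch: a single dummy parameter, just enough to build an optimizer
param = torch.nn.Parameter(torch.zeros(1))
optimizer = AdamW([param], lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-6)
for epoch in range(5):
    print(f'epoch {epoch}: lr = {optimizer.param_groups[0]["lr"]:.2e}')
    optimizer.step()  # step the optimizer first to avoid a scheduler warning
    scheduler.step()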

Train model

The remainder of the tutorial is straightforward and follows the typical PyTorch Lightning training routine. We instantiate a DataModule for the LandCover.AI 100 dataset (a small version of the LandCover.AI dataset for notebook testing), instantiate a CustomSemanticSegmentationTask with a U-Net architecture and a ResNet-18 backbone, then train the model using a Lightning Trainer.

The following variables can be modified to control training.

[ ]:
batch_size = 32
num_workers = 8
max_epochs = 50
fast_dev_run = False
[ ]:
dm = LandCoverAI100DataModule(
    root='data', batch_size=batch_size, num_workers=num_workers, download=True
)
[ ]:
task = CustomSemanticSegmentationTask(
    model='unet',
    backbone='resnet18',
    weights=True,
    in_channels=3,
    num_classes=6,
    loss='ce',
    lr=1e-3,
    tmax=50,
)
[ ]:
# validate that the task's hyperparameters are as expected
task.hparams
[ ]:
trainer = pl.Trainer(
    fast_dev_run=fast_dev_run, log_every_n_steps=1, min_epochs=1, max_epochs=max_epochs
)
[ ]:
trainer.fit(task, dm)
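
Because on_train_epoch_start logs the learning rate with add_scalar, you can watch the cosine decay during training. If you are running in Colab or Jupyter with the default TensorBoardLogger (which writes to lightning_logs/), a TensorBoard cell like the following should work.

[ ]:
# Optional: visualize the logged 'lr' scalar alongside the other metrics
%load_ext tensorboard
%tensorboard --logdir lightning_logs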

Test model

Finally, we test the model (optionally loading from a previously saved checkpoint).

[ ]:
# You can load directly from a saved checkpoint with `.load_from_checkpoint(...)`
# Note that you can also just call `trainer.test(task, dm)` if you've already trained
# the model in the current notebook session.

# task = CustomSemanticSegmentationTask.load_from_checkpoint(
#     os.path.join('lightning_logs', 'version_0', 'checkpoints', 'epoch=0-step=1.ckpt')
# )

trainer.test(task, dm)
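
Since we stacked two ModelCheckpoint callbacks, the best checkpoint can also be recovered programmatically instead of hard-coding a path. This sketch uses standard Lightning attributes; the actual checkpoint file it resolves to depends on your run.

[ ]:
# Find the ModelCheckpoint callback that monitors validation loss and load its
# best checkpoint; the resulting file path depends on this particular run.
best_ckpt = next(
    cb.best_model_path for cb in trainer.checkpoint_callbacks if cb.monitor
)
task = CustomSemanticSegmentationTask.load_from_checkpoint(best_ckpt)
trainer.test(task, dm)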
