torchgeo.datamodules

Geospatial DataModules

Chesapeake Bay High-Resolution Land Cover Project

class torchgeo.datamodules.ChesapeakeCVPRDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the Chesapeake CVPR Land Cover dataset.

Uses the random splits defined per state to partition tiles into train, val, and test sets.

__init__(root_dir, train_splits, val_splits, test_splits, patches_per_tile=200, patch_size=256, batch_size=64, num_workers=0, class_set=7, use_prior_labels=False, prior_smoothing_constant=0.0001, **kwargs)[source]

Initialize a LightningDataModule for Chesapeake CVPR based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the ChesapeakeCVPR Dataset classes

  • train_splits (List[str]) – The splits used to train the model, e.g. [“ny-train”]

  • val_splits (List[str]) – The splits used to validate the model, e.g. [“ny-val”]

  • test_splits (List[str]) – The splits used to test the model, e.g. [“ny-test”]

  • patches_per_tile (int) – The number of patches per tile to sample

  • patch_size (int) – The size of each patch in pixels (test patches will be 1.5 times this size)

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • class_set (int) – The high-resolution land cover class set to use: 5 or 7

  • use_prior_labels (bool) – Flag for using a prior over high-resolution classes instead of the high-resolution labels themselves

  • prior_smoothing_constant (float) – The additive smoothing constant to apply when using prior labels

Raises

ValueError – if use_prior_labels is used with class_set==7
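
A minimal usage sketch, for orientation only: the root directory, split names, and trainer settings below are placeholder assumptions, and my_segmentation_model stands in for any LightningModule suited to semantic segmentation.

    import pytorch_lightning as pl

    from torchgeo.datamodules import ChesapeakeCVPRDataModule

    # Placeholder paths and splits: point these at a local copy of the dataset
    datamodule = ChesapeakeCVPRDataModule(
        root_dir="data/chesapeake",
        train_splits=["ny-train"],
        val_splits=["ny-val"],
        test_splits=["ny-test"],
        patches_per_tile=200,
        patch_size=256,
        batch_size=64,
        num_workers=4,
    )
    trainer = pl.Trainer(max_epochs=10)
    # trainer.fit(model=my_segmentation_model, datamodule=datamodule)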

center_crop(size=512)[source]

Returns a function to perform a center crop transform on a single sample.

Parameters

size (int) – output image size

Returns

function to perform center crop

Return type

Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]
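
The returned callable operates on a sample dictionary with "image" and "mask" tensors. As an illustration only (not the library's implementation), such a transform might look like:

    from typing import Dict

    import torch

    def example_center_crop(size: int = 512):
        """Illustrative stand-in for the transform returned by center_crop()."""
        def transform(sample: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
            _, height, width = sample["image"].shape
            top = (height - size) // 2
            left = (width - size) // 2
            sample["image"] = sample["image"][:, top : top + size, left : left + size]
            sample["mask"] = sample["mask"][..., top : top + size, left : left + size]
            return sample
        return transform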

nodata_check(size=512)[source]

Returns a function to check for nodata or mis-sized input.

Parameters

size (int) – output image size

Returns

function to check for nodata values

Return type

Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]

pad_to(size=512, image_value=0, mask_value=0)[source]

Returns a function to perform a padding transform on a single sample.

Parameters
  • size (int) – output image size

  • image_value (int) – value to pad image with

  • mask_value (int) – value to pad mask with

Returns

function to perform padding

Return type

Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]

prepare_data()[source]

Confirms that the dataset is downloaded on the local node.

This method is called once per node, while setup() is called once per GPU.

preprocess(sample)[source]

Preprocesses a single sample.

Parameters

sample (Dict[str, Any]) – sample dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

Per the PyTorch Lightning docs, the splits should be created here rather than in __init__(): https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

National Agriculture Imagery Program (NAIP)

class torchgeo.datamodules.NAIPChesapeakeDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the NAIP and Chesapeake datasets.

Uses the train/val/test splits from the dataset.

__init__(naip_root_dir, chesapeake_root_dir, batch_size=64, num_workers=0, patch_size=256, **kwargs)[source]

Initialize a LightningDataModule for NAIP and Chesapeake based DataLoaders.

Parameters
  • naip_root_dir (str) – directory containing NAIP data

  • chesapeake_root_dir (str) – directory containing Chesapeake data

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • patch_size (int) – size of patches to sample
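
A construction sketch, assuming local copies of both datasets (the directories are placeholders):

    from torchgeo.datamodules import NAIPChesapeakeDataModule

    datamodule = NAIPChesapeakeDataModule(
        naip_root_dir="data/naip",              # placeholder
        chesapeake_root_dir="data/chesapeake",  # placeholder
        batch_size=64,
        num_workers=4,
        patch_size=256,
    )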

chesapeake_transform(sample)[source]

Transform a single sample from the Chesapeake Dataset.

Parameters

sample (Dict[str, Any]) – Chesapeake mask dictionary

Returns

preprocessed Chesapeake data

Return type

Dict[str, Any]

naip_transform(sample)[source]

Transform a single sample from the NAIP Dataset.

Parameters

sample (Dict[str, Any]) – NAIP image dictionary

Returns

preprocessed NAIP data

Return type

Dict[str, Any]

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

Non-geospatial DataModules

BigEarthNet

class torchgeo.datamodules.BigEarthNetDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the BigEarthNet dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, bands='all', num_classes=19, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for BigEarthNet based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the BigEarthNet Dataset classes

  • bands (str) – Load Sentinel-1 bands, Sentinel-2 bands, or both. One of {s1, s2, all}

  • num_classes (int) – Number of classes to load in the target. One of {19, 43}

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
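
For example, the bands and num_classes options combine as follows (the root directory is a placeholder):

    from torchgeo.datamodules import BigEarthNetDataModule

    # Sentinel-2 imagery only, with the 19-class label set
    dm_s2 = BigEarthNetDataModule(root_dir="data/bigearthnet", bands="s2", num_classes=19)

    # Sentinel-1 and Sentinel-2 together, with the full 43-class label set
    dm_all = BigEarthNetDataModule(root_dir="data/bigearthnet", bands="all", num_classes=43)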

plot(*args, **kwargs)[source]

Run torchgeo.datasets.BigEarthNet.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

Cars Overhead With Context (COWC)

class torchgeo.datamodules.COWCCountingDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the COWC Counting dataset.

__init__(root_dir, seed, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for COWC Counting based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the COWCCounting Dataset class

  • seed (int) – The seed value to use when doing the dataset random_split

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

custom_transform(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and target

Returns

preprocessed sample

Return type

Dict[str, Any]

plot(*args, **kwargs)[source]

Run torchgeo.datasets.COWC.plot().

prepare_data()[source]

Initialize the main Dataset objects for use in setup().

This includes optionally downloading the dataset. This is done once per node, while setup() is done once per GPU.

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

Per the PyTorch Lightning docs, the splits should be created here rather than in __init__(): https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

ETCI2021 Flood Detection

class torchgeo.datamodules.ETCI2021DataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the ETCI2021 dataset.

Splits the existing train split from the dataset into train/val with 80/20 proportions, then uses the existing val dataset as the test data.

New in version 0.2.

__init__(root_dir, seed=0, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for ETCI2021 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the ETCI2021 Dataset classes

  • seed (int) – The seed value to use when doing the dataset random_split

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
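
The seed makes the 80/20 train/val split reproducible. A self-contained sketch of the underlying mechanism with torch.utils.data.random_split (the list below is a stand-in for the dataset):

    import torch
    from torch.utils.data import random_split

    dataset = list(range(100))  # stand-in for the ETCI2021 train split
    n_train = int(0.8 * len(dataset))
    train_ds, val_ds = random_split(
        dataset,
        [n_train, len(dataset) - n_train],
        generator=torch.Generator().manual_seed(0),  # the seed fixes the split
    )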

plot(*args, **kwargs)[source]

Run torchgeo.datasets.ETCI2021.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Notably, moves the given water mask to act as an input layer.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

EuroSAT

class torchgeo.datamodules.EuroSATDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the EuroSAT dataset.

Uses the train/val/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for EuroSAT based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the EuroSAT Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

plot(*args, **kwargs)[source]

Run torchgeo.datasets.EuroSAT.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

FAIR1M (Fine-grAined object recognItion in high-Resolution imagery)

class torchgeo.datamodules.FAIR1MDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the FAIR1M dataset.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for FAIR1M based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the FAIR1M Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • test_split_pct (float) – What percentage of the dataset to use as a test set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

LandCover.ai (Land Cover from Aerial Imagery)

class torchgeo.datamodules.LandCoverAIDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the LandCover.ai dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for LandCover.ai based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the LandCoverAI Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

on_after_batch_transfer(batch, batch_idx)[source]

Apply batch augmentations after batch is transferred to the device.

Parameters
  • batch (Dict[str, Any]) – mini-batch of data

  • batch_idx (int) – batch index

Returns

augmented mini-batch

Return type

Dict[str, Any]
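
This hook runs after the mini-batch has been moved to the target device, so augmentations applied here execute on-GPU. An illustrative augmentation (not the library's actual pipeline):

    import torch

    def on_after_batch_transfer(batch, batch_idx):
        """Illustrative on-device augmentation: a random horizontal flip."""
        if torch.rand(1).item() < 0.5:
            batch["image"] = torch.flip(batch["image"], dims=[-1])
            batch["mask"] = torch.flip(batch["mask"], dims=[-1])
        return batch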

plot(*args, **kwargs)[source]

Run torchgeo.datasets.LandCoverAI.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

LoveDA (Land-cOVEr Domain Adaptive semantic segmentation)

class torchgeo.datamodules.LoveDADataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the LoveDA dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, scene, batch_size=32, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for LoveDA based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to LoveDA Dataset classes

  • scene (List[str]) – Specify whether to load only ‘urban’, only ‘rural’, or both

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

NASA Marine Debris

class torchgeo.datamodules.NASAMarineDebrisDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the NASA Marine Debris dataset.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for NASA Marine Debris based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Dataset class

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • test_split_pct (float) – What percentage of the dataset to use as a test set

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

OSCD (Onera Satellite Change Detection)

class torchgeo.datamodules.OSCDDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the OSCD dataset.

Uses the train/test splits from the dataset and further splits the train split into train/val splits.

New in version 0.2.

__init__(root_dir, bands='all', train_batch_size=32, num_workers=0, val_split_pct=0.2, patch_size=(64, 64), num_patches_per_tile=32, pad_size=(1280, 1280), **kwargs)[source]

Initialize a LightningDataModule for OSCD based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the OSCD Dataset classes

  • bands (str) – “rgb” or “all”

  • train_batch_size (int) – The batch size used in the train DataLoader (val_batch_size == test_batch_size == 1)

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • patch_size (Tuple[int, int]) – Size of random patch from image and mask (height, width)

  • num_patches_per_tile (int) – number of random patches per sample

  • pad_size (Tuple[int, int]) – size to pad images to during val/test steps
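
A construction sketch (the root directory is a placeholder); training batches are assembled from random patches, while validation and test always run with a batch size of 1:

    from torchgeo.datamodules import OSCDDataModule

    datamodule = OSCDDataModule(
        root_dir="data/oscd",  # placeholder
        bands="rgb",
        train_batch_size=32,
        val_split_pct=0.2,
        patch_size=(64, 64),
        num_patches_per_tile=32,
    )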

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

Potsdam

class torchgeo.datamodules.Potsdam2DDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the Potsdam2D dataset.

Uses the train/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for Potsdam2D based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Potsdam2D Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

RESISC45 (Remote Sensing Image Scene Classification)

class torchgeo.datamodules.RESISC45DataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the RESISC45 dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for RESISC45 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the RESISC45 Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

on_after_batch_transfer(batch, batch_idx)[source]

Apply batch augmentations after batch is transferred to the device.

Parameters
  • batch (Dict[str, Any]) – mini-batch of data

  • batch_idx (int) – batch index

Returns

augmented mini-batch

Return type

Dict[str, Any]

plot(*args, **kwargs)[source]

Run torchgeo.datasets.RESISC45.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

SEN12MS

class torchgeo.datamodules.SEN12MSDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the SEN12MS dataset.

Implements 80/20 geographic train/val splits and uses the test split from the classification dataset definitions. See setup() for more details.

Uses the Simplified IGBP scheme defined in the 2020 Data Fusion Competition. See https://arxiv.org/abs/2002.08254.

__init__(root_dir, seed, band_set='all', batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for SEN12MS based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the SEN12MS Dataset classes

  • seed (int) – The seed value to use when doing the sklearn based ShuffleSplit

  • band_set (str) – The subset of S1/S2 bands to use. Options are: “all”, “s1”, “s2-all”, and “s2-reduced” where the “s2-reduced” set includes: B2, B3, B4, B8, B11, and B12.

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
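
As an illustration of the band subsets, selecting the “s2-reduced” channels from a 13-band Sentinel-2 stack might look like the following; the channel indices are assumptions based on the usual B1–B12 ordering (including B8A) and are not taken from the library:

    import torch

    image = torch.zeros(13, 256, 256)  # stand-in for a full Sentinel-2 stack

    # Assumed positions of B2, B3, B4, B8, B11, B12 in a B1..B12 stack
    S2_REDUCED = [1, 2, 3, 7, 11, 12]
    reduced = image[S2_REDUCED, :, :]  # shape: (6, 256, 256)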

custom_transform(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

Per the PyTorch Lightning docs, the splits should be created here rather than in __init__(): https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

We split samples between train and val geographically in an 80/20 proportion. This mimics the geographic test set split.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

So2Sat

class torchgeo.datamodules.So2SatDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the So2Sat dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, bands='rgb', unsupervised_mode=False, **kwargs)[source]

Initialize a LightningDataModule for So2Sat based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the So2Sat Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • bands (str) – Either “rgb” or “s2”

  • unsupervised_mode (bool) – Makes the train dataloader return imagery from the train, val, and test sets
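
Conceptually, unsupervised mode pools imagery from all three splits into one training set; a sketch with torch.utils.data.ConcatDataset (the lists are stand-ins for the So2Sat splits):

    from torch.utils.data import ConcatDataset

    train_ds, val_ds, test_ds = list(range(10)), list(range(5)), list(range(5))
    all_imagery = ConcatDataset([train_ds, val_ds, test_ds])
    assert len(all_imagery) == 20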

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

Tropical Cyclone Wind Estimation Competition

class torchgeo.datamodules.CycloneDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the NASA Cyclone dataset.

Implements 80/20 train/val splits based on hurricane storm ids. See setup() for more details.

__init__(root_dir, seed, batch_size=64, num_workers=0, api_key=None, **kwargs)[source]

Initialize a LightningDataModule for NASA Cyclone based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the TropicalCycloneWindEstimation Dataset classes

  • seed (int) – The seed value to use when doing the sklearn based GroupShuffleSplit

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • api_key (Optional[str]) – The RadiantEarth MLHub API key to use if the dataset needs to be downloaded

custom_transform(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and target

Returns

preprocessed sample

Return type

Dict[str, Any]

prepare_data()[source]

Initialize the main Dataset objects for use in setup().

This includes optionally downloading the dataset. This is done once per node, while setup() is done once per GPU.

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

Per the PyTorch Lightning docs, the splits should be created here rather than in __init__(): https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

We split samples between train and val by the storm_id property, i.e. all samples with the same storm_id value fall entirely in either the train or the val split. This is important for testing one type of generalizability: given a new storm, can we predict its wind speed? The test set, however, contains some storms from the training set (specifically, the latter parts of those storms) as well as some novel storms.
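
A sketch of this group-aware splitting with scikit-learn's GroupShuffleSplit, as the seed parameter above suggests (the storm ids are toy placeholders):

    from sklearn.model_selection import GroupShuffleSplit

    storm_ids = ["storm_a", "storm_a", "storm_b", "storm_b", "storm_c"]
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
    # Every sample sharing a storm_id lands entirely in train or entirely in val
    train_idx, val_idx = next(splitter.split(storm_ids, groups=storm_ids))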

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

UC Merced

class torchgeo.datamodules.UCMercedDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the UC Merced dataset.

Uses random train/val/test splits.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for UCMerced based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the UCMerced Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

plot(*args, **kwargs)[source]

Run torchgeo.datasets.UCMerced.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

Vaihingen

class torchgeo.datamodules.Vaihingen2DDataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the Vaihingen2D dataset.

Uses the train/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for Vaihingen2D based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Vaihingen Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]

xView2

class torchgeo.datamodules.XView2DataModule(*args, **kwargs)

Bases: pytorch_lightning.core.datamodule.LightningDataModule

LightningDataModule implementation for the xView2 dataset.

Uses the train/val/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for xView2 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the xView2 Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

torch.utils.data.DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

torch.utils.data.DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

torch.utils.data.DataLoader[Any]
