torchgeo.datamodules

Geospatial DataModules

Chesapeake Land Cover

class torchgeo.datamodules.ChesapeakeCVPRDataModule(root_dir, train_splits, val_splits, test_splits, patches_per_tile=200, patch_size=256, batch_size=64, num_workers=0, class_set=7, use_prior_labels=False, prior_smoothing_constant=0.0001, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the Chesapeake CVPR Land Cover dataset.

Uses the random splits defined per state to partition tiles into train, val, and test sets.

__init__(root_dir, train_splits, val_splits, test_splits, patches_per_tile=200, patch_size=256, batch_size=64, num_workers=0, class_set=7, use_prior_labels=False, prior_smoothing_constant=0.0001, **kwargs)[source]

Initialize a LightningDataModule for Chesapeake CVPR based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the ChesapeakeCVPR Dataset classes

  • train_splits (List[str]) – The splits used to train the model, e.g. [“ny-train”]

  • val_splits (List[str]) – The splits used to validate the model, e.g. [“ny-val”]

  • test_splits (List[str]) – The splits used to test the model, e.g. [“ny-test”]

  • patches_per_tile (int) – The number of patches per tile to sample

  • patch_size (int) – The size of each patch in pixels (test patches will be 1.5 times this size)

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • class_set (int) – The high-resolution land cover class set to use: 5 or 7

  • use_prior_labels (bool) – Flag for using a prior over high-resolution classes instead of the high-resolution labels themselves

  • prior_smoothing_constant (float) – additive smoothing to add when using prior labels

Raises

ValueError – if use_prior_labels is used with class_set==7
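
Example (a minimal usage sketch; the root path is a hypothetical local directory, the split names come from the examples in the parameter descriptions above, and the data is assumed to already exist under root_dir):

    from torchgeo.datamodules import ChesapeakeCVPRDataModule

    dm = ChesapeakeCVPRDataModule(
        root_dir="data/chesapeake",  # hypothetical path to the ChesapeakeCVPR data
        train_splits=["ny-train"],
        val_splits=["ny-val"],
        test_splits=["ny-test"],
        class_set=5,
        use_prior_labels=True,       # only valid with class_set=5; class_set=7 raises ValueError
        batch_size=32,
        num_workers=4,
    )
    dm.prepare_data()                # confirm the dataset is present (called once per node)
    dm.setup()                       # build the train/val/test datasets
    batch = next(iter(dm.train_dataloader()))  # a dict with "image" and "mask" tensors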

center_crop(size=512)[source]

Returns a function to perform a center crop transform on a single sample.

Parameters

size (int) – output image size

Returns

function to perform center crop

Return type

Callable[[Dict[str, Tensor]], Dict[str, Tensor]]

nodata_check(size=512)[source]

Returns a function to check for nodata or mis-sized input.

Parameters

size (int) – output image size

Returns

function to check for nodata values

Return type

Callable[[Dict[str, Tensor]], Dict[str, Tensor]]

pad_to(size=512, image_value=0, mask_value=0)[source]

Returns a function to perform a padding transform on a single sample.

Parameters
  • size (int) – output image size

  • image_value (int) – value to pad image with

  • mask_value (int) – value to pad mask with

Returns

function to perform padding

Return type

Callable[[Dict[str, Tensor]], Dict[str, Tensor]]

prepare_data()[source]

Confirms that the dataset is downloaded on the local node.

This method is called once per node, while setup() is called once per GPU.

preprocess(sample)[source]

Preprocesses a single sample.

Parameters

sample (Dict[str, Any]) – sample dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

remove_bbox(sample)[source]

Removes the bounding box property from a sample.

Parameters

sample (Dict[str, Any]) – dictionary with geographic metadata

Returns

sample without the bbox property

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

NAIP

class torchgeo.datamodules.NAIPChesapeakeDataModule(naip_root_dir, chesapeake_root_dir, batch_size=64, num_workers=0, patch_size=256, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the NAIP and Chesapeake datasets.

Uses the train/val/test splits from the dataset.

__init__(naip_root_dir, chesapeake_root_dir, batch_size=64, num_workers=0, patch_size=256, **kwargs)[source]

Initialize a LightningDataModule for NAIP and Chesapeake based DataLoaders.

Parameters
  • naip_root_dir (str) – directory containing NAIP data

  • chesapeake_root_dir (str) – directory containing Chesapeake data

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • patch_size (int) – size of patches to sample
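
Example (a minimal sketch; both directory paths are hypothetical):

    from torchgeo.datamodules import NAIPChesapeakeDataModule

    dm = NAIPChesapeakeDataModule(
        naip_root_dir="data/naip",              # hypothetical path to NAIP imagery
        chesapeake_root_dir="data/chesapeake",  # hypothetical path to Chesapeake masks
        batch_size=32,
        num_workers=4,
        patch_size=256,
    )
    dm.prepare_data()  # make sure the dataset is downloaded
    dm.setup()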

chesapeake_transform(sample)[source]

Transform a single sample from the Chesapeake Dataset.

Parameters

sample (Dict[str, Any]) – Chesapeake mask dictionary

Returns

preprocessed Chesapeake data

Return type

Dict[str, Any]

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the NAIP Dataset.

Parameters

sample (Dict[str, Any]) – NAIP image dictionary

Returns

preprocessed NAIP data

Return type

Dict[str, Any]

remove_bbox(sample)[source]

Removes the bounding box property from a sample.

Parameters

sample (Dict[str, Any]) – dictionary with geographic metadata

Returns

sample without the bbox property

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

Non-geospatial DataModules

BigEarthNet

class torchgeo.datamodules.BigEarthNetDataModule(root_dir, bands='all', num_classes=19, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the BigEarthNet dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, bands='all', num_classes=19, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for BigEarthNet based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the BigEarthNet Dataset classes

  • bands (str) – load Sentinel-1 bands, Sentinel-2 bands, or both; one of {s1, s2, all}

  • num_classes (int) – number of classes to load as the target; one of {19, 43}

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
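
Example (illustrative; the root path is an assumption about your local layout):

    from torchgeo.datamodules import BigEarthNetDataModule

    dm = BigEarthNetDataModule(
        root_dir="data/bigearthnet",  # hypothetical path
        bands="s2",                   # Sentinel-2 only; "s1" and "all" are the other options
        num_classes=19,               # or 43
        batch_size=64,
        num_workers=4,
    )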

plot(*args, **kwargs)[source]

Run torchgeo.datasets.BigEarthNet.plot().

New in version 0.2.

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

COWC

class torchgeo.datamodules.COWCCountingDataModule(root_dir, seed, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the COWC Counting dataset.

__init__(root_dir, seed, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for COWC Counting based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the COWCCounting Dataset class

  • seed (int) – The seed value to use when doing the dataset random_split

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
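
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import COWCCountingDataModule

    dm = COWCCountingDataModule(
        root_dir="data/cowc_counting",  # hypothetical path
        seed=0,                         # fixes the random_split partitioning across runs
        batch_size=64,
        num_workers=4,
    )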

plot(*args, **kwargs)[source]

Run torchgeo.datasets.COWC.plot().

New in version 0.2.

prepare_data()[source]

Initialize the main Dataset objects for use in setup().

This includes optionally downloading the dataset. This is done once per node, while setup() is done once per GPU.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and target

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

Deep Globe Land Cover Challenge

class torchgeo.datamodules.DeepGlobeLandCoverDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the DeepGlobe Land Cover dataset.

Uses the train/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for DeepGlobe Land Cover based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the DeepGlobe Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Dict[str, Any]]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Dict[str, Any]]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Dict[str, Any]]

ETCI2021 Flood Detection

class torchgeo.datamodules.ETCI2021DataModule(root_dir, seed=0, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the ETCI2021 dataset.

Splits the existing train split from the dataset into train/val with 80/20 proportions, then uses the existing val dataset as the test data.

New in version 0.2.

__init__(root_dir, seed=0, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for ETCI2021 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the ETCI2021 Dataset classes

  • seed (int) – The seed value to use when doing the dataset random_split

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
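
Example (a minimal sketch; the root path is hypothetical):

    from torchgeo.datamodules import ETCI2021DataModule

    dm = ETCI2021DataModule(root_dir="data/etci2021", seed=0, batch_size=32, num_workers=4)
    dm.prepare_data()  # download/verify the dataset (once per run)
    dm.setup()         # 80/20 train/val split; the dataset's val split becomes the test set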

plot(*args, **kwargs)[source]

Run torchgeo.datasets.ETCI2021.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Notably, moves the given water mask to act as an input layer.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

EuroSAT

class torchgeo.datamodules.EuroSATDataModule(root_dir, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the EuroSAT dataset.

Uses the train/val/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for EuroSAT based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the EuroSAT Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

plot(*args, **kwargs)[source]

Run torchgeo.datasets.EuroSAT.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

FAIR1M

class torchgeo.datamodules.FAIR1MDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the FAIR1M dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for FAIR1M based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the FAIR1M Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • test_split_pct (float) – What percentage of the dataset to use as a test set
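
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import FAIR1MDataModule

    # with val_split_pct=0.2 and test_split_pct=0.2, the remaining 60% is used for training
    dm = FAIR1MDataModule(
        root_dir="data/fair1m",  # hypothetical path
        batch_size=16,
        num_workers=4,
        val_split_pct=0.2,
        test_split_pct=0.2,
    )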

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

Inria Aerial Image Labeling

class torchgeo.datamodules.InriaAerialImageLabelingDataModule(root_dir, batch_size=32, num_workers=0, val_split_pct=0.1, test_split_pct=0.1, patch_size=512, num_patches_per_tile=32, predict_on='test')

Bases: LightningDataModule

LightningDataModule implementation for the InriaAerialImageLabeling dataset.

Uses the train/test splits from the dataset and further splits the train split into train/val splits.

New in version 0.3.

__init__(root_dir, batch_size=32, num_workers=0, val_split_pct=0.1, test_split_pct=0.1, patch_size=512, num_patches_per_tile=32, predict_on='test')[source]

Initialize a LightningDataModule for InriaAerialImageLabeling.

Parameters
  • root_dir (str) – The root argument to pass to the InriaAerialImageLabeling Dataset classes

  • batch_size (int) – The batch size used in the train DataLoader (val_batch_size == test_batch_size == 1)

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • test_split_pct (float) – What percentage of the dataset to use as a test set

  • patch_size (Union[int, Tuple[int, int]]) – Size of random patch from image and mask (height, width)

  • num_patches_per_tile (int) – Number of random patches per sample

  • predict_on (str) – Directory/Dataset of images to run inference on
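
Example (a minimal sketch; the root path is hypothetical):

    from torchgeo.datamodules import InriaAerialImageLabelingDataModule

    dm = InriaAerialImageLabelingDataModule(
        root_dir="data/inria",    # hypothetical path
        batch_size=16,            # training only; val and test batch size is fixed at 1
        patch_size=(512, 512),    # (height, width) of random patches
        num_patches_per_tile=16,
        predict_on="test",        # images served by predict_dataloader()
    )
    dm.setup()
    predict_loader = dm.predict_dataloader()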

n_random_crop(sample)[source]

Get n random crops.

on_after_batch_transfer(batch, dataloader_idx)[source]

Apply augmentations to batch after transferring to GPU.

Parameters
  • batch (dict) – A batch of data that needs to be altered or augmented.

  • dataloader_idx (int) – The index of the dataloader to which the batch belongs.

Returns

A batch of data

Return type

dict

patch_sample(sample)[source]

Extract patches from single sample.

predict_dataloader()[source]

Return a DataLoader for prediction.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

LandCover.ai

class torchgeo.datamodules.LandCoverAIDataModule(root_dir, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the LandCover.ai dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for LandCover.ai based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the LandCoverAI Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
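
A typical training sketch is shown below; task is a placeholder for a LightningModule assumed to be defined elsewhere, whose training step consumes this datamodule's image/mask batches (for example, a segmentation task from torchgeo.trainers), and the root path is hypothetical:

    import pytorch_lightning as pl
    from torchgeo.datamodules import LandCoverAIDataModule

    dm = LandCoverAIDataModule(root_dir="data/landcoverai", batch_size=32, num_workers=4)
    # `task` is assumed to be a compatible LightningModule defined elsewhere
    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model=task, datamodule=dm)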

on_after_batch_transfer(batch, batch_idx)[source]

Apply batch augmentations after batch is transferred to the device.

Parameters
  • batch (Dict[str, Any]) – mini-batch of data

  • batch_idx (int) – batch index

Returns

augmented mini-batch

Return type

Dict[str, Any]

plot(*args, **kwargs)[source]

Run torchgeo.datasets.LandCoverAI.plot().

New in version 0.2.

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

LoveDA

class torchgeo.datamodules.LoveDADataModule(root_dir, scene, batch_size=32, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the LoveDA dataset.

Uses the train/val/test splits from the dataset.

New in version 0.2.

__init__(root_dir, scene, batch_size=32, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for LoveDA based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to LoveDA Dataset classes

  • scene (List[str]) – specify whether to load only ‘urban’, only ‘rural’, or both

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
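
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import LoveDADataModule

    dm = LoveDADataModule(
        root_dir="data/loveda",    # hypothetical path
        scene=["urban", "rural"],  # or ["urban"] / ["rural"] alone
        batch_size=32,
        num_workers=4,
    )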

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

NASA Marine Debris

class torchgeo.datamodules.NASAMarineDebrisDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the NASA Marine Debris dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for NASA Marine Debris based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Dataset class

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • test_split_pct (float) – What percentage of the dataset to use as a test set

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

OSCD

class torchgeo.datamodules.OSCDDataModule(root_dir, bands='all', train_batch_size=32, num_workers=0, val_split_pct=0.2, patch_size=(64, 64), num_patches_per_tile=32, pad_size=(1280, 1280), **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the OSCD dataset.

Uses the train/test splits from the dataset and further splits the train split into train/val splits.

New in version 0.2.

__init__(root_dir, bands='all', train_batch_size=32, num_workers=0, val_split_pct=0.2, patch_size=(64, 64), num_patches_per_tile=32, pad_size=(1280, 1280), **kwargs)[source]

Initialize a LightningDataModule for OSCD based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the OSCD Dataset classes

  • bands (str) – “rgb” or “all”

  • train_batch_size (int) – The batch size used in the train DataLoader (val_batch_size == test_batch_size == 1)

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • patch_size (Tuple[int, int]) – Size of random patch from image and mask (height, width)

  • num_patches_per_tile (int) – number of random patches per sample

  • pad_size (Tuple[int, int]) – size to pad images to during val/test steps
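
Example (a minimal sketch; the root path is hypothetical):

    from torchgeo.datamodules import OSCDDataModule

    dm = OSCDDataModule(
        root_dir="data/oscd",    # hypothetical path
        bands="rgb",             # or "all"
        train_batch_size=32,     # val/test batch size is fixed at 1
        patch_size=(64, 64),
        num_patches_per_tile=32,
        pad_size=(1280, 1280),   # val/test images are padded to this size
        num_workers=4,
    )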

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

Potsdam

class torchgeo.datamodules.Potsdam2DDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the Potsdam2D dataset.

Uses the train/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for Potsdam2D based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Potsdam2D Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

RESISC45

class torchgeo.datamodules.RESISC45DataModule(root_dir, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the RESISC45 dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for RESISC45 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the RESISC45 Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

on_after_batch_transfer(batch, batch_idx)[source]

Apply batch augmentations after batch is transferred to the device.

Parameters
  • batch (Dict[str, Any]) – mini-batch of data

  • batch_idx (int) – batch index

Returns

augmented mini-batch

Return type

Dict[str, Any]

plot(*args, **kwargs)[source]

Run torchgeo.datasets.RESISC45.plot().

New in version 0.2.

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

SEN12MS

class torchgeo.datamodules.SEN12MSDataModule(root_dir, seed, band_set='all', batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the SEN12MS dataset.

Implements 80/20 geographic train/val splits and uses the test split from the classification dataset definitions. See setup() for more details.

Uses the Simplified IGBP scheme defined in the 2020 Data Fusion Competition. See https://arxiv.org/abs/2002.08254.

__init__(root_dir, seed, band_set='all', batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for SEN12MS based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the SEN12MS Dataset classes

  • seed (int) – The seed value to use when doing the sklearn based ShuffleSplit

  • band_set (str) – The subset of S1/S2 bands to use. Options are: “all”, “s1”, “s2-all”, and “s2-reduced” where the “s2-reduced” set includes: B2, B3, B4, B8, B11, and B12.

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
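
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import SEN12MSDataModule

    dm = SEN12MSDataModule(
        root_dir="data/sen12ms",  # hypothetical path
        seed=0,                   # fixes the sklearn ShuffleSplit used for train/val
        band_set="s2-reduced",    # B2, B3, B4, B8, B11, and B12
        batch_size=64,
        num_workers=4,
    )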

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

We split samples between train and val geographically with proportions of 80/20. This mimics the geographic test set split.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

So2Sat

class torchgeo.datamodules.So2SatDataModule(root_dir, batch_size=64, num_workers=0, bands='rgb', unsupervised_mode=False, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the So2Sat dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, bands='rgb', unsupervised_mode=False, **kwargs)[source]

Initialize a LightningDataModule for So2Sat based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the So2Sat Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • bands (str) – Either “rgb” or “s2”

  • unsupervised_mode (bool) – Makes the train dataloader return imagery from the train, val, and test sets
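
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import So2SatDataModule

    dm = So2SatDataModule(
        root_dir="data/so2sat",   # hypothetical path
        bands="rgb",              # or "s2"
        unsupervised_mode=True,   # train loader draws imagery from train, val, and test
        batch_size=64,
        num_workers=4,
    )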

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

Tropical Cyclone

class torchgeo.datamodules.CycloneDataModule(root_dir, seed, batch_size=64, num_workers=0, api_key=None, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the NASA Cyclone dataset.

Implements 80/20 train/val splits based on hurricane storm ids. See setup() for more details.

__init__(root_dir, seed, batch_size=64, num_workers=0, api_key=None, **kwargs)[source]

Initialize a LightningDataModule for NASA Cyclone based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the TropicalCycloneWindEstimation Dataset classes

  • seed (int) – The seed value to use when doing the sklearn based GroupShuffleSplit

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • api_key (Optional[str]) – The RadiantEarth MLHub API key to use if the dataset needs to be downloaded
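
Example (a minimal sketch; the root path and the environment variable name are illustrative):

    import os

    from torchgeo.datamodules import CycloneDataModule

    dm = CycloneDataModule(
        root_dir="data/cyclone",             # hypothetical path
        seed=0,                              # fixes the GroupShuffleSplit over storm ids
        batch_size=64,
        num_workers=4,
        api_key=os.getenv("MLHUB_API_KEY"),  # illustrative env var holding an MLHub key
    )
    dm.prepare_data()  # downloads the dataset if needed (requires the API key)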

prepare_data()[source]

Initialize the main Dataset objects for use in setup().

This includes optionally downloading the dataset. This is done once per node, while setup() is done once per GPU.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and target

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

We split samples between train and val by the storm_id property, i.e. all samples with the same storm_id value fall entirely in either the train or the val split. This tests one type of generalizability: given a new storm, can we predict its wind speed? The test set, however, contains some storms from the training set (specifically, the latter parts of those storms) as well as some novel storms.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

UC Merced

class torchgeo.datamodules.UCMercedDataModule(root_dir, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the UC Merced dataset.

Uses random train/val/test splits.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for UCMerced based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the UCMerced Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

plot(*args, **kwargs)[source]

Run torchgeo.datasets.UCMerced.plot().

New in version 0.2.

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

USAVars

class torchgeo.datamodules.USAVarsDataModule(root_dir, labels=['housing', 'income', 'roads', 'nightlights', 'population', 'elevation', 'treecover'], batch_size=64, num_workers=0)

Bases: LightningDataModule

LightningDataModule implementation for the USAVars dataset.

Uses random train/val/test splits.

New in version 0.3.

__init__(root_dir, labels=['housing', 'income', 'roads', 'nightlights', 'population', 'elevation', 'treecover'], batch_size=64, num_workers=0)[source]

Initialize a LightningDataModule for USAVars based DataLoaders.

Parameters
  • root_dir (str) – The root argument passed to the USAVars Dataset classes

  • labels (Sequence[str]) – The labels argument passed to the USAVars Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
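
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import USAVarsDataModule

    dm = USAVarsDataModule(
        root_dir="data/usavars",            # hypothetical path
        labels=["treecover", "elevation"],  # any subset of the default label list
        batch_size=64,
        num_workers=4,
    )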

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

Vaihingen

class torchgeo.datamodules.Vaihingen2DDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the Vaihingen2D dataset.

Uses the train/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for Vaihingen2D based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Vaihingen Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

xView2

class torchgeo.datamodules.XView2DataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the xView2 dataset.

Uses the train/val/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for xView2 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the xView2 Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]
