torchgeo.datamodules

Geospatial DataModules

Chesapeake Land Cover

class torchgeo.datamodules.ChesapeakeCVPRDataModule(root_dir, train_splits, val_splits, test_splits, patches_per_tile=200, patch_size=256, batch_size=64, num_workers=0, class_set=7, use_prior_labels=False, prior_smoothing_constant=0.0001, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the Chesapeake CVPR Land Cover dataset.

Uses the random splits defined per state to partition tiles into train, val, and test sets.

__init__(root_dir, train_splits, val_splits, test_splits, patches_per_tile=200, patch_size=256, batch_size=64, num_workers=0, class_set=7, use_prior_labels=False, prior_smoothing_constant=0.0001, **kwargs)[source]

Initialize a LightningDataModule for Chesapeake CVPR based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the ChesapeakeCVPR Dataset classes

  • train_splits (List[str]) – The splits used to train the model, e.g. [“ny-train”]

  • val_splits (List[str]) – The splits used to validate the model, e.g. [“ny-val”]

  • test_splits (List[str]) – The splits used to test the model, e.g. [“ny-test”]

  • patches_per_tile (int) – The number of patches per tile to sample

  • patch_size (int) – The size of each patch in pixels (test patches will be 1.5 times this size)

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • class_set (int) – The high-resolution land cover class set to use: 5 or 7

  • use_prior_labels (bool) – Flag for using a prior over high-resolution classes instead of the high-resolution labels themselves

  • prior_smoothing_constant (float) – additive smoothing to add when using prior labels

Raises

ValueError – if use_prior_labels is used with class_set==7
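
Example (a minimal usage sketch; the root path is a hypothetical local directory, the split names come from the examples in the parameter descriptions above, and the data is assumed to already exist under root_dir):

    from torchgeo.datamodules import ChesapeakeCVPRDataModule

    dm = ChesapeakeCVPRDataModule(
        root_dir="data/chesapeake",  # hypothetical path to the ChesapeakeCVPR data
        train_splits=["ny-train"],
        val_splits=["ny-val"],
        test_splits=["ny-test"],
        class_set=5,
        use_prior_labels=True,       # only valid with class_set=5; class_set=7 raises ValueError
        batch_size=32,
        num_workers=4,
    )
    dm.prepare_data()                # confirm the dataset is present (called once per node)
    dm.setup()                       # build the train/val/test datasets
    batch = next(iter(dm.train_dataloader()))  # a dict with "image" and "mask" tensors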

center_crop(size=512)[source]

Returns a function to perform a center crop transform on a single sample.

Parameters

size (int) – output image size

Returns

function to perform center crop

Return type

Callable[[Dict[str, Tensor]], Dict[str, Tensor]]

nodata_check(size=512)[source]

Returns a function to check for nodata or mis-sized input.

Parameters

size (int) – output image size

Returns

function to check for nodata values

Return type

Callable[[Dict[str, Tensor]], Dict[str, Tensor]]

pad_to(size=512, image_value=0, mask_value=0)[source]

Returns a function to perform a padding transform on a single sample.

Parameters
  • size (int) – output image size

  • image_value (int) – value to pad image with

  • mask_value (int) – value to pad mask with

Returns

function to perform padding

Return type

Callable[[Dict[str, Tensor]], Dict[str, Tensor]]

prepare_data()[source]

Confirms that the dataset is downloaded on the local node.

This method is called once per node, while setup() is called once per GPU.

preprocess(sample)[source]

Preprocesses a single sample.

Parameters

sample (Dict[str, Any]) – sample dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

remove_bbox(sample)[source]

Removes the bounding box property from a sample.

Parameters

sample (Dict[str, Any]) – dictionary with geographic metadata

Returns

sample without the bbox property

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

NAIP

class torchgeo.datamodules.NAIPChesapeakeDataModule(naip_root_dir, chesapeake_root_dir, batch_size=64, num_workers=0, patch_size=256, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the NAIP and Chesapeake datasets.

Uses the train/val/test splits from the dataset.

__init__(naip_root_dir, chesapeake_root_dir, batch_size=64, num_workers=0, patch_size=256, **kwargs)[source]

Initialize a LightningDataModule for NAIP and Chesapeake based DataLoaders.

Parameters
  • naip_root_dir (str) – directory containing NAIP data

  • chesapeake_root_dir (str) – directory containing Chesapeake data

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • patch_size (int) – size of patches to sample
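
Example (a minimal sketch; both directory paths are hypothetical):

    from torchgeo.datamodules import NAIPChesapeakeDataModule

    dm = NAIPChesapeakeDataModule(
        naip_root_dir="data/naip",              # hypothetical path to NAIP imagery
        chesapeake_root_dir="data/chesapeake",  # hypothetical path to Chesapeake masks
        batch_size=32,
        num_workers=4,
        patch_size=256,
    )
    dm.prepare_data()  # make sure the dataset is downloaded
    dm.setup()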

chesapeake_transform(sample)[source]

Transform a single sample from the Chesapeake Dataset.

Parameters

sample (Dict[str, Any]) – Chesapeake mask dictionary

Returns

preprocessed Chesapeake data

Return type

Dict[str, Any]

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the NAIP Dataset.

Parameters

sample (Dict[str, Any]) – NAIP image dictionary

Returns

preprocessed NAIP data

Return type

Dict[str, Any]

remove_bbox(sample)[source]

Removes the bounding box property from a sample.

Parameters

sample (Dict[str, Any]) – dictionary with geographic metadata

Returns

sample without the bbox property

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

Non-geospatial DataModules

BigEarthNet

class torchgeo.datamodules.BigEarthNetDataModule(root_dir, bands='all', num_classes=19, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the BigEarthNet dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, bands='all', num_classes=19, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for BigEarthNet based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the BigEarthNet Dataset classes

  • bands (str) – load Sentinel-1 bands, Sentinel-2 bands, or both; one of {s1, s2, all}

  • num_classes (int) – number of classes to load as the target; one of {19, 43}

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
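
Example (illustrative; the root path is an assumption about your local layout):

    from torchgeo.datamodules import BigEarthNetDataModule

    dm = BigEarthNetDataModule(
        root_dir="data/bigearthnet",  # hypothetical path
        bands="s2",                   # Sentinel-2 only; "s1" and "all" are the other options
        num_classes=19,               # or 43
        batch_size=64,
        num_workers=4,
    )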

plot(*args, **kwargs)[source]

Run torchgeo.datasets.BigEarthNet.plot().

New in version 0.2.

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

COWC

class torchgeo.datamodules.COWCCountingDataModule(root_dir, seed, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the COWC Counting dataset.

__init__(root_dir, seed, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for COWC Counting based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the COWCCounting Dataset class

  • seed (int) – The seed value to use when doing the dataset random_split

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
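
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import COWCCountingDataModule

    dm = COWCCountingDataModule(
        root_dir="data/cowc_counting",  # hypothetical path
        seed=0,                         # fixes the random_split partitioning across runs
        batch_size=64,
        num_workers=4,
    )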

plot(*args, **kwargs)[source]

Run torchgeo.datasets.COWC.plot().

New in version 0.2.

prepare_data()[source]

Initialize the main Dataset objects for use in setup().

This includes optionally downloading the dataset. This is done once per node, while setup() is done once per GPU.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and target

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

Deep Globe Land Cover Challenge

class torchgeo.datamodules.DeepGlobeLandCoverDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the DeepGlobe Land Cover dataset.

Uses the train/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for DeepGlobe Land Cover based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the DeepGlobe Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Dict[str, Any]]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Dict[str, Any]]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Dict[str, Any]]

ETCI2021 Flood Detection

class torchgeo.datamodules.ETCI2021DataModule(root_dir, seed=0, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the ETCI2021 dataset.

Splits the existing train split from the dataset into train/val with 80/20 proportions, then uses the existing val dataset as the test data.

New in version 0.2.

__init__(root_dir, seed=0, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for ETCI2021 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the ETCI2021 Dataset classes

  • seed (int) – The seed value to use when doing the dataset random_split

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
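
Example (a minimal sketch; the root path is hypothetical):

    from torchgeo.datamodules import ETCI2021DataModule

    dm = ETCI2021DataModule(root_dir="data/etci2021", seed=0, batch_size=32, num_workers=4)
    dm.prepare_data()  # download/verify the dataset (once per run)
    dm.setup()         # 80/20 train/val split; the dataset's val split becomes the test set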

plot(*args, **kwargs)[source]

Run torchgeo.datasets.ETCI2021.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Notably, moves the given water mask to act as an input layer.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

EuroSAT

class torchgeo.datamodules.EuroSATDataModule(root_dir, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the EuroSAT dataset.

Uses the train/val/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for EuroSAT based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the EuroSAT Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

plot(*args, **kwargs)[source]

Run torchgeo.datasets.EuroSAT.plot().

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

FAIR1M

class torchgeo.datamodules.FAIR1MDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the FAIR1M dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for FAIR1M based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the FAIR1M Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • test_split_pct (float) – What percentage of the dataset to use as a test set
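
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import FAIR1MDataModule

    # with val_split_pct=0.2 and test_split_pct=0.2, the remaining 60% is used for training
    dm = FAIR1MDataModule(
        root_dir="data/fair1m",  # hypothetical path
        batch_size=16,
        num_workers=4,
        val_split_pct=0.2,
        test_split_pct=0.2,
    )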

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

Inria Aerial Image Labeling

class torchgeo.datamodules.InriaAerialImageLabelingDataModule(root_dir, batch_size=32, num_workers=0, val_split_pct=0.1, test_split_pct=0.1, patch_size=512, num_patches_per_tile=32, predict_on='test')

Bases: LightningDataModule

LightningDataModule implementation for the InriaAerialImageLabeling dataset.

Uses the train/test splits from the dataset and further splits the train split into train/val splits.

New in version 0.3.

__init__(root_dir, batch_size=32, num_workers=0, val_split_pct=0.1, test_split_pct=0.1, patch_size=512, num_patches_per_tile=32, predict_on='test')[source]

Initialize a LightningDataModule for InriaAerialImageLabeling.

Parameters
  • root_dir (str) – The root argument to pass to the InriaAerialImageLabeling Dataset classes

  • batch_size (int) – The batch size used in the train DataLoader (val_batch_size == test_batch_size == 1)

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • test_split_pct (float) – What percentage of the dataset to use as a test set

  • patch_size (Union[int, Tuple[int, int]]) – Size of random patch from image and mask (height, width)

  • num_patches_per_tile (int) – Number of random patches per sample

  • predict_on (str) – Directory/Dataset of images to run inference on
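
Example (a minimal sketch; the root path is hypothetical):

    from torchgeo.datamodules import InriaAerialImageLabelingDataModule

    dm = InriaAerialImageLabelingDataModule(
        root_dir="data/inria",    # hypothetical path
        batch_size=16,            # training only; val and test batch size is fixed at 1
        patch_size=(512, 512),    # (height, width) of random patches
        num_patches_per_tile=16,
        predict_on="test",        # images served by predict_dataloader()
    )
    dm.setup()
    predict_loader = dm.predict_dataloader()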

n_random_crop(sample)[source]

Get n random crops.

on_after_batch_transfer(batch, dataloader_idx)[source]

Apply augmentations to batch after transferring to GPU.

Parameters
  • batch (dict) – A batch of data that needs to be altered or augmented.

  • dataloader_idx (int) – The index of the dataloader to which the batch belongs.

Returns

A batch of data

Return type

dict

patch_sample(sample)[source]

Extract patches from single sample.

predict_dataloader()[source]

Return a DataLoader for prediction.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

LandCover.ai

class torchgeo.datamodules.LandCoverAIDataModule(root_dir, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the LandCover.ai dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for LandCover.ai based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the LandCoverAI Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
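
A typical training sketch is shown below; task is a placeholder for a LightningModule assumed to be defined elsewhere, whose training step consumes this datamodule's image/mask batches (for example, a segmentation task from torchgeo.trainers), and the root path is hypothetical:

    import pytorch_lightning as pl
    from torchgeo.datamodules import LandCoverAIDataModule

    dm = LandCoverAIDataModule(root_dir="data/landcoverai", batch_size=32, num_workers=4)
    # `task` is assumed to be a compatible LightningModule defined elsewhere
    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model=task, datamodule=dm)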

on_after_batch_transfer(batch, batch_idx)[source]

Apply batch augmentations after batch is transferred to the device.

Parameters
  • batch (Dict[str, Any]) – mini-batch of data

  • batch_idx (int) – batch index

Returns

augmented mini-batch

Return type

Dict[str, Any]

plot(*args, **kwargs)[source]

Run torchgeo.datasets.LandCoverAI.plot().

New in version 0.2.

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

LoveDA

class torchgeo.datamodules.LoveDADataModule(root_dir, scene, batch_size=32, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the LoveDA dataset.

Uses the train/val/test splits from the dataset.

New in version 0.2.

__init__(root_dir, scene, batch_size=32, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for LoveDA based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to LoveDA Dataset classes

  • scene (List[str]) – specify whether to load only ‘urban’, only ‘rural’, or both

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
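
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import LoveDADataModule

    dm = LoveDADataModule(
        root_dir="data/loveda",    # hypothetical path
        scene=["urban", "rural"],  # or ["urban"] / ["rural"] alone
        batch_size=32,
        num_workers=4,
    )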

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

NASA Marine Debris

class torchgeo.datamodules.NASAMarineDebrisDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the NASA Marine Debris dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, test_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for NASA Marine Debris based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Dataset class

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • test_split_pct (float) – What percentage of the dataset to use as a test set

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

OSCD

class torchgeo.datamodules.OSCDDataModule(root_dir, bands='all', train_batch_size=32, num_workers=0, val_split_pct=0.2, patch_size=(64, 64), num_patches_per_tile=32, pad_size=(1280, 1280), **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the OSCD dataset.

Uses the train/test splits from the dataset and further splits the train split into train/val splits.

New in version 0.2.

__init__(root_dir, bands='all', train_batch_size=32, num_workers=0, val_split_pct=0.2, patch_size=(64, 64), num_patches_per_tile=32, pad_size=(1280, 1280), **kwargs)[source]

Initialize a LightningDataModule for OSCD based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the OSCD Dataset classes

  • bands (str) – “rgb” or “all”

  • train_batch_size (int) – The batch size used in the train DataLoader (val_batch_size == test_batch_size == 1)

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

  • patch_size (Tuple[int, int]) – Size of random patch from image and mask (height, width)

  • num_patches_per_tile (int) – number of random patches per sample

  • pad_size (Tuple[int, int]) – size to pad images to during val/test steps
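
Example (a minimal sketch; the root path is hypothetical):

    from torchgeo.datamodules import OSCDDataModule

    dm = OSCDDataModule(
        root_dir="data/oscd",    # hypothetical path
        bands="rgb",             # or "all"
        train_batch_size=32,     # val/test batch size is fixed at 1
        patch_size=(64, 64),
        num_patches_per_tile=32,
        pad_size=(1280, 1280),   # val/test images are padded to this size
        num_workers=4,
    )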

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

Potsdam

class torchgeo.datamodules.Potsdam2DDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the Potsdam2D dataset.

Uses the train/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for Potsdam2D based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Potsdam2D Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

RESISC45

class torchgeo.datamodules.RESISC45DataModule(root_dir, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the RESISC45 dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for RESISC45 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the RESISC45 Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

on_after_batch_transfer(batch, batch_idx)[source]

Apply batch augmentations after batch is transferred to the device.

Parameters
  • batch (Dict[str, Any]) – mini-batch of data

  • batch_idx (int) – batch index

Returns

augmented mini-batch

Return type

Dict[str, Any]

plot(*args, **kwargs)[source]

Run torchgeo.datasets.RESISC45.plot().

New in version 0.2.

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

SEN12MS

class torchgeo.datamodules.SEN12MSDataModule(root_dir, seed, band_set='all', batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the SEN12MS dataset.

Implements 80/20 geographic train/val splits and uses the test split from the classification dataset definitions. See setup() for more details.

Uses the Simplified IGBP scheme defined in the 2020 Data Fusion Competition. See https://arxiv.org/abs/2002.08254.

__init__(root_dir, seed, band_set='all', batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for SEN12MS based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the SEN12MS Dataset classes

  • seed (int) – The seed value to use when doing the sklearn based ShuffleSplit

  • band_set (str) – The subset of S1/S2 bands to use. Options are: “all”, “s1”, “s2-all”, and “s2-reduced” where the “s2-reduced” set includes: B2, B3, B4, B8, B11, and B12.

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
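
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import SEN12MSDataModule

    dm = SEN12MSDataModule(
        root_dir="data/sen12ms",  # hypothetical path
        seed=0,                   # fixes the sklearn ShuffleSplit used for train/val
        band_set="s2-reduced",    # B2, B3, B4, B8, B11, and B12
        batch_size=64,
        num_workers=4,
    )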

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and mask

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

We split samples between train and val geographically with proportions of 80/20. This mimics the geographic test set split.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

So2Sat

class torchgeo.datamodules.So2SatDataModule(root_dir, batch_size=64, num_workers=0, bands='rgb', unsupervised_mode=False, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the So2Sat dataset.

Uses the train/val/test splits from the dataset.

__init__(root_dir, batch_size=64, num_workers=0, bands='rgb', unsupervised_mode=False, **kwargs)[source]

Initialize a LightningDataModule for So2Sat based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the So2Sat Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • bands (str) – Either “rgb” or “s2”

  • unsupervised_mode (bool) – Makes the train dataloader return imagery from the train, val, and test sets
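
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import So2SatDataModule

    dm = So2SatDataModule(
        root_dir="data/so2sat",   # hypothetical path
        bands="rgb",              # or "s2"
        unsupervised_mode=True,   # train loader draws imagery from train, val, and test
        batch_size=64,
        num_workers=4,
    )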

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

Tropical Cyclone

class torchgeo.datamodules.CycloneDataModule(root_dir, seed, batch_size=64, num_workers=0, api_key=None, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the NASA Cyclone dataset.

Implements 80/20 train/val splits based on hurricane storm ids. See setup() for more details.

__init__(root_dir, seed, batch_size=64, num_workers=0, api_key=None, **kwargs)[source]

Initialize a LightningDataModule for NASA Cyclone based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the TropicalCycloneWindEstimation Dataset classes

  • seed (int) – The seed value to use when doing the sklearn based GroupShuffleSplit

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • api_key (Optional[str]) – The RadiantEarth MLHub API key to use if the dataset needs to be downloaded
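
Example (a minimal sketch; the root path and the environment variable name are illustrative):

    import os

    from torchgeo.datamodules import CycloneDataModule

    dm = CycloneDataModule(
        root_dir="data/cyclone",             # hypothetical path
        seed=0,                              # fixes the GroupShuffleSplit over storm ids
        batch_size=64,
        num_workers=4,
        api_key=os.getenv("MLHUB_API_KEY"),  # illustrative env var holding an MLHub key
    )
    dm.prepare_data()  # downloads the dataset if needed (requires the API key)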

prepare_data()[source]

Initialize the main Dataset objects for use in setup().

This includes optionally downloading the dataset. This is done once per node, while setup() is done once per GPU.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image and target

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Create the train/val/test splits based on the original Dataset objects.

The splits should be done here vs. in __init__() per the docs: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#setup.

We split samples between train and val by the storm_id property, i.e. all samples with the same storm_id value fall entirely in either the train or the val split. This tests one type of generalizability: given a new storm, can we predict its wind speed? The test set, however, contains some storms from the training set (specifically, the latter parts of those storms) as well as some novel storms.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

UC Merced

class torchgeo.datamodules.UCMercedDataModule(root_dir, batch_size=64, num_workers=0, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the UC Merced dataset.

Uses random train/val/test splits.

__init__(root_dir, batch_size=64, num_workers=0, **kwargs)[source]

Initialize a LightningDataModule for UCMerced based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the UCMerced Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

plot(*args, **kwargs)[source]

Run torchgeo.datasets.UCMerced.plot().

New in version 0.2.

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

USAVars

class torchgeo.datamodules.USAVarsDataModule(root_dir, labels=['housing', 'income', 'roads', 'nightlights', 'population', 'elevation', 'treecover'], batch_size=64, num_workers=0)

Bases: LightningDataModule

LightningDataModule implementation for the USAVars dataset.

Uses random train/val/test splits.

New in version 0.3.

__init__(root_dir, labels=['housing', 'income', 'roads', 'nightlights', 'population', 'elevation', 'treecover'], batch_size=64, num_workers=0)[source]

Initialize a LightningDataModule for USAVars based DataLoaders.

Parameters
  • root_dir (str) – The root argument passed to the USAVars Dataset classes

  • labels (Sequence[str]) – The labels argument passed to the USAVars Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders
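
Example (illustrative; the root path is hypothetical):

    from torchgeo.datamodules import USAVarsDataModule

    dm = USAVarsDataModule(
        root_dir="data/usavars",            # hypothetical path
        labels=["treecover", "elevation"],  # any subset of the default label list
        batch_size=64,
        num_workers=4,
    )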

prepare_data()[source]

Make sure that the dataset is downloaded.

This method is only called once per run.

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – dictionary containing image

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

test_dataloader()[source]

Return a DataLoader for testing.

train_dataloader()[source]

Return a DataLoader for training.

val_dataloader()[source]

Return a DataLoader for validation.

Vaihingen

class torchgeo.datamodules.Vaihingen2DDataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the Vaihingen2D dataset.

Uses the train/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for Vaihingen2D based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the Vaihingen Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]

xView2

class torchgeo.datamodules.XView2DataModule(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)

Bases: LightningDataModule

LightningDataModule implementation for the xView2 dataset.

Uses the train/val/test splits from the dataset.

New in version 0.2.

__init__(root_dir, batch_size=64, num_workers=0, val_split_pct=0.2, **kwargs)[source]

Initialize a LightningDataModule for xView2 based DataLoaders.

Parameters
  • root_dir (str) – The root argument to pass to the xView2 Dataset classes

  • batch_size (int) – The batch size to use in all created DataLoaders

  • num_workers (int) – The number of workers to use in all created DataLoaders

  • val_split_pct (float) – What percentage of the dataset to use as a validation set

preprocess(sample)[source]

Transform a single sample from the Dataset.

Parameters

sample (Dict[str, Any]) – input image dictionary

Returns

preprocessed sample

Return type

Dict[str, Any]

setup(stage=None)[source]

Initialize the main Dataset objects.

This method is called once per GPU per run.

Parameters

stage (Optional[str]) – stage to set up

test_dataloader()[source]

Return a DataLoader for testing.

Returns

testing data loader

Return type

DataLoader[Any]

train_dataloader()[source]

Return a DataLoader for training.

Returns

training data loader

Return type

DataLoader[Any]

val_dataloader()[source]

Return a DataLoader for validation.

Returns

validation data loader

Return type

DataLoader[Any]
