Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Benchmarking
This tutorial benchmarks several TorchGeo sampling strategies by timing one full pass over a data loader for each sampler, with and without dataset caching.
If you don’t have your own GPU, we recommend running this notebook on Google Colab.
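The benchmarks below toggle the cache flag of the datasets. As a rough mental model, caching can be thought of as memoizing an expensive per-file load, so that repeated samples from the same tile skip that work. This is an assumption, not a description of TorchGeo's internals. The sketch uses functools.lru_cache, with a hypothetical load_tile function standing in for opening a GeoTIFF:

```python
import functools

# Counts how often the (hypothetical) expensive load actually runs
load_calls = {"count": 0}


def load_tile(path: str) -> str:
    """Hypothetical stand-in for opening (and possibly reprojecting) a raster file."""
    load_calls["count"] += 1
    return f"handle for {path}"


@functools.lru_cache(maxsize=128)
def cached_load_tile(path: str) -> str:
    """Memoized wrapper: each distinct path is loaded once while it stays in the cache."""
    return load_tile(path)


# Four sample requests that touch only two distinct tiles
requests = ["tile_a.tif", "tile_b.tif", "tile_a.tif", "tile_a.tif"]

for path in requests:
    load_tile(path)  # uncached: one load per request
uncached_loads = load_calls["count"]

load_calls["count"] = 0
for path in requests:
    cached_load_tile(path)  # cached: one load per distinct tile
cached_loads = load_calls["count"]

print(uncached_loads, cached_loads)  # 4 2
```

The saving grows with the number of samples that fall on the same file. Samplers that revisit the same tiles often therefore have the most to gain from caching.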
Setup
First, we install TorchGeo.
```python
%pip install torchgeo
```
Imports
Next, we import TorchGeo and any other libraries we need.
```python
import os
import tempfile
import time
from typing import Tuple

from torch.utils.data import DataLoader

from torchgeo.datasets import NAIP, ChesapeakeDE
from torchgeo.datasets.utils import download_url, stack_samples
from torchgeo.samplers import RandomGeoSampler, GridGeoSampler, RandomBatchGeoSampler
```
Datasets
For this tutorial, we’ll be using imagery from the National Agriculture Imagery Program (NAIP) and labels from the Chesapeake Bay High-Resolution Land Cover Project. First, we manually download a few NAIP tiles.
```python
naip_root = os.path.join(tempfile.gettempdir(), "naip")
naip_url = (
    "https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/"
)
tiles = [
    "m_3807511_ne_18_060_20181104.tif",
    "m_3807511_se_18_060_20181104.tif",
    "m_3807512_nw_18_060_20180815.tif",
    "m_3807512_sw_18_060_20180815.tif",
]
for tile in tiles:
    download_url(naip_url + tile, naip_root)
```
Next, we tell TorchGeo to automatically download the corresponding Chesapeake labels.
```python
chesapeake_root = os.path.join(tempfile.gettempdir(), "chesapeake")
chesapeake = ChesapeakeDE(chesapeake_root, download=True)
```
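Timing function
This helper iterates over a data loader once and returns the elapsed wall-clock time and the number of batches it produced. It uses time.perf_counter, which is monotonic and intended for measuring short durations.
```python
def time_epoch(dataloader: DataLoader) -> Tuple[float, int]:
    """Iterate over one epoch and return (elapsed seconds, number of batches)."""
    tic = time.perf_counter()
    i = 0
    for _ in dataloader:
        i += 1
    toc = time.perf_counter()
    return toc - tic, i
```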
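A single pass can be noisy, especially the first one, when files are opened for the first time. A minimal sketch of a more robust variant repeats the pass several times and reports the median. The name time_epochs and its signature are hypothetical. It accepts any callable that returns a re-iterable loader:

```python
import statistics
import time
from typing import Callable, Iterable, List, Tuple


def time_epochs(
    make_loader: Callable[[], Iterable], repeats: int = 3
) -> Tuple[float, List[float], int]:
    """Time several full passes; return (median seconds, all timings, batches in last pass)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: List[float] = []
    count = 0
    for _ in range(repeats):
        loader = make_loader()  # fresh iterable for every pass
        count = 0
        tic = time.perf_counter()
        for _batch in loader:
            count += 1
        timings.append(time.perf_counter() - tic)
    return statistics.median(timings), timings, count


# Stand-in loader: range(74) behaves like a loader yielding 74 "batches"
median_s, timings, batches = time_epochs(lambda: range(74), repeats=5)
print(batches, len(timings))  # 74 5
```

In the benchmark, time_epochs(lambda: dataloader) would reuse the existing DataLoader, which can be iterated repeatedly. With cache=True, passes after the first may run against a warm cache, so the median would reflect warm-cache speed rather than the cold first epoch.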
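The following variables control the workload of each benchmark:
- size is the side length of each square patch.
- length is the number of patches the random samplers draw per epoch.
- batch_size is the number of patches per batch.
- stride is the distance between neighboring patches for GridGeoSampler. It therefore determines how many patches that sampler yields per epoch.
```python
size = 1000
length = 888
batch_size = 12
stride = 500
```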
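With these values, the number of batches per epoch can be worked out in advance. A DataLoader keeps a final partial batch unless drop_last=True, so it yields ceil(length / batch_size) batches. For the grid, the sketch gives an idealized per-axis count that includes only windows lying completely inside a tile. That formula and the helper names are assumptions for illustration, and TorchGeo's exact edge handling may differ between versions:

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader produces from num_samples sampler indices."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def windows_per_axis(extent: float, size: float, stride: float) -> int:
    """Idealized count of windows of width size, spaced stride apart, fully inside extent."""
    if extent < size:
        return 0
    return math.floor((extent - size) / stride) + 1


# RandomGeoSampler benchmark: 888 patches in batches of 12
print(batches_per_epoch(888, 12))  # 74

# Hypothetical square tile 5000 units on a side, with size=1000 and stride=500
per_axis = windows_per_axis(5000, 1000, 500)
print(per_axis, per_axis**2)  # 9 81
```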
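RandomGeoSampler
RandomGeoSampler draws length patches from random locations, and the DataLoader groups them into batches of batch_size.
```python
for cache in [False, True]:
    # Re-create both datasets with the current caching setting
    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)
    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)
    # Combine labels and imagery; samples come only from where both overlap
    dataset = chesapeake & naip
    sampler = RandomGeoSampler(dataset, size=size, length=length)
    dataloader = DataLoader(
        dataset, batch_size=batch_size, sampler=sampler, collate_fn=stack_samples
    )
    duration, count = time_epoch(dataloader)
    print(f"cache={cache}: {duration:.2f} s for {count} batches")
```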
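To make the timings easier to interpret, here is a simplified, pure-Python illustration of random geospatial sampling. It is not TorchGeo's code. For each sample it picks one tile at random, then places a square box of side size entirely inside that tile. The Box type, the example tile bounds, and random_boxes are hypothetical, and choosing tiles uniformly is a simplifying assumption:

```python
import random
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned bounding box, for illustration only."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def random_boxes(tiles: List[Box], size: float, length: int, seed: int = 0) -> List[Box]:
    """Draw length square boxes of side size, each fully inside a randomly chosen tile.

    Assumes every tile is at least size wide and tall.
    """
    rng = random.Random(seed)
    boxes = []
    for _ in range(length):
        tile = rng.choice(tiles)
        # Lower-left corner chosen so the whole box stays inside the tile
        minx = rng.uniform(tile.minx, tile.maxx - size)
        miny = rng.uniform(tile.miny, tile.maxy - size)
        boxes.append(Box(minx, minx + size, miny, miny + size))
    return boxes


# Two hypothetical adjacent tiles, 5000 units on a side
example_tiles = [Box(0, 5000, 0, 5000), Box(5000, 10000, 0, 5000)]
boxes = random_boxes(example_tiles, size=1000, length=888)
print(len(boxes))  # 888
```

Because consecutive samples can land on different tiles, an uncached dataset may reopen files often. This is one plausible reason the cached run may differ from the uncached one.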
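GridGeoSampler
GridGeoSampler covers each tile with a regular grid of patches spaced stride apart. The number of batches per epoch therefore depends on the extent of the data rather than on length.
```python
for cache in [False, True]:
    # Re-create both datasets with the current caching setting
    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)
    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)
    # Combine labels and imagery; samples come only from where both overlap
    dataset = chesapeake & naip
    sampler = GridGeoSampler(dataset, size=size, stride=stride)
    dataloader = DataLoader(
        dataset, batch_size=batch_size, sampler=sampler, collate_fn=stack_samples
    )
    duration, count = time_epoch(dataloader)
    print(f"cache={cache}: {duration:.2f} s for {count} batches")
```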
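Grid sampling is deterministic. A minimal sketch for a single tile generates windows in row-major order. The function name and the tile bounds are hypothetical, and emitting only windows that fit completely inside the tile is a simplifying assumption about edge handling. With size=1000 and stride=500, neighboring windows overlap by half, so most pixels are read more than once:

```python
from typing import Iterator, Tuple


def grid_boxes(
    minx: float, maxx: float, miny: float, maxy: float, size: float, stride: float
) -> Iterator[Tuple[float, float, float, float]]:
    """Yield (minx, maxx, miny, maxy) for size x size windows spaced stride apart.

    Only windows that fit completely inside the tile are produced.
    """
    y = miny
    while y + size <= maxy:
        x = minx
        while x + size <= maxx:
            yield (x, x + size, y, y + size)
            x += stride
        y += stride


# Hypothetical square tile 5000 units on a side
grid_windows = list(grid_boxes(0, 5000, 0, 5000, size=1000, stride=500))
print(len(grid_windows))  # 81
print(grid_windows[0], grid_windows[-1])  # (0, 1000, 0, 1000) (4000, 5000, 4000, 5000)
```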
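RandomBatchGeoSampler
RandomBatchGeoSampler yields whole batches of patch locations rather than single ones. It is therefore passed to the DataLoader as batch_sampler, and batch_size is given to the sampler instead of the loader.
```python
for cache in [False, True]:
    # Re-create both datasets with the current caching setting
    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)
    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)
    # Combine labels and imagery; samples come only from where both overlap
    dataset = chesapeake & naip
    sampler = RandomBatchGeoSampler(
        dataset, size=size, batch_size=batch_size, length=length
    )
    # batch_sampler is mutually exclusive with batch_size, shuffle, sampler, and drop_last
    dataloader = DataLoader(dataset, batch_sampler=sampler, collate_fn=stack_samples)
    duration, count = time_epoch(dataloader)
    print(f"cache={cache}: {duration:.2f} s for {count} batches")
```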
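Our understanding is that RandomBatchGeoSampler draws all patches in a batch from the same tile. Treat this as an assumption and check it against the TorchGeo version you use. A simplified pure-Python sketch of that idea yields lists of boxes, one list per batch, and picks a tile once per batch. The names and tile bounds are hypothetical:

```python
import random
from typing import Iterator, List, Tuple

BoxT = Tuple[float, float, float, float]  # (minx, maxx, miny, maxy)


def random_batches(
    tiles: List[BoxT], size: float, batch_size: int, num_batches: int, seed: int = 0
) -> Iterator[List[BoxT]]:
    """Yield num_batches lists of batch_size boxes; all boxes in a batch share one tile.

    Assumes every tile is at least size wide and tall.
    """
    rng = random.Random(seed)
    for _batch_index in range(num_batches):
        # One tile per batch, so a batch touches a single file
        minx, maxx, miny, maxy = rng.choice(tiles)
        batch = []
        for _sample_index in range(batch_size):
            x = rng.uniform(minx, maxx - size)
            y = rng.uniform(miny, maxy - size)
            batch.append((x, x + size, y, y + size))
        yield batch


demo_tiles = [(0, 5000, 0, 5000), (5000, 10000, 0, 5000)]
batches = list(random_batches(demo_tiles, size=1000, batch_size=12, num_batches=74))
print(len(batches), len(batches[0]))  # 74 12
```

Grouping a batch on one tile reduces the number of distinct files read per step. Whether that helps more or less than caching is exactly what the timings above measure.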