Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Benchmarking
This tutorial benchmarks the performance of various sampling strategies, with and without caching.
We recommend running this notebook on Google Colab if you don’t have your own GPU.
Setup
First, we install TorchGeo.
[1]:
%pip install torchgeo
Imports
Next, we import TorchGeo and any other libraries we need.
[2]:
import os
import tempfile
import time
from typing import Tuple
from torch.utils.data import DataLoader
from torchgeo.datasets import NAIP, ChesapeakeDE
from torchgeo.datasets.utils import download_url, stack_samples
from torchgeo.samplers import RandomGeoSampler, GridGeoSampler, RandomBatchGeoSampler
Datasets
For this tutorial, we’ll be using imagery from the National Agriculture Imagery Program (NAIP) and labels from the Chesapeake Bay High-Resolution Land Cover Project. First, we manually download a few NAIP tiles.
[3]:
data_root = tempfile.gettempdir()
naip_root = os.path.join(data_root, "naip")
naip_url = (
    "https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/"
)
tiles = [
    "m_3807511_ne_18_060_20181104.tif",
    "m_3807511_se_18_060_20181104.tif",
    "m_3807512_nw_18_060_20180815.tif",
    "m_3807512_sw_18_060_20180815.tif",
]
for tile in tiles:
    download_url(naip_url + tile, naip_root)
Next, we tell TorchGeo to automatically download the corresponding Chesapeake labels.
[4]:
chesapeake_root = os.path.join(data_root, "chesapeake")
chesapeake = ChesapeakeDE(chesapeake_root, download=True)
Timing function
[5]:
def time_epoch(dataloader: DataLoader) -> Tuple[float, int]:
    """Iterate over one epoch and return (elapsed seconds, number of batches)."""
    tic = time.time()
    i = 0
    for _ in dataloader:
        i += 1
    toc = time.time()
    return toc - tic, i
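The function above times one full pass over a dataloader and counts the batches it yields. As a hedged variation that is not part of the original notebook, the same pattern can be written with `time.perf_counter`, a monotonic clock intended for measuring intervals, and can accept any iterable so it can be tried without a GPU or dataset. The name `time_epoch_sketch` and the dummy batches are hypothetical.

```python
import time
from typing import Iterable, Tuple


def time_epoch_sketch(batches: Iterable) -> Tuple[float, int]:
    """Return (elapsed seconds, number of items) for one pass over `batches`."""
    start = time.perf_counter()
    count = 0
    for _ in batches:
        count += 1
    return time.perf_counter() - start, count


# Stand-in for a DataLoader: a list of five dummy batches of 12 items each.
duration, count = time_epoch_sketch([[0] * 12 for _ in range(5)])
print(f"{count} batches in {duration:.6f} s")
```

Because the loop body does nothing with each batch, the measured time reflects only the cost of producing the batches, which is what the benchmarks in this tutorial compare.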
RandomGeoSampler
[6]:
for cache in [False, True]:
    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)
    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)
    dataset = chesapeake & naip
    sampler = RandomGeoSampler(dataset, size=1000, length=888)
    dataloader = DataLoader(
        dataset, batch_size=12, sampler=sampler, collate_fn=stack_samples
    )
    duration, count = time_epoch(dataloader)
    print(duration, count)
296.582683801651 74
54.20210099220276 74
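Each output line shows the epoch duration in seconds followed by the number of batches, first without caching and then with it. The batch count follows from the settings in the cell above: `length=888` samples grouped into batches of 12. A minimal check of that arithmetic, assuming the `DataLoader` keeps its default `drop_last=False`:

```python
import math

length = 888      # samples drawn per epoch by the sampler above
batch_size = 12   # samples per batch in the DataLoader above

# With drop_last=False, a DataLoader yields ceil(length / batch_size) batches.
num_batches = math.ceil(length / batch_size)
print(num_batches)  # 74
```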
GridGeoSampler
[7]:
for cache in [False, True]:
    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)
    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)
    dataset = chesapeake & naip
    sampler = GridGeoSampler(dataset, size=1000, stride=500)
    dataloader = DataLoader(
        dataset, batch_size=12, sampler=sampler, collate_fn=stack_samples
    )
    duration, count = time_epoch(dataloader)
    print(duration, count)
391.90197944641113 74
118.0611424446106 74
RandomBatchGeoSampler
[8]:
for cache in [False, True]:
    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)
    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)
    dataset = chesapeake & naip
    sampler = RandomBatchGeoSampler(dataset, size=1000, batch_size=12, length=888)
    dataloader = DataLoader(dataset, batch_sampler=sampler, collate_fn=stack_samples)
    duration, count = time_epoch(dataloader)
    print(duration, count)
230.51380324363708 74
53.99923872947693 74
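To compare the three samplers at a glance, the durations printed above can be turned into speedup factors. The sketch below only reuses the numbers reported in this notebook. They come from a single run, so treat the ratios as rough indications rather than stable measurements.

```python
# Epoch durations in seconds, copied from the outputs above:
# (without cache, with cache).
timings = {
    "RandomGeoSampler": (296.582683801651, 54.20210099220276),
    "GridGeoSampler": (391.90197944641113, 118.0611424446106),
    "RandomBatchGeoSampler": (230.51380324363708, 53.99923872947693),
}

# Speedup = uncached duration divided by cached duration.
speedups = {
    name: uncached / cached for name, (uncached, cached) in timings.items()
}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.1f}x faster with caching")
```

In this run, caching shortened the epoch for every sampler, by roughly 5.5x for `RandomGeoSampler`, 3.3x for `GridGeoSampler`, and 4.3x for `RandomBatchGeoSampler`.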