Learn to process satellite images with just a few lines of code without downloading it
Introduction
In the STOP and STAC part 1 article, we introduced the concept of STAC (Spatio Temporal Asset Catalog) and discussed an easier way to find and access satellite images without the hassle of downloading huge files.
In Part 2, we introduce some new, cool tools and analyses that make working with satellite images easy and fun. We will access Sentinel-2 images using STAC and Open Data Cube, visualize them as RBG, calculate spectral indices (NDVI and NDWI) over a time series, and export the results.
Let's begin by exploring how STAC and Open Data Cube (ODC) work together and how the odc-stac Python library makes it all happen.
Integrating STAC with Open Data Cube
Remember, STAC helps us find the right images but doesn’t hold the data itself. Now, we’re ready to explore how to bring these images to life and work with them effectively. This is where our new tools come into play: Open Data Cube (ODC), Xarray, and Dask.
As a quick recap, think of STAC as a sophisticated map that points you to the exact location of the data you need across the cloud. Imagine it as the index at the back of a detailed atlas, guiding you to the treasure but not being the treasure itself. If you need a more comprehensive recap on STAC, check this article.
If STAC is the map, then ODC is the treasure chest, unlocking the true value of the data STAC points to. ODC is an open-source solution for accessing, managing, and analyzing large quantities of geospatial data. At its core, the ODC is a set of Python libraries and PostgreSQL database that helps you work with geospatial raster data. It organizes this data efficiently, making it readily accessible for analysis and visualization.
The bridge between STAC and ODC is the odc-stac python library. This library allows you to use STAC to find and describe the data you're interested in and then hands off that description to ODC, which takes care of the heavy lifting in accessing and analyzing the data. This integration means you can easily go from discovering data with STAC to doing meaningful ODC analysis within a streamlined, efficient workflow.
But we still need to handle the accessed data; that's where Xarray and Dask shine!
Xarray and Dask
Xarray steps in as our brilliant assistant to handle this data.
Xarray is a powerful library designed to handle multidimensional arrays with labels. It supports operations on data arrays that include dimensions such as time, latitude, longitude, and spectral bands while maintaining metadata and coordinate systems. This makes it especially useful for managing satellite imagery and other types of geospatial data. By integrating seamlessly with libraries like rioxarray, it adds essential raster functionality, enabling effective spatial operations and transformations directly applicable to remote sensing data. Xarray's ability to handle multiple spectral bands and time series data in a single structure greatly enhances time series analysis, change detection studies, and environmental monitoring workflows.
Then there’s Dask, the powerhouse that helps us work with big datasets without slowing down.
Dask is a flexible library for parallel computing in Python, crucial for working with large datasets typical in remote sensing. It allows for scaling up to larger data sets than would fit in a computer's memory, breaking them into smaller blocks and processing them in parallel. This means you can analyze large swathes of satellite imagery without being restricted by the physical memory limits of your computer. This capability is key for efficiently handling extensive satellite data arrays and speeding up computations like image processing and time series analysis.
Let's Code!
Now, let's dive into the practical part. We will bring satellite images to life, visualize their hidden details, and uncover the stories they tell about our planet. Here's how:
Load Satellite Imagery with odc-stac
Before we fetch our satellite imagery using the odc-stac library, let's revisit some of the code from Part 1, where we interacted with STAC to query the data.
With our query set up, we can now access our STAC items to gather important details about our data, such as spectral band names, pixel resolution, and other relevant metadata. These details will serve as input parameters for the odc-stac tool.
Explaining our inputs:
items - The STAC items to load. These should be a collection of STAC items representing satellite imagery.
bands - List of band names or identifiers to retrieve from each STAC item.
crs - Coordinate Reference System to which all the data should be reprojected.
resolution - Spatial resolution to which the imagery should be resampled.
chunks - A dictionary specifying how to chunk the data for Dask. Empty means use default chunking strategy.
groupby - Grouping strategy for data. 'solar_day' groups data by each day.
bbox - Spatial bounding box to filter the loaded imagery by geographical area.
This is what our output dataset looks like:
Once loaded, our data is transformed into a Dask Xarray dataset, ready for processing. But first, let’s get a visual by creating and viewing an RGB composite image. This will give us a preliminary look at the data we're about to dive deeper into.
Create an RGB Composite Image
Now, let’s use common Python libraries such as numpy and matplotlib to create an RGB composite image of our area.
We start by stacking the Red, Green, and Blue bands using NumPy's dstack function, selecting a specific time slice with xarray's isel, and extracting the data values. Next, we normalize these values to enhance image contrast, ensuring all pixel values fall within the optimal range [0, 1] for display. Finally, we visualize our composed image with Matplotlib.
Practical Applications: Calculating Spectral Indices - NDVI and NDWI
Spectral indices are powerful tools in remote sensing. They provide insights into various environmental conditions by highlighting specific features in satellite imagery. Two of the most commonly used indices are the Normalized Difference Vegetation Index (NDVI) (Rouse, 1994), used for assessing vegetation health, and the Normalized Difference Water Index (NDWI) (McFetters, 1996), used to monitor moisture content and the presence of water in landscapes.
A great thing about working with Xarray is that you can calculate your spectral indices for the whole time series with one line of code! Here's how:
Visualizing and Exporting results as TIFF
To visualize our results, we need to select a given time step and remember that we have calculated the spectral indices for the entire time series. Let's keep using index = 3.
After visualizing your indices, you might want to save your results for further analysis or to share them with others. With rioxarray, exporting your data to a TIFF file is straightforward. Let's save our NDVI calculation as a TIFF.
This process creates a geospatially referenced TIFF file of your NDVI calculation, which most GIS software can easily open and analyse.
Final Thoughts
And just like that, we've accessed and processed complex satellite data with a few lines of code! I hope this two-part article has shown how accessible and powerful modern remote sensing can be. Whether you're monitoring ecosystems or planning urban developments, remember that great insights often start with simple curiosity and the right tools. Keep exploring, and let each dataset lead you to new discoveries. Happy coding, and see you soon!
As per usual, here is the full code for this article:
References
Rouse, J.W., R.H. Haas, J.A. Scheel, and D.W. Deering. Monitoring vegetation systems in the Great Plains with ERTS. Proceedings, 3rd Earth Resource Technology Satellite (ERTS) Symposium 1974, 1, 48–62.
McFeeters, S.K. The Use of the Normalized Difference Water Index (NDWI) in the Delineation of Open Water Features. International Journal of Remote Sensing 1996 17, 1425-1432.
Commentaires