PS3 — Development of Surface AQI & Identification of HCHO Hotspots over India using Satellite Data
Part of Bharatiya Antariksh Hackathon 2026 (team The Bayesian Blend). See the invite, the sibling note ps7-exoplanet-light-curves, and the decision-ps3-vs-ps7 writeup.
The one-line crux (from the ISRO session)
**Satellites measure columnar concentrations; AQI needs surface concentrations. The whole problem is converting column → surface using ML, trained/validated against CPCB ground stations, so we can map air quality even >100 km from any monitoring station** (most stations sit only inside cities).
Two objectives:
Objective 1 — Surface AQI
Convert columnar satellite measurements to surface concentrations of NO₂, O₃, PM2.5 etc., using meteorology as the bridge, then assemble a surface AQI map over India.
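A quick physical baseline helps show why meteorology is the bridge. The sketch below assumes the whole tropospheric column is evenly mixed inside the boundary layer, which is a crude assumption of mine and not something the problem statement prescribes. The function name, the use of boundary-layer height as the met variable, and the example numbers are all illustrative; the ML model is meant to learn a much better mapping than this.

```python
# Naive "well-mixed layer" baseline: spread the satellite column evenly
# through the boundary layer to get a rough surface concentration.
# ASSUMPTION: the entire column sits inside the boundary layer.

NO2_MOLAR_MASS_G_PER_MOL = 46.0055  # molar mass of NO2


def column_to_surface_ugm3(column_mol_m2, pbl_height_m,
                           molar_mass_g_mol=NO2_MOLAR_MASS_G_PER_MOL):
    """Convert a column (mol/m^2) to a surface concentration (ug/m^3)."""
    if pbl_height_m <= 0:
        raise ValueError("boundary-layer height must be positive")
    grams_per_m2 = column_mol_m2 * molar_mass_g_mol
    return grams_per_m2 * 1e6 / pbl_height_m  # g -> ug, then divide by depth


# Same column, two different (made-up) boundary-layer heights.
shallow = column_to_surface_ugm3(1e-4, 500.0)   # shallow winter-like layer
deep = column_to_surface_ugm3(1e-4, 2000.0)     # deep summer-like layer
print(f"shallow: {shallow:.2f} ug/m3, deep: {deep:.2f} ug/m3")
```

The same column gives a four-times-higher surface value under the shallow layer, which is the intuition behind feeding meteorology to the model alongside the satellite columns.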
flowchart TD
A["INSAT-3D AOD<br/>(columnar aerosol)"] --> P
B["Sentinel-5P TROPOMI<br/>NO2 · CO · HCHO · O3 columns"] --> P
C["ERA5 / IMDAA / MERRA-2 met<br/>T · RH · wind"] --> P
P["Preprocessing<br/>spatial align · temporal sync · feature eng"] --> M
M["AI/ML model<br/>CNN · LSTM · CNN-LSTM"] --> S["Predicted SURFACE<br/>NO2 / O3 / PM2.5"]
D["CPCB ground stations<br/>(public)"] --> V
S --> V["Validate<br/>RMSE · MAE · Bias"]
V --> AQI["Surface AQI maps over India"]
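The "spatial align" step and the ">100 km from any station" point can be sketched with a nearest-station lookup. This is a minimal stdlib sketch, assuming station and grid-cell coordinates are plain latitude/longitude pairs; the station list uses approximate city-centre coordinates as placeholders, not real CPCB station locations, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return (station_id, distance_km) for the closest station."""
    best = min(stations, key=lambda s: haversine_km(lat, lon, s[1], s[2]))
    return best[0], haversine_km(lat, lon, best[1], best[2])


# Placeholder stations: (id, lat, lon). Coordinates are illustrative only.
stations = [("delhi", 28.61, 77.21), ("lucknow", 26.85, 80.95)]

# A satellite pixel near Delhi pairs with the Delhi station for training.
sid, dist = nearest_station(28.70, 77.10, stations)

# A far-away pixel has no nearby ground truth: prediction-only territory.
remote_id, remote_dist = nearest_station(22.0, 71.0, stations)
is_unmonitored = remote_dist > 100.0
```

Pixels that pair with a station within some radius become training and validation samples; pixels like the remote one are where the model has to extrapolate.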
Objective 2 — HCHO (formaldehyde) hotspots
Formaldehyde (HCHO) is a key VOC and an indicator of ozone-forming chemistry; biomass burning (crop-residue and forest fires) produces sharp spikes in it.
flowchart TD
H["TROPOMI / OMI / GOME-2<br/>HCHO columns"] --> X
F["MODIS / VIIRS<br/>fire counts (FIRMS)"] --> X
X["Hotspot detection<br/>thresholds / clustering"] --> R["High-res HCHO hotspot maps"]
R --> SRC["Source regions:<br/>Indo-Gangetic Plain · forest-fire zones"]
W["Wind / reanalysis"] --> T["Transport influence + fire↔HCHO correlation"]
X --> T
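The "thresholds" option for hotspot detection and the fire↔HCHO correlation can be sketched on a toy grid. This is a minimal sketch, assuming a simple mean + k·std threshold (my choice for illustration; clustering or percentile thresholds are equally plausible) and synthetic 3×3 grids in arbitrary units. Function names are hypothetical.

```python
import math
import statistics


def hotspot_mask(grid, k=1.5):
    """Flag cells whose value exceeds mean + k * std of the whole grid."""
    values = [v for row in grid for v in row]
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    mask = [[v > threshold for v in row] for row in grid]
    return mask, threshold


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic HCHO columns (arbitrary units) and fire counts on the same grid.
hcho = [[1.0, 1.1, 0.9],
        [1.0, 3.0, 1.2],
        [0.9, 1.0, 1.1]]
fires = [[0, 1, 0],
         [0, 9, 1],
         [0, 0, 1]]

mask, threshold = hotspot_mask(hcho)
n_hotspots = sum(cell for row in mask for cell in row)

# Flatten both grids cell-by-cell to correlate fire activity with HCHO.
r = pearson_r([f for row in fires for f in row],
              [h for row in hcho for h in row])
```

On real data the grids would come from regridded TROPOMI and FIRMS products, and the wind field would be needed to separate local emission from transported HCHO, which this toy example does not attempt.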
Data (all public — you assemble it)
- INSAT-3D AOD — MOSDAC
- Sentinel-5P / TROPOMI (NO₂, SO₂, CO, O₃, HCHO) — Earth Engine / DLR
- CPCB ground stations — CAAQM repository
- MODIS / VIIRS fire counts — FIRMS
- Reanalysis met — ERA5 / IMDAA / MERRA-2
Evaluation
- Objective 1: RMSE, MAE, and correlation coefficient (R) against CPCB ground truth.
- Objective 2: hotspot detection accuracy/clarity, multi-source integration, scientific interpretation, visualisation, methodology innovation.
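For Objective 1, the metrics are cheap to compute once predictions are paired with station readings. A minimal stdlib sketch, assuming `pred` and `obs` are already collocated, equal-length lists in the same units; the sample values are made up, and mean bias is included because it appears in the validation step of the Objective 1 flowchart.

```python
import math


def validation_metrics(pred, obs):
    """Return RMSE, MAE, mean bias (pred - obs) and Pearson R."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and equal length")
    n = len(obs)
    errors = [p - o for p, o in zip(pred, obs)]
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    r = cov / math.sqrt(var_p * var_o)
    return {"rmse": rmse, "mae": mae, "bias": bias, "r": r}


# Made-up surface NO2 values (ug/m^3): station readings vs model output.
obs = [40.0, 55.0, 70.0, 90.0]
pred = [42.0, 53.0, 74.0, 86.0]
metrics = validation_metrics(pred, obs)
```

Reporting bias next to RMSE and MAE shows whether the model systematically over- or under-predicts, which the two error magnitudes alone cannot reveal.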
Why this fits me
- Prior win in this domain. Our SIH 2025 win at ISRO SAC was 24/48-hr surface NO₂/O₃ forecasting for Delhi: fusing reanalysis met with satellite columns (NO₂/CO/HCHO), a preprocessing pipeline, and RMSE/MAE/Bias validation against ground stations. PS3 has the same crux (column→surface, plus HCHO), so the pipeline is reusable and the win gives us credibility with the judges.
- Hardware: GPU-optional (CNN-LSTM or gradient boosting). The pinch is multi-source ETL across 5+ portals and raster scratch space, not compute.
- Crowd: likely thinner. The high domain barrier (atmospheric chemistry, messy multi-source fusion) deters casual teams, and a thin field is exactly where my moat is strongest.
- Honest caveat: it's a repeat of work I've already done — safe, but less growth/novelty than ps7-exoplanet-light-curves.
Tools
Python, Google Earth Engine, xarray/rasterio, CNN-LSTM (PyTorch/TF) or XGBoost, plus my SIH air-quality pipeline as the base.