PS3 — Development of Surface AQI & Identification of HCHO Hotspots over India using Satellite Data

Part of Bharatiya Antariksh Hackathon 2026 (team The Bayesian Blend). See invite, the sibling note ps7-exoplanet-light-curves, and the decision-ps3-vs-ps7 writeup.

The one-line crux (from the ISRO session)

**Satellites measure columnar concentrations; AQI needs surface concentrations. The whole problem is converting column → surface with ML, trained and validated against CPCB ground stations, so that air quality can be mapped even >100 km from the nearest monitoring station** (most stations sit only inside cities).
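
This is a minimal sketch of the column → surface step. It assumes the satellite column and the meteorology have already been collocated in space and time with CPCB stations. It fits a plain linear least-squares model, which is a swapped-in baseline and not the CNN-LSTM or XGBoost model the note plans to use. The function names `fit_column_to_surface` and `predict_surface` are hypothetical.

```python
import numpy as np

def _design_matrix(column, met):
    """Stack an intercept, the satellite column value, and meteorological covariates."""
    column = np.asarray(column, dtype=float)
    met = np.asarray(met, dtype=float).reshape(len(column), -1)
    return np.column_stack([np.ones_like(column), column, met])

def fit_column_to_surface(column, met, surface):
    """Least-squares fit of surface ≈ w0 + w1*column + w_met · met.

    column  : (n,) collocated satellite values (e.g. a TROPOMI NO2 column)
    met     : (n, k) meteorology at the same place/time (e.g. T, RH, wind)
    surface : (n,) CPCB ground-station concentrations (the training target)
    """
    X = _design_matrix(column, met)
    coef, *_ = np.linalg.lstsq(X, np.asarray(surface, dtype=float), rcond=None)
    return coef

def predict_surface(coef, column, met):
    """Apply fitted coefficients to new (e.g. off-network) grid cells."""
    return _design_matrix(column, met) @ coef
```

Once fitted at station locations, `predict_surface` is applied to every grid cell, including cells far from any station. A linear model only shows the data flow. The meteorology-dependent, non-linear part of the column → surface relationship is what the ML model is meant to learn.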

Two objectives:

Objective 1 — Surface AQI

Convert columnar satellite measurements to surface concentrations of NO₂, O₃, PM2.5 etc., using meteorology as the bridge, then assemble a surface AQI map over India.
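
Assembling the AQI from predicted surface concentrations follows the CPCB National AQI pattern. Each pollutant gets a sub-index by linear interpolation inside its concentration band, and the AQI is the maximum sub-index. This sketch covers only PM2.5 and NO₂ (24-h, µg/m³). The breakpoints are my recollection of the CPCB table, written as continuous segments, and must be checked against the official document. The Severe band is simply clamped to 400 because its upper concentration bound is not encoded here. As I understand it, CPCB also requires a minimum number of pollutants, including PM2.5 or PM10, for a valid AQI. This sketch does not enforce that rule. The names `sub_index` and `aqi` are hypothetical.

```python
from bisect import bisect_right

# ASSUMED breakpoints (verify against the official CPCB National AQI table).
BREAKPOINTS = {
    "PM2.5": [0, 30, 60, 90, 120, 250],
    "NO2":   [0, 40, 80, 180, 280, 400],
}
# Index value at each breakpoint: Good, Satisfactory, Moderate, Poor, Very Poor edges.
INDEX_EDGES = [0, 50, 100, 200, 300, 400]

def sub_index(pollutant, conc):
    """Linearly interpolate a concentration onto the AQI scale within its band."""
    bp = BREAKPOINTS[pollutant]
    conc = max(float(conc), 0.0)
    if conc >= bp[-1]:
        # Severe band: clamped to its lower edge in this sketch.
        return float(INDEX_EDGES[-1])
    i = bisect_right(bp, conc) - 1
    c_lo, c_hi = bp[i], bp[i + 1]
    i_lo, i_hi = INDEX_EDGES[i], INDEX_EDGES[i + 1]
    return i_lo + (conc - c_lo) * (i_hi - i_lo) / (c_hi - c_lo)

def aqi(concs):
    """Overall AQI = max sub-index; also return the dominant pollutant."""
    subs = {p: sub_index(p, c) for p, c in concs.items()}
    dominant = max(subs, key=subs.get)
    return subs[dominant], dominant
```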

flowchart TD
  A["INSAT-3D AOD<br/>(columnar aerosol)"] --> P
  B["Sentinel-5P TROPOMI<br/>NO2 · CO · HCHO · O3 columns"] --> P
  C["ERA5 / IMDAA / MERRA-2 met<br/>T · RH · wind"] --> P
  P["Preprocessing<br/>spatial align · temporal sync · feature eng"] --> M
  M["AI/ML model<br/>CNN · LSTM · CNN-LSTM"] --> S["Predicted SURFACE<br/>NO2 / O3 / PM2.5"]
  D["CPCB ground stations<br/>(public)"] --> V
  S --> V["Validate<br/>RMSE · MAE · Bias"]
  V --> AQI["Surface AQI maps over India"]

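The Validate step reduces to three numbers over collocated prediction/observation pairs at CPCB stations. This is a minimal dependency-free sketch. The sign convention (Bias = mean of prediction minus observation, so positive means overestimation) is my choice and is not fixed by the problem statement. `validation_metrics` is a hypothetical helper.

```python
import math

def validation_metrics(pred, obs):
    """RMSE, MAE and mean bias between predicted and observed surface values."""
    if len(pred) != len(obs) or len(pred) == 0:
        raise ValueError("pred and obs must be non-empty and equally long")
    errors = [p - o for p, o in zip(pred, obs)]  # positive = model overestimates
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "Bias": sum(errors) / n,
    }
```

Reporting Bias next to RMSE matters because errors can cancel. A model can have near-zero Bias and still a large RMSE.
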
Objective 2 — HCHO (formaldehyde) hotspots

Formaldehyde is a key VOC and ozone-chemistry indicator; biomass burning (crop-residue burning and forest fires) makes it spike.
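
One common way HCHO serves as an ozone-chemistry indicator is the HCHO/NO₂ column ratio (FNR) computed from the same TROPOMI overpass. Low ratios point to VOC-limited ozone production and high ratios point to NOx-limited production. The cut-offs of 1 and 2 below are the commonly cited satellite-based values, treated here as assumptions rather than settled thresholds for India. `ozone_regime` is a hypothetical helper.

```python
def ozone_regime(hcho_column, no2_column, low=1.0, high=2.0):
    """Classify ozone-production sensitivity from the HCHO/NO2 column ratio (FNR).

    The thresholds `low` and `high` are assumptions; tune them for the region.
    """
    if no2_column <= 0:
        raise ValueError("NO2 column must be positive")
    fnr = hcho_column / no2_column
    if fnr < low:
        return fnr, "VOC-limited"
    if fnr > high:
        return fnr, "NOx-limited"
    return fnr, "transitional"
```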

flowchart TD
  H["TROPOMI / OMI / GOME-2<br/>HCHO columns"] --> X
  F["MODIS / VIIRS<br/>fire counts (FIRMS)"] --> X
  X["Hotspot detection<br/>thresholds / clustering"] --> R["High-res HCHO hotspot maps"]
  R --> SRC["Source regions:<br/>Indo-Gangetic Plain · forest-fire zones"]
  W["Wind / reanalysis"] --> T["Transport influence + fire↔HCHO correlation"]
  X --> T

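The threshold branch of hotspot detection, plus the fire↔HCHO correlation, can be sketched without dependencies. The sketch makes three assumptions. First, the HCHO field is a 2-D list on a common grid, with `None` for cloud-masked or missing pixels. Second, the threshold is a global mean + k·σ with k = 2. This is a placeholder, and a seasonal or regional background would be more defensible. Third, the correlation is a plain Pearson r over aligned daily series (for example, regional HCHO versus FIRMS fire counts). All names are hypothetical.

```python
import math

def hotspot_mask(grid, k=2.0):
    """Flag cells whose HCHO column exceeds mean + k*std over all valid cells."""
    values = [v for row in grid for v in row if v is not None]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    threshold = mean + k * std
    # Missing pixels (None) are never flagged.
    mask = [[v is not None and v > threshold for v in row] for row in grid]
    return mask, threshold

def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A high r between fire counts and HCHO is only suggestive. Transport means the HCHO signal can appear downwind of, and lagged from, the fires, so the wind/reanalysis branch is still needed.
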
Data (all public — you assemble it)

Evaluation

Why this fits me

Tools

Python, Google Earth Engine, xarray/rasterio, CNN-LSTM (PyTorch/TF) or XGBoost, plus my SIH air-quality pipeline as the base.
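
Spatial alignment and the ">100 km from any station" framing both reduce to great-circle distances between grid cells and station coordinates. With xarray, the pixel lookup is usually `ds.sel(lat=..., lon=..., method="nearest")`. Below is a dependency-free sketch of the distance side. It uses the haversine formula with a spherical Earth of radius 6371 km, which is an approximation. `nearest_station` and `far_from_network` are hypothetical helpers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def nearest_station(lat, lon, stations):
    """stations: iterable of (name, lat, lon). Returns (name, distance_km) of the closest."""
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon)) for name, s_lat, s_lon in stations),
        key=lambda pair: pair[1],
    )

def far_from_network(lat, lon, stations, radius_km=100.0):
    """True if a grid cell lies more than radius_km from every station."""
    return nearest_station(lat, lon, stations)[1] > radius_km
```

Evaluating `far_from_network` over the India grid gives the mask of cells where the satellite-based map adds information that the CPCB network cannot provide directly.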