PS7 Explained: From Stellar Photons to a Classification Task

A ground-up explanation of Problem Statement 7 — AI-enabled detection of exoplanets from noisy light curves (Bharatiya Antariksh Hackathon 2026, team The Bayesian Blend). Starts at the physics of a single photon and builds, layer by layer, to the exact machine-learning problem we are solving and the classes we must separate. Companions: ps7-exoplanet-light-curves, ps7-how-it-works, ps7-methods-clean, decision-ps3-vs-ps7.


0. The one-sentence version

A planet passing in front of its star blocks a sliver of light, so the star looks very slightly, very briefly, and very regularly dimmer. PS7 asks us to (a) find that periodic dimming inside noisy brightness measurements, and (b) decide what actually caused each dip — a planet, or one of several impostors that produce a similar dip for entirely different physical reasons. Part (b) is a classification task, and it is where the competition is won.


1. The raw observable: a light curve

A telescope like TESS or Kepler stares at a patch of sky and, every few minutes, counts the photons arriving from each star. For one star, plotting brightness against time gives a light curve: the measured flux $F(t)$ as a function of time $t$.

Everything in this problem is a statement about the shape of $F(t)$.


2. Why a planet makes the star dimmer (the transit)

When a planet's orbit is aligned so that, from our viewpoint, it crosses the disk of its star, it occults a small fraction of the star's surface. This is a transit. The star is far too distant to resolve as a disk — we only see its total brightness drop, then recover.

The depth of that drop is set by simple geometry: the planet blocks the fraction of the stellar disk equal to the ratio of their projected areas.

$$\delta \;=\; \frac{\Delta F}{F} \;=\; \left(\frac{R_p}{R_\star}\right)^{2}$$

Depth encodes the planet's size relative to its star. It does not tell you mass: a giant planet, a brown dwarf, and a very low-mass star can all be about the same physical size. That ambiguity is the seed of the whole classification problem (§6).


3. The three numbers that describe one transit

A transit is almost fully described by three observables, each tied to physics:

| Observable | Symbol | What sets it | What it tells us |
| --- | --- | --- | --- |
| Depth | $\delta$ | $(R_p/R_\star)^2$ | planet size relative to star |
| Duration | $T_{14}$ | crossing geometry + orbital speed | orbital distance / impact parameter |
| Period | $P$ | Kepler's third law | orbital distance → "year" length |

Period comes from how often the dip repeats. Via Kepler's third law it fixes the orbital semi-major axis $a$:

$$P^{2} \;=\; \frac{4\pi^{2}}{G M_\star}\,a^{3}$$
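A minimal sketch of how Kepler's third law converts a measured period into an orbital distance, by solving the equation above for $a$. The physical constants are rounded textbook values and the function name is hypothetical; the Earth-Sun case is used only as a check that the result lands near 1 AU.

```python
import math

# Rounded physical constants (assumed values, SI units)
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30    # solar mass, kg
AU = 1.496e11       # astronomical unit, m
DAY = 86_400.0      # seconds per day

def semi_major_axis_m(period_days, m_star_kg=M_SUN):
    """Solve P^2 = 4 pi^2 a^3 / (G M_star) for a, in metres."""
    p_seconds = period_days * DAY
    return (G * m_star_kg * p_seconds**2 / (4 * math.pi**2)) ** (1 / 3)

a_earth_au = semi_major_axis_m(365.25) / AU
a_hot_jupiter_au = semi_major_axis_m(3.0) / AU   # a 3-day orbit, for contrast

print(f"P = 365.25 d -> a = {a_earth_au:.3f} AU")
print(f"P = 3 d      -> a = {a_hot_jupiter_au:.3f} AU")
```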

Duration — the time from first to last contact — depends on how fast the planet moves ($v \propto a/P$) and on the chord it cuts across the disk (the impact parameter $b$, how centrally it crosses):

$$T_{14} \;\approx\; \frac{P}{\pi}\,\frac{R_\star}{a}\,\sqrt{1-b^{2}}$$

A central crossing ($b \approx 0$) gives the longest, flattest transit; a grazing one ($b \to 1$) gives a short, shallow, V-shaped clip of the limb. Hold that thought — grazing geometry is exactly what lets impostors mimic planets (§6).
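To see how the impact parameter shortens a transit, here is a small sketch of the approximate duration formula above. The value $a/R_\star \approx 215$ for the Earth-Sun system is an assumed round number, and the function name is illustrative only.

```python
import math

def transit_duration_days(period_days, a_over_rstar, b):
    """Approximate T14 = (P / pi) * (R_star / a) * sqrt(1 - b^2)."""
    if not 0.0 <= b < 1.0:
        raise ValueError("this approximation assumes 0 <= b < 1")
    return (period_days / math.pi) * math.sqrt(1.0 - b * b) / a_over_rstar

# Earth-Sun analogue: a / R_star is roughly 215 (assumed value)
central_hours = transit_duration_days(365.25, 215.0, b=0.0) * 24
grazing_hours = transit_duration_days(365.25, 215.0, b=0.9) * 24

print(f"central crossing (b = 0.0): {central_hours:.1f} h")
print(f"near-grazing     (b = 0.9): {grazing_hours:.1f} h")
```

The near-grazing case lasts less than half as long as the central one, consistent with the short clipped dips described above.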

Interactive versions of all three knobs are in ps7-exoplanet-light-curves and ps7-how-it-works — slide depth/duration/period and watch the dip change.


4. Why this is hard: noise and the needle-in-a-haystack

Two compounding difficulties turn a clean idea into a real problem:

  1. The signal is tiny and buried in noise. Photon-counting (shot) noise, stellar variability (spots, flares, pulsations), and instrument systematics (spacecraft jitter, thermal ramps, scattered light) all wobble $F(t)$ by amounts comparable to, or larger than, a real transit. A single Earth-size dip is essentially invisible in one pass.
  2. Most stars have no planet at all. A sector holds 20,000–30,000 light curves; only a few hundred show any real transit. We are searching for rare events in a sea of flat, noisy, or merely variable stars.

The detectability of a transit is captured by its signal-to-noise ratio, which grows with depth and with the number of transits observed $N_{tr}$:

$$\text{SNR} \;\approx\; \frac{\delta}{\sigma}\,\sqrt{N_{tr}\,n_{\text{in}}}$$

where $\sigma$ is the per-point scatter and $n_{\text{in}}$ the points per transit. Below SNR $\approx 7$, detection is unreliable — that is the detection floor, and it defines what any method (ours or a CNN's) physically cannot reach. (Interactive floor calculator in ps7-how-it-works §8.)
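The SNR relation can be turned into a rough "how many transits do we need" estimate. The sketch below assumes an 84 ppm dip, 200 ppm per-point scatter, and 30 points per transit; all three numbers are hypothetical and chosen only for illustration, as are the function names.

```python
import math

def transit_snr(depth, sigma, n_transits, n_in):
    """SNR ~ (depth / sigma) * sqrt(N_tr * n_in), as in the formula above."""
    return depth / sigma * math.sqrt(n_transits * n_in)

def min_transits_for_snr(depth, sigma, n_in, snr_floor=7.0):
    """Smallest whole number of transits that reaches the SNR floor."""
    return math.ceil((snr_floor * sigma / depth) ** 2 / n_in)

depth = 84e-6    # hypothetical Earth-like depth (84 ppm)
sigma = 200e-6   # hypothetical per-point scatter (200 ppm)
n_in = 30        # hypothetical number of points inside one transit

single = transit_snr(depth, sigma, 1, n_in)
needed = min_transits_for_snr(depth, sigma, n_in)

print(f"SNR of a single transit: {single:.1f}")
print(f"transits needed for SNR >= 7: {needed}")
```

Under these assumed numbers a single transit falls well short of the floor, and only stacking several transits brings the signal above it.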

This is why the problem splits into two layers.


5. The two layers: detection, then classification

flowchart TD
    A["Raw light curve, 20000+ per sector"] --> B["Layer 1 DETECTION: detrend and BLS or TLS period search"]
    B -->|"no periodic dip"| X["Discard: flat or noise"]
    B -->|"periodic dip found"| C["Phase-fold at best period: stack transits into one clean dip"]
    C --> D["Layer 2 CLASSIFICATION: engineer shape features then GBM"]
    D --> P["Planet or transit"]
    D --> E["Eclipsing binary"]
    D --> Bl["Blend or contaminated"]
    D --> O["Other astrophysical or artefact"]

Layer 1 — Detection (an algorithm, not ML).

Layer 2 — Classification (the ML, and the prize).

The split matters: Layer 1 is well-solved classical signal processing; Layer 2 is where domain insight and good features beat a generic deep net. PS7's marks live in Layer 2.
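Layer 1 can be illustrated with a toy box search. This is a deliberately simplified, brute-force stand-in for BLS/TLS, not the pipeline used by the team: it skips detrending, uses a fixed box width, and runs on synthetic data whose period, depth, and noise level are all assumed for the example. In practice a library implementation (for example astropy's `BoxLeastSquares`) would be the more likely choice.

```python
import random

def make_light_curve(period, t0, duration, depth, sigma,
                     n_points=1350, cadence=0.02, seed=1):
    """Toy data: flat star, box-shaped periodic dips, Gaussian noise."""
    rng = random.Random(seed)
    times, fluxes = [], []
    for i in range(n_points):
        t = i * cadence
        # Time from the nearest mid-transit
        dt = (t - t0 + 0.5 * period) % period - 0.5 * period
        dip = depth if abs(dt) < 0.5 * duration else 0.0
        times.append(t)
        fluxes.append(1.0 - dip + rng.gauss(0.0, sigma))
    return times, fluxes

def phase_fold(times, period):
    """Map each time stamp to an orbital phase in [0, 1)."""
    return [(t / period) % 1.0 for t in times]

def box_search(times, fluxes, trial_periods, n_bins=50, width_bins=2):
    """Brute-force box search; returns (score, period, box_centre_phase, depth)."""
    best = None
    total_n = len(times)
    total_sum = sum(fluxes)
    for period in trial_periods:
        sums = [0.0] * n_bins
        counts = [0] * n_bins
        for phase, flux in zip(phase_fold(times, period), fluxes):
            k = min(int(phase * n_bins), n_bins - 1)
            sums[k] += flux
            counts[k] += 1
        for start in range(n_bins):
            box = [(start + j) % n_bins for j in range(width_bins)]
            n_in = sum(counts[k] for k in box)
            n_out = total_n - n_in
            if n_in == 0 or n_out == 0:
                continue
            s_in = sum(sums[k] for k in box)
            # Depth = out-of-box mean minus in-box mean
            depth = (total_sum - s_in) / n_out - s_in / n_in
            # BLS-like statistic: depth weighted by how many points support it
            score = depth * (n_in * n_out / total_n) ** 0.5
            if best is None or score > best[0]:
                centre = ((start + 0.5 * width_bins) / n_bins) % 1.0
                best = (score, period, centre, depth)
    return best

times, fluxes = make_light_curve(period=3.0, t0=0.96, duration=0.12,
                                 depth=0.01, sigma=0.002)
trial_periods = [1.0 + 0.01 * i for i in range(501)]   # 1.00 to 6.00 days
score, best_period, centre_phase, depth = box_search(times, fluxes, trial_periods)
print(f"best period {best_period:.2f} d, depth {depth:.4f}, "
      f"mid-transit phase {centre_phase:.2f}")
```

Folding at the recovered period stacks every transit onto the same phase, which is the "one clean dip" that Layer 2 then inspects.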


6. The classes — what we classify, and the physics of each

This is the heart of the task. A periodic dip is necessary but not sufficient evidence of a planet, because several distinct astrophysical configurations produce a periodic dip. The official PS7 taxonomy groups them as transits, eclipses, blends, and other astrophysical categories. Here is each class, why it dims the star, and the tell-tale signature that lets us separate it.

Class 1 — Planet transit ✅ (the target)

A genuine planet crossing its star.

Class 2 — Eclipsing binary (EB) ❌ (the classic impostor)

Two stars orbiting each other; one periodically eclipses the other. This is the single most common false positive because two stars produce a huge, perfectly periodic dip.

Class 3 — Blend / contamination ❌ (the sneaky impostor)

The target's aperture also contains light from a neighbouring star — frequently a background eclipsing binary. The deep eclipse of the faint neighbour gets diluted by the bright target and arrives looking like a shallow planet transit.
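The dilution effect is simple arithmetic. The sketch below assumes that both stars fall entirely inside the aperture and that the neighbour is fainter by a given magnitude difference, converted to a flux ratio with the standard $10^{-0.4\,\Delta m}$ relation. The 50% eclipse and 5-magnitude gap are hypothetical inputs, and the function name is illustrative.

```python
def diluted_depth(true_depth, delta_mag):
    """Observed depth when an eclipsing neighbour is blended with a brighter target.

    delta_mag: how many magnitudes fainter the neighbour is than the target.
    """
    flux_ratio = 10 ** (-0.4 * delta_mag)          # neighbour flux / target flux
    return true_depth * flux_ratio / (1.0 + flux_ratio)

# Hypothetical case: a 50% deep eclipse on a star 5 magnitudes fainter
observed = diluted_depth(0.5, 5.0)
print(f"observed depth: {observed * 100:.2f} %")
```

A 50% eclipse shrinks to roughly half a percent under these assumptions, squarely in the range of a genuine planet transit.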

Class 4 — Other astrophysical / artefacts / non-transiting ⚪

A catch-all for "periodic-ish dip that is not a transiting companion at all": starspots, stellar pulsations, instrument systematics, and noise.

Taxonomy summary

| Class | Physical cause | Decisive tells (features) |
| --- | --- | --- |
| Planet | planet occults star | flat-bottom U, no secondary, equal odd/even, shallow |
| Eclipsing binary | star occults star | V-shape, secondary eclipse, odd–even mismatch, deep |
| Blend | neighbour's EB diluted into aperture | centroid shift, diluted depth, star-param mismatch |
| Other / artefact | spots, pulsation, instrument, noise | not cleanly periodic, aligns with cadence, low SNR |

The core scientific claim of our approach: each cause leaves a different geometric fingerprint in the folded dip. Turn those fingerprints into numbers and the classes occupy different regions of feature space, where a gradient-boosted model can draw the boundary. The "do the classes separate?" scatter plot in ps7-how-it-works §5 is the single figure that proves the whole method is viable.
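A minimal sketch of turning three of those fingerprints into numbers: odd/even depth mismatch, secondary-eclipse strength, and a crude U-versus-V shape measure. The feature names, the `core_ratio` definition, and the noise-free toy curves are all assumptions made for this example, not the team's actual feature set. Centroid and stellar-parameter features are out of scope here because they need more than the light curve.

```python
def mean(values):
    if not values:
        raise ValueError("empty group: check period, t0 and duration")
    return sum(values) / len(values)

def toy_curve(period, t0, duration, depth_even, depth_odd,
              secondary=0.0, v_shaped=False, n=3000, cadence=0.01):
    """Noise-free toy light curve with optional odd/even, secondary and V-shape."""
    times, fluxes = [], []
    half = 0.5 * duration
    for i in range(n):
        t = i * cadence
        epoch = round((t - t0) / period)
        dt = t - t0 - epoch * period          # time from nearest mid-transit
        flux = 1.0
        if abs(dt) < half:
            depth = depth_odd if epoch % 2 else depth_even
            flux -= depth * (1.0 - abs(dt) / half) if v_shaped else depth
        elif abs(abs(dt) - 0.5 * period) < half:
            flux -= secondary                 # dip half a period later
        times.append(t)
        fluxes.append(flux)
    return times, fluxes

def shape_features(times, fluxes, period, t0, duration):
    """Hypothetical shape features, each normalised by the mean primary depth."""
    half = 0.5 * duration
    odd, even, core, secondary, out = [], [], [], [], []
    for t, f in zip(times, fluxes):
        epoch = round((t - t0) / period)
        dt = t - t0 - epoch * period
        if abs(dt) < half:
            (odd if epoch % 2 else even).append(f)
            if abs(dt) < 0.5 * half:
                core.append(f)                # inner half of the dip
        elif abs(abs(dt) - 0.5 * period) < half:
            secondary.append(f)
        else:
            out.append(f)
    baseline = mean(out)
    depth = baseline - mean(odd + even)
    return {
        "depth": depth,
        "odd_even_diff": abs(mean(odd) - mean(even)) / depth,
        "secondary_ratio": (baseline - mean(secondary)) / depth,
        "core_ratio": (baseline - mean(core)) / depth,  # ~1 for a box, >1 for a V
    }

planet = shape_features(*toy_curve(3.0, 1.0, 0.2, 0.01, 0.01), 3.0, 1.0, 0.2)
binary = shape_features(
    *toy_curve(3.0, 1.0, 0.2, 0.02, 0.012, secondary=0.004, v_shaped=True),
    3.0, 1.0, 0.2)
for name, feats in (("planet-like", planet), ("EB-like", binary)):
    print(name, {k: round(v, 3) for k, v in feats.items()})
```

On these toy inputs the planet-like curve has near-zero odd/even and secondary features with `core_ratio` near 1, while the EB-like curve is clearly offset on all three, which is the kind of separation a gradient-boosted model can exploit.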


7. The exact ML problem statement

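Stripping away the astronomy, here is what the model actually does: it takes survey-agnostic shape features engineered from the phase-folded dip and assigns one of the four classes from §6 using a gradient-boosted classifier.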

The imbalance, and why it's central

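In reality, planets are rare: among detected periodic dips, false positives (EBs + blends) and variables vastly outnumber genuine planets. A naive model can score high accuracy by calling everything "not a planet." So we judge the classifier under realistic class imbalance with precision-recall (PR) based metrics rather than raw accuracy.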
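To make the accuracy trap concrete, here is a small sketch with a hypothetical mix of 30 planets among 1,000 candidates (numbers invented for illustration) and a "classifier" that never predicts a planet. The helper name and label strings are assumptions.

```python
def binary_metrics(y_true, y_pred, positive="planet"):
    """Accuracy, precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Convention used here: report 0.0 when a ratio is undefined
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical candidate list: 3% planets, 97% everything else
y_true = ["planet"] * 30 + ["not_planet"] * 970
naive_pred = ["not_planet"] * len(y_true)

accuracy, precision, recall = binary_metrics(y_true, naive_pred)
print(f"accuracy {accuracy:.2f}, precision {precision:.2f}, recall {recall:.2f}")
```

The naive model reaches 97% accuracy while recovering no planets at all, so accuracy alone says nothing useful about the target class.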


8. How success is judged (and our edge)

PS7 evaluates: detection + classification robustness, accuracy of recovered transit parameters (depth/period/duration vs. expected significance), and methodology, visualisation, and clarity (a 3-page report).

Our differentiator is cross-survey generalisation: train on Kepler, test cold on TESS and K2. These missions have different bandpasses, cadences, and noise, so a model that still classifies correctly across them shows it learned transit physics, not one instrument's quirks. That is exactly the robustness ISRO's curated, held-out set will demand, and it is the headline figure we are building.


9. One-paragraph recap

A transiting planet dims its star by $\left(R_p/R_\star\right)^2$ for a duration set by orbital geometry, repeating every period $P$. Layer 1 finds that periodic dip in noisy data by detrending and BLS/TLS period search, then phase-folds to amplify it. Layer 2 — the classification task that wins PS7 — reads the folded dip's shape to decide whether a planet, an eclipsing binary, a blend, or some other astrophysical/instrumental effect produced it, using survey-agnostic shape features and a gradient-boosted classifier, judged under realistic class imbalance with PR-based metrics.
