PS7 Methods: Scientifically Validated Pipeline, Papers to Cite, Proposal-Ready Numbers

Clean, cited synthesis for the proposal. Raw dump lives in the project at dirty.md. Companions: ps7-how-it-works, ps7-exoplanet-light-curves, ps7-data-strategy, decision-ps3-vs-ps7. Sourced from a 2-agent literature sweep, 17 peer-reviewed papers.

The headline for our proposal

The single most important finding is peer-reviewed: a feature-based gradient-boosting classifier matches deep CNNs and beats classical BLS (Malik et al. 2022, MNRAS). That is exactly our intended Mallorn-style approach, so we can cite a published result to justify choosing it over a heavier deep-learning model. The honest counterpoint to state up front: on real, noisy TESS data even the best methods degrade (Malik et al. real-TESS precision 0.63; Osborn et al. simulation-trained CNN recall 0.61), which is why we train and validate on real labelled data and add a physics-based validation second stage.
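
The two-stage idea can be sketched as a simple decision rule. This is an illustrative sketch only: the `Candidate` fields, the `triage` function and the stage-1 cut `SCORE_MIN` are hypothetical names and values introduced here, while the stage-2 thresholds are the TRICERATOPS values quoted elsewhere in this page (FPP less than 0.015, NFPP less than 0.001).

```python
from dataclasses import dataclass

# Stage-2 thresholds quoted in this document for TESS (Giacalone et al. 2021).
FPP_MAX = 0.015
NFPP_MAX = 0.001
# Hypothetical stage-1 cut on the classifier probability; to be tuned on real labelled data.
SCORE_MIN = 0.5


@dataclass
class Candidate:
    name: str
    score: float  # stage 1: classifier probability of "planet"
    fpp: float    # stage 2: false-positive probability
    nfpp: float   # stage 2: nearby false-positive probability


def triage(c: Candidate) -> str:
    """Return 'rejected', 'candidate', or 'validated' for one signal."""
    if c.score < SCORE_MIN:
        return "rejected"  # never reaches the validation stage
    if c.fpp < FPP_MAX and c.nfpp < NFPP_MAX:
        return "validated"
    return "candidate"  # passes the classifier but is not statistically validated


# Made-up inputs, one per outcome.
examples = [
    Candidate("A", score=0.95, fpp=0.004, nfpp=0.0002),
    Candidate("B", score=0.90, fpp=0.200, nfpp=0.0002),
    Candidate("C", score=0.10, fpp=0.004, nfpp=0.0002),
]
labels = {c.name: triage(c) for c in examples}
print(labels)
```

Keeping the classifier cut and the validation thresholds as separate named constants makes it easy to report results at each stage independently.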

Recommended pipeline (every stage has a citation)

  1. Detrend with the Tukey biweight time-windowed slider, window about 3x the maximum transit duration. Validated as the best general-purpose detrender (Hippke et al. 2019, the wotan benchmark). Use a spline plus Huber estimator for very active stars. Start from TESS PDCSAP flux (systematics already cotrended via CBVs).
  2. Search for periodic transits with Transit Least Squares (Hippke and Heller 2019), which fits a limb-darkened shape rather than a box. Apply a detection cut of SDE at or above a threshold chosen between 7 and 9, and cross-reference the SPOC convention of MES 7.1 sigma.
  3. Engineer features and classify with gradient boosting (LightGBM, XGBoost, or CatBoost), following Malik et al. 2022. The discriminating features, consistent across the literature, are: odd-even depth difference, secondary-eclipse depth and significance, transit shape (U versus V, ingress and egress ratio), centroid offset, depth versus stellar radius, MES or SNR, period, and duration.
  4. Fit parameters for predicted planets with the Mandel and Agol 2002 analytic transit model via batman (Kreidberg 2015), inside an MCMC, to recover depth, period, and duration with posterior confidence intervals.
  5. Validate statistically with TRICERATOPS (Giacalone et al. 2021) for TESS, which assigns a false-positive probability (FPP) to each candidate. Treat the top candidates as validated when FPP is less than 0.015 and NFPP is less than 0.001.
  6. Quantify performance with an injection-recovery completeness grid (log spacing in period by radius-ratio, about 50 injected transits per cell). This is both our sensitivity-floor figure and a quantitative deliverable.
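
The injection-recovery grid in step 6 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the project pipeline: the noise is white Gaussian, the transit is a box with a fixed 0.1-day duration, and `toy_recovered` is a hypothetical placeholder that tests the depth SNR at the known ephemeris. In the real experiment that function would be replaced by the full detrend, TLS search and classifier chain. All function names and default values are illustrative.

```python
import math
import random


def log_grid(lo, hi, n):
    """Return n log-spaced values from lo to hi, inclusive."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]


def in_transit(t, period, t0, duration):
    """True if time t lies inside a box transit centred on t0 + k * period."""
    phase = (t - t0 + 0.5 * period) % period - 0.5 * period
    return abs(phase) < 0.5 * duration


def toy_recovered(flux, mask, sigma, snr_min=7.0):
    """Placeholder detector (assumption): depth SNR at the known ephemeris."""
    inside = [f for f, m in zip(flux, mask) if m]
    outside = [f for f, m in zip(flux, mask) if not m]
    if not inside or not outside:
        return False
    depth = sum(outside) / len(outside) - sum(inside) / len(inside)
    return depth / (sigma / math.sqrt(len(inside))) >= snr_min


def completeness_grid(periods, radius_ratios, n_inject=50, sigma=1e-3,
                      baseline_days=27.0, cadence_days=1 / 48,
                      duration=0.1, seed=0):
    """Recovered fraction per (period, radius-ratio) cell."""
    rng = random.Random(seed)
    n_points = round(baseline_days / cadence_days)
    times = [i * cadence_days for i in range(n_points)]
    grid = {}
    for period in periods:
        for rr in radius_ratios:
            depth = rr ** 2  # box depth ~ (Rp/Rs)^2, ignoring limb darkening
            hits = 0
            for _ in range(n_inject):
                t0 = rng.uniform(0.0, period)  # random transit phase
                mask = [in_transit(t, period, t0, duration) for t in times]
                flux = [rng.gauss(1.0, sigma) * ((1.0 - depth) if m else 1.0)
                        for m in mask]
                hits += int(toy_recovered(flux, mask, sigma))
            grid[(period, rr)] = hits / n_inject
    return grid


periods = log_grid(1.0, 20.0, 3)         # days
radius_ratios = log_grid(0.01, 0.1, 3)   # Rp/Rs
grid = completeness_grid(periods, radius_ratios)
for (p, rr), frac in sorted(grid.items()):
    print(f"P={p:6.2f} d  Rp/Rs={rr:.3f}  recovered={frac:.2f}")
```

The printed table is the raw material for the completeness figure: each cell holds the fraction of its injected transits that the detector recovered, so the sensitivity floor is wherever that fraction falls off.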

Proposal-ready benchmark numbers

| Claim | Number | Method | Source |
| --- | --- | --- | --- |
| Biweight detrending recovers nearly all shallow transits | 99% (Kepler), 94% (K2) of shallowest | injection-retrieval, searched with TLS | Hippke et al. 2019 |
| TLS beats BLS for small-planet recovery | TLS ~93% vs BLS ~76% | limb-darkened vs box search | Hippke and Heller 2019 |
| Standard transit-search detection threshold | SDE ~7 gives ~1% false-positive rate (use 7 to 9) | BLS/TLS periodogram | Hippke and Heller 2019; Kovacs 2002 |
| TESS SPOC detection threshold | MES = 7.1 sigma | SPOC matched-filter search | TESS SPOC / Jenkins |
| Feature-based GBDT matches deep learning, beats BLS | Kepler AUC 0.948 (recall 0.96); sim TESS AUC 0.92; real TESS acc 0.98 / recall 0.82 / precision 0.63 | 789 tsfresh features + LightGBM | Malik et al. 2022 |
| Deep CNN baseline (Kepler) | AUC 0.988, acc 96.0%, P 0.93, R 0.95 | dual global+local 1D-CNN | Shallue and Vanderburg 2018 |
| CNN on real TESS (the noise penalty) | average precision 69.3%, acc 97.8% | AstroNet-Vetting | Yu et al. 2019 |
| Simulation-trained CNN fails on real data | recall ~61% on real TESS | CNN trained on simulated TESS | Osborn et al. 2020 |
| Explainable deep classifier beats Robovetter | 0.968 / 0.974 / 0.996 vs 0.951 / 0.975 / 0.994 (P/R/acc) | multi-branch DNN | Valizadegan et al. 2022 (ExoMiner) |
| RF/GP validation, calibrated probabilities | AUC 0.999; validated 50 planets at p greater than 0.99 | RF + Gaussian Process on 38 features | Armstrong et al. 2021 |
| TESS statistical validation thresholds | FPP less than 0.015 AND NFPP less than 0.001 | Bayesian FP modelling | Giacalone et al. 2021 (TRICERATOPS) |
| Kepler/K2 validation threshold | FPP less than 0.01 (1,284 validated, 428 flagged) | population-synthesis FPP | Morton et al. 2016 (vespa) |
| batman forward-model speed (for MCMC) | 1,000,000 models in 30 s, accurate to 0.03 ppm | analytic Mandel and Agol | Kreidberg 2015 |
| Injection-recovery design | log grid period x radius, ~50 injections per cell | inject, rerun pipeline, measure recovered fraction | Christiansen et al.; TESS occurrence papers |
| Class-imbalance handling | class_weight balanced + synthetic augmentation + 10-fold CV; report PR-AUC not accuracy | weighting + calibration | Armstrong 2021; Yu 2019; Tey 2023 |
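
A short worked example shows why accuracy is a poor headline metric under class imbalance. The confusion-matrix counts below are hypothetical, chosen only so the resulting metrics land near the real-TESS figures in the table (acc 0.98 / recall 0.82 / precision 0.63); they are not taken from Malik et al. 2022, and the 3% planet fraction is an assumption.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical counts: 10,000 light curves, of which 300 (3%) are true planets.
# Illustrative only; NOT the confusion matrix from any cited paper.
model = metrics(tp=246, fp=144, fn=54, tn=9556)

# A useless classifier that labels every light curve "not a planet".
total, positives = 10_000, 300
all_negative_accuracy = (total - positives) / total

print({k: round(v, 3) for k, v in model.items()})
# {'accuracy': 0.98, 'precision': 0.631, 'recall': 0.82}
print(round(all_negative_accuracy, 3))
# 0.97
```

With these counts the trivial all-negative classifier already reaches 0.97 accuracy, one point below the model, while more than a third of the model's positive predictions are false alarms. Precision, recall and PR-AUC expose that gap; accuracy hides it.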

Papers to cite (bibliography)

Detection and signal processing

Machine-learning classification

Statistical validation

How this maps to our approach and our EDA

Caveats from the sweep