PS7 Methods: Scientifically Validated Pipeline, Papers to Cite, Proposal-Ready Numbers

Clean, cited synthesis for the proposal. Raw dump lives in the project at dirty.md. Companions: ps7-how-it-works, ps7-exoplanet-light-curves, ps7-data-strategy, decision-ps3-vs-ps7. Sourced from a 2-agent literature sweep covering 17 peer-reviewed papers.

The headline for our proposal

The single most important finding: a feature-based gradient-boosting classifier matches deep CNNs and beats classical BLS in a peer-reviewed study (Malik et al. 2022, MNRAS). That is exactly our intended Mallorn-style approach, so we can cite a published result to justify choosing it over a heavier deep-learning model. The honest counterpoint to state up front: on real, noisy TESS data even the best methods degrade (Malik et al. real-TESS precision 0.63; Osborn et al. simulation-trained CNN recall 0.61), which is why we train and validate on real labelled data and add a physics-based validation second stage.

Recommended pipeline (every stage has a citation)

Detrend with the Tukey biweight time-windowed slider, window about 3x the maximum transit duration. Validated as the best general-purpose detrender (Hippke et al. 2019, the wotan benchmark). Use a spline plus Huber estimator for very active stars. Start from TESS PDCSAP flux (systematics already cotrended via CBVs).
Search for periodic transits with Transit Least Squares (Hippke and Heller 2019), which fits a limb-darkened shape rather than a box. Set the detection threshold at an SDE between 7 and 9; SDE of about 7 corresponds to a false-positive rate of about 1% in the Hippke and Heller test. For comparison, the TESS SPOC pipeline uses a threshold of MES 7.1 sigma.
Engineer features and classify with gradient boosting (LightGBM or XGBoost or CatBoost), following Malik et al. 2022. The discriminating features, consistent across the literature, are: odd-even depth difference, secondary-eclipse depth and significance, transit shape (U versus V, ingress and egress ratio), centroid offset, depth versus stellar radius, MES or SNR, period, and duration.
Fit parameters for predicted planets with the Mandel and Agol 2002 analytic transit model via batman (Kreidberg 2015), inside an MCMC, to recover depth, period, and duration with posterior confidence intervals.
Validate statistically with TRICERATOPS (Giacalone et al. 2021) for TESS, which assigns each candidate a false-positive probability (FPP) and a nearby false-positive probability (NFPP). Count a top candidate as validated only when FPP is less than 0.015 and NFPP is less than 0.001.
Quantify performance with an injection-recovery completeness grid (log spacing in period by radius-ratio, about 50 injected transits per cell). This is both our sensitivity-floor figure and a quantitative deliverable.
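
The completeness grid in the last stage can be sketched as follows. This is a minimal illustration in plain Python, not the project's implementation. The `recover` callback, the `toy_recover` stand-in, and the period and radius-ratio ranges are all assumptions made for the example; only the log spacing and the 50 injections per cell come from the design above.

```python
import math
import random


def log_edges(lo, hi, n_cells):
    """Return n_cells + 1 logarithmically spaced bin edges from lo to hi."""
    step = (math.log10(hi) - math.log10(lo)) / n_cells
    return [10 ** (math.log10(lo) + i * step) for i in range(n_cells + 1)]


def completeness_grid(recover, period_edges, ratio_edges, n_inject=50, seed=0):
    """Fraction of injected transits recovered in each (period, radius-ratio) cell.

    `recover(period, ratio)` is a hypothetical callback that injects one transit,
    reruns the full pipeline, and returns True if the signal was found.
    """
    rng = random.Random(seed)
    grid = []
    for p_lo, p_hi in zip(period_edges[:-1], period_edges[1:]):
        row = []
        for r_lo, r_hi in zip(ratio_edges[:-1], ratio_edges[1:]):
            hits = 0
            for _ in range(n_inject):
                # Draw log-uniformly inside the cell.
                period = 10 ** rng.uniform(math.log10(p_lo), math.log10(p_hi))
                ratio = 10 ** rng.uniform(math.log10(r_lo), math.log10(r_hi))
                hits += bool(recover(period, ratio))
            row.append(hits / n_inject)
        grid.append(row)
    return grid


def toy_recover(period, ratio):
    """Illustrative stand-in for the real pipeline: only deep transits are found."""
    return ratio > 0.03


period_edges = log_edges(1.0, 100.0, 4)   # days (illustrative range)
ratio_edges = log_edges(0.01, 0.1, 4)     # planet-to-star radius ratio (illustrative range)
grid = completeness_grid(toy_recover, period_edges, ratio_edges)
```

With the real pipeline substituted for `toy_recover`, each cell of `grid` holds the recovered fraction for that period and radius-ratio bin, which is the quantity the sensitivity-floor figure would plot.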

Proposal-ready benchmark numbers

Claim	Number	Method	Source
Biweight detrending recovers nearly all shallow transits	99% (Kepler), 94% (K2) of shallowest	injection-retrieval, searched with TLS	Hippke et al. 2019
TLS beats BLS for small-planet recovery	TLS ~93% vs BLS ~76%	limb-darkened vs box search	Hippke and Heller 2019
Standard transit-search detection threshold	SDE ~7 gives ~1% false-positive rate (use 7 to 9)	BLS/TLS periodogram	Hippke and Heller 2019; Kovacs 2002
TESS SPOC detection threshold	MES = 7.1 sigma	SPOC matched-filter search	TESS SPOC / Jenkins
Feature-based GBDT matches deep learning, beats BLS	Kepler AUC 0.948 (recall 0.96); sim TESS AUC 0.92; real TESS acc 0.98 / recall 0.82 / precision 0.63	789 tsfresh features + LightGBM	Malik et al. 2022
Deep CNN baseline (Kepler)	AUC 0.988, acc 96.0%, P 0.93, R 0.95	dual global+local 1D-CNN	Shallue and Vanderburg 2018
CNN on real TESS (the noise penalty)	average precision 69.3%, acc 97.8%	AstroNet-Vetting	Yu et al. 2019
Simulation-trained CNN fails on real data	recall ~61% on real TESS	CNN trained on simulated TESS	Osborn et al. 2020
Explainable deep classifier beats Robovetter	0.968 / 0.974 / 0.996 vs 0.951 / 0.975 / 0.994 (P/R/acc)	multi-branch DNN	Valizadegan et al. 2022 (ExoMiner)
RF/GP validation, calibrated probabilities	AUC 0.999; validated 50 planets at p greater than 0.99	RF + Gaussian Process on 38 features	Armstrong et al. 2021
TESS statistical validation thresholds	FPP less than 0.015 AND NFPP less than 0.001	Bayesian FP modelling	Giacalone et al. 2021 (TRICERATOPS)
Kepler/K2 validation threshold	FPP less than 0.01 (1,284 validated, 428 flagged)	population-synthesis FPP	Morton et al. 2016 (vespa)
batman forward-model speed (for MCMC)	1,000,000 models in 30 s, accurate to 0.03 ppm	analytic Mandel and Agol	Kreidberg 2015
Injection-recovery design	log grid period x radius, ~50 injections per cell	inject, rerun pipeline, measure recovered fraction	Christiansen et al.; TESS occurrence papers
Class-imbalance handling	class_weight balanced + synthetic augmentation + 10-fold CV; report PR-AUC not accuracy	weighting + calibration	Armstrong 2021; Yu 2019; Tey 2023
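
A minimal sketch of two items in the last table row, assuming binary labels (1 = planet) and plain Python. The weight formula is the heuristic scikit-learn documents for `class_weight="balanced"`; the average-precision function is a step-wise estimator of PR-AUC. The function names, toy labels, and toy scores are invented for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy candidate list: 2 planets among 20 signals (illustrative numbers only).
labels = [1, 1] + [0] * 18
scores = [0.9, 0.4] + [0.6, 0.5] + [0.1] * 16

weights = balanced_class_weights(labels)
ap = average_precision(labels, scores)

# Accuracy of a useless classifier that always predicts "not a planet".
accuracy_all_negative = labels.count(0) / len(labels)
```

In this toy case the always-negative classifier scores 0.9 accuracy while finding no planets, whereas average precision (0.75 here) drops as soon as a false positive outranks a real planet. That is the reason the table recommends reporting PR-AUC instead of accuracy.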

Papers to cite (bibliography)

Detection and signal processing

Hippke, David, Mulders, Heller 2019, "Wotan: Comprehensive Time-series Detrending in Python", AJ 158, 143. arXiv:1906.00966.
Hippke and Heller 2019, "Transit Least Squares", A&A 623, A39. arXiv:1901.02015.
Kovacs, Zucker, Mazeh 2002, "Box-fitting algorithm (BLS)", A&A 391, 369. arXiv:astro-ph/0206099.
Mandel and Agol 2002, "Analytic Light Curves for Planetary Transit Searches", ApJ 580, L171. arXiv:astro-ph/0210099.
Kreidberg 2015, "batman: BAsic Transit Model cAlculatioN", PASP 127, 1161. arXiv:1507.08285.

Machine-learning classification

Shallue and Vanderburg 2018, "Identifying Exoplanets with Deep Learning (AstroNet)", AJ 155, 94. arXiv:1712.05044.
Ansdell et al. 2018, "Scientific Domain Knowledge Improves Exoplanet Transit Classification", ApJL 869, L7. arXiv:1810.13434.
Yu et al. 2019, "Identifying Exoplanets with Deep Learning III (TESS)", AJ 158, 25. arXiv:1904.02726.
Osborn et al. 2020, "Rapid Classification of TESS Candidates with CNNs", A&A 633, A53. arXiv:1902.08544.
Tey et al. 2023, "Identifying Exoplanets with Deep Learning V", AJ 165, 95. arXiv:2301.01371.
Valizadegan et al. 2022, "ExoMiner", ApJ 926, 120. arXiv:2111.10009.
Malik, Moster, Obermeier 2022, "Exoplanet Detection using Machine Learning", MNRAS 513, 5505. arXiv:2011.14135. (the key feature-based GBDT paper for our approach)
McCauliff et al. 2015, "Automatic Classification of Kepler Candidates (Autovetter)", ApJ 806, 6. arXiv:1408.1496.
Armstrong, Gamper, Damoulas 2021, "Exoplanet Validation with Machine Learning: 50 New Kepler Planets", MNRAS 504, 5327. arXiv:2008.10516.
Armstrong et al. 2017, "Transit Shapes and Self-Organizing Maps", MNRAS 465, 2634. arXiv:1611.01968.

Statistical validation

Morton et al. 2016, "False Positive Probabilities for All Kepler Objects of Interest (vespa)", ApJ 822, 86. arXiv:1605.02825.
Giacalone et al. 2021, "Vetting K2 and TESS Planets with TRICERATOPS", AJ 161, 24. arXiv:2002.00691.

How this maps to our approach and our EDA

Our feature-based gradient-boosting plan is backed by a published method: Malik et al. 2022 (matches deep learning, beats BLS, runs on CPU in minutes, gives interpretable importances). We cite it as the spine of the proposal.
Our crude EDA classifier reached 0.50 (three-class) and 0.67 (planet versus rest). The literature explains the gap: we were missing the strongest features (odd-even, proper trapezoid shape, centroid) and used folded-only data. Adding the validated feature set should move us toward the Malik and Armstrong range.
Two-stage architecture is well-precedented: fast ML triage, then physics-based validation (TRICERATOPS) at FPP less than 0.015. This is a defensible, citable design.
Honest ceilings to acknowledge in the proposal: real-TESS precision is hard (Malik 0.63, Osborn recall 0.61 for sim-trained). Train and validate on real labelled data, report PR-AUC not accuracy, and quantify the detection floor via injection-recovery.
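
The odd-even feature named above as missing from the EDA can be computed from a detrended light curve and a candidate ephemeris. The sketch below is an illustration under stated assumptions: flux is normalised to 1, depth is estimated as one minus the mean in-transit flux, and both odd and even transits contain at least two points. The function names and the toy light curve are hypothetical and are not taken from Malik et al. 2022 or from any library.

```python
import math
import statistics


def transit_epoch(t, period, t0, duration):
    """Return the transit number if t falls inside a transit window, else None."""
    epoch = round((t - t0) / period)
    return epoch if abs(t - (t0 + epoch * period)) <= duration / 2 else None


def odd_even_depth(time, flux, period, t0, duration):
    """Return (odd_depth, even_depth, significance of their difference)."""
    odd, even = [], []
    for t, f in zip(time, flux):
        epoch = transit_epoch(t, period, t0, duration)
        if epoch is not None:
            (odd if epoch % 2 else even).append(f)
    d_odd = 1 - statistics.fmean(odd)
    d_even = 1 - statistics.fmean(even)
    # Combine the standard errors of the two mean depths in quadrature.
    err = math.hypot(statistics.stdev(odd) / math.sqrt(len(odd)),
                     statistics.stdev(even) / math.sqrt(len(even)))
    sig = abs(d_odd - d_even) / err if err > 0 else 0.0
    return d_odd, d_even, sig


# Toy eclipsing-binary-like signal: alternating depths of 0.010 and 0.006.
period, t0, duration = 2.0, 1.0, 0.2
time = [0.01 + 0.02 * i for i in range(400)]
flux = []
for i, t in enumerate(time):
    epoch = transit_epoch(t, period, t0, duration)
    depth = 0.0 if epoch is None else (0.006 if epoch % 2 else 0.010)
    noise = 1e-4 if i % 2 == 0 else -1e-4   # deterministic stand-in for photometric noise
    flux.append(1.0 - depth + noise)

d_odd, d_even, sig = odd_even_depth(time, flux, period, t0, duration)
```

A large `sig` means alternating transits have different depths, which points to an eclipsing binary at twice the detected period. The depth difference and its significance would each enter the classifier as one feature column.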

Caveats from the sweep

The TLS SDE statistic is chi-square based and not numerically identical to the BLS SDE. When comparing the two, quote the safe wording: "about 10% higher detection efficiency at matched false-alarm rate."
UMI (arXiv:2604.06602, 2026) is a newer GPU detrender, not a validated baseline. Optional.
The Osborn recall 0.61 result is the key reason to avoid training only on simulations.
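
For reference on the SDE caveat above: the SDE is conventionally the height of the periodogram peak above the periodogram mean, in units of the periodogram's standard deviation (Kovacs et al. 2002). The sketch below assumes the spectrum has already been computed and uses the population standard deviation, which is an assumption made here. The function name and the toy spectrum are hypothetical.

```python
import statistics


def signal_detection_efficiency(power):
    """(peak - mean) / standard deviation of a search spectrum."""
    return (max(power) - statistics.fmean(power)) / statistics.pstdev(power)


# Toy spectrum: a flat floor with one isolated peak (illustrative values only).
power = [1.0] * 99 + [5.0]
sde = signal_detection_efficiency(power)
passes_cut = sde >= 7
```

BLS and TLS feed different spectra into this same formula, so one light curve yields different SDE values under each method. Any quoted threshold should therefore name the search algorithm it belongs to.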