PS7 Methods: Scientifically Validated Pipeline, Papers to Cite, Proposal-Ready Numbers
Clean, cited synthesis for the proposal. Raw dump lives in the project at dirty.md. Companions: ps7-how-it-works, ps7-exoplanet-light-curves, ps7-data-strategy, decision-ps3-vs-ps7. Sourced from a 2-agent literature sweep covering 17 peer-reviewed papers.
The headline for our proposal
The single most important finding is peer-reviewed: a feature-based gradient-boosting classifier matches deep CNNs and beats classical BLS (Malik et al. 2022, MNRAS). That is exactly our intended Mallorn-style approach, so we can cite a published result to justify it over a heavier deep-learning model. The honest counterpoint to state up front: on real, noisy TESS data even the best methods degrade (Malik et al. real-TESS precision 0.63; Osborn et al. simulation-trained CNN recall 0.61), which is why we train and validate on real labelled data and add a physics-based validation stage after the classifier.
Recommended pipeline (every stage has a citation)
- Detrend with the Tukey biweight time-windowed slider, window about 3x the maximum transit duration. Validated as the best general-purpose detrender (Hippke et al. 2019, the wotan benchmark). Use a spline plus Huber estimator for very active stars. Start from TESS PDCSAP flux (systematics already cotrended via CBVs).
- Search for periodic transits with Transit Least Squares (Hippke and Heller 2019), which fits a limb-darkened shape rather than a box. Detection cut at SDE greater than or equal to 7 to 9. Cross-reference the SPOC convention of MES 7.1 sigma.
- Engineer features and classify with gradient boosting (LightGBM or XGBoost or CatBoost), following Malik et al. 2022. The discriminating features, consistent across the literature, are: odd-even depth difference, secondary-eclipse depth and significance, transit shape (U versus V, ingress and egress ratio), centroid offset, depth versus stellar radius, MES or SNR, period, and duration.
- Fit parameters for predicted planets with the Mandel and Agol 2002 analytic transit model via batman (Kreidberg 2015), inside an MCMC, to recover depth, period, and duration with posterior confidence intervals.
- Validate statistically with TRICERATOPS (Giacalone et al. 2021) for TESS, assigning a false-positive probability. Validate the top candidates at FPP less than 0.015 and NFPP less than 0.001.
- Quantify performance with an injection-recovery completeness grid (log spacing in period by radius-ratio, about 50 injected transits per cell). This is both our sensitivity-floor figure and a quantitative deliverable.
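
The injection-recovery stage can be sketched in a few lines to show the shape of the deliverable. This is a toy under stated assumptions, not the real pipeline: the cadence, baseline, fixed duration, white-noise level, and box-shaped transit are all hypothetical choices, and `toy_recovered` is a stand-in that measures in-transit SNR at the known ephemeris instead of rerunning detrending and the TLS search. It is therefore optimistic compared with a blind search.

```python
import math
import random

CADENCE_DAYS = 30.0 / 1440.0   # assumed 30-minute cadence (hypothetical choice)
BASELINE_DAYS = 27.0           # assumed single-sector baseline
DURATION_DAYS = 0.1            # fixed transit duration, for simplicity


def log_edges(lo, hi, n_bins):
    """Return n_bins + 1 logarithmically spaced bin edges from lo to hi."""
    step = (math.log10(hi) - math.log10(lo)) / n_bins
    return [10 ** (math.log10(lo) + i * step) for i in range(n_bins + 1)]


def in_transit(t, period, t0):
    """True if time t falls inside a box transit centred on t0 + k * period."""
    phase = (t - t0 + 0.5 * period) % period - 0.5 * period
    return abs(phase) < 0.5 * DURATION_DAYS


def toy_recovered(time, flux, period, t0, sigma, snr_cut=7.0):
    """Stand-in for rerunning the pipeline: in-transit SNR at the known ephemeris."""
    dips = [1.0 - f for t, f in zip(time, flux) if in_transit(t, period, t0)]
    if not dips:
        return False
    snr = (sum(dips) / len(dips)) / (sigma / math.sqrt(len(dips)))
    return snr >= snr_cut


def completeness_grid(period_edges, rprs_edges, n_inject=50, sigma=1e-3, seed=0):
    """Recovered fraction per cell; grid[i][j] is period bin i, radius-ratio bin j."""
    rng = random.Random(seed)
    time = [i * CADENCE_DAYS for i in range(int(BASELINE_DAYS / CADENCE_DAYS))]
    grid = []
    for p_lo, p_hi in zip(period_edges, period_edges[1:]):
        row = []
        for r_lo, r_hi in zip(rprs_edges, rprs_edges[1:]):
            hits = 0
            for _ in range(n_inject):
                # Draw log-uniformly inside the cell, with a random transit epoch.
                period = 10 ** rng.uniform(math.log10(p_lo), math.log10(p_hi))
                rprs = 10 ** rng.uniform(math.log10(r_lo), math.log10(r_hi))
                t0 = rng.uniform(0.0, period)
                depth = rprs ** 2
                flux = [
                    (1.0 - depth if in_transit(t, period, t0) else 1.0)
                    + rng.gauss(0.0, sigma)
                    for t in time
                ]
                hits += toy_recovered(time, flux, period, t0, sigma)
            row.append(hits / n_inject)
        grid.append(row)
    return grid


grid = completeness_grid(log_edges(1.0, 10.0, 3), log_edges(0.01, 0.1, 3))
for row in grid:
    print(" ".join(f"{c:.2f}" for c in row))
```

In the real deliverable, `toy_recovered` would be replaced by the full detrend, search, and classify chain, with a recovery counted only when the detected period matches the injected one. The grid layout and the per-cell fraction stay the same.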
Proposal-ready benchmark numbers
| Claim | Number | Method | Source |
|---|---|---|---|
| Biweight detrending recovers nearly all shallow transits | 99% (Kepler), 94% (K2) of shallowest | injection-retrieval, searched with TLS | Hippke et al. 2019 |
| TLS beats BLS for small-planet recovery | TLS ~93% vs BLS ~76% | limb-darkened vs box search | Hippke and Heller 2019 |
| Standard transit-search detection threshold | SDE ~7 gives ~1% false-positive rate (use 7 to 9) | BLS/TLS periodogram | Hippke and Heller 2019; Kovacs 2002 |
| TESS SPOC detection threshold | MES = 7.1 sigma | SPOC matched-filter search | TESS SPOC / Jenkins |
| Feature-based GBDT matches deep learning, beats BLS | Kepler AUC 0.948 (recall 0.96); sim TESS AUC 0.92; real TESS acc 0.98 / recall 0.82 / precision 0.63 | 789 tsfresh features + LightGBM | Malik et al. 2022 |
| Deep CNN baseline (Kepler) | AUC 0.988, acc 96.0%, P 0.93, R 0.95 | dual global+local 1D-CNN | Shallue and Vanderburg 2018 |
| CNN on real TESS (the noise penalty) | average precision 69.3%, acc 97.8% | AstroNet-Vetting | Yu et al. 2019 |
| Simulation-trained CNN fails on real data | recall ~61% on real TESS | CNN trained on simulated TESS | Osborn et al. 2020 |
| Explainable deep classifier beats Robovetter | 0.968 / 0.974 / 0.996 vs 0.951 / 0.975 / 0.994 (P/R/acc) | multi-branch DNN | Valizadegan et al. 2022 (ExoMiner) |
| RF/GP validation, calibrated probabilities | AUC 0.999; validated 50 planets at p greater than 0.99 | RF + Gaussian Process on 38 features | Armstrong et al. 2021 |
| TESS statistical validation thresholds | FPP less than 0.015 AND NFPP less than 0.001 | Bayesian FP modelling | Giacalone et al. 2021 (TRICERATOPS) |
| Kepler/K2 validation threshold | FPP less than 0.01 (1,284 validated, 428 flagged) | population-synthesis FPP | Morton et al. 2016 (vespa) |
| batman forward-model speed (for MCMC) | 1,000,000 models in 30 s, accurate to 0.03 ppm | analytic Mandel and Agol | Kreidberg 2015 |
| Injection-recovery design | log grid period x radius, ~50 injections per cell | inject, rerun pipeline, measure recovered fraction | Christiansen et al.; TESS occurrence papers |
| Class-imbalance handling | class_weight balanced + synthetic augmentation + 10-fold CV; report PR-AUC not accuracy | weighting + calibration | Armstrong 2021; Yu 2019; Tey 2023 |
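
A back-of-envelope check shows why the table recommends PR-AUC over accuracy. The sketch below is our own arithmetic, not a figure from any cited paper. It takes the three real-TESS numbers quoted for Malik et al. 2022 (accuracy 0.98, recall 0.82, precision 0.63) and solves for the positive-class fraction that makes them mutually consistent, assuming all three were measured on the same test set at a single threshold and ignoring rounding in the quoted values.

```python
def implied_prevalence(accuracy, recall, precision):
    """Positive-class fraction implied by accuracy, recall, and precision.

    Per true positive-class object: FN = 1 - recall and
    FP = recall * (1 - precision) / precision.
    The overall error rate is prevalence * (FN + FP) = 1 - accuracy.
    """
    errors_per_positive = (1.0 - recall) + recall * (1.0 - precision) / precision
    return (1.0 - accuracy) / errors_per_positive


prev = implied_prevalence(accuracy=0.98, recall=0.82, precision=0.63)
print(f"implied planet fraction: {prev:.3f}")            # 0.030
print(f"accuracy of 'never a planet': {1 - prev:.3f}")   # 0.970
```

Under these assumptions planets make up roughly 3% of the evaluation set, so a classifier that never predicts a planet would already score about 0.97 accuracy. That is why accuracy alone says little here and precision-recall metrics carry the information.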
Papers to cite (bibliography)
Detection and signal processing
- Hippke, David, Mulders, Heller 2019, "Wotan: Comprehensive Time-series Detrending in Python", AJ 158, 143. arXiv:1906.00966.
- Hippke and Heller 2019, "Transit Least Squares", A&A 623, A39. arXiv:1901.02015.
- Kovacs, Zucker, Mazeh 2002, "Box-fitting algorithm (BLS)", A&A 391, 369. arXiv:astro-ph/0206099.
- Mandel and Agol 2002, "Analytic Light Curves for Planetary Transit Searches", ApJ 580, L171. arXiv:astro-ph/0210099.
- Kreidberg 2015, "batman: BAsic Transit Model cAlculatioN", PASP 127, 1161. arXiv:1507.08285.
Machine-learning classification
- Shallue and Vanderburg 2018, "Identifying Exoplanets with Deep Learning (AstroNet)", AJ 155, 94. arXiv:1712.05044.
- Ansdell et al. 2018, "Scientific Domain Knowledge Improves Exoplanet Transit Classification", ApJL 869, L7. arXiv:1810.13434.
- Yu et al. 2019, "Identifying Exoplanets with Deep Learning III (TESS)", AJ 158, 25. arXiv:1904.02726.
- Osborn et al. 2020, "Rapid Classification of TESS Candidates with CNNs", A&A 633, A53. arXiv:1902.08544.
- Tey et al. 2023, "Identifying Exoplanets with Deep Learning V", AJ 165, 95. arXiv:2301.01371.
- Valizadegan et al. 2022, "ExoMiner", ApJ 926, 120. arXiv:2111.10009.
- Malik, Moster, Obermeier 2022, "Exoplanet Detection using Machine Learning", MNRAS 513, 5505. arXiv:2011.14135. (the key feature-based GBDT paper for our approach)
- McCauliff et al. 2015, "Automatic Classification of Kepler Candidates (Autovetter)", ApJ 806, 6. arXiv:1408.1496.
- Armstrong, Gamper, Damoulas 2021, "Exoplanet Validation with Machine Learning: 50 New Kepler Planets", MNRAS 504, 5327. arXiv:2008.10516.
- Armstrong et al. 2017, "Transit Shapes and Self-Organizing Maps", MNRAS 465, 2634. arXiv:1611.01968.
Statistical validation
- Morton et al. 2016, "False Positive Probabilities for All Kepler Objects of Interest (vespa)", ApJ 822, 86. arXiv:1605.02825.
- Giacalone et al. 2021, "Vetting K2 and TESS Planets with TRICERATOPS", AJ 161, 24. arXiv:2002.00691.
How this maps to our approach and our EDA
- Our feature-based gradient-boosting plan is published-method-backed by Malik et al. 2022 (matches deep learning, beats BLS, runs on CPU in minutes, gives interpretable importances). We cite it as the spine of the proposal.
- Our crude EDA classifier reached 0.50 (three-class) and 0.67 (planet versus rest). The literature explains the gap: we were missing the strongest features (odd-even, proper trapezoid shape, centroid) and used folded-only data. Adding the validated feature set should move us toward the Malik and Armstrong range.
- Two-stage architecture is well-precedented: fast ML triage, then physics-based validation (TRICERATOPS) at FPP less than 0.015. This is a defensible, citable design.
- Honest ceilings to acknowledge in the proposal: real-TESS precision is hard (Malik 0.63, Osborn recall 0.61 for sim-trained). Train and validate on real labelled data, report PR-AUC not accuracy, and quantify the detection floor via injection-recovery.
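
The odd-even depth difference, named above as one of the features our EDA was missing, is cheap to compute. The following is a minimal sketch under stated assumptions: box-shaped transits, white noise, a known ephemeris, and a hypothetical `simulate` helper that exists only to generate test data. A real implementation would fit the depths with a transit model and use the pipeline's own noise estimate.

```python
import math
import random
import statistics


def odd_even_depth_sigma(time, flux, period, t0, duration):
    """Significance, in sigma, of the depth difference between odd and even transits."""
    odd, even = [], []
    for t, f in zip(time, flux):
        epoch = math.floor((t - t0) / period + 0.5)       # index of the nearest transit
        if abs(t - (t0 + epoch * period)) < 0.5 * duration:
            (odd if epoch % 2 else even).append(1.0 - f)  # per-point depth estimate
    if len(odd) < 2 or len(even) < 2:
        return float("nan")                               # too few points to compare
    diff = statistics.fmean(odd) - statistics.fmean(even)
    err = math.sqrt(statistics.variance(odd) / len(odd)
                    + statistics.variance(even) / len(even))
    return abs(diff) / err


def simulate(depth_even, depth_odd, period=3.0, t0=1.0, duration=0.2,
             sigma=1e-3, seed=0):
    """Hypothetical light curve: 27 days, 30-minute cadence, box transits, white noise."""
    rng = random.Random(seed)
    time = [i / 48.0 for i in range(27 * 48)]
    flux = []
    for t in time:
        epoch = math.floor((t - t0) / period + 0.5)
        hit = abs(t - (t0 + epoch * period)) < 0.5 * duration
        depth = (depth_odd if epoch % 2 else depth_even) if hit else 0.0
        flux.append(1.0 - depth + rng.gauss(0.0, sigma))
    return time, flux


planet = odd_even_depth_sigma(*simulate(0.010, 0.010), period=3.0, t0=1.0, duration=0.2)
binary = odd_even_depth_sigma(*simulate(0.010, 0.006), period=3.0, t0=1.0, duration=0.2)
print(f"planet-like: {planet:.1f} sigma, binary-like: {binary:.1f} sigma")
```

Equal depths give a small value consistent with noise, while alternating depths (the signature of an eclipsing binary detected at half its true period) give a large one. The resulting number would be one column in the feature table fed to the gradient-boosting classifier.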
Caveats from the sweep
- TLS SDE statistic is chi-square based and not numerically identical to BLS SDE. Quote the safe wording: "about 10% higher detection efficiency at matched false-alarm rate."
- UMI (arXiv:2604.06602, 2026) is a newer GPU detrender, not a validated baseline. Optional.
- The Osborn recall 0.61 result is the key reason to avoid training only on simulations.
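
For reference when quoting SDE thresholds, the statistic in its Kovacs et al. 2002 form is the height of the periodogram peak above the mean, in units of the spectrum's standard deviation. The sketch below applies that definition to a hypothetical spectrum made of Gaussian noise plus one inserted peak. Real BLS and TLS spectra are not Gaussian, and each code builds its spectrum differently, so the numbers here illustrate the definition only and are not calibrated to either tool.

```python
import random
import statistics


def signal_detection_efficiency(power):
    """SDE of the highest periodogram peak: (peak - mean) / standard deviation."""
    return (max(power) - statistics.fmean(power)) / statistics.pstdev(power)


rng = random.Random(0)
noise_only = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical flat spectrum
with_peak = list(noise_only)
with_peak[500] = 12.0                                     # one strong periodic signal

print(f"noise only: {signal_detection_efficiency(noise_only):.1f}")
print(f"with peak:  {signal_detection_efficiency(with_peak):.1f}")
```

Because the value depends on how the underlying spectrum is constructed, an SDE cut tuned on one search code should be re-derived, for example from the false-positive rate on signal-free light curves, before being applied to another.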