S&P 500 CEO dataset
CEO Tracker
Every CEO at every S&P 500 company since 2005, sourced from SEC filings.
Coverage
500 companies in the current S&P 500. Coverage starts in 2005. Pre-2005 CEOs are anchored from the earliest available baseline filing.
Validated on a 100-company subset against the Gentry et al. academic CEO dataset (S&P 1500, 2000-2018): 202 CEO matches, 10 date mismatches. 8 of those mismatches turned out to be errors in the academic dataset, not in this one.
How the data was built
Filings are filtered in stages, cheap to expensive:
- Pull every 8-K, 10-K, and DEF 14A filed by each ticker since 2005
- Sentence-level keyword and regex filter
- spaCy dependency parsing to confirm the named person is grammatically the subject of the transition verb
- Claude Haiku extracts structured records (name, role, date, predecessor, successor) from the survivors
- A tenure builder dedupes raw events into per-CEO stints, handling co-CEOs, interim-to-permanent promotions, mid-window CIK changes (Google -> Alphabet), and pre-window incumbents
- Claude Sonnet web-search verification on anything still flagged, with durable corrections written back
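The cheap-to-expensive ordering above can be sketched as a chain of filters, where each stage only sees what the previous one kept. This is a minimal illustration, not the project's actual code: the patterns, the naive sentence splitter, and the names `cheap_filter` and `run_stages` are all assumptions, and the later stages (dependency parsing, LLM extraction) are left out.

```python
import re
from typing import Callable, List

# Hypothetical patterns for the keyword/regex stage: a CEO title plus a
# transition verb must both appear in the same sentence.
CEO_TITLE = re.compile(r"\bchief executive officer\b|\bCEO\b", re.IGNORECASE)
TRANSITION_VERB = re.compile(
    r"\b(appoint(?:s|ed)?|named|elected|resign(?:s|ed)?|retir(?:es|ed|e)|"
    r"succeed(?:s|ed)?|step(?:s|ped)? down)\b",
    re.IGNORECASE,
)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cheap_filter(sentences: List[str]) -> List[str]:
    """Keep sentences that mention both a CEO title and a transition verb."""
    return [
        s for s in sentences
        if CEO_TITLE.search(s) and TRANSITION_VERB.search(s)
    ]


def run_stages(
    items: List[str], stages: List[Callable[[List[str]], List[str]]]
) -> List[str]:
    """Apply stages in order, stopping early once nothing survives."""
    for stage in stages:
        items = stage(items)
        if not items:
            break  # no need to pay for the more expensive stages
    return items
```

Ordering the stages this way means the expensive steps only run on the small fraction of sentences that survive the cheap ones.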
The web-search step has guardrails against the model writing today's date as an end date for currently-serving CEOs. The baseline pipeline is separate because the main pipeline only sees transitions, not whoever was already CEO when the 2005 window opened.
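One way such a guardrail could work is sketched below. The function name, its arguments, and the review flag are assumptions made for illustration; the actual checks are not described here.

```python
from datetime import date
from typing import Optional, Tuple


def guard_end_date(
    proposed_end: Optional[date],
    currently_serving: bool,
    run_date: date,
) -> Tuple[Optional[date], bool]:
    """Return (end_date_to_store, needs_review).

    Hypothetical rules:
    - A currently-serving CEO never gets an end date; if the model supplied
      one anyway, drop it and flag the record.
    - An end date on or after the run date is treated as suspicious
      (likely "today" written in by the model), so it is dropped and flagged.
    """
    if currently_serving:
        return None, proposed_end is not None
    if proposed_end is not None and proposed_end >= run_date:
        return None, True
    return proposed_end, False
```

Returning a review flag instead of silently discarding the value keeps the suspicious record visible for a later check.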
Limitations
- Coverage starts in 2005. Earlier tenures are anchored, not reconstructed.
- Some dates are flagged as inferred or tentative when the filing doesn't pin them to a day. These flags are visible on company pages.
- Web verification handles structural issues. Not every individual date is independently fact-checked.
- When a ticker has changed entities over time (mergers, spinoffs, ticker reuse), only the current entity's CEOs are tracked.
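To make the first two limitations concrete, here is a simplified sketch of how anchored tenures and date flags might be represented when raw events are folded into stints. The `Stint` and `Event` types, their field names, and the choice to leave an anchored start date empty are assumptions for illustration. Co-CEOs, interim-to-permanent promotions, and CIK changes are deliberately ignored.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Event:
    name: str             # incoming CEO named in the filing
    effective: date       # date the appointment takes effect
    flag: str = "exact"   # "exact", "inferred", or "tentative"


@dataclass
class Stint:
    name: str
    start: Optional[date]               # None for an anchored pre-window incumbent
    end: Optional[date]                 # None while still serving
    anchored: bool = False              # True when taken from the baseline filing
    start_flag: Optional[str] = "exact"


def build_stints(baseline_incumbent: Optional[str], events: List[Event]) -> List[Stint]:
    """Fold appointment events into consecutive single-CEO stints."""
    stints: List[Stint] = []
    if baseline_incumbent is not None:
        # Anchored: we know who held the role, not when they started.
        stints.append(
            Stint(baseline_incumbent, start=None, end=None,
                  anchored=True, start_flag=None)
        )
    seen = set()
    for ev in sorted(events, key=lambda e: e.effective):
        key = (ev.name, ev.effective)
        if key in seen:
            continue  # same appointment reported in more than one filing
        seen.add(key)
        if stints:
            stints[-1].end = ev.effective  # successor's start closes the prior stint
        stints.append(Stint(ev.name, start=ev.effective, end=None, start_flag=ev.flag))
    return stints
```

In this sketch the last stint keeps `end=None`, which is how a currently-serving CEO would be represented.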