S&P 500 CEO dataset

CEO Tracker

Every CEO at every S&P 500 company since 2005, sourced from SEC filings.

Coverage

500 companies in the current S&P 500. Coverage starts in 2005; CEOs who were already in office before 2005 are anchored from the earliest available baseline filing.

Validated on a 100-company subset against the Gentry et al. academic CEO dataset (S&P 1500, 2000-2018): 202 CEO matches and 10 date mismatches. Of those 10, 8 turned out to be errors in the academic dataset rather than in this one.
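A minimal sketch of how such a comparison could be scored. The `Tenure` record, the name normalization, and the 31-day tolerance are assumptions for illustration; neither dataset's actual schema or matching rule is described here.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record shape; not the schema of either dataset.
@dataclass(frozen=True)
class Tenure:
    ticker: str
    name: str
    start: date

def normalize(name: str) -> str:
    """Crude name key: lowercase, drop periods and commas, collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())

def compare(ours, theirs, tolerance_days=31):
    """Match tenures on (ticker, normalized name) and collect start-date disagreements.

    Assumes one stint per CEO per ticker; a returning CEO would need a richer key.
    """
    index = {(t.ticker, normalize(t.name)): t for t in theirs}
    matches, mismatches = 0, []
    for t in ours:
        other = index.get((t.ticker, normalize(t.name)))
        if other is None:
            continue  # unmatched names would go to manual review
        matches += 1
        if abs((t.start - other.start).days) > tolerance_days:
            mismatches.append((t, other))
    return matches, mismatches
```

Each pair in `mismatches` still needs a look at the underlying filing to decide which side is wrong; that manual step is where errors in the reference dataset surface.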

How the data was built

Filings are filtered in stages, cheap to expensive:

  1. Pull every 8-K, 10-K, and DEF 14A filed by each ticker since 2005
  2. Sentence-level keyword and regex filter
  3. spaCy dependency parsing to confirm the named person is grammatically the subject of the transition verb
  4. Claude Haiku extracts structured records (name, role, date, predecessor, successor) from the survivors
  5. A tenure builder dedupes raw events into per-CEO stints, handling co-CEOs, interim-to-permanent promotions, mid-window CIK changes (Google -> Alphabet), and pre-window incumbents
  6. Claude Sonnet web-search verification on anything still flagged, with durable corrections written back
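Stage 2 is the cheapest filter and can be approximated with two regexes: keep a sentence only if it mentions the CEO role and a transition verb. The patterns and the naive sentence splitter below are hypothetical stand-ins, not the pipeline's actual keyword list. Stage 3 (the spaCy subject check) is left out because it needs a trained parsing model.

```python
import re

# Hypothetical patterns: a role mention plus a transition verb.
ROLE = re.compile(r"\b(?:chief executive officer|ceo)\b", re.IGNORECASE)
TRANSITION = re.compile(
    r"\b(?:appoint(?:ed|s|ment)?|nam(?:ed|es)|elect(?:ed|s)?|succeed(?:ed|s)?|"
    r"resign(?:ed|s|ation)?|retire(?:d|s|ment)?|step(?:ped|s)? down)\b",
    re.IGNORECASE,
)
# Naive splitter: break after ., ! or ? when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def candidate_sentences(text: str) -> list[str]:
    """Return the sentences that mention the CEO role and a transition verb."""
    return [
        s for s in SENTENCE_END.split(text)
        if ROLE.search(s) and TRANSITION.search(s)
    ]
```

The splitter breaks on abbreviations such as "Mr." and "Inc.", which a recall-oriented first pass can tolerate: the fragment that carries the verb still survives, and later stages discard noise.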
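Step 5 can be sketched as a per-person state machine over date-sorted events. Everything here is hypothetical: the `Event` and `Stint` shapes, the `cik_alias` mapping used for CIK changes, and the `incumbents` list standing in for the baseline pipeline's output. Keying open stints by (entity, person) rather than by entity alone is what lets co-CEOs overlap.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical shapes; field names are illustrative only.
@dataclass(frozen=True)
class Event:
    cik: str
    person: str
    kind: str              # "start" or "end"
    on: date
    interim: bool = False

@dataclass
class Stint:
    entity: str
    person: str
    start: Optional[date]  # None: in office before the window opened
    end: Optional[date]    # None: still serving
    interim: bool = False

def build_tenures(events, cik_alias, incumbents=()):
    """Collapse raw transition events into per-CEO stints.

    cik_alias maps an old CIK to its successor entity so both land on one
    timeline. incumbents holds (cik, person) pairs already in office when the
    window opens, standing in for the baseline pipeline.
    """
    open_stints = {}  # (entity, person) -> Stint; per person, so co-CEOs can overlap
    closed, ended = [], set()
    for cik, person in incumbents:
        entity = cik_alias.get(cik, cik)
        open_stints[(entity, person)] = Stint(entity, person, None, None)
    for ev in sorted(events, key=lambda e: e.on):
        entity = cik_alias.get(ev.cik, ev.cik)
        key = (entity, ev.person)
        current = open_stints.get(key)
        if ev.kind == "start":
            if current is None:
                open_stints[key] = Stint(entity, ev.person, ev.on, None, ev.interim)
            elif current.interim and not ev.interim:
                current.interim = False  # interim-to-permanent: keep the interim start date
            # any other start is a repeat report of a known appointment
        elif ev.kind == "end":
            if current is not None:
                current.end = ev.on
                closed.append(open_stints.pop(key))
                ended.add(key)
            elif key not in ended:
                # departure with no recorded start: the CEO predates the window
                closed.append(Stint(entity, ev.person, None, ev.on))
                ended.add(key)
            # otherwise: a repeat report of a departure already recorded
    return closed + list(open_stints.values())
```

Sorting is by date alone, so same-day events keep their input order; a production builder would need explicit tie-breaking rules for those cases.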

The web-search step has guardrails against the model writing today's date as the end date for a currently serving CEO. A separate baseline pipeline identifies whoever was already CEO when the 2005 window opened, because the main pipeline only sees transitions.
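One way such a guardrail could work, sketched under assumptions: the model proposes an end date, and any date on, near, or after the run date is accepted only if it also appears in the evidence the model cited. The function name, the 7-day window, and the `evidence_dates` input are illustrative, not the pipeline's actual check.

```python
from datetime import date, timedelta
from typing import Optional

def accept_end_date(
    proposed: Optional[date],
    run_date: date,
    evidence_dates: set[date],
    slack_days: int = 7,
) -> bool:
    """Hypothetical check on a model-proposed CEO end date."""
    if proposed is None:
        return True  # no end date proposed: nothing to check
    # Dates after the run date give a negative gap, so they also count as suspicious.
    suspicious = (run_date - proposed) <= timedelta(days=slack_days)
    # A suspicious date survives only if a cited source states the same date.
    return not suspicious or proposed in evidence_dates
```

Dates well in the past pass untouched, so the check only bites on the specific failure mode of a model stamping the current date onto a sitting CEO.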

Limitations

  • Coverage starts in 2005. Earlier tenures are anchored, not reconstructed.
  • Some dates are flagged as inferred or tentative when the filing doesn't pin them to a specific day. These flags are visible on company pages.
  • Web verification handles structural issues. Not every individual date is independently fact-checked.
  • When a ticker has changed entities over time (mergers, spinoffs, ticker reuse), only the current entity's CEOs are tracked.