Research that investigates the vertical integration between different layers of the transaction supply chain is vital to protecting Ethereum’s decentralisation and censorship-resistance. Insights from these studies inform public understanding of the externalities of MEV and can steer protocol development.

The availability of data on the different actors in the transaction supply chain is the foundation on which such research stands. Dashboards like mevboost.pics, relayscan.io, and the Flashbots Transparency Dashboard give researchers critical evidence for scrutinizing the activity and dominance of builders and relays. However, comparable public infrastructure for understanding the searcher segment is conspicuously absent.

The permissionless and covert nature of searching makes data on searcher activity difficult to collect. Searcher teams that maintain internal searcher datasets have financial incentives to withhold them. Builders may also hold searcher datasets (Titan Builder, for example, has published valuable public research on searcher dominance), but their sensitive position as builders prevents them from disclosing these datasets in full.

Building an open-source searcher dataset and dashboard closes a critical gap between private and public knowledge. With an accessible searcher dataset, we can enable more research that sheds light on searcher dominance, searcher-builder integration, and related dynamics.

In this document, we introduce our methodology and discuss findings from a 30-day period. Researchers and other interested parties can then use this public resource to assess searcher dominance and the practical risks of vertical integration within the Ethereum ecosystem more accurately.

Searcher Dataset Methodology

The searcher dataset consists of two subsets: searchers who pursue atomic strategies (DEX-DEX arbitrage, sandwiching, and liquidations) and searchers who pursue non-atomic strategies (CEX-DEX arbitrage and a small amount of cross-chain arbitrage).

In this post, we explain the methodology behind this dataset and present a sample from a recent 30-day period: we analyzed transactions in blocks 17,563,790 to 17,779,790, produced between June 26 and July 26, 2023. In total, we identified 157 atomic and 293 non-atomic MEV searchers. All relevant code can be found in this repo.
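As a quick consistency check on the sampling window, the block range can be converted into wall-clock time. This is a minimal sketch assuming the post-Merge slot time of 12 seconds and that every slot produced a block; missed slots would make the real elapsed time slightly longer than this estimate. The function name is illustrative only.

```python
# Post-Merge Ethereum targets one block per 12-second slot.
SECONDS_PER_SLOT = 12
SECONDS_PER_DAY = 86_400

def block_span_days(start_block: int, end_block: int) -> float:
    """Approximate duration in days between two block heights.

    Assumes every slot produced a block; missed slots make the
    real elapsed time somewhat longer than this estimate.
    """
    n_blocks = end_block - start_block
    return n_blocks * SECONDS_PER_SLOT / SECONDS_PER_DAY

# 216,000 blocks * 12 s = 2,592,000 s
print(block_span_days(17_563_790, 17_779_790))  # 30.0
```

The result, about 30 days, agrees with the June 26 to July 26 dates given for the sample.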

Identifying Atomic MEV Searcher Addresses

MEV searchers who pursue atomic strategies are easy to identify from their onchain footprints. Using Zeromev’s API, which employs a slightly modified version of Flashbots’ mev-inspect-py for MEV detection, we identify atomic MEV transactions in each block. Specifically, we collect transactions whose mev_type is arb, frontrun, backrun, or liquid; we exclude transactions labeled swap for now, but return to them for the non-atomic MEV analysis in a later section.
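The mev_type filter described above can be sketched as follows. This is an illustration, not the repo's code: it operates on records already fetched from Zeromev and held as Python dicts, it assumes each record carries the mev_type and address_to fields named in this post, and the sample records and addresses are hypothetical. No Zeromev endpoint is called.

```python
from typing import Iterable

# mev_type labels treated as atomic MEV here; "swap" is deliberately
# excluded and revisited later for the non-atomic analysis.
ATOMIC_MEV_TYPES = {"arb", "frontrun", "backrun", "liquid"}

def atomic_mev_txs(records: Iterable[dict]) -> list[dict]:
    """Keep only records whose mev_type marks them as atomic MEV."""
    return [r for r in records if r.get("mev_type") in ATOMIC_MEV_TYPES]

# Hypothetical records shaped like rows of Zeromev output.
sample = [
    {"mev_type": "arb", "address_to": "0xaaa1"},
    {"mev_type": "swap", "address_to": "0xbbb2"},
    {"mev_type": "frontrun", "address_to": "0xccc3"},
    {"mev_type": "liquid", "address_to": "0xddd4"},
]
print([r["mev_type"] for r in atomic_mev_txs(sample)])  # ['arb', 'frontrun', 'liquid']
```

Using a set of allowed labels keeps the filter a simple membership test, and records missing mev_type are dropped rather than raising an error.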

In every atomic MEV transaction, an EOA calls a highly gas-optimized smart contract to extract MEV. This invoked contract, given by the “address_to” field returned by the Zeromev API, is the searcher bot extracting MEV in that transaction.
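Because the searcher contract is the address_to of each atomic MEV transaction, collecting candidate searcher addresses is a projection plus deduplication. A minimal sketch over hypothetical records: address_from is assumed here to hold the signing EOA and is ignored. Addresses are lowercased because Ethereum hex addresses are case-insensitive (EIP-55 mixed case is only a checksum), so the same contract could otherwise be counted twice.

```python
def searcher_addresses(mev_txs: list[dict]) -> set[str]:
    """Collect the distinct contracts invoked by atomic MEV transactions.

    The called contract (address_to) is taken as the searcher bot;
    the EOA that signed the transaction is not included.
    """
    return {tx["address_to"].lower() for tx in mev_txs if tx.get("address_to")}

# Hypothetical transactions: the first two hit the same contract
# written with different letter case.
txs = [
    {"address_from": "0x1111", "address_to": "0xAbCd01"},
    {"address_from": "0x2222", "address_to": "0xabcd01"},
    {"address_from": "0x1111", "address_to": "0xEF9902"},
]
print(sorted(searcher_addresses(txs)))  # ['0xabcd01', '0xef9902']
```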

From these addresses, we filter out known non-MEV smart contracts (such as routers, wash-trading bots, and Telegram bots) using this list, which aggregates multiple label sources and is actively maintained by the community.
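Removing known non-MEV contracts amounts to a set difference against the label list. The sketch below assumes the labels have already been loaded into a mapping from address to label; the label names, addresses, and the helper name exclude_labeled are hypothetical placeholders, not entries or code from the community list.

```python
def exclude_labeled(candidates: set[str], labels: dict[str, str],
                    non_mev_labels: set[str]) -> set[str]:
    """Drop candidate addresses whose label marks them as non-MEV.

    Unlabeled addresses are kept: the label list acts as a denylist.
    """
    flagged = {addr.lower() for addr, label in labels.items()
               if label in non_mev_labels}
    return {a for a in candidates if a.lower() not in flagged}

# Hypothetical label list and candidate set.
labels = {"0xrouter1": "router", "0xwash01": "wash_trading", "0xbot0001": "searcher"}
candidates = {"0xrouter1", "0xwash01", "0xbot0001", "0xunknown"}
non_mev = {"router", "wash_trading", "telegram_bot"}
print(sorted(exclude_labeled(candidates, labels, non_mev)))  # ['0xbot0001', '0xunknown']
```

Treating the list as a denylist means an unlabeled contract stays in the dataset, which fits the goal of not discarding searchers the community has not yet labeled.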

In total, we identified 157 atomic MEV searcher addresses active during this 30-day period. The resulting list of searcher addresses is ordered by the number of transactions each one created.
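Ordering searchers by activity is a frequency count over address_to. A minimal sketch using collections.Counter on hypothetical data, assuming the searcher set has already been lowercased and filtered as described above:

```python
from collections import Counter

def rank_by_tx_count(mev_txs: list[dict], searchers: set[str]) -> list[tuple[str, int]]:
    """Return (address, tx_count) pairs, most active searcher first.

    `searchers` is assumed to contain lowercased addresses.
    """
    counts = Counter(
        tx["address_to"].lower()
        for tx in mev_txs
        if tx["address_to"].lower() in searchers
    )
    return counts.most_common()

# Hypothetical transactions; 0xc is not in the searcher set.
txs = [{"address_to": a} for a in ["0xa", "0xb", "0xa", "0xc", "0xa", "0xb"]]
print(rank_by_tx_count(txs, {"0xa", "0xb"}))  # [('0xa', 3), ('0xb', 2)]
```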

Using Zeromev’s MEV classifier lets us capture an accurate lower bound on the number of active MEV searchers pursuing atomic strategies. The dataset is ultimately limited by the capabilities of Zeromev and mev-inspect-py, which have known issues and can miss long-tail MEV strategies that fall outside their classification algorithms. Nonetheless, the MEV activity they do detect is reliable, which makes it a sound basis for the searcher dataset.

Identifying Non-Atomic MEV Searcher Addresses

A searcher dataset would not be complete without a section on non-atomic MEV searchers, which include searchers who pursue CEX-DEX arbitrage and, to a small extent, cross-chain arbitrage. “A Tale of Two Arbitrages” estimates that at least “60% of [arbitrage] opportunities (by revenue) are executed via CeFi-DeFi arbitrage”.

From all the uni-directional swaps labeled by Zeromev, we identify a transaction as a CEX-DEX arbitrage if it fulfills one of the following heuristics: