Initial commit: MEV taxonomy, production pipeline, mermaid architecture, scaling notes

Made-with: Cursor
d-bis infra
2026-04-12 18:16:20 -07:00
commit 432273773a
6 changed files with 397 additions and 0 deletions

.gitignore (new file, +3 lines)

@@ -0,0 +1,3 @@
.DS_Store
*.swp
*~

README.md (new file, +41 lines)

@@ -0,0 +1,41 @@
# MEV searcher pipeline reference
**Purpose:** Documentation-only reference for an **MEV and arbitrage opportunity taxonomy** and a **typical production searcher pipeline** (data, simulation, strategy, execution, capital, latency). This repository is **not** an execution stack; it frames concepts and diagrams for engineers and operators.
**Implementation source of truth:** The MEV platform specifications and code live in **`MEV_Bot`** (Gitea `d-bis/MEV_Bot`), a submodule of the **proxmox** parent repository. Start there for service boundaries, schemas, and build scope:
- `specs/README.md` — spec index and dependency order
- `specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md` — Rust services, NATS-style contracts
- `specs/SEARCH_AND_SIMULATION_SPEC.md`, `EXECUTION_BUNDLE_AND_RELAY_SPEC.md`, etc.
**This repo contains:**
| Document | Description |
|----------|-------------|
| [docs/OPPORTUNITY_TAXONOMY.md](docs/OPPORTUNITY_TAXONOMY.md) | Categories of state-transition / orderflow inefficiencies |
| [docs/PRODUCTION_PIPELINE.md](docs/PRODUCTION_PIPELINE.md) | End-to-end pipeline narrative + mapping to `MEV_Bot` services |
| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Mermaid architecture diagrams (primary diagrams live here) |
| [docs/SCALING_AND_REALITY.md](docs/SCALING_AND_REALITY.md) | What scales, competition, and practical constraints |
## Submodule (parent proxmox repo)
From the proxmox repository root:
```bash
git submodule update --init mev-searcher-pipeline-reference
```
Clone with submodule:
```bash
git clone --recurse-submodules <proxmox-url>
```
## Remote
- **HTTPS:** `https://gitea.d-bis.org/d-bis/mev-searcher-pipeline-reference.git`
- **Default branch:** `main`
## License
Internal reference material; align licensing with your org policy if you redistribute.

docs/ARCHITECTURE.md (new file, +91 lines)

@@ -0,0 +1,91 @@
# Architecture diagrams (Mermaid)
**Last Updated:** 2026-04-13
**Document Version:** 1.0
**Status:** Reference
Diagrams are **illustrative**: production systems differ by chain, relay, custody, and team policy (including exclusion of harmful MEV). For implementation-level naming, use **`MEV_Bot/specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md`** (`MEV_Bot` is a submodule of the proxmox parent repository).
---
## 1. Online pipeline (steady state)
```mermaid
flowchart TB
subgraph dataLayer [Data_layer]
Rpc[Rpc_nodes_archive_plus_head]
Mempool[Mempool_or_private_flow]
Indexers[Pool_indexers_and_logs]
end
subgraph core [Core_compute]
Graph[Liquidity_graph_hot_state]
Sim[Deterministic_EVM_simulation]
Strat[Strategy_and_risk_limits]
end
subgraph exec [Execution]
Bundle[Bundle_builder_signed_txs]
Relay[Relay_or_builder_auction]
Chain[Chain_settlement]
end
subgraph capital [Capital_and_ops]
Inv[Inventory_and_treasury]
Obs[Observability_and_safety]
end
Rpc --> Indexers
Mempool --> Sim
Indexers --> Graph
Graph --> Sim
Sim --> Strat
Strat --> Bundle
Bundle --> Relay
Relay --> Chain
Chain --> Inv
Strat --> Obs
Relay --> Obs
Inv --> Strat
```
**Reading order:** Data feeds refresh graph and triggers; simulation consumes graph plus pending hints; strategy gates bundles; execution competes for inclusion; settlement updates inventory; observability closes the loop.
---
## 2. Pending transaction to candidate bundle (simplified)
```mermaid
flowchart LR
Pending[Pending_tx_or_signal]
Local[Local_pre_state_S]
Post[Post_state_S_prime]
Cand[Candidate_searcher_txs]
Profit[Profit_and_gas_check]
Bundle[Bundle_payload]
Pending --> Local
Local --> Post
Post --> Cand
Cand --> Profit
Profit --> Bundle
```
This is the **backrun-shaped** view: model how someone else's transaction moves state, then evaluate whether your bundle is profitable **after** that transition (and under your ordering assumptions).
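A minimal sketch of the `S` to `S'` to candidate to profit-check path, assuming both venues are Uniswap V2-style constant-product pools with a 0.3% input fee, a single pending swap, and a gas cost already converted into units of the profit token. `Pool`, `best_backrun`, and the reserve numbers are hypothetical; production systems simulate full EVM execution rather than reserve arithmetic.

```python
from dataclasses import dataclass

# Uniswap V2-style 0.3% input fee with integer math (assumption: both pools are V2-shaped).
FEE_NUM, FEE_DEN = 997, 1000


@dataclass(frozen=True)
class Pool:
    reserve_x: int
    reserve_y: int


def amount_out(amount_in: int, reserve_in: int, reserve_out: int) -> int:
    """Constant-product output for a fee-on-input swap, rounded down."""
    if amount_in <= 0:
        return 0
    fee_adjusted = amount_in * FEE_NUM
    return fee_adjusted * reserve_out // (reserve_in * FEE_DEN + fee_adjusted)


def swap_x_for_y(pool: Pool, dx: int) -> tuple[Pool, int]:
    dy = amount_out(dx, pool.reserve_x, pool.reserve_y)
    return Pool(pool.reserve_x + dx, pool.reserve_y - dy), dy


def swap_y_for_x(pool: Pool, dy: int) -> tuple[Pool, int]:
    dx = amount_out(dy, pool.reserve_y, pool.reserve_x)
    return Pool(pool.reserve_x - dx, pool.reserve_y + dy), dx


def backrun_profit(pool_a: Pool, pool_b: Pool, y_in: int, gas_cost_y: int) -> int:
    """Buy X on pool A (cheap after the pending trade), sell it on pool B; net profit in Y."""
    _, x_bought = swap_y_for_x(pool_a, y_in)
    _, y_out = swap_x_for_y(pool_b, x_bought)
    return y_out - y_in - gas_cost_y


def best_backrun(pool_a: Pool, pool_b: Pool, max_in: int, gas_cost_y: int) -> tuple[int, int]:
    """Ternary search over input size; profit is concave in size up to integer rounding."""
    def f(y: int) -> int:
        return backrun_profit(pool_a, pool_b, y, gas_cost_y)

    lo, hi = 0, max_in
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    size = max(range(lo, hi + 1), key=f)
    return size, f(size)


# S: both venues quote the same price before the pending swap.
pool_a = Pool(1_000_000, 1_000_000)
pool_b = Pool(1_000_000, 1_000_000)
# S -> S': apply the pending swap (100k X sold into pool A) to the local view of state.
pool_a_post, _ = swap_x_for_y(pool_a, 100_000)
size, profit = best_backrun(pool_a_post, pool_b, max_in=200_000, gas_cost_y=50)
```

The ternary search relies on the two-hop route's profit being concave in input size; a bundle is only worth building when the returned `profit` is positive. Against the unmoved state `S` the same search returns a loss, because the round trip only pays fees.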
---
## 3. Typical technology map (non-prescriptive)
| Concern | Examples (illustrative only) |
|--------|------------------------------|
| RPC / chain access | Self-hosted execution client, archive node, websocket subscriptions |
| Hot state | Redis, in-memory graph, columnar snapshots for replay |
| Messaging | NATS, Kafka, or in-process channels (see `MEV_Bot` MVP: NATS) |
| Simulation | revm, Foundry-style forks, custom EVM + state DB |
| Storage | PostgreSQL for pools, runs, PnL |
| Signing | HSM, remote signer, segregated keys per role |
| Submission | Relay HTTP APIs, builder gRPC, public `eth_sendRawTransaction` |
---
## Related
- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md) — narrative and `MEV_Bot` mapping
- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md) — opportunity classes
- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — constraints

docs/OPPORTUNITY_TAXONOMY.md (new file, +89 lines)

@@ -0,0 +1,89 @@
# Opportunity taxonomy (reference)
**Last Updated:** 2026-04-13
**Document Version:** 1.0
**Status:** Reference (not legal or trading advice)
This document groups common **MEV and arbitrage** patterns as **state transition inefficiencies** in AMMs and surrounding orderflow. Wording is descriptive; viability varies by chain, fee regime, mempool visibility, and regulation.
---
## 1. Cross-DEX arbitrage (atomic)
**Idea:** The same asset pair trades at different effective prices across venues (different pools or routers).
**Execution sketch:** Buy on the relatively cheap pool and sell on the relatively expensive one within **one atomic** on-chain transaction (often via a router or custom executor).
**Typical traits:** High event frequency for liquid pairs; per-event profit often small; **very high** competition.
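For two constant-product pools, the profit-maximizing input of an atomic round trip has a closed form: the two hops compose into one virtual pool, and setting the derivative of profit to zero gives the size. The sketch assumes Uniswap V2-style pools with a fee on input, ignores integer rounding and gas, and uses hypothetical reserves; it is one common derivation, not a router's API.

```python
import math


def optimal_two_pool_arb(a_in: float, a_out: float, b_in: float, b_out: float,
                         fee: float = 0.003) -> tuple[float, float]:
    """Optimal input x for token T -> pool A -> token U -> pool B -> token T.

    a_in/a_out are pool A's reserves of T/U; b_in/b_out are pool B's reserves of U/T.
    Composite output is g*x*e_out/(e_in + g*x); maximizing output minus x gives
    x* = (sqrt(g*e_in*e_out) - e_in) / g.
    """
    g = 1.0 - fee
    denom = b_in + g * a_out
    e_in = a_in * b_in / denom          # virtual input reserve of the composed route
    e_out = g * a_out * b_out / denom   # virtual output reserve of the composed route
    x = (math.sqrt(g * e_in * e_out) - e_in) / g
    if x <= 0:
        return 0.0, 0.0  # no profitable trade in this direction at these reserves
    profit = g * x * e_out / (e_in + g * x) - x
    return x, profit


# Pool A gives about 2.2 U per T; pool B takes about 2.0 U per T.
x, profit = optimal_two_pool_arb(a_in=1_000, a_out=2_200, b_in=2_000, b_out=1_000)
```

On-chain, the executor would make the whole route revert if the final T output is not above the T input, which is what keeps the arbitrage atomic.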
---
## 2. Triangular and multi-hop arbitrage
**Idea:** Mispricing along a **cycle** of pools, for example `A → B → C → A`, not necessarily visible as a two-pool spread.
**Sources:** Routing blind spots, stale aggregator paths, fee tier fragmentation, thin intermediate legs.
**Typical traits:** Medium frequency; pathfinding cost; size limited by the weakest pool on the path.
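Cycle search is commonly framed as negative-cycle detection: with edge weight `-log(rate)`, a loop whose rate product exceeds 1 has negative total weight. A sketch using Bellman-Ford, assuming `rates` holds marginal, fee-adjusted spot rates that ignore trade size; token names and rates are hypothetical, and any cycle found still needs sizing and full simulation.

```python
import math
from typing import Optional


def find_arbitrage_cycle(
    tokens: list[str], rates: dict[tuple[str, str], float]
) -> Optional[list[str]]:
    """Return tokens in trade order for a cycle with rate product > 1, or None.

    rates[(u, v)] is the marginal, fee-adjusted amount of v received per unit of u.
    """
    n = len(tokens)
    index = {t: i for i, t in enumerate(tokens)}
    edges = [(index[u], index[v], -math.log(r)) for (u, v), r in rates.items()]
    dist = [0.0] * n  # same as a virtual source with a 0-weight edge to every token
    pred = [-1] * n
    relaxed = -1
    for _ in range(n):
        relaxed = -1
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:  # tolerance avoids float-noise "cycles"
                dist[v] = dist[u] + w
                pred[v] = u
                relaxed = v
    if relaxed == -1:
        return None  # nothing relaxed in the n-th pass: no profitable cycle
    v = relaxed
    for _ in range(n):  # step back n times to land on the cycle itself
        v = pred[v]
    cycle, u = [v], pred[v]
    while u != v:
        cycle.append(u)
        u = pred[u]
    cycle.reverse()  # pred links point backwards; reverse to get trade order
    return [tokens[i] for i in cycle]
```

The search cost grows with the number of pools and fee tiers, which is the pathfinding cost noted above; size is then bounded by the weakest leg once real reserves are simulated.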
---
## 3. Backrun arbitrage
**Idea:** Profit from **another actor's trade** that moves prices: observe a pending or included trade, then trade **into** the post-trade price (often immediately after, hence “backrun”).
**Dependencies:** Mempool or private-flow visibility, fast simulation, and competitive ordering.
**Typical traits:** Very high event count on public mempools; outcome depends on **ordering** and builder/relay dynamics.
---
## 4. Sandwich (front-run + back-run)
**Idea:** Place trades **before and after** a victim swap so the victim executes at a worse price; the searcher unwinds into the moved price.
**Typical traits:** Potentially large profit per victim trade; **high** execution and revert risk; **wallet protections**, builder policies, and **legal/reputational** exposure vary by jurisdiction and venue.
**Internal policy note:** Treat user-harming extraction as a **compliance and product ethics** topic, not only a technical optimization. Many teams **exclude** sandwich strategies by policy.
---
## 5. Liquidation arbitrage
**Idea:** Capture **liquidation incentives** (bonus, spread, or protocol-defined rewards) when positions become undercollateralized.
**Typical traits:** Burst-driven in volatility; can require **inventory** and gas bidding; protocol-specific rules dominate economics.
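A sketch of the liquidation check, assuming an Aave-style model with a health factor, a close factor, and a liquidation bonus, and USD-denominated prices. The values in `RiskParams` are hypothetical; each protocol defines its own thresholds, oracles, and rounding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Position:
    collateral_amount: float  # units of the collateral token
    debt_amount: float        # units of the debt token


@dataclass(frozen=True)
class RiskParams:
    # Hypothetical, protocol-specific values; real protocols set these per asset.
    liquidation_threshold: float = 0.80
    liquidation_bonus: float = 0.05
    close_factor: float = 0.50


def health_factor(pos: Position, collateral_price: float, debt_price: float,
                  params: RiskParams) -> float:
    debt_value = pos.debt_amount * debt_price
    if debt_value == 0:
        return float("inf")
    return pos.collateral_amount * collateral_price * params.liquidation_threshold / debt_value


def liquidation_profit(pos: Position, collateral_price: float, debt_price: float,
                       params: RiskParams, gas_cost_usd: float) -> Optional[float]:
    """USD value captured by repaying the maximum allowed debt, or None if not liquidatable."""
    if health_factor(pos, collateral_price, debt_price, params) >= 1.0:
        return None
    repay = pos.debt_amount * params.close_factor  # needs debt-token inventory or a flash loan
    seized = repay * debt_price * (1 + params.liquidation_bonus) / collateral_price
    seized = min(seized, pos.collateral_amount)    # cannot seize more collateral than exists
    return seized * collateral_price - repay * debt_price - gas_cost_usd
```

Most of the captured value is the bonus on the repaid amount, which is why a price drop that pushes many positions below a health factor of 1 at once produces the bursts described above.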
---
## 6. Cross-chain arbitrage
**Idea:** Price differences for the same economic exposure across chains.
**Constraints:** Bridging latency, reorg risk, inventory on each chain, and trust assumptions of bridges.
**Typical traits:** Medium frequency for some pairs; **capital-heavy**; operational complexity dominates.
---
## 7. Oracle and pricing lag
**Idea:** A protocol's **on-chain price** lags tradable spot; actors who understand update rules may trade around the lag (within protocol constraints).
**Typical traits:** Infrequent relative to DEX arb; requires **deep protocol** knowledge; high impact when it appears.
---
## 8. AMM curve shape (convexity and fee tiers)
**Idea:** Non-linear pricing (constant product, concentrated liquidity, stable swaps) means large trades create **local** mispricings that other trades can close.
**Typical traits:** Often embedded inside other categories (backrun, triangular, cross-DEX) rather than a standalone label.
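Price impact on a constant-product pool makes the convexity concrete: larger trades get a worse average price and leave the pool's marginal price further from other venues. A sketch assuming a Uniswap V2-style pool where the 0.3% fee stays in reserves, using floats for readability; the reserves are hypothetical.

```python
def constant_product_quote(dx: float, rx: float, ry: float,
                           fee: float = 0.003) -> tuple[float, float, float]:
    """Sell dx of X: returns (Y out, average price in Y per X, post-trade marginal price)."""
    dx_after_fee = dx * (1 - fee)
    dy = ry * dx_after_fee / (rx + dx_after_fee)
    new_rx, new_ry = rx + dx, ry - dy  # the fee remains in the pool's reserves
    return dy, dy / dx, new_ry / new_rx


# Pre-trade marginal price is ry / rx = 1.0; compare small, medium, and large trades.
for size in (1e2, 1e4, 1e5):
    out, avg_price, marginal = constant_product_quote(size, 1e6, 1e6)
```

The gap between the post-trade marginal price and an unmoved venue's price is the local mispricing that backrun, triangular, and cross-DEX strategies close.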
---
## Related
- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md) — how these opportunities are detected and acted on in a production-shaped stack
- [ARCHITECTURE.md](ARCHITECTURE.md) — diagrams
- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — which patterns scale under competition

docs/PRODUCTION_PIPELINE.md (new file, +114 lines)

@@ -0,0 +1,114 @@
# Production pipeline (reference)
**Last Updated:** 2026-04-13
**Document Version:** 1.0
**Status:** Reference
This is a **systems-level** description of how serious searchers and builders often structure work: **predict post-trade state, simulate, bid for ordering, settle**. Details differ by chain, relay, and builder market.
---
## Core loop (conceptual)
1. **Ingest** chain head state, pool reserves, and (where available) **pending** transactions or private flow.
2. **Model** candidate state transitions (e.g. apply pending tx to a local view of state).
3. **Search** for profitable routes or reactions (cycles, backruns, liquidations).
4. **Simulate** deterministically in an EVM-equivalent environment with **gas** and fee models.
5. **Decide** under risk limits (inventory, gas caps, failure modes).
6. **Package** signed transactions into **bundles** or other relay payloads.
7. **Submit** to relays, builders, or public mempool; track inclusion and PnL.
8. **Observe** failures, latency, and counterparties; feed back into strategy.
The system is not simply “find arb then send tx”; it is **compete for ordering** against others running similar loops.
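The core loop can be sketched as a single pass over one block's triggers, with each stage injected as a function. `Candidate`, `LoopStats`, and `run_block` are hypothetical names, not `MEV_Bot` service APIs; the point is that inclusion is a separate, uncertain outcome after submission.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    simulated_profit: int  # net of gas and fees, smallest token unit (assumption)


@dataclass
class LoopStats:
    seen: int = 0
    simulated_ok: int = 0
    submitted: int = 0
    included: int = 0
    pnl: int = 0


def run_block(
    triggers: Iterable[str],
    search: Callable[[str], list[Candidate]],
    simulate: Callable[[Candidate], Candidate | None],
    risk_ok: Callable[[Candidate], bool],
    submit: Callable[[Candidate], bool],
    stats: LoopStats,
) -> LoopStats:
    """One pass of steps 1-8 of the core loop for a block's worth of triggers."""
    for trigger in triggers:                 # 1-2: ingested signal, applied to local state
        for cand in search(trigger):         # 3: search for routes or reactions
            stats.seen += 1
            sim = simulate(cand)             # 4: deterministic simulation
            if sim is None or sim.simulated_profit <= 0:
                continue
            stats.simulated_ok += 1
            if not risk_ok(sim):             # 5: risk limits
                continue
            stats.submitted += 1             # 6-7: package and submit
            if submit(sim):                  # inclusion is decided by the auction, not by us
                stats.included += 1
                stats.pnl += sim.simulated_profit
    return stats                             # 8: observed outcomes feed back into strategy
```

Keeping `submitted` and `included` as separate counters is what exposes the ordering competition: a profitable, risk-approved candidate still earns nothing if another searcher wins the slot.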
---
## Layer breakdown
### Data layer
**Inputs:** RPC (archive when historical replay matters), mempool or builder streams, pool factories, logs, optional CEX/oracle feeds.
**Outputs:** Normalized pool graph edges, reserves, fees, and **candidate triggers** (new block, new pending tx, new pool).
**Typical stack (illustrative):** Self-hosted or low-latency RPC, websocket or fiber mempool feeds, indexers, Redis or in-memory hot state, PostgreSQL for metadata and analytics.
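A sketch of normalizing raw logs into graph edges, assuming Uniswap V2-style pairs, whose `Sync(uint112 reserve0, uint112 reserve1)` event carries both reserves as two 32-byte ABI words in the log's `data` field. Pool and token identifiers are placeholders, `token0`/`token1` ordering would be read once from the pair contract, and the 30 bps fee is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    pool: str
    token_in: str
    token_out: str
    reserve_in: int
    reserve_out: int
    fee_bps: int


def decode_sync_data(data_hex: str) -> tuple[int, int]:
    """Decode the data field of a V2-style Sync log into (reserve0, reserve1)."""
    raw = bytes.fromhex(data_hex.removeprefix("0x"))
    if len(raw) != 64:
        raise ValueError("expected two 32-byte ABI words")
    return int.from_bytes(raw[:32], "big"), int.from_bytes(raw[32:], "big")


def edges_from_sync(pool: str, token0: str, token1: str, data_hex: str,
                    fee_bps: int = 30) -> list[Edge]:
    """One directed edge per swap direction, ready to upsert into the hot-state graph."""
    r0, r1 = decode_sync_data(data_hex)
    return [Edge(pool, token0, token1, r0, r1, fee_bps),
            Edge(pool, token1, token0, r1, r0, fee_bps)]
```

Each `Sync` replaces the pair's reserves outright, so the graph can overwrite the two edges rather than replaying individual swaps.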
---
### Simulation engine
**Role:** Given state `S` and a candidate action, compute post-state, balances, and **net** profit after gas and protocol fees.
**Requirements:** Deterministic EVM behavior, correct token transfer semantics, gas metering, and **parallel** evaluation of many candidates.
This layer is often the **main technical moat** after raw data access.
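Net profit after gas can be sketched for EIP-1559 (type-2) transactions, where the effective gas price is `min(max_fee_per_gas, base_fee + max_priority_fee_per_gas)`. `SimResult` and the assumption that all outputs are already valued in wei are hypothetical. The thread pool only illustrates fanning out candidates; CPU-bound Python needs processes, or a native engine such as revm, for real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SimResult:
    gross_out_wei: int         # value returned to the searcher, already converted to wei (assumption)
    amount_in_wei: int         # value put in
    gas_used: int              # from simulation
    protocol_fee_wei: int = 0


def effective_gas_price(base_fee: int, max_fee: int, max_priority_fee: int) -> int:
    """EIP-1559 effective gas price for a type-2 transaction (all values in wei)."""
    if max_fee < base_fee:
        raise ValueError("max_fee below base_fee: transaction is not includable in this block")
    return min(max_fee, base_fee + max_priority_fee)


def net_profit_wei(r: SimResult, base_fee: int, max_fee: int, max_priority_fee: int) -> int:
    gas_cost = r.gas_used * effective_gas_price(base_fee, max_fee, max_priority_fee)
    return r.gross_out_wei - r.amount_in_wei - r.protocol_fee_wei - gas_cost


def evaluate_all(results: list[SimResult], base_fee: int, max_fee: int,
                 max_priority_fee: int, workers: int = 4) -> list[int]:
    """Fan candidates out to a pool; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: net_profit_wei(r, base_fee, max_fee, max_priority_fee), results))
```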
---
### Strategy engine
**Role:** Choose **which** opportunities to pursue, size, and gas or builder tip policy under constraints (slippage, inventory, max loss, cooldowns).
**Includes:** Graph search / path enumeration, scoring, and integration with **capital** limits.
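A greedy selection sketch under two of the constraints above: per-token inventory and an aggregate worst-case loss budget. Field names and units are hypothetical, and greedy-by-profit is a simple heuristic rather than an optimal allocator (the general problem is a knapsack variant).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    id: str
    token: str
    size: int             # inventory the attempt ties up, in token units
    expected_profit: int  # simulated net profit, one common unit across opportunities
    worst_case_loss: int  # e.g. gas burned if the attempt reverts on-chain


def select(opps: list[Opportunity], inventory: dict[str, int], max_total_loss: int) -> list[str]:
    """Greedy by expected profit under per-token inventory and an aggregate loss budget."""
    remaining = dict(inventory)
    loss_budget = max_total_loss
    chosen = []
    for o in sorted(opps, key=lambda o: o.expected_profit, reverse=True):
        if o.expected_profit <= 0:
            break  # sorted descending, so nothing after this is worth taking
        if remaining.get(o.token, 0) < o.size or o.worst_case_loss > loss_budget:
            continue
        remaining[o.token] -= o.size
        loss_budget -= o.worst_case_loss
        chosen.append(o.id)
    return chosen
```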
---
### Execution layer
**Role:** Turn a simulated opportunity into an **included** on-chain outcome.
**Mechanisms (chain-dependent):** Bundles via relays (e.g. Flashbots-style flows), direct builder relationships, private orderflow, or public mempool with aggressive priority fees.
**Win rate** is dominated by this layer plus simulation quality, not only “alpha” detection.
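A sketch of packaging signed transactions for a Flashbots-style relay, assuming the `eth_sendBundle` JSON-RPC method with a `txs` list and a hex `blockNumber` as described in Flashbots' relay documentation; verify field names against your relay's current spec. Request signing (the `X-Flashbots-Signature` header) and HTTP transport are omitted, and the signed-transaction hex strings are placeholders.

```python
import json
from typing import Optional


def build_send_bundle_body(signed_txs: list[str], target_block: int, request_id: int = 1,
                           reverting_tx_hashes: Optional[list[str]] = None) -> str:
    """JSON-RPC body for a Flashbots-style eth_sendBundle request."""
    if not signed_txs:
        raise ValueError("bundle needs at least one signed transaction")
    params = {"txs": list(signed_txs), "blockNumber": hex(target_block)}
    if reverting_tx_hashes:
        params["revertingTxHashes"] = list(reverting_tx_hashes)
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "method": "eth_sendBundle", "params": [params]})


def bodies_for_window(signed_txs: list[str], current_block: int, blocks_ahead: int = 3) -> list[str]:
    """One request per target block, since each bundle targets a single blockNumber."""
    return [build_send_bundle_body(signed_txs, current_block + k, request_id=k)
            for k in range(1, blocks_ahead + 1)]
```

Submitting the same bundle for a short window of upcoming blocks is one simple way to tolerate a missed slot; whether it stays profitable for later blocks depends on re-simulation against the newer state.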
---
### Smart contract layer
**Role:** Atomic multi-hop swaps, flash loans, callbacks, and **revert-safe** accounting so failed attempts do not destroy inventory.
**Requirements:** Gas discipline, minimal external calls, clear failure modes.
---
### Capital layer
**Role:** Inventory across tokens and chains, treasury movements, and **rebalancing** after runs.
**Techniques:** Pre-funded inventory, internal netting across strategies, flash liquidity (bounded by depth and fees).
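Internal netting can be sketched as summing per-strategy token deltas before any treasury movement, so only the residual imbalance is rebalanced. Strategy names, tokens, and the sign convention are hypothetical.

```python
from collections import defaultdict


def net_flows(flows: list[tuple[str, str, int]]) -> dict[str, int]:
    """Sum (strategy, token, delta) per token; delta < 0 means the strategy consumed inventory."""
    net = defaultdict(int)
    for _strategy, token, delta in flows:
        net[token] += delta
    return {token: d for token, d in net.items() if d != 0}


def rebalance_orders(inventory: dict[str, int], targets: dict[str, int],
                     flows: list[tuple[str, str, int]]) -> dict[str, int]:
    """Amounts to acquire (+) or release (-) so post-netting inventory returns to target."""
    after = dict(inventory)
    for token, delta in net_flows(flows).items():
        after[token] = after.get(token, 0) + delta
    return {t: targets[t] - after.get(t, 0) for t in targets if targets[t] != after.get(t, 0)}
```

In the usage below, two strategies' USDC flows cancel internally, so only the WETH shortfall needs a treasury move.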
---
### Latency and placement
**Role:** Reduce time from **signal** to **submission**, and improve placement in the **auction** for blockspace.
**Knobs:** Geography, peering, NIC/kernel tuning, colocation (where allowed), direct builder APIs.
---
## Mapping to `MEV_Bot` implementation specs
The **`MEV_Bot`** submodule (see parent **proxmox** repo) names concrete services in `specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md`. Rough alignment:
| This reference layer | `MEV_Bot` service (MVP spec) |
|----------------------|----------------------------|
| Data: pools, reserves, blocks | `PoolIndexerService`, `StateIngestionService` |
| Liquidity graph | `LiquidityGraphService` |
| Opportunity search | `OpportunitySearchService` |
| Simulation | `SimulationService` |
| Strategy / risk (partially in sim + ops) | Config + scoring in `SEARCH_AND_SIMULATION_SPEC` / ops docs |
| Bundle build | `BundleBuilderService` |
| Execution / relay | `ExecutionGateway` |
| Settlement and PnL | `SettlementAnalyticsService` |
| Mempool-driven rescoring (optional) | `MempoolWatcherService` (noted as optional in spec) |
Use **`MEV_Bot/specs/README.md`** for the authoritative reading order and scope boundaries.
---
## Related
- [ARCHITECTURE.md](ARCHITECTURE.md) — Mermaid diagrams of the same pipeline
- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md) — what kinds of opportunities exist
- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — competition and scaling

docs/SCALING_AND_REALITY.md (new file, +59 lines)

@@ -0,0 +1,59 @@
# Scaling and competitive reality (reference)
**Last Updated:** 2026-04-13
**Document Version:** 1.0
**Status:** Reference
This compresses “what scales” and “what bites” for teams sizing MEV and arbitrage infrastructure. It is **not** a revenue forecast.
---
## What scales (qualitatively)
| Pattern | Scales under competition? | Notes |
|--------|---------------------------|--------|
| Simple two-pool DEX arb | Usually **poorly** | Commoditized; margins often near gas |
| Triangular / multi-hop | **Mixed** | Limited by weakest leg; search cost grows |
| Backruns (mempool-visible flow) | **Often better** | High frequency; outcome dominated by simulation + execution |
| Sandwiching | **Mixed** (and often **excluded** by policy) | High per-trade upside when it works; protections, reverts, and ethics matter |
| Liquidations | **Bursty** | Large per-event when volatility spikes; protocol-specific |
| Cross-chain | **Mixed** | Capital and bridge latency bound scale |
| Oracle lag | **Rare** | High impact episodes; requires deep protocol knowledge |
Top teams typically run a **multi-strategy** portfolio (detection + simulation + execution shared), rather than a single “arb bot.”
---
## Hard constraints
1. **Near-zero-sum extraction** against unsophisticated flow or passive LPs: your profit is often someone else's worse execution or lower returns.
2. **Competitors** include specialized firms with dedicated data, execution, and builder relationships.
3. **Edge decays** as more capital and automation chase the same signals.
4. **Operational risk:** reorgs, relay failures, buggy adapters, key compromise, and regulatory attention (especially user-harming strategies).
---
## Detection funnel (intuition)
- **Many** raw signals per second (logs, mempool hints, block diffs).
- **Some** survive simulation as technically valid.
- **Few** remain profitable after gas, tips, and failure probability.
- **Fewer still** win the **ordering auction** against peers.
Framing the system as **winning blockspace under uncertainty** matches production more closely than “find positive arb.”
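The last funnel stage can be made concrete with an expected-value view of the builder tip: a higher tip raises the chance of winning the ordering auction but lowers profit if you win. The sketch assumes a hypothetical model where the best rival tip is uniform on `[0, competitor_max]`, and that a losing bundle costs nothing (true for relay bundles that are simply not included, not for reverted public transactions). Real win-probability curves must be estimated from observed auctions.

```python
def win_probability(tip: float, competitor_max: float) -> float:
    """Hypothetical model: the best rival tip is uniform on [0, competitor_max]."""
    if competitor_max <= 0:
        return 1.0
    return min(1.0, max(0.0, tip / competitor_max))


def best_tip(gross_profit: float, competitor_max: float, steps: int = 1000) -> tuple[float, float]:
    """Grid-search the tip maximizing p(win) * (gross_profit - tip); losing costs nothing."""
    best_tip_value, best_ev = 0.0, 0.0
    for i in range(steps + 1):
        tip = gross_profit * i / steps
        ev = win_probability(tip, competitor_max) * (gross_profit - tip)
        if ev > best_ev:
            best_tip_value, best_ev = tip, ev
    return best_tip_value, best_ev
```

Under this model the optimal tip is half the gross profit when `competitor_max` is at least that large, and exactly `competitor_max` otherwise; either way much of the gross profit goes to the builder, which is the auction pressure the funnel describes.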
---
## Relation to implementation work
Building a **reliable** internal platform (indexing, deterministic simulation, bundle lifecycle, observability) is valuable even when **public mempool alpha** is thin: the same components support **risk management**, **internal routing**, **testing**, and **incident replay**.
For build scope and safety gates, follow **`MEV_Bot/specs/MVP_SCOPE.md`** and **`MEV_Bot/specs/OBSERVABILITY_SAFETY_AND_ROLLOUT.md`** (`MEV_Bot` is a submodule of the proxmox parent repository).
---
## Related
- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md)
- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md)
- [ARCHITECTURE.md](ARCHITECTURE.md)