From 432273773ac46efa96cd6764726ebd5ead7bba92 Mon Sep 17 00:00:00 2001
From: d-bis infra
Date: Sun, 12 Apr 2026 18:16:20 -0700
Subject: [PATCH] Initial commit: MEV taxonomy, production pipeline, mermaid architecture, scaling notes

Made-with: Cursor
---
 .gitignore                   |   3 +
 README.md                    |  41 +++++++++++++
 docs/ARCHITECTURE.md         |  91 ++++++++++++++++++++++++++++
 docs/OPPORTUNITY_TAXONOMY.md |  89 +++++++++++++++++++++++++++
 docs/PRODUCTION_PIPELINE.md  | 114 +++++++++++++++++++++++++++++++++++
 docs/SCALING_AND_REALITY.md  |  59 ++++++++++++++++++
 6 files changed, 397 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md
 create mode 100644 docs/ARCHITECTURE.md
 create mode 100644 docs/OPPORTUNITY_TAXONOMY.md
 create mode 100644 docs/PRODUCTION_PIPELINE.md
 create mode 100644 docs/SCALING_AND_REALITY.md

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..239beea
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+.DS_Store
+*.swp
+*~
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1aa7e3c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,41 @@
+# MEV searcher pipeline reference
+
+**Purpose:** Documentation-only reference for an **MEV and arbitrage opportunity taxonomy** and a **typical production searcher pipeline** (data, simulation, strategy, execution, capital, latency). This repository is **not** an execution stack; it frames concepts and diagrams for engineers and operators.
+
+**Implementation source of truth:** The MEV platform specifications and code live in the **`MEV_Bot`** submodule of the parent **proxmox** repository (Gitea `d-bis/MEV_Bot`). Start there for service boundaries, schemas, and build scope:
+
+- `specs/README.md` — spec index and dependency order
+- `specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md` — Rust services, NATS-style contracts
+- `specs/SEARCH_AND_SIMULATION_SPEC.md`, `EXECUTION_BUNDLE_AND_RELAY_SPEC.md`, etc.
+
+**This repo contains:**
+
+| Document | Description |
+|----------|-------------|
+| [docs/OPPORTUNITY_TAXONOMY.md](docs/OPPORTUNITY_TAXONOMY.md) | Categories of state-transition / orderflow inefficiencies |
+| [docs/PRODUCTION_PIPELINE.md](docs/PRODUCTION_PIPELINE.md) | End-to-end pipeline narrative + mapping to `MEV_Bot` services |
+| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Mermaid architecture diagrams (primary diagrams live here) |
+| [docs/SCALING_AND_REALITY.md](docs/SCALING_AND_REALITY.md) | What scales, competition, and practical constraints |
+
+## Submodule (parent proxmox repo)
+
+From the proxmox repository root:
+
+```bash
+git submodule update --init mev-searcher-pipeline-reference
+```
+
+Clone the parent repository with submodules:
+
+```bash
+git clone --recurse-submodules
+```
+
+## Remote
+
+- **HTTPS:** `https://gitea.d-bis.org/d-bis/mev-searcher-pipeline-reference.git`
+- **Default branch:** `main`
+
+## License
+
+Internal reference material; align licensing with your org policy if you redistribute.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..5d7990b
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,91 @@
+# Architecture diagrams (Mermaid)
+
+**Last Updated:** 2026-04-13
+**Document Version:** 1.0
+**Status:** Reference
+
+Diagrams are **illustrative**: production systems differ by chain, relay, custody, and team policy (including exclusion of harmful MEV). For implementation-level naming, use **`MEV_Bot/specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md`** in the parent proxmox repository.
+
+---
+
+## 1. Online pipeline (steady state)
+
+```mermaid
+flowchart TB
+  subgraph dataLayer [Data_layer]
+    Rpc[Rpc_nodes_archive_plus_head]
+    Mempool[Mempool_or_private_flow]
+    Indexers[Pool_indexers_and_logs]
+  end
+  subgraph core [Core_compute]
+    Graph[Liquidity_graph_hot_state]
+    Sim[Deterministic_EVM_simulation]
+    Strat[Strategy_and_risk_limits]
+  end
+  subgraph exec [Execution]
+    Bundle[Bundle_builder_signed_txs]
+    Relay[Relay_or_builder_auction]
+    Chain[Chain_settlement]
+  end
+  subgraph capital [Capital_and_ops]
+    Inv[Inventory_and_treasury]
+    Obs[Observability_and_safety]
+  end
+  Rpc --> Indexers
+  Mempool --> Sim
+  Indexers --> Graph
+  Graph --> Sim
+  Sim --> Strat
+  Strat --> Bundle
+  Bundle --> Relay
+  Relay --> Chain
+  Chain --> Inv
+  Strat --> Obs
+  Relay --> Obs
+  Inv --> Strat
+```
+
+**Reading order:** Data feeds refresh the liquidity graph and emit triggers; simulation consumes the graph plus pending-transaction hints; strategy gates which bundles get built; execution competes for inclusion; settlement updates inventory, which feeds back into strategy; observability monitors strategy and relay outcomes.
+
+---
+
+## 2. Pending transaction to candidate bundle (simplified)
+
+```mermaid
+flowchart LR
+  Pending[Pending_tx_or_signal]
+  Local[Local_pre_state_S]
+  Post[Post_state_S_prime]
+  Cand[Candidate_searcher_txs]
+  Profit[Profit_and_gas_check]
+  Bundle[Bundle_payload]
+  Pending --> Local
+  Local --> Post
+  Post --> Cand
+  Cand --> Profit
+  Profit --> Bundle
+```
+
+This is the **backrun-shaped** view: model how someone else’s transaction moves state, then evaluate whether your bundle is profitable **after** that transition (and under your ordering assumptions).
+
+---
+
+## 3. Typical technology map (non-prescriptive)
+
+| Concern | Examples (illustrative only) |
+|--------|------------------------------|
+| RPC / chain access | Self-hosted execution client, archive node, websocket subscriptions |
+| Hot state | Redis, in-memory graph, columnar snapshots for replay |
+| Messaging | NATS, Kafka, or in-process channels (see `MEV_Bot` MVP: NATS) |
+| Simulation | revm, Foundry-style forks, custom EVM + state DB |
+| Storage | PostgreSQL for pools, runs, PnL |
+| Signing | HSM, remote signer, segregated keys per role |
+| Submission | Relay HTTP APIs, builder gRPC, public `eth_sendRawTransaction` |
+
+---
+
+## Related
+
+- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md) — narrative and `MEV_Bot` mapping
+- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md) — opportunity classes
+- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — constraints
diff --git a/docs/OPPORTUNITY_TAXONOMY.md b/docs/OPPORTUNITY_TAXONOMY.md
new file mode 100644
index 0000000..80069f7
--- /dev/null
+++ b/docs/OPPORTUNITY_TAXONOMY.md
@@ -0,0 +1,89 @@
+# Opportunity taxonomy (reference)
+
+**Last Updated:** 2026-04-13
+**Document Version:** 1.0
+**Status:** Reference (not legal or trading advice)
+
+This document groups common **MEV and arbitrage** patterns as **state transition inefficiencies** in AMMs and surrounding orderflow. Wording is descriptive; viability varies by chain, fee regime, mempool visibility, and regulation.
+
+---
+
+## 1. Cross-DEX arbitrage (atomic)
+
+**Idea:** The same asset pair trades at different effective prices across venues (different pools or routers).
+
+**Execution sketch:** Buy on the relatively cheap pool and sell on the relatively expensive one within **one atomic** on-chain transaction (often via a router or custom executor).
+
+**Typical traits:** High event frequency for liquid pairs; per-event profit often small; **very high** competition.
+
+---
+
+## 2. Triangular and multi-hop arbitrage
+
+**Idea:** Mispricing along a **cycle** of pools, for example `A → B → C → A`, not necessarily visible as a two-pool spread.
+
+**Sources:** Routing blind spots, stale aggregator paths, fee tier fragmentation, thin intermediate legs.
+
+**Typical traits:** Medium frequency; non-trivial pathfinding cost; size limited by the weakest pool on the path.
+
+---
+
+## 3. Backrun arbitrage
+
+**Idea:** Profit from **another actor’s trade** that moves prices: observe a pending or included trade, then trade **against** the displaced post-trade price, usually immediately after that trade (hence “backrun”).
+
+**Dependencies:** Mempool or private-flow visibility, fast simulation, and competitive ordering.
+
+**Typical traits:** Very high event count on public mempools; outcome depends on **ordering** and builder/relay dynamics.
+
+---
+
+## 4. Sandwich (front-run + back-run)
+
+**Idea:** Place trades **before and after** a victim swap so the victim executes at a worse price; the searcher unwinds into the moved price.
+
+**Typical traits:** Potentially large profit per victim trade; **high** execution and revert risk; exposure to **wallet protections** and builder policies; **legal/reputational** risk that varies by jurisdiction and venue.
+
+**Internal policy note:** Treat user-harming extraction as a **compliance and product ethics** topic, not only a technical optimization. Many teams **exclude** sandwich strategies by policy.
+
+---
+
+## 5. Liquidation arbitrage
+
+**Idea:** Capture **liquidation incentives** (bonus, spread, or protocol-defined rewards) when positions become undercollateralized.
+
+**Typical traits:** Bursty, concentrated in volatile periods; can require **inventory** and gas bidding; protocol-specific rules dominate economics.
+
+---
+
+## 6. Cross-chain arbitrage
+
+**Idea:** Price differences for the same economic exposure across chains.
+
+**Constraints:** Bridging latency, reorg risk, inventory on each chain, and trust assumptions of bridges.
+
+**Typical traits:** Medium frequency for some pairs; **capital-heavy**; operational complexity dominates.
+
+---
+
+## 7. Oracle and pricing lag
+
+**Idea:** A protocol’s **on-chain price** lags the tradable spot price; actors who understand its update rules may trade around the lag (within protocol constraints).
+
+**Typical traits:** Infrequent relative to DEX arb; requires **deep protocol** knowledge; high impact when it appears.
+
+---
+
+## 8. AMM curve shape (convexity and fee tiers)
+
+**Idea:** Non-linear pricing (constant product, concentrated liquidity, stable swaps) means large trades create **local** mispricings that other trades can close.
+
+**Typical traits:** Often embedded inside other categories (backrun, triangular, cross-DEX) rather than a standalone label.
+
+---
+
+## Related
+
+- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md) — how these opportunities are detected and acted on in a production-shaped stack
+- [ARCHITECTURE.md](ARCHITECTURE.md) — diagrams
+- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — which patterns scale under competition
diff --git a/docs/PRODUCTION_PIPELINE.md b/docs/PRODUCTION_PIPELINE.md
new file mode 100644
index 0000000..3ee81d2
--- /dev/null
+++ b/docs/PRODUCTION_PIPELINE.md
@@ -0,0 +1,114 @@
+# Production pipeline (reference)
+
+**Last Updated:** 2026-04-13
+**Document Version:** 1.0
+**Status:** Reference
+
+This is a **systems-level** description of how serious searchers and builders often structure work: **predict post-trade state, simulate, bid for ordering, settle**. Details differ by chain, relay, and builder market.
+
+---
+
+## Core loop (conceptual)
+
+1. **Ingest** chain head state, pool reserves, and (where available) **pending** transactions or private flow.
+2. **Model** candidate state transitions (e.g. apply pending tx to a local view of state).
+3. **Search** for profitable routes or reactions (cycles, backruns, liquidations).
+4. **Simulate** deterministically in an EVM-equivalent environment with **gas** and fee models.
+5. **Decide** under risk limits (inventory, gas caps, failure modes).
+6. **Package** signed transactions into **bundles** or other relay payloads.
+7. **Submit** to relays, builders, or the public mempool; track inclusion and PnL.
+8. **Observe** failures, latency, and counterparties; feed back into strategy.
+
+The system is not simply “find arb, then send tx”; it is a **competition for ordering** against others running similar loops.
+
+---
+
+## Layer breakdown
+
+### Data layer
+
+**Inputs:** RPC (archive when historical replay matters), mempool or builder streams, pool factories, logs, optional CEX/oracle feeds.
+
+**Outputs:** Normalized pool graph edges, reserves, fees, and **candidate triggers** (new block, new pending tx, new pool).
+
+**Typical stack (illustrative):** Self-hosted or low-latency RPC, websocket or fiber mempool feeds, indexers, Redis or in-memory hot state, PostgreSQL for metadata and analytics.
+
+---
+
+### Simulation engine
+
+**Role:** Given state `S` and a candidate action, compute post-state, balances, and **net** profit after gas and protocol fees.
+
+**Requirements:** Deterministic EVM behavior, correct token transfer semantics, gas metering, and **parallel** evaluation of many candidates.
+
+This layer is often the **main technical moat** after raw data access.
+
+---
+
+### Strategy engine
+
+**Role:** Choose **which** opportunities to pursue, how to size them, and which gas or builder-tip policy to apply under constraints (slippage, inventory, max loss, cooldowns).
+
+**Includes:** Graph search / path enumeration, scoring, and integration with **capital** limits.
+
+---
+
+### Execution layer
+
+**Role:** Turn a simulated opportunity into an **included** on-chain outcome.
+
+**Mechanisms (chain-dependent):** Bundles via relays (e.g. Flashbots-style flows), direct builder relationships, private orderflow, or public mempool with aggressive priority fees.
+
+**Win rate** is dominated by this layer plus simulation quality, not only “alpha” detection.
+
+---
+
+### Smart contract layer
+
+**Role:** Atomic multi-hop swaps, flash loans, callbacks, and **revert-safe** accounting so failed attempts do not destroy inventory.
+
+**Requirements:** Gas discipline, minimal external calls, clear failure modes.
+
+---
+
+### Capital layer
+
+**Role:** Inventory across tokens and chains, treasury movements, and **rebalancing** after runs.
+
+**Techniques:** Pre-funded inventory, internal netting across strategies, flash liquidity (bounded by depth and fees).
+
+---
+
+### Latency and placement
+
+**Role:** Reduce time from **signal** to **submission**, and improve placement in the **auction** for blockspace.
+
+**Knobs:** Geography, peering, NIC/kernel tuning, colocation (where allowed), direct builder APIs.
+
+---
+
+## Mapping to `MEV_Bot` implementation specs
+
+The **`MEV_Bot`** submodule (see parent **proxmox** repo) names concrete services in `specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md`. Rough alignment:
+
+| This reference layer | `MEV_Bot` service (MVP spec) |
+|----------------------|----------------------------|
+| Data: pools, reserves, blocks | `PoolIndexerService`, `StateIngestionService` |
+| Liquidity graph | `LiquidityGraphService` |
+| Opportunity search | `OpportunitySearchService` |
+| Simulation | `SimulationService` |
+| Strategy / risk (partially in sim + ops) | Config + scoring in `SEARCH_AND_SIMULATION_SPEC` / ops docs |
+| Bundle build | `BundleBuilderService` |
+| Execution / relay | `ExecutionGateway` |
+| Settlement and PnL | `SettlementAnalyticsService` |
+| Mempool-driven rescoring (optional) | `MempoolWatcherService` (noted as optional in spec) |
+
+Use **`MEV_Bot/specs/README.md`** for the authoritative reading order and scope boundaries.
+
+---
+
+## Related
+
+- [ARCHITECTURE.md](ARCHITECTURE.md) — Mermaid diagrams of the same pipeline
+- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md) — what kinds of opportunities exist
+- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — competition and scaling
diff --git a/docs/SCALING_AND_REALITY.md b/docs/SCALING_AND_REALITY.md
new file mode 100644
index 0000000..6d4734a
--- /dev/null
+++ b/docs/SCALING_AND_REALITY.md
@@ -0,0 +1,59 @@
+# Scaling and competitive reality (reference)
+
+**Last Updated:** 2026-04-13
+**Document Version:** 1.0
+**Status:** Reference
+
+This note summarizes “what scales” and “what bites” for teams sizing MEV and arbitrage infrastructure. It is **not** a revenue forecast.
+
+---
+
+## What scales (qualitatively)
+
+| Pattern | Scales under competition? | Notes |
+|--------|---------------------------|--------|
+| Simple two-pool DEX arb | Usually **poorly** | Commoditized; margins often near gas |
+| Triangular / multi-hop | **Mixed** | Limited by weakest leg; search cost grows |
+| Backruns (mempool-visible flow) | **Often better** | High frequency; outcome dominated by simulation + execution |
+| Sandwiching | **Mixed** (and often **excluded** by policy) | High per-trade upside when it works; protections, reverts, and ethics matter |
+| Liquidations | **Bursty** | Large per-event when volatility spikes; protocol-specific |
+| Cross-chain | **Mixed** | Capital and bridge latency bound scale |
+| Oracle lag | **Rare** | High-impact episodes; requires deep protocol knowledge |
+
+Top teams typically run a **multi-strategy** portfolio that shares detection, simulation, and execution infrastructure, rather than a single “arb bot.”
+
+---
+
+## Hard constraints
+
+1. **Near zero-sum extraction** against unsophisticated flow or passive LPs: your profit is often someone else’s worse execution or lower returns.
+2. **Competitors** include specialized firms with dedicated data, execution, and builder relationships.
+3. **Edge decays** as more capital and automation chase the same signals.
+4. **Operational risk:** reorgs, relay failures, buggy adapters, key compromise, and regulatory attention (especially for user-harming strategies).
+
+---
+
+## Detection funnel (intuition)
+
+- **Many** raw signals per second (logs, mempool hints, block diffs).
+- **Some** survive simulation as technically valid.
+- **Few** remain profitable after gas, tips, and failure probability.
+- **Fewer still** win the **ordering auction** against peers.
+
+Framing the system as **winning blockspace under uncertainty** matches production more closely than “find positive arb.”
+
+---
+
+## Relation to implementation work
+
+Building a **reliable** internal platform (indexing, deterministic simulation, bundle lifecycle, observability) is valuable even when **public mempool alpha** is thin: the same components support **risk management**, **internal routing**, **testing**, and **incident replay**.
+
+For build scope and safety gates, follow **`MEV_Bot/specs/MVP_SCOPE.md`** and **`MEV_Bot/specs/OBSERVABILITY_SAFETY_AND_ROLLOUT.md`** in the parent proxmox repository.
+
+---
+
+## Related
+
+- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md)
+- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md)
+- [ARCHITECTURE.md](ARCHITECTURE.md)
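The two-pool cross-DEX check in `docs/OPPORTUNITY_TAXONOMY.md` (section 1) and the “margins often near gas” note in `docs/SCALING_AND_REALITY.md` can be illustrated with a minimal Python sketch.

The sketch makes these assumptions:

- Pools are Uniswap V2-style constant-product pools, and the fee is charged on the input. The 30 bps default mirrors V2's 0.3% fee.
- Gas is priced directly in token A.
- Trade size is chosen by a coarse grid search.

`Pool`, `amount_out`, `cycle_profit`, and `best_size` are hypothetical names, not `MEV_Bot` APIs. The sketch also ignores ordering competition, reverts, and state changes from other pending transactions, which the pipeline documents treat as the harder part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """Hypothetical constant-product pool holding token A and token B."""
    reserve_a: int
    reserve_b: int
    fee_bps: int = 30  # assumed 0.30% swap fee, as in Uniswap V2-style pools


def amount_out(amount_in: int, reserve_in: int, reserve_out: int, fee_bps: int) -> int:
    """x * y = k output with the fee charged on the input; integer math rounds down."""
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        return 0
    in_with_fee = amount_in * (10_000 - fee_bps)
    return in_with_fee * reserve_out // (reserve_in * 10_000 + in_with_fee)


def cycle_profit(amount_a: int, buy: Pool, sell: Pool, gas_cost_a: int) -> int:
    """Net token-A result of A -> B on `buy`, then B -> A on `sell`, minus gas priced in A."""
    b_received = amount_out(amount_a, buy.reserve_a, buy.reserve_b, buy.fee_bps)
    a_back = amount_out(b_received, sell.reserve_b, sell.reserve_a, sell.fee_bps)
    return a_back - amount_a - gas_cost_a


def best_size(buy: Pool, sell: Pool, gas_cost_a: int, max_in: int, steps: int = 200) -> tuple[int, int]:
    """Coarse grid search over input sizes; returns (size, profit), or (0, 0) if nothing clears gas."""
    best = (0, 0)
    for i in range(1, steps + 1):
        size = max_in * i // steps
        profit = cycle_profit(size, buy, sell, gas_cost_a)
        if profit > best[1]:
            best = (size, profit)
    return best


if __name__ == "__main__":
    cheap_b = Pool(reserve_a=1_000_000, reserve_b=2_100_000)  # ~2.1 B per A: B is cheap here
    rich_b = Pool(reserve_a=1_000_000, reserve_b=2_000_000)   # ~2.0 B per A: B is dearer here
    print(best_size(cheap_b, rich_b, gas_cost_a=100, max_in=200_000))
```

Each swap moves its own pool's price against the trader, so profit first rises and then falls as size grows. A real strategy engine would replace the grid with a closed-form or convex search. Raising `gas_cost_a` until `best_size` returns `(0, 0)` shows how a thin spread disappears once gas is counted.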