From 432273773ac46efa96cd6764726ebd5ead7bba92 Mon Sep 17 00:00:00 2001
From: d-bis infra <infra@d-bis.org>
Date: Sun, 12 Apr 2026 18:16:20 -0700
Subject: [PATCH] Initial commit: MEV taxonomy, production pipeline, mermaid
 architecture, scaling notes

Made-with: Cursor
---
 .gitignore                   |   3 +
 README.md                    |  41 +++++++++++++
 docs/ARCHITECTURE.md         |  91 ++++++++++++++++++++++++++++
 docs/OPPORTUNITY_TAXONOMY.md |  89 +++++++++++++++++++++++++++
 docs/PRODUCTION_PIPELINE.md  | 114 +++++++++++++++++++++++++++++++++++
 docs/SCALING_AND_REALITY.md  |  59 ++++++++++++++++++
 6 files changed, 397 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md
 create mode 100644 docs/ARCHITECTURE.md
 create mode 100644 docs/OPPORTUNITY_TAXONOMY.md
 create mode 100644 docs/PRODUCTION_PIPELINE.md
 create mode 100644 docs/SCALING_AND_REALITY.md
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..239beea
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+.DS_Store
+*.swp
+*~
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1aa7e3c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,41 @@
+# MEV searcher pipeline reference
+
+**Purpose:** Documentation-only reference for **MEV and arbitrage opportunity taxonomy** and a **typical production searcher pipeline** (data, simulation, strategy, execution, capital, latency). This repository is **not** an execution stack; it frames concepts and diagrams for engineers and operators.
+
+**Implementation source of truth:** The MEV platform specifications and code live in the **`MEV_Bot`** submodule of the parent **proxmox** repository (Gitea `d-bis/MEV_Bot`). Start there for service boundaries, schemas, and build scope:
+
+- `specs/README.md` — spec index and dependency order  
+- `specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md` — Rust services, NATS-style contracts  
+- `specs/SEARCH_AND_SIMULATION_SPEC.md`, `EXECUTION_BUNDLE_AND_RELAY_SPEC.md`, etc.
+
+**This repo contains:**
+
+| Document | Description |
+|----------|-------------|
+| [docs/OPPORTUNITY_TAXONOMY.md](docs/OPPORTUNITY_TAXONOMY.md) | Categories of state-transition / orderflow inefficiencies |
+| [docs/PRODUCTION_PIPELINE.md](docs/PRODUCTION_PIPELINE.md) | End-to-end pipeline narrative + mapping to `MEV_Bot` services |
+| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Mermaid architecture diagrams (primary diagrams live here) |
+| [docs/SCALING_AND_REALITY.md](docs/SCALING_AND_REALITY.md) | What scales, competition, and practical constraints |
+
+## Submodule (parent proxmox repo)
+
+From the proxmox repository root:
+
+```bash
+git submodule update --init mev-searcher-pipeline-reference
+```
+
+Clone with submodule:
+
+```bash
+git clone --recurse-submodules <proxmox-url>
+```
+
+## Remote
+
+- **HTTPS:** `https://gitea.d-bis.org/d-bis/mev-searcher-pipeline-reference.git`  
+- **Default branch:** `main`
+
+## License
+
+Internal reference material; align licensing with your org policy if you redistribute.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..5d7990b
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,91 @@
+# Architecture diagrams (Mermaid)
+
+**Last Updated:** 2026-04-13  
+**Document Version:** 1.0  
+**Status:** Reference
+
+Diagrams are **illustrative**: production systems differ by chain, relay, custody, and team policy (including exclusion of harmful MEV). For implementation-level naming, use **`specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md`** in the `MEV_Bot` submodule of the parent proxmox repository.
+
+---
+
+## 1. Online pipeline (steady state)
+
+```mermaid
+flowchart TB
+  subgraph dataLayer [Data_layer]
+    Rpc[Rpc_nodes_archive_plus_head]
+    Mempool[Mempool_or_private_flow]
+    Indexers[Pool_indexers_and_logs]
+  end
+  subgraph core [Core_compute]
+    Graph[Liquidity_graph_hot_state]
+    Sim[Deterministic_EVM_simulation]
+    Strat[Strategy_and_risk_limits]
+  end
+  subgraph exec [Execution]
+    Bundle[Bundle_builder_signed_txs]
+    Relay[Relay_or_builder_auction]
+    Chain[Chain_settlement]
+  end
+  subgraph capital [Capital_and_ops]
+    Inv[Inventory_and_treasury]
+    Obs[Observability_and_safety]
+  end
+  Rpc --> Indexers
+  Mempool --> Sim
+  Indexers --> Graph
+  Graph --> Sim
+  Sim --> Strat
+  Strat --> Bundle
+  Bundle --> Relay
+  Relay --> Chain
+  Chain --> Inv
+  Strat --> Obs
+  Relay --> Obs
+  Inv --> Strat
+```
+
+**Reading order:** Data feeds refresh the graph and produce triggers; simulation consumes the graph plus pending hints; strategy gates bundles; execution competes for inclusion; settlement updates inventory, which feeds back into strategy and risk limits; observability monitors strategy decisions and relay outcomes.
+
+---
+
+## 2. Pending transaction to candidate bundle (simplified)
+
+```mermaid
+flowchart LR
+  Pending[Pending_tx_or_signal]
+  Local[Local_pre_state_S]
+  Post[Post_state_S_prime]
+  Cand[Candidate_searcher_txs]
+  Profit[Profit_and_gas_check]
+  Bundle[Bundle_payload]
+  Pending --> Local
+  Local --> Post
+  Post --> Cand
+  Cand --> Profit
+  Profit --> Bundle
+```
+
+This is the **backrun-shaped** view: model how someone else’s transaction moves state, then evaluate whether your bundle is profitable **after** that transition (and under your ordering assumptions).
+
+---
+
+## 3. Typical technology map (non-prescriptive)
+
+| Concern | Examples (illustrative only) |
+|--------|------------------------------|
+| RPC / chain access | Self-hosted execution client, archive node, websocket subscriptions |
+| Hot state | Redis, in-memory graph, columnar snapshots for replay |
+| Messaging | NATS, Kafka, or in-process channels (see `MEV_Bot` MVP: NATS) |
+| Simulation | revm, Foundry-style forks, custom EVM + state DB |
+| Storage | PostgreSQL for pools, runs, PnL |
+| Signing | HSM, remote signer, segregated keys per role |
+| Submission | Relay HTTP APIs, builder gRPC, public `eth_sendRawTransaction` |
+
+---
+
+## Related
+
+- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md) — narrative and `MEV_Bot` mapping  
+- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md) — opportunity classes  
+- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — constraints  
diff --git a/docs/OPPORTUNITY_TAXONOMY.md b/docs/OPPORTUNITY_TAXONOMY.md
new file mode 100644
index 0000000..80069f7
--- /dev/null
+++ b/docs/OPPORTUNITY_TAXONOMY.md
@@ -0,0 +1,89 @@
+# Opportunity taxonomy (reference)
+
+**Last Updated:** 2026-04-13  
+**Document Version:** 1.0  
+**Status:** Reference (not legal or trading advice)
+
+This document groups common **MEV and arbitrage** patterns as **state transition inefficiencies** in AMMs and surrounding orderflow. Wording is descriptive; viability varies by chain, fee regime, mempool visibility, and regulation.
+
+---
+
+## 1. Cross-DEX arbitrage (atomic)
+
+**Idea:** The same asset pair trades at different effective prices across venues (different pools or routers).
+
+**Execution sketch:** Buy on the relatively cheap pool and sell on the relatively expensive one within **one atomic** on-chain transaction (often via a router or custom executor).
+
+**Typical traits:** High event frequency for liquid pairs; per-event profit often small; **very high** competition.
+
+---
+
+## 2. Triangular and multi-hop arbitrage
+
+**Idea:** Mispricing along a **cycle** of pools, for example `A → B → C → A`, not necessarily visible as a two-pool spread.
+
+**Sources:** Routing blind spots, stale aggregator paths, fee tier fragmentation, thin intermediate legs.
+
+**Typical traits:** Medium frequency; pathfinding cost; size limited by the weakest pool on the path.
+
+---
+
+## 3. Backrun arbitrage
+
+**Idea:** Profit from **another actor’s trade** that moves prices: observe a pending or included trade, then trade **into** the post-trade price (often immediately after, hence “backrun”).
+
+**Dependencies:** Mempool or private-flow visibility, fast simulation, and competitive ordering.
+
+**Typical traits:** Very high event count on public mempools; outcome depends on **ordering** and builder/relay dynamics.
+
+---
+
+## 4. Sandwich (front-run + back-run)
+
+**Idea:** Place trades **before and after** a victim swap so the victim executes at a worse price; the searcher unwinds into the moved price.
+
+**Typical traits:** Potentially large profit per victim trade; **high** execution and revert risk; **wallet protections**, builder policies, and **legal/reputational** exposure vary by jurisdiction and venue.
+
+**Internal policy note:** Treat user-harming extraction as a **compliance and product ethics** topic, not only a technical optimization. Many teams **exclude** sandwich strategies by policy.
+
+---
+
+## 5. Liquidation arbitrage
+
+**Idea:** Capture **liquidation incentives** (bonus, spread, or protocol-defined rewards) when positions become undercollateralized.
+
+**Typical traits:** Burst-driven in volatility; can require **inventory** and gas bidding; protocol-specific rules dominate economics.
+
+---
+
+## 6. Cross-chain arbitrage
+
+**Idea:** Price differences for the same economic exposure across chains.
+
+**Constraints:** Bridging latency, reorg risk, inventory on each chain, and trust assumptions of bridges.
+
+**Typical traits:** Medium frequency for some pairs; **capital-heavy**; operational complexity dominates.
+
+---
+
+## 7. Oracle and pricing lag
+
+**Idea:** A protocol’s **on-chain price** lags tradable spot; actors who understand update rules may trade around the lag (within protocol constraints).
+
+**Typical traits:** Infrequent relative to DEX arb; requires **deep protocol** knowledge; high impact when it appears.
+
+---
+
+## 8. AMM curve shape (convexity and fee tiers)
+
+**Idea:** Non-linear pricing (constant product, concentrated liquidity, stable swaps) means large trades create **local** mispricings that other trades can close.
+
+**Typical traits:** Often embedded inside other categories (backrun, triangular, cross-DEX) rather than a standalone label.
+
+---
+
+## Related
+
+- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md) — how these opportunities are detected and acted on in a production-shaped stack  
+- [ARCHITECTURE.md](ARCHITECTURE.md) — diagrams  
+- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — which patterns scale under competition  
diff --git a/docs/PRODUCTION_PIPELINE.md b/docs/PRODUCTION_PIPELINE.md
new file mode 100644
index 0000000..3ee81d2
--- /dev/null
+++ b/docs/PRODUCTION_PIPELINE.md
@@ -0,0 +1,114 @@
+# Production pipeline (reference)
+
+**Last Updated:** 2026-04-13  
+**Document Version:** 1.0  
+**Status:** Reference
+
+This is a **systems-level** description of how serious searchers and builders often structure work: **predict post-trade state, simulate, bid for ordering, settle**. Details differ by chain, relay, and builder market.
+
+---
+
+## Core loop (conceptual)
+
+1. **Ingest** chain head state, pool reserves, and (where available) **pending** transactions or private flow.  
+2. **Model** candidate state transitions (e.g. apply pending tx to a local view of state).  
+3. **Search** for profitable routes or reactions (cycles, backruns, liquidations).  
+4. **Simulate** deterministically in an EVM-equivalent environment with **gas** and fee models.  
+5. **Decide** under risk limits (inventory, gas caps, failure modes).  
+6. **Package** signed transactions into **bundles** or other relay payloads.  
+7. **Submit** to relays, builders, or public mempool; track inclusion and PnL.  
+8. **Observe** failures, latency, and counterparties; feed back into strategy.
+
+The system is not simply “find arb then send tx”; it is a **competition for ordering** against others running similar loops.
+
+---
+
+## Layer breakdown
+
+### Data layer
+
+**Inputs:** RPC (archive when historical replay matters), mempool or builder streams, pool factories, logs, optional CEX/oracle feeds.
+
+**Outputs:** Normalized pool graph edges, reserves, fees, and **candidate triggers** (new block, new pending tx, new pool).
+
+**Typical stack (illustrative):** Self-hosted or low-latency RPC, websocket or fiber mempool feeds, indexers, Redis or in-memory hot state, PostgreSQL for metadata and analytics.
+
+---
+
+### Simulation engine
+
+**Role:** Given state `S` and a candidate action, compute post-state, balances, and **net** profit after gas and protocol fees.
+
+**Requirements:** Deterministic EVM behavior, correct token transfer semantics, gas metering, and **parallel** evaluation of many candidates.
+
+This layer is often the **main technical moat** after raw data access.
+
+---
+
+### Strategy engine
+
+**Role:** Choose **which** opportunities to pursue, size, and gas or builder tip policy under constraints (slippage, inventory, max loss, cooldowns).
+
+**Includes:** Graph search / path enumeration, scoring, and integration with **capital** limits.
+
+---
+
+### Execution layer
+
+**Role:** Turn a simulated opportunity into an **included** on-chain outcome.
+
+**Mechanisms (chain-dependent):** Bundles via relays (e.g. Flashbots-style flows), direct builder relationships, private orderflow, or public mempool with aggressive priority fees.
+
+**Win rate** depends mostly on this layer plus simulation quality, not on “alpha” detection alone.
+
+---
+
+### Smart contract layer
+
+**Role:** Atomic multi-hop swaps, flash loans, callbacks, and **revert-safe** accounting so failed attempts do not destroy inventory.
+
+**Requirements:** Gas discipline, minimal external calls, clear failure modes.
+
+---
+
+### Capital layer
+
+**Role:** Inventory across tokens and chains, treasury movements, and **rebalancing** after runs.
+
+**Techniques:** Pre-funded inventory, internal netting across strategies, flash liquidity (bounded by depth and fees).
+
+---
+
+### Latency and placement
+
+**Role:** Reduce time from **signal** to **submission**, and improve placement in the **auction** for blockspace.
+
+**Knobs:** Geography, peering, NIC/kernel tuning, colocation (where allowed), direct builder APIs.
+
+---
+
+## Mapping to `MEV_Bot` implementation specs
+
+The **`MEV_Bot`** submodule (see parent **proxmox** repo) names concrete services in `specs/SERVICE_ARCHITECTURE_AND_MESSAGE_CONTRACTS.md`. Rough alignment:
+
+| This reference layer | `MEV_Bot` service (MVP spec) |
+|----------------------|----------------------------|
+| Data: pools, reserves, blocks | `PoolIndexerService`, `StateIngestionService` |
+| Liquidity graph | `LiquidityGraphService` |
+| Opportunity search | `OpportunitySearchService` |
+| Simulation | `SimulationService` |
+| Strategy / risk (partially in sim + ops) | Config + scoring in `SEARCH_AND_SIMULATION_SPEC` / ops docs |
+| Bundle build | `BundleBuilderService` |
+| Execution / relay | `ExecutionGateway` |
+| Settlement and PnL | `SettlementAnalyticsService` |
+| Mempool-driven rescoring (optional) | `MempoolWatcherService` (noted as optional in spec) |
+
+Use **`MEV_Bot/specs/README.md`** for the authoritative reading order and scope boundaries.
+
+---
+
+## Related
+
+- [ARCHITECTURE.md](ARCHITECTURE.md) — Mermaid diagrams of the same pipeline  
+- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md) — what kinds of opportunities exist  
+- [SCALING_AND_REALITY.md](SCALING_AND_REALITY.md) — competition and scaling  
diff --git a/docs/SCALING_AND_REALITY.md b/docs/SCALING_AND_REALITY.md
new file mode 100644
index 0000000..6d4734a
--- /dev/null
+++ b/docs/SCALING_AND_REALITY.md
@@ -0,0 +1,59 @@
+# Scaling and competitive reality (reference)
+
+**Last Updated:** 2026-04-13  
+**Document Version:** 1.0  
+**Status:** Reference
+
+This document summarizes “what scales” and “what bites” for teams sizing MEV and arbitrage infrastructure. It is **not** a revenue forecast.
+
+---
+
+## What scales (qualitatively)
+
+| Pattern | Scales under competition? | Notes |
+|--------|---------------------------|--------|
+| Simple two-pool DEX arb | Usually **poorly** | Commoditized; margins often barely above gas cost |
+| Triangular / multi-hop | **Mixed** | Limited by weakest leg; search cost grows |
+| Backruns (mempool-visible flow) | **Often better** | High frequency; outcome dominated by simulation + execution |
+| Sandwiching | **Mixed** (and often **excluded** by policy) | High per-trade upside when it works; protections, reverts, and ethics matter |
+| Liquidations | **Bursty** | Large per-event profit when volatility spikes; protocol-specific |
+| Cross-chain | **Mixed** | Capital and bridge latency bound scale |
+| Oracle lag | **Rare** | High-impact episodes; requires deep protocol knowledge |
+
+Top teams typically run a **multi-strategy** portfolio (detection + simulation + execution shared), rather than a single “arb bot.”
+
+---
+
+## Hard constraints
+
+1. **Near zero-sum extraction** against unsophisticated flow or passive LPs: your profit is often someone else’s worse execution or lower returns.  
+2. **Competitors** include specialized firms with dedicated data, execution, and builder relationships.  
+3. **Edge decays** as more capital and automation chase the same signals.  
+4. **Operational risk:** reorgs, relay failures, buggy adapters, key compromise, and regulatory attention (especially user-harming strategies).
+
+---
+
+## Detection funnel (intuition)
+
+- **Many** raw signals per second (logs, mempool hints, block diffs).  
+- **Some** survive simulation as technically valid.  
+- **Few** remain profitable after gas, tips, and failure probability.  
+- **Fewer still** win the **ordering auction** against peers.
+
+Framing the system as **winning blockspace under uncertainty** matches production more closely than “find positive arb.”
+
+---
+
+## Relation to implementation work
+
+Building a **reliable** internal platform (indexing, deterministic simulation, bundle lifecycle, observability) is valuable even when **public mempool alpha** is thin: the same components support **risk management**, **internal routing**, **testing**, and **incident replay**.
+
+For build scope and safety gates, follow **`specs/MVP_SCOPE.md`** and **`specs/OBSERVABILITY_SAFETY_AND_ROLLOUT.md`** in the `MEV_Bot` submodule of the parent proxmox repository.
+
+---
+
+## Related
+
+- [OPPORTUNITY_TAXONOMY.md](OPPORTUNITY_TAXONOMY.md)  
+- [PRODUCTION_PIPELINE.md](PRODUCTION_PIPELINE.md)  
+- [ARCHITECTURE.md](ARCHITECTURE.md)