Scaling
RAVEN is designed as a single-binary, single-instance tool for most deployments. This page covers capacity planning and multi-instance patterns for larger environments.
Single Instance Capacity
A single RAVEN instance on modest hardware handles:
| Scenario | Routes | RAM | CPU |
|---|---|---|---|
| Small network, 1-2 routers, partial table | ~100K routes | ~512MB | minimal |
| Medium network, 5 routers, full DFZ | ~5M routes | ~4GB | 1-2 cores |
| Large network, 10+ routers, full DFZ per router | ~120M routes | ~16GB | 4 cores |
The memory target for a full-table multi-router deployment is under 16GB RSS. RAVEN uses a compressed prefix trie (BART) for memory-efficient route storage.
Sizing Guidelines
RAM: The dominant factor is the number of routes in the Route Table. Each route object is approximately 512 bytes, or about 0.5GB per 1M routes. Budget 1GB RAM per 1M routes, which leaves roughly 2x headroom over the raw object size, plus a 2GB baseline for the RPKI cache and process overhead.
CPU: RAVEN is designed for high throughput on the BMP ingestion hot path. During initial table dump (when routers first connect), CPU usage spikes briefly then settles to near-zero for steady-state operation. Use 4 cores for large deployments.
Network: BMP runs over TCP, so kernel flow control handles backpressure naturally. No special network sizing is needed beyond reliable connectivity between routers and the RAVEN host.
Disk: RAVEN is stateless by default, with no disk I/O in steady state. If you enable file output (NDJSON logging), size the disk for the expected log volume.
Reducing Memory Usage
If you do not need full pre-policy visibility, configure your routers to send only post-policy BMP:
! FRR: post-policy only
bmp targets raven
 bmp monitor ipv4 unicast post-policy
 bmp monitor ipv6 unicast post-policy
exit
Post-policy tables are significantly smaller than pre-policy since import policies filter out many routes before installation.
Multiple Routers, Single RAVEN Instance
The recommended pattern for most networks is one RAVEN instance receiving BMP from all your routers:
Router 1 ──BMP──┐
Router 2 ──BMP──┼──▶ RAVEN ──▶ Prometheus ──▶ Grafana
Router 3 ──BMP──┘
RAVEN handles concurrent BMP sessions with one goroutine per session. There is no practical limit on the number of BMP sessions a single instance can accept beyond the RAM budget for the combined route table.
Filtering at the Router
For very large deployments, reduce the route table size by filtering BMP exports at the router:
! FRR: only send routes from specific peers
bmp targets raven
 bmp monitor ipv4 unicast pre-policy
 neighbor 192.0.2.1 bmp
 neighbor 192.0.2.2 bmp
exit
This is useful when you want to focus RAVEN on specific peering sessions rather than your full BGP table.
High Availability
RAVEN is a passive observability tool — it has no role in the data plane. A RAVEN outage does not affect routing. Design for availability based on your monitoring SLA, not routing resilience requirements.
Simple HA pattern, two independent instances:
Router 1 ──BMP──▶ RAVEN-1 ──▶ Prometheus-1
Router 2 ──BMP──▶ RAVEN-2 ──▶ Prometheus-2
Run two RAVEN instances receiving from different sets of routers. Aggregate in Prometheus using federation or a shared remote write target. A failure of one instance then affects visibility for only its own set of routers, and the instances need no coordination with each other.
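The shared remote write option above could look roughly like the following Prometheus configuration fragment. This is a minimal sketch: the endpoint URL and the `raven_site` label name are hypothetical placeholders, not values defined by RAVEN.

```yaml
# prometheus.yml on Prometheus-1 (repeat on Prometheus-2 with its own label value)
global:
  external_labels:
    raven_site: "raven-1"   # hypothetical label, used to tell the two sources apart

remote_write:
  # Hypothetical shared endpoint; any remote-write compatible store would do
  - url: "https://metrics.example.com/api/v1/write"
```

Giving each Prometheus a distinct external label keeps the series from the two instances distinguishable once they land in the shared store.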
Full redundancy, all routers to both instances:
                 ┌──▶ RAVEN-1 ──▶ Prometheus
Router 1 ──BMP───┤
                 └──▶ RAVEN-2 ──▶ Prometheus
Both instances receive the full route table independently. Prometheus
aggregates via federation. Use different job labels to distinguish instances.
Metrics will be duplicated — use max() or avg() in your PromQL queries
rather than sum().
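One way to apply the `max()` advice is a Prometheus recording rule that collapses the duplicated series into one. The sketch below uses the `raven_rtr_cache_stale` metric as an example; the group name and recorded series name are illustrative, and it assumes the two instances differ only in their `instance` and `job` labels.

```yaml
groups:
  - name: raven-dedup            # hypothetical group name
    rules:
      # Collapse the per-instance duplicates into a single series.
      # With max(), the result is 1 if either RAVEN instance reports stale.
      - record: raven:rtr_cache_stale:max
        expr: max without (instance, job) (raven_rtr_cache_stale)
```

Dashboards and alerts can then query the recorded series instead of repeating the aggregation in every expression.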
Multiple RTR Caches
Always configure a backup RTR cache:
rtr:
  caches:
    - address: "routinator-primary.example.com:3323"
      preference: 1
    - address: "routinator-backup.example.com:3323"
      preference: 2
If the primary cache goes down, RAVEN automatically fails over to the backup
and logs a warning. The raven_rtr_session_state metric will show 0 for the
failed cache — alert on this.
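A Prometheus alerting rule for this condition might look like the sketch below. The alert name, the `for` duration, and the severity are assumptions to adapt to your environment; only the metric name and the meaning of the value 0 come from the text above.

```yaml
- alert: RAVENRTRSessionDown     # hypothetical alert name
  expr: raven_rtr_session_state == 0
  for: 5m                        # assumed grace period to ride out brief reconnects
  labels:
    severity: warning
  annotations:
    summary: "RAVEN RTR session down for at least one configured cache"
```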
RTR Cache Staleness
RAVEN tracks how long since the last successful RTR sync and exposes this
via raven_rtr_cache_stale. Configure a Prometheus alert:
- alert: RAVENRTRCacheStale
  expr: raven_rtr_cache_stale == 1
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "RAVEN RTR cache stale: validation data may be outdated"
The default staleness threshold is 10 minutes. Adjust in config:
rtr:
  caches:
    - address: "routinator.example.com:3323"
      preference: 1
  staleness-threshold: 600  # seconds
Querying a Remote RAVEN Instance
The RAVEN CLI can query any RAVEN instance on your network:
raven routes --address 192.0.2.10:11020 --posture origin-invalid
raven status --address 192.0.2.10:11020
raven what-if --address 192.0.2.10:11020 --reject-invalid
This is useful for operations teams who want CLI access without running a local RAVEN instance.
Prometheus Federation
When running multiple RAVEN instances, aggregate metrics via Prometheus federation:
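A minimal sketch of a federation scrape job on the aggregating Prometheus server is shown here. It uses the standard `/federate` endpoint; the target hostnames are hypothetical, and the selector assumes all RAVEN metrics share the `raven_` prefix seen in the metric names on this page.

```yaml
scrape_configs:
  - job_name: "federate-raven"
    scrape_interval: 30s
    honor_labels: true            # keep the job/instance labels from the source servers
    metrics_path: "/federate"
    params:
      "match[]":
        - '{__name__=~"raven_.*"}'   # assumes RAVEN metrics use the raven_ prefix
    static_configs:
      - targets:
          - "prometheus-1.example.com:9090"   # hypothetical hostnames
          - "prometheus-2.example.com:9090"
```

With `honor_labels: true`, the original `job` and `instance` labels survive federation, so queries can still distinguish the RAVEN instances.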