Taiko Monitoring Guide

This guide covers monitoring setup for your Taiko node, including metrics collection, alerting, and dashboard configuration.

Overview

Monitoring your Taiko node is crucial for:

Ensuring node health and uptime
Tracking performance metrics
Detecting issues before they become critical
Understanding resource usage patterns

Metrics Endpoints

Taiko exposes the following monitoring endpoints:

Endpoint	Port / Path	Description
Prometheus Metrics	9090	Node metrics in Prometheus format
Health Check	8545/health	Basic health status
Node Status	8545/status	Detailed node status
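To inspect the Prometheus endpoint without a full Prometheus server, you can read its plain-text exposition format directly. The following is a minimal Python sketch that makes two assumptions. First, it assumes the node serves metrics at the `/metrics` path, which is Prometheus's default scrape path. Second, it handles only simple `name{labels} value` lines: it skips `# HELP`/`# TYPE` comments, ignores timestamps, and does not unescape label values.

```python
import re
import urllib.request

# Matches one label pair such as job="taiko_node"; tolerates escaped quotes.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:9090/metrics"):
    """Download the raw exposition text (the /metrics path is an assumption)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode()


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples (simplified)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, _, tail = rest.rpartition("}")
            labels = dict(LABEL_RE.findall(label_part))
        else:
            name, _, tail = line.partition(" ")
            labels = {}
        # First field after the series is the value; an optional second is a timestamp
        value = float(tail.split()[0])
        samples.append((name.strip(), labels, value))
    return samples
```

For example, `parse_metrics(fetch_metrics())` returns a list of tuples that you can filter by metric name, such as `taiko_node_peers`.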

Setting Up Prometheus

1. Install Prometheus

# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvf prometheus-2.45.0.linux-amd64.tar.gz
sudo mv prometheus-2.45.0.linux-amd64 /opt/prometheus

# Create prometheus user
sudo useradd --no-create-home --shell /bin/false prometheus
sudo chown -R prometheus:prometheus /opt/prometheus
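Prometheus GitHub releases include a `sha256sums.txt` file, so you can verify the tarball before extracting it. The following is a minimal Python sketch for that check. It assumes the checksum list uses the usual `sha256sum` line format (`<hex>  <filename>`); the helper names are illustrative only.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_checksum(sums_text, filename):
    """Find a file's checksum in sha256sum-style text, or None if absent."""
    for line in sums_text.splitlines():
        parts = line.split()
        # sha256sum may prefix the name with '*' in binary mode
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0]
    return None


def verify(path, sums_text, filename):
    """True only if the file is listed and its digest matches."""
    expected = expected_checksum(sums_text, filename)
    return expected is not None and sha256_of(path) == expected.lower()
```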

2. Configure Prometheus

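/opt/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Load the alert rules created in "2. Create Alert Rules"
rule_files:
  - /opt/prometheus/alerts.yml

# Forward firing alerts to Alertmanager (default port 9093)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'taiko_node'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: 'main'
          node_type: 'taiko'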

3. Create Prometheus Service

/etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
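# Prometheus listens on 9090 by default, which the node metrics endpoint
# already uses, so the Prometheus server is moved to 9091 here
ExecStart=/opt/prometheus/prometheus \
  --config.file=/opt/prometheus/prometheus.yml \
  --storage.tsdb.path=/opt/prometheus/data \
  --web.listen-address=:9091 \
  --web.console.templates=/opt/prometheus/consoles \
  --web.console.libraries=/opt/prometheus/console_libraries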
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
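By default, Prometheus binds `0.0.0.0:9090` (controlled by its `--web.listen-address` flag). The endpoints table gives the same port for node metrics, so on a single host one of the two services must use a different port. The following minimal Python sketch checks which of this guide's ports already accept TCP connections before you start the service. It only tests reachability on localhost; it does not identify which process owns a port.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports taken from this guide: node metrics and JSON-RPC
    for port in (9090, 8545):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```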

Key Metrics to Monitor

Node Health Metrics

Basic Metrics

Metric	Description	Alert Threshold
`up`	Node availability	< 1
`taiko_node_height`	Current block height	No increase for > 5 min
`taiko_node_peers`	Connected peers	< 3
`taiko_node_syncing`	Sync status	Still syncing after 30 min

Performance

Metric	Description	Alert Threshold
`process_cpu_seconds_total`	Cumulative CPU time (counter; alert on its `rate()`)	> 80%
`process_resident_memory_bytes`	Resident memory in bytes	> 90% of system RAM
`taiko_node_disk_usage`	Disk usage	> 85%
`taiko_node_network_bytes`	Network I/O	Depends on plan

Taiko Specific

Metric	Description	Alert Threshold
`taiko_proposer_proposals_submitted`	Proposals submitted	Low rate
`taiko_prover_proofs_generated`	Proofs generated	Low rate
`taiko_l1_submission_time`	L1 submission time	> 15 min
`taiko_contestation_status`	Block contestations	Monitor for issues
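`process_cpu_seconds_total` only ever grows, so a percentage threshold such as "> 80%" applies to its rate of change. In PromQL, that is roughly `100 * rate(process_cpu_seconds_total{job="taiko_node"}[5m])`. The memory threshold likewise needs total system memory as its denominator. The following minimal Python sketch shows the same arithmetic on two hand-picked samples. The optional `cores` argument is an illustrative addition: without it, a busy multi-core process can exceed 100%.

```python
def cpu_percent(cpu_seconds_start, cpu_seconds_end, wall_start, wall_end, cores=1):
    """CPU utilisation from two samples of process_cpu_seconds_total.

    Utilisation = (delta CPU seconds / delta wall-clock seconds) * 100,
    optionally divided by the number of cores to normalise to 0-100%.
    """
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return 100.0 * (cpu_seconds_end - cpu_seconds_start) / elapsed / cores


def memory_percent(resident_bytes, total_bytes):
    """Resident memory as a percentage of total system memory."""
    return 100.0 * resident_bytes / total_bytes
```

For example, 12 CPU-seconds consumed over a 60-second window is 20% of one core.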

Setting Up Alerts

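1. Configure Alertmanager

Alertmanager runs as a separate service from Prometheus (it listens on port 9093 by default). It receives the alerts that Prometheus fires and routes them to receivers such as Telegram.

/opt/prometheus/alertmanager.yml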
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'telegram'

receivers:
  - name: 'telegram'
    telegram_configs:
      - bot_token: 'YOUR_BOT_TOKEN'
        chat_id: YOUR_CHAT_ID  # must be the numeric Telegram chat ID
        parse_mode: 'HTML'
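With `group_by: ['alertname']`, alerts that share an alert name are batched into one notification. For example, `NodeDown` firing on two instances produces a single Telegram message. `group_wait` is the delay before the first notification for a new group, `group_interval` is the delay before notifying about alerts added to an existing group, and `repeat_interval` is the delay before resending an unchanged notification. The following Python sketch is a simplified model of the bucketing step only, not Alertmanager's implementation.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts the way a route's group_by does (simplified model).

    Each alert is a dict of labels. Alerts with equal values for every
    label in group_by share a group, i.e. one notification.
    """
    groups = defaultdict(list)
    for labels in alerts:
        # A missing label is treated as an empty value
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```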

2. Create Alert Rules

/opt/prometheus/alerts.yml
groups:
  - name: taiko_alerts
    interval: 30s
    rules:
      - alert: NodeDown
        expr: up{job="taiko_node"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Taiko node is down"
          description: "Node {{ $labels.instance }} has been down for more than 2 minutes."

      - alert: LowPeerCount
        expr: taiko_node_peers < 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low peer count"
          description: "Node {{ $labels.instance }} has only {{ $value }} peers connected."

      - alert: ProofGenerationStalled
        # True when no proof was generated in the last 10m; combined with
        # for: 15m, the alert fires after roughly 25 minutes without a proof
        expr: increase(taiko_prover_proofs_generated[10m]) == 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Proof generation stalled"
          description: "No proofs generated in roughly the last 25 minutes."

      - alert: L1SubmissionDelayed
        expr: taiko_l1_submission_time > 900  # 900 s = 15 minutes
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "L1 submission delayed"
          description: "L1 submission taking longer than 15 minutes."
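The `for:` field delays firing. A rule whose expression becomes true goes into a pending state and fires only if the expression stays true for the whole `for:` duration; a single false evaluation resets it. Here the group's `interval: 30s` sets how often these evaluations happen. The following Python sketch is a simplified model of that state machine. It is useful for reasoning about how long an outage lasts before you are notified, but it is not Prometheus's code.

```python
def first_firing_time(evaluations, hold_seconds):
    """Return the timestamp at which an alert starts firing, or None.

    evaluations: time-ordered list of (timestamp_seconds, condition_true),
    one entry per rule evaluation. hold_seconds is the rule's `for:`.
    """
    active_since = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # condition cleared: pending alert is dropped
            continue
        if active_since is None:
            active_since = ts  # first true evaluation: alert becomes pending
        if ts - active_since >= hold_seconds:
            return ts  # held long enough: alert fires
    return None
```

For example, with evaluations every 30 s and `NodeDown`'s `for: 2m`, a node that goes down at t = 60 s fires at t = 180 s.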

Monitoring Commands

Check Node Status

# Current block number (returned as a hex string)
curl -s localhost:8545 -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Check sync status
curl -s localhost:8545 -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'

# Get peer count
curl -s localhost:8545 -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}'
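These calls return hex-encoded quantities. `eth_blockNumber` and `net_peerCount` return strings such as `"0x1b4"`. `eth_syncing` returns `false` once synced, or otherwise an object whose fields (for example `currentBlock` and `highestBlock`) are hex strings. The following minimal Python sketch sends the same JSON-RPC payloads as the curl commands and decodes the results. It assumes the RPC endpoint is `http://localhost:8545`, as above, and it leaves any non-hex fields out of the decoded `eth_syncing` object.

```python
import json
import urllib.request


def rpc_call(method, params=None, url="http://localhost:8545"):
    """POST a JSON-RPC 2.0 request, mirroring the curl commands."""
    payload = json.dumps({"jsonrpc": "2.0", "method": method,
                          "params": params or [], "id": 1}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())


def decode_result(response):
    """Convert hex quantities in a JSON-RPC response to Python ints."""
    if "error" in response:
        raise RuntimeError(response["error"])
    result = response["result"]
    if isinstance(result, str):
        return int(result, 16)  # int() accepts the 0x prefix with base 16
    if isinstance(result, dict):
        # eth_syncing progress object: keep only the hex-encoded fields
        return {k: int(v, 16) for k, v in result.items()
                if isinstance(v, str) and v.startswith("0x")}
    return result  # e.g. False from eth_syncing when fully synced
```

For example, `decode_result(rpc_call("net_peerCount"))` gives the peer count as an integer.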

Log Analysis

# View recent logs
journalctl -u taiko-node -n 100 --no-pager

# Follow logs in real-time
journalctl -u taiko-node -f

# Search for errors
journalctl -u taiko-node | grep -i error | tail -20

# Export logs for analysis
journalctl -u taiko-node --since "1 hour ago" > node-logs.txt
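Once exported, the logs can be summarised to show when errors cluster. The following minimal Python sketch does the same case-insensitive "error" match as the `grep -i error` command and counts matches per hour. It assumes journalctl's default short format (`Jan 05 14:03:22 host taiko-node[123]: ...`) and skips the `--` banner lines that journalctl prints.

```python
from collections import Counter


def errors_per_hour(lines):
    """Count lines mentioning "error" (case-insensitive) per hour."""
    counts = Counter()
    for line in lines:
        if line.startswith("--") or "error" not in line.lower():
            continue  # skip journal banners and non-error lines
        parts = line.split()
        if len(parts) < 3:
            continue  # not a timestamped journal line
        month, day, clock = parts[:3]
        counts[f"{month} {day} {clock[:2]}:00"] += 1
    return counts
```

For example, to list the busiest hours first:

```python
with open("node-logs.txt") as fh:
    print(errors_per_hour(fh).most_common())
```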

Dashboard Examples

Basic Node Dashboard

Key panels to include:

Node Status: Up/Down indicator
Block Height: Current vs network height
Peer Count: Connected peers over time
Resource Usage: CPU, Memory, Disk
Proposal Status: Block proposals (if proposer)
Proof Generation: Proving metrics (if prover)
L1 Submissions: Rollup data submissions
Rewards: Earned rewards tracking

Example Query Expressions

# Uptime percentage (last 24h)
avg_over_time(up{job="taiko_node"}[24h]) * 100

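# Blocks behind network (scalar() lets the label-free max() result
# be subtracted from the labeled taiko_node_height series)
scalar(max(taiko_node_latest_block_height)) - taiko_node_height

# Proof generation rate (proofs per second)
rate(taiko_prover_proofs_generated[5m])

# Memory usage percentage (node_memory_MemTotal_bytes comes from node_exporter,
# which is not in the scrape config above; scalar() assumes a single host)
100 * process_resident_memory_bytes{job="taiko_node"} / scalar(node_memory_MemTotal_bytes)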
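Prometheus records `up` as 1 for each successful scrape and 0 for each failed one, so the uptime query is the mean of those samples times 100. Periods when Prometheus itself was not running leave no samples and do not count against uptime. The following Python sketch reproduces the uptime and blocks-behind arithmetic on plain lists, which can help when sanity-checking a dashboard panel. The sample values are made up.

```python
def uptime_percent(up_samples):
    """Mirror avg_over_time(up[...]) * 100 over a list of 0/1 samples."""
    if not up_samples:
        raise ValueError("no samples in the window")
    return 100.0 * sum(up_samples) / len(up_samples)


def blocks_behind(network_heights, local_height):
    """Highest reported network height minus the local node's height."""
    return max(network_heights) - local_height
```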

Best Practices

Regular Backups: Backup Prometheus data regularly
Retention Policy: Set an appropriate data retention period, for example 30 days with `--storage.tsdb.retention.time=30d` (the default is 15 days)
Alert Fatigue: Tune alerts to reduce false positives
Dashboard Organization: Create separate dashboards for different concerns
Documentation: Document custom metrics and alert thresholds
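The Prometheus storage documentation gives a rule of thumb for sizing local storage: retention time in seconds × ingested samples per second × bytes per sample. Compressed samples take roughly 1 to 2 bytes each. The following Python sketch applies that formula so that a retention choice can be checked against available disk. It uses 2 bytes per sample as a conservative default, and the series count in the example is hypothetical.

```python
def tsdb_disk_bytes(active_series, scrape_interval_s, retention_days,
                    bytes_per_sample=2.0):
    """Rough disk estimate for Prometheus local storage.

    retention_seconds * samples_per_second * bytes_per_sample, where
    samples_per_second = active_series / scrape_interval_s.
    """
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample
```

For example, with the 15 s scrape interval used in `prometheus.yml`, a hypothetical 15,000 active series retained for 30 days comes to about 5.2 GB.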
