
Planq Monitoring Guide

This guide covers monitoring setup for your Planq node, including metrics collection, alerting, and dashboard configuration.

Overview

Monitoring your Planq node is crucial for:

  • Ensuring node health and uptime
  • Tracking performance metrics
  • Detecting issues before they become critical
  • Understanding resource usage patterns

Metrics Endpoints

Planq exposes the following metrics endpoints:

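Endpoint | Port | Path | Description
--- | --- | --- | ---
Prometheus metrics | 26660 | /metrics | Node metrics in Prometheus format
Health check | 1317 | /health | Basic health status
Node status | 26657 | /status | Detailed node status
EVM RPC | 8545 | / (JSON-RPC via HTTP POST) | Ethereum-compatible RPC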
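The shell sketch below confirms that each endpoint in the table answers. It makes three assumptions: the node uses the default ports on localhost, curl is installed, and the EVM RPC counts as healthy when it answers an eth_blockNumber POST. The function names are illustrative and are not part of Planq.

```shell
#!/bin/sh
# Sketch: probe the endpoints from the table above.
# Assumption: default ports on localhost (override with HOST=...).
HOST="${HOST:-localhost}"

# classify CODE: print UP for any 2xx HTTP status, DOWN otherwise.
# curl prints 000 when it received no HTTP response (e.g. connection refused).
classify() {
  case "$1" in
    2??) echo UP ;;
    *) echo DOWN ;;
  esac
}

# probe NAME URL [extra curl args...]: print "NAME: UP|DOWN (HTTP CODE)".
probe() {
  name=$1
  url=$2
  shift 2
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$@" "$url")
  echo "$name: $(classify "$code") (HTTP $code)"
}

# check_all: probe every endpoint listed in the table.
check_all() {
  probe "Prometheus metrics" "http://$HOST:26660/metrics"
  probe "Health check" "http://$HOST:1317/health"
  probe "Node status" "http://$HOST:26657/status"
  probe "EVM RPC" "http://$HOST:8545" -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
}
```

Source the file and then call check_all. Any line reporting DOWN points to the endpoint that needs attention.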

Setting Up Prometheus

1. Install Prometheus

# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvf prometheus-2.45.0.linux-amd64.tar.gz
sudo mv prometheus-2.45.0.linux-amd64 /opt/prometheus

# Create prometheus user
sudo useradd --no-create-home --shell /bin/false prometheus
sudo chown -R prometheus:prometheus /opt/prometheus
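Before you extract the archive, you may want to check it against the checksum list published with the release. This sketch assumes two things: the release ships a sha256sums.txt file next to the tarball (Prometheus releases do), and GNU coreutils sha256sum is available. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: verify a downloaded Prometheus tarball against the release checksums.
# Assumption: GNU coreutils sha256sum (needed for --ignore-missing).

# verify_tarball DIR: check every file listed in DIR/sha256sums.txt that is
# actually present in DIR; returns non-zero on a checksum mismatch.
verify_tarball() {
  (cd "$1" && sha256sum --ignore-missing -c sha256sums.txt)
}

# Example (not run automatically), from the download directory:
#   wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/sha256sums.txt
#   verify_tarball . && tar xvf prometheus-2.45.0.linux-amd64.tar.gz
```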

2. Configure Prometheus

/opt/prometheus/prometheus.yml
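global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Load the alert rules created under "Setting Up Alerts" (path is relative to
# this file) and send firing alerts to a local Alertmanager on its default port.
rule_files:
  - 'alerts.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'planq_node'
    static_configs:
      - targets: ['localhost:26660']
        labels:
          instance: 'main'
          node_type: 'planq'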
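The localhost:26660 target only answers if the node's Prometheus exporter is switched on. In the CometBFT/Tendermint config.toml, the [instrumentation] section must contain prometheus = true; prometheus_listen_addr defaults to :26660. The sketch below checks that setting. The config path $HOME/.planqd/config/config.toml is an assumption about your node's home directory, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: check that the node exports Prometheus metrics.
# Assumption: config location; point CONFIG_TOML at your node's config.toml.
CONFIG_TOML="${CONFIG_TOML:-$HOME/.planqd/config/config.toml}"

# metrics_enabled FILE: succeed only if the [instrumentation] section sets
# prometheus = true (the prometheus_listen_addr line is not matched).
metrics_enabled() {
  awk '
    /^\[/ { in_instr = ($0 ~ /^\[instrumentation\]/) }
    in_instr && /^[ \t]*prometheus[ \t]*=[ \t]*true/ { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Example (not run automatically):
#   metrics_enabled "$CONFIG_TOML" || echo "set prometheus = true under [instrumentation] and restart planqd"
```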

3. Create Prometheus Service

/etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus \
  --config.file=/opt/prometheus/prometheus.yml \
  --storage.tsdb.path=/opt/prometheus/data \
  --web.console.templates=/opt/prometheus/consoles \
  --web.console.libraries=/opt/prometheus/console_libraries
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
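After you save the unit file, systemd must reload it before the service can start. The sketch below wraps the usual systemctl calls and then polls Prometheus's /-/ready endpoint until it answers. The function names and the 30-attempt limit are illustrative choices.

```shell
#!/bin/sh
# Sketch: start Prometheus under systemd and wait until it reports ready.

# wait_ready URL [TRIES]: poll URL about once per second, up to TRIES times
# (default 30); succeed as soon as it returns a non-error HTTP status.
wait_ready() {
  url=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then sleep 1; fi
  done
  return 1
}

# start_prometheus: load the new unit, enable it at boot, start it, then wait.
start_prometheus() {
  sudo systemctl daemon-reload
  sudo systemctl enable --now prometheus
  wait_ready http://localhost:9090/-/ready 30
}
```

If start_prometheus fails, run journalctl -u prometheus -n 50 --no-pager to see the usual cause, such as a YAML error or a permissions problem on /opt/prometheus/data.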

Key Metrics to Monitor

Node Health Metrics

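Metric | Description | Alert threshold
--- | --- | ---
up | Node availability (generated by Prometheus for each scrape target) | < 1
tendermint_consensus_height | Current block height | No increase for > 5 min
tendermint_p2p_peers | Connected peers | < 3
tendermint_consensus_fast_syncing | Sync status (1 while fast syncing) | 1 for > 30 min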
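You can see the node-side values from this table without Prometheus by filtering the raw output on port 26660. The up metric is left out because Prometheus generates it and the node does not export it. The threshold check covers only the peer count. The other thresholds depend on how long a condition lasts, which requires Prometheus's history. Function names are illustrative.

```shell
#!/bin/sh
# Sketch: inspect the key metrics directly from the node's exporter.

# key_metrics: keep only the node-exported series from the table
# (reads Prometheus text exposition format on stdin).
key_metrics() {
  grep -E '^tendermint_(consensus_height|p2p_peers|consensus_fast_syncing)[{ ]'
}

# check_peers [MIN]: read Prometheus text on stdin; print a warning and
# return non-zero if tendermint_p2p_peers is below MIN (default 3) or missing.
check_peers() {
  awk -v min="${1:-3}" '
    /^tendermint_p2p_peers[{ ]/ {
      found = 1
      if ($NF + 0 < min + 0) { print "WARN: only " $NF " peers connected"; bad = 1 }
    }
    END {
      if (!found) { print "WARN: tendermint_p2p_peers not found"; bad = 1 }
      exit bad
    }
  '
}

# Example (not run automatically):
#   curl -s localhost:26660/metrics | key_metrics
#   curl -s localhost:26660/metrics | check_peers 3
```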

Setting Up Alerts

1. Configure Alertmanager

/opt/prometheus/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'telegram'

receivers:
  - name: 'telegram'
    telegram_configs:  # supported since Alertmanager v0.24.0
      - bot_token: 'YOUR_BOT_TOKEN'
        chat_id: YOUR_CHAT_ID  # replace with the numeric chat ID (an integer, unquoted)
        parse_mode: 'HTML'
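To confirm that the Telegram route works end to end, you can push a hand-made alert into Alertmanager's v2 API (POST /api/v2/alerts, default port 9093). The alert name, its labels, AM_URL and the function names below are illustrative. Running amtool check-config on alertmanager.yml beforehand catches syntax errors before a restart.

```shell
#!/bin/sh
# Sketch: send a test alert through Alertmanager to the 'telegram' receiver.
# Assumption: Alertmanager listens on its default port on this host.
AM_URL="${AM_URL:-http://localhost:9093}"

# test_alert_json NAME: minimal JSON array accepted by POST /api/v2/alerts.
test_alert_json() {
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"Manual test of the Telegram route"}}]\n' "$1"
}

# send_test_alert [NAME]: post the alert and print the HTTP status (200 on success).
send_test_alert() {
  test_alert_json "${1:-PlanqTestAlert}" |
    curl -s -o /dev/null -w '%{http_code}\n' -X POST \
      -H 'Content-Type: application/json' --data @- "$AM_URL/api/v2/alerts"
}
```

With group_wait set to 10s, the Telegram message should arrive shortly afterwards. The payload sends no endsAt, so Alertmanager resolves the alert by itself once resolve_timeout (5m) has passed.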

2. Create Alert Rules

/opt/prometheus/alerts.yml
groups:
  - name: planq_alerts
    interval: 30s
    rules:
      - alert: NodeDown
        expr: up{job="planq_node"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Planq node is down"
          description: "Node {{ $labels.instance }} has been down for more than 2 minutes."

      - alert: LowPeerCount
        expr: tendermint_p2p_peers < 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low peer count"
          description: "Node has only {{ $value }} peers connected."

      # Block heights are gauges, so changes() is used here; increase() is
      # meant for counters. The range window plus "for" sets the total delay.
      - alert: NodeNotSyncing
        expr: changes(tendermint_consensus_height[5m]) == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Node stopped syncing"
          description: "Block height has not increased for at least 15 minutes."

      - alert: EVMBlockStalled
        expr: changes(eth_block_number[2m]) == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "EVM blocks not being produced"
          description: "EVM block number has not increased for at least 7 minutes."
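promtool ships in the same release archive as Prometheus and can check both the main config and the rules file before a restart. The sketch below assumes the /opt/prometheus layout used above; the function name is illustrative.

```shell
#!/bin/sh
# Sketch: validate prometheus.yml and alerts.yml with promtool before restarting.
# Assumption: the /opt/prometheus layout from the install steps.
PROM_DIR="${PROM_DIR:-/opt/prometheus}"

# validate_prometheus: return 127 if promtool is missing, otherwise the
# combined exit status of the config check and the rules check.
validate_prometheus() {
  if [ ! -x "$PROM_DIR/promtool" ]; then
    echo "promtool not found in $PROM_DIR" >&2
    return 127
  fi
  "$PROM_DIR/promtool" check config "$PROM_DIR/prometheus.yml" &&
    "$PROM_DIR/promtool" check rules "$PROM_DIR/alerts.yml"
}

# Example (not run automatically):
#   validate_prometheus && sudo systemctl restart prometheus
```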

Monitoring Commands

Check Node Status

# Basic status
curl -s localhost:26657/status | jq .

# Check sync status
curl -s localhost:26657/status | jq .result.sync_info

# Get peer count
curl -s localhost:26657/net_info | jq .result.n_peers

# Check EVM status
curl -X POST -H "Content-Type: application/json" \
--data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8545
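You can compare the two heights reported above in a script. On Ethermint-based chains such as Planq, the EVM block number is expected to track the CometBFT block height, so a large gap suggests the JSON-RPC server is lagging or stuck. The sketch parses the replies with sed instead of jq and assumes the default ports. The 2-block tolerance is an arbitrary example value, and the function names are illustrative.

```shell
#!/bin/sh
# Sketch: compare the CometBFT height with the EVM block number.

# cosmos_height: read a /status reply on stdin and print latest_block_height.
cosmos_height() {
  sed -n 's/.*"latest_block_height":[[:space:]]*"\([0-9]*\)".*/\1/p'
}

# evm_height: read an eth_blockNumber reply on stdin and print its hex result in decimal.
evm_height() {
  hex=$(sed -n 's/.*"result":[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  [ -n "$hex" ] || return 1
  printf '%d\n' "$hex"
}

# compare_heights [MAX_GAP]: fetch both heights; fail if they differ by more than MAX_GAP (default 2).
compare_heights() {
  max_gap=${1:-2}
  tm=$(curl -s localhost:26657/status | cosmos_height)
  evm=$(curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545 | evm_height)
  if [ -z "$tm" ] || [ -z "$evm" ]; then
    echo "could not read one of the heights" >&2
    return 1
  fi
  gap=$((tm - evm))
  if [ "$gap" -lt 0 ]; then gap=$((-gap)); fi
  echo "cometbft=$tm evm=$evm gap=$gap"
  [ "$gap" -le "$max_gap" ]
}
```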

Log Analysis

# View recent logs
journalctl -u planqd -n 100 --no-pager

# Follow logs in real-time
journalctl -u planqd -f

# Search for errors
journalctl -u planqd | grep -i error | tail -20

# Export logs for analysis
journalctl -u planqd --since "1 hour ago" > node-logs.txt
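When the error search returns too much to read, counting matches per hour can show when the problems started. The sketch relies on journalctl's short-iso output format, whose lines begin with an ISO 8601 timestamp. It uses the same case-insensitive "error" match as the search command above, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: count error lines per hour in planqd's journal.

# errors_per_hour: read `journalctl -o short-iso` output on stdin and print
# "COUNT YYYY-MM-DDTHH" for each hour containing a line that matches "error".
errors_per_hour() {
  grep -i 'error' | cut -c1-13 | sort | uniq -c | awk '{ print $1, $2 }'
}

# Example (not run automatically):
#   journalctl -u planqd --since "1 day ago" -o short-iso --no-pager | errors_per_hour
```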

Dashboard Examples

Basic Node Dashboard

Key panels to include:

  1. Node Status: Up/Down indicator
  2. Block Height: Current vs network height
  3. EVM Block: Latest EVM block number
  4. Peer Count: Connected peers over time
  5. Resource Usage: CPU, Memory, Disk
  6. RPC Requests: API usage metrics
  7. Gas Usage: EVM transaction costs

Example Query Expressions

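# Uptime percentage (last 24h)
avg_over_time(up{job="planq_node"}[24h]) * 100

# Blocks behind the highest scraped node (meaningful only when several Planq
# nodes are scraped; scalar() lets the single max be subtracted from each series)
scalar(max(tendermint_consensus_height)) - tendermint_consensus_height

# EVM RPC request rate
rate(json_rpc_requests_total[5m])

# Memory usage percentage (node_memory_MemTotal_bytes comes from node_exporter,
# which must be scraped separately; scalar() assumes a single host)
100 * process_resident_memory_bytes{job="planq_node"} / scalar(node_memory_MemTotal_bytes)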
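These expressions can also be evaluated outside a dashboard through Prometheus's HTTP API (GET /api/v1/query on port 9090). The sketch assumes the result contains a single series and extracts its value with sed rather than jq. PROM_URL and the function names are illustrative.

```shell
#!/bin/sh
# Sketch: run an instant PromQL query from the shell.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# promq EXPR: evaluate EXPR via the HTTP API and print the raw JSON reply.
promq() {
  curl -s -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# single_value: print the sample value from a vector reply with exactly one
# series (if there are several, only the last one is printed).
single_value() {
  sed -n 's/.*"value":\[[^],]*,"\([^"]*\)"].*/\1/p'
}

# Example (not run automatically):
#   promq 'avg_over_time(up{job="planq_node"}[24h]) * 100' | single_value
```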

Best Practices

  1. Regular Backups: Back up Prometheus data regularly
  2. Retention Policy: Set an appropriate data retention period (e.g., 30 days with Prometheus's --storage.tsdb.retention.time=30d flag)
  3. Alert Fatigue: Tune alerts to reduce false positives
  4. Dashboard Organization: Create separate dashboards for different concerns
  5. Documentation: Document custom metrics and alert thresholds
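For the backup item, Prometheus can write a consistent TSDB snapshot through its admin API (POST /api/v1/admin/tsdb/snapshot). The API is available only when Prometheus runs with --web.enable-admin-api, and the snapshot is written under the data directory's snapshots/ folder. The sketch below archives the snapshot and then removes it from the data directory. The paths are assumed from the layout used earlier, and the function names are illustrative.

```shell
#!/bin/sh
# Sketch: snapshot the Prometheus TSDB and archive it.
# Assumption: Prometheus runs with --web.enable-admin-api (not in the unit file above).
PROM_URL="${PROM_URL:-http://localhost:9090}"
PROM_DATA="${PROM_DATA:-/opt/prometheus/data}"

# snapshot_name: print data.name from the snapshot API reply on stdin.
snapshot_name() {
  sed -n 's/.*"name":[[:space:]]*"\([^"]*\)".*/\1/p'
}

# backup_prometheus DEST_DIR: create a snapshot, tar it into DEST_DIR,
# then delete the snapshot from the data directory to free space.
backup_prometheus() {
  name=$(curl -s -X POST "$PROM_URL/api/v1/admin/tsdb/snapshot" | snapshot_name)
  if [ -z "$name" ]; then
    echo "snapshot failed (is --web.enable-admin-api set?)" >&2
    return 1
  fi
  sudo tar -czf "$1/prometheus-$name.tar.gz" -C "$PROM_DATA/snapshots" "$name" &&
    sudo rm -rf "$PROM_DATA/snapshots/$name"
}
```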

Additional Resources