
Monitoring Your Gensyn Node

Keeping track of your Gensyn RL Swarm node helps you maintain performance and catch problems early. This guide shows you how to monitor your node effectively.

Gensyn Dashboard

The primary way to monitor your node's performance is through the official Gensyn dashboard.

Accessing the Dashboard

  1. Visit: dashboard.gensyn.ai
  2. Network Overview: See total nodes, training sessions, and network health
  3. Node Statistics: Find your node in the participant list
  4. Training Progress: Monitor ongoing AI model training sessions

What You Can See:

  • Active Nodes: Total number of participants in the network
  • Training Sessions: Current and completed AI training jobs
  • Network Health: Overall system status and performance
  • Your Contribution: Your node's participation and training activity

Local Node Monitoring

Check Container Status

# List running containers
docker ps

# Check specific Gensyn container
docker ps | grep swarm

# View container resource usage
docker stats

# Get detailed container info
docker inspect <container_name>
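
If you only need the container's current state rather than the full JSON, docker inspect also accepts a --format template. A couple of quick checks (the container name is a placeholder):

# Show just the container's state and start time
docker inspect --format '{{.State.Status}} (started {{.State.StartedAt}})' <container_name>

# Show the restart count, useful for spotting crash loops
docker inspect --format '{{.RestartCount}}' <container_name>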

Monitor System Resources

CPU and Memory Usage

# Monitor system resources
htop

# Or use basic tools
top

# Check memory usage
free -h

# Check disk usage
df -h
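
To see how much space the node itself is consuming, you can check the rl-swarm directory and Docker's own storage; the paths below assume the repository was cloned into your home directory:

# Disk usage of the rl-swarm directory and its logs
du -sh ~/rl-swarm ~/rl-swarm/logs

# Disk space used by Docker images, containers, and volumes
docker system df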

GPU Monitoring (if using GPU mode)

# Monitor GPU usage
nvidia-smi

# Continuous GPU monitoring
watch -n 1 nvidia-smi

# Check GPU memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
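
If you want a record of GPU usage over time rather than a live view, nvidia-smi can loop and append CSV rows to a file; the interval and filename below are just examples:

# Log GPU utilization and memory every 5 seconds
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total --format=csv -l 5 >> gpu_usage.csv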

Log Management

Viewing Logs

# View live logs from running container
docker logs -f <container_name>

# View last 100 lines of logs
docker logs --tail 100 <container_name>

# View logs with timestamps
docker logs -t <container_name>

# Save logs to file
docker logs <container_name> > gensyn_logs.txt
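
You can also limit logs to a recent time window, which is handy right after a problem occurs; the durations below are examples:

# Logs from the last hour
docker logs --since 1h <container_name>

# Logs from the last 30 minutes, following new output
docker logs --since 30m -f <container_name>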

Log Locations

Your Gensyn node writes logs to several files under the rl-swarm directory:

rl-swarm/
├── logs/
│   ├── training.log      # AI training session logs
│   ├── network.log       # P2P network communication
│   ├── error.log         # Error messages and warnings
│   └── performance.log   # Performance metrics
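
Exact filenames can vary between releases. To follow everything the node writes under logs/ at once, you can tail the whole directory (assuming you are in the rl-swarm directory):

# Follow all log files as they are written
tail -f logs/*.log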

Understanding Log Messages

Normal Operations

# Examples of healthy log messages
[INFO] Connected to Gensyn testnet
[INFO] Training session started
[INFO] Model parameters updated
[INFO] Peer synchronization complete

Warning Signs

# Watch out for these messages
[WARN] Network connection unstable
[ERROR] Training session failed
[ERROR] Insufficient memory
[WARN] GPU memory low
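
A quick way to surface these is to search the log files for warning and error markers; the paths and tag format follow the examples above:

# Search all log files for warning and error entries
grep -nE '\[(WARN|ERROR)\]' logs/*.log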

Performance Metrics

Key Metrics to Monitor

  1. Connection Status: Is your node connected to the network?
  2. Training Participation: How many training sessions are you joining?
  3. Resource Usage: CPU, RAM, and GPU utilization
  4. Network Bandwidth: Upload/download speeds
  5. Error Rate: Frequency of errors or failed operations
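
A single docker stats snapshot captures several of these at once, including CPU, memory, and cumulative network I/O per container:

# One-shot snapshot of CPU, memory, and network I/O for all running containers
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"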

Creating a Monitoring Script

#!/bin/bash
# Simple monitoring script for Gensyn node

echo "=== Gensyn Node Status ==="
echo "Date: $(date)"
echo ""

# Check if container is running
echo "Container Status:"
docker ps | grep swarm || echo "No Gensyn container running"
echo ""

# Check system resources
echo "System Resources:"
echo "Memory: $(free -h | grep '^Mem:' | awk '{print $3 "/" $2}')"
echo "Disk: $(df -h / | tail -1 | awk '{print $3 "/" $2 " (" $5 " used)"}')"
echo ""

# Check GPU if available
if command -v nvidia-smi &> /dev/null; then
    echo "GPU Status:"
    nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv,noheader,nounits
fi

echo ""
echo "Recent logs:"
docker logs --tail 5 $(docker ps | grep swarm | awk '{print $1}') 2>/dev/null || echo "No recent logs available"

Save this as monitor_gensyn.sh and run with:

chmod +x monitor_gensyn.sh
./monitor_gensyn.sh
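
If you would rather keep the status on screen than run the script manually, you can wrap it in watch; the 60-second interval is arbitrary:

# Re-run the status script every 60 seconds
watch -n 60 ./monitor_gensyn.sh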

Network Connectivity

Check Network Status

# Test internet connectivity
ping -c 4 8.8.8.8

# Check DNS resolution
nslookup dashboard.gensyn.ai

# Test connection to Gensyn services
curl -I https://dashboard.gensyn.ai
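
If you know which P2P port your node is configured to use, you can also confirm it is open locally; the port below is a placeholder, not an official Gensyn value:

# Check whether a specific TCP port is listening on this machine
nc -zv localhost <port_number>

# List all listening sockets and the processes that own them
sudo ss -tulpn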

Port Configuration

Ensure your firewall allows the necessary connections:

# Check firewall status (Ubuntu/Debian)
sudo ufw status

# If you need to open ports (adjust as needed)
# sudo ufw allow <port_number>
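
To see which ports the running container actually publishes, you can ask Docker directly (container name is a placeholder):

# Show port mappings for the running container
docker port <container_name>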

Automated Monitoring

Set up Log Rotation

# Create logrotate configuration
sudo tee /etc/logrotate.d/gensyn << EOF
/home/$USER/rl-swarm/logs/*.log {
    daily
    missingok
    rotate 7
    compress
    delaycompress
    notifempty
    create 644 $USER $USER
}
EOF
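
Before relying on the rotation, it's worth a dry run to confirm logrotate parses the configuration and matches your log files:

# Debug (dry-run) mode: shows what logrotate would do without changing anything
sudo logrotate -d /etc/logrotate.d/gensyn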

Monitor with Cron Jobs

# Edit crontab
crontab -e

# Add monitoring job (runs every 5 minutes)
*/5 * * * * /path/to/monitor_gensyn.sh >> /var/log/gensyn_monitor.log
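
After saving the crontab, confirm the entry is registered and producing output (paths follow the example above):

# Confirm the job is installed
crontab -l | grep monitor_gensyn

# Check the most recent monitoring output
tail -n 20 /var/log/gensyn_monitor.log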

Troubleshooting Monitoring Issues

Common Problems

  1. Dashboard not showing your node:

    • Check internet connection
    • Verify node is running with docker ps
    • Check logs for connection errors
  2. High resource usage:

    • Monitor with docker stats
    • Check if multiple training sessions are running
    • Consider upgrading hardware
  3. Connection drops:

    • Check network stability
    • Review firewall settings
    • Look for network-related log messages (see the ping logging snippet below)
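
For intermittent connection drops, a timestamped ping log makes it easier to correlate outages with errors in the node logs; the hostname, interval, and output file are examples:

# Ping once a minute and timestamp each result
ping -i 60 dashboard.gensyn.ai | while read -r line; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $line"
done >> ping_log.txt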

Health Check Commands

# Quick health check
docker ps | grep swarm && echo "✓ Container running" || echo "✗ Container not found"

# Check if logs are being generated
ls -la logs/
tail -n 1 logs/*.log | head -10

# Verify network connectivity
ping -c 1 dashboard.gensyn.ai && echo "✓ Network OK" || echo "✗ Network issue"

Performance Optimization

Resource Monitoring Tips

  1. Memory: Ensure you have enough RAM available
  2. Storage: Keep sufficient disk space free
  3. Network: Stable internet connection is crucial
  4. GPU: Monitor GPU memory and utilization

When to Restart

Consider restarting your node if you notice:

  • Consistently high error rates in logs
  • Network connectivity issues
  • Memory leaks (constantly increasing RAM usage)
  • Performance degradation

# Graceful restart
docker-compose down
docker-compose run --rm --build -Pit swarm-cpu # or swarm-gpu

Getting Help

If you notice issues in your monitoring:

  1. Check the logs first - they usually contain helpful error messages
  2. Visit the dashboard to see if it's a network-wide issue
  3. Review system resources to ensure your hardware can handle the load
  4. Check our Troubleshooting Guide for common solutions

Remember: It's normal for AI training to use significant resources. The key is ensuring stable operation without overwhelming your system!