Monitoring Your Gensyn Node
Keeping track of your Gensyn RL Swarm node helps you maintain performance and troubleshoot problems quickly. This guide covers the Gensyn dashboard, local resource and log monitoring, and simple automation.
Gensyn Dashboard
The primary way to monitor your node's performance is through the official Gensyn dashboard.
Accessing the Dashboard
- Visit: dashboard.gensyn.ai
- Network Overview: See total nodes, training sessions, and network health
- Node Statistics: Find your node in the participant list
- Training Progress: Monitor ongoing AI model training sessions
What You Can See:
- Active Nodes: Total number of participants in the network
- Training Sessions: Current and completed AI training jobs
- Network Health: Overall system status and performance
- Your Contribution: How actively your node participates and what it has contributed
Local Node Monitoring
Check Container Status
# List running containers
docker ps
# Check specific Gensyn container
docker ps | grep swarm
# View container resource usage
docker stats
# Get detailed container info
docker inspect <container_name>
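`docker stats` keeps refreshing until you stop it. For scripts, `docker stats --no-stream` prints a single snapshot, and `--format` selects the columns. The function below is a minimal sketch that reads such a snapshot and prints any container above a CPU threshold; the function name and the 80% default are illustrative assumptions, not part of RL Swarm.

```bash
# Hypothetical helper: print containers whose CPU% exceeds a threshold.
# Expects lines like "name 12.34% 5.67%" on stdin, i.e. the output of:
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}'
flag_busy_containers() {
  local threshold="${1:-80}"   # illustrative default, in percent
  awk -v t="$threshold" '{
    cpu = $2
    sub(/%$/, "", cpu)         # strip the trailing "%"
    if (cpu + 0 > t + 0) print $1 " CPU " $2
  }'
}
```

Usage: docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}' | flag_busy_containers 80
Docker reports CPU relative to a single core, so values above 100% are normal on multi-core hosts; pick a threshold with that in mind.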
Monitor System Resources
CPU and Memory Usage
# Monitor system resources
htop
# Or use basic tools
top
# Check memory usage
free -h
# Check disk usage
df -h
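The interactive tools above are good for a quick look, but scripts need plain numbers. This sketch turns disk and memory usage into integer percentages. The function names are hypothetical, and memory is computed from MemTotal and MemAvailable in /proc/meminfo, so it assumes a Linux host.

```bash
# Print the "Capacity" column (used %) from `df -P` output read on stdin.
# -P keeps each filesystem on one line, so capacity is always column 5.
disk_used_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Print memory in use as an integer percentage: (MemTotal - MemAvailable) / MemTotal.
# Reads /proc/meminfo by default; pass another file to parse saved output.
mem_used_pct() {
  awk '/^MemTotal:/     { t = $2 }
       /^MemAvailable:/ { a = $2 }
       END { if (t > 0) printf "%d\n", (t - a) * 100 / t }' "${1:-/proc/meminfo}"
}
```

Usage: df -P / | disk_used_pct and mem_used_pct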
GPU Monitoring (if using GPU mode)
# Monitor GPU usage
nvidia-smi
# Continuous GPU monitoring
watch -n 1 nvidia-smi
# Check GPU memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
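If you run several GPUs, per-card percentages are easier to scan than raw MiB values. A minimal sketch that reads the CSV output of nvidia-smi; the `index` field is added to the query so each line identifies its GPU, and the function name is hypothetical.

```bash
# Read "index, used, total" lines (MiB, no header, no units) on stdin and
# print the share of memory in use per GPU, truncated to an integer.
gpu_mem_report() {
  awk -F', *' '$3 > 0 { printf "GPU %s: %d%% memory used\n", $1, $2 * 100 / $3 }'
}
```

Usage: nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits | gpu_mem_report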
Log Management
Viewing Logs
# View live logs from running container
docker logs -f <container_name>
# View last 100 lines of logs
docker logs --tail 100 <container_name>
# View logs with timestamps
docker logs -t <container_name>
# Save logs to file (2>&1 also captures what the container wrote to stderr)
docker logs <container_name> > gensyn_logs.txt 2>&1
Log Locations
Your Gensyn node writes logs to several files in the logs/ directory:
rl-swarm/
├── logs/
│ ├── training.log # AI training session logs
│ ├── network.log # P2P network communication
│ ├── error.log # Error messages and warnings
│ └── performance.log # Performance metrics
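A log file that stops growing while the container is still up may indicate that the node has stalled. Assuming the logs/ layout shown above, this sketch lists .log files that have not been modified for a given number of minutes; the function name and the 10-minute default are illustrative.

```bash
# List *.log files in a directory (not recursing) that were last modified
# more than N minutes ago.
stale_logs() {
  local dir="${1:-logs}" minutes="${2:-10}"
  find "$dir" -maxdepth 1 -name '*.log' -mmin +"$minutes" -print
}
```

Usage, from the rl-swarm directory: stale_logs logs 10
Files that are only written when something happens (error.log, for example) can legitimately stay quiet, so focus on logs you expect to update continuously.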
Understanding Log Messages
Normal Operations
# Examples of healthy log messages
[INFO] Connected to Gensyn testnet
[INFO] Training session started
[INFO] Model parameters updated
[INFO] Peer synchronization complete
Warning Signs
# Watch out for these messages
[WARN] Network connection unstable
[ERROR] Training session failed
[ERROR] Insufficient memory
[WARN] GPU memory low
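To see how often warning signs appear, you can count log lines by level. This sketch assumes the bracketed [INFO]/[WARN]/[ERROR] tags shown in the examples above; adjust the patterns if your logs use a different format. The function name is hypothetical.

```bash
# Count [INFO], [WARN] and [ERROR] lines on stdin and print the share of
# tagged lines that are errors.
log_level_summary() {
  awk '
    /\[INFO\]/  { info++ }
    /\[WARN\]/  { warn++ }
    /\[ERROR\]/ { err++ }
    END {
      total = info + warn + err
      printf "INFO=%d WARN=%d ERROR=%d", info, warn, err
      if (total > 0) printf " error_rate=%.1f%%", err * 100 / total
      printf "\n"
    }'
}
```

Usage: docker logs --since 1h <container_name> 2>&1 | log_level_summary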
Performance Metrics
Key Metrics to Monitor
- Connection Status: Is your node connected to the network?
- Training Participation: How many training sessions are you joining?
- Resource Usage: CPU, RAM, and GPU utilization
- Network Bandwidth: Upload/download speeds
- Error Rate: Frequency of errors or failed operations
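For network bandwidth, Linux exposes cumulative byte counters per interface in /proc/net/dev. Sampling them twice and dividing by the interval gives an average rate. A minimal sketch; the function names are hypothetical and eth0 is only an example interface name (list yours with ip link).

```bash
# Print "rx_bytes tx_bytes" for one interface from a /proc/net/dev-style file.
# Colons are replaced with spaces so "eth0:5000" and "eth0: 5000" both split
# cleanly: field 2 is received bytes, field 10 is transmitted bytes.
iface_bytes() {
  local iface="$1" file="${2:-/proc/net/dev}"
  awk -v ifc="$iface" '{ gsub(/:/, " "); if ($1 == ifc) print $2, $10 }' "$file"
}

# Sample the counters twice, INTERVAL seconds apart, and print average KB/s.
iface_rate() {
  local iface="$1" interval="${2:-5}" rx1 tx1 rx2 tx2
  read -r rx1 tx1 < <(iface_bytes "$iface")
  sleep "$interval"
  read -r rx2 tx2 < <(iface_bytes "$iface")
  echo "$iface rx=$(( (rx2 - rx1) / interval / 1024 )) KB/s tx=$(( (tx2 - tx1) / interval / 1024 )) KB/s"
}
```

Usage: iface_rate eth0 5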
Creating a Monitoring Script
#!/bin/bash
# Simple monitoring script for Gensyn node
echo "=== Gensyn Node Status ==="
echo "Date: $(date)"
echo ""
# Check if container is running
echo "Container Status:"
docker ps | grep swarm || echo "No Gensyn container running"
echo ""
# Check system resources
echo "System Resources:"
echo "Memory: $(free -h | grep '^Mem:' | awk '{print $3 "/" $2}')"
echo "Disk: $(df -h / | tail -1 | awk '{print $3 "/" $2 " (" $5 " used)"}')"
echo ""
# Check GPU if available
if command -v nvidia-smi &> /dev/null; then
echo "GPU Status:"
nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv,noheader,nounits
fi
echo ""
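echo "Recent logs:"
# Use only the first matching container: docker logs accepts a single container
CONTAINER_ID=$(docker ps | grep swarm | awk '{print $1}' | head -n 1)
if [ -n "$CONTAINER_ID" ]; then
    # 2>&1 keeps the container's stderr output, where many apps write their logs
    docker logs --tail 5 "$CONTAINER_ID" 2>&1
else
    echo "No recent logs available"
fi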
Save this as monitor_gensyn.sh and run it with:
chmod +x monitor_gensyn.sh
./monitor_gensyn.sh
Network Connectivity
Check Network Status
# Test internet connectivity
ping -c 4 8.8.8.8
# Check DNS resolution
nslookup dashboard.gensyn.ai
# Test connection to Gensyn services
curl -I https://dashboard.gensyn.ai
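curl -I shows the full response headers. For a scripted yes/no answer, curl's -w '%{http_code}' option prints just the status code, which is 000 when no HTTP response arrived at all (DNS failure, timeout, refused connection). A sketch with hypothetical function names:

```bash
# Map an HTTP status code (as printed by curl -w '%{http_code}') to a verdict.
classify_http_code() {
  case "$1" in
    000)     echo "unreachable" ;;
    2??|3??) echo "ok" ;;
    *)       echo "http error $1" ;;
  esac
}

# Fetch a URL quietly (body discarded, 10 s limit) and report the verdict.
check_url() {
  local url="$1" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  echo "$url: $(classify_http_code "$code")"
}
```

Usage: check_url https://dashboard.gensyn.ai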
Port Configuration
Ensure your firewall allows the necessary connections:
# Check firewall status (Ubuntu/Debian)
sudo ufw status
# If you need to open ports (adjust as needed)
# sudo ufw allow <port_number>
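Opening a port in the firewall only helps if a process is actually listening on it. This sketch checks ss -tln output for a given TCP port; the function name is hypothetical, and <port_number> stays a placeholder because the required ports depend on your setup.

```bash
# Succeed if `ss -tln` output on stdin shows a listener on TCP port $1.
# Column 4 is the local address, e.g. "0.0.0.0:3000" or "[::]:3000"; the
# pattern is anchored so port 30 does not match 3000.
port_listening() {
  awk -v p="$1" 'NR > 1 && $4 ~ (":" p "$") { found = 1 } END { exit !found }'
}
```

Usage: ss -tln | port_listening <port_number> && echo "listening"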
Automated Monitoring
Set up Log Rotation
# Create logrotate configuration
sudo tee /etc/logrotate.d/gensyn << EOF
/home/$USER/rl-swarm/logs/*.log {
daily
missingok
rotate 7
compress
delaycompress
notifempty
create 644 $USER $USER
}
EOF
Monitor with Cron Jobs
# Edit crontab
crontab -e
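# Add monitoring job (runs every 5 minutes). Regular users usually cannot
# write to /var/log, so append output to a file in your home directory.
*/5 * * * * /path/to/monitor_gensyn.sh >> $HOME/gensyn_monitor.log 2>&1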
Troubleshooting Monitoring Issues
Common Problems
- Dashboard not showing your node:
  - Check internet connection
  - Verify node is running with docker ps
  - Check logs for connection errors
- High resource usage:
  - Monitor with docker stats
  - Check if multiple training sessions are running
  - Consider upgrading hardware
- Connection drops:
  - Check network stability
  - Review firewall settings
  - Look for network-related log messages
Health Check Commands
# Quick health check
docker ps | grep swarm && echo "✓ Container running" || echo "✗ Container not found"
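# Check if logs are being generated (run from the rl-swarm directory)
ls -la logs/
tail -n 1 logs/*.log | head -10
# Verify network connectivity over HTTPS (some hosts do not answer ping)
curl -s -o /dev/null --max-time 10 https://dashboard.gensyn.ai && echo "✓ Network OK" || echo "✗ Network issue"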
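The one-liners above print a verdict but always exit successfully (the final echo succeeds), so cron cannot tell a healthy run from a failing one. This sketch folds the container, memory, and disk checks into a single exit status. The function names and the 90% thresholds are illustrative assumptions.

```bash
# Decide health from three numbers; print findings, return 1 on any problem.
evaluate_health() {
  local running="$1" mem_pct="$2" disk_pct="$3" status=0
  [ "$running" -gt 0 ]  || { echo "FAIL: no swarm container running"; status=1; }
  [ "$mem_pct" -lt 90 ] || { echo "WARN: memory at ${mem_pct}%"; status=1; }
  [ "$disk_pct" -lt 90 ] || { echo "WARN: disk at ${disk_pct}%"; status=1; }
  [ "$status" -eq 0 ] && echo "OK"
  return "$status"
}

# Gather the numbers on a live Linux host and evaluate them.
health_check() {
  local running mem disk
  running=$(docker ps | grep -c swarm)
  disk=$(df -P / | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
  mem=$(awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
             END { printf "%d", (t - a) * 100 / t }' /proc/meminfo)
  evaluate_health "$running" "$mem" "$disk"
}
```

Put both functions in a script that ends with a call to health_check; a non-zero exit status then signals a problem, which you can chain to an alert command of your choice in cron (for example ... || <alert_command>).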
Performance Optimization
Resource Monitoring Tips
- Memory: Ensure you have enough RAM available
- Storage: Keep sufficient disk space free
- Network: Stable internet connection is crucial
- GPU: Monitor GPU memory and utilization
When to Restart
Consider restarting your node if you notice:
- Consistently high error rates in logs
- Network connectivity issues
- Memory leaks (constantly increasing RAM usage)
- Performance degradation
# Graceful restart
docker-compose down
docker-compose run --rm --build -Pit swarm-cpu # or swarm-gpu
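The "constantly increasing RAM usage" sign from the list above is easier to confirm from a history than from a single reading. A rough sketch: record used memory periodically (for example from cron) and check whether every recent sample is higher than the one before. The function names, the log path, and the three-sample minimum are assumptions; treat a match as a prompt to investigate, not proof of a leak.

```bash
# Append current used memory (MiB, from /proc/meminfo) to a history file.
record_mem_sample() {
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { printf "%d\n", (t - a) / 1024 }' /proc/meminfo >> "${1:-$HOME/gensyn_mem.log}"
}

# Read one numeric sample per line (oldest first) on stdin. Succeed only if
# there are at least 3 samples and each is strictly larger than the last.
steadily_increasing() {
  awk 'NR > 1 && $1 + 0 <= prev { flat = 1 }
       { prev = $1 + 0; n++ }
       END {
         if (n >= 3 && !flat) { print "memory rose in all " n " samples"; exit 0 }
         exit 1
       }'
}
```

Usage: tail -n 12 ~/gensyn_mem.log | steadily_increasing && echo "Consider restarting your node"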
Getting Help
If you notice issues in your monitoring:
- Check the logs first - they usually contain helpful error messages
- Visit the dashboard to see if it's a network-wide issue
- Review system resources to ensure your hardware can handle the load
- Check our Troubleshooting Guide for common solutions
Remember: It's normal for AI training to use significant resources. The key is ensuring stable operation without overwhelming your system!