Monitor Your Cardano Stake Pool Nodes Using The OFO Stack (Debian Buster)
We have been monitoring our nodes with the ELK Stack, Prometheus, and Beats since we started our stake pool. Performance and reliability are important to us, especially when it comes to our delegators' returns. Beyond that, the Cardano network benefits from having professionally run stake pools with healthy nodes.
Although we were happy with that setup, there have been a number of concerns about the future viability of ELK, as its new licensing makes it less than an open source platform. Wanting to support the excellent projects https://opensearch.org[OpenSearch] and https://fluentbit.io[Fluentbit] with another use case, and to simplify our monitoring solution, we decided to share our notes on how to monitor your Cardano Stake Pool with an https://ofostack.org[OFO Stack].
The notes below explain how to monitor your Cardano Relay Node or Cardano Producer Node with an https://ofostack.org[OFO Stack] (OpenSearch, Fluentbit, OpenSearch Dashboards) on Debian 10 (Buster), using https://caddyserver.com/[Caddy] as a reverse proxy with SSL.
WARNING: This guide does not intend to teach you how to harden your monitoring server or node infrastructure. Take the proper precautions before placing these systems into production. One important, but by no means exhaustive, hardening measure is to place a firewall in front of all systems and restrict port access to trusted systems only.
Cardano Stake Pool Monitoring Server
The following steps should be performed on the freshly installed machine to be used as the Cardano Stake Pool monitoring server.
Update the Operating System
Run the following commands as root:
apt update -y
apt upgrade -y
apt dist-upgrade -y
apt autoremove -y
shutdown -r now
Configure Swap Space (Optional)
Run the following commands as root:
swapon --show
fallocate -l 10G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo "/swapfile swap swap defaults 0 0" >> /etc/fstab
swapon --show
shutdown -r now
swapon --show
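The echo step above appends the swap entry to /etc/fstab every time it runs, so repeating the procedure would leave duplicate lines. A minimal sketch of an append-only-if-missing helper is shown below. The name append_once is our own and is not part of any tool used in this guide.

```shell
#!/bin/sh
# append_once LINE FILE
# Hypothetical helper: append LINE to FILE only when no identical line exists.
append_once() {
    line=$1
    file=$2
    # -x: match whole lines only, -F: treat LINE as a fixed string, -q: no output
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root), in place of the plain echo in the steps above:
# append_once "/swapfile swap swap defaults 0 0" /etc/fstab
```

Running the helper twice with the same arguments leaves a single entry in the file.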
Add OpenSearch User
Run the following commands as root:
adduser \
--system \
--shell /bin/bash \
--gecos 'OpenSearch User' \
--group \
--disabled-password \
--home /opt/opensearch \
opensearch
Install Java 11
Run the following commands as root:
apt update
apt install -y openjdk-11-jdk
Switch to OpenSearch User
Run the following commands as root:
su - opensearch
cd ~
Install OpenSearch
Run the following commands as the opensearch user:
wget https://artifacts.opensearch.org/releases/bundle/opensearch/1.0.0/opensearch-1.0.0-linux-x64.tar.gz
tar -xvzf opensearch-1.0.0-linux-x64.tar.gz
rm opensearch-1.0.0-linux-x64.tar.gz
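It can be worth verifying the download before the tar and rm steps above. The sketch below compares a file's SHA-512 digest with an expected value. It assumes you have obtained the expected digest from an official source yourself; verify_sha512 is a hypothetical helper name.

```shell
#!/bin/sh
# verify_sha512 FILE EXPECTED_HEX
# Hypothetical helper: succeed only if FILE's SHA-512 digest equals EXPECTED_HEX.
verify_sha512() {
    file=$1
    expected=$2
    # sha512sum prints "<digest>  <filename>"; keep only the digest
    actual=$(sha512sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Example, run before extracting. The digest is a placeholder that you would
# copy from the official download page:
# verify_sha512 opensearch-1.0.0-linux-x64.tar.gz "<published sha512>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```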
Create a new file ‘/lib/systemd/system/opensearch.service’ (as root) with the following contents:
[Unit]
Description=OpenSearch
Documentation=https://opensearch.org/docs/
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
RuntimeDirectory=opensearch
#PrivateTmp=true
WorkingDirectory=/opt/opensearch/opensearch-1.0.0
User=opensearch
Group=opensearch
ExecStart=/opt/opensearch/opensearch-1.0.0/opensearch-tar-install.sh
# StandardOutput is redirected to the journal (view it with journalctl)
# because some error messages may be written to standard output before
# the OpenSearch logging system is initialized. OpenSearch also writes
# its own log files, by default under the logs directory of the
# installation.
StandardOutput=journal
StandardError=inherit
# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65535
# Specifies the maximum number of processes
LimitNPROC=4096
# Specifies the maximum size of virtual memory
LimitAS=infinity
# Specifies the maximum file size
LimitFSIZE=infinity
# Disable timeout logic and wait until process is stopped
TimeoutStopSec=0
# SIGTERM signal is used to stop the Java process
KillSignal=SIGTERM
# Send the signal only to the JVM rather than its control group
KillMode=process
# Java process is never killed
SendSIGKILL=no
# When a JVM receives a SIGTERM signal it exits with code 143
SuccessExitStatus=143
# Allow a slow startup before the systemd notifier module kicks in to extend the timeout
TimeoutStartSec=75
[Install]
WantedBy=multi-user.target
Edit /etc/sysctl.conf (as root) and add (if not in file) or modify the following line:
vm.max_map_count=262144
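The "add if missing, otherwise modify" edit described above can also be scripted. The following is a minimal sketch, assuming GNU sed (as shipped with Debian); set_kv is a hypothetical helper name. Commented-out lines are not touched, so a commented setting results in a new line being appended.

```shell
#!/bin/sh
# set_kv KEY VALUE FILE
# Hypothetical helper: set KEY=VALUE in FILE, replacing an existing
# uncommented KEY line or appending a new one when none exists.
set_kv() {
    key=$1
    value=$2
    file=$3
    # Escape dots so the key is matched literally in the regular expressions
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^${key_re}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i -E "s,^${key_re}[[:space:]]*=.*,${key}=${value}," "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root):
# set_kv vm.max_map_count 262144 /etc/sysctl.conf
```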
Run the following commands as root:
sysctl -p
systemctl daemon-reload
systemctl start opensearch.service
systemctl enable opensearch.service
Verify Successful Startup
Run the following commands as root:
curl -XGET 'https://localhost:9200' -u 'admin:admin' --insecure
curl -XGET 'https://localhost:9200/_cat/nodes?v' -u 'admin:admin' --insecure
curl -XGET 'https://localhost:9200/_cat/plugins?v' -u 'admin:admin' --insecure
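OpenSearch can take a little while to answer after the service starts, so the checks above may fail if run immediately. A minimal polling sketch follows; retry is a hypothetical helper, and the attempt count and delay in the example are arbitrary values to adjust for your hardware.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        # No sleep after the last failed attempt
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (as root): poll for up to about a minute with the demo credentials
# used in the commands above. --fail makes curl exit non-zero on HTTP errors.
# retry 12 5 curl --silent --fail --insecure -u 'admin:admin' \
#     -o /dev/null https://localhost:9200
```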
Switch to OpenSearch User
Run the following commands as root:
su - opensearch
cd ~
Install OpenSearch Dashboards
Run the following commands as the opensearch user:
wget https://artifacts.opensearch.org/releases/bundle/opensearch-dashboards/1.0.0/opensearch-dashboards-1.0.0-linux-x64.tar.gz
tar -zxf opensearch-dashboards-1.0.0-linux-x64.tar.gz
rm opensearch-dashboards-1.0.0-linux-x64.tar.gz
Create file ‘/lib/systemd/system/opensearch-dashboards.service’ (as root) with the following contents:
[Unit]
Description=OpenSearch-Dashboards
[Service]
Type=simple
User=opensearch
Group=opensearch
ExecStart=/opt/opensearch/opensearch-dashboards-1.0.0/bin/opensearch-dashboards
Restart=always
WorkingDirectory=/opt/opensearch/opensearch-dashboards-1.0.0
[Install]
WantedBy=multi-user.target
Run the following commands as root:
systemctl daemon-reload
systemctl start opensearch-dashboards.service
systemctl enable opensearch-dashboards.service
Install Caddy
Run the following commands as root:
wget https://github.com/caddyserver/caddy/releases/download/v2.3.0/caddy_2.3.0_linux_amd64.deb
dpkg -i caddy_2.3.0_linux_amd64.deb
rm caddy_2.3.0_linux_amd64.deb
Configure Caddy
Edit ‘/etc/caddy/Caddyfile’ (as root) and replace all contents with the following:
your_monitoring_server.hostname.com {
reverse_proxy localhost:5601
}
Allow Ports Through UFW (Firewall)
WARNING: Please set the ports to values you are comfortable with based on your own personal risk tolerance. We suggest only allowing trusted systems to connect and to place an appropriately configured firewall in front of all production systems.
Run the following commands as root:
ufw allow proto tcp from any to any port 443
ufw allow proto tcp from any to any port 80
ufw allow proto tcp from any to any port 9200
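The rules above open port 9200 to any source address. In line with the warning, one option is to allow only the addresses of your own nodes. The sketch below prints one rule per trusted address so the rules can be reviewed before being run as root. print_ufw_rules is a hypothetical helper, and the addresses are documentation-range placeholders.

```shell
#!/bin/sh
# print_ufw_rules PORT IP [IP...]
# Hypothetical helper: print one ufw rule per trusted source address.
# Nothing is executed; the output is meant to be reviewed first.
print_ufw_rules() {
    port=$1
    shift
    for ip in "$@"; do
        printf 'ufw allow proto tcp from %s to any port %s\n' "$ip" "$port"
    done
}

# Example: 203.0.113.10 and 203.0.113.11 stand in for the addresses of your
# relay and producer nodes.
print_ufw_rules 9200 203.0.113.10 203.0.113.11
```

These rules would take the place of the "from any" rule for port 9200. Ports 80 and 443 are a separate decision, since they serve the Caddy reverse proxy.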
Start Caddy Service
Run the following commands as root:
systemctl start caddy.service
systemctl enable caddy.service
Switch to OpenSearch User
Run the following commands as root:
su - opensearch
cd ~
Change Default Passwords
Generate Password Hashes
Run the following commands as the opensearch user:
chmod +x ~/opensearch-1.0.0/plugins/opensearch-security/tools/hash.sh
~/opensearch-1.0.0/plugins/opensearch-security/tools/hash.sh
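Each account should get its own strong password before it is hashed. A minimal sketch for generating one from the kernel's random source is shown below. gen_password is a hypothetical helper, and the 16-byte default is our assumption rather than a requirement of OpenSearch.

```shell
#!/bin/sh
# gen_password [BYTES]
# Hypothetical helper: print BYTES random bytes (default 16) as lowercase hex,
# giving a password of 2 * BYTES characters.
gen_password() {
    bytes=${1:-16}
    # od renders each byte as two hex digits; tr strips spaces and newlines
    head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}

# Example (as the opensearch user): generate a password, store it somewhere
# safe, then supply it to hash.sh as in the steps above.
# gen_password 16
```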
Edit Internal User Passwords
Replace the password hashes in the file below with new ones generated using the steps in the previous section. Delete any accounts that will not be needed or used.
~/opensearch-1.0.0/plugins/opensearch-security/securityconfig/internal_users.yml
Apply Security Settings
Run the following commands as the opensearch user:
chmod +x ~/opensearch-1.0.0/plugins/opensearch-security/tools/securityadmin.sh
cd ~/opensearch-1.0.0/plugins/opensearch-security/tools/
./securityadmin.sh -cd ../securityconfig/ -icl -nhnv \
-cacert ../../../config/root-ca.pem \
-cert ../../../config/kirk.pem \
-key ../../../config/kirk-key.pem
Cardano Stake Pool Node Monitoring Configuration
The following steps should be performed on your Cardano Stake Pool nodes.
Install Fluent-bit on Nodes
Run the following commands as root:
curl https://packages.fluentbit.io/fluentbit.key | apt-key add -
echo 'deb https://packages.fluentbit.io/debian/buster buster main' > /etc/apt/sources.list.d/fluentbit.list
apt update -y
apt install -y td-agent-bit
systemctl enable td-agent-bit
systemctl start td-agent-bit
Configure Fluentbit User
Log in to the web interface as the administrative user at https://your_monitoring_server.hostname.com.
- Duplicate the Logstash role in OpenSearch Dashboards user management
- Name the copy fluentbit
- Grant the fluentbit role permissions on the appropriate indices as needed
Create Index Pattern
Create an index pattern ‘system_metrics-*’.
Configure System Metrics Reporting
Edit ‘/etc/td-agent-bit/td-agent-bit.conf’ (as root) and replace with the following contents:
[SERVICE]
    flush 5
    daemon Off
    log_level info
    parsers_file parsers.conf
    plugins_file plugins.conf
    http_server Off
    http_listen 0.0.0.0
    http_port 2020
    storage.metrics on
# Duplicate and edit this section for each network interface you want monitored
[INPUT]
    Name netif
    Tag node-name-example-cardano-producer
    Interval_Sec 1
    Interval_NSec 0
    Interface enp1s0
[INPUT]
    name cpu
    tag node-name-example-cardano-producer
    # Read interval (sec) Default: 1
    interval_sec 1
[INPUT]
    Name disk
    Tag node-name-example-cardano-producer
    Interval_Sec 1
    Interval_NSec 0
[INPUT]
    Name mem
    Tag node-name-example-cardano-producer
[INPUT]
    Name tail
    Tag node-name-example-cardano-producer
    Parser cardano
    Path /var/log/cardano/nodelog.log
[OUTPUT]
    Name es
    Match *
    Host your_monitoring_server.hostname.com
    Port 9200
    Logstash_Format True
    Logstash_Prefix system_metrics
    tls on
    tls.verify off
    Include_Tag_Key True
    Tag_Key Tag
    HTTP_User fluentbit
    HTTP_Passwd yourfluentbituserpasswordhere
[OUTPUT]
    name stdout
    match *
Run the following commands as root:
systemctl restart td-agent-bit
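The configuration above asks you to duplicate the netif section for each network interface. On nodes with several interfaces, the sections can be generated instead of copied by hand. The sketch below prints one section per interface with the same keys as the example. netif_inputs is a hypothetical helper, and the four-space indentation is our assumption about what Fluent Bit's classic configuration format expects.

```shell
#!/bin/sh
# netif_inputs TAG INTERFACE [INTERFACE...]
# Hypothetical helper: print one Fluent Bit netif [INPUT] section per
# interface, using the same keys as the example configuration.
netif_inputs() {
    tag=$1
    shift
    for iface in "$@"; do
        printf '[INPUT]\n'
        printf '    Name netif\n'
        printf '    Tag %s\n' "$tag"
        printf '    Interval_Sec 1\n'
        printf '    Interval_NSec 0\n'
        printf '    Interface %s\n' "$iface"
    done
}

# Example: enp1s0 comes from the configuration above; enp2s0 is a placeholder.
netif_inputs node-name-example-cardano-producer enp1s0 enp2s0
```

The printed sections can be pasted into ‘/etc/td-agent-bit/td-agent-bit.conf’ in place of the single netif section.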
Modify Node Configuration
Edit ‘/etc/cardano/mainnet-config.json’ (as root) and set the values as appropriate. An example working configuration is below:
{
"ApplicationName": "cardano-sl",
"ApplicationVersion": 1,
"ByronGenesisFile": "mainnet-byron-genesis.json",
"ByronGenesisHash": "5f20df933584822601f9e3f8c024eb5eb252fe8cefb24d1317dc3d432e940ebb",
"LastKnownBlockVersion-Alt": 0,
"LastKnownBlockVersion-Major": 3,
"LastKnownBlockVersion-Minor": 0,
"MaxKnownMajorProtocolVersion": 2,
"Protocol": "Cardano",
"RequiresNetworkMagic": "RequiresNoMagic",
"ShelleyGenesisFile": "mainnet-shelley-genesis.json",
"ShelleyGenesisHash": "1a3be38bcbb7911969283716ad7aa550250226b76a61fc51cc9a9a35d9276d81",
"TraceBlockFetchClient": false,
"TraceBlockFetchDecisions": false,
"TraceBlockFetchProtocol": false,
"TraceBlockFetchProtocolSerialised": false,
"TraceBlockFetchServer": false,
"TraceChainDb": false,
"TraceChainSyncBlockServer": false,
"TraceChainSyncClient": false,
"TraceChainSyncHeaderServer": false,
"TraceChainSyncProtocol": false,
"TraceDNSResolver": false,
"TraceDNSSubscription": false,
"TraceErrorPolicy": false,
"TraceForge": true,
"TraceHandshake": false,
"TraceIpSubscription": false,
"TraceLocalChainSyncProtocol": false,
"TraceLocalErrorPolicy": false,
"TraceLocalHandshake": false,
"TraceLocalTxSubmissionProtocol": false,
"TraceLocalTxSubmissionServer": false,
"TraceMempool": false,
"TraceMux": false,
"TraceTxInbound": false,
"TraceTxOutbound": false,
"TraceTxSubmissionProtocol": false,
"TracingVerbosity": "MaximalVerbosity",
"TurnOnLogMetrics": true,
"TurnOnLogging": true,
"defaultBackends": [
"KatipBK"
],
"defaultScribes": [
[
"FileSK",
"/var/log/cardano/nodelog.log"
]
],
"hasEKG": 12788,
"hasPrometheus": [
"0.0.0.0",
12798
],
"minSeverity": "Warning",
"options": {
"mapSubtrace": {
"cardano.node.metrics": {
"subtrace": "Neutral"
}
}
},
"rotation": {
"rpKeepFilesNum": 10,
"rpLogLimitBytes": 5000000,
"rpMaxAgeHours": 24
},
"setupBackends": [
"KatipBK"
],
"setupScribes": [
{
"scFormat": "ScJson",
"scKind": "FileSK",
"scName": "/var/log/cardano/nodelog.log",
"scRotation": null
}
]
}
Change the log location ownership to the appropriate user. ‘cardano’ is the example user. Run the following commands as root:
mkdir -p /var/log/cardano
chown cardano:cardano /var/log/cardano
systemctl restart cardano-node.service
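Fluent Bit tails ‘/var/log/cardano/nodelog.log’, so the node has to write JSON-formatted logs to that path. Because the node configuration is edited by hand, a quick check of the relevant settings can save some debugging. The sketch below is a grep-based sanity check, not a JSON parser; check_json_scribe is a hypothetical helper.

```shell
#!/bin/sh
# check_json_scribe CONFIG LOGPATH
# Hypothetical helper: succeed only if CONFIG turns logging on, selects the
# ScJson scribe format, and mentions LOGPATH as a quoted string.
check_json_scribe() {
    config=$1
    logpath=$2
    grep -Eq '"TurnOnLogging"[[:space:]]*:[[:space:]]*true' "$config" || return 1
    grep -Eq '"scFormat"[[:space:]]*:[[:space:]]*"ScJson"' "$config" || return 1
    grep -qF -- "\"$logpath\"" "$config" || return 1
    return 0
}

# Example (as root), using the paths from this guide:
# check_json_scribe /etc/cardano/mainnet-config.json /var/log/cardano/nodelog.log \
#     && echo "logging settings look right"
```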
Edit ‘/etc/td-agent-bit/parsers.conf’ (as root) and add the following section:
[PARSER]
    Name cardano
    Format json
    Time_Key at
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep On
Run the following commands as root:
systemctl restart td-agent-bit
You’re Done
If everything went well, you are now shipping your metrics to indices named in the format ‘system_metrics-YYYY.MM.DD’ and can access them via the index pattern ‘system_metrics-*’. Have fun creating your dashboards and visualizations!
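To make the naming scheme concrete, the sketch below builds the index name for a given day. It assumes GNU date (as on Debian) and that the date part of the index name is taken in UTC; index_for_date is a hypothetical helper.

```shell
#!/bin/sh
# index_for_date DATE
# Hypothetical helper: print the daily index name for DATE (any format GNU
# date accepts), following the system_metrics-YYYY.MM.DD pattern.
index_for_date() {
    date -u -d "$1" '+system_metrics-%Y.%m.%d'
}

index_for_date 2021-07-10    # prints system_metrics-2021.07.10

# Example (on the monitoring server): list the indices that exist so far.
# Replace admin:yourpassword with your own credentials.
# curl -XGET 'https://localhost:9200/_cat/indices/system_metrics-*?v' \
#     -u 'admin:yourpassword' --insecure
```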
Bonus Notes
- Documents containing ‘cardano.node.metrics’ in the ‘ns’ field are metrics reported by your Cardano node software
- Individual nodes can be identified by their unique ‘Tag’ field
- The ‘data.name’ field contains the name of the record's content, and ‘data.node.contents’ contains the value