Building a Python Sidecar to Monitor Server Performance

Enhance Your Server Monitoring with a Lightweight Python Sidecar


In modern cloud and microservices architectures, sidecars play an essential role by offloading auxiliary tasks from the main application. A Python sidecar can be particularly useful for monitoring system health without burdening the primary application. This guide will walk you through creating a lightweight Python script that continuously collects server performance metrics, logs them, and optionally sends alerts when thresholds are exceeded.

A Python-based sidecar is a simple yet effective way to monitor server performance in real time. With extensions such as alerts and remote logging, it can help keep systems stable and surface issues before they become outages. Try integrating it into your infrastructure to improve observability without impacting core services.

I wrote this script when I worked in an environment where I had to monitor a large number of hosts running Docker containers (about 600 X-Large hosts running over 30,000 containers), and I wasn't allowed to use existing monitoring software, not even open-source tools. But that's the power of Python: a few lines of code can really go a long way.

Dependencies

The script relies on the psutil library, which allows us to fetch system performance metrics such as CPU load, memory usage, and disk activity. Installing this library on some flavors of Linux can be tricky, but an older version may be available from your OS's package manager repository.
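As a rough illustration of what psutil exposes, the sketch below takes one snapshot of CPU, memory, and disk usage and compares it against thresholds. The function names and the threshold values are assumptions made for this example; they are not taken from the original script.

```python
def collect_metrics(path="/"):
    """Return one snapshot of CPU, memory, and disk usage as percentages."""
    # Imported lazily so the pure helper below also works where psutil
    # is not installed (pip install psutil).
    import psutil

    return {
        "cpu": psutil.cpu_percent(interval=1),  # averaged over one second
        "memory": psutil.virtual_memory().percent,
        "disk": psutil.disk_usage(path).percent,
    }


def exceeded(metrics, thresholds):
    """Return the sorted names of metrics at or above their threshold."""
    return sorted(
        name
        for name, value in metrics.items()
        if value >= thresholds.get(name, float("inf"))
    )


# Hypothetical thresholds, in percent. Adjust to your needs.
THRESHOLDS = {"cpu": 90, "memory": 90, "disk": 90}

# Example use: print(exceeded(collect_metrics(), THRESHOLDS))
```

Keeping the comparison separate from the collection makes it easy to check the alerting logic without touching the real system.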

The script also relies on the schedule library and uses threads. This way, the script runs continuously, and how often each condition is checked can be adjusted within the script.

I decided to use Slack for alerting, by posting messages to a particular channel. You may want to adjust this to your own needs. This particular version runs on an EC2 instance on AWS, so the Slack URL is retrieved from Parameter Store using the Boto3 library. Once again, please adjust to your situation.
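To make the alerting path concrete, here is a minimal sketch of reading a webhook URL from Parameter Store and posting a message to a Slack incoming webhook. The parameter name, the region, and all function names are hypothetical, and the original script may structure this differently.

```python
import json
import urllib.request


def get_slack_url(parameter_name="/monitoring/slack_webhook_url",
                  region="us-east-1"):
    """Fetch the webhook URL from AWS Systems Manager Parameter Store.

    The parameter name and region are placeholders for this example.
    """
    import boto3  # third-party; imported lazily so the rest works without AWS

    ssm = boto3.client("ssm", region_name=region)
    response = ssm.get_parameter(Name=parameter_name, WithDecryption=True)
    return response["Parameter"]["Value"]


def build_request(webhook_url, text):
    """Build the JSON POST request that a Slack incoming webhook accepts."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url, text, timeout=10):
    """Post the message and return the HTTP status code."""
    request = build_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Storing the URL in Parameter Store keeps the secret out of the script itself; outside AWS, an environment variable or a config file would serve the same purpose.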

Features

I obviously wanted the script to monitor the basics: disk, CPU, and memory usage. The thresholds and frequency can be adjusted.

One of the processes running on some servers had the largest memory leak I have ever encountered, about 1GB per day. As a result, the script also logs the memory usage over time of each process using over 2GB (once again, adjust to your needs). This is a good way to identify such leaks: you can grep the log and see the memory usage growing over time.
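The per-process memory logging could look roughly like the sketch below. The logger name, the log format, and the function names are assumptions for illustration; only the 2GB threshold comes from the description above.

```python
import logging

log = logging.getLogger("sidecar")  # hypothetical logger name

LIMIT_BYTES = 2 * 1024 ** 3  # 2 GiB, approximating the 2GB threshold


def large_processes(limit_bytes=LIMIT_BYTES):
    """Yield (pid, name, rss_bytes) for processes whose RSS exceeds the limit."""
    import psutil  # third-party; imported lazily (pip install psutil)

    for proc in psutil.process_iter(["pid", "name", "memory_info"]):
        mem = proc.info["memory_info"]  # None when access is denied
        if mem is not None and mem.rss > limit_bytes:
            yield proc.info["pid"], proc.info["name"], mem.rss


def format_entry(pid, name, rss_bytes):
    """Render one grep-friendly log line for a high-memory process."""
    rss_gb = rss_bytes / 1024 ** 3
    return f"high-memory process pid={pid} name={name} rss_gb={rss_gb:.2f}"


def log_large_processes():
    """Write one log line per process currently above the limit."""
    for pid, name, rss in large_processes():
        log.info(format_entry(pid, name, rss))
```

With a fixed key=value format, a command such as grep "name=java" on the log file shows the rss_gb value for that process at each check, which makes a slow leak easy to spot.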

I also wanted the script to check for dropped network connections, but this generated too many alerts, which may be a sign of larger issues in the environment.

CPU can be tricky to monitor. A temporary surge in CPU usage can actually be OK, but CPU usage near 90-100% for a long time is definitely not good, unless maybe you are training an AI model. Once again, adjust to your needs!
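One way to tell a brief surge from sustained load is to alert only when every sample in a sliding window is above the threshold. The class below is a sketch of that idea; the class name, the window size, and the threshold are assumptions, and the original script may take a different approach.

```python
from collections import deque


class SustainedCpuDetector:
    """Flag high CPU only when a full window of samples is over the threshold."""

    def __init__(self, threshold=90.0, window=6):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # oldest sample drops off automatically

    def add(self, cpu_percent):
        """Record a sample; return True if the whole window is at or above the threshold."""
        self.samples.append(cpu_percent)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s >= self.threshold for s in self.samples)
```

A single low sample resets the condition, so a spike that lasts for one or two checks never triggers an alert. Each sample could come from a call such as psutil.cpu_percent(interval=1).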

The Python Code

The full code is available on GitHub at:

https://github.com/Christophe-Gauge/python/blob/main/sidecar_monitor.py

The schedule for each check is set near the bottom of the script. The schedule.every(5).to(15).minutes syntax means that the task runs at a random interval of between 5 and 15 minutes. This adds a bit of randomness so that the checks hopefully don't all happen at the same time and cause an unwanted CPU spike.

    schedule.every(5).to(15).minutes.do(run_threaded, Check_Disk_Usage)
    schedule.every(15).to(45).minutes.do(run_threaded, Check_Memory_Usage)
    schedule.every(10).to(20).minutes.do(run_threaded, Check_Required_Containers)
    # schedule.every(4).to(6).hours.do(run_threaded, Check_Network_Drops)
    schedule.every(1).to(2).hours.do(run_threaded, Check_CPU_Usage)
    schedule.every(24).hours.do(resetSlackCount)
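The run_threaded helper is not shown in this article. The sketch below follows the parallel-execution pattern from the schedule library's documentation and is an assumption about how it might be written; check_disk_usage is a hypothetical stand-in for one of the real checks.

```python
import threading
import time


def run_threaded(job_func):
    """Run job_func in its own thread so a slow check cannot delay the others."""
    thread = threading.Thread(target=job_func, daemon=True)
    thread.start()
    return thread


def main():
    import schedule  # third-party; imported lazily (pip install schedule)

    def check_disk_usage():  # hypothetical stand-in for a real check
        print("checking disk usage")

    schedule.every(5).to(15).minutes.do(run_threaded, check_disk_usage)

    while True:
        schedule.run_pending()  # starts any job whose time has come
        time.sleep(1)
```

Because schedule itself runs jobs one after another in the calling thread, handing each job to run_threaded keeps the main loop responsive even when a check takes a long time.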

The watch_and_notify_events function runs as a thread and is quite powerful. It will attach to the Docker Socket and listen for events for the containers you want to watch. It will send Slack messages for every state change of any of these containers. If you have a "bouncing" container that continuously crashes and restarts, this will generate a lot of alerts, so there is also a daily maximum number of Slack notifications, just so that you don't drown in alerts.
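The idea of a daily cap on notifications, combined with a Docker event listener, can be sketched as follows. This is not the author's watch_and_notify_events implementation: the class and function names are hypothetical, and the event loop assumes the Docker SDK for Python (the docker package) rather than whatever mechanism the original script uses to read the socket.

```python
import threading


class CappedNotifier:
    """Forward messages to `send` until a daily maximum is reached."""

    def __init__(self, send, daily_max=50):
        self.send = send
        self.daily_max = daily_max
        self.count = 0
        self._lock = threading.Lock()  # notify and reset run in different threads

    def notify(self, text):
        """Send the message unless the cap is reached; return True if sent."""
        with self._lock:
            if self.count >= self.daily_max:
                return False
            self.count += 1
        self.send(text)
        return True

    def reset(self):
        """Clear the counter; meant to be scheduled once every 24 hours."""
        with self._lock:
            self.count = 0


def describe_event(event):
    """Turn a decoded Docker event into a short message."""
    name = event.get("Actor", {}).get("Attributes", {}).get("name", "unknown")
    return f"container {name}: {event.get('Action', 'unknown')}"


def watch_containers(notifier, watched_names):
    """Block forever, notifying on events for the watched containers."""
    import docker  # third-party Docker SDK for Python; imported lazily

    client = docker.from_env()
    for event in client.events(decode=True, filters={"type": "container"}):
        name = event.get("Actor", {}).get("Attributes", {}).get("name")
        if name in watched_names:
            notifier.notify(describe_event(event))
```

With a cap in place, a container that restarts in a loop can exhaust the day's budget, so it is worth choosing a daily_max high enough that unrelated alerts still get through.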

Using the script

Running in the Background

To run the sidecar script in the background, use:

nohup python sidecar_monitor.py &

Running as a Service

For production use, run the script as a service using your OS's service manager (for example, systemd on most Linux distributions).

License

This project is licensed under the GNU Lesser General Public License v3.0 (LGPL-3.0).

Disclaimer

This software is provided "as is," without any warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.
