Backup Configuration Files to GitHub

Sample Python code to commit critical server configuration files to a GitHub repository


This exercise is essentially the reverse of the one described in Distributed Server Management with GitHub. This time, we want to back up critical server configuration files, such as HAProxy or nginx files, to GitHub. Editing these files is usually done via trial and error: edit the file, verify the syntax, restart the process, and verify that the change had the intended effect. This is obviously done on a development system, not in production.

While it is absolutely a good idea to store these files in GitHub, the normal GitHub workflow of committing a change to the repository and then pulling it from the servers doesn't really apply. These configuration files are also usually environment-specific: you can't just take the DEV config file to the production server, because DNS names, logging parameters, and many other details differ between environments. I personally found that the best workflow is almost the reverse of the normal GitHub workflow: make the change to the file, test to make sure that it works as intended, and then commit the new version of the file to GitHub as a backup. That backup can be used if the server needs to be rebuilt, or as a back-out option to revert the next change to the previous working version if things go wrong.

This is the basic purpose of this Python script: to commit these files to GitHub when they have changed. The script can also redact specific parameters, such as passwords, which of course you never want to store on GitHub or any other code repository. Another advantage of this script is that it uploads files to a single GitHub repository, in a directory named after the server, instead of requiring a separate repository for each server.
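To illustrate the redaction idea, here is a minimal, standalone sketch of the approach the script takes: on any line containing a given keyword, everything after the keyword is replaced with a placeholder. The keyword 'auth' matches HAProxy "stats auth" lines; your own files may need different keywords.

```python
def redact(text, keyword='auth'):
    """On lines containing the keyword, replace the rest of the line with a placeholder."""
    redacted = []
    for line in text.splitlines(True):
        if keyword in line:
            # Keep the text up to and including the keyword, drop the secret that follows
            redacted.append(line[:line.find(keyword)] + keyword + ' <REMOVED>\n')
        else:
            redacted.append(line)
    return ''.join(redacted)

print(redact('    stats auth admin:SuperSecret\n'))  # →     stats auth <REMOVED>
```

Lines without the keyword pass through unchanged, so the rest of the configuration file is committed verbatim.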

The script uses the PyGithub library, so let's start by installing that.

pip3 install PyGithub

Obtaining an API Token from GitHub

To use the following script, you'll first need to obtain an API token from GitHub. In your GitHub account, navigate to "Settings", then "Developer settings", then click on "Personal access tokens" and create a new token, selecting "repo" as the scope.

You definitely need to protect that personal access token, as it allows access to your account, not just any particular repository. For this, I decided to use the rsa-crypto library to encrypt it. The following commands will install the library, create a new RSA key set, and encrypt the personal access token in a configuration file as the value of an option named ghe_token. See Public/Private Key RSA Encryption and Python Encryption for additional details on using that library and the associated command-line tool.

We'll first create a new encryption key, assuming that you don't already have one, and then store the GitHub API token obtained above as an entry (option) named ghe_token in the configuration file.

pip3 install rsa-crypto 

rsa_crypto create
Enter key password: 
Re-enter key password: 
Creating key...
Created password-protected private/public keys file /Users/me/rsa_key.bin
Use the "extract" keyword to create public and private key files.

rsa_crypto set -o ghe_token
Using key: /Users/me/rsa_key.bin
Opening encrypted key /Users/me/rsa_key.bin
Enter key password: 
Enter value: 
set section:  option:ghe_token
Updated /Users/me/.rsa_values.conf

The Python script will then be configured to read and decrypt the ghe_token configuration "option" to retrieve and use the personal access token. If this seems too complex, feel free to find a different solution, but please never embed an access token directly in your script; there is a high likelihood that it will end up on GitHub for anyone to see and use!

The Python script

Source code is also available on GitHub.

The configuration variables are pretty straightforward: file_list contains the list of files to commit to GitHub, if they exist locally.

The ghe_repo variable should point to your GitHub repository. I strongly suggest using a dedicated repository for backing up configuration files, separate from the one used by the Distributed Server Management with GitHub script.

The ghe_branch variable allows you to point to a different branch, depending on how you use GitHub.

The remote_base_path variable exists only because of a previous bug in the PyGitHub library where the remote path had to start with a /, but this is no longer the case.
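As a quick sanity check of the path handling, urljoin with an empty base simply returns the relative path, so the remote file name ends up as the server name followed by the file's base name. The server name below is a hypothetical stand-in for what socket.gethostname() returns on your machine.

```python
from urllib.parse import urljoin
import os

remote_base_path = ''
server_name = 'myserver'  # hypothetical; the script derives this from the hostname
local_file_name = '/etc/nginx/nginx.conf'

# With an empty base path, urljoin passes the relative path through unchanged
remote_file_name = urljoin(remote_base_path, server_name + '/' + os.path.basename(local_file_name))
print(remote_file_name)  # myserver/nginx.conf
```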

The script logs its progress, and will also attempt to determine the name of the user running the script to include in the commit comment.
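One detail worth calling out: to decide whether a file actually changed, the script computes the local file's Git blob SHA (Git hashes a blob as the string "blob <size>" plus a NUL byte, followed by the content) and compares it to the SHA that the GitHub API reports for the remote file. A standalone sketch of that computation:

```python
import hashlib

def git_blob_sha(data):
    """Compute the SHA-1 that Git would assign to this content as a blob."""
    header = b'blob ' + str(len(data)).encode() + b'\0'
    return hashlib.sha1(header + data).hexdigest()

# Matches `git hash-object` on a file containing "hello\n"
print(git_blob_sha(b'hello\n'))  # ce013625030ba8dba906f756967f9e9ca394464a
```

If the two SHAs match, the file is skipped; otherwise it is created or updated in the repository.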

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

__author__ = "Christophe Gauge"
__version__ = "1.0.4"

'''
Backup HAproxy and nginx configuration files.
'''


# I M P O R T S ###############################################################


import github
import os
import sys
import time
import io
import logging
import traceback
import hashlib
import rsa_crypto
import argparse
import socket
import getpass
import base64
if (sys.version_info > (3, 0)):
    from urllib.parse import urljoin
else:
    from urlparse import urljoin


# G L O B A L S ###############################################################

file_list = ['/tmp/test.txt', '/etc/haproxy/haproxy.cfg', '/etc/nginx/nginx.conf']

# # Uncomment if you are using your own internal GitHub Repository
# ghe_organization = 'my_ghe_repo'
# ghe_hostname = 'mydomain.com'

ghe_repo = 'Christophe-Gauge/GitHub'
remote_base_path = ''

ghe_branch = 'main'

logger = logging.getLogger()
logging.basicConfig(level=logging.INFO)
logger.info("Path:    %s" % (os.path.realpath(__file__)))
logger.info("Version:  %s" % (__version__))

args = argparse.Namespace(option='ghe_token')
ghe_token = rsa_crypto.decrypt_value(args)


# C O D E #####################################################################


def main():
    """Main function."""
    global logger
    global repo
    global args

    # Get the server name to use as a directory in GitHub
    server_name = socket.gethostname().split('.')[0].lower()
    # Get the username to log who made the change; it will be 'nobody' for a cron task or similar
    try:
        user_name = getpass.getuser()
    except Exception as e:
        user_name = None
    if user_name is None:
        try:
            user_name = os.getlogin()
        except OSError as e:
            user_name = 'nobody'
        except Exception as e:
            user_name = 'unknown'

    gh = github.Github(login_or_token=ghe_token)
    repo = gh.get_repo(ghe_repo)

    # # Uncomment if you are using your own internal GitHub Repository
    # gh = github.Github(base_url=f"https://{ghe_hostname}/api/v3", login_or_token=ghe_token)
    # org = gh.get_organization(ghe_organization)
    # repo = org.get_repo(ghe_repo)

    for local_file_name in file_list:
        file_content = ''
        if os.path.exists(local_file_name):
            logger.info('File %s exists, processing.' % local_file_name)
            # Redacting HAproxy auth passwords, more may be needed for your use-case
            with io.open(local_file_name, "r", encoding="utf-8") as f:
                for line in f:
                    if 'auth' in line:
                        file_content += line[:line.find('auth')] + 'auth <REMOVED>\n'
                    else:
                        file_content += line
            # print(file_content)

            data = file_content.encode('utf-8', 'ignore')
            filesize = len(data)
            content = "blob " + str(filesize) + "\0" + data.decode('utf-8')
            encoded_content = content.encode('utf-8')
            localSHA = hashlib.sha1(encoded_content).hexdigest()

            remote_file_name = urljoin(remote_base_path, server_name + '/' + os.path.basename(local_file_name))
            logger.info(f"Saving local file {local_file_name} to remote GitHub repo {repo.full_name} file {remote_file_name}")

            try:
                remoteSHA = repo.get_contents(remote_file_name, ref=ghe_branch).sha
            except github.UnknownObjectException as e:
                logger.error(f"Remote file not found {remote_file_name}")
                remoteSHA = None
            except Exception as e:
                logger.error("Error {0}".format(str(e)))
                logger.error(traceback.format_exc())
                remoteSHA = None

            if remoteSHA == localSHA:
                logger.info('Remote file is present, hash is the same, NOT updating.')
                continue
            else:
                try:
                    if remoteSHA is None:
                        logger.info('Remote file is NOT present, creating new file')
                        repo.create_file(remote_file_name, f"Updated by {user_name}", data, branch=ghe_branch)
                    else:
                        logger.info('Remote file is present but hash has changed, updating file')
                        repo.update_file(remote_file_name, f"Updated by {user_name}", data, remoteSHA, branch=ghe_branch)

                except Exception as e:
                    logger.error("Error {0}".format(str(e)))
                    logger.error(traceback.format_exc())
                logger.info('Done updating GitHub')

        else:
            logger.warning('File does not exist %s' % local_file_name)

    logger.info('*** DONE ***')
    sys.exit(0)

###############################################################################


if __name__ == "__main__":
    main()

# E N D   O F   F I L E #######################################################