How to backup PostgreSQL on Ubuntu to the cloud of your choice?

Dec 30, 2016
Postgresql

PostgreSQL is very popular solution as backend for many web projects (I believe there are other reasons to use PSQL for data store). I recently wrote short note about PSQL dump, but I’d love to extend its capabilities by offloading it to the separate storage (rather than keeping dumps in the production compute instance). Essentially, what I am going to describe here is how to backup PostgreSQL databases to the cloud (public) storage. I will use two different options: Amazon Simple Storage Service (S3) and Google Cloud Platform (GCP) and Nearline storage class (may be even Coldline to cut costs).

My strategy is based on three steps:

Local database dump;
Copy .bak files to object storage;
Clean up local folder (for dump).

Backup PostgreSQL #

PostgeSQL as any other database engine has internal tool to make DB dump. Its structure (dump file) contains PSQL queries to create DB, tables and insert data (CREATE, INSERT commands) and this is what we are going to do first. First let’s check list of available DBs (if you don’t remember, don’t know) through the psql client.

1postgres=# \list
2List of databases
3  Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
4-----------+----------+----------+-------------+-------------+-----------------------
5hr        | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
6
7[ ... skipped ... ]
8
9(4 rows)

We have some default DBs and hr with postgres as owner. I am going to dump hr to /hr/home.

1pg_dump hr > hr.sql

Now, when the dump is created, we can check its content simply typing:

1cat /home/hr/hr.sql

We see that it contains native language (PSQL) with INSERT, CREATE etc. This will be our backup file. And here we have thousand options to do this. For instance we can do this using AWS CLI as I plan to use S3 as my storage. We can also automate it (third-party backup tool).

PostgreSQL backup to the cloud #

By having this file in local disk we are protected agains software (PSQL) failures or users typos, however we are not protected agains general server (VPS) failure. There are many strategies for backups, but essentially every single admin tries to follow 3-2-1 rule (where you need 3 backups of your data in 2 different locations with at least one off-site). So we are going to have 1 off-site for sure (Cloud Storage). The rest, I believe, is easy to implement. There are several ways to off-load data to the cloud (depends on admin skills and available time to setup and configure accordingly). I have two options listed below: semi-automate via AWS CLI and automate via CB Backup for Linux.

Semi-automate backup to Amazon S3 #

I am going to add AWS CLI to my Ubuntu and configure access to Scalable Storage in the cloud (S3) using IAM access and secret key. This gives me unlimited objects storage “attached” to my Linux VM and I can use it for my assets and backups. It is described here. Make sure you change “AWS Access Key ID [None]” and “AWS Secret Access Key [None]” to yours, which you can generate in your AWS Console. If it does not install AWS CLI, refer to the installation page, you may miss some of the required dependencies. Make sure you give appropriate permissions to your IAM user (at least we need full access to S3 services). Good tool for managing IAM permissions.

AWS IAM Permissions management

Cloud backup tool #

Another good option to do this is to have third-party tool to automate backup. In this case I want to use Cloud Backup for Linux as it has very reach CLI and can use any storage (AWS, GCP, Azure, CTL etc).

1cbb addBackupPlan -n "PostgreSQL Backup" -a "AWS_S3" -f "/home/hr/hr.sql" -c yes -every week -at "23:00" -weekday "mo, tu, we, th, fr" -notification on

By having this in CLI we set backup for hr.sql to AWS_S3 with compression on daily basis at 11pm on weekdays only.

Wrap it up #

Now we have two things working independently (we execute dump manually), CB off-loads at 11pm. The missing brick here is cron. I made simple bash script with two lines: pg_dump and cbb plan -r “PostgreSQL Backup”. I also disabled cbb plan scheduler, so It never runs based on its own schedule as I want this handled by cron. Cron schedule depends on RPO, I set daily in my case. My cron example below:

1crontab -e
200 23 * * * /home/hr/hr.sh

Here is nice cron shortcut with asterisks / number in its line:

Crontab example

That’s it!