Most of the posts highlight what I am focused on and express work and personal experience. Reason I put them here - recall later or help someone else with similar challenge.
PostgreSQL is very popular solution as backend for many web projects (I believe there are other reasons to use PSQL for data store). I recently wrote short note about PSQL dump, but I'd love to extend its capabilities by offloading it to the separate storage (rather than keeping dumps in the production compute instance). Essentially, what I am going to describe here is how to backup PostgreSQL databases to the cloud (public) storage. I will use two different options: Amazon Simple Storage Service (S3) and Google Cloud Platform (GCP) and Nearline storage class (may be even Coldline to cut costs).
My strategy is based on three steps:
PostgeSQL as any other database engine has internal tool to make DB dump. Its structure (dump file) contains PSQL queries to create DB, tables and insert data (CREATE, INSERT commands) and this is what we are going to do first. First let's check list of available DBs (if you don't remember, don't know) through the psql client.
postgres=# \list List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+----------+----------+-------------+-------------+----------------------- hr | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | [ ... skipped ... ] (4 rows)
We have some default DBs and hr with postgres as owner. I am going to dump hr to /hr/home.
pg_dump hr > hr.sql
Now, when the dump is created, we can check its content simply typing:
We see that it contains native language (PSQL) with INSERT, CREATE etc. This will be our backup file. And here we have thousand options to do this. For instance we can do this using AWS CLI as I plan to use S3 as my storage. We can also automate it (third-party backup tool).
By having this file in local disk we are protected agains software (PSQL) failures or users typos, however we are not protected agains general server (VPS) failure. There are many strategies for backups, but essentially every single admin tries to follow 3-2-1 rule (where you need 3 backups of your data in 2 different locations with at least one off-site). So we are going to have 1 off-site for sure (Cloud Storage). The rest, I believe, is easy to implement. There are several ways to off-load data to the cloud (depends on admin skills and available time to setup and configure accordingly). I have two options listed below: semi-automate via AWS CLI and automate via CB Backup for Linux.
I am going to add AWS CLI to my Ubuntu and configure access to Scalable Storage in the cloud (S3) using IAM access and secret key. This gives me unlimited objects storage "attached" to my Linux VM and I can use it for my assets and backups. It is described here.
Make sure you change "AWS Access Key ID [None]" and "AWS Secret Access Key [None]" to yours, which you can generate in your AWS Console. If it does not install AWS CLI, refer to the installation page, you may miss some of the required dependencies. Make sure you give appropriate permissions to your IAM user (at least we need full access to S3 services). Good tool for managing IAM permissions.
Another good option to do this is to have third-party tool to automate backup. In this case I want to use Cloud Backup for Linux as it has very reach CLI and can use any storage (AWS, GCP, Azure, CTL etc).
cbb addBackupPlan -n "PostgreSQL Backup" -a "AWS_S3" -f "/home/hr/hr.sql" -c yes -every week -at "23:00" -weekday "mo, tu, we, th, fr" -notification on
By having this in CLI we set backup for hr.sql to AWS_S3 with compression on daily basis at 11pm on weekdays only.
Now we have two things working independently (we execute dump manually), CB off-loads at 11pm. The missing brick here is cron. I made simple bash script with two lines: pg_dump and cbb plan -r "PostgreSQL Backup". I also disabled cbb plan scheduler, so It never runs based on its own schedule as I want this handled by cron. Cron schedule depends on RPO, I set daily in my case. My cron example below:
crontab -e 00 23 * * * /home/hr/hr.sh
Here is nice cron shortcut with asterisks / number in its line: