Reducing Bandwidth Costs by Moving S3 Video to Backblaze B2 with CloudFlare

I tweeted about the cost savings I was able to achieve by moving some videos from S3 to Backblaze B2. Unfortunately, in my tweet I was off on the numbers a bit. Basically, by switching, costs went from $163.45 on S3 in September to $2.91 on Backblaze B2 in October (not quite as nice as I said in my tweet, only a 56x reduction in cost instead of a 100x reduction #oops).

Regardless of my inability to recall the amounts of my bills, I figured it would be good to quickly document how I have everything set up. The setup instructions I followed are basically the same as what’s documented by Backblaze here. For the purposes of this guide, I’ll assume you already have a CloudFlare account (they have free plans available if you don’t).

  1. Create the bucket in B2 ensuring that the bucket is marked as “Public”
  2. Upload the video to the B2 bucket
  3. Take note of the hostname in the URL where the video is served (e.g. f002.backblazeb2.com)
  4. In CloudFlare create a CNAME (with the “Orange Cloud”) which points to the host name found in the URL of your uploaded video
  5. Access the video using your-new-cname.yourdomain.extension/file/<bucket_name>/<file_name>

As far as I can tell, all files in your bucket will always resolve to the same hostname (the fxxx.backblazeb2.com identifier), so this only needs to be configured once. Going forward, just copy the public link from your Backblaze bucket and replace the backblazeb2.com part with your CNAME, and everything should be happy.
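For example, here’s roughly what the swap looks like (a sketch; the “video” subdomain and the bucket/file names are placeholders, and curl -I is just a quick way to confirm that the CNAME is serving the file):

# B2's "friendly" URL for a public file looks like this:
#   https://f002.backblazeb2.com/file/<bucket_name>/<file_name>
# After swapping in the CNAME, the same file is served from:
#   https://video.yourdomain.extension/file/<bucket_name>/<file_name>
curl -I "https://video.yourdomain.extension/file/<bucket_name>/<file_name>"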

Quick disclaimer: I wasn’t compensated by CloudFlare or Backblaze for this, I’m just sharing my experience. I make no guarantees about any results you’ll see.

MS SQL Server Backups to S3 – On Linux!

Today I’m going to go over what is necessary in order to do full and transaction log backups for SQL Server Express on Linux. One of the big limitations of SQL Express is that it doesn’t include the SQL Agent, so most of the maintenance tasks that can normally be designed and implemented within SSMS need to be rethought. Thankfully Microsoft released sqlcmd for Linux, which makes it pretty easy to go ahead and do the backups as simple bash scripts scheduled through cron.

Prerequisites

This post isn’t going to go through all of the steps to install SQL Server and the associated tools, but Microsoft has done a great job of documenting that on their docs site. In order to push the backups to S3 we will need the s3cmd tool:

apt install s3cmd
s3cmd --configure

You’ll need an IAM identity with at least enough permissions to write to the S3 bucket you designate in the script. When the configure prompts come up, enter that identity’s access key and secret key and specify the region you want to default to.

The Scripts

In order to do the backups, two scripts are necessary: one for the full backups and one for the transaction log backups. I’ve opted for a very simple structure since I only care about one database; it shouldn’t be very hard to modify the scripts to generate backups for each database, but I’ll leave that as an exercise for the reader :).

Full Database Backups (fullBackup.sh)

#!/bin/bash
TIMESTAMP=$(date +"%F")
BACKUP_DIR="/var/opt/mssql/backup/$TIMESTAMP"
SA_USER="SA"
SA_PASS="<Your_SA_User_Password>"

# Create the dated backup directory and hand ownership to the mssql service account
mkdir -p "$BACKUP_DIR"
chown -R mssql:mssql "$BACKUP_DIR"

# Take a full backup of the database
sqlcmd -S localhost -Q "BACKUP DATABASE [<DBNAME>] TO DISK = N'$BACKUP_DIR/<DBNAME>.bak' WITH NOFORMAT, NOINIT, SKIP, NOREWIND, STATS=10" -U "$SA_USER" -P "$SA_PASS"

# Push the backup to S3 and remove the local copy
s3cmd put "$BACKUP_DIR/<DBNAME>.bak" "s3://<BUCKET_NAME>/$TIMESTAMP/<DBNAME>.bak"
rm -f "$BACKUP_DIR/<DBNAME>.bak"

Transaction Log Backups (logBackup.sh)

#!/bin/bash
DATESTAMP=$(date +"%F")
TIMESTAMP=$(date +"%H%M%S")
BACKUP_DIR="/var/opt/mssql/backup/$DATESTAMP/logs/$TIMESTAMP"
SA_USER="SA"
SA_PASS="<Your_SA_User_Password>"

# Create the timestamped log backup directory and hand ownership to the mssql service account
mkdir -p "$BACKUP_DIR"
chown -R mssql:mssql "$BACKUP_DIR"

# Back up the transaction log
sqlcmd -S localhost -Q "BACKUP LOG [<DBNAME>] TO DISK = N'$BACKUP_DIR/<DBNAME>_log.bak' WITH NOFORMAT, NOINIT, SKIP, NOREWIND, NOUNLOAD, STATS=5" -U "$SA_USER" -P "$SA_PASS"

# Push the log backup to S3 and remove the local copy
s3cmd put "$BACKUP_DIR/<DBNAME>_log.bak" "s3://<BUCKET_NAME>/$DATESTAMP/logs/$TIMESTAMP/<DBNAME>_log.bak"
rm -f "$BACKUP_DIR/<DBNAME>_log.bak"

Then schedule them in cron:

0 0 * * * /root/bin/fullBackup.sh
*/15 * * * * /root/bin/logBackup.sh

With the default schedule I have, full backups are taken at midnight and transaction log backups are taken every 15 minutes.

S3 Lifecycles

While the scripts do a good job of cleaning up after themselves, S3 will (by design) never delete your data unless you specifically tell it to. S3 has a nifty feature called “Lifecycles” which allows us to specify rules for object retention (it is a powerful feature that can be used for a number of other things as well). To access it, go to the AWS Console and open your S3 bucket, then follow these steps to set up object retention (a scripted alternative is sketched after the list):
1. Select the Management Tab
2. Select Lifecycle
3. Click + Add lifecycle rule
4. Name the rule something descriptive (“Expire all files”). Leave the prefix blank
5. Leave Configure transition blank
6. In Expiration, set the current version to expire after 7 days and previous versions to be permanently deleted after 14 days
7. Click Save
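If you’d rather script the rule than click through the console, something like the following should be roughly equivalent (a sketch using the AWS CLI, which is assumed to be installed and configured; the bucket name is a placeholder, and the previous-version expiration only matters if versioning is enabled on the bucket):

# write the lifecycle rule to a file
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "Expire all files",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Expiration": { "Days": 7 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 14 }
    }
  ]
}
EOF

# apply the rule to the backup bucket
aws s3api put-bucket-lifecycle-configuration --bucket <BUCKET_NAME> --lifecycle-configuration file://lifecycle.json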

That’s All

At this point we have full and transaction log backups configured, being pushed off site to Amazon S3. These backups are soft deleted after 7 days and fully deleted after 14 days.

HOWTO: Migrating from a VPS to GCP Compute Engine

This post is going to be a brief guide on how to migrate a WordPress site from an existing host to Google Cloud’s Compute Engine service. Note that this guide assumes that multiple sites are being migrated; if only one is, things should be slightly simpler.

At the end of this guide the site will be migrated, an SSL certificate from Let’s Encrypt will be provisioned, and Apache will be doing its thing. I’ll leave it as an exercise for the reader to put the site behind CloudFlare (hint: if you are already there, the only thing you probably have to change is your A records). For my purposes this guide will also include migrating all images and attachments to Google Cloud Storage.

Step 1: Back up EVERYTHING

In order to make this all work you are going to need backups of everything: the databases, the existing wordpress installs, any other assets you are using, etc. The easy way of doing this is to ssh to your web server and just tar cvzf my_site.tgz /var/www/my_site (this creates a tarball). Then use something like sftp or scp in order to copy the tarball to your local machine. Rinse and repeat for each site.
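In other words, something along these lines (a sketch; the paths, filenames, and hostname are examples):

# on the old web server: bundle up the site directory
tar cvzf my_site.tgz /var/www/my_site

# from your local machine: pull the tarball down
scp user@old-server:my_site.tgz .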

Repeat this for the database server. Assuming you have 1 database per site (my preferred way of doing it) you can just mysqldump -u root -p mydb > mydb.sql and enter the password when prompted (if you want to pass the password on the command line instead, it has to come directly after -p with no space). If your database is huge it might be prudent to zip or otherwise compress the dump file (mine are only a few megabytes, so I didn’t bother). Once you have the dump file, copy it to your local machine. Keep it up until you have all your databases.
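If you have several databases, a quick loop saves some typing (a sketch; the database names are placeholders, and you’ll be prompted for the password on each pass):

# dump each site's database to its own .sql file
for db in site1_db site2_db site3_db; do
    mysqldump -u root -p "$db" > "$db.sql"
done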

Step 2: Make the Cloud a thing

Now that you have all of the goods on your local machine, it’s time to make a place for them to live. To that end you’ll want to provision a brand new VM on Compute Engine. I’m not going to walk you through that process as it is pretty well documented (and also just consists of pressing a few buttons). The one thing to watch out for is to make sure that the VM has all of the API access scopes that you will need, since changing them later requires powering off the virtual machine first (which is stupid and lots of people have complained about it, but that is how it is).
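For reference, creating the VM from the command line looks something like this (a sketch; the instance name, zone, machine type, and scopes are placeholders you should adjust, and the exact flags may vary by gcloud version; the important part is getting --scopes right up front):

# create the web server VM with storage and Cloud SQL access scopes
gcloud compute instances create web1 \
    --zone us-central1-a \
    --machine-type n1-standard-1 \
    --scopes storage-rw,sql-admin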

While that is booting up, let’s get the MySQL stuff set up.

If you’re moving to Google Cloud, you might as well move in fully, so for MySQL we’ll use Google Cloud SQL (MySQL Second Generation). The only really weird part here is making sure you whitelist the IP address of the VM you set up. Alternatively you can go through a process to use the Cloud SQL Proxy, but IP whitelisting is a lot easier (and has fewer moving parts).
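If you want to do the Cloud SQL setup from the command line as well, the rough shape is below (a sketch; the instance name, tier, region, and the VM’s external IP are placeholders, and flag names may differ slightly depending on your gcloud version):

# create a second generation MySQL instance
gcloud sql instances create wordpress-db \
    --database-version MYSQL_5_7 \
    --tier db-n1-standard-1 \
    --region us-central1

# whitelist the web server's external IP so it can connect directly
gcloud sql instances patch wordpress-db \
    --authorized-networks <VM_EXTERNAL_IP>/32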

Step 3: To the CLOUD

Now that your data has a place to live, it’s time to start pushing those bytes to google. This is a 2 step process:
1. Upload the WordPress tarballs to the VM (assuming it has finished booting by now). The easiest way I’ve found to do this is to install gcloud and then execute gcloud compute copy-files ~/local/path <vm_name>:~/, replacing <vm_name> (chevrons included) with the name of the instance you created (mine is web1 because I’m all sorts of imaginative when it comes to naming servers).
2. Navigate to Google Cloud Storage and create a new bucket which is not publicly accessible. After the bucket is created, upload all of the SQL dumps (in an uncompressed form); a gsutil sketch for this follows the list.
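If you’d rather not use the web console for the bucket, the same thing can be done with gsutil (a sketch; the bucket name and region are placeholders, and new buckets are private by default):

# create a private bucket for the SQL dumps and upload them
gsutil mb -l us-central1 gs://<dump_bucket_name>/
gsutil cp *.sql gs://<dump_bucket_name>/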

Step 4: Make it Live!

All of the parts are in place, they just need to be configured properly and you’ll have all your blogs running.

Web Server

Decompress the tarballs so that you have the WordPress directories again: tar xvzf my_site.tgz. Copy the resultant directory to wherever you want your blog to live (I just throw them in /var/www/). Next up you’ll need to set up Apache so it knows about your site.

Since we are only going to support SSL, we will only configure virtual host files for SSL. Your configuration should look something like this:

<IfModule mod_ssl.c>
        <VirtualHost _default_:443>
                ServerAdmin [email protected]
                ServerName lukebearl.com
                ServerAlias www.lukebearl.com

                DocumentRoot /path/to/lukebearl.com

                <Directory />
                        Options FollowSymLinks
                        AllowOverride None
                </Directory>

                <Directory /path/to/lukebearl.com>
                        Options Indexes FollowSymLinks MultiViews
                        AllowOverride All
                        Require all granted
                </Directory>

                # Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
                # error, crit, alert, emerg.
                # It is also possible to configure the loglevel for particular
                # modules, e.g.
                #LogLevel info ssl:warn

                ErrorLog ${APACHE_LOG_DIR}/error.log
                CustomLog ${APACHE_LOG_DIR}/access.log combined

                # For most configuration files from conf-available/, which are
                # enabled or disabled at a global level, it is possible to
                # include a line for only one particular virtual host. For example the
                # following line enables the CGI configuration for this host only
                # after it has been globally disabled with "a2disconf".
                #Include conf-available/serve-cgi-bin.conf

                #   SSL Engine Switch:
                #   Enable/Disable SSL for this virtual host.
                SSLEngine on

                #   A self-signed (snakeoil) certificate can be created by installing
                #   the ssl-cert package. See
                #   /usr/share/doc/apache2/README.Debian.gz for more info.
                #   If both key and certificate are stored in the same file, only the
                #   SSLCertificateFile directive is needed.
                SSLCertificateFile    /etc/letsencrypt/live/lukebearl.com/cert.pem
                SSLCertificateKeyFile /etc/letsencrypt/live/lukebearl.com/privkey.pem
                SSLCertificateChainFile /etc/letsencrypt/live/lukebearl.com/fullchain.pem

                # ... <snipsnip> ...

        </VirtualHost>
</IfModule>

Once you have that set up, execute sudo service apache2 reload and then set up the Let’s Encrypt certificate.
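Depending on how Apache is installed, you may also need to enable the SSL module and the new virtual host before the reload will pick anything up (a sketch, assuming a Debian/Ubuntu-style layout and that the config above was saved as /etc/apache2/sites-available/lukebearl.com.conf; note that Apache will complain if the certificate files referenced in the vhost don’t exist yet, so you may need to grab the certificate below before enabling the SSL site):

sudo a2enmod ssl
sudo a2ensite lukebearl.com.conf
sudo service apache2 reload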

To get the certificate, you’ll first need to make sure that you have certbot installed. After that run this command: sudo certbot certonly --webroot --webroot-path /var/www/html/ --renew-by-default --email [email protected] --text --agree-tos -d lukebearl.com -d www.lukebearl.com. The webroot of /var/www/html works because it is still serving the default document on port 80 via Apache’s default site (which we never disabled).

You’ll also want to make sure that the renewal is scheduled in a crontab entry.
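Something like this in root’s crontab should do it (a sketch; certbot renew only renews certificates that are close to expiring, so running it daily is fine, and the post-hook reloads Apache so it picks up the new certificate):

# attempt renewal daily at 3am; reload Apache only if something was renewed
0 3 * * * certbot renew --quiet --post-hook "service apache2 reload"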

At this point you should be able to navigate to your site (and get a database connection error).

SQL

From the Cloud SQL control panel in console.cloud.google.com, click the “Import” button and, in the dialog that appears, select each of the dump files in turn. You’ll also want to use the SQL console to create a user who has rights to all of those WordPress databases but isn’t root. Make sure the password is decently strong.
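Creating that user can also be done from the mysql command line client if you prefer (a sketch; the user name, database name, and password are placeholders):

# connect to the Cloud SQL instance as root and create a dedicated WordPress user
mysql -h <CLOUD_SQL_IPV4> -u root -p <<'SQL'
CREATE USER 'wp_user'@'%' IDENTIFIED BY '<strong_password>';
GRANT ALL PRIVILEGES ON `<wordpress_db>`.* TO 'wp_user'@'%';
FLUSH PRIVILEGES;
SQL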

After you are done importing all of the databases, go back to the web server for the final configuration task before your site is live.

Editing wp-config.php

cd into the WordPress directory and then open the wp-config.php file in your text editor of choice (like vim). You’ll need to look for and edit all of the DB_* settings to reflect your new MySQL instance. Pay particular attention to DB_HOST, as that should be the IPv4 address from the SQL management pane.

Extra Credit: Images

If you run an image-heavy blog (which this blog obviously is an example of), you’ll notice a considerable speed-up if you make use of the Google Cloud Storage WordPress plugin. One big gotcha that I found is that php5-curl must be installed or this plugin breaks. A big thanks to the developers who work on that plugin as they quickly helped resolve the issue.

Moving the images is a three-step process (the bucket setup and copy commands are sketched after the list):
1. Create the bucket (and make sure to assign the allUsers user “Read” access before you upload anything)
– You probably want to create the bucket as “Multi-Regional” so that images get cached to edge locations.
– Also create a new folder in the bucket named “1”.
2. Copy all of the files from the wp-content/uploads directory to the bucket. When doing this I have found it easiest to cd into the wp-content/uploads directory and then execute the following: gsutil -m cp -r . gs://<bucket_name>/1/
– This may take a few minutes to complete. The -m flag will make it multithreaded.
3. Login to the WordPress Admin and install the plugin.
– After installing, configure the plugin to point to the bucket you just copied everything into. Also make sure that the “Use Secure URLs for serving” flag is checked or you’ll end up with mixed-content errors.
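Here’s roughly what the bucket setup and copy look like from the command line (a sketch; the bucket name is a placeholder, and granting allUsers read access via iam ch is just one way to make the objects publicly readable):

# create a multi-regional bucket and make its contents publicly readable
gsutil mb -c multi_regional -l us gs://<bucket_name>/
gsutil iam ch allUsers:objectViewer gs://<bucket_name>

# from inside wp-content/uploads, copy everything into the "1" folder
cd wp-content/uploads
gsutil -m cp -r . gs://<bucket_name>/1/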

At this point you should have your blog setup on Google Compute Engine using the Cloud MySQL instance and all of your images should be hosted through Google Storage. Let me know how it goes! @lukebearl