Wednesday, 12 April 2017

HOWTO: Migrating from a VPS to GCP Compute Engine

This post is a brief guide on how to migrate a WordPress site from an existing host to Google Cloud’s Compute Engine service. Note that this guide assumes that multiple sites are being migrated; if only one is, things should be slightly simpler.

At the end of this guide the site will be migrated, an SSL certificate from Let’s Encrypt will be provisioned, and Apache will be doing its thing. I’ll leave it as an exercise to the reader to put the site behind CloudFlare (hint: if you are already there, the only thing you probably have to change is your A records). For my purposes this guide will also include migrating all images and attachments to Google Cloud Storage.

Step 1: Back up EVERYTHING

In order to make this all work you are going to need backups of everything: the databases, the existing WordPress installs, any other assets you are using, etc. The easy way of doing this is to ssh to your web server and run tar cvzf my_site.tgz /var/www/my_site (this creates a tarball). Then use something like sftp or scp to copy the tarball to your local machine. Rinse and repeat for each site.
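The per-site backup loop can be sketched as a small script. The web root and site name below are assumptions; on a real server you would point WWW_ROOT at /var/www and list your actual site directories. A throwaway demo tree is created first so the sketch is runnable as-is.

```shell
#!/bin/sh
set -e
# Sketch of the per-site backup loop. WWW_ROOT and the site list are
# assumptions; on a real server use WWW_ROOT=/var/www and your own sites.
# A demo tree is created here so this sketch runs as-is.
WWW_ROOT="${WWW_ROOT:-/tmp/demo_www}"
mkdir -p "$WWW_ROOT/my_site"
printf 'demo\n' > "$WWW_ROOT/my_site/index.php"

for site in my_site; do
    tar czf "${site}.tgz" -C "$WWW_ROOT" "$site"
done
```

From your local machine you would then pull the tarballs down with something like scp user@webserver:'~/*.tgz' .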

Repeat this for the database server. Assuming you have one database per site (my preferred way of doing it) you can just run mysqldump -u root -p mydb > mydb.sql and enter the password when prompted. If your database is huge it might be prudent to gzip or otherwise compress the dump file (mine are only a few megabytes, so I didn’t bother). Once you have the dump file, copy it to your local machine. Repeat until you have all your databases.
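The dump loop can be sketched the same way. The database names below are placeholders, and the commands are printed rather than executed so you can review them first; change the body of run() to "$@" to actually run them on the database server.

```shell
#!/bin/sh
# Dry-run sketch: print each command instead of executing it. Swap the
# body of run() for "$@" to execute for real.
run() { echo "+ $*"; }

# Placeholder database names -- one database per site.
for db in my_site_db another_site_db; do
    run mysqldump -u root -p "$db"   # redirect stdout to ${db}.sql when live
    run gzip "${db}.sql"             # optional: compress large dumps
done
```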

Step 2: Make the Cloud a thing

Now that you have all of the goods on your local machine, it’s time to make a place for them to live. To that end you’ll want to provision a brand new VM on Compute Engine. I’m not going to walk you through that process as it is pretty well documented (and also just consists of pressing a few buttons). The one thing to watch out for is to make sure that the VM has all of the API Access Scopes that you will need, as in order to change them later you first have to power off the virtual machine (which is stupid and lots of people have complained about, but that is how it is).

While that is booting up, let’s get the MySQL stuff set up.

If you’re moving to Google Cloud, you might as well move in fully, so for MySQL we’ll use Google’s Cloud SQL (MySQL Second Generation). The only really weird part here is to make sure you whitelist the IP address of the VM you set up. Alternatively you can go through a process to use the Cloud SQL Proxy, but IP whitelisting is a lot easier (and has fewer moving parts).
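The whitelisting can also be done from the command line. This is a hedged sketch; the instance name and IP are placeholders, so check gcloud sql instances patch --help on your gcloud version before running it for real.

```shell
#!/bin/sh
# Dry-run sketch: authorize the VM's external IP on the Cloud SQL instance.
# "my-sql-instance" and the IP are placeholders. Swap run()'s body for
# "$@" to execute.
run() { echo "+ $*"; }

VM_IP=203.0.113.7   # your Compute Engine VM's external IPv4 address
run gcloud sql instances patch my-sql-instance --authorized-networks "${VM_IP}/32"
```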

Step 3: To the CLOUD

Now that your data has a place to live, it’s time to start pushing those bytes to Google. This is a two-step process:
1. Upload the WordPress tarballs to the VM (assuming it has finished booting by now). The easiest way I’ve found to do this is to install gcloud and then execute gcloud compute copy-files ~/local/path <vm_name>:~/, replacing <vm_name> with the name of the instance you created (mine is web1 because I’m all sorts of imaginative when it comes to naming servers).
2. Navigate to Google Cloud Storage and create a new bucket which is not publicly accessible. After the bucket is created, upload all of the SQL dumps (in uncompressed form).
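Both upload steps can be sketched as commands (printed rather than executed; the paths, VM name, and bucket name are all placeholders):

```shell
#!/bin/sh
# Dry-run sketch of step 1 (tarballs to the VM) and step 2 (dumps to a
# private bucket). Names and paths are placeholders; swap run()'s body
# for "$@" to execute.
run() { echo "+ $*"; }

run gcloud compute copy-files ~/backups/my_site.tgz web1:~/
run gsutil mb gs://my-private-sql-dumps        # bucket stays non-public
run gsutil cp ~/backups/my_site_db.sql gs://my-private-sql-dumps/
```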

Step 4: Make it Live!

All of the parts are in place; they just need to be configured properly, and then you’ll have all your blogs running.

Web Server

Decompress the tarballs so that you have the WordPress directories again: tar xvzf my_site.tgz. Copy the resultant directory to wherever you want your blog to live (I just throw them in /var/www/). Next up you’ll need to set up Apache to know about your site.

Since we are only going to support SSL, we will only configure virtual host files for SSL. Your configuration should look something like this:

<IfModule mod_ssl.c>
        <VirtualHost _default_:443>
                ServerAdmin [email protected]
                ServerName lukebearl.com
                ServerAlias www.lukebearl.com

                DocumentRoot /path/to/lukebearl.com

                <Directory />
                        Options FollowSymLinks
                        AllowOverride None
                </Directory>

                <Directory /path/to/lukebearl.com>
                        Options Indexes FollowSymLinks MultiViews
                        AllowOverride All
                        Require all granted
                </Directory>

                # Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
                # error, crit, alert, emerg.
                # It is also possible to configure the loglevel for particular
                # modules, e.g.
                #LogLevel info ssl:warn

                ErrorLog ${APACHE_LOG_DIR}/error.log
                CustomLog ${APACHE_LOG_DIR}/access.log combined

                # For most configuration files from conf-available/, which are
                # enabled or disabled at a global level, it is possible to
                # include a line for only one particular virtual host. For example the
                # following line enables the CGI configuration for this host only
                # after it has been globally disabled with "a2disconf".
                #Include conf-available/serve-cgi-bin.conf

                #   SSL Engine Switch:
                #   Enable/Disable SSL for this virtual host.
                SSLEngine on

                #   A self-signed (snakeoil) certificate can be created by installing
                #   the ssl-cert package. See
                #   /usr/share/doc/apache2/README.Debian.gz for more info.
                #   If both key and certificate are stored in the same file, only the
                #   SSLCertificateFile directive is needed.
                SSLCertificateFile    /etc/letsencrypt/live/lukebearl.com/cert.pem
                SSLCertificateKeyFile /etc/letsencrypt/live/lukebearl.com/privkey.pem
                SSLCertificateChainFile /etc/letsencrypt/live/lukebearl.com/chain.pem

                # ... <snipsnip> ...

        </VirtualHost>
</IfModule>

Before reloading Apache, provision the Let’s Encrypt certificate, since the virtual host above references certificate files that won’t exist until the first issuance.

To do that you’ll first need to make sure that you have certbot installed. After that, run this command: sudo certbot certonly --webroot --webroot-path /var/www/html/ --renew-by-default --email [email protected] --text --agree-tos -d lukebearl.com -d www.lukebearl.com. The /var/www/html path works because Apache’s default site (which we never disabled) is still serving it on port 80. Once the certificate has been issued, execute sudo service apache2 reload.

You’ll also want to make sure that the renewal is scheduled in a crontab entry.
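For example, a root crontab entry along these lines would attempt renewal twice a day and reload Apache so a new certificate gets picked up (a sketch: the schedule is my own choice, and certbot’s renewal flags vary between versions, so check certbot renew --help):

```shell
# crontab -e (as root); fields: minute hour day-of-month month day-of-week
0 3,15 * * * certbot renew --quiet --post-hook "service apache2 reload"
```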

At this point you should be able to navigate to your site (and get a database connection error).

SQL

From the SQL control panel in console.cloud.google.com, click the “Import” button and then, in the dialog that appears, find each of the dump files in turn. You’ll also want to use the SQL console to create a user with rights to all of those WordPress databases, but who isn’t root. Make sure the password is decently strong.

After you are done importing all of the databases, go back to the web server for the final configuration task before your site is live.

Editing wp-config.php

cd into the WordPress directory and then open the wp-config.php file in your text editor of choice (like vim). You’ll need to look for and edit all of the DB_* settings to reflect your new MySQL instance. Pay attention to DB_HOST, as that should be the IPv4 address from the SQL management pane.
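The relevant lines look something like the following (a sketch; the database name, user, password, and IP below are placeholders for your own Cloud SQL values):

```php
// wp-config.php -- placeholder values, substitute your own.
define('DB_NAME',     'my_site_db');
define('DB_USER',     'wp_user');           // the non-root user you created
define('DB_PASSWORD', 'a-strong-password');
define('DB_HOST',     '130.211.0.10');      // Cloud SQL instance IPv4
```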

Extra Credit: Images

If you run an image-heavy blog (which this blog obviously is an example of), you’ll notice a considerable speed-up if you make use of the Google Cloud Storage WordPress plugin. One big gotcha that I found is that php5-curl must be installed or the plugin breaks. A big thanks to the developers who work on that plugin, as they quickly helped resolve the issue.

Moving the images is roughly a three-step process:
1. Create the bucket (and make sure to assign the allUsers user “Read” access before you upload anything)
– You probably want to create the bucket as “Multi-Regional” so that images get cached to edge locations.
– Also create a new folder in the bucket named “1”.
2. Copy all of the files from the wp-content/uploads directory to the bucket. When doing this I have found it easiest to cd into the wp-content/uploads directory and then execute the following: gsutil -m cp -r . gs://<bucket_name>/1/
– This may take a few minutes to complete. The -m flag will make it multithreaded.
3. Login to the WordPress Admin and install the plugin.
– After installing, configure the plugin to point to the bucket you just copied everything into. Also make sure that the “Use Secure URLs for serving” flag is checked or you’ll end up with mixed-content errors.
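The bucket setup and copy can be sketched as follows (printed rather than executed; the bucket name is a placeholder, and the defacl step is one way of granting allUsers read access to everything uploaded afterwards, so double-check the flags against your gsutil version):

```shell
#!/bin/sh
# Dry-run sketch; the bucket name is a placeholder. Swap run()'s body for
# "$@" to execute, and run the final copy from inside wp-content/uploads.
run() { echo "+ $*"; }

BUCKET=my-blog-images
run gsutil mb -c multi_regional gs://"$BUCKET"
run gsutil defacl ch -u AllUsers:R gs://"$BUCKET"  # public-read for new objects
run gsutil -m cp -r . gs://"$BUCKET"/1/
```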

At this point you should have your blog set up on Google Compute Engine using the Cloud SQL instance, and all of your images should be hosted through Google Cloud Storage. Let me know how it goes! @lukebearl

Tuesday, 11 April 2017

Does it even exist if it isn’t in source control?

I was working on a little side project at work to make my employer’s data backup processes a bit more robust, which involved writing a small console utility. I was asked if the code for the utility was already in source control (the code represented about 10 hours of effort on my part), and I almost replied, “Does it even exist if it isn’t in source control?” Instead, of course, I simply sent the link to the repo where the code lives.

This made me start thinking about what really goes into a project, small or large. How many little side projects have I started, then abandoned within a few hours, simply because whatever I was working on wasn’t interesting enough to hold my attention? Would it be logical to start every side project by setting up source control?

At the end of the day I think there is a real judgement call to be made about what is appropriate to set up a repository for and what can just live (and likely die) on your local disks. When learning new technologies I generally work through a few different “Hello World” type applications that are very simple, and then I’ll work through a tutorial or two that is a bit more advanced. Generally the bulk of the code for those is heavily derived from whatever resource I’m using, so there is minimal to no benefit to putting it in source control. As soon as you move beyond the tutorials: enter source control, stage left.

Modern software professionals have no excuse not to use source control. GitHub has free public repositories (and private ones at $7/mo, an expense most folks in software can afford), Bitbucket has free public and private repositories, and those are only the two options that I have used extensively. If you don’t trust anyone else with your code, standing up your own Git instance on a VPS is possible as well.

If you really don’t like Git for some reason (I’d highly encourage you to learn it though), there are plenty of other options:
1. Bitbucket offers Mercurial hosting
2. Fossil SCM (with hosting available here) – note I’ve never used Fossil, but I’ve heard many good things about it
3. If you want to kick it old school, there are plenty of SVN hosting options available as well

In the end, the adage “Does it even exist if it isn’t in source control?” rings true. There are so many options, suitable for just about any skill level, personal preference, and cost bracket, that there is no excuse not to simply use source control.

Monday, 3 April 2017

VPS Migration to GCP

I’ve had all of the blogs that I manage on several VPSs hosted through RamNode for the past couple of years. While there is nothing wrong with RamNode, I figured it was about time to try out something new. I was originally going to move to AWS, since their MySQL RDS offering looked pretty compelling, but then I discovered that Google pretty much has feature parity with the parts of AWS I was interested in (plus a much nicer value proposition for hosting costs).

Out With the Old

In order to keep things reasonably fast, I ran 4 separate VMs: web1, web2, db1, and cache1. The initial idea was that web1 and web2 would be replicas of each other and cache1 (which ran Varnish) would be responsible for balancing load. I never got around to any of that; instead I had web1 (which took about 90% of the traffic) and web2 as completely independent machines. Both talked to cache1, which was running Varnish and acted as the public entry point to the blogs. db1 was just a database server, running some recent-ish version of MariaDB. Also, in order to help secure the database server, all of the VMs ran OpenVPN so that all internal communication happened on a private network.

Unfortunately, this was probably a bit over-engineered for the amount of traffic the blogs actually get in aggregate (under 200k uniques/month), and it was a maintenance nightmare since I needed to watch over 4 separate servers (but it was a fun learning experience).

In With the New

Now that I have moved to Google Compute Engine, I just have one moderately beefy VM running, which hosts all of the blogs. Instead of running a dedicated database VM (which I would have to administer), I’m using Google’s Cloud SQL MySQL offering (a db-n1-standard-1 instance). Eventually all of the images will be moved to Google Cloud Storage (this is one area where AWS is light years ahead of Google); for that to work I need the Google-sponsored WordPress plugin to actually work properly.

Future Plans

The reason I wanted to migrate to either AWS or Google is to support future growth. While volume is moderately low right now, hopefully one or more of the blogs being hosted will eventually see considerable amounts of traffic. If that happens, being able to reconfigure a few things to support GCP’s load balancing will be critical. The only thing preventing me from setting that up right now is that I would need an SSL certificate covering all of the blogs I manage, which would be a bit expensive (especially since I currently just use Let’s Encrypt for all my certificates).

Overall, we’ll have to see if this ends up being more reliable and at least as fast as what I previously had configured. I have enough CPU and memory budget that I can probably implement a caching strategy again if the performance isn’t quite where it needs to be (or I’ll just end up setting up a second VM to handle all of the caching duties). I’m still running everything through CloudFlare, so that makes sizing everything much more forgiving.