A large portion of Backlog’s database system was switched over to Amazon Aurora. We would like to offer you a simple introduction to these details and share a bit about the process.
The Reason for Migrating
At the end of last year, we experienced a major database outage which caused a huge inconvenience to many of our users. After this system failure, we started to evaluate how we could improve the operating environment.
A big challenge for us was that there are many environments at work within Backlog, as we had mentioned in a blog that introduced how to use Terraform. At different periods during development, new environments were put into operation, all of which were not uniform with each other. And because we ended up with so many non-uniform environments running at the same time, including MySQL 5.1 with EC2, MySQL 5.6 with EC2, and RDS for MySQL, maintenance became time-consuming and inefficient.
When considering a migration, being able to quickly resolve technical malfunctions and getting operational management costs down (if just a bit) took priority. Over 100 servers are operating Backlog and managing them can be quite complicated. So with all of this happening under the hood, we wanted to find a way to simplify our operations.
Keeping simplicity in mind, we started looking into the migration from the self-managed MySQL with EC2 environment, to a more simplified environment of the managed service with RDS for MySQL.
At first, we looked into changing over to RDS for MySQL, but we found that we were easily able to migrate from RDS for MySQL to Amazon Aurora. We conducted a test to see if we could move the application over and because we were able to accomplish this without any issues, we decided to migrate officially to Amazon Aurora.
The reasons we decided on Amazon Aurora were its availability and scalability. Also, the Nulab accounts that migrated in advance were operating stably. Our previous success using Amazon RDS for MySQL 5.6 in the new Backlog environment along with the compatibility between the two (which was written up formally on the Aurora site) were the other driving factors that pushed the migration to Amazon Aurora.
This is subjective, but when looking at this from an operational standpoint, the best reason for going with Amazon Aurora was its scalability. The storage in Amazon Aurora can automatically expand from 10 GB of space up to 64 TB without impacting the performance of the database. It is very smooth and simple. With Amazon Aurora, there is no need to conduct manual disk expansion, like we had to do with MySQL.
When we originally performed the disk expansion with RDS for MySQL, Backlog service was stopped, and there was a lot of waiting, as we had no control over how long the process would take. Once we migrate to Aurora, we will not have to conduct this type of maintenance often, and if we do, the operation of Backlog will never need to be paused. So from the user’s perspective, with this managed service, maintenance will be much more seamless.
The Migration Procedure
Now let’s take a look at the procedure for migrating from MySQL 5.1 with EC2 to Amazon Aurora. What we’ve put great importance on is being able to migrate the data in a typical maintenance time frame of several hours.
We used this document that explains the replication compatibility of the different versions of MySQL as a reference and followed it step by step and upgraded to the latest version of MySQL, then took care of the replication chain and then switched over the database.
We didn’t migrate all of the 5TB of data to Amazon Aurora all at once. Instead, there were three significant steps that we broke down and split up into 14 tangible steps, and little by little we completed the system upgrade.
On a larger scale, these are the three steps:
- Update from MySQL 5.1 to MySQL 5.6
- Switch over from MySQL 5.6 to RDS for MySQL 5.6
- Switch over from RDS for MySQL 5.6 to Amazon Aurora
You can see each step in detail. Before the migration, the system looked like the figure below.
The Steps of Migration
Step 1: Create a MySQL 5.5 Slave and Begin the Replication.
Start a new slave running MySQL 5.1, then upgrade that slave to MySQL 5.5 using the mysql_upgrade command. The data snapshot for the new slave will come from the existing slave using xtrabackup.
Step 2: Create MySQL 5.6 Slaves and Begin the Replication.
Next, start a slave running MySQL 5.5 and upgrade it to MySQL 5.6 with the mysql_upgrade command. The data snapshot to create this new slave comes from the salve that was created in step 2. Then create a new slave from the one you have just created.
Step 3: Switch MySQL 5.6 to the Master and Run the Application.
Setup maintenance time to run the application with MySQL 5.6 as the master.
Step 4: Create an Instance of Amazon RDS for MySQL 5.6, Create the MySQL 5.6 Slave.
Create a new instance of RDS for MySQL 5.6. Also, create a new MySQL 5.6 slave.
Step 5: Stop Running the Replication of the MySQL 5.6 Slave, and Create a Snapshot of Each Table with mysqldump.
Up to this point, we’ve stopped replication of the new slave that was created in step 4, and then ran a backup of each table separately. The reason for backing up each table separately was because there are a lot of tables and each table’s size varied, and in order to ensure progress in backing up the database, we needed to back each one up individually.
Actually, before reaching this point, we tried to restore and perform a full backup many times, but the numbers weren’t matching up (with the data replication). It turned out to be difficult to follow the cause of the many failed attempts. No matter how much we delayed replication, we decided to wait until the data matched up, then halted replication and that is how we reached this step and authentically migrated the tables.
The story digresses a bit, but in the process of the migration, we considered migrating the data with AWS Database Migration Service. The migration in the testing environment went well, and with a simple step, we confirmed the data replication. However, at the time that we attempted the migration, this service was still in its beta preview, so we weren’t able to successfully move the real data with the AWS Database Migration Service at the time that we planned to migrate. So with the method we have introduced this time around, we are now implementing the data migration. Because we had already completed the migration at the time that the AWS Database Migration Service became available, we plan to use this service at the next opportunity that we need perform a migration. We will report about it when that time comes.
Step 6: Restore the Data Snapshot into RDS for MySQL5.6.
Restore the dumped data of the various tables from step 5.
Step 7: Make the Replicated MySQL 5.6 Slave as Master and the Amazon RDS for MySQL 5.6 as the Slave.
When the restoration is complete, begin the replication.
Step 8: Start the Replication of the MySQL 5.6 Slave.
Restart the replication to synchronize the master application database.
We had as much as 5TB of data, which took about a week until it synchronized with the master.
Step 9: Switch the Amazon RDS for MySQL 5.6 to the Master and Run the Application.
Set up a time for maintenance and run the application with Amazon RDS for MySQL 5.6 as the master.
Step 10: Create a Read Replica of Amazon RDS for MySQL 5.6 and Stop Replication, Which Will Prevent the Deletion of the Binary Log.
Create a read replica under the control of Amazon RDS for MySQL 5.6, and go ahead and stop the replication. The master will continue to run without losing any of the binary logs.
Step 11: Create a Snapshot of the Amazon RDS for MySQL 5.6 Read Replica.
Take a snapshot from the read replica.
Step 12: Create an Instance of Amazon Aurora from the Amazon RDS for MySQL 5.6 Read Replica Snapshot.
Execute the “Migrate Database” option from the snapshot and create an instance of Amazon Aurora. At this point, launch the writer node and reader node and set-up a cluster.
Step 13: With Amazon RDS for MySQL 5.6 as Master, Start the Replication of Amazon Aurora as the Slave.
After completing the “Migrate Database” step, start the replication from the position of the binary log that you verified in step 10.
Step 14: Switch Amazon Aurora to the Master and Run the Application.
Set up a time for maintenance, and with Amazon Aurora as the master, run the application.
The database migration is now complete.
Although it took over a month to start the practical application of this, we were able to migrate without any big issues smoothly.
The database outage that happened at the year’s end was taken seriously, and we implemented a migration to Amazon Aurora after much careful and careful consideration. We would also like to apologize for all of the maintenance that occurred throughout the process of the migration.
The AWS Service has been incredibly useful, but it progresses very fast, and it is hard to keep up with all of its updates. Nulab will continue to use AWS, and because we still need to catch up with them, we will continue to make daily improvements and perform the maintenance without interrupting Backlog’s service.