How we use our CI infrastructure to automate a database migration and run it every night.
We have one project where loading and transforming data is crucial. We’re migrating from one data model to a quite different one. A set of rake tasks does the load, but on our staging environment it was tedious and error-prone to pull down the old database, run the rake tasks, and then upload the result to the new one.
We wrote a script to automate this task, so a developer could run it whenever desired, but we wanted to automate it further. The script takes about ten minutes to run, and staging kept drifting out of sync with the source data, causing developers to waste time chasing down data-related issues that a reload would have cleared up. For instance, with a nightly reload, a new product added to the legacy system shows up in the new system within twenty-four hours.
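The post doesn’t show the script itself, but its shape is roughly pull, transform, upload. Here’s a minimal sketch of that flow; the function names are placeholders, and each body is a stand-in for the real commands (e.g. a pg_dump, the rake tasks, a restore), which aren’t given in the post:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the reload script; all commands are stand-ins.
set -euo pipefail

pull_legacy_db() {
  # Stand-in for fetching a snapshot of the legacy database.
  echo "pulling legacy database"
}

run_rake_tasks() {
  # Stand-in for: bundle exec rake <load and transform tasks>
  echo "running load and transform rake tasks"
}

upload_to_staging() {
  # Stand-in for restoring the transformed data into staging.
  echo "uploading transformed data to staging"
}

pull_legacy_db
run_rake_tasks
upload_to_staging
```

Once the steps live in one script, the CI job only needs to invoke it, which keeps the CI config stable as the script evolves.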
We ended up using a CircleCI scheduled workflow to load our staging environment with fresh data nightly. Just like any other CircleCI job, it is defined in the .circleci/config.yml file. The steps set up the environment so that we can run the scripts, then run them.
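The job definition itself isn’t shown in the post, but it might look something like this sketch; the Docker image, step commands, and script path are assumptions, not from the original config:

```yaml
# Hypothetical sketch of the job the scheduled workflow runs.
# The image and script path are placeholders.
version: 2
jobs:
  load-legacy-db:
    docker:
      - image: circleci/ruby:2.5
    steps:
      - checkout
      - run: bundle install            # set up the environment
      - run: ./script/load_legacy_db   # run the load/transform script
```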
Here’s the part of the config file that actually schedules the workflow:
workflows:
  version: 2
  load-legacy-db:
    jobs:
      - load-legacy-db
    triggers:
      - schedule:
          cron: "0 10 * * *"
          filters:
            branches:
              only:
                - master
So the workflow runs once a day (at 10:00 UTC) and only for the master branch.
There were a couple of other options for running this. We could have kept it ad hoc, but that had already proven problematic. We could have run it via a scheduled Heroku task, but that would have meant spinning databases up and down via add-ons, and CircleCI felt more natural. We could have run it on a separate server using cron, but that would have entailed more sysadmin work and another piece of infrastructure to keep track of. With CircleCI, temporary servers are spun up and torn down with a minimum of fuss, and the process is self-documenting in the config file, which already lives in our source repository.
Right now we can tear down the staging database with ease. In the future we may need to be more careful about that, but the script we run can be updated to contain that intelligence; the CircleCI config won’t need to change at all. This has helped our data integrity and ensures we don’t spend time chasing down data issues, all with a minimum of overhead. We did have to upgrade to CircleCI’s paid tier because of the additional minutes used, but that was only a matter of time anyway.