Scrubbing Drupal databases for development

A couple of weeks ago I delivered a webinar on how to manage the process of moving Drupal websites from development to staging and production. The presentation gives an overview of the process including a few key points for the key pieces. Here are the key points:

  • Manage sites in code (keep configuration and updates in code)
  • Have an effective version control system
  • Test, test and run more Tests

In the presentation I covered a pair of scripts that form the basis of creating a scrubbed database for use on development machines. This is a good practice for several reasons including making sure that a development server doesn't send emails or data to Google Analytics for example.

The first piece is a shell script like this:

#!/bin/bash
# Use as ./sql_scrub_dump.sh local_db_user local_db_pass dump_file.gz

# Creates a scrubbed version of a database dump using the local database defined below as the intermediary.
# note: the dump file will be gzip compressed, as will te resulting file.

local_db=6000_temp_scrub

function db_query() {
    sql=${1}
    mysql -u${local_user} -p${local_pass} -e "$sql" ${local_db}
}

local_user=${1}
local_pass=${2}
dump_file=${3}
new_dump_file=scrubbed-${3}

mysql -u${1} -p${2} -e "DROP DATABASE IF EXISTS $local_db"
mysql -u${1} -p${2} -e "CREATE DATABASE $local_db"
gunzip -c $dump_file | mysql -u${1} -p${2} $local_db

mysql -u${1} -p${2} $local_db < scrub.sql
mysql -u${1} -p${2} $local_db < live_to_dev_scrub.sql

# Here we make sure that all the tables are InnoDB.

mysql -u${1} -p${2} -e "SHOW TABLES" $local_db \
  | grep --invert-match '^Tables' \
  | sed -e 's/\(.*\)/ALTER TABLE `\1` ENGINE = innodb;/' \
  | mysql -u${1} -p${2} $local_db

# Dump the scrubbed version and drop the temporary DB.

mysqldump -u${1} -p${2} $local_db | gzip > $new_dump_file
mysql -u${1} -p${2} -e "DROP DATABASE IF EXISTS $local_db"

If you look through what this shell script does it dumps a copy of the database and creates a new database from that dump. In the new, temporary, copy of the database transformations are performed to clean up the data. The cleaning actually happens in this line:

mysql -u${1} -p${2} $local_db < live_to_dev_scrub.sql

Once that transformation is done a dump is generated and one final transformation done to make sure all the tables are Innodb. Then the dump is gzipped and the temporary database deleted.

This sample shows some of what happens in a live_to_dev_scrub.sql. The exact things you'll want to do will vary depending on the site. Many of these concepts will be useful:

-- Here we strip out private information that should not be floating around
-- on developer systems, disable modules that should not or need not be
-- enabled, etc.  admin user passwords are set for ease of testing.
UPDATE users SET name=CONCAT('user', uid), pass='!', init=CONCAT('user', uid, '@example.com') WHERE uid NOT IN (SELECT uid FROM users_roles WHERE rid=3) AND uid > 0;

-- scrub all e-mail addresses - don't annoying people with local dev e-mail.
UPDATE users SET mail=CONCAT('user', uid, '@example.com') WHERE uid > 0;

-- Turn off modules which shouldn't be active in development.
DELETE FROM system WHERE name IN ('twitter', 'googleanalytics', 'securepages');

-- Remove passwords unless users have 'developer role'
UPDATE users SET pass=md5('devpassword') WHERE uid IN (SELECT uid FROM users_roles WHERE rid=3) AND uid > 0;

-- Admin user should not be same but not really well known
UPDATE users SET pass = MD5('supersecret!') WHERE uid = 1;

-- Scrub url aliases for non-admins since these also reveal names
-- Add the IGNORE keyword, since a user may have multiple aliases, and without
-- this keyword the attempt to store duplicate dst values causes the query to fail.
UPDATE IGNORE url_alias SET dst = CONCAT('users/', REPLACE(src,'/', '')) WHERE src IN (SELECT CONCAT('user/', u.uid) FROM users u WHERE u.uid NOT IN (SELECT uid FROM users_roles WHERE rid=3) AND u.uid > 0);

-- don't leave e-mail addresses, etc in comments table.
UPDATE comments SET name='Anonymous', mail='', homepage='http://example.com' WHERE uid=0;

-- Scrub webform submissions.
UPDATE webform_submitted_data set data='*scrubbed*';

-- remove sensitive customer data from custom module
TRUNCATE custom_customer_lead_data;

This is just a starting point. Nobody should use this just as it is. Instead it is important to look at these ideas and see what is necessary for a particular site. For example does a site have Ubercart or an RSVP module that will have personal information about customers? It is also important to consider whether other scripts such as prod_to_stage will be useful. There may be features and data that would be useful on staging that aren't appropriate for development machines.

If working on solutions like this interest you then please take a look at our open jobs. If you would like more advice on this topic or others for your Drupal site consider a
trial subscription to the Acquia Network.

Category: 

8 Comments

a few more items

Hey Josh,

Great write up about automating the process. I wrote about this a while ago - sanitizing a Drupal database - and have now incorporated one of your items there. I think my scrubbing script is a bit more complete which makes some sense because they are based on the scripts used on drupal.org.

Great suggestions

Hey Greg,

Thanks for the suggestions. Indeed you've got a great write-up.

it bears repeating here given an exchange on Twitter that while some of these things could be accomplished by overriding some data in settings.php it is important to consider what data might be floating around on a misplaced laptop or disk drive.

I am already using your code

I am already using your code on one of my sites! :)

Peace,
Jenny
(Google Affiliate)

Thanks for the suggestions

Thanks for the suggestions. Indeed you've got a great write-up.

@ greggles Your article is

@ greggles
Your article is awesome man! Yes your script does look a bit more "complete".

Regards,
Andrew

Your blog rocks!

Your blog rocks!

Hey, thanks for sharing man!

Hey, thanks for sharing man! Iam still kind of at the beginning of webhosting :)