Until a few weeks ago, we were doing weekly off-site backups for our hosted helpdesk infrastructure. Local backups ran more frequently, but usually not nightly, due to performance considerations (and the fact that there’s really no such thing as “nightly” for our international customer base; someone is always impacted).

Backups at our scale are a bit more complicated than for a single helpdesk, where a simple mysqldump is usually enough to get the job done. We host helpdesk customers, especially on legacy Cerb2+Cerb3, who have been with us for years. With those years of e-mail and attachments, it’s not uncommon to see 10GB+ databases. That’s a lot of redundant data to dump out to *.sql files on disk, not to mention the cycles wasted on re-compressing it and the bandwidth wasted on transmitting it off-site.

We’ve done a few things to optimize our process over the years:

  • We use mysqlhotcopy to copy database content directly (*.frm, *.MYD), and we don’t back up indexes (*.MYI), as they can be fully regenerated during an eventual recovery.

  • With Cerb2+Cerb3 databases, we also don’t back up the search_index or trigram tables since they can also be regenerated with a ‘re-index’. These inefficient tables (thankfully!) don’t exist in Cerb4.

  • When we designed Cerb4 from scratch, we took what we had learned about these accumulated Cerb2+Cerb3 inefficiencies and engineered with them in mind. That’s why our attachments are no longer in the database, and why we rely on database indexes for searching again instead of rolling our own indexes inside the database (among hundreds of other design improvements).

  • Having these Cerb4 attachments on disk, and named by their unique, incremental database ID, makes it trivial for us to do incremental backups (i.e. only backing up this week’s new attachments and not the entire directory or database table, then aggregating them with the off-site backups).

  • We’ve started using Amazon’s Web Services (EC2 + S3) to compress backups off-site. This saves a lot of processing power that otherwise goes to waste compressing backup data, just to delete it the next day (or week) and replace it with fresher data.
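
A minimal sketch of the selection logic behind the first and fourth points above, not the actual backup scripts. It assumes a hypothetical layout in which each attachment is a file named by its numeric database ID, and a database directory holding MyISAM files; the function names and the idea of tracking a "last backed-up ID" are illustrative assumptions.

```python
import re
from pathlib import Path

# Assumption: attachment files are named purely by their numeric database ID.
_NUMERIC_NAME = re.compile(r"[0-9]+")


def new_attachments(attachment_dir, last_backed_up_id):
    """Return attachment files with an ID above last_backed_up_id, sorted by ID."""
    picked = []
    for path in Path(attachment_dir).iterdir():
        if not path.is_file() or not _NUMERIC_NAME.fullmatch(path.name):
            continue  # ignore anything that is not an ID-named file
        if int(path.name) > last_backed_up_id:
            picked.append(path)
    return sorted(picked, key=lambda p: int(p.name))


def myisam_files_to_copy(db_dir):
    """Return table definitions (.frm) and data (.MYD), skipping indexes (.MYI)."""
    keep = {".frm", ".MYD"}
    return [p for p in Path(db_dir).iterdir() if p.is_file() and p.suffix in keep]
```

Because the IDs only ever grow, remembering the highest ID from the previous run is enough to pick out this week's new files without comparing directory contents.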

We’re still a bit handicapped by these cumbersome databases on the Cerb2+Cerb3 hosting end, but we’re going to stop letting that affect the Cerb4 backup policy. We’re going to start doing daily backups for all Cerb4 helpdesks.

Running backups currently impacts our performance, in large part, because we’re copying dozens or hundreds of gigabytes from the local RAID to another location on the same RAID (so all disks are involved in jumping back and forth). It would be a lot more logical to do these backups to drives that aren’t involved in serving real-time content to users. This isn’t a revelation; it’s just something we’d sacrificed to economy pricing. Having our new hosting prices based on storage helps us factor these realistic costs in.

Through our optimizations and audits over the past couple of weeks (to figure out what’s holding us back from keeping ever-more-frequent backups), we found that a major inefficiency at our scale is “original_message.html” attachments. These are the HTML versions of incoming e-mail that usually have a plaintext alternative (which is what we display in the GUI). If an e-mail is HTML-only, we end up generating the plaintext part from the HTML.
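
To illustrate what generating a plaintext part from HTML can look like, here is a rough sketch using only Python's standard library. It is not the helpdesk's actual converter; the class and function names, and the choice of which tags start a new line, are assumptions made for the example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div", "tr", "li"):
            self._chunks.append("\n")  # treat block-level tags as line breaks

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_plaintext(html):
    """Return a plaintext rendering of an HTML message body."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Once a plaintext part like this exists for every message, the stored HTML copy adds nothing to what the GUI displays.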

We host several helpdesks that have 500,000+ tickets, never delete a thing (spam or otherwise), and collect customer messages through website forms that always generate an unnecessary HTML part. One helpdesk in particular has 530,000 attachments, and ~527,000 of them are redundant HTML copies of messages already in the helpdesk.

From now on, we’re not going to guarantee that we’ll back up attachments named “original_message.html”; it’s going to be at our discretion. This will save a lot of resources wasted on copying, compressing and transmitting redundant data. It will also shorten the backup windows.
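
A short sketch of how such a filter might work, together with the arithmetic for the helpdesk mentioned earlier. The `(attachment_id, display_name)` record shape and both function names are hypothetical; only the file name “original_message.html” and the attachment counts come from this post.

```python
REDUNDANT_NAME = "original_message.html"


def split_for_backup(attachments):
    """Split (attachment_id, display_name) pairs into IDs to keep and IDs to skip."""
    keep, skip = [], []
    for attachment_id, display_name in attachments:
        if display_name == REDUNDANT_NAME:
            skip.append(attachment_id)  # redundant HTML copy, backed up at discretion
        else:
            keep.append(attachment_id)
    return keep, skip


def redundant_share(redundant_count, total_count):
    """Percentage of attachments that are redundant, rounded to one decimal."""
    return round(100 * redundant_count / total_count, 1)


# Roughly 527,000 of 530,000 attachments on one helpdesk are redundant HTML copies.
print(redundant_share(527_000, 530_000))  # prints 99.4
```

For that one helpdesk, a filter like this would leave only about 3,000 attachments that must be copied.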

Thanks!
