Archive for the ‘Web Infrastructure’ Category

Finally, a formal release for my WordPress + Varnish + ESI plugin

Tuesday, January 10th, 2012

A while back I wrote a plugin to take care of a particular client traffic problem. As the traffic came in very quickly and unexpectedly, I had only minutes to come up with a solution. As I knew Varnish pretty well, my initial reaction was to put the site behind Varnish. But, there’s a problem with Varnish and WordPress.

WordPress is a cookie monster. It uses and depends on cookies for almost everything, and by default Varnish won’t cache requests that carry cookies or responses that set them. I modified and tweaked the VCL, but the site was still having problems.

So, a plugin was born. Since I was familiar with ESI, I opted to write a quick plugin to cache the sidebar and the content would be handled by Varnish. On each request, Varnish would assemble the Edge Side Include and serve the page – saving the server from a meltdown.
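To make the idea concrete, here is a minimal Python sketch of what edge-side assembly amounts to. It is not the plugin's code and not how Varnish implements ESI; the `assemble` function, the dictionary used as a fragment cache and the `/esi/sidebar` URL are all hypothetical names chosen for illustration.

```python
import re

# Matches a self-closing ESI include tag such as <esi:include src="/esi/sidebar"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page, fragment_cache, fetch_fragment):
    """Replace every ESI include in `page` with the body of its fragment.

    Fragments are fetched from the backend once and then served from
    `fragment_cache`, which stands in for the cache in front of the site.
    """
    def resolve(match):
        src = match.group(1)
        if src not in fragment_cache:
            fragment_cache[src] = fetch_fragment(src)
        return fragment_cache[src]

    return ESI_INCLUDE.sub(resolve, page)


# Hypothetical backend that records how often it is asked for a fragment.
backend_calls = []


def slow_backend(src):
    backend_calls.append(src)
    return "<ul><li>Recent posts</li></ul>"


cache = {}
page = '<div id="content">Post body</div><esi:include src="/esi/sidebar"/>'
first = assemble(page, cache, slow_backend)
second = assemble(page, cache, slow_backend)
print(first)
print(backend_calls)
```

The second call is served entirely from the cache, which is the point of the exercise: the expensive sidebar is built once rather than on every request.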

The plugin was never really production ready, though I have used it for a year or so whenever particular client needs came up. When Varnish 3.0 was released, ESI could work with GZipped/Deflated content, which significantly increased the utility of the plugin.

If you would like to read a detailed explanation of how the plugin works and why, here’s the original presentation I gave in Florida.

You can find the plugin on WordPress’s plugin hosting at http://wordpress.org/extend/plugins/cd34-varnish-esi/.

Hey Blackberry, do you get paid for Bandwidth burned on data networks?

Thursday, January 5th, 2012

Requests from:

User-Agent: BlackBerry8530/5.0.0.886 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/105
Accept: application/vnd.rim.html,text/html,application/xhtml+xml,
application/vnd.wap.xhtml+xml,text/vnd.sun.j2me.app-descriptor,
image/vnd.rim.png,image/jpeg,application/xvnd.rim.pme.b,
application/vnd.rim.ucs,image/gif;anim=1,application/vnd.rim.jscriptc;
v=0-8-72,application/x-javascript,application/vnd.rim.css;v=2,text/css;
media=screen,application/vnd.wap.wmlc;q=0.9,application/vnd.wap.wmlscriptc;q=0.7,
text/vnd.wap.wml;q=0.7,*/*;q=0.5

We prefer a number of content-types, but, if worst comes to worst, we’ll accept everything anyhow.

440 bytes transmitted on EVERY request made from a Blackberry when you could have just done:

Accept: */*

and saved 428 bytes PER request.

This particular page had 97 assets, amounting to almost 32k in wasted bandwidth sending headers to the CDN.
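As a rough check on the arithmetic, here is a small Python sketch using only the figures quoted in this post. It assumes every one of the 97 assets triggers exactly one request carrying the full header; the constant and function names are mine.

```python
# Figures quoted in the post
FULL_ACCEPT_BYTES = 440   # size of the BlackBerry Accept header
SAVED_PER_REQUEST = 428   # saving when "Accept: */*" is sent instead
ASSET_REQUESTS = 97       # assumption: one request per asset on the page


def wasted_bytes(requests, saved_per_request=SAVED_PER_REQUEST):
    """Bytes of avoidable Accept header sent over `requests` requests."""
    return requests * saved_per_request


total = wasted_bytes(ASSET_REQUESTS)
print(f"{total} bytes, about {total / 1024:.1f} KiB")
```

Under that assumption the product is 41,516 bytes, somewhat more than the "almost 32k" quoted above. The gap may simply mean that not every asset request was counted; the original figure does not say how it was derived.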

Ext4, XFS and BtrFS benchmarks and testing

Tuesday, November 1st, 2011

Recently I talked about versioning filesystems available for OSS systems. While most of our server farms use XFS, we have been moving to Ext4 on a number of machines. This wasn’t done as a precursor to BtrFS but because of problems we’ve been having with XFS on very large filesystems. The fact that we can migrate Ext4 to BtrFS in-place is just a coincidental bonus.

While ZFS is still a consideration if we move to FreeBSD (I was not impressed enough with Debian’s K*BSD project to consider it stable enough for production), I felt that BtrFS might be worth a look. There is also CephFS, but that requires a little more infrastructure as you need to run a cluster, and it isn’t really made for single machine deployments.

We’re also going to make some assumptions and do things you might not want to do on a home system. Since the data center we’re in has a 100% power SLA, we can be sure that we won’t lose power and can be a little more aggressive. We disable atime which may negatively impact clients if you are dealing with a disk that handles your mailspool. Also, recent versions of XFS handle atime updates much differently, so, the performance boost from noatime is negligible.

Command used to test:

/usr/sbin/bonnie++ -s 8g -n 512

Ext4

mkfs -t ext4 /dev/sda5
mount -o noatime /dev/sda5 /mnt

Results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   332  99 54570  12 23512   5  1615  98 62905   6 131.8   3
Latency             24224us     471ms     370ms   13739us     110ms    5257ms
Version  1.96       ------Sequential Create------ --------Random Create--------
version             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 24764  69 267757  98  2359   6 25084  67 388005  98  1403   3
Latency              1258ms    1402us   11767ms    1187ms      66us   11682ms

1.96,1.96,version,1,1320193244,8G,,332,99,54570,12,23512,5,1615,98,62905,6,131.8,3,512,,,,,24764,69,267757,98,2359,6,25084,67,388005,98,1403,3,24224us,471ms,370ms,13739us,110ms,5257ms,1258ms,1402us,11767ms,1187ms,66us,11682ms

Ext4 with journal conversion and mount options

mkfs -t ext4 /dev/sda5
tune2fs -o journal_data_writeback /dev/sda5
mount -o rw,noatime,data=writeback,barrier=0,nobh,commit=60 /dev/sda5 /mnt

Results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   335  99 53396  11 25240   6  1619  99 62724   6 130.9   5
Latency             23955us     380ms     231ms   15962us     143ms   16261ms
Version  1.96       ------Sequential Create------ --------Random Create--------
version             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 24253  65 266963  98  2341   6 24567  65 389243  98  1392   3
Latency              1232ms    1405us   11500ms    1232ms     130us   11543ms

1.96,1.96,version,1,1320192213,8G,,335,99,53396,11,25240,6,1619,99,62724,6,130.9,5,512,,,,,24253,65,266963,98,2341,6,24567,65,389243,98,1392,3,23955us,380ms,231ms,15962us,143ms,16261ms,1232ms,1405us,11500ms,1232ms,130us,11543ms

XFS:

mkfs -t xfs -f /dev/sda5
mount -o noatime /dev/sda5 /mnt

Results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   558  98 55174   9 26660   6  1278  96 62598   6 131.4   5
Latency             14264us     227ms     253ms   77527us   85140us     773ms
Version  1.96       ------Sequential Create------ --------Random Create--------
version             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512  2468  19 386301  99  4311  25  2971  22 375624  99   546   3
Latency              1986ms     346us    1341ms    1580ms      82us    5904ms

1.96,1.96,version,1,1320194740,8G,,558,98,55174,9,26660,6,1278,96,62598,6,131.4,5,512,,,,,2468,19,386301,99,4311,25,2971,22,375624,99,546,3,14264us,227ms,253ms,77527us,85140us,773ms,1986ms,346us,1341ms,1580ms,82us,5904ms

XFS, mount options:

mkfs -t xfs -f /dev/sda5
mount -o noatime,logbsize=262144,logbufs=8 /dev/sda5 /mnt

Results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   563  98 55423   9 26710   6  1328  99 62650   6 129.5   5
Latency             14401us     345ms     298ms   20328us     119ms     357ms
Version  1.96       ------Sequential Create------ --------Random Create--------
version             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512  3454  26 385552 100  5966  35  4459  34 375917  99   571   3
Latency              1625ms     360us    1323ms    1243ms      67us    5060ms

1.96,1.96,version,1,1320196498,8G,,563,98,55423,9,26710,6,1328,99,62650,6,129.5,5,512,,,,,3454,26,385552,100,5966,35,4459,34,375917,99,571,3,14401us,345ms,298ms,20328us,119ms,357ms,1625ms,360us,1323ms,1243ms,67us,5060ms

XFS, file system creation options and mount options:

mkfs -t xfs -d agcount=32 -l size=64m -f /dev/sda5
mount -o noatime,logbsize=262144,logbufs=8 /dev/sda5 /mnt

Results:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   561  97 54674   9 26502   6  1235  95 62613   6 131.4   5
Latency             14119us     346ms     247ms   94238us   76841us     697ms
Version  1.96       ------Sequential Create------ --------Random Create--------
version             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512  9576  73 383305 100 14398  85  9156  70 373557  99  2375  14
Latency              1110ms     375us     301ms     850ms      36us    5772ms

1.96,1.96,version,1,1320198613,8G,,561,97,54674,9,26502,6,1235,95,62613,6,131.4,5,512,,,,,9576,73,383305,100,14398,85,9156,70,373557,99,2375,14,14119us,346ms,247ms,94238us,76841us,697ms,1110ms,375us,301ms,850ms,36us,5772ms

BtrFS:

mkfs -t btrfs /dev/sda5
mount -o noatime /dev/sda5 /mnt

Also, make sure CONFIG_CRYPTO_CRC32C_INTEL is built into the kernel or loaded as a module, and that you are running on an Intel CPU.

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
version          8G   254  99 54778   9 23070   8  1407  92 59932  13 131.2   5
Latency             31553us     264ms     826ms   94269us     180ms   17963ms
Version  1.96       ------Sequential Create------ --------Random Create--------
version             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                512 17034  83 256486 100 13485  97 15282  76 38942  73  1472  23
Latency               126ms    2162us   11295us   71992us   20713us   28647ms

1.96,1.96,version,1,1320204006,8G,,254,99,54778,9,23070,8,1407,92,59932,13,131.2,5,512,,,,,17034,83,256486,100,13485,97,15282,76,38942,73,1472,23,31553us,264ms,826ms,94269us,180ms,17963ms,126ms,2162us,11295us,71992us,20713us,28647ms

Analysis

Ext4 is considerably better than Ext3 was last time we ran the check. Even compared to XFS with the allocation group tweaks and mount options we use, Ext4 isn’t a bad alternative and shows some improvements over XFS. However, with BtrFS, even with the Intel CRC hardware acceleration, the Random Create Read benchmark shows a significant drop: 38,942 files/sec against 388,005 for Ext4 and roughly 374,000 to 376,000 for XFS.
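The comparison can be pulled straight out of the CSV summary lines that bonnie++ printed for each run. Below is a minimal Python sketch that does so for three of the runs above. The column positions were inferred by lining each CSV line up against its results table in this post rather than taken from bonnie++ documentation, and the field names are my own labels.

```python
# CSV summary lines copied from the Ext4, tuned XFS (agcount=32) and BtrFS runs.
RUNS = {
    "ext4": "1.96,1.96,version,1,1320193244,8G,,332,99,54570,12,23512,5,1615,98,62905,6,131.8,3,512,,,,,24764,69,267757,98,2359,6,25084,67,388005,98,1403,3,24224us,471ms,370ms,13739us,110ms,5257ms,1258ms,1402us,11767ms,1187ms,66us,11682ms",
    "xfs-tuned": "1.96,1.96,version,1,1320198613,8G,,561,97,54674,9,26502,6,1235,95,62613,6,131.4,5,512,,,,,9576,73,383305,100,14398,85,9156,70,373557,99,2375,14,14119us,346ms,247ms,94238us,76841us,697ms,1110ms,375us,301ms,850ms,36us,5772ms",
    "btrfs": "1.96,1.96,version,1,1320204006,8G,,254,99,54778,9,23070,8,1407,92,59932,13,131.2,5,512,,,,,17034,83,256486,100,13485,97,15282,76,38942,73,1472,23,31553us,264ms,826ms,94269us,180ms,17963ms,126ms,2162us,11295us,71992us,20713us,28647ms",
}

# Zero-based column positions (inferred from the tables, see above).
FIELDS = {
    "seq_output_block_kps": 9,
    "seq_input_block_kps": 15,
    "seq_create_per_sec": 24,
    "seq_delete_per_sec": 28,
    "random_create_per_sec": 30,
    "random_read_per_sec": 32,
    "random_delete_per_sec": 34,
}


def parse(line):
    """Pick the columns named in FIELDS out of one bonnie++ CSV line."""
    cols = line.split(",")
    return {name: float(cols[idx]) for name, idx in FIELDS.items()}


results = {fs: parse(line) for fs, line in RUNS.items()}

# One row per metric, one column per filesystem.
for name in FIELDS:
    row = "  ".join(f"{fs}={results[fs][name]:>9.0f}" for fs in results)
    print(f"{name:<24}{row}")

ratio = results["btrfs"]["random_read_per_sec"] / results["ext4"]["random_read_per_sec"]
print(f"BtrFS random read rate is {ratio:.0%} of Ext4's")
```

Laid out this way, the BtrFS random read figure stands out as about a tenth of the Ext4 figure, while the block throughput numbers are all within a few percent of each other.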

Based on the typical workload our machines see, I believe our recent conversion to Ext4 isn’t negatively impacting things.

I’ll continue to work with BtrFS and see if I can figure out why that one particular benchmark performs so poorly, but some of the other features BtrFS offers as a versioning filesystem will be quite useful.

Machine specs:

* Linux version 3.1.0 #3 SMP Tue Nov 1 16:23:42 EDT 2011 i686 GNU/Linux
* P4/3.0GHz, 2GB RAM
* Western Digital SATA2 320GB 7200 RPM Drive

Versioning Filesystem choices using OSS

Tuesday, November 1st, 2011

One of the clusters we have uses DRBD between two machines with GFS2 mounted on DRBD in dual primary. I’d played around with Gluster and Lustre, OCFS2, AFS and many others and I’ve used NetApps in the past, but, I’ve never been extremely happy with any of the distributed and clustered filesystems.

My recent thinking on using SetUID or SetGID mode to deal with particular problems led me to look at a versioning filesystem. Currently that leaves ZFS and BtrFS.

I’ve used ZFS in the past on Solaris and it is supported natively within FreeBSD. Since we use Debian, there is Debian’s K*BSD project, which puts the Debian userland on the BSD kernel, making most of our in-house management processes easy to convert. Using ZFS under Linux requires FUSE, which could introduce performance issues.

The other option we have is BtrFS. BtrFS is less mature, but, also has the ability to handle in-place migrations from ext3/ext4. While this doesn’t really help much since we primarily run XFS, future machines could use ext4 until BtrFS is deemed stable enough at which point they could be live converted.

In testing, XFS and Ext4 have similar performance when well tuned, which means we shouldn’t see any significant difference with either. Granted, this disagrees with some current benchmarks, but those benchmarks didn’t appear to set the filesystem up correctly and didn’t modify the mount parameters to allow for more buffers to be used. When dealing with small files, XFS needs a little more RAM and the journal log buffers need to be increased, keeping more of the log in RAM before being replayed and committed. Large file performance is usually deemed superior with XFS, but by properly tuning Ext3 (and by inference, Ext4), we can change the performance characteristics of Ext3/4 and get about 95% of XFS’s large file performance.

Currently we keep two generations of weekly machine backups. While this wouldn’t change, we could also do checkpointing and more frequent snapshots so that a file uploaded and then modified or deleted would have a much better chance of being restorable. One of the benefits of versioning filesystems is the ability to do hourly or daily snapshots, which should allow us to reduce the data loss if a site is exploited or catastrophically damaged through a mistake.

So, we’ve got three potential solutions in order of confidence that the solution will work:

* FreeBSD ZFS
* Debian/K*BSD ZFS
* Debian BtrFS

This weekend I’ll start putting the two Debian solutions through their paces to see if I feel comfortable with either. I’ve got a chassis swap to do this week and we’ll probably switch that machine from XFS to Ext4 in preparation as well. Most of the new machines we’ve been putting online now use Ext4 due to some of the issues I’ve had with XFS.

Ideally, I would like to start using BtrFS on every machine, but, if I need to move things over to FreeBSD, I would have to make some very tough decisions and migrations.

Never a dull moment.

IPTables Performance

Wednesday, October 26th, 2011

I did a talk a while back at Hack and Tell regarding a DDOS attack that we had and last night I was reminded about a section of it while diagnosing a client machine with some performance problems.

IPTables rule evaluations are sequential. The longer your ruleset, the more time it takes to process each packet. There are shortcuts and hash lookup methods like IPSet and nf-hipac which help with the large rulesets you might need when dealing with a DDOS, but this client’s machine was dealing with legitimate traffic, and SI% (CPU time spent servicing software interrupts) was higher than I had expected.

Creating shortcuts in the ruleset to decide whether to process a packet means that the very first rule should be your ACCEPT Related,Established. Since a packet in one of those connection states isn’t New and is part of an existing stream, there isn’t a reason to continue with rule checks. So, we short-circuit the condition and automatically accept the packet. This resulted in a 120ms drop in Time to First Byte. Yikes. You might contend that blocking an IP then won’t affect its current stream, and you’d be correct. Only when that IP sends a New connection would it be firewalled.
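The effect of rule order is easy to see in a toy model. The Python sketch below is not how netfilter is implemented; it only mimics first-match-wins evaluation and counts how many rules a packet is compared against. The 200-entry blocklist, the addresses (taken from the documentation ranges) and all function names are hypothetical.

```python
def evaluate(rules, packet):
    """Walk the chain in order; return (verdict, number of rules checked)."""
    for checked, (matches, verdict) in enumerate(rules, start=1):
        if matches(packet):
            return verdict, checked
    return "POLICY", len(rules)


def drop_source(ip):
    """A rule that drops every packet from one source address."""
    return (lambda pkt: pkt["src"] == ip, "DROP")


established = (lambda pkt: pkt["state"] in ("RELATED", "ESTABLISHED"), "ACCEPT")
new_http = (lambda pkt: pkt["state"] == "NEW" and pkt["dport"] == 80, "ACCEPT")

# 200 hypothetical blocked addresses.
blocklist = [drop_source(f"203.0.113.{n}") for n in range(1, 201)]

slow_chain = blocklist + [established, new_http]   # state check buried at the end
fast_chain = [established] + blocklist + [new_http]  # state check first

packet = {"src": "198.51.100.7", "state": "ESTABLISHED", "dport": 80}
slow = evaluate(slow_chain, packet)
fast = evaluate(fast_chain, packet)
print("state rule last: ", slow)
print("state rule first:", fast)
```

In this model a packet on an established connection is compared against 201 rules when the state rule comes last and against a single rule when it comes first. Since most packets on a busy web server belong to existing connections, that difference is paid on nearly every packet.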

The next set of rules are your Drops for your highest volume service, in this case port 80, followed by the ACCEPT New for that port. Obviously, port 443/https may be a good candidate for the first or second ruleset depending on your traffic patterns.

The other services on the machine (ssh, ftp, smtp, pop, imap, etc.) can be added after that as needed. Your goal is to make sure that http/https is served quickly.

Another thing to consider is using RECENT as minor protection:

/sbin/iptables -A INPUT -p tcp --dport ssh -i eth0 -m state --state NEW -m recent --set
/sbin/iptables -A INPUT -p tcp --dport ssh -i eth0 -m state --state NEW -m recent --update --seconds 60 --hitcount 6 -j DROP

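The above ruleset lets an IP open 5 new SSH connections within a minute. Because the first rule records each new connection before the second rule counts them, the sixth new connection inside the 60-second window is the first one dropped. Once an IP hits that rate, --update refreshes its entry on every further attempt, so continued hammering keeps it locked out; to connect to that port again, the IP has to stay quiet until its earlier attempts have aged out of the 60-second window. You can protect any port like this, but you wouldn’t want to rely on this for http/https except in extreme cases.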
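How the two rules interact can be sketched with a simplified Python model. This is an approximation of the recent match for illustration only, not the kernel's implementation: it ignores the per-address cap on stored timestamps, and the `RecentLimiter` class and the test address are hypothetical.

```python
from collections import defaultdict


class RecentLimiter:
    """Toy model of a --set rule followed by --update --seconds/--hitcount."""

    def __init__(self, seconds=60, hitcount=6):
        self.seconds = seconds
        self.hitcount = hitcount
        self.seen = defaultdict(list)  # source IP -> timestamps of attempts

    def new_connection(self, ip, now):
        """Return 'DROP' or 'CONTINUE' for a NEW connection at time `now`."""
        stamps = self.seen[ip]
        stamps.append(now)  # first rule: --set records this attempt
        # Only attempts inside the window count towards the hitcount.
        stamps[:] = [t for t in stamps if now - t <= self.seconds]
        if len(stamps) >= self.hitcount:
            stamps.append(now)  # second rule matched: --update refreshes the entry
            return "DROP"
        return "CONTINUE"  # falls through to whatever rules come next


limiter = RecentLimiter()
# Ten attempts one second apart, then one more after 61 quiet seconds.
burst = [limiter.new_connection("192.0.2.10", now=t) for t in range(10)]
later = limiter.new_connection("192.0.2.10", now=9 + 61)
print(burst)
print(later)
```

In this model the sixth attempt inside the window is the first to be dropped, because it has already been recorded by the time the count is checked, and the address is let through again only after its earlier attempts have fallen out of the 60-second window.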