Posts Tagged ‘Varnish’

First beta of my new webapp, SnapReplay.com

Saturday, January 28th, 2012

After a late night working until 2:45am on the finishing touches, my phone alerted me to the fact it was 5:35am and time for the first Live Photostream to take place.

The stream? Roger Waters – The Wall, from the Burswood Dome in Perth, Australia. Special thanks to Paul Andrews for taking the pictures and doing a lot of testing beforehand.

From those who watched the stream in realtime – about 23 people, based on a counter that I didn’t reset after my testing – I received a bit of good feedback and collected a lot of data to analyze.

Rule #1, do NOT keep modifying things during a live event. I saw something minor to tweak, made a change, and broke some functionality. While it didn’t really affect things, it did bug me to see the problem during the first test. Live with things until after the event – unless it is truly a showstopper. In this case, it was just an HTML tweak which caused some JavaScript to wrap and broke the jQuery click handler.

Rule #2, you can never collect enough data. While watching the stream, I realized I had turned off almost all of the debugging hints in node.js during development because they were really noisy. Since most of the static assets are served by Varnish, those requests never hit the backend, so I didn’t have a good indicator of real traffic. Running varnishncsa in one window while watching node.js with a bit of debugging turned on allowed me to see things live, but without logging pageviews, socket connects/disconnects and other data, there is no way to review things after the fact. I did think about putting some hooks into some of the express events (express being the framework I’m using).

Rule #3, always test your production environment well before the event/launch. I had a very compressed development timetable, knowing on Jan 13 that we wanted to do the first event on Jan 28, so some infrastructure decisions were not tested thoroughly beforehand, which resulted in having to run with a less than optimal setup. While Varnish and socket.io do work well together, some browser combinations had issues during brief usability tests. Fifteen days to write a scalable architecture and an Android app is difficult. Since I had no experience with node.js or socket.io prior to Nov 11, and haven’t touched Java since 2002 or so, I spent a bit of time dealing with issues that came from lack of exposure to both.

As it isn’t recommended for node.js to handle static content, I used Varnish in a ‘cdn’ setup to offload static assets and media content. This worked very well until I modified some JavaScript: the rules in my VCL strip querystring arguments, which made it impossible to just add ?v=2 to my JavaScript include. Bans for the CDN were only allowed from another host (remember that ‘test your complete production environment’ before launch?), so a little manual telnetting from another machine allowed me to purge the JavaScript.
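To make the querystring problem concrete, here is a minimal Python sketch of what a strip-the-querystring rule does to the URL the cache sees. The regular expressions are borrowed from the WordPress VCL posted elsewhere under this tag and are only an assumption about the SnapReplay rules; `cache_url` is a hypothetical helper name.

```python
import re

# Assumption: static assets are recognised by extension, as in the
# WordPress VCL in this archive. The real SnapReplay rules may differ.
STATIC_EXT = re.compile(
    r"\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$"
)


def cache_url(url: str) -> str:
    """Return the URL as the cache sees it once querystrings are stripped."""
    if STATIC_EXT.search(url):
        # Same effect as: set req.url = regsub(req.url, "\?.*$", "");
        return re.sub(r"\?.*$", "", url)
    return url


# Both "versions" collapse to the same cached object, so ?v=2 changes nothing.
same = cache_url("/js/app.js?v=1") == cache_url("/js/app.js?v=2")

# Putting the version in the path (a hypothetical naming scheme) gives a new URL.
renamed = cache_url("/js/app.v2.js")
```

With a rule like this in place, the choices are to purge the object, as I did, or to change the filename itself.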

All in all, a great first test, several positive comments, and a nice, long list of requests/enhancements. I can see that this might be a fairly popular project.

If you would like to help beta test and have an Android phone, download the app, take a few snapshots or enter some text, and watch them show up on the Beta Test page.

If you have an event or are attending a concert where you would like to use SnapReplay, I can create a custom app specifically for your event. Let me know in the comments.

Finally, a formal release for my WordPress + Varnish + ESI plugin

Tuesday, January 10th, 2012

A while back I wrote a plugin to take care of a particular client traffic problem. As the traffic came in very quickly and unexpectedly, I had only minutes to come up with a solution. As I knew Varnish pretty well, my initial reaction was to put the site behind Varnish. But, there’s a problem with Varnish and WordPress.

WordPress is a cookie monster. It uses and depends on cookies for almost everything – and by default Varnish doesn’t cache requests that carry cookies. The VCL was modified and tweaked, but the site was still having problems.

So, a plugin was born. Since I was familiar with ESI, I opted to write a quick plugin to cache the sidebar, while the content would be handled by Varnish. On each request, Varnish would assemble the Edge Side Includes and serve the page – saving the server from a meltdown.

The plugin was never really production ready, though I have used it for a year or so when particular client needs came up. When Varnish released 3.0, ESI could work with gzipped/deflated content, which significantly increased the utility of the plugin.

If you would like to read a detailed explanation of how the plugin works and why, here’s the original presentation I gave in Florida.

You can find the plugin on WordPress’s plugin hosting at http://wordpress.org/extend/plugins/cd34-varnish-esi/.

W3 Total Cache and Varnish

Thursday, July 21st, 2011

Last week I got called into a firestorm to fix a set of machines that were having problems. As Varnish was in the mix, the first thing I noticed was that the hit rate was extremely low, because Varnish’s VCL wasn’t really configured well for WordPress. Since WordPress uses a lot of cookies and Varnish passes anything with a cookie to the backend, we have to know which cookies we can ignore so that we can get the cache hit rate up.

Obviously, static assets like JavaScript, CSS and images generally don’t need cookies, so those make a good first target. Since some ad networks set their own cookies on the domain, we need to know which ones to ignore. However, to make a site resilient, we have to get a little more aggressive and tell Varnish to cache things against its better judgement. When we do this, we don’t want surfers to see stale content, so we need to purge cached objects from Varnish when they change to keep the site interactive.

Caching is easy, purging is hard

This particular installation used W3 Total Cache, a plugin that does page caching, JavaScript/CSS minification and combining, and handles a number of other features. I was unable to find any suggested VCL, and several posts on the forums show a disinterest in supporting Varnish.

In most cases, once we determine what we’re caching, we need to figure out what to purge. When a surfer posts a comment, we need to clear the cached representation of that post, the RSS feed and the front page of the site. This allows any post counters to be updated and keeps the RSS feed accurate.
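As a rough illustration of that purge list, the following Python sketch returns the URIs to clear when a comment lands on a post. The function name and the `/feed/` path are assumptions for the example (the feed location depends on the permalink setup); it is not what W3TC does internally.

```python
def purge_targets(post_path: str, feed_path: str = "/feed/") -> list:
    """URIs to purge after a comment is posted on `post_path`.

    Hypothetical helper: covers the post itself, the site feed and the
    front page, in that order, without duplicates.
    """
    targets = []
    for uri in (post_path, feed_path, "/"):
        if not uri.startswith("/"):
            uri = "/" + uri  # purge URIs are always absolute paths
        if uri not in targets:
            targets.append(uri)
    return targets
```

Each URI in the list is then sent to every Varnish server as a PURGE request.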

W3TC includes the ability to purge, but it only works in a single-server setting. If you put a domain name in the config box, it should work fine. If you put in a series of IP addresses, either your VCL needs to override the hostname or you need to apply the following patch. There are likely to be bugs, so try this at your own risk.
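The idea behind the patch is to open the TCP connection to each Varnish server's IP while still sending the site's hostname in the Host header, so the purge matches the cached object. Here is a minimal Python sketch of that idea; `build_purge_request` and `send_purge` are hypothetical names, and the request format is an assumption rather than a copy of what W3TC sends.

```python
import socket
from urllib.parse import urlsplit


def build_purge_request(url: str) -> bytes:
    """Build a raw PURGE request whose Host header comes from `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"PURGE {path} HTTP/1.0\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return request.encode("ascii")


def send_purge(server_ip: str, url: str, port: int = 80,
               timeout: float = 10.0) -> str:
    """Connect to one cache server by IP and return its status line."""
    with socket.create_connection((server_ip, port), timeout=timeout) as sock:
        sock.sendall(build_purge_request(url))
        status_line = sock.makefile("rb").readline()
    return status_line.decode("latin-1").strip()
```

Calling `send_purge("10.0.1.100", "http://www.example.com/2011/07/some-post/")` once per server in the list would purge the same object everywhere; both the address and the URL are placeholders.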

If you aren’t using the Javascript/CSS Minification and combining or some of the CDN features that W3TC provides, then I would suggest WordPress-Varnish which is maintained by some people very close to the Varnish team.

I’ve kept each original line of code from W3TC, commented out, above any change for reference.

--- w3-total-cache/inc/define.php	2011-06-21 23:22:54.000000000 -0400
+++ w3-total-cache-varnish/inc/define.php	2011-07-21 16:10:39.270111723 -0400
@@ -1406,11 +1406,15 @@
  * @param boolean $check_status
  * @return string
  */
-function w3_http_request($method, $url, $data = '', $auth = '', $check_status = true) {
+#cd34, 20110721, added $server IP for PURGE support
+# function w3_http_request($method, $url, $data = '', $auth = '', $check_status = true) {
+function w3_http_request($method, $url, $data = '', $auth = '', $check_status = true, $server = '') {
     $status = 0;
     $method = strtoupper($method);

-    if (function_exists('curl_init')) {
+#cd34, 20110721, don't use CURL for purge
+#    if (function_exists('curl_init')) {
+    if ( (function_exists('curl_init')) && ($method != 'PURGE') ) {
         $ch = curl_init();

         curl_setopt($ch, CURLOPT_URL, $url);
@@ -1474,7 +1478,13 @@
             $errno = null;
             $errstr = null;

-            $fp = @fsockopen($host, $port, $errno, $errstr, 10);
+#cd34, 20110721, if method=PURGE, connect to $server, not $host
+#            $fp = @fsockopen($host, $port, $errno, $errstr, 10);
+            if ( ($method == 'PURGE') && ($server != '') ) {
+                $fp = @fsockopen($server, $port, $errno, $errstr, 10);
+            } else {
+                $fp = @fsockopen($host, $port, $errno, $errstr, 10);
+            }

             if (!$fp) {
                 return false;
@@ -1543,8 +1553,9 @@
  * @param bool $check_status
  * @return string
  */
-function w3_http_purge($url, $auth = '', $check_status = true) {
-    return w3_http_request('PURGE', $url, null, $auth, $check_status);
+#cd34, 20110721, added server IP
+function w3_http_purge($url, $auth = '', $check_status = true, $server = '') {
+    return w3_http_request('PURGE', $url, null, $auth, $check_status, $server);
 }

 /**
diff -Naur w3-total-cache/lib/W3/PgCache.php w3-total-cache-varnish/lib/W3/PgCache.php
--- w3-total-cache/lib/W3/PgCache.php	2011-06-21 23:22:54.000000000 -0400
+++ w3-total-cache-varnish/lib/W3/PgCache.php	2011-07-21 16:04:07.247499682 -0400
@@ -693,7 +693,9 @@
                     $varnish =& W3_Varnish::instance();

                     foreach ($uris as $uri) {
-                        $varnish->purge($uri);
+#cd34, 20110721 Added $domain_url to build purge hostname
+#                        $varnish->purge($uri);
+                        $varnish->purge($domain_url, $uri);
                     }
                 }
             }
diff -Naur w3-total-cache/lib/W3/Varnish.php w3-total-cache-varnish/lib/W3/Varnish.php
--- w3-total-cache/lib/W3/Varnish.php	2011-06-21 23:22:54.000000000 -0400
+++ w3-total-cache-varnish/lib/W3/Varnish.php	2011-07-21 16:04:52.836919164 -0400
@@ -70,7 +70,7 @@
      * @param string $uri
      * @return boolean
      */
-    function purge($uri) {
+    function purge($domain, $uri) {
         @set_time_limit($this->_timeout);

         if (strpos($uri, '/') !== 0) {
@@ -78,9 +78,11 @@
         }

         foreach ((array) $this->_servers as $server) {
-            $url = sprintf('http://%s%s', $server, $uri);
+#cd34, 20110721, Replaced $server with $domain
+#            $url = sprintf('http://%s%s', $server, $uri);
+            $url = sprintf('%s%s', $domain, $uri);

-            $response = w3_http_purge($url, '', true);
+            $response = w3_http_purge($url, '', true, $server);

             if ($this->_debug) {
                 $this->_log($url, ($response !== false ? 'OK' : 'Bad response code.'));
diff -Naur w3-total-cache/w3-total-cache.php w3-total-cache-varnish/w3-total-cache.php
--- w3-total-cache/w3-total-cache.php	2011-06-21 23:22:54.000000000 -0400
+++ w3-total-cache-varnish/w3-total-cache.php	2011-07-21 15:56:53.275922099 -0400
@@ -2,7 +2,7 @@
 /*
 Plugin Name: W3 Total Cache
 Description: The highest rated and most complete WordPress performance plugin. Dramatically improve the speed and user experience of your site. Add browser, page, object and database caching as well as minify and content delivery network (CDN) to WordPress.
-Version: 0.9.2.3
+Version: 0.9.2.3.v
 Plugin URI: http://www.w3-edge.com/wordpress-plugins/w3-total-cache/
 Author: Frederick Townes
 Author URI: http://www.linkedin.com/in/w3edge
@@ -47,4 +47,4 @@
     require_once W3TC_LIB_W3_DIR . '/Plugin/TotalCache.php';
     $w3_plugin_totalcache = & W3_Plugin_TotalCache::instance();
     $w3_plugin_totalcache->run();
-}
\ No newline at end of file
+}

Gracefully Degrading Site with Varnish and High Load

Saturday, July 16th, 2011

If you run Varnish, you might want to gracefully degrade your site when traffic comes unexpectedly. There are other solutions listed on the net which maintain a Three State Throttle, but it seemed like this could be done easily within Varnish without needing too many external dependencies.

The first challenge was to figure out how we wanted to handle state. Our backend director is set up with a ‘level1’ backend which doesn’t do any health checks. We need at least one node that never fails the health check, since the ‘level2’ and ‘level3’ backends will go offline to signal to Varnish that we need to take action. This scenario assumes that failures cascade, i.e. level2 fails and then, if load continues to increase, level3 fails, but there is nothing preventing you from having separate failure modes and different VCL for those conditions.

You could have VCL that replaced the front page of your site with ‘top news’ during an event which links to your secondary page. You can rewrite your VCL to handle almost any condition and you don’t need to worry about doing a VCL load to update the configuration.

While maintaining three configurations is easier, there are a few extra points of failure added in that system. If the load on the machine gets too high and the cron job or daemon that is supposed to update the VCL doesn’t run quickly enough or has issues with network congestion talking with Varnish, your site could run in a degraded mode much longer than needed. With this solution, in the event that there is too much network congestion or too much load for the backend to respond, Varnish automatically considers that a level3 failure and enacts those rules – without the backend needing to acknowledge the problem.

The basics

First, we set up the script that Varnish will probe. The script doesn’t need to be PHP; it only needs to respond with a 404 error to signal to Varnish that the probe request has failed.

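<?php
// The probe level is passed as the bare query string, e.g. /load.php?2
$level = (int) $_SERVER['QUERY_STRING'];
// The first field of /proc/loadavg is the 1-minute load average;
// the cast reads that leading number and ignores the rest of the line.
$load = (float) file_get_contents('/proc/loadavg');
// If no header is sent, PHP answers 200 and the probe succeeds.
if ( ($level == 2) and ($load > 10) ) {
  header("HTTP/1.0 404 Get the bilge pumps working!");
}
if ( ($level == 3) and ($load > 20) ) {
  header("HTTP/1.0 404 All hands abandon ship");
}
?>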

Second, we need to have our backend pool configured to call our probe script:

backend level1 {
  .host = "66.55.44.216";
  .port = "80";
}
backend level2 {
  .host = "66.55.44.216";
  .port = "80";
  .probe = {
    .url = "/load.php?2";
    .timeout = 0.3 s;
    .window = 3;
    .threshold = 3;
    .initial = 3;
  }
}
backend level3 {
  .host = "66.55.44.216";
  .port = "80";
  .probe = {
    .url = "/load.php?3";
    .timeout = 0.3 s;
    .window = 3;
    .threshold = 3;
    .initial = 3;
  }
}

director crisis random {
  {
# base that should always respond so we don't get an Error 503
    .backend = level1;
    .weight = 1;
  }
  {
    .backend = level2;
    .weight = 1;
  }
  {
    .backend = level3;
    .weight = 1;
  }
}

Since both of our probes go to the same backend, it doesn’t matter which director we use or what weight we assign. We just need one backend configured that won’t fail the probe, along with our level2 and level3 probes. In this example, when the load on the server is greater than 10, it triggers a level2 failure. If the load is greater than 20, it triggers a level3 failure.
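To see how quickly a backend flips state with `.window = 3` and `.threshold = 3`, here is a small Python model of the probe bookkeeping. It is a simplified sketch of the window/threshold idea (healthy when at least `threshold` of the last `window` probes succeeded), not Varnish's actual implementation, and `ProbeWindow` is a hypothetical name.

```python
from collections import deque


class ProbeWindow:
    """Simplified model of a health probe's sliding window."""

    def __init__(self, window: int = 3, threshold: int = 3, initial: int = 3):
        self.threshold = threshold
        # `initial` probes are assumed good at startup.
        self.results = deque([True] * min(initial, window), maxlen=window)

    @property
    def healthy(self) -> bool:
        # Healthy when enough of the most recent probes succeeded.
        return sum(self.results) >= self.threshold

    def record(self, ok: bool) -> bool:
        """Store one probe result and return the new health state."""
        self.results.append(ok)
        return self.healthy


level2 = ProbeWindow()
after_failure = level2.record(False)                         # load went over 10
recovery = [level2.record(True) for _ in range(3)]           # load drops again
```

In this model a single failed probe marks the level as failed immediately, and it takes three consecutive good probes before the level counts as healthy again, which keeps the site from flapping between modes.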

In this case, when the backend probe request fails, we just rewrite the URL. Any VCL can be added here, though you will end up with some duplication between levels. Since the VCL is compiled into the Varnish server, it should have negligible performance impact.

sub vcl_recv {
  set req.backend = level2;
  if (!req.backend.healthy) {
    unset req.http.cookie;
    set req.url = "/level2.php";
  }
  set req.backend = level3;
  if (!req.backend.healthy) {
    unset req.http.cookie;
    set req.url = "/level3.php";
  }
  set req.backend = crisis;
}

In this case, when we have a level2 failure, we change any URL requested to serve the file /level2.php. In vcl_fetch, we make a few changes to the object TTL so that we prevent the backend from getting hit too hard. We also change the Server header so that we can look at the headers to see what level our server is currently running at. In Firefox, there is an extension called Header Spy which will allow you to keep track of a header. Oftentimes I’ll track X-Cache, which I set to HIT or MISS to make sure Varnish is caching, but you could also track Server and be aware of whether things are running properly.

sub vcl_fetch {
  set beresp.ttl = 0s;

  set req.backend = level2;
  if (!req.backend.healthy) {
    set beresp.ttl = 5m;
    unset beresp.http.set-cookie;
    set beresp.http.Server = "(Level 2 - Warning)";
  }
  set req.backend = level3;
  if (!req.backend.healthy) {
    set beresp.ttl = 30m;
    unset beresp.http.set-cookie;
    set beresp.http.Server = "(Level 3 - Critical)";
  }
}

At this point, we’ve got a system that degrades gracefully even if the backend cannot respond or update Varnish’s VCL, and it self-heals based on the load checks. Ideally you’ll also want to put Grace timers in place and possibly run Saint mode to handle significant failures, but this should help your system protect itself from meltdown.
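The combined effect of the vcl_recv and vcl_fetch rules can be summarised in a few lines of Python. This is only a model of the decision flow for reading along, with hypothetical names (`Decision`, `decide`); the values are the ones used in the VCL above.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    url: str                      # URL actually requested from the backend
    ttl_seconds: int              # how long Varnish keeps the response
    server_header: Optional[str]  # None means the header is left alone


def decide(url: str, level2_healthy: bool, level3_healthy: bool) -> Decision:
    """Mirror the VCL: level3 is checked last, so it overrides level2."""
    decision = Decision(url, 0, None)
    if not level2_healthy:
        decision = Decision("/level2.php", 5 * 60, "(Level 2 - Warning)")
    if not level3_healthy:
        decision = Decision("/level3.php", 30 * 60, "(Level 3 - Critical)")
    return decision
```

Because the level3 check runs after the level2 check, a cascading failure where both probes are down always ends up on the level3 page with the 30 minute TTL.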

Complete VCL

backend level1 {
  .host = "66.55.44.216";
  .port = "80";
}
backend level2 {
  .host = "66.55.44.216";
  .port = "80";
  .probe = {
    .url = "/load.php?2";
    .timeout = 0.3 s;
    .window = 3;
    .threshold = 3;
    .initial = 3;
  }
}
backend level3 {
  .host = "66.55.44.216";
  .port = "80";
  .probe = {
    .url = "/load.php?3";
    .timeout = 0.3 s;
    .window = 3;
    .threshold = 3;
    .initial = 3;
  }
}

director crisis random {
  {
# base that should always respond so we don't get an Error 503
    .backend = level1;
    .weight = 1;
  }
  {
    .backend = level2;
    .weight = 1;
  }
  {
    .backend = level3;
    .weight = 1;
  }
}

sub vcl_recv {
  set req.backend = level2;
  if (!req.backend.healthy) {
    unset req.http.cookie;
    set req.url = "/level2.php";
  }
  set req.backend = level3;
  if (!req.backend.healthy) {
    unset req.http.cookie;
    set req.url = "/level3.php";
  }
  set req.backend = crisis;
}

sub vcl_fetch {
  set beresp.ttl = 0s;

  set req.backend = level2;
  if (!req.backend.healthy) {
    set beresp.ttl = 5m;
    unset beresp.http.set-cookie;
    set beresp.http.Server = "(Level 2 - Warning)";
  }
  set req.backend = level3;
  if (!req.backend.healthy) {
    set beresp.ttl = 30m;
    unset beresp.http.set-cookie;
    set beresp.http.Server = "(Level 3 - Critical)";
  }

  if (req.url ~ "\.(gif|jpe?g|png|swf|css|js|flv|mp3|mp4|pdf|ico)(\?.*|)$") {
    set beresp.ttl = 365d;
  }
}

Updated WordPress VCL – still not complete, but, closer

Saturday, July 16th, 2011

I worked with a new client this week and needed to get the VCL working for their installation. They were running W3TC, but this VCL should work for people running WP-Varnish or any plugin that allows purging. This VCL is for Varnish 2.x.

There are still some tweaks to make, but this appears to be working quite well.

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

acl purge {
    "10.0.1.100";
    "10.0.1.101";
    "10.0.1.102";
    "10.0.1.103";
    "10.0.1.104";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    return(lookup);
  }

  if (req.http.Accept-Encoding) {
#revisit this list
    if (req.url ~ "\.(gif|jpg|jpeg|swf|flv|mp3|mp4|pdf|ico|png|gz|tgz|bz2)(\?.*|)$") {
      remove req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      remove req.http.Accept-Encoding;
    }
  }
  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    unset req.http.cookie;
    set req.url = regsub(req.url, "\?.*$", "");
  }
  if (req.http.cookie) {
    if (req.http.cookie ~ "(wordpress_|wp-settings-)") {
      return(pass);
    } else {
      unset req.http.cookie;
    }
  }
}

sub vcl_fetch {
# this conditional can probably be left out for most installations
# as it can negatively impact sites without purge support. High
# traffic sites might leave it, but, it will remove the WordPress
# 'bar' at the top and you won't have the post 'edit' functions onscreen.
  if ( (!(req.url ~ "(wp-(login|admin)|login)")) || (req.request == "GET") ) {
    unset beresp.http.set-cookie;
# If you're not running purge support with a plugin, remove
# this line.
    set beresp.ttl = 5m;
  }
  if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
    set beresp.ttl = 365d;
  }
}

sub vcl_deliver {
# multi-server webfarm? set a variable here so you can check
# the headers to see which frontend served the request
#   set resp.http.X-Server = "server-01";
   if (obj.hits > 0) {
     set resp.http.X-Cache = "HIT";
   } else {
     set resp.http.X-Cache = "MISS";
   }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "OK";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not cached";
  }
}
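
The cookie handling in vcl_recv is the part that decides the hit rate, so here is a small Python model of it for experimenting with URLs and cookies. It is a sketch that ignores PURGE and Accept-Encoding handling; `recv_action` is a hypothetical name, and "lookup" here stands for falling through to Varnish's default logic, which for a plain GET without cookies is a cache lookup.

```python
import re
from typing import Optional, Tuple

# Same patterns as the VCL above.
STATIC_EXT = re.compile(
    r"\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$"
)
WP_SESSION = re.compile(r"(wordpress_|wp-settings-)")


def recv_action(url: str, cookie: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (action, cookie header forwarded to the backend)."""
    if STATIC_EXT.search(url):
        cookie = None             # static assets never need cookies
    if cookie:
        if WP_SESSION.search(cookie):
            return ("pass", cookie)   # logged-in user: bypass the cache
        cookie = None             # analytics/ad cookies: ignore them
    return ("lookup", cookie)
```

Only requests carrying a WordPress session cookie for a non-static URL reach the backend uncached; everything else is stripped of cookies and becomes cacheable.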