craig campbell

May 27, 2009 at 12:02 am

I have been so incredibly busy lately that I have not had anytime to write here. Anyway I just wanted to take some time to talk about something cool I have been doing for a project I have been working on outside of work.

By now everyone knows what Memcached is so I won't bore you with an introduction. Basically it is an insanely simple caching system designed to eliminate load on the database.

While in an ideal world you would just throw cache on everything and watch your application scale to infinity and beyond, things aren't always that simple. The problem I'm going to show you today has to do with pagination and cached database queries. I will try to make this pretty straightforward so anyone can follow.

There is a user named Pablo Picasso who is using your site. He makes Cap-Sacs for a living and has received 15 messages from people on the site who are interested in purchasing one from him. This means that page 1 displays messages 6 to 15 (the 10 most recent messages) and page 2 displays 1 to 5.

Let's now imagine you cache these pages of results. A new user comes to your site and wants to buy a pink Cap-Sac. They send Picasso a message so your application says, "Oooh Picasso just got a new message. That's pretty rad. Since the query to get his messages is Cached let's just drop the cache for page 1." Now what happens you ask?

The new message shows up on the top of page 1 as expected so what's the problem?? Picasso now has 16 messages. Page 1 will display messages 7 to 16 which is fine, but since page 2 was cached from before it is going to show messages 1 to 5 still. This means that as long as the cache lives poor message number 6 (the last message on page 1 before the new message was recieved) gets dropped from the result set completely. Not to mention that page 1 thinks Picasso has 16 messages while page 2 still thinks he has 15.

This can also allow duplicates of the same item to show up if for example the user had 22 messages and page 1 and 2 were cached but page 3 wasn't. In this case a new message would cause page 3 and page 1 to be correct but since page 2 is still hitting cache, the last item on page 2 would be the same as the first item on page 3.

So what can we do about this? I see a few options

  1. Don't use memcached for paginated result sets
  2. Live with missing and/or duplicate items from time to time in your pagination
  3. Select everything at once and handle the pagination at the PHP level instead of in the database
  4. Use Pseudo cache namespaces to invalidate all cache keys for paginated queries

And here is my analysis of the options:

Option 1
This seems to be the approach that is often taken where I work, and of course it will work, but what happens if the data rarely changes? In that case it seems rather silly to continue to hit the database over and over for the same data.
Option 2
Surprisingly I don't think option 2 is a terrible option because it will definitely scale pretty well. Facebook search results have duplicate items all over most likely for this very reason, but since, in my example, I am thinking about displaying accurate data to the user I would stay away from this. For search results I think this is a much more viable option.
Option 3
This option is another interesting one. It would arguably scale better than any of the other options I have proposed although the problem here is what if Lady Gaga is using your site and she has 15,000 messages from fans. We don't want to have to select 15,000 items from the database if this were to miss cache.
Option 4
Well considering the title of this post I think you all knew this is the road I was going to go down. You get the performance benefits of caching, and you get 100 percent data accuracy.

The approach here is really quite simple and similar to the example found on the memcached FAQ page.

What I have chosen to do is use the DAO name as my main namespace identifier along with the unique userId. Please understand this is a very crude implementation just to illustrate my idea. It is more or less procedural code so I would not recommend doing it like this. I have been using a Cache_Namespace class to handle common logic such as generating the key and grabbing/setting the generation version/incrementing the version. Anyway here goes:

1 <?php 2 3 /** 4 * this would get called to retrieve a page of a user's messages 5 */ 6 public function getMessages($userId, $pageNumber) 7 { 8 $memcache = new Memcache(); 9 10 // this is the cache key to use to find the version to use 11 $versionKey = 'Message_Dao_' . $userId . '_version'; 12 13 // if there is no cache hit for the version set it to a random value 14 // and set it to never expire 15 if (($version = $memcache->get($versionKey)) === false) { 16 17 // we should use a random number here because otherwise there 18 // is a chance the version key will expire and resetting it to 1 19 // means incrementing it might not invalidate all cache keys 20 // if the query has been cached at version 2 for example 21 $version = rand(1, 9999); 22 $memcache->put($versionKey, $version, 0); 23 } 24 25 // this will be the key we use to actually retrieve the data 26 $queryCacheKey = __METHOD__ . '_' . $userId . '_' . $pageNumber . 27 '_' . $version; 28 29 // if there is no cache hit for this data then we need to run the query 30 if (($records = $memcache->get($queryCacheKey)) === false) { 31 32 // run query here SELECT * FROM blah blah blah blah 33 // grab the result set 34 $records = $sth->fetchAssoc(); 35 36 // store the results in cache 37 $memcache->put($queryCacheKey, $records); 38 } 39 40 return $records; 41 } 42 43 /** 44 * this would get called when a user sends another user a message 45 */ 46 public function sendMessage($message, $toId, $fromId) 47 { 48 // run insert INSERT INTO message blah blah 49 // after adding the message to the database we need to invalidate 50 // all cache keys for the person you are sending it to 51 $versionKey = 'Message_Dao_' . $toId . '_version'; 52 53 $memcache = new Memcache(); 54 55 // we call add here because increment won't increment a value that 56 // does not exist. add() will only set a value if it doesn't exist 57 // otherwise it will return false 58 $memcache->add($versionKey, rand(1, 9999)); 59 $memcache->increment($versionKey); 60 } 61

I tried to comment that code pretty well. As you can see the version is used as part of the query cache key so incrementing its value by 1 will make all calls to that method not hit cache. If you have more than one method you want to invalidate you can give them the same namespace key as seen in line 11.

Okay that is all for now.

April 3, 2009 at 10:41 pm

Just when you thought I had run out of material to write about I am back! I have a few topics still lined up, but I have been so busy that I haven't had time to write anything.

The project I have been working on at work has lots of pagination in it, and the other day I was really frustrated with how difficult it was to do simple pagination and how many lines of code it took in the controller. Here I am going to present to you a pagination class I wrote and a usage example.

I am aware that there is a Zend_Pagination class in Zend Framework, but I'm not crazy about it. It offers some nice view helpers to draw pagination on a page, however, it is over 900 lines long, contains a lot of code you probably will never need, and is missing simple functions that you may need such as determining the page offset or the first and last item on the current page.

Here is the class (download) I have written in its entirety (all of 239 lines including comments):

1 <?php 2 /** 3 * Pager 4 * 5 * @package Pager 6 * @author Craig Campbell <clarity1285@gmail.com> 7 * @version $Id$ 8 */ 9 Class Pager 10 { 11 /** 12 * @var int 13 * 14 * total number of items in the data set 15 */ 16 protected $_totalItems; 17 18 /** 19 * @var int 20 * 21 * total number of pages 22 */ 23 protected $_totalPages; 24 25 /** 26 * @var int 27 * 28 * total number of items per page 29 */ 30 protected $_pageSize; 31 32 /** 33 * @var int 34 * 35 * current page number 36 */ 37 protected $_currentPage; 38 39 /** 40 * @var int 41 * 42 * first item on current page 43 */ 44 protected $_firstOnPage; 45 46 /** 47 * @var int 48 * 49 * last item on current page 50 */ 51 protected $_lastOnPage; 52 53 /** 54 * constructor 55 * 56 * @param int $page page number 57 * @param int $pageSize number of items per page 58 * @return void 59 */ 60 public function __construct($page = 1, $pageSize = 10) 61 { 62 if ($page < 1) { 63 throw new Exception('page cannot be less than 1!'); 64 } 65 $this->_currentPage = $page; 66 67 if ($pageSize < 1) { 68 throw new Exception('page size cannot be less than 1!'); 69 } 70 $this->_pageSize = $pageSize; 71 } 72 73 /** 74 * sets the total number of items in this data set 75 * 76 * usually comes from some kind of count query in the database 77 * 78 * @param int $number 79 * @return void 80 */ 81 public function setTotalItems($number) 82 { 83 $this->_totalItems = $number; 84 85 // the first item on this page will be one above the offset 86 $this->_firstOnPage = $this->getOffset() + 1; 87 88 // total number of pages 89 $this->_totalPages = ceil($number / $this->_pageSize); 90 91 // if the total number of items is greater than the first item 92 // and there are more pages 93 if ($number > $this->_firstOnPage && 94 $this->_totalPages > $this->_currentPage) { 95 96 $this->_lastOnPage = $this->getOffset() + $this->_pageSize; 97 return; 98 } 99 100 $this->_lastOnPage = $number; 101 } 102 103 /** 104 * determine the offset 105 * 106 * @return int 107 */ 108 public function getOffset() 109 { 110 return ($this->_currentPage - 1) * $this->_pageSize; 111 } 112 113 /** 114 * determines if there is a previous page 115 * 116 * @return bool 117 */ 118 public function hasPrevious() 119 { 120 return ($this->_currentPage - 1) > 0; 121 } 122 123 /** 124 * determines if there is a next page 125 * 126 * @return bool 127 */ 128 public function hasNext() 129 { 130 return ($this->_currentPage + 1) <= $this->_totalPages; 131 } 132 133 /** 134 * gets the previous page 135 * 136 * @return int 137 */ 138 public function getPrevious() 139 { 140 if ($this->hasPrevious()) { 141 return $this->_currentPage - 1; 142 } 143 return false; 144 } 145 146 /** 147 * gets the next page 148 * 149 * @return int 150 */ 151 public function getNext() 152 { 153 if ($this->hasNext()) { 154 return $this->_currentPage + 1; 155 } 156 return false; 157 } 158 159 /** 160 * gets the total number of items on the current page 161 * 162 * @return int 163 */ 164 public function getCurrentPageSize() 165 { 166 return $this->_lastOnPage - $this->_firstOnPage + 1; 167 } 168 169 /** 170 * gets the current page size 171 * 172 * @return int 173 */ 174 public function getPageSize() 175 { 176 return $this->_pageSize; 177 } 178 179 /** 180 * gets the total number of pages 181 * 182 * @return int 183 */ 184 public function getTotalPages() 185 { 186 return $this->_totalPages; 187 } 188 189 /** 190 * gets the first item on the current page 191 * 192 * @return int 193 */ 194 public function getFirstOnPage() 195 { 196 return $this->_firstOnPage; 197 } 198 199 /** 200 * gets the last item on the current page 201 * 202 * @return int 203 */ 204 public function getLastOnPage() 205 { 206 return $this->_lastOnPage; 207 } 208 209 /** 210 * gets the current page number 211 * 212 * @return int 213 */ 214 public function getCurrentPage() 215 { 216 return $this->_currentPage; 217 } 218 219 /** 220 * gets the total number of items 221 * 222 * @return int 223 */ 224 public function getTotalItems() 225 { 226 return $this->_totalItems; 227 } 228 229 /** 230 * determines if the current page has content 231 * 232 * @return bool 233 */ 234 public function hasContent() 235 { 236 return $this->_currentPage <= $this->_totalPages; 237 } 238 } 239

Keep in mind this is a rough draft, but it seems to work pretty well. All you need to tell the class is 3 things:

  1. The current page number in the constructor
  2. The number of items you would like to see per page in the constructor
  3. The total number of items that you will be paginating in the setTotalItems() method

Now here is an example of this class in action (controller):

$pager = new Pager($_GET['page'], 15); $dao = new User_Dao(); $users = $dao->getAll($pager);

Here is the method in the User_Dao to get the users from the database:

/** * gets all users * * @param Pager $pager * @return Collection */ public function getAll(Pager $pager) { $query = '/* ' . __METHOD__ . ' */' . "SELECT SQL_CALC_FOUND_ROWS u.id id, u.email email FROM user u ORDER BY u.id LIMIT $pager->getPageSize() OFFSET $pager->getOffset()"; $sth = $this->_db->prepare($query); $sth->execute(); $records = $sth->fetchAllAssoc(); $query = '/* ' . __METHOD__ . ' */' . "SELECT FOUND_ROWS() as total_records"; $totalStatement = $this->_db->prepare($query); $totalStatement->execute(); $record = $totalStatement->fetchAssoc(); $pager->setTotalItems($record['total_records']); return $this->_buildCollection($records); }

There are a few things to note here. SQL_CALC_FOUND_ROWS is a totally awesome mysql function that calculates the number of rows that would be returned if you did not include a LIMIT on the query. This is what is used to tell the pager how many total items there are. Also the DAO here is returning a collection. To see how that works check out this post I wrote a little while ago.

Since PHP passes objects by reference by default we can now use the same Pager object that we instantiated in our controller in our view. That would look something like this:

<h2>Page <?php echo $pager->getCurrentPage(); ?> of <?php echo $pager->getTotalPages(); ?></h2> <h3>Displaying <?php echo $pager->getFirstOnPage(); ?> &mdash; <?php echo $pager->getLastOnPage(); ?> of <?php echo $pager->getTotalItems(); ?> Users</h3> <?php foreach ($users as $user): ?> <dl> <dt>id</dt> <dd><?php echo $user->id; ?></dd> <dt>email</dt> <dd><?php echo $user->email; ?></dd> </dl> <?php endforeach; ?> <?php if ($pager->hasPrevious()): ?> <p><a href="/view-users?page=<?php echo $pager->getPrevious(); ?>">previous</a></p> <?php endif; ?> <?php if ($pager->hasNext()): ?> <p><a href="/view-users?page=<?php echo $pager->getNext(); ?>">next</a></p> <?php endif; ?>

I'm probably forgetting something here, but I think it is a good start and in my opinion it certainly eliminates any headaches caused from having to deal with pagination.

March 11, 2009 at 10:56 pm

One thing I have realized in my journey across the internets is that there is no simple, clear way to make an overlay using only one div (at least that I could find). If you have no clue what I am talking about click on the "add comment" link, and you will see. The page dims, the curtain goes up and the audience applauds.

Just a small disclaimer: The CSS that I am about to show you will work just fine in all browsers except for Internet Explorer 6 (what a shock!). I will show you the necessary Javascript (using jQuery) needed to fix this.

The first step is to simply add one div inside the body tag like so:

<body> <div id="overlay"><!-- this is the overlay :) --></div> </body>

The next step is to add the following to your stylesheet:

#overlay { display: none; position: fixed; top: 0px; left: 0px; width: 100%; height: 100%; z-index: 1000; background-color: #333; -moz-opacity: 0.5; opacity: .50; filter: progid:DXImageTransform.Microsoft.Alpha(opacity=50); }

To go through this step by step:

display: none
This simply means don't show this div. The reason is obvious since we don't want our entire page to load with a colour on top of it. One thing to note is that we will have to use javascript to set the display here to block or something other than none when we want this to show up on the page.
position: fixed
This means that the top left point of the div is fixed at point (0,0) or whatever point you specify. This means intelligent browsers can scale the width and height to cover the whole window
z-index: 1000
You can think of the z-index as if you have many sheets of paper on top of another. Each layer is one above the previous. Unfortunately it doesn't always work as advertised. Setting this at 1000 just ensures that it is above all content on the page although it can really be any number greater than zero.
-moz-opacity: 0.5
This is a special css property that is specific to Mozilla Firefox. Like any of the other opacity settings: play around with it until it looks how you want it to.
opacity: .50
This is the default opacity property in css. Safari uses this.
filter: progid:DXImageTransform.Microsoft.Alpha(opacity=50)
This whole nonsense is specifically for Internet Explorer 7. As you can see Microsoft so slyly inserted their name here.

Now you may be thinking, "That's it? We are done? That was so simple!". If only the world were that simple. We still have to take Internet Explorer 6 into account.

I'm pretty sure if Internet Explorer 6 were a child in elementary school it would be the last kid picked in gym class. Hmmm ... Actually it might be the kid that skipped gym class all together and was in the nurse's office being treated for its terrible diseases.

Anyway, the first thing we have to do is create a stylesheet specifically for IE6. To do that we use a CSS conditional statement in our html like so:

<!--[if IE 6]> <link href="overlay_ie6.css" rel="stylesheet" type="text/css" /> <![endif]-->

Now inside of overlay_ie6.css we add the following styles:

#overlay { position: absolute; filter: alpha(opacity=50); }

This is because IE6 does not like position: fixed or the Microsoft filter used for IE7. Therefore we must do this so that it will begin to play nicely. The final step is adding a few lines of Javascript only for IE6. I am using jQuery cause it is my framework of choice and is very speedy and light weight. This is definitely a hack, and I do not really like hacks so if anyone has come across this problem before and has a more elegant solution I urge you to leave me a comment.

if (jQuery.browser.msie && jQuery.browser.version == 6) { var width = $(window).width(); var height = $(window).height(); var pageHeight = document.body.clientHeight; if (pageHeight > height) { height = pageHeight; } $("#overlay").css({width: width, height: height}); }

This code is fairly straight forward, but basically what it is doing is figuring out a pixel width and pixel height to set the overlay to for IE6. IE6 does not like using percentages in this case. First we get the height and width of the current window. Then we get the height of the window with content (assuming all of the horizontal content fits within the width of the window - no side scrolling). Whichever height is greater is the height we use at the end to explicitly set the height and width of the #overlay div.

That's all for now!

February 26, 2009 at 1:37 am

I dare anyone to tell me a better way to spend the evening after an 11 hour workday staring at code than writing a blogpost about code. Haha, you'll have to forgive me if any of my writing seems scatterbrained that is most likely because my head is spinning like one of those amusement park rides that goes completely upside down and then stays there for 10 seconds before moving again.

Anyway I have only written a couple serious posts here and already I have had a few people ask me how I did my syntax highlighting. It is no secret online syntax highlighting solutions are not super amazing. They are pretty good, sure, but not awesome. That is where TextMate comes in. TextMate has a little known feature found in the Bundles menu under TextMate: Create HTML From Document.

textmate menu

That's it! Well not quite. The HTML TextMate generates is not perfect. For one it uses the <pre> tag which has been deprecated in XHTML 1.0 Strict. To fix that is easy just replace <pre with <div in the html as well as the styles. The next thing to do is move your styles out into their own stylesheet. One bizarre thing is that any user generated colours will appear in your stylesheet in the format color: rgba(255, 0, 9, 0.87); which will not work in Internet Explorer so you'll have to convert those to their hex values.

Also removing the <pre> tag will break the code in Internet Explorer:

pre.textmate-source { margin: 0; padding: 0 0 0 2px; font-family: Monaco, monospace; font-size: 12px; line-height: 1.3em; word-wrap: break-word; white-space: pre; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -o-pre-wrap; }

The fix is super simple, however. This is the one I am using on this site:

.textmate-source { margin-top: 30px; margin-bottom: 30px; font-family: "Monaco", monospace; font-size: 11.5px; line-height: 1.3em; word-wrap: break-word; white-space: pre; } .limit-height { height: 400px; overflow: scroll; }

I added the extra margin in there just to give it some breathing room. Also if you want to make a scrollable box you can just add the "limit-height" class to the main div that holds your code. Another thing to note is that for some reason the "line-num" span class is in single quotes. It still validates, but you might as well make it double quotes for consistency. I am probably forgetting something, but I can barely keep my eyes open at the moment so I will add it if I think of it, but yeah very cool. Enjoy!