Beanstalkd:

An Introduction

PHP London, May 6th 2010

Alister Bulman

http://abulman.co.uk

self::about()

self::current()

Scaling PHP-based systems, with related technologies. PHP Developer at Binweevils.com. All the while trying to avoid getting phone calls on weekends...

self::prev()

Programming PHP since 1999 and PHP v3 (about 11 years ago). Before PHP: everything from a ZX81 and BBC/Electron, through 6502 and 8086 assembler, to Pascal and C/C++

Programming since 198-cough

What Is A Job Queue, Work Queue or Message Queue?

For our purposes, they all describe much the same thing

'Message Queue' is usually the more 'enterprise-y' term, and such systems often have additional 'Business Intelligence' built in.

What Is A Job Queue?

(Beanstalkd) is a simple, fast workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.
http://kr.github.com/beanstalkd/

Huh?

Why are job queues useful?

Many long-running or resource-intensive tasks can be done in the background

- Background or Parallel Processing


The classic problem is image uploads. Even happy-snap digital cameras can produce an image that is 3-4+ megabytes. You can't just store that and end up resizing it in the browser. You have to resize it, but you also can't do it in the same PHP process that uploaded it - it takes too much time - and memory.

Don't even try to resize 5+ GB of 1080p quality HD video

Use of queues at Flickr
- the most popular use isn't even about resizing images

The Advantages

How It Works

We've talked about what you can do, and why you would. Let's see how. How would a queue be structured?

Queueing via a database


Polling
Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job!

Blocking
I'll wait here till you have something

Dedicated queue systems

There are a number of dedicated job queue / message queue systems available

Beanstalkd

Many language bindings are available

Bindings are quite easy to write, as the protocol is a simple text-based format, à la Memcached

The daemon is just as dumb as classic memcached: it knows nothing about other servers. Talking to multiple servers is handled on the client side.

Beanstalkd was written in C for a Facebook app called 'Causes', which itself was written in Ruby on Rails

Not to be confused with 'Beanstalkapp.com', which is hosted SVN & Git version control

History

Philotic, Inc. developed beanstalk to improve the response time for the Causes on Facebook application (with over 9.5 million users). Beanstalk decreased the average response time for the most common pages to a tiny fraction of the original, significantly improving the user experience.

Beanstalkd is a series of TUBES

... The Internet is not something that you just dump something on. It's not a big truck. It's a series of tubes. And if you don't understand, those tubes can be filled and if they are filled, when you put your message in, it gets in line and it's going to be delayed by anyone that puts into that tube enormous amounts of material. - former United States Senator Ted Stevens (R-Alaska)

Killer features

Tubes

Named tubes/queues can be created at will and have messages sent to them

'default' queue is always created and read from, unless overridden

Now _ with _ added _ underlines!

Pick a set of tube names and stick to it. Make sure at least one running worker is reading from every tube

Watch a tube named after the machine's hostname, e.g. for jobs about files uploaded onto that specific machine
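A minimal sketch of the hostname idea, assuming a hypothetical helper tubeForHost() and an illustrative "uploads." prefix (neither is part of beanstalkd or Pheanstalk). The beanstalkd protocol restricts names to letters, digits and the characters - + / ; . $ _ ( ), and to 200 bytes, so anything else is replaced:

```php
<?php
// Hypothetical helper: derive a per-machine tube name from a hostname.
// The 'uploads.' prefix is an assumption made for this illustration.
function tubeForHost(string $hostname, string $prefix = 'uploads.'): string
{
    // Replace every character that beanstalkd does not allow in a name.
    $safe = preg_replace('/[^A-Za-z0-9\-+\/;.$_()]/', '_', strtolower($hostname));

    // Names may be at most 200 bytes long.
    return substr($prefix . $safe, 0, 200);
}

// A worker on each machine would watch its own tube.
echo tubeForHost(gethostname() ?: 'localhost'), "\n";
```

The producer that handled the upload would then put() into the same tube name, so only a worker on that machine picks the job up.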

Priorities

0 to 4,294,967,295 (2^32 - 1)

Most urgent: 0

I'd suggest a range from 100 to 1000 - just be consistent

class YourApp_Task {
    const PRI_CHECK_EMAIL   = 500;
    const PRI_PROCESS_IMAGE = 800;
    const PRI_LOGIN_ACTION  = 100;
}
$priority = YourApp_Task::PRI_LOGIN_ACTION;
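To make "most urgent: 0" concrete, here is a small in-memory stand-in for the ready queue. It is not beanstalkd itself: reserveOrder() is a hypothetical function and the job names are made up. It assumes the server hands out the lowest priority number first and, for equal priorities, the job that was put first.

```php
<?php
class YourApp_Task {
    const PRI_CHECK_EMAIL   = 500;
    const PRI_PROCESS_IMAGE = 800;
    const PRI_LOGIN_ACTION  = 100;
}

// Hypothetical simulation: given [name, priority] pairs in the order they
// were put, return the names in the order workers would reserve them.
function reserveOrder(array $jobs): array
{
    $indexed = [];
    foreach ($jobs as $seq => $job) {
        $indexed[] = ['name' => $job[0], 'pri' => $job[1], 'seq' => $seq];
    }
    usort($indexed, function ($a, $b) {
        // Lowest priority number first; ties are broken by put order.
        return [$a['pri'], $a['seq']] <=> [$b['pri'], $b['seq']];
    });
    return array_column($indexed, 'name');
}

$order = reserveOrder([
    ['resize photo', YourApp_Task::PRI_PROCESS_IMAGE],
    ['fetch mail',   YourApp_Task::PRI_CHECK_EMAIL],
    ['record login', YourApp_Task::PRI_LOGIN_ACTION],
    ['fetch mail 2', YourApp_Task::PRI_CHECK_EMAIL],
]);
echo implode(', ', $order), "\n";
```

The login action jumps the queue even though it was put third, which is the reason for keeping the constants in one place.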

Delay

Do this after x seconds

Hold a message within the queue before allowing it to be reserved, and acted upon

I want you to do this job - but not for at least x seconds

TTR

How long can a job be reserved (being worked on) before it's retried?

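When a job is reserved, a timer starts counting down from the job's TTR. If the timer runs out before the worker deletes, releases or buries the job, the server puts it back in the ready queue for another worker.

Lifecycle of a job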

   put            reserve               delete
  -----> [READY] ---------> [RESERVED] --------> *poof*

  
Here is a picture with more possibilities:

   put with delay               release with delay
  ----------------> [DELAYED] <------------.
                        |                   |
                        | (time passes)     |
                        |                   |
   put                  v     reserve       |       delete
  -----------------> [READY] ---------> [RESERVED] --------> *poof*
                       ^  ^                |  |
                       |   \  release      |  |
                       |    `-------------'   |
                       |                      |
                       |                      |
                       | kick                 |
                       |            bury      |
                    [BURIED] <---------------'
                       |
                       |  delete
                        `--------> *poof*
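The diagram can be read as a transition table. The sketch below encodes only the arrows drawn above; nextState() and the 'NEW' starting state are hypothetical names for this illustration, and null stands for *poof*.

```php
<?php
// Hypothetical helper: look up the state a job moves to for a given command.
function nextState(string $state, string $command): ?string
{
    // state => [command => next state]; null means the job is gone.
    $transitions = [
        'NEW'      => ['put' => 'READY', 'put with delay' => 'DELAYED'],
        'DELAYED'  => ['time passes' => 'READY'],
        'READY'    => ['reserve' => 'RESERVED'],
        'RESERVED' => [
            'delete'             => null,
            'release'            => 'READY',
            'release with delay' => 'DELAYED',
            'bury'               => 'BURIED',
        ],
        'BURIED'   => ['kick' => 'READY', 'delete' => null],
    ];

    if (!isset($transitions[$state])
        || !array_key_exists($command, $transitions[$state])) {
        throw new InvalidArgumentException("'$command' is not valid in state $state");
    }
    return $transitions[$state][$command];
}

// Walk one job through a failure and a retry.
$state = 'NEW';
foreach (['put', 'reserve', 'bury', 'kick', 'reserve', 'delete'] as $command) {
    $state = nextState($state, $command);
    echo $command, ' -> ', $state ?? '*poof*', "\n";
}
```

Anything not in the table throws, e.g. trying to delete a job that is still DELAYED, because the diagram has no such arrow.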

What to send through the queue?

64KB default limit on job size (65,535 bytes; beanstalkd's -z option changes it).
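One way to stay well under the limit, offered here as a suggestion rather than something the daemon requires, is to send a small description of the work (a path or record ID) instead of the data itself. A sketch, where encodeJob(), the task name and the path are all hypothetical:

```php
<?php
// beanstalkd's default maximum job size, in bytes.
const MAX_JOB_BYTES = 65535;

// Hypothetical helper: encode a job description as JSON and refuse
// anything the server would reject at its default size limit.
function encodeJob(string $task, array $params): string
{
    $payload = json_encode(['task' => $task, 'params' => $params]);
    if ($payload === false) {
        throw new RuntimeException('Job could not be encoded as JSON');
    }
    if (strlen($payload) > MAX_JOB_BYTES) {
        throw new LengthException('Job is ' . strlen($payload) . ' bytes; limit is ' . MAX_JOB_BYTES);
    }
    return $payload;
}

// The image stays on disk; only its location travels through the queue.
echo encodeJob('resize_image', ['path' => '/uploads/1234.jpg', 'width' => 640]), "\n";
```

The returned string is what would be handed to put(); the worker decodes it with json_decode() and loads the file itself.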

Queuing / Running

Producer (queues jobs)

<?php
require_once 'pheanstalk_init.php';
$pheanstalk = new Pheanstalk('127.0.0.1');

$pheanstalk->useTube('testtube')
           ->put("job payload goes here\n");

Worker (performs jobs)

<?php
require_once 'pheanstalk_init.php';
$pheanstalk = new Pheanstalk('127.0.0.1');

$job = $pheanstalk->watch('testtube')
                  ->ignore('default')
                  ->reserve();

// handwave to run the job
echo $job->getData();

$pheanstalk->delete($job);

Behind the curtains

telnet localhost 11300
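Over that telnet session the protocol is plain text. A put is one command line, "put <pri> <delay> <ttr> <bytes>", followed by the job body, each terminated by CRLF. The sketch below only builds those bytes; buildPut() is a hypothetical helper and its default values are illustrative, not taken from the slides.

```php
<?php
// Hypothetical helper: build the exact bytes a client sends for a `put`.
// <bytes> is the length of the body, not counting the trailing CRLF.
function buildPut(string $data, int $pri = 1024, int $delay = 0, int $ttr = 60): string
{
    return sprintf("put %d %d %d %d\r\n%s\r\n", $pri, $delay, $ttr, strlen($data), $data);
}

// Show the command with its line endings made visible.
echo addcslashes(buildPut('hello'), "\r\n"), "\n";
```

The server answers a successful put with "INSERTED <id>"; "use <tube>", "reserve" and "delete <id>" are sent the same way, one line each.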

It's pretty quick


<?php
require 'pheanstalk_init.php';
$pheanstalk = new Pheanstalk('127.0.0.1');
for ($i = 10000; $i > 0; $i --) {
    $pheanstalk->put("job payload goes here\n");
}

10,000 simple jobs queued:

$ time php -f fillQueue.php

real    0m3.272s
user    0m1.764s
sys     0m0.768s

And to fetch them


<?php
require 'pheanstalk_init.php';
$pheanstalk = new Pheanstalk('127.0.0.1');
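for ($i = 10000; $i > 0; $i--) {
    // reserve with a 2-second timeout, so the loop never blocks forever
    $job = $pheanstalk->reserve(2);
    if (! $job) {
        continue;   // nothing became ready in time
    }
    $job->getData();            // read the payload (a real worker would act on it)
    $pheanstalk->delete($job);  // finished: remove the job from the queue
}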

10,000 simple jobs de-queued:

$ time php -f getQueue.php

real    0m4.973s
user    0m2.584s
sys     0m1.304s
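Turning the two "real" timings above into rates is simple arithmetic: jobs divided by wall-clock seconds. jobsPerSecond() is just a helper for this illustration, and the figures include PHP start-up time.

```php
<?php
// Hypothetical helper: throughput from a job count and wall-clock seconds.
function jobsPerSecond(int $jobs, float $seconds): float
{
    return $jobs / $seconds;
}

printf("queued:    %.0f jobs/sec\n", jobsPerSecond(10000, 3.272));
printf("de-queued: %.0f jobs/sec\n", jobsPerSecond(10000, 4.973));
```

That is roughly 3,000 jobs a second in and 2,000 a second out, from a single PHP process on each side.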

stats

# telnet localhost 11300
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
stats
OK 814
---
current-jobs-urgent: 7145
current-jobs-ready: 7145
current-jobs-reserved: 0
.....
current-tubes: 1
current-connections: 1
current-producers: 0
current-workers: 0
current-waiting: 0
total-connections: 5
pid: 5756
....
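The body of the stats response is a flat list of "key: value" lines, so it is easy to pick apart for monitoring. A minimal sketch, assuming a hypothetical parseStats() that is handed the text after the "OK <bytes>" line:

```php
<?php
// Hypothetical helper: turn the "key: value" lines of a stats response
// into an array. Lines without a key (such as '---') are skipped.
function parseStats(string $body): array
{
    $stats = [];
    foreach (preg_split('/\r?\n/', $body) as $line) {
        if (preg_match('/^([a-z0-9-]+):\s*(.*)$/', $line, $m)) {
            // Numeric values become numbers, anything else stays a string.
            $stats[$m[1]] = is_numeric($m[2]) ? $m[2] + 0 : $m[2];
        }
    }
    return $stats;
}

$stats = parseStats("---\ncurrent-jobs-urgent: 7145\ncurrent-jobs-ready: 7145\ncurrent-jobs-reserved: 0\npid: 5756\n");
echo $stats['current-jobs-ready'], " jobs ready\n";
```

A check like "alert if current-jobs-ready keeps growing" then needs only one array lookup per sample.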

In Summary

Thanks

Questions ?

Beanstalkd on Github: http://kr.github.com/beanstalkd/