Scaling PHP-based systems, with related technologies. PHP Developer at Binweevils.com. All the while trying to avoid getting phone calls on weekends...
Programming PHP since 1999 and PHP v3 (about 11 years ago). Before PHP: everything from the ZX81 and BBC/Electron, through 6502 and 8086 assembler, to Pascal and C/C++
Programming since 198-cough
For our purposes, they all describe much the same thing
'Message Queue' is usually more 'enterprise-y', and often has additional 'Business Intelligence' within it.
(Beanstalkd) is a simple, fast workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously. http://kr.github.com/beanstalkd/
It's a big to-do list.
You send it a message, and then can retrieve it later
Sometimes, you won't be pulling jobs out in the same order that you put them in - but we'll come back to that later
Many long running, or intensive tasks could be done in the background
- Background or Parallel Processing
The classic problem is image uploads. Even happy-snap digital cameras can produce an image that is 3-4+ megabytes. You can't just store that and end up resizing it in the browser. You have to resize it, but you also can't do it in the same PHP process that uploaded it - it takes too much time - and memory.
Don't even try to resize 5+ GB of 1080p quality HD video
Use of queues at Flickr
- the most popular use isn't even about resizing images
We've talked about what you can do - and why you would - so let's see how. How would a queue be structured?
Polling
Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job!
Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza job! Gizza
job! Gizza job! Gizza job! Gizza job! Gizza job!
Blocking
I'll wait here till you have something
Many language bindings are available
Quite easy to write, as it's a simple text-based protocol, à la Memcached
The daemon is as dumb as classic Memcached: talking to multiple servers is handled on the client side.
Beanstalkd was written in C for a Facebook app called 'Causes' - which was itself written in Ruby on Rails
Not to be confused with 'Beanstalkapp.com', which is remote SVN & Git version control
Philotic, Inc. developed beanstalk to improve the response time for the Causes on Facebook application (with over 9.5 million users). Beanstalk decreased the average response time for the most common pages to a tiny fraction of the original, significantly improving the user experience.
... The Internet is not something that you just dump something on. It's not a big truck. It's a series of tubes. And if you don't understand, those tubes can be filled and if they are filled, when you put your message in, it gets in line and it's going to be delayed by anyone that puts into that tube enormous amounts of material. - former United States Senator Ted Stevens (R-Alaska)
Named tubes/queues can be created at will and have messages sent to them
'default' queue is always created and read from, unless overridden
Now _ with _ added _ underlines!
Pick a given set of tube names, and stick to them. Make sure you have at least one running worker reading from every tube
watch a tube-name based on the machine's hostname - file uploads onto a specific machine
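A minimal sketch of the hostname idea, assuming PHP 7+. The function name and the 'uploads' prefix are made up for illustration; the character rules (letters, digits and - + / ; . $ _ ( ), no leading hyphen, at most 200 bytes) are the ones given for names in the beanstalkd protocol document.

```php
<?php
// Hypothetical helper: build a per-machine tube name such as "uploads.web01".
function tubeNameForHost(string $prefix, string $hostname): string
{
    // Replace anything outside the allowed name characters with "_".
    $name = preg_replace('#[^A-Za-z0-9+/;.$_()-]#', '_', $prefix . '.' . $hostname);
    // Names may not begin with a hyphen.
    if ($name[0] === '-') {
        $name = '_' . substr($name, 1);
    }
    // Names are limited to 200 bytes.
    return substr($name, 0, 200);
}

// On a real machine you would pass gethostname() as the second argument.
echo tubeNameForHost('uploads', 'web 01.example.com'), "\n";
```

The producer that accepted the upload would use() this tube, and only the worker on that same machine would watch() it.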
0 to 4,294,967,295 (2^32 - 1)
Most urgent: 0
I'd suggest a range from 100 to 1000 - just be consistent
class YourApp_Task
{
    const PRI_CHECK_EMAIL   = 500;
    const PRI_PROCESS_IMAGE = 800;
    const PRI_LOGIN_ACTION  = 100;
}

$priority = YourApp_Task::PRI_LOGIN_ACTION;
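To see why jobs don't always come out in the order they went in, here is an in-memory illustration (it does not talk to beanstalkd). It assumes the queue hands out the lowest priority number first, and that jobs with equal priority come out in the order they were put. The function name and the job layout are invented for this sketch.

```php
<?php
// Each job is [priority, sequence number, payload]; illustration only.
function reserveOrder(array $jobs): array
{
    usort($jobs, function (array $a, array $b): int {
        // Lower priority value first; ties broken by insertion sequence.
        return [$a[0], $a[1]] <=> [$b[0], $b[1]];
    });
    return array_column($jobs, 2);
}

$jobs = [
    [800, 1, 'process image'],
    [500, 2, 'check email'],
    [100, 3, 'login action'],
    [500, 4, 'check email again'],
];
print_r(reserveOrder($jobs));
```

The login action was queued third but is handed out first, because 100 is more urgent than 500 or 800.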
Hold a message within the queue before allowing it to be reserved, and acted upon
I want you to do this job - but not for at least x seconds
How long can a job be reserved (being worked on) before it's retried?
When a job has been reserved, a timer starts counting down from the job's TTR.
   put            reserve               delete
  -----> [READY] ---------> [RESERVED] --------> *poof*

Here is a picture with more possibilities:

   put with delay               release with delay
  ----------------> [DELAYED] <------------.
                        |                   |
                        | (time passes)     |
                        |                   |
   put                  v     reserve       |       delete
  -----------------> [READY] ---------> [RESERVED] --------> *poof*
                       ^  ^                |  |
                       |   \  release      |  |
                       |    `-------------'   |
                       |                      |
                       | kick                 |
                       |                      |
                       |       bury           |
                    [BURIED] <---------------'
                       |
                       |  delete
                        `--------> *poof*
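The same lifecycle can be written down as a lookup table, which is handy for sanity-checking worker logic. This is a sketch assuming PHP 7+; the state and event names are labels chosen here, not part of any client library. 'time-passes' stands for a delay running out, 'ttr-expired' for the TTR timer described earlier reaching zero, and 'deleted' for *poof*.

```php
<?php
// state => [event => next state]; names are illustrative labels.
const JOB_TRANSITIONS = [
    'new'      => ['put' => 'ready', 'put-with-delay' => 'delayed'],
    'delayed'  => ['time-passes' => 'ready'],
    'ready'    => ['reserve' => 'reserved'],
    'reserved' => [
        'delete'             => 'deleted',
        'release'            => 'ready',
        'release-with-delay' => 'delayed',
        'bury'               => 'buried',
        'ttr-expired'        => 'ready', // from the TTR note, not drawn in the picture
    ],
    'buried'   => ['kick' => 'ready', 'delete' => 'deleted'],
];

function nextState(string $state, string $event): string
{
    $next = JOB_TRANSITIONS[$state][$event] ?? null;
    if ($next === null) {
        throw new InvalidArgumentException("Cannot '$event' a job that is $state");
    }
    return $next;
}

// Walk one job through a failed attempt, a kick and a successful retry.
$state = 'new';
foreach (['put', 'reserve', 'bury', 'kick', 'reserve', 'delete'] as $event) {
    $state = nextState($state, $event);
}
echo $state, "\n";
```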
64KB default limit.
<?php
require_once 'pheanstalk_init.php';

$pheanstalk = new Pheanstalk('127.0.0.1');

$pheanstalk->useTube('testtube')
           ->put("job payload goes here\n");
<?php
require_once 'pheanstalk_init.php';

$pheanstalk = new Pheanstalk('127.0.0.1');

$job = $pheanstalk->watch('testtube')
                  ->ignore('default')
                  ->reserve();

// handwave to run the job
echo $job->getData();

$pheanstalk->delete($job);
use <tube>\r\n
put <pri> <delay> <ttr> <bytes>\r\n
<data>\r\n
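As a rough sketch of what a client puts on the wire for those lines, assuming PHP 7+: the function name and the default values (priority 500, TTR 60) are arbitrary choices for illustration, not taken from any particular library.

```php
<?php
// Hypothetical helper: build the bytes for one "put" request.
function buildPutCommand(string $data, int $priority = 500, int $delay = 0, int $ttr = 60): string
{
    // <bytes> is the body length in bytes, not counting the trailing \r\n.
    return sprintf("put %d %d %d %d\r\n%s\r\n", $priority, $delay, $ttr, strlen($data), $data);
}

// Select a tube first, then put a job; json_encode() makes the \r\n visible.
echo json_encode("use testtube\r\n" . buildPutCommand("job payload goes here\n")), "\n";
```

Because the length is sent up front, the payload itself can contain newlines or binary data.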
watch <tube>\r\n
reserve\r\n
Alternatively, if you don't want to block indefinitely waiting for a new job, you can specify a timeout:
reserve-with-timeout <seconds>\r\n
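On the way back, the first line of the reply tells the worker what happened. A sketch of reading it, assuming PHP 7+ and the reply words from the beanstalkd protocol document (RESERVED, TIMED_OUT, DEADLINE_SOON); the function name and the array shape are invented here.

```php
<?php
// Hypothetical helper: interpret the first line of a reserve / reserve-with-timeout reply.
function parseReserveReply(string $line): array
{
    $line = rtrim($line, "\r\n");
    if (preg_match('/^RESERVED (\d+) (\d+)$/', $line, $m)) {
        // The job body of <bytes> bytes, plus \r\n, follows on the connection.
        return ['status' => 'RESERVED', 'id' => (int) $m[1], 'bytes' => (int) $m[2]];
    }
    if ($line === 'TIMED_OUT' || $line === 'DEADLINE_SOON') {
        return ['status' => $line];
    }
    throw new UnexpectedValueException("Unrecognised reply: $line");
}

print_r(parseReserveReply("RESERVED 42 22\r\n"));
print_r(parseReserveReply("TIMED_OUT\r\n"));
```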
<?php
// fillQueue.php: put 10,000 small jobs into the default tube
require 'pheanstalk_init.php';
$pheanstalk = new Pheanstalk('127.0.0.1');
for ($i = 10000; $i > 0; $i--) {
    $pheanstalk->put("job payload goes here\n");
}
10,000 simple jobs queued:
$ time php -f fillQueue.php
real 0m3.272s
user 0m1.764s
sys  0m0.768s
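<?php
// getQueue.php: reserve and delete up to 10,000 jobs from the default tube
require 'pheanstalk_init.php';
$pheanstalk = new Pheanstalk('127.0.0.1');
for ($i = 10000; $i > 0; $i--) {
    $job = $pheanstalk->reserve(2); // reserve, waiting at most 2 seconds
    if (! $job) {
        continue; // timed out with nothing to do; try again
    }
    $job->getData();
    $pheanstalk->delete($job);
}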
10,000 simple jobs de-queued:
$ time php -f getQueue.php
real 0m4.973s
user 0m2.584s
sys  0m1.304s
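Turning those two wall-clock times into rates is simple division; a tiny sketch, assuming PHP 7+ and a made-up function name, using only the 'real' figures from the two runs:

```php
<?php
// Hypothetical helper: jobs handled per second of wall-clock time.
function jobsPerSecond(int $jobs, float $seconds): float
{
    return $jobs / $seconds;
}

printf("queued:    %.0f jobs/sec\n", jobsPerSecond(10000, 3.272));
printf("de-queued: %.0f jobs/sec\n", jobsPerSecond(10000, 4.973));
```

That works out to roughly 3,000 jobs a second going in and roughly 2,000 a second coming out, from a single PHP process on each side.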
# telnet localhost 11300
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
stats
OK 814
---
current-jobs-urgent: 7145
current-jobs-ready: 7145
current-jobs-reserved: 0
.....
current-tubes: 1
current-connections: 1
current-producers: 0
current-workers: 0
current-waiting: 0
total-connections: 5
pid: 5756
....
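The stats body is simple 'key: value' text, so a monitoring script can pick it apart without a YAML library. A minimal sketch assuming PHP 7+; the function name is invented, and it only handles the flat format shown in the session above.

```php
<?php
// Hypothetical helper: turn the body of a "stats" reply into an array.
function parseStats(string $body): array
{
    $stats = [];
    foreach (preg_split('/\r?\n/', trim($body)) as $line) {
        // Skip the "---" marker and anything that is not "key: value".
        if (!preg_match('/^([\w-]+):\s*(.*)$/', $line, $m)) {
            continue;
        }
        $value = trim($m[2]);
        // Numeric strings become int or float; everything else stays a string.
        $stats[$m[1]] = is_numeric($value) ? $value + 0 : $value;
    }
    return $stats;
}

$stats = parseStats("---\ncurrent-jobs-ready: 7145\ncurrent-jobs-reserved: 0\ncurrent-workers: 0\n");
if ($stats['current-jobs-ready'] > 0 && $stats['current-workers'] === 0) {
    echo "Jobs are waiting but no workers are connected\n";
}
```

A check like the last three lines is one way to avoid those weekend phone calls: jobs piling up with no workers connected is worth an alert.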
Beanstalkd on Github: http://kr.github.com/beanstalkd/