Category Archives: Python

Simple PyPi Caching Proxy

Now that PyPi is being accelerated by the Fastly caching network, pip/easy_install already are running faster. However, this can be taken a step further by setting up a simple caching proxy. By caching packages locally (on the machine or in your private network), you don’t have to keep hitting Fastly/PyPi to download them. This is especially useful if you are constantly running builds and/or tests: AKA continuous integration.

Continue reading

Jenkins and Phabricator sitting in a tree

We’ve been using Phabricator for just about a year here at Disqus. It was originally created at Facebook and open sourced in Spring 2011. To sum it up using their own words: “Phabricator is a open source collection of web applications which make it easier to write, review, and share source code.” The small team working on it at Phacility (the SaaS company behind Phabricator) is constantly improving it so it’s on a continuous release cycle.

Jenkins has been used for continuous integration testing here for much longer. I’m not exactly sure for how long since it was setup before I started in September 2011. David Cramer has always been pushing for an ideal continuous integration/deployment system (IE here here) so part of my duties has been to improve what we have to achieve that goal (we’re hiring).

Currently, there isn’t a direct CI hook into Phabricator that is as deep as say Github+Travis. However, with a little script and an simple event listener for Arcanist, we can replicate most of that functionality.

Continue reading

Realtime Postfix stats aggregator

With a user base as large as Disqus‘, there is a ton of new comment and reply notification emails to send. Indubitable there is a user that accidentally (maybe even purposefully) subscribes to extremely hot threads. When they start receiving a stream of emails, their email provider doesn’t appreciate the spike in traffic. They usually show their annoyance by temporarily blocking and then rate limiting our Postfix instance from relaying email to that inbox.

Unfortunately, the only decent Postfix stats aggregators I could find were written in Perl (pflogsumm.pl) and consumed log files for some ad-hoc stats generation. Though, I was quite lazy after trying a few variations and finding the same Perl tools over and over so please leave a comment about your favorite Postfix stats aggregator.

I really didn’t need anything too fancy so I decided to take a stab at it myself. After thinking about it for a few minutes, I decided to try out using Python threading to have a small pool of workers run some regex on a queue of lines from syslog. All the stats are then gathered in a dictionary and either spit out to stdout or there is a VERY simple TCP server thread listening for ‘stats’ or ‘prettystats’ to dump the current cumulative stats as a JSON dictionary. Full readme can be found on the Github page. Best part, it requires no 3rd party libraries.

postfix-stats on Github
postfix-stats on PyPi

View more to see example output… Continue reading

django-celery, eventlet and debugging blocking

I recently wrote a couple Celery tasks that are purely IO bound. So instead of using the default multiprocessing execution pool, I used the Eventlet execution pool. With just a small change in Celery settings, I was off to the races.

Wrong! After some amount of time, it just sits at 100% CPU and no longer processes tasks. Unfortunately, Celery calls monkey_patch a little late when coupled with Django. Django does some magic of its own to various components so monkey_patch needs to be called before Django does any initialization. After a little digging, I found I can just set an environment variable to prevent Celery from doing the monkey patching and at the same time use it to signal manage.py to call monkey_patch before the initialization my Django app.

Continue reading

In Python, thread safe does not mean fork safe

We discovered this in a not so easily reproducible way. Here at Mahalo we use this wonderful distributed task queue called Celery. Upon restarting the celeryd server, rarely (until recently) and only some of the workers would throw MemcachedError during their first task. The cause of the issue would be receiving a “STORED” reponse to a memcached_get (wtf?!) or some other variant of invalid response or an EINPROGRESS (which is the socket is already busy doing something). Now another tidbit, we also use Pylibmc which is a wrapper around the libmemcached C library which we’ve written a custom Django cache interface for.
Continue reading

Beware of using Pylibmc 1.1+ and mod_wsgi

Recently, we attempted to bring up a new web server but to our dismay after it was all said and done, the Django application wouldn’t load. Eventually a “gateway timeout” was thrown with no errors in the wsgi or Apache error log. These types of errors are a pain to debug. What made the mystery even better, was the application ran just fine under the Django dev server!
Continue reading