2015-10-25 19:08:18 +00:00
Introduction
------------

Should you use it?
..................

Django-cachalot is the perfect speedup tool for most Django projects.
It will speed up a website of 100 000 visits per month without any problem.
In fact, **the more visitors you have, the faster the website becomes**.
That’s because every possible SQL query on the project ends up being cached.

Django-cachalot is especially efficient in the Django administration website
since it’s unfortunately badly optimised (use foreign keys in ``list_editable``
if you need to be convinced).

However, it’s not suited for projects where there is **a high number
of modifications per minute** on each table, like a social network with
more than 30 messages per minute. Django-cachalot may still give a small
speedup in such cases, but it may also slow things down a bit
(in the worst-case scenario, a 20% slowdown,
according to :ref:`the benchmark <Benchmark>`).
If you have a website like that, optimising your SQL database and queries
is the number one thing you have to do.

There is also an obvious case where you don’t need django-cachalot:
when the project is already fast enough (all pages load in less than 300 ms).
Like any other dependency, django-cachalot is a potential source of problems
(even though it’s currently bug free).
Don’t use dependencies you can avoid; a “future you” may thank you for that.

Features
........

- **Saves in cache the results of any SQL query** generated by the Django ORM
  that reads data. These saved results are then returned instead
  of executing the same SQL query, which is faster.
- The first time a query is executed is about 10% slower, then the following
  times are way faster (7× faster being the average).
- Automatically invalidates saved results,
  so that **you never get stale results**.
- **Invalidates per table, not per object**: if you change an object,
  all the queries done on other objects of the same model are also invalidated.
  It is unfortunately technically impossible to make a reliable
  per-object cache. Don’t be fooled by packages pretending to have
  that per-object feature; they are unreliable and dangerous for your data.
- **Handles everything in the ORM**. You can use the most advanced features
  from the ORM without a single issue; django-cachalot is extremely robust.
- Easy control thanks to :ref:`settings` and :ref:`a simple API <API>`.
  But that’s only required if you have a complex infrastructure. Most people
  will never use settings or the API.
- A few bonus features like
  :ref:`a signal triggered at each database change <Signal>`
  (including bulk changes) and
  :ref:`a template tag for a better template fragment caching <Template tag>`.

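To make “invalidates per table, not per object” concrete, here is a minimal
toy sketch in plain Python. It is not django-cachalot’s actual implementation:
``PerTableCache``, ``get_or_run`` and ``invalidate`` are hypothetical names
invented for this illustration, and a plain dict stands in for the cache
backend.

```python
from collections import defaultdict


class PerTableCache:
    """Toy query cache that invalidates per table (hypothetical names)."""

    def __init__(self):
        self.results = {}                          # SQL string -> cached rows
        self.queries_by_table = defaultdict(set)   # table name -> SQL strings

    def get_or_run(self, sql, tables, run_query):
        """Return cached rows for ``sql``; call ``run_query`` only on a miss."""
        if sql not in self.results:
            self.results[sql] = run_query()
            for table in tables:
                self.queries_by_table[table].add(sql)
        return self.results[sql]

    def invalidate(self, table):
        """Drop every cached result that read from ``table``."""
        for sql in self.queries_by_table.pop(table, set()):
            self.results.pop(sql, None)


db_hits = []


def run_query():
    db_hits.append(1)  # stands in for a real database round trip
    return ["row"]


cache = PerTableCache()
sql = "SELECT * FROM book WHERE id = 1"
cache.get_or_run(sql, ["book"], run_query)  # miss: runs the query
cache.get_or_run(sql, ["book"], run_query)  # hit: served from the cache
cache.invalidate("book")                    # any write to the "book" table
cache.get_or_run(sql, ["book"], run_query)  # miss again: runs the query
print(len(db_hits))
```

The query only reads the object with ``id = 1``, yet a write to any row of
``book`` drops it. That is the trade-off described in the list above:
coarser invalidation in exchange for results that are never stale.
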
Comparison with similar tools
.............................

This comparison was done in October 2015. It compares django-cachalot
to the other popular automatic ORM caches at the moment:
`django-cache-machine <https://github.com/django-cache-machine/django-cache-machine>`_
& `django-cacheops <https://github.com/Suor/django-cacheops>`_.

Features
~~~~~~~~

======================================================== ========= ============= =========
Feature                                                  cachalot  cache-machine cacheops
======================================================== ========= ============= =========
Easy to install                                          ✔         ✘             quite
Cache agnostic                                           ✔         ✔             ✘
Type of invalidation                                     per table per object    per table
CPU & memory performance                                 optimal   bad           terrible
Reliable                                                 ✔         ✘             quite
Handles ``QuerySet.count``                               ✔         ✘             ✔
Handles empty queries                                    ✔         ✘             ✔
Handles multi-table inheritance                          ✔         probably not  ✘
Handles proxy models                                     ✔         ✘             ✔
Handles many-to-many fields                              ✔         ✘             ✔
Handles transactions                                     ✔         probably not  ✘
Handles ``QuerySet.aggregate``/``annotate``              ✔         probably not  ✘
Handles ``QuerySet.bulk_create``/``update``/``delete``   ✔         probably not  ✘
Handles ``QuerySet.select_related``/``prefetch_related`` ✔         partially     ✘
Handles ``cursor.execute``                               ✔         ✘             ✘
Handles GeoDjango                                        ✔         maybe         ✔
Handles django.contrib.postgres                          ✔         maybe         partially
======================================================== ========= ============= =========

To find out whether a package supports a feature, I searched in the
documentation, the issues, the tests and the code.
I really tried to avoid writing “maybe”, “probably not”, etc.
Unfortunately, the absence of tests for such cases and sometimes the confusion
of the authors themselves about these features make it difficult to know
whether they support a feature or not.

Explanations
~~~~~~~~~~~~

Of course, I can’t just throw in a table with such
“Reliable” and “CPU & memory performance” lines without explanation.
My goal is not to start another stupid open source conflict, nor
to be pretentious about my work. I’m just trying to inform users here, so they
can fully grasp the consequences of using one tool or another.
I actually used django-cache-machine in production for a week
and django-cacheops for a month. With both solutions, I faced a lot
of invalidation issues, and the bigger the cache became,
the worse the performance was.

I now know the reason for these issues: in short, they are due to
their invalidation systems. Read the following paragraphs for more detail.

django-cache-machine
''''''''''''''''''''

django-cache-machine uses “flush lists” to remember which SQL queries are
linked to which objects. This is the approach I chose when I created
a prototype of django-cachalot, except it was invalidated per table,
not per object like django-cache-machine does. Unfortunately, there are several
important issues due to this approach that led me to drop it.

The smaller issue is that each time you execute a new SQL query,
django-cache-machine needs to fetch the “flush list” from the cache,
update it and add it back to the cache. This means we have to make two
cache calls in addition to the cache call to store the SQL query results.
It may seem tiny, but when your cache size increases,
the “flush lists” start becoming huge (a list of hundreds of cache keys
for each database object), leading to an exponentially growing cache size
and a longer time to fetch the always-growing “flush lists”.
So **bad memory and CPU usage when reading data**.

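The bookkeeping cost described above can be sketched with a toy cache that
counts its own round trips. This is only an illustration of the “flush list”
idea as explained here, not django-cache-machine’s code: ``CountingCache``,
``cache_query_with_flush_list`` and the ``flush:`` key prefix are hypothetical.

```python
class CountingCache:
    """Dict-backed stand-in for a real cache that counts round trips."""

    def __init__(self):
        self.data = {}
        self.calls = 0

    def get(self, key, default=None):
        self.calls += 1
        return self.data.get(key, default)

    def set(self, key, value):
        self.calls += 1
        self.data[key] = value


def cache_query_with_flush_list(cache, object_key, query_key, rows):
    """Store ``rows`` and register ``query_key`` in the object's flush list."""
    cache.set(query_key, rows)                          # call 1: the results
    flush_key = "flush:" + object_key                   # hypothetical key scheme
    flush_list = cache.get(flush_key, [])               # call 2: fetch the list
    cache.set(flush_key, flush_list + [query_key])      # call 3: store it back


cache = CountingCache()
for i in range(100):
    # 100 distinct queries that all involve the same database object
    cache_query_with_flush_list(cache, "book:1", "query:%d" % i, [])

print(cache.calls)                        # three cache calls per new query
print(len(cache.data["flush:book:1"]))    # the list grows with every query
```

Each new query costs three cache calls instead of one, and the list fetched
and rewritten on every call keeps getting longer.
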
The second issue is only linked to the per-object invalidation.
When django-cache-machine invalidates an object, it also needs to invalidate
the queries of the related objects, otherwise they may contain stale data.
Django-cache-machine invalidates foreign keys only, not many-to-many
or generic foreign keys (because… I don’t know). **This degrades the performance
of each write operation to the database**, because it needs to fetch
related objects, fetch “flush lists” and delete these cache keys.
And of course it can’t invalidate basic queries such as count or empty queries
(probably aggregations too, but I’m not sure).

And last but not least: a critical issue. It simply proves that the
django-cache-machine team **doesn’t know how caches work**.
Caches are fast because they are stupid: when your cache is full and
needs room, it randomly fetches a few keys, selects the older ones if possible
then deletes them. This means that **a cache key with a 1 year timeout
can be deleted before a cache key with a 1 minute timeout**.
But django-cache-machine assumes its “flush lists” will always stay longer
in cache than the saved query results will, because they have the same timeout
and “flush lists” are saved a few milliseconds after query results.
Until the cache is full, this is kind of true because no cache key is deleted.
But when it is full, the “flush list” can be removed at any moment,
so the other cache keys will never be invalidated until they are deleted.

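The failure mode can be reproduced with a plain dict standing in for the
cache. This is a minimal sketch of the scenario described above, with
hypothetical key names (``flush:book:1``, ``query:1``) and a hypothetical
``invalidate_object`` helper; the eviction is simulated by hand.

```python
def invalidate_object(cache, object_key):
    """Delete every query result listed in the object's flush list."""
    for query_key in cache.pop("flush:" + object_key, []):
        cache.pop(query_key, None)


cache = {
    "query:1": ["old title"],       # cached result that mentions book 1
    "flush:book:1": ["query:1"],    # flush list pointing at that result
}

# The cache is full and evicts a key; nothing guarantees that the flush
# list outlives the results it points to.
del cache["flush:book:1"]

# Book 1 is then modified, so its queries should be invalidated...
invalidate_object(cache, "book:1")

# ...but without the flush list nothing is found, and the old result stays.
stale = cache.get("query:1")
print(stale)
```

Once the flush list is gone, the invalidation is a silent no-op and the
cached rows keep being served until they expire or are evicted themselves.
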
**To sum up, django-cache-machine has bad memory and CPU performance
and is absolutely not reliable.**

django-cacheops
'''''''''''''''

django-cacheops uses
`a debug feature from Redis, KEYS, <http://redis.io/commands/KEYS>`_
to invalidate cache keys (that’s why it only supports Redis).
It’s a feature that becomes linearly slower as your cache size grows.
I measured that a single call of this command by django-cacheops
slows down any database save by 50 ms to 3.5 seconds,
depending on your database and cache sizes.
The problem is also that django-cacheops runs this command several times
at each save. Suppose you have a model with 3 many-to-many fields and you save
an object with 3 related objects per many-to-many field. django-cacheops
will therefore run the Redis ``KEYS`` command at least 10 times! If you have
a large cache and database, it means **you can wait 30 seconds
while this object is saved!**

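To illustrate why a ``KEYS``-style lookup slows down as the cache grows,
here is a toy sketch that mimics a pattern scan over a dict. It does not talk
to Redis: ``keys`` is a hypothetical stand-in, and the “1 + 3 × 3” call count
is one assumed reading of the example above (the saved object plus each
related object).

```python
from fnmatch import fnmatchcase


def keys(cache, pattern):
    """Mimic a KEYS-style scan: test every key in the cache against ``pattern``."""
    examined = 0
    matches = []
    for key in cache:
        examined += 1
        if fnmatchcase(key, pattern):
            matches.append(key)
    return matches, examined


# 10 000 unrelated keys and a single key for the model being invalidated.
cache = {"other:%d" % i: None for i in range(10000)}
cache["book:1"] = None

matches, examined = keys(cache, "book:*")
print(matches, examined)  # one match, but every key had to be examined

# Rough arithmetic for the example above, using the measured range per call.
keys_calls = 1 + 3 * 3          # assumed: the object + 3 related objects x 3 fields
best_case = keys_calls * 0.05   # seconds, at 50 ms per call
worst_case = keys_calls * 3.5   # seconds, at 3.5 s per call
print(keys_calls, best_case, worst_case)
```

The scan cost depends on the total number of keys, not on the number of
matches, and it is paid once per call. Ten calls at several seconds each
add up to the tens of seconds mentioned above.
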
Another bad consequence of that use of the ``KEYS`` command is that Redis jumps
to 100% CPU usage when the command is running, degrading performance for
other users or even blocking them until the command is finished.

More generally, the workflow of django-cacheops is totally unoptimised.
When an object is modified, an ``invalidate_obj`` function is called,
calling an ``invalidate_dict`` function, calling the ``manage.py invalidate``
command with a serialized version of the object (!?),
calling an ``invalidate_model`` function that calls the Redis ``KEYS`` command
to get all the cache keys from that model then delete them.
And as I said above, it executes all that N times,
N being the number of objects related to the current object,
even though multiple objects have the same model and we therefore
don’t need to invalidate the model multiple times.

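The redundancy pointed out above disappears if models are deduplicated before
invalidating. The sketch below only illustrates that idea; it is not code
from django-cacheops, and ``models_to_invalidate`` as well as the
``(model name, primary key)`` tuples are hypothetical.

```python
def models_to_invalidate(related_objects):
    """Return each model name once, in first-seen order."""
    return list(dict.fromkeys(model for model, _pk in related_objects))


# Five related objects, but only two distinct models (hypothetical data).
related = [
    ("Author", 1),
    ("Author", 2),
    ("Author", 3),
    ("Tag", 1),
    ("Tag", 2),
]

models = models_to_invalidate(related)
print(len(related), len(models))  # invalidations per object vs. per model
```

With per-table invalidation, one pass per distinct model is enough, whatever
the number of related objects.
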
**To sum up, django-cacheops has terrible performance when modifying data,
and is reliable for what it handles.**
But you probably need features it doesn’t handle, such as
transactions (used by Django admin),
multi-table inheritance, or
``cursor.execute`` (the three features being used by Wagtail and django CMS)…

Number of lines of code
~~~~~~~~~~~~~~~~~~~~~~~

Django-cachalot tries to be as minimalist as possible, while handling most
use cases. Being minimalist is essential to create maintainable projects,
and having a large test suite is essential to achieve excellent quality.
The statistics below speak for themselves…

============ ======== ============= ========
Project part cachalot cache-machine cacheops
============ ======== ============= ========
Application  743      843           1662
Tests        3023     659           1491
============ ======== ============= ========