Introduction
------------

Should you use it?
..................

Django-cachalot is the perfect speedup tool for most Django projects.
It will speed up a website with 100 000 visits per month without any problem.
In fact, **the more visitors you have, the faster the website becomes**.
That’s because every possible SQL query on the project ends up being cached.

Django-cachalot is especially efficient in the Django administration website
since it’s unfortunately badly optimised (use foreign keys in ``list_editable``
if you need to be convinced).

However, it’s not suited for projects where there is **a high number
of modifications per minute** on each table, like a social network with
more than 30 messages per minute. Django-cachalot may still give a small
speedup in such cases, but it may also slow things down a bit
(in the worst case scenario, a 20% slowdown,
according to :ref:`the benchmark <Benchmark>`).
If you have a website like that, optimising your SQL database and queries
is the number one thing you have to do.

There is also an obvious case where you don’t need django-cachalot:
when the project is already fast enough (all pages load in less than 300 ms).
Like any other dependency, django-cachalot is a potential source of problems
(even though it’s currently bug free).
Don’t use dependencies you can avoid: a “future you” may thank you for that.

Features
........

- **Saves in cache the results of any SQL query** generated by the Django ORM
  that reads data. These saved results are then returned instead
  of executing the same SQL query, which is faster.
- The first time a query is executed, it is about 10% slower; the following
  times are way faster (7× faster being the average).
- Automatically invalidates saved results,
  so that **you never get stale results**.
- **Invalidates per table, not per object**: if you change an object,
  all the queries done on other objects of the same model are also invalidated.
  It is unfortunately technically impossible to make a reliable
  per-object cache. Don’t be fooled by packages claiming to have
  that per-object feature: they are unreliable and dangerous for your data.
- **Handles everything in the ORM**. You can use the most advanced features
  from the ORM without a single issue; django-cachalot is extremely robust.
- Easy control thanks to :ref:`settings` and :ref:`a simple API <API>`.
  But that’s only required if you have a complex infrastructure. Most people
  will never use settings or the API.
- A few bonus features like
  :ref:`a signal triggered at each database change <Signal>`
  (including bulk changes) and
  :ref:`a template tag for a better template fragment caching <Template tag>`.

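The per-table invalidation described in the list above can be illustrated
with a toy model. This is only a sketch in plain Python, not django-cachalot’s
actual code: the ``PerTableCache`` class, its ``read`` and ``write`` methods
and the ``book`` table are hypothetical names chosen for the example.

```python
class PerTableCache:
    """Toy per-table query cache (illustration only, not cachalot's code)."""

    def __init__(self):
        self._results = {}         # sql -> (table versions seen, result)
        self._table_versions = {}  # table name -> number of writes so far
        self.db_hits = 0           # how many times the "database" was queried

    def read(self, sql, tables, run_query):
        """Return the cached result of ``sql``, or run it and cache it."""
        current = tuple(self._table_versions.get(t, 0) for t in tables)
        cached = self._results.get(sql)
        if cached is not None and cached[0] == current:
            return cached[1]       # still fresh: no table changed since
        self.db_hits += 1
        result = run_query()
        self._results[sql] = (current, result)
        return result

    def write(self, table):
        """Record a change: every cached query on ``table`` becomes stale."""
        self._table_versions[table] = self._table_versions.get(table, 0) + 1


cache = PerTableCache()
rows = {"book": ["Dune"]}
sql = "SELECT title FROM book"
fetch = lambda: list(rows["book"])

first = cache.read(sql, ["book"], fetch)    # miss: hits the database
second = cache.read(sql, ["book"], fetch)   # hit: served from the cache
rows["book"].append("Emma")
cache.write("book")                         # any change to the table...
third = cache.read(sql, ["book"], fetch)    # ...forces a fresh query
```

Note that a single write makes every cached query on that table stale,
whichever row was touched. That is the trade-off of invalidating per table.
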
Comparison with similar tools
.............................

This comparison was done in October 2015. It compares django-cachalot
to the other popular automatic ORM caches at the moment:
`django-cache-machine <https://github.com/django-cache-machine/django-cache-machine>`_
& `django-cacheops <https://github.com/Suor/django-cacheops>`_.

Features
~~~~~~~~

======================================================== ========= ============= =========
Feature                                                  cachalot  cache-machine cacheops
======================================================== ========= ============= =========
Easy to install                                          ✔         ✘             quite
Cache agnostic                                           ✔         ✔             ✘
Type of invalidation                                     per table per object    per table
CPU & memory performance                                 optimal   bad           terrible
Reliable                                                 ✔         ✘             quite
Handles ``QuerySet.count``                               ✔         ✘             ✔
Handles empty queries                                    ✔         ✘             ✔
Handles multi-table inheritance                          ✔         probably not  ✘
Handles proxy models                                     ✔         ✘             ✔
Handles many-to-many fields                              ✔         ✘             ✔
Handles transactions                                     ✔         probably not  ✘
Handles ``QuerySet.aggregate``/``annotate``              ✔         probably not  ✘
Handles ``QuerySet.bulk_create``/``update``/``delete``   ✔         probably not  ✘
Handles ``QuerySet.select_related``/``prefetch_related`` ✔         partially     ✘
Handles ``cursor.execute``                               ✔         ✘             ✘
Handles GeoDjango                                        ✔         maybe         ✔
Handles django.contrib.postgres                          ✔         maybe         partially
======================================================== ========= ============= =========

To find out whether a package supports a feature, I searched
the documentation, the issues, the tests and the code.
I really tried to avoid writing “maybe”, “probably not”, etc.
Unfortunately, the absence of tests for such cases and sometimes the confusion
of the authors themselves about these features make it difficult to know
whether they support a feature or not.

Explanations
~~~~~~~~~~~~

Of course, I can’t just throw a table with such
“Reliable” and “CPU & memory performance” lines without explanation.
My goal is not to start another stupid open source conflict, nor
to be pretentious about my work. I’m just trying to inform users here, so they
can fully grasp the consequences of using one tool or another.
I actually used django-cache-machine in production for a week
and django-cacheops for a month. With both solutions, I faced a lot
of invalidation issues, and the bigger the cache became,
the worse the performance was.

I now know the reason for these issues: in short, they are due to
their invalidation systems. Read the following paragraphs for more detail.

django-cache-machine
''''''''''''''''''''

django-cache-machine uses “flush lists” to remember which SQL queries are
linked to which objects. This is the approach I chose when I created
a prototype of django-cachalot, except it was invalidated per table,
not per object like django-cache-machine does. Unfortunately, there are several
important issues due to this approach that led me to drop it.

The smaller issue is that each time you execute a new SQL query,
django-cache-machine needs to fetch the “flush list” from the cache,
update it and add it back to the cache. This means we have to make two
cache calls in addition to the cache call that stores the SQL query results.
It may seem tiny, but when your cache size increases,
the “flush lists” start becoming huge (a list of hundreds of cache keys
for each database object), leading to an exponentially growing cache size
and a longer time to fetch the always-growing “flush lists”.
So **bad memory and CPU usage when reading data**.

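The bookkeeping cost described above can be made concrete with a small
simulation. This is a simplified sketch, not django-cache-machine’s code:
``CountingCache``, ``cache_query`` and the key names are hypothetical, and it
assumes a single “flush list” per query, whereas a query touching several
objects would update one list per object.

```python
class CountingCache:
    """Dict-backed cache that counts every get/set call."""

    def __init__(self):
        self.data = {}
        self.calls = 0

    def get(self, key, default=None):
        self.calls += 1
        return self.data.get(key, default)

    def set(self, key, value):
        self.calls += 1
        self.data[key] = value


def cache_query(cache, query_key, result, object_key):
    """Store a result and register it in the object's flush list."""
    cache.set(query_key, result)                    # call 1: the result itself
    flush_key = "flush:" + object_key
    flush_list = cache.get(flush_key, [])           # call 2: fetch the flush list
    cache.set(flush_key, flush_list + [query_key])  # call 3: write it back


cache = CountingCache()
for i in range(200):
    cache_query(cache, f"query:{i}", [i], "book:1")

calls_per_query = cache.calls / 200
flush_list_size = len(cache.data["flush:book:1"])
```

Each new query costs three cache calls instead of one, and the flush list that
must be fetched and rewritten keeps growing with the number of cached queries.
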
The second issue is only linked to the per-object invalidation.
When django-cache-machine invalidates an object, it also needs to invalidate
the queries of the related objects, otherwise they may contain stale data.
Django-cache-machine invalidates foreign keys only, not many-to-many
or generic foreign keys (because… I don’t know). **This degrades performance
of each writing operation to the database**, because it needs to fetch
related objects, fetch “flush lists” and delete these cache keys.
And of course it can’t invalidate basic queries such as count or empty queries
(probably aggregations too, but I’m not sure).

Last but not least: a critical issue. It simply proves that the
django-cache-machine team **doesn’t know how caches work**.
Caches are fast because they are stupid: when your cache is full and
needs room, it randomly fetches a few keys, selects the older ones if possible,
then deletes them. This means that **a cache key with a 1 year timeout
can be deleted before a cache key with a 1 minute timeout**.
But django-cache-machine assumes its “flush lists” will always stay longer
in cache than the saved query results will, because they have the same timeout
and “flush lists” are saved a few milliseconds after query results.
Until the cache is full, this is kind of true because no cache key is deleted.
But when it is full, a “flush list” can be removed at any moment,
so the other cache keys will never be invalidated until they are deleted.

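The failure mode described above can be reproduced in a few lines. This is
a hedged sketch, not django-cache-machine’s code: the cache is a plain
``dict``, the eviction is simulated by deleting the flush list by hand, and
``cache_query``, ``invalidate`` and the key names are hypothetical.

```python
cache = {}


def cache_query(query_key, result, object_key):
    """Store a result and record it in the object's flush list."""
    cache[query_key] = result
    cache.setdefault("flush:" + object_key, []).append(query_key)


def invalidate(object_key):
    """Delete every query recorded in the flush list, if it still exists."""
    for query_key in cache.pop("flush:" + object_key, []):
        cache.pop(query_key, None)


cache_query("query:titles", ["Dune"], "book:1")

# Simulate a full cache evicting the flush list before the query result.
del cache["flush:book:1"]

# The object changes in the database, so it gets invalidated...
invalidate("book:1")

# ...but without the flush list, nothing points to the stale result anymore.
stale = cache.get("query:titles")
```

Once the flush list is gone, the stale result is served until the cache
happens to evict it or its timeout expires.
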
**To sum up, django-cache-machine has bad memory and CPU performance
and is absolutely not reliable.**

django-cacheops
'''''''''''''''

django-cacheops uses
`a debug feature from Redis, KEYS, <http://redis.io/commands/KEYS>`_
to invalidate cache keys (that’s why it only supports Redis).
It’s a feature that becomes linearly slower as your cache size grows.
I measured that a single call of this command by django-cacheops
slows down any database save by 50 ms to 3.5 seconds,
depending on your database and cache sizes.
The problem is also that django-cacheops runs this command several times
at each save. Suppose you have a model with 3 many-to-many fields and you save
an object with 3 related objects per many-to-many field. django-cacheops
will therefore run the Redis ``KEYS`` command at least 10 times! If you have
a large cache and database, it means **you can wait 30 seconds
while this object is saved!**

Another bad consequence of that use of the ``KEYS`` command is that Redis jumps
to 100% CPU usage while the command is running, degrading performance for
other users or even blocking them until the command is finished.

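Why a pattern lookup over the whole key space slows down linearly can be seen
with a toy stand-in. This is neither Redis nor django-cacheops code:
``keys_matching`` and the key names are hypothetical, and ``fnmatchcase`` from
the standard library only approximates glob-style matching.

```python
from fnmatch import fnmatchcase


def keys_matching(store, pattern):
    """Return the matching keys and how many keys had to be examined."""
    examined = 0
    matches = []
    for key in store:              # every key is visited, matching or not
        examined += 1
        if fnmatchcase(key, pattern):
            matches.append(key)
    return matches, examined


store = {f"author:{i}": i for i in range(9_990)}
store.update({f"book:{i}": i for i in range(10)})

matches, examined = keys_matching(store, "book:*")
```

Only 10 keys match, yet all 10 000 keys are examined, so the cost follows
the total cache size rather than the number of keys to invalidate.
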
More generally, the workflow of django-cacheops is totally unoptimised.
When an object is modified, an ``invalidate_obj`` function is called,
calling an ``invalidate_dict`` function, calling the ``manage.py invalidate``
command with a serialized version of the object (!?),
calling an ``invalidate_model`` function that calls the Redis ``KEYS`` command
to get all the cache keys from that model, then deletes them.
And as I said above, it executes all that N times,
N being the number of objects related to the current object,
even though multiple objects have the same model and we therefore
don’t need to invalidate the model multiple times.

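The redundancy can be counted on the example given earlier (3 many-to-many
fields with 3 related objects each). This is a minimal sketch, assuming each
many-to-many field targets a distinct model; the function names and the
``book``, ``tag``, ``author`` and ``shop`` models are hypothetical.

```python
def naive_invalidations(obj_model, related):
    """One model invalidation for the object and one per related object."""
    return [obj_model] + [model for model, _pk in related]


def deduplicated_invalidations(obj_model, related):
    """One invalidation per distinct model, preserving first-seen order."""
    return list(dict.fromkeys(naive_invalidations(obj_model, related)))


# 3 many-to-many fields, 3 related objects each (hypothetical model names).
related = [(model, pk) for model in ("tag", "author", "shop") for pk in (1, 2, 3)]

naive = naive_invalidations("book", related)
deduplicated = deduplicated_invalidations("book", related)
```

Under these assumptions, invalidating once per distinct model takes 4
invalidations instead of 10 for the same save.
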
**To sum up, django-cacheops has terrible performance when modifying data,
but is reliable for what it handles.**
However, you probably need features it doesn’t handle, such as
transactions (used by Django admin),
multi-table inheritance, or
``cursor.execute`` (the three features being used by Wagtail and django CMS)…

Number of lines of code
~~~~~~~~~~~~~~~~~~~~~~~

Django-cachalot tries to be as minimalist as possible, while handling most
use cases. Being minimalist is essential to create maintainable projects,
and having a large test suite is essential to get an excellent quality.
The statistics below speak for themselves…

============ ======== ============= ========
Project part cachalot cache-machine cacheops
============ ======== ============= ========
Application  743      843           1662
Tests        3023     659           1491
============ ======== ============= ========