eZ component: Cache, Requirements, 1.4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$Author: ts $
$Revision: 7713 $
$Date: 2008-04-19 15:56:25 +0200 (Sat, 19 Apr 2008) $
:Status: Draft

.. contents::

=====
Scope
=====

The scope of this document is to describe the requirements for the features to
be implemented for the Cache component version 1.4. This version will
incorporate the following features and fixes:

- #12587: Hierarchic caching for the Cache component

The requirements for these features will be described in this document. For
design and implementation related information, please take a look at the design
document.

==================================================
#12587: Hierarchic caching for the Cache component
==================================================

The idea behind this feature is to provide hierarchical multi level caching for
the Cache component. Currently the Cache component only supports 1 cache
handler to be asked to restore a certain object. Either the handler returns
cached data for the desired object (hit) or it returns false to indicate that
it does not have valid data (miss). There is no possibility to instruct the
Cache component to search other caches in case of a miss. For this reason,
hierarchic caches will be introduced.

Use case
========

A typical use case for this feature exists, if several cache locations are
available, which are differently performant and can store a different amount
of items. The following is such a scenario.

3 types of cache are available for an application. The first and fastest level
is covered by a cache that resides in the local memory of the server fulfilling
a request. This cache is extremely fast, because it resides in local memory, but
is therefore also limited to a very low number of objects it can contain
without polluting the local memory too much. The second level cache is a
Memcached, running on a dedicated cache server. This is also quite fast, but
not as fast as the local memory cache, since data needs to be transfered
through a network connection. On the other hand, this cache is larger, since
the server is dedicatedly build for performant caching. The third level of
caching is provided by a file server, which offers the slowest kind of cache.
In contrast, it has the largest storage capability.

In this scenario, the desired behavior would be that the most needed cache
items would be stored in the fastest cache. Cache items which are not that
much, but still often needed could reside in the second cache level and rarely
used items would be stored in the third layer.

Requirements
============

Beside the main requirement, to be able realize hierarchical caching, several
sub-requirements of this exist, which will be summarized in this section. Note
that a fixed target is to not break BC for existing applications in any way and
to avoid code duplication as much as possible. Beside that, too complex
structures need to be avoided to maintain performance.

Hierarchical stacking
---------------------

A way needs to be implemented to put multiple instances of the current
implemented cache storages in a stack to indicate that they represent a
hierarchy. A class to manage such stacks is to be designed and implemented.

An object of this class must take care of:

- **Searching for items recursively through the stack**
  The stacking mechanism needs to search the stack of caches from top to bottom
  before it indicates that a cache item could not be found.
- **Storing new data across the cache stack**
  When a new cache item is stored, it needs to be stored according to the
  strategy chosen for the maintenance of the hierarchy. In addition, it can
  either be stored only in 1 Cache or in several caches at once. For more
  information see the `Cache propagation`_ section.
- **Bubbling restored data up through the cache stack**
  According to the replacement strategy, cache items need to be placed into
  higher levels of the hierarchy, as soon as they get restored from a deeper
  level, to make them available faster on subsequent restore requests. For more
  information see the `Replacement strategies`_ section.

Limiting of cached items
------------------------

The current storage implementations assume that unlimited space is available in
the storage location. Following this philosophy, all cache items would be kept
in the top most storage of the cache stack and the lower storages would not be
needed.

A way must be investigated, how an arbitrary cache storage can be limited by the
number of cache items it stores.

A major problem here is, that this information either needs to persist between
requests or needs to be recalculated in each request. On the one hand, the
latter solution might significantly reduce performance (e.g. for large
file system based caches). On the other hand, the first solution might afford
complex persistence mechanisms for the desired information and might pollute the
cache storages with it.

Replacement strategies
----------------------

As soon as the number of items in a cache exceeds the maximum (see section
`Limiting of cached items`_) and a new item is to be stored, a currently cached
item needs to be replaced. The most favorable approach would be to remove the
item which will be not used for the longest time in the future. Since this is
impossible to know, several established alternatives (e.g. LRU, LFU, ...) exist
to solve the replacement problem. More information about this can be found in
Wikipedia under `cache algorithms`_ and `page replacement algorithms`_.

.. _`cache algorithms`: http://en.wikipedia.org/wiki/Cache_algorithms
.. _`page replacement algorithms`: http://en.wikipedia.org/wiki/Page_replacement_algorithm

A problem with any of the replacement strategies is, that additional
information about a cache item needs to be stored. For example: To realize an
LRU algorithm, the last access time of an item needs to be stored. For LFU, the
number of accesses needs to be stored. The cache storages currently don't
support adding such information and an appropriate place to store this
persistently and efficiently needs to be thought out.

It would be favorable to allow users to decide for a specific replacement
algorithm or to even implement their own strategies.

Cache propagation
-----------------

There are two possibilities to define propagation of cache items through the
stack, where a decision needs to be made for one consistent solution:

Propagate in store
^^^^^^^^^^^^^^^^^^

With this strategy, a newly stored cache item is automatically propagated to
all levels of the cache hierarchy. With this strategy, a newly stored or update
cache item needs to be update in all caches of the hierarchy at once.

This has some advantages and some disadvantages against the second alternative
`Propagate on replacement`_:

Pros:

- Storing of cache items happens in a central place.
- The replacement strategy does not need to take care about downwards
  propagation.

Cons:

- The initial storage of an item lasts longer, since propagation needs to be
  done.
- To purge an item, the item needs to be purged from all storages in the stack,
  that reside deeper than the first cache where the item is found.

Propagate on replacement
^^^^^^^^^^^^^^^^^^^^^^^^

Using this strategy, a newly stored cache item is only put into the top most
storage in the hierarchy. As soon as it needs to be replaced there, it is
propagated down one level, before being removed from the higher level cache.

Pros:

- The initial storage of a cache item is faster, since it only affects one
  storage. In addition, this should be the fastest storage (the top most).
- Purging of an item does only affect 1 single cache.
- If all storages reached their maximum number of stored items, only a single
  item is bubbled down to the lowest level.

Cons:

- Additional work is to be done on each replacement of a cache item.
- If the cache storage disappears (for instance an in-memory storage) then it
  hasn't be cached in the lower levels yet).

Open issues
-----------

This section summarizes misc open issues that need to be solved during the
design and implementation phase.

Locking
^^^^^^^

With hierarchical caching, the complexity of operations that need to be
performed on a storage and between different storages is raised. This might
open the possibility for race conditions in high load environments. To avoid
this, a locking mechanism (or another way to ensure exclusive access) might be
needed.


..
   Local Variables:
   mode: rst
   fill-column: 79
   End: 
   vim: et syn=rst tw=79
