Documentation

This is a tiny Scala library I created to cache Squeryl query results in memory in my web applications. It does not depend on Squeryl or anything else, so it may be used to cache anything.

Characteristics:

  • Written in pure Scala; no annotations, no aspects.
  • Simple, concise and strictly typed usage syntax.
  • No explicit startup/shutdown required.
  • No data serialization; thus no clustering support, and cached objects must be immutable.
  • No data expiration; there is no maxLifetime option.
  • Supports reconfiguration on-the-fly, invalidation and statistics collection for individual caches or for all caches at once via CacheRegistry.
  • Thread-safe: public methods are synchronized.

Contents of this page:

  • How to cache single values
  • How it works
  • How to cache multiple values by their keys
  • Thoughts on caching separately instead of joining
  • Configuration, MapCache eviction policy
  • CacheRegistry
  • CacheStatistics
  • Caching both list and by-id map

How to cache single values

In my projects, I usually have a single-row config table which stores configuration options. Using Squeryl, I map it like this:

					package myapp.model
					import org.squeryl._

					// It's just convenient to have a primary key, even if its value is always 1.
					case class Config (id: Int, ...) extends KeyedEntity[Int] {
						def this() = this(1, ...)
					}

					object T extends Schema {
						val config = table[Config]
					}
				

And then I access it like this:

					package myapp.dal
					import myapp.model._
					import org.squeryl.PrimitiveTypeMode._
					import ru.dimgel.lib.cache._

					object ConfigDAL {
						private val cache = new ValueCache[Config]

						def data = cache {
							// I need inTransaction{} here because configuration is queried
							// by webapp init() method outside request transaction context.
							inTransaction { from(T.config)(t => select(t)).head }
						}

						def data_=(x: Config) {
							// I don't use inTransaction{} anywhere except the above,
							// because my webapp service() method is wrapped in transaction{}.
							require(x.id == 1)
							T.config.update(x)

							// Optimization trick, to avoid excessive SQL query on next data getter call:
							//cache.clear()
							cache.set(x)
						}
					}
				

So, you create an instance of ValueCache[V] and wrap your data query logic in a call to cache.apply(dataProvider: => V): V.

NOTE: If your data query logic throws an exception, it's propagated to the caller, and cache state does not change.

When you update your data, you have to clear (invalidate) the cache manually by calling cache.clear(), so that the next call to the data getter executes your data query logic again. Or, as an optimization, you can force the cache to store the updated data by calling cache.set(v: V); in this case the next call to the data getter returns that data without the expensive execution of your data query logic.
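
For illustration, here is a minimal sketch of the clear()-based variant of the same setter (the ConfigDAL example above uses set() instead):

					// Invalidation-only variant: the next call to the data getter
					// will re-run the SQL query instead of returning a cached value.
					def data_=(x: Config) {
						require(x.id == 1)
						T.config.update(x)
						cache.clear()
					}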

I also often use ValueCache for caching lists of objects when I'm sure those lists are small, like this:

					object NewsDAL {
						private val cache = new ValueCache[List[News]]

						def list = cache {
							from(T.news)(n => where(1 === 1) select(n) orderBy(n.whenCreated desc)).page(0, 10).toList
						}
					}
				

See below how to cache multiple objects by their keys using MapCache, and how ValueCache and MapCache may be used together.

How it works

The very first version of ValueCache looked like this:

					package ru.dimgel.lib.cache

					class ValueCache[V] {
						private var data_? : Option[V] = None

						def apply(valueProvider: => V): V = synchronized {
							if (data_?.isEmpty)
								data_? = Some(valueProvider)
							data_?.get
						}

						def set(v: V) { synchronized {
							data_? = Some(v)
						}}

						def clear() { synchronized {
							data_? = None
						}}
					}
				

I believe there's nothing to explain here. The current version supports configuration (see below; for ValueCache, it's just enabled/disabled), statistics collection and the global CacheRegistry, but the essence is the same.

How to cache multiple values by their keys

Just an example, again. Assume we have a list of countries referenced by a great many other tables. It is much more efficient to cache countries separately instead of joining them into lots of SQL queries.

Entity mapping:

					package myapp.model
					import org.squeryl._

					case class Country(id: Int, name: String, ...) extends KeyedEntity[Int] {
						def this() = this(1, null, ...)
					}

					object T extends Schema {
						val country = table[Country]
					}
				

DAL:

					package myapp.dal
					import myapp.model._
					import org.squeryl.PrimitiveTypeMode._
					import ru.dimgel.lib.cache._

					object CountryDAL {
						// By default, there's no limit on number of elements stored in cache.
						private val cache = new MapCache[Int, Country]

						def find(id: Int) =
							// Don't cache negative results, to keep the cache from growing infinitely.
							// So if the requested entity does not exist, we throw (None.get throws)
							// and catch that exception outside the cache call.
							try {
								Some(cache(id, id => T.country.lookup(id).get))
							} catch {
								case e: NoSuchElementException => None
							}

						def get(id: Int) =
							find(id).get

						def updateCountry(x: Country) {
							require(x.id != 0)
							T.country.update(x)

							//cache.clear()
							//cache.remove(x.id)
							cache.set(x.id, x)
						}

						def insertCountry(x: Country) = {
							require(x.id == 0)

							// I hate when Squeryl injects id into _immutable_ entity.
							val x2 = x.copy()
							T.country.insert(x2)
							assert(x2.id != 0)

							cache.set(x2.id, x2)

							x2
						}
					}
				

The idea is the same as for ValueCache[V], but MapCache[K,V] has two type parameters (storage key and value; the internal storage is a HashMap[K,V]), and its apply() method has a more complex signature: apply(k: K, dataProvider: K => V): V.

But there are some tricks in how it's used. Look at the find() method in the example above. First, negative results are not cached. If you want them cached, you should instantiate MapCache[Int, Option[Country]]. Second, if your data query logic throws an exception, it is propagated to the cache's caller and the cache state does not change. These two behaviours are combined so that find() has return type Option[Country] and returns None if the requested country is not found, but that None is not stored in the cache.
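
For illustration, a minimal sketch of that variant, which caches negative results as None so a missing id costs at most one SQL query (but also occupies a cache entry):

					object CountryDAL {
						// V = Option[Country], so "not found" (None) is cached as well.
						private val cache = new MapCache[Int, Option[Country]]

						def find(id: Int): Option[Country] =
							cache(id, id => T.country.lookup(id))
					}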

See below how to configure max MapCache size and how eviction works.

When a country is updated, you can, again, invalidate the cache completely (which is absolutely stupid in this case), invalidate just a single cache entry, or set/replace it immediately and thus avoid an excessive SQL query the next time that entry is requested. When a new country is inserted, it's possible to call cache.set(K,V) too.

Thoughts on caching separately instead of joining

Pre-caching dictionaries (often used but rarely modified tables like countries, currencies, etc.) may significantly improve performance and reduce query complexity. But be careful if you have caches for various entities which reference each other.

The first trouble: I doubt that Squeryl's relation declarations (ManyToOne, etc.) provide enough immutability semantics to be cached. Currently I don't use them at all; instead I do this:

					package myapp.model
					case class Country(id: Int, ...) ...
					case class City(id: Int, countryId: Int, ...) ...
				
					package myapp.modelx
					import myapp.model
					case class CityX(city: City, country: Country)
				

Ugly but straightforward and simple. By the way, this allows introducing various ModelX classes for the same entity depending on its usage context. (I don't like the idea of "partially filled objects" containing only the data necessary for the current use case, because I get no help from the IDE or the static type checker in recalling which fields I've filled and which I haven't.)

So, if most of your use cases need the city's country along with the city, it might look natural to cache CityX instead of City:

					package myapp.dal
					import ...

					object CityDAL {
						private val cache = new MapCache[Int, CityX]

						def find(id: Int) =
							try {
								Some(cache(id, id => {
									from(T.city, T.country)((ci,co) =>
										where(ci.id === id and co.id === ci.countryId)
										select(CityX(ci, co))
									).head
								}))
							} catch {
								case e: NoSuchElementException => None
							}
					}
				

But here comes the second trouble: if you update some Country, you'll have to invalidate/update not only the appropriate entry of CountryDAL.cache, but also all entries of CityDAL.cache (and all other caches) which reference it, or you'll obviously get cache inconsistency.

Thinking about this problem, I tried adding methods ValueCache.clearIf(cond: V => Boolean) and MapCache.removeWhere(cond: (K,V) => Boolean) as a potential solution for those who might want to maintain cross-cache consistency. I mean this use-case:

					object CountryDAL {
						def updateCountry(x: Country) {
							...
							cache.set(x.id, x)
							CityDAL.countryChanged(x)
						}
					}
					object CityDAL {
						def countryChanged(x: Country) {
							cache.removeWhere((id,cityX) => cityX.city.countryId == x.id)
						}
					}
				

But this idea looks ugly and dangerous:

  • Coupling and complexity. Why the hell must CountryDAL know about CityDAL? Well, that may be solved using the Observer pattern, but the result cannot be called "simple and transparent" anymore in any case. And there may be problems with Scala object instantiation order and circular dependencies.
  • Since all public methods in all caches are synchronized, I always fear deadlocks.
  • Lots of data duplication among caches.

So for now, instead of accessing cityX.country, I prefer to call CountryDAL.get(city.countryId) everywhere. I believe this is a case where more code results in less complexity. If you disagree, or have other ideas to share on the subject (and of course on everything else =)), I'd be thankful to read them on GoogleGroups.
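
In code, that means CityDAL caches plain City values and callers resolve the country through the already cached CountryDAL. A sketch (assuming City is mapped as a KeyedEntity like Country):

					object CityDAL {
						private val cache = new MapCache[Int, City]

						def find(id: Int) =
							try {
								Some(cache(id, id => T.city.lookup(id).get))
							} catch {
								case e: NoSuchElementException => None
							}
					}

					// At the call site:
					//   val city = CityDAL.find(id).get
					//   val country = CountryDAL.get(city.countryId)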

Configuration, MapCache eviction policy

Configuration options are provided as by-name class parameters of ValueCache and MapCache classes:

					class ValueCache[V] (enabled: => Boolean = true)
					class MapCache[K,V] (enabled: => Boolean = true, maxElements_? : => Option[Int] = None)
				

Caches are enabled by default but can be disabled. In that case their internal storage is cleared, apply() methods always delegate to their dataProviders, and all updater methods (clear(), set(), remove(), etc.) do nothing.

MapCache also has a maxElements_? parameter. The default value None means that the cache may grow infinitely. If you specify Some(N), then N must be positive, and the size of the cache's internal HashMap storage will never exceed that limit. The eviction policy is simple: the least recently accessed entries are thrown away. This is done efficiently, in O(1), using an auxiliary doubly-linked list of recently accessed entries (without duplicating the cached data instances).
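
For example, a sketch of a bounded cache declaration (the limit of 1000 is an arbitrary choice):

					// At most 1000 entries are kept; the least recently accessed ones are evicted first.
					private val cache = new MapCache[Int, Country](maxElements_? = Some(1000))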

Why are cache parameters by-name? They are applied on object instantiation and re-applied when you call the cache's reloadConfig() method. You can even keep cache parameters in the database (in Config entity fields, see the ValueCache usage example at the beginning of this documentation), provide the site admin with an HTML editor form, and reapply all cache configurations on its submission. Just define your cache like I do:

					object NotificationsDAL {
						private val byUserIdCache = new MapCache[Int, List[NotificationX]] (
							enabled = ConfigDAL.data.cache_notifications_isEnabled,
							maxElements_? = ConfigDAL.data.cache_notifications_maxElements
						)
					}
				

I repeat: class parameters are accessed only in two cases: on cache instantiation, and each time you call the cache's reloadConfig() method. Not on every access to the cache. The parameters are evaluated, their values are stored in internal variables (the currently effective configuration), and the cache state is adjusted accordingly. For example, if you switch a cache from enabled to disabled, its internal storage is cleared; if you reduce MapCache's maxElements_? value, excess least recently accessed elements are evicted to fit the new restriction.

CacheRegistry

Both ValueCache and MapCache extend the abstract Cache class, which declares their common API and registers its instances with the global object CacheRegistry, which provides helper methods that affect all registered caches at once:

  • reloadAllConfigs() calls reloadConfig() on all registered caches (this is what I call on config form submission as explained in previous section);
  • clearAll() calls clear() on all registered caches;
  • clearAllStatistics() calls clearStatistics() on all registered caches;
  • getAllStatistics() calls getStatistics() on all registered caches and returns the results in an unsorted list (see below about statistics).

CacheRegistry stores cache instances in WeakHashMap, so it does not prevent them from being garbage collected.
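
For example, a sketch of what the admin page handlers might call (the method name onCacheConfigSaved is made up for illustration):

					// Called when the cache configuration form is saved.
					def onCacheConfigSaved() {
						CacheRegistry.reloadAllConfigs()
					}

					// Buttons on the same page may also call CacheRegistry.clearAll()
					// and CacheRegistry.clearAllStatistics().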

NOTE: I define my DALs as Scala objects (singletons), so they are instantiated lazily. You cannot affect caches which don't exist yet (in my case, because they belong to a DAL which is not yet instantiated).

NOTE: Many people have told me that a global registry is a bad idea and that I should use a service provider instead, to avoid mixing caches defined in the application with, for example, those defined in libraries. But this is my deliberate intention: the site admin will see and manage all caches in a single place, no matter where they came from. Also, I could not add a service provider without making the usage syntax significantly more complex and verbose, and I wanted to keep the whole thing as simple as possible.

CacheStatistics

A cache's getStatistics() method returns an instance of the CacheStatistics class, which contains a snapshot of the cache's current configuration and its internal statistics counters (see the scaladoc for details). I usually display those statistics in an HTML table on a page accessible to the site admin, along with buttons that perform the actions of the CacheRegistry API.

CacheStatistics does not contain a reference to the cache instance it was created by; instead it contains the cache's description. By default, the description is just the cache class name, "MapCache" or "ValueCache". It's recommended to override cache descriptions like this:

					object NotificationDAL {

						private val byUserIdCache = new MapCache[Int, List[NotificationX]](...) {
							override protected val description = "NotificationDAL.byUserIdCache"
						}
					}
				

Note that description is val, not def.
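
A sketch of dumping the collected statistics follows; the exact CacheStatistics accessors are documented in the scaladoc, and the description field name is an assumption here:

					// List all registered caches by description.
					// The `description` accessor on CacheStatistics is an assumption; see the scaladoc.
					for (s <- CacheRegistry.getAllStatistics())
						println(s.description)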

Caching both list and by-id map

Things like countries, currencies and so on may be both accessed by id and displayed in a list, so it can be useful to consistently cache both the list and the by-id map. The class CachedListAndMap[K,V] solves this task. It is included in the library, but I show its source code here just to provide another real usage example:

					package ru.dimgel.lib.cache

					abstract class CachedListAndMap[K, V] {

						protected final class Data(val list: List[V], val map: Map[K,V])

						// Abstract because user will need custom-configured instances.
						protected val cache: ValueCache[Data]

						protected def queryList: Iterable[V]
						protected def getKey(v: V): Option[K]


						private def data = cache {
							val list = queryList.toList
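							// Entries whose getKey() is None are kept in the list but excluded from the map.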
							val map = Map() ++ list.map(v => (getKey(v) -> v)).filter(!_._1.isEmpty).map(t2 => (t2._1.get -> t2._2))
							new Data(list, map)
						}

						final def list = data.list

						final def find(k: K) = data.map.get(k)

						final def get(k: K) = data.map(k)

						final def clear() {
							cache.clear()
						}
					}
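
For illustration, a concrete subclass might look like this sketch (CountryDictDAL and the ordering by name are my own choices, not part of the library):

					package myapp.dal
					import myapp.model._
					import org.squeryl.PrimitiveTypeMode._
					import ru.dimgel.lib.cache._

					object CountryDictDAL extends CachedListAndMap[Int, Country] {
						// User-supplied, so it can be configured like any other cache.
						override protected val cache = new ValueCache[Data]

						override protected def queryList =
							from(T.country)(c => select(c) orderBy(c.name asc)).toList

						override protected def getKey(v: Country) = Some(v.id)
					}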