MimicDB: An Isomorphic Key-Value Store for S3


S3 Metadata without the Latency or Costs

MimicDB is a local data store of object metadata from S3. It stays consistent with S3 by "mimicking" every S3 API call locally. Tasks such as listing keys, searching keys, and calculating storage usage can then be handled entirely locally, without the latency or cost of the S3 API.

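On average, tasks like these are about 2000x faster with MimicDB. In the timing comparison below, listing a bucket's keys drops from roughly 0.43 seconds to roughly 0.0002 seconds.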

Boto

>>> c = S3Connection(KEY, SECRET)
>>> bucket = c.get_bucket('bucket_name')
>>> start = time.time()
>>> bucket.get_all_keys()
>>> print time.time() - start
0.425064992905

MimicDB + Boto

>>> c = S3Connection(KEY, SECRET)
>>> bucket = c.get_bucket('bucket_name')
>>> start = time.time()
>>> bucket.get_all_keys()
>>> print time.time() - start
0.000198841094971
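
The "mimicking" idea can be illustrated with a minimal sketch. This is not MimicDB's actual code: FakeRemoteStore is a hypothetical stand-in for S3, and MimickedStore, its method names, and the metadata dictionary are all assumptions made for illustration. Every write goes to the remote store and is mirrored locally, so reads such as listing keys or summing sizes never touch the remote store.

```python
import hashlib


class FakeRemoteStore:
    """Hypothetical stand-in for a remote object store (not the S3 API)."""

    def __init__(self):
        self._objects = {}
        self.calls = 0  # counts simulated remote round trips

    def put(self, key, data):
        self.calls += 1
        self._objects[key] = data

    def delete(self, key):
        self.calls += 1
        self._objects.pop(key, None)


class MimickedStore:
    """Wraps a remote store and mirrors the metadata of every write locally."""

    def __init__(self, remote):
        self.remote = remote
        self.metadata = {}  # key name -> {"size": int, "md5": str}

    def put(self, key, data):
        self.remote.put(key, data)  # remote write first
        self.metadata[key] = {      # then mimic it locally
            "size": len(data),
            "md5": hashlib.md5(data).hexdigest(),
        }

    def delete(self, key):
        self.remote.delete(key)
        self.metadata.pop(key, None)

    def list_keys(self):
        # Answered from local metadata only; no remote call is made.
        return sorted(self.metadata)

    def storage_used(self):
        # Total size in bytes, also computed locally.
        return sum(meta["size"] for meta in self.metadata.values())


remote = FakeRemoteStore()
store = MimickedStore(remote)
store.put("a.txt", b"hello")
store.put("b.txt", b"hi")
store.delete("b.txt")

keys = store.list_keys()
used = store.storage_used()
```

In this sketch only the three writes reach the remote store; the listing and the size total are served from the local copy.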


Lightweight Data Store

The Python implementation of MimicDB uses a Redis backend by default, although SQLite and in-memory backends are also available. The backend can easily be extended to use other databases.

Data is stored in a simple but powerful layout:

mimicdb                 A set of bucket names
mimicdb:bucket          A set of key names
mimicdb:bucket:key      A hash of key metadata (size and MD5)

The mimicdb prefix can be followed by an optional namespace string, which allows multiple S3 connections to share the same backend. In that case, the layout looks like this:

mimicdb:namespace
mimicdb:namespace:bucket
mimicdb:namespace:bucket:key
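
As a rough illustration of the layout above, the sketch below builds the key names and stores the sets and hashes in plain Python containers instead of Redis. The helper names (layout_key, record_key, bucket_size) and the sample bucket, keys, and MD5 strings are hypothetical and are not part of MimicDB.

```python
from collections import defaultdict

sets = defaultdict(set)  # stands in for the backend's sets
hashes = {}              # stands in for the backend's hashes


def layout_key(*parts, namespace=None):
    """Join 'mimicdb', the optional namespace, and any bucket/key parts."""
    prefix = ["mimicdb"] + ([namespace] if namespace else [])
    return ":".join(prefix + list(parts))


def record_key(bucket, key, size, md5, namespace=None):
    """Register a bucket, a key name, and the key's metadata."""
    sets[layout_key(namespace=namespace)].add(bucket)
    sets[layout_key(bucket, namespace=namespace)].add(key)
    hashes[layout_key(bucket, key, namespace=namespace)] = {
        "size": size,
        "md5": md5,
    }


def bucket_size(bucket, namespace=None):
    """Sum the sizes of every key recorded for a bucket."""
    key_names = sets[layout_key(bucket, namespace=namespace)]
    return sum(
        hashes[layout_key(bucket, name, namespace=namespace)]["size"]
        for name in key_names
    )


record_key("photos", "cat.jpg", 1024, "md5-placeholder-1")
record_key("photos", "dog.jpg", 2048, "md5-placeholder-2")
record_key("photos", "cat.jpg", 10, "md5-placeholder-3", namespace="prod")

total = bucket_size("photos")
prod_total = bucket_size("photos", namespace="prod")
```

Because the namespace is part of every key name, the two "photos" buckets in this sketch never collide even though they share one backend.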

Concurrency and Scalability

MimicDB transactions take place at the same level as S3 API calls, so concurrent access to the backend behaves the same way as concurrent access to an S3 connection. Multiple simultaneous writes to the same key on S3 are reflected identically in MimicDB.

The bucket-level key layout provides an easy way to partition Redis over multiple servers. With consistent hash partitioning, each bucket and its key set can be stored on the same instance.
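
One possible reading of this partitioning scheme is to route every backend key by its bucket name, so a bucket's key set and its per-key hashes hash to the same server. The sketch below is an assumption-laden illustration, not MimicDB's code or a Redis client feature: HashRing, node_for_redis_key, the replica count, and the server names are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """A small consistent hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=64):
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, partition_key):
        """Return the first node clockwise from the key's hash position."""
        idx = bisect.bisect(self._hashes, self._hash(partition_key))
        return self._ring[idx % len(self._ring)][1]


def node_for_redis_key(ring, redis_key, namespaced=False):
    """Route a layout key by its bucket component.

    Keys without a bucket component (the top-level set of bucket names)
    are routed by their full name instead.
    """
    bucket_index = 2 if namespaced else 1
    # Limit the split so ':' characters inside S3 key names are preserved.
    parts = redis_key.split(":", bucket_index + 1)
    if len(parts) <= bucket_index:
        return ring.node_for(redis_key)
    return ring.node_for(parts[bucket_index])


servers = ["redis-a", "redis-b", "redis-c"]
ring = HashRing(servers)

set_node = node_for_redis_key(ring, "mimicdb:photos")
hash_node = node_for_redis_key(ring, "mimicdb:photos:2014:cat.jpg")
ns_node = node_for_redis_key(ring, "mimicdb:prod:photos:cat.jpg", namespaced=True)
```

Routing on the bucket name alone keeps each bucket's data together, at the cost of uneven load when one bucket is much larger than the others.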


Implementation

MimicDB is currently implemented in Python via Boto. If you're already using Boto, the MimicDB Python library works as a drop-in replacement.