# LL-SYS2: Design Tiered Cache
Question Owner: Naga Bhanoori
Self-link: [go/llsys2](<http://go/llsys2>)
Presentation & video: Under construction!
## Overview
This is an open-ended design question. The goal is to discuss potential approaches and trade-offs for building a performant caching solution. We ask the candidate to build a simple in-memory caching solution, then extend the problem statement to cover concurrency, tiering, eviction, and so on.
We expect the candidate to:
* Come up with a simple solution and implement it in pseudo-code.
* Participate in design discussion around caching strategies, propose and iterate on initial design, solve technical challenges and communicate ideas clearly.
## Introduction
1. Set expectations with the candidate that this is a high-level design question and that there will be follow-up questions/extensions as the interview progresses.
2. Code need not be compiled or tested; we are looking for functionally correct pseudo-code in a language of their choice.
## Question
* We have a data processing engine that reads/writes from/to a cloud store such as S3/ADLS/GCS. Reading data from the cloud store all the time can be slow.
* Build a simple, single machine in-memory cache. This cache can help improve performance of reads by avoiding cloud store requests.
* The cache can only use fixed memory space (10 GB for example).
```cpp
// Implement the following APIs using the in-memory cache. You can leverage
// cloud_store::get() and cloud_store::put() to access the cloud store.
Read(Key)
Write(Key, Val)
```
**Assumptions (provide this info only in response to discussion or questions from the candidate):**
1. Our use case: we ingest data continuously every day, but most reads are for data written within the last week.
2. Our workload characteristics are 66% read and 34% writes.
3. Writes are always for new keys, we don't modify the same key-value pair.
4. Fixed-size writes (4 MB) for all objects.
### Part 1: Implement Read/Write APIs with In-memory Cache (No concurrency).
We expect pseudo-code for the Read/Write APIs of our processing engine. Ask the candidate about their caching/eviction policy. The common answer is LRU (though at this stage of the interview, any policy is fine). We don't expect candidates to implement the policy, but we do expect them to explain how it works and why they picked it.
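For the interviewer's reference, here is one possible shape for the `LRUCache` helper that the solution snippets later in this doc assume. The method names (`Find`/`Get`/`Put`/`EvictLRUIfNeeded`) match that code; everything else, including tracking capacity in bytes, is illustrative.

```cpp
#include <list>
#include <string>
#include <unordered_map>

// Minimal LRU cache keyed by object name. Capacity is expressed in bytes, and
// keys are never re-written (per the workload assumptions above), so Put does
// not need to handle duplicates.
class LRUCache {
 public:
  explicit LRUCache(size_t capacity_bytes) : capacity_bytes_(capacity_bytes) {}

  bool Find(const std::string& key) const { return index_.count(key) > 0; }

  // Returns the cached value and marks the key as most recently used.
  std::string Get(const std::string& key) {
    auto it = index_.at(key);
    lru_.splice(lru_.begin(), lru_, it);  // move entry to the front
    return it->second;
  }

  // Evicts least-recently-used entries until `incoming` will fit.
  void EvictLRUIfNeeded(const std::string& incoming) {
    while (!lru_.empty() && used_bytes_ + incoming.size() > capacity_bytes_) {
      used_bytes_ -= lru_.back().second.size();
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

  void Put(const std::string& key, const std::string& value) {
    lru_.emplace_front(key, value);
    index_[key] = lru_.begin();
    used_bytes_ += value.size();
  }

 private:
  using Entry = std::pair<std::string, std::string>;
  size_t capacity_bytes_;
  size_t used_bytes_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Candidates only need to explain the idea (hash map plus a recency list, evicting from the cold end), not write this out.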
### Part 2: Caching Strategies Deep Dive!
#### 2.1 Poke the candidate on which caching strategy they would adopt and why.
The expectation is that the candidate thinks through and discusses the potential options, and identifies what additional information we would need in order to pick some mix of the following strategies.
Possible caching strategies:
1. Look-aside cache
2. Read-through cache
3. Write-through cache
4. Write-back cache
***Hint:*** We don't want to penalize candidates for not knowing the exact names of these strategies. If they don't know them, push them to think about when to read from and write to the cache.
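If it helps frame that discussion, here is a compact illustration of how the four strategies differ, written in the same pseudo-code register as the rest of this doc (the cache interface, the `GetOrLoad` loader hook, and `flush_queue_` are hypothetical):

```cpp
// 1. Look-aside (cache-aside): the application checks the cache itself and
//    populates it on a miss.
std::string ReadLookAside(const std::string& key) {
  if (cache_.Find(key)) return cache_.Get(key);
  std::string object = cloud_store::get(key);
  cache_.Put(key, object);
  return object;
}

// 2. Read-through: the application only talks to the cache; the cache itself
//    fetches from the cloud store on a miss (the loader lives inside the cache).
std::string ReadThrough(const std::string& key) {
  return cache_.GetOrLoad(key, /*loader=*/cloud_store::get);
}

// 3. Write-through: writes go to the cloud store and the cache synchronously,
//    so a read of the new key immediately afterwards is a cache hit.
void WriteThrough(const std::string& key, const std::string& value) {
  cloud_store::put(key, value);
  cache_.Put(key, value);
}

// 4. Write-back (write-behind): writes land only in the cache and are flushed
//    to the cloud store asynchronously; faster writes, but data can be lost if
//    the machine crashes before the flush.
void WriteBack(const std::string& key, const std::string& value) {
  cache_.Put(key, value);
  flush_queue_.Enqueue(key);  // a background thread calls cloud_store::put later
}
```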
#### 2.2 What additional information should we collect from the workload to help us pick a caching strategy?
Read/write access patterns. We want to collect metrics that answer the following questions (a small sketch of such counters follows the list):
1. How often does the same data get read?
2. How often do we read back newly written data?
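Purely illustrative sketch of the counters such metrics could be derived from; the notion of "recently written" (e.g. within the last week) is an assumption taken from the workload description:

```cpp
#include <atomic>
#include <cstdint>

// Illustrative workload counters. HitRate approximates "how often is the same
// data re-read?" and ReadAfterWriteRate approximates "how often do we read
// back newly written data?".
struct WorkloadStats {
  std::atomic<uint64_t> reads{0};
  std::atomic<uint64_t> read_hits{0};               // reads served from cache
  std::atomic<uint64_t> reads_of_recent_writes{0};  // key written recently (e.g. < 1 week ago)
  std::atomic<uint64_t> writes{0};

  double HitRate() const {
    return reads.load() ? double(read_hits.load()) / reads.load() : 0.0;
  }
  double ReadAfterWriteRate() const {
    return reads.load() ? double(reads_of_recent_writes.load()) / reads.load() : 0.0;
  }
};
```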
### Solution
Our workload for this question is going to be read heavy (66%), and we will read back recently written data most of the time.
* Look-aside/Read-through is a no-brainer here, as our workload is read heavy.
* Write-through caching can be used, as we have evidence that newly written data is read back in subsequent reads.
Ask the candidate to modify the above pseudo-code to suit the caching strategy they settle on.
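For reference, settling on look-aside reads plus write-through writes usually amounts to also populating the cache on the write path, e.g. (same hypothetical `LRUCache` interface as above):

```cpp
// Write-through: persist to the cloud store first, then cache the new object,
// since the workload tells us newly written data is likely to be read back.
void Write(const std::string& key, const std::string& value) {
  cloud_store::put(key, value);
  cache_.EvictLRUIfNeeded(value);
  cache_.Put(key, value);
}
```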
### Part 3: Concurrency
#### Extend the Read/Write APIs to be concurrent
- **Global lock** - only one operation happens at a time; simple, but limits throughput.
- **Reader/Writer lock** - allows multiple readers at the same time but only one writer, so reads get better throughput while writes and cache eviction remain serialized.
Ask the candidate to adjust the code based on their approach. Their code needs to be functionally correct.
***Hint:*** If the candidate has difficulty coming up with a solution beyond global lock:
1. Ask them to use a global lock in their code (this lets them see that reads could be unblocked with minimal protection).
2. Remind them that our workload is read heavy.
```cpp
// Note: this snippet combines the solution for Parts 1 to 3.
#include <mutex>
#include <shared_mutex>
#include <string>

class DataProcessor {
 public:
  // Constructor that takes the in-memory capacity of the cache as a parameter.
  explicit DataProcessor(size_t capacity_bytes) : cache_(capacity_bytes) {}

  std::string Read(const std::string& key) {
    {
      // Serve cache hits under a shared (reader) lock so that reads, which
      // dominate this workload, can proceed concurrently. (Note: a real LRU
      // Get also updates recency state, so that bookkeeping needs its own
      // internal synchronization.)
      std::shared_lock<std::shared_mutex> lock(rw_mutex_);
      if (cache_.Find(key)) {
        return cache_.Get(key);
      }
    }
    // Cache miss: fetch from the cloud store without holding the mutex.
    auto object = cloud_store::get(key);
    {
      // Take the exclusive (writer) lock only to update the cache. Re-check
      // in case another thread populated the key while we were fetching.
      std::unique_lock<std::shared_mutex> lock(rw_mutex_);
      if (!cache_.Find(key)) {
        cache_.EvictLRUIfNeeded(object);
        cache_.Put(key, object);
      }
    }
    return object;
  }

  void Write(const std::string& key, const std::string& value) {
    // Write-through: persist to the cloud store, then populate the cache.
    cloud_store::put(key, value);
    std::unique_lock<std::shared_mutex> lock(rw_mutex_);
    cache_.EvictLRUIfNeeded(value);
    cache_.Put(key, value);
  }

 private:
  // Reader/writer mutex protecting access to cache_.
  std::shared_mutex rw_mutex_;
  // Fixed-size in-memory LRU cache.
  LRUCache cache_;
};
```
### Part 4: Tiering
Solution: a tiered cache. Data could be in memory, on local disk, or (always) in the cloud store.
Read path: e.g., check the in-memory tier first, then the on-disk tier, and fall back to the cloud store on a miss; populate the faster tiers on the way back.
Write path: e.g., write through to the cloud store and cache the new object in memory; objects evicted from memory can be demoted to the on-disk tier instead of being dropped.
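A minimal sketch of what the tiered read path could look like, assuming a hypothetical `DiskCache` tier with the same `Find`/`Get`/`Put`/`EvictLRUIfNeeded` interface as the in-memory cache (locking elided for brevity):

```cpp
// Hypothetical on-disk tier with the same interface as LRUCache, but backed by
// local SSD and a much larger capacity budget.
DiskCache disk_cache_;

std::string TieredRead(const std::string& key) {
  if (memory_cache_.Find(key)) {           // hottest tier: RAM
    return memory_cache_.Get(key);
  }
  if (disk_cache_.Find(key)) {             // warm tier: local disk
    std::string object = disk_cache_.Get(key);
    memory_cache_.EvictLRUIfNeeded(object);  // promote to memory
    memory_cache_.Put(key, object);
    return object;
  }
  std::string object = cloud_store::get(key);  // cold tier: cloud store
  memory_cache_.EvictLRUIfNeeded(object);
  memory_cache_.Put(key, object);
  return object;
}
```

A fuller design would also demote in-memory eviction victims to the disk tier and track per-tier capacity; both make good discussion threads.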
What to look for? There is no one correct answer here. See whether the candidate can discuss an approach based on the workload information we have shared so far (read heavy, reads mostly of last week's data, fixed 4 MB objects).
For a grade 3 ("Yes" in Greenhouse):
For a grade 4:
Grade 3 +
TBD