Common Mistakes
- Fetching all the records into memory
- 50M records (max 5 × 5 GB of data) → not a good idea to hold all of this in memory
- Even if we can, what if two requests come in parallel?
- 2x memory usage, won’t scale
- Optimization: Offload median calculation to the database itself
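The offload can be sketched with SQLite's classic LIMIT/OFFSET median trick (table and column names here are illustrative, not from the actual project); the same idea maps onto `percentile_cont` in databases like PostgreSQL:

```python
import sqlite3

# Illustrative schema: a single records(value) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (value REAL)")
conn.executemany("INSERT INTO records VALUES (?)",
                 [(v,) for v in (3, 1, 4, 1, 5)])

# Median computed entirely inside the database: sort, skip to the
# middle, and average the one or two middle rows. The application
# never loads the full dataset into memory.
row = conn.execute("""
    SELECT AVG(value) FROM (
        SELECT value FROM records
        ORDER BY value
        LIMIT 2 - (SELECT COUNT(*) FROM records) % 2
        OFFSET (SELECT (COUNT(*) - 1) / 2 FROM records)
    )
""").fetchone()
print(row[0])  # 3.0 for the sample data above
```

The `LIMIT 2 - n % 2` picks one row for odd counts and two for even counts, so the `AVG` handles both cases.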
- Not using caching
- Caching is often an easy performance win; consider Redis or Memcached
- This avoids repeating expensive calculations. Cache invalidation is tricky in general, but in our case we can simply cache the response until the next ingestion
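A minimal sketch of the cache-until-next-ingestion pattern; an in-process dict stands in for Redis/Memcached (SET on a miss, DEL on ingestion), and `compute_median_in_db` is a hypothetical stub for the real query:

```python
calls = {"db": 0}
cache = {}

def compute_median_in_db():
    calls["db"] += 1          # count how often we actually hit the database
    return 42.0               # placeholder for the expensive query result

def get_median():
    # Serve from cache when possible; compute and store on a miss.
    if "median" not in cache:
        cache["median"] = compute_median_in_db()
    return cache["median"]

def on_ingestion_complete():
    # New data makes the cached median stale; drop it so the next
    # request recomputes against the fresh data.
    cache.pop("median", None)

get_median(); get_median()    # second call is a cache hit
on_ingestion_complete()       # ingestion invalidates the cache
get_median()                  # recomputes exactly once
print(calls["db"])            # 2
```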
- Using an OLTP database for an analytical workload
- Not using a proper index
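Concretely, "a proper index" here means an index on the column the median query sorts by, so the database can walk the index instead of re-sorting all 50M rows on every request. A sketch with SQLite (index and table names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (value REAL)")
# Index on the column the median query orders by; with it, the
# ORDER BY can walk the index instead of sorting the whole table.
conn.execute("CREATE INDEX idx_records_value ON records (value)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT value FROM records ORDER BY value"
).fetchall()
print(plan)  # SQLite reports a scan using the index, not a temp sort
```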
- Calculating the exact median
- Calculating the exact median over 50M records can be costly; we can approximate it instead
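One possible approximation is reservoir sampling: keep a fixed-size uniform random sample of the stream and return the sample's median. A sketch (function and parameter names are illustrative):

```python
import random

def approx_median(stream, sample_size=10_000):
    """Estimate the median by keeping a uniform random sample
    (reservoir) of the stream and taking the sample's median."""
    reservoir = []
    for i, x in enumerate(stream):
        if i < sample_size:
            reservoir.append(x)
        else:
            j = random.randint(0, i)
            if j < sample_size:
                reservoir[j] = x  # keep x with probability sample_size/(i+1)
    reservoir.sort()
    mid = len(reservoir) // 2
    if len(reservoir) % 2:
        return reservoir[mid]
    return (reservoir[mid - 1] + reservoir[mid]) / 2

random.seed(0)
est = approx_median(range(1_000_000), sample_size=5_000)
# est lands close to the true median (499999.5) while holding only
# 5,000 values in memory instead of the full dataset
```

Memory stays bounded by `sample_size` regardless of the input size; error shrinks as the sample grows.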
My Solution
- Details
- Setup
- Ingestion
- Median
- Caching
- HTTP Server
- Benchmark Tool
- CSV Generation Script
- Summary