Building a sub-sampling image viewer for Compose UI

How do you display a 100 MB image on Android without running into an OutOfMemoryError? You can’t. But you can cheat.

For years, Android developers have used Dave Morrissey’s excellent library, subsampling-scale-image-view, for displaying large bitmaps with deep zoom. It optimizes memory usage by loading lower-resolution bitmaps whenever possible and by not loading parts of the original image that are not visible. It felt magical.

When Compose UI came out, it became difficult to continue using subsampling-scale-image-view due to Compose’s initial lack of interoperability with Views, especially around nested scrolling. It made me wish that the library’s image viewer code was decoupled from its gesture detector. That would have made it possible for me to plug in a Compose-compatible gesture detector, and would even have allowed the same gestures to be used for GIFs and other non-sampled bitmaps.

This seed of an idea blossomed into what telephoto is today. Building a sub-sampling image viewer from scratch was a lot of fun and I’d like to share my adventure with future Sakets who might dare to embark on the same path for Android or other platforms.

There are two concepts for displaying a large image with minimal memory usage:

1. Sub-sampling

Phones are small, images can be big. To fit a 4K image within a Pixel 7’s 1080p display, it has to be scaled down to 0.5x zoom. However, a more efficient approach is to load the image at half its resolution and display it at 1x zoom, achieving the same visual quality.

Left: 2160p resolution at 0.5x zoom. Right: 1080p resolution at 1x zoom. Photo by Romain Guy.

If you ask your math professor, they’ll agree that 2160 x 0.5 == 1080 x 1. Our eyes wouldn’t be able to tell any difference between the two images, and we’ll have saved a whopping 75% of memory: from ~23.7MB to ~5.9MB.

// Assuming that the image has a color depth of
// 8 bits per channel in the sRGB color space.
2160 x 3840 x 3 = ~23.7 MB
1080 x 1920 x 3 = ~5.9 MB
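
The same arithmetic can be written as a small function. This is only a sketch: `decodedSizeInBytes()` is a hypothetical helper, not part of telephoto or Android, and it keeps the 3 bytes per pixel assumed above (Android’s default ARGB_8888 config stores 4).

```kotlin
// Hypothetical helper for estimating the size of a decoded bitmap.
// bytesPerPixel defaults to 3 to match the assumption above; Android's
// default ARGB_8888 config uses 4 bytes per pixel.
fun decodedSizeInBytes(
  width: Int,
  height: Int,
  sampleSize: Int = 1,
  bytesPerPixel: Int = 3
): Long {
  require(sampleSize >= 1) { "sampleSize must be at least 1" }
  // Sub-sampling by n keeps roughly every n-th pixel along each axis,
  // so the pixel count shrinks by a factor of n * n.
  val sampledWidth = width / sampleSize
  val sampledHeight = height / sampleSize
  return sampledWidth.toLong() * sampledHeight * bytesPerPixel
}
```

Calling `decodedSizeInBytes(2160, 3840)` gives 24,883,200 bytes (~23.7 MB when 1 MB is counted as 1024 x 1024 bytes), and passing `sampleSize = 2` gives 6,220,800 bytes (~5.9 MB), a quarter of the original.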

For decoding an image at a lower resolution, telephoto uses Android’s BitmapRegionDecoder. It uses a technique called “sub-sampling”, which involves sampling and retaining only a subset of the original data with fewer data points. In other words, a 4K image can be converted into 1080p by skipping every alternate pixel.

val decoder = BitmapRegionDecoder.newInstance(…)
val bitmap = decoder.decodeRegion(
  /* rect = */ tile.bounds,
  /* options = */ BitmapFactory.Options().apply {
    // Decode only every n-th pixel of the requested region.
    inSampleSize = sampleSizeFor(tile.zoom)
  }
)
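
The `sampleSizeFor()` helper isn’t shown here. A minimal sketch of what it might look like, assuming the goal is to pick the largest power-of-two sample size that still leaves at least as many pixels as the image occupies on screen (BitmapFactory rounds `inSampleSize` down to a power of two anyway). This is a guess for illustration, not telephoto’s actual implementation:

```kotlin
// Hypothetical helper, not telephoto's actual code.
// Returns the largest power-of-two sample size that does not reduce the
// image below the resolution needed at the given zoom.
fun sampleSizeFor(zoom: Float): Int {
  require(zoom > 0f) { "zoom must be positive" }
  // At 0.5x zoom every 2nd pixel is enough, at 0.25x every 4th, and so on.
  // At 1x zoom and above, every pixel is needed.
  val maxSkip = (1f / zoom).toInt().coerceAtLeast(1)
  // Round down to a power of two.
  return Integer.highestOneBit(maxSkip)
}
```

With this sketch, a zoom of 0.5 maps to a sample size of 2, a zoom of 0.3 also maps to 2 (rounding down from 3), and any zoom of 1 or more maps to 1.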

2. Tiling

Images can’t remain zoomed-out forever. Telephoto is a zoomable image library after all (among other things). As an image is zoomed in, it is progressively replaced with higher resolution copies at various zoom buckets. This reduces the amount of memory saved through sub-sampling, but opens up another clever way to save memory — tiling.

Tiling refers to a technique where the image is divided into smaller tiles so that they can be lazy loaded. Tiles whose coordinates lie within the visible region are selectively loaded and cached. When the image is panned, tiles that are no longer visible are safely discarded from memory.

val tileSize = imageSize * (sampleSizeFor(zoom) / sampleSizeAtMinZoom)
val xTileCount = imageSize.width / tileSize.width
val yTileCount = imageSize.height / tileSize.height

val tileGrid = (0 until xTileCount).flatMap { x ->
  (0 until yTileCount).map { y ->
    Tile(
      zoom = zoom,
      bounds = IntRect(
        offset = IntOffset(
          x = x * tileSize.width,
          y = y * tileSize.height
        ),
        size = tileSize
      )
    )
  }
}

// A tile is loaded only if it overlaps the visible canvas.
val tileBounds = tile.bounds.scaledAndOffsetBy(zoom, offset)
val isTileVisible = tileBounds.overlaps(canvasBounds)
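
To try the grid and the visibility check outside of Android, here is a rough, self-contained sketch. `Rect`, `Tile`, `tileGrid()` and `visibleTiles()` are hypothetical stand-ins rather than telephoto’s actual types, and `baseSampleSize` plays the role of `sampleSizeAtMinZoom`:

```kotlin
import kotlin.math.ceil
import kotlin.math.floor

// Hypothetical stand-ins for Compose's IntRect and a tile type.
data class Rect(val left: Int, val top: Int, val right: Int, val bottom: Int) {
  // Two rectangles overlap only if they intersect on both axes.
  fun overlaps(other: Rect): Boolean =
    left < other.right && other.left < right &&
      top < other.bottom && other.top < bottom
}

data class Tile(val sampleSize: Int, val bounds: Rect)

// Splits the image into a grid whose tiles shrink as the sample size drops.
fun tileGrid(imageWidth: Int, imageHeight: Int, sampleSize: Int, baseSampleSize: Int): List<Tile> {
  require(sampleSize in 1..baseSampleSize) { "sampleSize must be in 1..baseSampleSize" }
  // Multiply before dividing so that integer division doesn't collapse to 0.
  val tileWidth = (imageWidth * sampleSize / baseSampleSize).coerceAtLeast(1)
  val tileHeight = (imageHeight * sampleSize / baseSampleSize).coerceAtLeast(1)
  val xTileCount = imageWidth / tileWidth
  val yTileCount = imageHeight / tileHeight
  return (0 until xTileCount).flatMap { x ->
    (0 until yTileCount).map { y ->
      // The last column and row absorb any pixels left over by the division.
      val right = if (x == xTileCount - 1) imageWidth else (x + 1) * tileWidth
      val bottom = if (y == yTileCount - 1) imageHeight else (y + 1) * tileHeight
      Tile(sampleSize, Rect(x * tileWidth, y * tileHeight, right, bottom))
    }
  }
}

// Maps each tile from image space to canvas space and keeps the ones on screen.
fun visibleTiles(tiles: List<Tile>, zoom: Float, offsetX: Float, offsetY: Float, canvas: Rect): List<Tile> =
  tiles.filter { tile ->
    val b = tile.bounds
    val onCanvas = Rect(
      left = floor(b.left * zoom + offsetX).toInt(),
      top = floor(b.top * zoom + offsetY).toInt(),
      right = ceil(b.right * zoom + offsetX).toInt(),
      bottom = ceil(b.bottom * zoom + offsetY).toInt()
    )
    onCanvas.overlaps(canvas)
  }
```

For a 3840 x 2160 image with `sampleSize = 2` and `baseSampleSize = 4`, `tileGrid()` produces a 2 x 2 grid. On a 1080 x 1920 canvas at 1x zoom with no offset, only the left column of tiles overlaps the canvas, so only those two tiles would need to be decoded.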

These two strategies ensure that the memory usage of images remains approximately the same regardless of how large they are. A 1080p image and an 8K image will have similar memory usage on a Pixel 7!

One small detail that was omitted from the video above is that the loading of bitmap tiles is not instant and can take time depending on the image size. To hide this, telephoto draws the base, low-resolution tile as a filler.

Take a penny, leave a penny

One interesting math problem I ran into was that the visible tiles need to be stitched together after they’ve been divided. Because zoom levels are fractional and canvas expects integers, it’s easy to run into precision errors. My first naive attempt was to convert fractions into integers by using Math.ceil. As you may have guessed, this caused the tiles to contain gaps at certain levels:

A correct solution was to discard fractional values from all tiles (commit). This may sound like the opposite of what was needed, but it works out well: discarding a fractional value causes the next tile to be shifted back by a pixel, and so on, eventually eliminating any fractional error from the whole image. The last tiles along the X and Y axes may end up 1px shorter than the original image, but who cares, it’s not noticeable enough to be a problem.
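
One way to picture why truncation stitches cleanly is a one-dimensional model. This is a sketch under my own assumptions and not necessarily what the linked commit does: `scaledEdges()` and `tileSpans()` are hypothetical names, and the idea is that neighbouring tiles share an edge in image space, so truncating that shared edge gives both tiles the same integer and leaves no gap or overlap between them.

```kotlin
// Hypothetical 1D model: tile edges in image space, scaled to canvas pixels.
fun scaledEdges(imageSize: Int, tileCount: Int, zoom: Float): List<Int> {
  require(tileCount >= 1) { "tileCount must be at least 1" }
  val tileSize = imageSize / tileCount
  return (0..tileCount).map { i ->
    // The last edge is pinned to the image size to absorb leftover pixels.
    val edge = if (i == tileCount) imageSize else i * tileSize
    // Discard the fractional part instead of rounding up.
    (edge * zoom).toInt()
  }
}

// Each tile spans [edges[i], edges[i + 1]), so neighbours always share an edge.
fun tileSpans(edges: List<Int>): List<IntRange> =
  edges.zipWithNext { start, end -> start until end }
```

For a 1000px wide image split into 3 tiles at 0.375x zoom, the unrounded edges are 0, 124.875, 249.75 and 375. Truncating them gives 0, 124, 249 and 375, so the tiles end up 124, 125 and 126 pixels wide and sit flush against each other.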

If you’ve made it to the end of this article, I’d recommend checking out telephoto. It performs everything mentioned along with gestures for panning and zooming images with just one word:

- AsyncImage(
+ ZoomableAsyncImage(
    model = "",
    contentDescription = ...
  )

Thanks to Cora Sherratt for reviewing this article. Photo credits to Romain Guy and Joel Severino.