6 min read
RoaringBitmap is a compressed Bitmap format designed to efficiently represent sets of integers. It strikes an excellent balance between memory efficiency and performance and enables fast operations like union, intersection, and iteration.
Thanks to its effectiveness, RoaringBitmap has become the de facto Bitmap implementation in systems such as Apache Lucene, Apache Druid, ClickHouse, InfluxDB, and many others.
In this article, we'll explore the internal encoding strategies used in RoaringBitmap, how it compresses data, why these strategies are effective, and an example to see it in practice.
Note that this article is based on the implementation of a RoaringBitmap used in Apache Lucene. While other systems may introduce slight variations in how it is implemented or optimized, the core principles remain consistent across most implementations.
Also, please consider reading this article on the internals of a Bitmap to understand this article better.
Instead of using a single large Bitmap for the entire 32-bit integer space, RoaringBitmap splits the space into smaller chunks called blocks or containers. Each block handles a fixed range of values and is compressed independently, depending on how many values it contains.
Each 32-bit integer is split into:
This means each block manages a range of 2¹⁶ = 65536 possible values. How a block is stored depends on how many values are present in it.
RoaringBitmap uses three encoding strategies:
If a block contains up to 4096 values, it is considered sparse. In this case, the values are stored directly in a sorted short[] (since 16 bits are enough to represent any value within the block, and a short takes only 16 bits).
This format is compact and fast for small sets.
If a block contains 61440 or more set values, it is considered dense. Since the total range is 65536, this means that fewer than 4096 (the difference between 65536 and 61440) values are not set.
Instead of storing all the set values (which would take more space), RoaringBitmap stores the complement: the list of unset values, again using a short[]. When reading the block, the logic is inverted to return the correct set of values.
This approach saves space when a block is nearly full.
If a block contains a moderate number of values, which is more than 4096 but fewer than 61440, RoaringBitmap uses a Bitmap representation. This is a 65536-bit (8KB) fixed-size bitset, where each bit directly indicates the presence or absence of a value.
This strikes a balance between memory use and access speed.
You might wonder: why exactly 4096 is better suited for an array than a Bitmap? Is it just a magic number?
No, this is not a magic number. This is chosen simply because above this number of documents in a block, a Bitmap becomes more memory-efficient than an array. Let's see the maths behind it.
A short[] stores each offset using 2 bytes.
So, if a block contains ≤ 4096 values, the array is smaller or equal in size compared to a Bitmap, making it more efficient.
If a block has ≥ 61440 values set, then ≤ 4096 values are unset.
short[] now takes up to 8KB.Bitmap block.Since we're storing fewer items, it's more efficient to invert the logic and represent the block as a sparse list of absent values.

In the example above, we first find the block for each number and its corresponding index in each block.
Once that is defined, we group them together and then choose the right encoding strategy based on the cardinality of the block.
In this example, all the blocks would be categorised as sparse and will use a short[].
RoaringBitmap is a powerful and practical data structure that offers a highly efficient way to represent large sets of integers. Its clever use of multiple encoding strategies makes it suitable for a wide range of use cases where both performance and memory usage are critical.
If you're interested in digging deeper into how RoaringBitmap works or seeing it in action across different platforms, here are some great resources to explore:
https://roaringbitmap.org/
https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
https://jvns.ca/blog/2016/01/23/fast-integer-sets-with-roaring-bitmaps/
https://github.com/RoaringBitmap/RoaringBitmap
Hope you learned something new today. If you like geeking out over clever engineering, check out my other blogs here.
Thank you for reading. Happy coding!