RoaringBitmap API documentation

Roaring bitmap in Cython.

A Roaring bitmap stores a set of 32 bit integers compactly while allowing for efficient set operations. The space of integers is partitioned into blocks of 2 ** 16 integers. The representation for a block depends on the number of elements it contains:

<= 4096 elements:
an array of up to 1 << 12 16-bit integers that are part of the set.
>= 61140 elements:
an array of up to 1 << 12 16-bit integers that are not part of the set.
otherwise:
a fixed bitmap of 1 << 16 (65536) bits with a 1-bit for each element.

A RoaringBitmap can be used as a replacement for a mutable Python set containing unsigned 32-bit integers:

>>> from roaringbitmap import RoaringBitmap
>>> RoaringBitmap(range(10)) & RoaringBitmap(range(5, 15))
RoaringBitmap({5, 6, 7, 8, 9})

ImmutableRoaringBitmap is an immutable variant (analogous to frozenset) which is stored compactly as a contiguous block of memory.

MultiRoaringBitmap stores a sequence of immutable roaring bitmaps in an efficiently serializable, contiguous block of memory.

class roaringbitmap.RoaringBitmap(iterable=None)

Bases: object

A compact, mutable set of 32-bit integers.

Return a new RoaringBitmap with elements from iterable.

The elements x of a RoaringBitmap must be 0 <= x < 2 ** 32. If iterable is not specified, a new empty RoaringBitmap is returned. Note that a sorted iterable will significantly speed up the construction. iterable may be a generator, in which case the generator is consumed incrementally. iterable may be a range (Python 3) or xrange (Python 2) object, which will be constructed efficiently.

add(self, uint32_t elem)

Add an element to the set.

This has no effect if the element is already present.

clamp(self, uint32_t start, uint32_t stop)

Return new set with range of values restricted to (start, stop).

clear(self)

Remove all elements from this RoaringBitmap.

copy(self)

Return a copy of this RoaringBitmap.

debuginfo(self, verbose=False)

Return a string describing the internal representation of this set.

difference(self, *other)

Return the difference of two or more sets as a new RoaringBitmap.

(i.e, self - other[0] - other[1] - …)

difference_update(self, *other)

Remove all elements of other RoaringBitmaps from this one.

discard(self, uint32_t elem)

Remove an element from the set if it is a member.

If the element is not a member, do nothing.

flip_range(self, uint32_t start, uint32_t stop)

In-place negation for range(start, stop).

freeze(self)

Return an immutable copy of this RoaringBitmap.

index(self, uint32_t x)

Return the 0-based index of x in this set.

Equivalent to sorted(self).index(x).

intersection(self, *other)

Return the intersection of two or more sets as a new RoaringBitmap.

(i.e. elements that are common to all of the sets.)

intersection_len(self, other)

Return the cardinality of the intersection.

Optimized version of len(self & other).

intersection_update(self, *other)

Intersect this set in-place with one or more RoaringBitmap objects.

NB: since range objects are recognized by the constructor, this provides an efficient way to restrict the set to a range of elements:

>>> rb = RoaringBitmap(range(5))
>>> rb.intersection_update(range(3, 7))
>>> rb
RoaringBitmap({3, 4})
isdisjoint(self, other)

Return True if two RoaringBitmaps have a null intersection.

issubset(self, other)

Report whether another set contains this RoaringBitmap.

issuperset(self, other)

Report whether this RoaringBitmap contains another set.

jaccard_dist(self, other)

Return the Jaccard distance.

Optimized version of 1 - len(self & other) / len(self | other). Counts of union and intersection are performed simultaneously.

max(self)

Return largest element in this RoaringBitmap.

min(self)

Return smallest element in this RoaringBitmap.

numelem(self)

Return total number of uint16_t elements stored.

pop(self)

Remove and return the largest element.

rank(self, uint32_t x)

Return the number of elements <= x that are in this set.

remove(self, uint32_t elem)

Remove an element from the set; it must be a member.

If the element is not a member, raise a KeyError.

select(self, int i)

Return the ith element that is in this set.

Parameters:i – a 0-based index.
symmetric_difference(self, other)

Return the symmetric difference of two sets as a new RoaringBitmap.

(i.e. all elements that are in exactly one of the sets.)

symmetric_difference_update(self, other)

Update set to symmetric difference of itself and another.

union(self, *other)

Return the union of two or more sets as a new set.

(i.e. all elements that are in at least one of the sets.)

union_len(self, other)

Return the cardinality of the union.

Optimized version of len(self | other).

update(self, *other)

In-place union update of this RoaringBitmap.

With one argument, add items from the iterable to this set; with more arguments: add the union of given RoaringBitmap objects.

NB: since range objects are recognized by the constructor, this provides an efficient way to set a range of bits:

>>> rb = RoaringBitmap(range(5))
>>> rb.update(range(3, 7))
>>> rb
RoaringBitmap({0, 1, 2, 3, 4, 5, 6})
class roaringbitmap.ImmutableRoaringBitmap(iterable=None)

Bases: roaringbitmap.RoaringBitmap

A roaring bitmap that does not allow mutation operations.

Any operation resulting in a new roaring bitmap is returned as a mutable RoaringBitmap (except for freeze() and the ImmutableRoaringBitmap constructor). Stores data in one contiguous block of memory for efficient serialization.

Return a new RoaringBitmap with elements from iterable.

The elements x of a RoaringBitmap must be 0 <= x < 2 ** 32. If iterable is not specified, a new empty RoaringBitmap is returned. Note that a sorted iterable will significantly speed up the construction. iterable may be a generator, in which case the generator is consumed incrementally. iterable may be a range (Python 3) or xrange (Python 2) object, which will be constructed efficiently.

add(self, uint32_t elem)

Unsupported method.

clear(self)

Unsupported method.

copy(self)

Return a copy of this RoaringBitmap.

difference_update(self, *other)

Unsupported method.

discard(self, uint32_t elem)

Unsupported method.

flip_range(self, start, stop)

Unsupported method.

freeze(self)

Already immutable, return self.

intersection_update(self, *bitmaps)

Unsupported method.

pop(self)

Unsupported method.

remove(self, uint32_t elem)

Unsupported method.

symmetric_difference_update(self, other)

Unsupported method.

update(self, *bitmaps)

Unsupported method.

class roaringbitmap.MultiRoaringBitmap(list init, filename=None)

Bases: object

A sequence of immutable roaring bitmaps.

Bitmaps are addressed with 32-bit indices. Everything is stored in a single contiguous block of memory.

>>> mrb = MultiRoaringBitmap([
...    RoaringBitmap({0, 1, 2}),
...    RoaringBitmap({1, 6, 8}),
...    RoaringBitmap({1, 7, 2})])
>>> mrb.intersection(list(range(len(mrb))))
RoaringBitmap({1})
>>> mrb[0] | mrb[1]
RoaringBitmap({0, 1, 2, 6, 8})
Parameters:
  • init – a list of set-like objects (e.g., RoaringBitmaps). May contain None elements, which are treated as empty sets.
  • filename – if given, result is stored in an mmap’d file. File is overwritten if it already exists.
andor_len_pairwise(self, array indices1, array indices2, array resultand, array resultor)

Pairwise intersection/union cardinality for pairs of roaring bitmaps in this collection given by zip(indices1, indices2).

Parameters:
  • indices1 – input array
  • indices2 – input array
  • resultand – result array
  • resultor – result array

All parameters should be Python arrays of type ‘L’, all preallocated with the same length; result arrays need not be initialized.

>>> result1 = array.array('L', [0] * 3)
>>> result2 = array.array('L', [0] * 3)
>>> mrb.intersection_card_pairwise(array.array('L', [0, 6, 8]),
...                     array.array('L', [1, 7, 6]), result1, result2)
>>> result1
array.array('L', [3, 2, 56])
>>> result2
array.array('L', [6, 4, 123])
bufsize(self)

Return size in number of bytes.

close(self)

Close opened file, if any.

frombuffer(type cls, data, int offset)

Load a MultiRoaringBitmap from a Python object using the buffer interface (e.g. bytes or mmap object), starting at offset.

fromfile(type cls, filename)

Load a MultiRoaringBitmap from a file using mmap.

get(self, long i)

Return bitmap i as an ImmutableRoaringBitmap, or None if i is an invalid index.

getsize(self, long i)
intersection(self, list indices, uint32_t start=0, uint32_t stop=0xffffffff)

Compute intersection of given a list of indices of roaring bitmaps in this collection.

Parameters:
  • start – optional start index.
  • stop – optional end index; if given, only return elements n s.t. start <= n < stop.
Returns:

the intersection as a mutable RoaringBitmap. Returns None when an invalid index is encountered or an empty result is obtained.

jaccard_dist(self, array indices1, array indices2)

Compute the Jaccard distances for pairs of roaring bitmaps in this collection given by zip(indices1, indices2).

>>> mrb.jaccard_dist(array.array('L', [0, 6, 8]),
...                     array.array('L', [1, 7, 6]))
array.array('d', [0.3, 0.2, 0.56])
Parameters:
  • indices1 – input array
  • indices2 – input array
Returns:

a Python array of floats with the jaccard distances.

indices1 and indices2 should be arrays of unsigned long integers, created with array.array('L'). Ensure that all indices i are in the range 0 <= i < len(self).

jaccard_dist_single(self, RoaringBitmap rb)

Compute the Jaccard distances for rb with all roaring bitmaps in this collection.

>>> mrb.jaccard_dist_single(RoaringBitmap([1, 6, 19, 22]))
array.array('d', [0.3, 0.2, 0.56])
Parameters:rb – a roaring bitmap.
Returns:a Python array of floats with the jaccard distances with length equal to len(self).

Indices and tables