RoaringBitmap API documentation¶
Roaring bitmap in Cython.
A Roaring bitmap stores a set of 32 bit integers compactly while allowing for
efficient set operations. The space of integers is partitioned into blocks
of 2 ** 16
integers. The representation for a block depends on the number
of elements it contains:
- <= 4096 elements:
- an array of up to
1 << 12
16-bit integers that are part of the set. - >= 61140 elements:
- an array of up to
1 << 12
16-bit integers that are not part of the set. - otherwise:
- a fixed bitmap of
1 << 16
(65536) bits with a 1-bit for each element.
A RoaringBitmap
can be used as a replacement for a mutable
Python set
containing unsigned 32-bit integers:
>>> from roaringbitmap import RoaringBitmap
>>> RoaringBitmap(range(10)) & RoaringBitmap(range(5, 15))
RoaringBitmap({5, 6, 7, 8, 9})
ImmutableRoaringBitmap
is an immutable variant (analogous to frozenset
)
which is stored compactly as a contiguous block of memory.
MultiRoaringBitmap
stores a sequence of immutable roaring bitmaps
in an efficiently serializable, contiguous block of memory.
-
class
roaringbitmap.
RoaringBitmap
(iterable=None)¶ Bases:
object
A compact, mutable set of 32-bit integers.
Return a new RoaringBitmap with elements from
iterable
.The elements
x
of a RoaringBitmap must be0 <= x < 2 ** 32
. Ifiterable
is not specified, a new empty RoaringBitmap is returned. Note that a sorted iterable will significantly speed up the construction.iterable
may be a generator, in which case the generator is consumed incrementally.iterable
may be arange
(Python 3) orxrange
(Python 2) object, which will be constructed efficiently.-
add
(self, uint32_t elem)¶ Add an element to the set.
This has no effect if the element is already present.
-
clamp
(self, uint32_t start, uint32_t stop)¶ Return new set with range of values restricted to
(start, stop)
.
-
clear
(self)¶ Remove all elements from this RoaringBitmap.
-
copy
(self)¶ Return a copy of this RoaringBitmap.
-
debuginfo
(self, verbose=False)¶ Return a string describing the internal representation of this set.
-
difference
(self, *other)¶ Return the difference of two or more sets as a new RoaringBitmap.
(i.e, self - other[0] - other[1] - …)
-
difference_update
(self, *other)¶ Remove all elements of other RoaringBitmaps from this one.
-
discard
(self, uint32_t elem)¶ Remove an element from the set if it is a member.
If the element is not a member, do nothing.
-
flip_range
(self, uint32_t start, uint32_t stop)¶ In-place negation for range(start, stop).
-
freeze
(self)¶ Return an immutable copy of this RoaringBitmap.
-
index
(self, uint32_t x)¶ Return the 0-based index of x in this set.
Equivalent to
sorted(self).index(x)
.
-
intersection
(self, *other)¶ Return the intersection of two or more sets as a new RoaringBitmap.
(i.e. elements that are common to all of the sets.)
-
intersection_len
(self, other)¶ Return the cardinality of the intersection.
Optimized version of
len(self & other)
.
-
intersection_update
(self, *other)¶ Intersect this set in-place with one or more
RoaringBitmap
objects.NB: since range objects are recognized by the constructor, this provides an efficient way to restrict the set to a range of elements:
>>> rb = RoaringBitmap(range(5)) >>> rb.intersection_update(range(3, 7)) >>> rb RoaringBitmap({3, 4})
-
isdisjoint
(self, other)¶ Return True if two RoaringBitmaps have a null intersection.
-
issubset
(self, other)¶ Report whether another set contains this RoaringBitmap.
-
issuperset
(self, other)¶ Report whether this RoaringBitmap contains another set.
-
jaccard_dist
(self, other)¶ Return the Jaccard distance.
Optimized version of
1 - len(self & other) / len(self | other)
. Counts of union and intersection are performed simultaneously.
-
max
(self)¶ Return largest element in this RoaringBitmap.
-
min
(self)¶ Return smallest element in this RoaringBitmap.
-
numelem
(self)¶ Return total number of uint16_t elements stored.
-
pop
(self)¶ Remove and return the largest element.
-
rank
(self, uint32_t x)¶ Return the number of elements
<= x
that are in this set.
-
remove
(self, uint32_t elem)¶ Remove an element from the set; it must be a member.
If the element is not a member, raise a KeyError.
-
select
(self, int i)¶ Return the ith element that is in this set.
Parameters: i – a 0-based index.
-
symmetric_difference
(self, other)¶ Return the symmetric difference of two sets as a new RoaringBitmap.
(i.e. all elements that are in exactly one of the sets.)
-
symmetric_difference_update
(self, other)¶ Update set to symmetric difference of itself and another.
-
union
(self, *other)¶ Return the union of two or more sets as a new set.
(i.e. all elements that are in at least one of the sets.)
-
union_len
(self, other)¶ Return the cardinality of the union.
Optimized version of
len(self | other)
.
-
update
(self, *other)¶ In-place union update of this RoaringBitmap.
With one argument, add items from the iterable to this set; with more arguments: add the union of given
RoaringBitmap
objects.NB: since range objects are recognized by the constructor, this provides an efficient way to set a range of bits:
>>> rb = RoaringBitmap(range(5)) >>> rb.update(range(3, 7)) >>> rb RoaringBitmap({0, 1, 2, 3, 4, 5, 6})
-
-
class
roaringbitmap.
ImmutableRoaringBitmap
(iterable=None)¶ Bases:
roaringbitmap.RoaringBitmap
A roaring bitmap that does not allow mutation operations.
Any operation resulting in a new roaring bitmap is returned as a mutable RoaringBitmap (except for
freeze()
and theImmutableRoaringBitmap
constructor). Stores data in one contiguous block of memory for efficient serialization.Return a new RoaringBitmap with elements from
iterable
.The elements
x
of a RoaringBitmap must be0 <= x < 2 ** 32
. Ifiterable
is not specified, a new empty RoaringBitmap is returned. Note that a sorted iterable will significantly speed up the construction.iterable
may be a generator, in which case the generator is consumed incrementally.iterable
may be arange
(Python 3) orxrange
(Python 2) object, which will be constructed efficiently.-
add
(self, uint32_t elem)¶ Unsupported method.
-
clear
(self)¶ Unsupported method.
-
copy
(self)¶ Return a copy of this RoaringBitmap.
-
difference_update
(self, *other)¶ Unsupported method.
-
discard
(self, uint32_t elem)¶ Unsupported method.
-
flip_range
(self, start, stop)¶ Unsupported method.
-
freeze
(self)¶ Already immutable, return self.
-
intersection_update
(self, *bitmaps)¶ Unsupported method.
-
pop
(self)¶ Unsupported method.
-
remove
(self, uint32_t elem)¶ Unsupported method.
-
symmetric_difference_update
(self, other)¶ Unsupported method.
-
update
(self, *bitmaps)¶ Unsupported method.
-
-
class
roaringbitmap.
MultiRoaringBitmap
(list init, filename=None)¶ Bases:
object
A sequence of immutable roaring bitmaps.
Bitmaps are addressed with 32-bit indices. Everything is stored in a single contiguous block of memory.
>>> mrb = MultiRoaringBitmap([ ... RoaringBitmap({0, 1, 2}), ... RoaringBitmap({1, 6, 8}), ... RoaringBitmap({1, 7, 2})]) >>> mrb.intersection(list(range(len(mrb)))) RoaringBitmap({1}) >>> mrb[0] | mrb[1] RoaringBitmap({0, 1, 2, 6, 8})
Parameters: - init – a list of set-like objects (e.g., RoaringBitmaps).
May contain
None
elements, which are treated as empty sets. - filename – if given, result is stored in an mmap’d file. File is overwritten if it already exists.
-
andor_len_pairwise
(self, array indices1, array indices2, array resultand, array resultor)¶ Pairwise intersection/union cardinality for pairs of roaring bitmaps in this collection given by
zip(indices1, indices2)
.Parameters: - indices1 – input array
- indices2 – input array
- resultand – result array
- resultor – result array
All parameters should be Python arrays of type ‘L’, all preallocated with the same length; result arrays need not be initialized.
>>> result1 = array.array('L', [0] * 3) >>> result2 = array.array('L', [0] * 3) >>> mrb.intersection_card_pairwise(array.array('L', [0, 6, 8]), ... array.array('L', [1, 7, 6]), result1, result2) >>> result1 array.array('L', [3, 2, 56]) >>> result2 array.array('L', [6, 4, 123])
-
bufsize
(self)¶ Return size in number of bytes.
-
close
(self)¶ Close opened file, if any.
-
frombuffer
(type cls, data, int offset)¶ Load a MultiRoaringBitmap from a Python object using the buffer interface (e.g. bytes or mmap object), starting at
offset
.
-
fromfile
(type cls, filename)¶ Load a MultiRoaringBitmap from a file using mmap.
-
get
(self, long i)¶ Return bitmap i as an
ImmutableRoaringBitmap
, orNone
if i is an invalid index.
-
getsize
(self, long i)¶
-
intersection
(self, list indices, uint32_t start=0, uint32_t stop=0xffffffff)¶ Compute intersection of given a list of indices of roaring bitmaps in this collection.
Parameters: - start – optional start index.
- stop – optional end index;
if given, only return elements
n
s.t.start <= n < stop
.
Returns: the intersection as a mutable RoaringBitmap. Returns
None
when an invalid index is encountered or an empty result is obtained.
-
jaccard_dist
(self, array indices1, array indices2)¶ Compute the Jaccard distances for pairs of roaring bitmaps in this collection given by
zip(indices1, indices2)
.>>> mrb.jaccard_dist(array.array('L', [0, 6, 8]), ... array.array('L', [1, 7, 6])) array.array('d', [0.3, 0.2, 0.56])
Parameters: - indices1 – input array
- indices2 – input array
Returns: a Python array of floats with the jaccard distances.
indices1
andindices2
should be arrays of unsigned long integers, created witharray.array('L')
. Ensure that all indices i are in the range0 <= i < len(self)
.
-
jaccard_dist_single
(self, RoaringBitmap rb)¶ Compute the Jaccard distances for rb with all roaring bitmaps in this collection.
>>> mrb.jaccard_dist_single(RoaringBitmap([1, 6, 19, 22])) array.array('d', [0.3, 0.2, 0.56])
Parameters: rb – a roaring bitmap. Returns: a Python array of floats with the jaccard distances with length equal to len(self).
- init – a list of set-like objects (e.g., RoaringBitmaps).
May contain