RoaringBitmap API documentation¶
Roaring bitmap in Cython.
A Roaring bitmap stores a set of 32 bit integers compactly while allowing for
efficient set operations. The space of integers is partitioned into blocks
of 2 ** 16 integers. The representation for a block depends on the number
of elements it contains:
- <= 4096 elements:
- an array of up to
1 << 1216-bit integers that are part of the set. - >= 61140 elements:
- an array of up to
1 << 1216-bit integers that are not part of the set. - otherwise:
- a fixed bitmap of
1 << 16(65536) bits with a 1-bit for each element.
A RoaringBitmap can be used as a replacement for a mutable
Python set containing unsigned 32-bit integers:
>>> from roaringbitmap import RoaringBitmap
>>> RoaringBitmap(range(10)) & RoaringBitmap(range(5, 15))
RoaringBitmap({5, 6, 7, 8, 9})
ImmutableRoaringBitmap is an immutable variant (analogous to frozenset)
which is stored compactly as a contiguous block of memory.
MultiRoaringBitmap stores a sequence of immutable roaring bitmaps
in an efficiently serializable, contiguous block of memory.
-
class
roaringbitmap.RoaringBitmap(iterable=None)¶ Bases:
objectA compact, mutable set of 32-bit integers.
Return a new RoaringBitmap with elements from
iterable.The elements
xof a RoaringBitmap must be0 <= x < 2 ** 32. Ifiterableis not specified, a new empty RoaringBitmap is returned. Note that a sorted iterable will significantly speed up the construction.iterablemay be a generator, in which case the generator is consumed incrementally.iterablemay be arange(Python 3) orxrange(Python 2) object, which will be constructed efficiently.-
add(self, uint32_t elem)¶ Add an element to the set.
This has no effect if the element is already present.
-
clamp(self, uint32_t start, uint32_t stop)¶ Return new set with range of values restricted to
(start, stop).
-
clear(self)¶ Remove all elements from this RoaringBitmap.
-
copy(self)¶ Return a copy of this RoaringBitmap.
-
debuginfo(self, verbose=False)¶ Return a string describing the internal representation of this set.
-
difference(self, *other)¶ Return the difference of two or more sets as a new RoaringBitmap.
(i.e, self - other[0] - other[1] - …)
-
difference_update(self, *other)¶ Remove all elements of other RoaringBitmaps from this one.
-
discard(self, uint32_t elem)¶ Remove an element from the set if it is a member.
If the element is not a member, do nothing.
-
flip_range(self, uint32_t start, uint32_t stop)¶ In-place negation for range(start, stop).
-
freeze(self)¶ Return an immutable copy of this RoaringBitmap.
-
index(self, uint32_t x)¶ Return the 0-based index of x in this set.
Equivalent to
sorted(self).index(x).
-
intersection(self, *other)¶ Return the intersection of two or more sets as a new RoaringBitmap.
(i.e. elements that are common to all of the sets.)
-
intersection_len(self, other)¶ Return the cardinality of the intersection.
Optimized version of
len(self & other).
-
intersection_update(self, *other)¶ Intersect this set in-place with one or more
RoaringBitmapobjects.NB: since range objects are recognized by the constructor, this provides an efficient way to restrict the set to a range of elements:
>>> rb = RoaringBitmap(range(5)) >>> rb.intersection_update(range(3, 7)) >>> rb RoaringBitmap({3, 4})
-
isdisjoint(self, other)¶ Return True if two RoaringBitmaps have a null intersection.
-
issubset(self, other)¶ Report whether another set contains this RoaringBitmap.
-
issuperset(self, other)¶ Report whether this RoaringBitmap contains another set.
-
jaccard_dist(self, other)¶ Return the Jaccard distance.
Optimized version of
1 - len(self & other) / len(self | other). Counts of union and intersection are performed simultaneously.
-
max(self)¶ Return largest element in this RoaringBitmap.
-
min(self)¶ Return smallest element in this RoaringBitmap.
-
numelem(self)¶ Return total number of uint16_t elements stored.
-
pop(self)¶ Remove and return the largest element.
-
rank(self, uint32_t x)¶ Return the number of elements
<= xthat are in this set.
-
remove(self, uint32_t elem)¶ Remove an element from the set; it must be a member.
If the element is not a member, raise a KeyError.
-
select(self, int i)¶ Return the ith element that is in this set.
Parameters: i – a 0-based index.
-
symmetric_difference(self, other)¶ Return the symmetric difference of two sets as a new RoaringBitmap.
(i.e. all elements that are in exactly one of the sets.)
-
symmetric_difference_update(self, other)¶ Update set to symmetric difference of itself and another.
-
union(self, *other)¶ Return the union of two or more sets as a new set.
(i.e. all elements that are in at least one of the sets.)
-
union_len(self, other)¶ Return the cardinality of the union.
Optimized version of
len(self | other).
-
update(self, *other)¶ In-place union update of this RoaringBitmap.
With one argument, add items from the iterable to this set; with more arguments: add the union of given
RoaringBitmapobjects.NB: since range objects are recognized by the constructor, this provides an efficient way to set a range of bits:
>>> rb = RoaringBitmap(range(5)) >>> rb.update(range(3, 7)) >>> rb RoaringBitmap({0, 1, 2, 3, 4, 5, 6})
-
-
class
roaringbitmap.ImmutableRoaringBitmap(iterable=None)¶ Bases:
roaringbitmap.RoaringBitmapA roaring bitmap that does not allow mutation operations.
Any operation resulting in a new roaring bitmap is returned as a mutable RoaringBitmap (except for
freeze()and theImmutableRoaringBitmapconstructor). Stores data in one contiguous block of memory for efficient serialization.Return a new RoaringBitmap with elements from
iterable.The elements
xof a RoaringBitmap must be0 <= x < 2 ** 32. Ifiterableis not specified, a new empty RoaringBitmap is returned. Note that a sorted iterable will significantly speed up the construction.iterablemay be a generator, in which case the generator is consumed incrementally.iterablemay be arange(Python 3) orxrange(Python 2) object, which will be constructed efficiently.-
add(self, uint32_t elem)¶ Unsupported method.
-
clear(self)¶ Unsupported method.
-
copy(self)¶ Return a copy of this RoaringBitmap.
-
difference_update(self, *other)¶ Unsupported method.
-
discard(self, uint32_t elem)¶ Unsupported method.
-
flip_range(self, start, stop)¶ Unsupported method.
-
freeze(self)¶ Already immutable, return self.
-
intersection_update(self, *bitmaps)¶ Unsupported method.
-
pop(self)¶ Unsupported method.
-
remove(self, uint32_t elem)¶ Unsupported method.
-
symmetric_difference_update(self, other)¶ Unsupported method.
-
update(self, *bitmaps)¶ Unsupported method.
-
-
class
roaringbitmap.MultiRoaringBitmap(list init, filename=None)¶ Bases:
objectA sequence of immutable roaring bitmaps.
Bitmaps are addressed with 32-bit indices. Everything is stored in a single contiguous block of memory.
>>> mrb = MultiRoaringBitmap([ ... RoaringBitmap({0, 1, 2}), ... RoaringBitmap({1, 6, 8}), ... RoaringBitmap({1, 7, 2})]) >>> mrb.intersection(list(range(len(mrb)))) RoaringBitmap({1}) >>> mrb[0] | mrb[1] RoaringBitmap({0, 1, 2, 6, 8})
Parameters: - init – a list of set-like objects (e.g., RoaringBitmaps).
May contain
Noneelements, which are treated as empty sets. - filename – if given, result is stored in an mmap’d file. File is overwritten if it already exists.
-
andor_len_pairwise(self, array indices1, array indices2, array resultand, array resultor)¶ Pairwise intersection/union cardinality for pairs of roaring bitmaps in this collection given by
zip(indices1, indices2).Parameters: - indices1 – input array
- indices2 – input array
- resultand – result array
- resultor – result array
All parameters should be Python arrays of type ‘L’, all preallocated with the same length; result arrays need not be initialized.
>>> result1 = array.array('L', [0] * 3) >>> result2 = array.array('L', [0] * 3) >>> mrb.intersection_card_pairwise(array.array('L', [0, 6, 8]), ... array.array('L', [1, 7, 6]), result1, result2) >>> result1 array.array('L', [3, 2, 56]) >>> result2 array.array('L', [6, 4, 123])
-
bufsize(self)¶ Return size in number of bytes.
-
close(self)¶ Close opened file, if any.
-
frombuffer(type cls, data, int offset)¶ Load a MultiRoaringBitmap from a Python object using the buffer interface (e.g. bytes or mmap object), starting at
offset.
-
fromfile(type cls, filename)¶ Load a MultiRoaringBitmap from a file using mmap.
-
get(self, long i)¶ Return bitmap i as an
ImmutableRoaringBitmap, orNoneif i is an invalid index.
-
getsize(self, long i)¶
-
intersection(self, list indices, uint32_t start=0, uint32_t stop=0xffffffff)¶ Compute intersection of given a list of indices of roaring bitmaps in this collection.
Parameters: - start – optional start index.
- stop – optional end index;
if given, only return elements
ns.t.start <= n < stop.
Returns: the intersection as a mutable RoaringBitmap. Returns
Nonewhen an invalid index is encountered or an empty result is obtained.
-
jaccard_dist(self, array indices1, array indices2)¶ Compute the Jaccard distances for pairs of roaring bitmaps in this collection given by
zip(indices1, indices2).>>> mrb.jaccard_dist(array.array('L', [0, 6, 8]), ... array.array('L', [1, 7, 6])) array.array('d', [0.3, 0.2, 0.56])
Parameters: - indices1 – input array
- indices2 – input array
Returns: a Python array of floats with the jaccard distances.
indices1andindices2should be arrays of unsigned long integers, created witharray.array('L'). Ensure that all indices i are in the range0 <= i < len(self).
-
jaccard_dist_single(self, RoaringBitmap rb)¶ Compute the Jaccard distances for rb with all roaring bitmaps in this collection.
>>> mrb.jaccard_dist_single(RoaringBitmap([1, 6, 19, 22])) array.array('d', [0.3, 0.2, 0.56])
Parameters: rb – a roaring bitmap. Returns: a Python array of floats with the jaccard distances with length equal to len(self).
- init – a list of set-like objects (e.g., RoaringBitmaps).
May contain