Tuesday, September 26, 2017

Probabilistic Data Structures and Stream Data Processing

The scale of modern data and the shift to stream processing have given rise to many interesting data structures and algorithms.

Here are some good resources that cover these topics:

"Some Important Streaming Algorithms You Should Know About" by Ted Dunning, which covers several essential data structures and algorithms

"Probabilistic Data Structures for Web Analytics and Data Mining", a really nice overview from the Highly Scalable Blog

A comprehensive list of the papers, presentations and talks by debasishg

Stanford CS369G: Algorithmic Techniques for Big Data

Stanford CS168: The Modern Algorithmic Toolbox

MIRI Seminar on Data Streams (Spring 2015 Edition)

Counting Items, Cardinality

- HashSet
- Linear Probabilistic Counter
- LogLog and HyperLogLog
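
To make the step from an exact HashSet to an approximate counter concrete, here is a minimal sketch of a linear probabilistic counter. It hashes each item to one position in a fixed bitmap and estimates the number of distinct items from the fraction of positions still empty, using n ≈ -m * ln(V). The class name, the default bitmap size, and the use of SHA-256 as the hash are assumptions made for illustration, not taken from any of the resources above.

```python
import hashlib
import math


class LinearCounter:
    """Illustrative linear probabilistic counter (hypothetical helper)."""

    def __init__(self, m=1024):
        self.m = m                  # number of bitmap positions
        self.bits = bytearray(m)    # one byte per position, for clarity

    def add(self, item):
        # Hash the item to a single position and mark it as seen.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % self.m
        self.bits[idx] = 1

    def estimate(self):
        zeros = self.m - sum(self.bits)
        if zeros == 0:
            # Bitmap is saturated; the estimator is undefined here.
            return float("inf")
        # n is approximately -m * ln(V), with V the fraction of empty positions.
        return -self.m * math.log(zeros / self.m)


counter = LinearCounter(m=4096)
for _ in range(2):                  # duplicates do not change the bitmap
    for i in range(500):
        counter.add(f"user-{i}")
print(round(counter.estimate()))    # should land close to 500
```

The bitmap size is fixed up front, so memory does not grow with the stream; the price is that accuracy degrades as the bitmap fills, which is the gap that LogLog and HyperLogLog are designed to address.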

Frequency Estimate, Top K

- Count Min Sketch
- Count Mean Min Sketch
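
As a rough illustration of frequency estimation, the following is a minimal Count Min Sketch. Each item increments one counter per row, and the estimate is the minimum of those counters, so collisions can only push the answer up, never down. The class name, the default width and depth, and the row-salted SHA-256 hashing are assumptions chosen to keep the example self-contained.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count Min Sketch (hypothetical helper)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Prefix the row number so each row behaves like a different hash.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a", "b"]:
    sketch.add(word)
print(sketch.estimate("a"))  # at least 3; larger only if every row collides
```

A Top K tracker is often built by pairing such a sketch with a small heap of the current heaviest items; that part is left out here.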

Membership Query

- Bloom Filter
- Cuckoo Filter
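
For membership queries, a minimal Bloom filter can be sketched as follows. Adding an item sets a few bit positions derived from its hash, and a lookup reports "possibly present" only if all of those bits are set, which means false positives are possible but false negatives are not. The class name, the default sizes, and the double-hashing scheme built on SHA-256 are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical helper)."""

    def __init__(self, num_bits=8192, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array

    def _positions(self, item):
        # Derive k positions from two 64-bit halves of one digest
        # (double hashing); forcing h2 odd keeps the step non-zero.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("alice@example.com")
print("alice@example.com" in seen)  # True: added items are always found
```

This basic form does not support deletion, since clearing a bit could affect other items that share it; that limitation is one of the reasons a Cuckoo Filter is sometimes preferred.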

Percentile and Quantile

- Q-digest
- t-digest