The scale and the new demands of stream processing have given rise to many interesting data structures and algorithms.
Here are some good resources that cover these topics:
"Some Important Streaming Algorithms You Should Know About" by Ted Dunning, which covers several essential data structures and algorithms
"Probabilistic Data Structures for Web Analytics and Data Mining", a really nice overview from the Highly Scalable Blog
A comprehensive list of papers, presentations, and talks by debasishg
Stanford CS369G: Algorithmic Techniques for Big Data
Stanford CS168: The Modern Algorithmic Toolbox
MIRI Seminar on Data Streams (Spring 2015 Edition)
Counting Items, Cardinality
- HashSet
- Linear Probabilistic Counter
- LogLog and HyperLogLog
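To make the cardinality idea concrete, here is a minimal sketch of a linear probabilistic counter in Python. It is an illustration under stated assumptions, not a reference implementation: the class name `LinearCounter`, the default size `m`, and the choice of SHA-256 as the hash are all picked for the example. Each item is hashed to one slot of a bitmap, and the number of distinct items is estimated from the fraction of slots still empty as n ≈ -m · ln(zeros / m).

```python
import hashlib
import math


class LinearCounter:
    """Illustrative linear probabilistic counter (names and sizes are assumptions)."""

    def __init__(self, m=4096):
        self.m = m
        # One byte per slot for readability; a real bitmap would pack bits.
        self.bits = bytearray(m)

    def add(self, item):
        # Hash the item to a single slot and mark that slot as occupied.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % self.m
        self.bits[idx] = 1

    def estimate(self):
        # Estimate the distinct count from the share of slots still empty.
        zeros = self.m - sum(self.bits)
        if zeros == 0:
            return float("inf")  # bitmap saturated; m was too small
        return -self.m * math.log(zeros / self.m)


counter = LinearCounter(m=4096)
for _ in range(2):              # duplicates do not change the bitmap
    for i in range(500):
        counter.add(f"user-{i}")
print(round(counter.estimate()))  # an estimate close to 500, not exact
```

Compared with a HashSet, the memory is fixed at `m` slots regardless of how many items arrive, at the cost of an approximate answer. LogLog and HyperLogLog push the same trade-off further by tracking hash bit patterns instead of a plain bitmap.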
Frequency Estimate, Top K
- Count Min Sketch
- Count Mean Min Sketch
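As a rough illustration of frequency estimation, the following is a small Count Min Sketch in Python. The class name `CountMinSketch`, the `width` and `depth` defaults, and the use of SHA-256 salted with the row number are assumptions made for this sketch. Every item increments one counter per row, and the estimate is the minimum over those counters, so it can overcount because of collisions but never undercounts.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count Min Sketch (parameters are assumptions)."""

    def __init__(self, width=272, depth=5):
        self.width = width
        self.depth = depth
        # depth rows of width counters, all starting at zero.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number to get one hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a", "b", "a"]:
    sketch.add(word)
print(sketch.estimate("a"))  # at least 4, the true count of "a"
```

A Top K list can be maintained alongside such a sketch by keeping a small heap of the items with the largest estimates. The Count Mean Min variant additionally subtracts an estimate of the collision noise from each row before combining them.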
Membership Query
- Bloom Filter
- Cuckoo Filter
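For membership queries, a minimal Bloom filter can be sketched as follows. The class name `BloomFilter`, the defaults `m` and `k`, and the salted SHA-256 hashing are assumptions for illustration only. Adding an item sets `k` slots in a bitmap, and a lookup reports the item as possibly present only if all `k` slots are set, which allows false positives but no false negatives.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (sizes and hash choice are assumptions)."""

    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        # One byte per slot for readability; a real filter would pack bits.
        self.bits = bytearray(m)

    def _indexes(self, item):
        # Derive k slot positions by salting the hash with the function number.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[idx] for idx in self._indexes(item))


seen = BloomFilter()
seen.add("apple")
seen.add("banana")
print("apple" in seen)   # True: added items are never reported absent
print("cherry" in seen)  # usually False, but a false positive is possible
```

A plain Bloom filter like this one does not support deletion, because clearing a slot could remove evidence of other items. Supporting deletion is one of the reasons to consider a Cuckoo filter instead.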
Percentile and Quantile
- Q-digest
- t-digest