Real-time Stream Processing
Hazelcast Jet and Stream Processing
Data streams are potentially unbounded sequences of records. In the real world, the records usually represent events or changes that happen over time. Applications connected to such a stream observe the flowing records and extract information from them. In effect, they query the stream for relevant data.
Jet is a tool for doing such querying in a scalable and reliable way. It does so with distributed, parallel computations called Jet Jobs.
On top of that, Jet embeds Hazelcast IMDG, which can be used to:
- Store the operational results of the computation in an in-memory NoSQL key-value store
- Cache the data needed for record enrichment, pre-processing and data cleaning
- Provide in-memory messaging and notifications
Hazelcast Jet gives you the tooling to build such a streaming application: a powerful processing framework to "query" the data stream and elastic in-memory storage to hold the results of the computation.
Fast Big Data
The value of the information in a data stream decreases rapidly as the data gets older. The faster the information is extracted from the stream and delivered to consumers, the better. Streams of online trades, system log records, IoT sensor updates or e-shop orders are examples. In these use cases, processing the data fast is as important as processing large volumes of data.
Jet is built on top of a one-record-at-a-time streaming core (also known as continuous operators). Each incoming record is processed as soon as possible, as opposed to accumulating records into micro-batches.
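To make the difference concrete, here is a minimal plain-Java sketch. It does not use the Jet API; the class and method names are hypothetical and exist only for illustration. Each method returns what a downstream consumer would have received after the given records arrived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: plain Java, not the Jet API. All names are hypothetical.
final class RecordAtATimeSketch {

    /** Continuous operator: every record is handed downstream as soon as it arrives. */
    static <T> List<List<T>> perRecord(List<T> arrivals) {
        List<List<T>> emitted = new ArrayList<>();
        for (T record : arrivals) {
            emitted.add(Collections.singletonList(record));
        }
        return emitted;
    }

    /** Micro-batching: records wait in a buffer until batchSize of them have arrived. */
    static <T> List<List<T>> microBatch(List<T> arrivals, int batchSize) {
        List<List<T>> emitted = new ArrayList<>();
        List<T> buffer = new ArrayList<>();
        for (T record : arrivals) {
            buffer.add(record);
            if (buffer.size() == batchSize) {
                emitted.add(buffer);
                buffer = new ArrayList<>();
            }
        }
        // Records still in the buffer have not reached downstream yet.
        return emitted;
    }

    public static void main(String[] args) {
        List<String> arrivals = Arrays.asList("trade-1", "trade-2", "trade-3");
        System.out.println(perRecord(arrivals).size());     // prints 3
        System.out.println(microBatch(arrivals, 4).size()); // prints 0
    }
}
```

With three records and a batch size of four, the per-record variant has already delivered all three, while the micro-batch variant has delivered nothing. That waiting time is the latency a record-at-a-time core avoids.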
Fast processing can be complemented by using the Hazelcast in-memory data grid to publish the results, which keeps end-to-end latency low.
Jet processing tasks, called Jobs, are distributed across the Jet cluster to parallelize the computation. This lets Jet scale out to process large data volumes.
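One common way to spread a computation across workers is to route records by a hash of their key, so that all records with the same key land on the same worker. The sketch below shows that general idea in plain Java. It is an assumption made for illustration, not a description of how Jet assigns work internally, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: generic hash partitioning, not Jet's actual scheme.
final class KeyPartitioningSketch {

    /** Picks a worker index in [0, workerCount) from the key's hash code. */
    static int workerFor(String key, int workerCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), workerCount);
    }

    /** Routes each key to the list of the worker responsible for it. */
    static List<List<String>> distribute(List<String> keys, int workerCount) {
        List<List<String>> perWorker = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            perWorker.add(new ArrayList<>());
        }
        for (String key : keys) {
            perWorker.get(workerFor(key, workerCount)).add(key);
        }
        return perWorker;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("sensor-a", "sensor-b", "sensor-a");
        System.out.println(distribute(keys, 2));
    }
}
```

Because the routing depends only on the key, each worker can keep per-key state locally, and adding workers spreads the keys over more machines.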
Hazelcast Jet Moves Big Data Processing to Real-Time
Jet is a good fit for applications where the real-time value of Big Data is a top priority.
- Log analysis
- Fraud detection
- Anomaly detection (IoT systems, sensors)
- Fast business insights
- Cleaning the data for downstream processing (filtering, modifying, normalising, enriching)
- Real-time ad placement
- Real-time recommendations
- Online gaming stats
Typical Processing Tasks
- Implementing change data capture (CDC)
- Moving batch tasks to near real-time
- Extracting information from the data stream
- Algorithmic analysis of the stream data
- Collecting stats from stream data (including aggregations like sums and averages)
- Joining multiple streams
- Enriching a stream with additional information
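Two of the tasks above, enrichment and collecting stats, can be sketched in a few lines of plain Java. This is not the Jet API: the lookup map merely stands in for reference data cached in memory, and the class, the sensor IDs and the locations are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: plain Java, not the Jet API. All names are hypothetical.
final class EnrichAndAggregateSketch {

    /** Running count and sum for one group, updated one record at a time. */
    private static final class Stats {
        long count;
        double sum;
    }

    // Reference data used for enrichment: sensor ID -> location.
    private final Map<String, String> sensorLocations;
    // Aggregation state: location -> running stats.
    private final Map<String, Stats> statsByLocation = new HashMap<>();

    EnrichAndAggregateSketch(Map<String, String> sensorLocations) {
        this.sensorLocations = sensorLocations;
    }

    /** Handles one record: enrich it with its location, then update the aggregate. */
    void onRecord(String sensorId, double value) {
        String location = sensorLocations.getOrDefault(sensorId, "unknown");
        Stats stats = statsByLocation.computeIfAbsent(location, k -> new Stats());
        stats.count++;
        stats.sum += value;
    }

    long count(String location) {
        Stats stats = statsByLocation.get(location);
        return stats == null ? 0 : stats.count;
    }

    double average(String location) {
        Stats stats = statsByLocation.get(location);
        return stats == null || stats.count == 0 ? 0.0 : stats.sum / stats.count;
    }

    public static void main(String[] args) {
        Map<String, String> locations = new HashMap<>();
        locations.put("s1", "hall");
        locations.put("s2", "hall");
        EnrichAndAggregateSketch job = new EnrichAndAggregateSketch(locations);
        job.onRecord("s1", 10.0);
        job.onRecord("s2", 20.0);
        System.out.println(job.average("hall")); // prints 15.0
    }
}
```

The aggregate is current after every record, so a consumer reading it never waits for a batch to finish.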
Jet in 5 minutes