Where is the Jet Documentation?
How are Hazelcast Jet and Hazelcast IMDG related?
Jet is built on top of the Hazelcast platform, so there is a tight integration between Jet and Hazelcast IMDG.
A JetInstance embeds a Hazelcast instance. The full storage functionality of Hazelcast IMDG is available inside Jet. Hazelcast operations are used for the different actions that can be performed on a job. Jet can also be used with the Hazelcast Client, which uses the Hazelcast Open Binary Protocol to send these actions to the server instance.
Is Jet an Apache project?
Jet isn’t an Apache project; however, it is released under the Apache License 2.0.
Where can I get more help?
For information on support subscriptions, please visit Hazelcast.com.
Can I use Jet interchangeably with any version of Hazelcast IMDG?
No. Hazelcast IMDG is an integral part of each Jet release and cannot be swapped out, because Jet relies on services specific to each individual IMDG release.
Nevertheless, a Jet job can use any remote Hazelcast IMDG cluster as a source or sink.
Can I use Hazelcast IMDG for data ingestion?
Yes, data can be ingested into Jet using the distributed data structures of Hazelcast IMDG.
Data producers can use the Hazelcast Client (available for numerous programming languages) to push data into Hazelcast IMDG.
Using Hazelcast IMDG to ingest data simplifies deployment, because its storage is already available inside every Jet cluster.
For batch (bounded data) processing, Jet comes with IMap, ICache and IList batch connectors that iterate over the entries in order to process them.
Hazelcast Client writes the data to a Hazelcast IMap. When the entire data set is written, Jet reads the data from the IMap and processes the batch.
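The two-phase batch flow described above (write the whole data set first, then process it) can be sketched in plain Java. This is a conceptual sketch only: a `ConcurrentHashMap` stands in for the Hazelcast IMap and worker threads stand in for Hazelcast Client producers; the class and method names are hypothetical and no Hazelcast API is used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Conceptual sketch only: ConcurrentHashMap stands in for a Hazelcast IMap and the
// worker threads stand in for Hazelcast Client producers. No Hazelcast API is used.
class BatchIngestSketch {

    // Phase 1: several producers write disjoint key ranges into the shared map.
    static Map<Integer, Integer> ingest(int producers, int recordsPerProducer) {
        Map<Integer, Integer> store = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        for (int p = 0; p < producers; p++) {
            final int firstKey = p * recordsPerProducer;
            pool.execute(() -> {
                for (int i = 0; i < recordsPerProducer; i++) {
                    store.put(firstKey + i, 1); // every record carries the value 1
                }
            });
        }
        pool.shutdown();
        try {
            // The data set is bounded: wait until every producer has finished writing.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("ingestion did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for producers", e);
        }
        return store;
    }

    // Phase 2: only now iterate over all entries and process them as one batch.
    static long processBatch(Map<Integer, Integer> store) {
        return store.values().stream().mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> store = ingest(4, 1_000);
        System.out.println(processBatch(store)); // prints 4000
    }
}
```

The key point is the ordering: processing starts only after ingestion has completed, which is what makes the input a bounded batch.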
Jet 0.5 will contain a change stream (journal) source for IMap and ICache, so all updates to the IMap/ICache will be streamed directly to Jet.
Values are updated in a Hazelcast IMDG IMap, and each update produces a change event. Hazelcast Jet processes the stream of change events of the IMap.
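The idea of a change stream can be illustrated with a map that appends one event per update. This is a conceptual sketch, not Hazelcast's event journal API: `JournaledMapSketch` and `ChangeEvent` are hypothetical names, and the consumer simply keeps a running sum from the old/new values carried by each event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: a map that records a change event for every update.
// JournaledMapSketch and ChangeEvent are hypothetical names, not Hazelcast's API.
class JournaledMapSketch {

    // One change event: the key plus its value before (null if absent) and after the update.
    static final class ChangeEvent {
        final String key;
        final Integer oldValue;
        final Integer newValue;

        ChangeEvent(String key, Integer oldValue, Integer newValue) {
            this.key = key;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    private final Map<String, Integer> data = new HashMap<>();
    private final List<ChangeEvent> journal = new ArrayList<>();

    // Every update produces exactly one event, appended in update order.
    void put(String key, int value) {
        Integer previous = data.put(key, value);
        journal.add(new ChangeEvent(key, previous, value));
    }

    // A consumer reads the change stream starting at the position it has reached.
    List<ChangeEvent> eventsFrom(int position) {
        return new ArrayList<>(journal.subList(position, journal.size()));
    }

    // Example stream processing: maintain the sum of all map values from the deltas alone.
    static long sumFromEvents(List<ChangeEvent> events) {
        long sum = 0;
        for (ChangeEvent e : events) {
            sum += e.newValue - (e.oldValue == null ? 0 : e.oldValue);
        }
        return sum;
    }
}
```

Because each event carries both the old and the new value, a downstream consumer can maintain derived results incrementally without re-reading the whole map.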
Data streams are ordered sequences of records, similar to append-only logs. Hazelcast IMDG does not currently provide an append-only log structure, so to ingest data of this type, consider using Apache Kafka. Note that Jet comes with an Apache Kafka connector. However, we do intend to provide a distributed append-only in-memory log in future versions of Hazelcast IMDG.
Records are appended to a log in Apache Kafka. Hazelcast Jet streams from Apache Kafka.
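The difference between a key-value map and an append-only log can be shown in a few lines. This is a conceptual sketch only: `AppendOnlyLogSketch` is a hypothetical class, not Kafka's or Hazelcast's API, and the sensor readings are made-up sample data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: contrasts a key-value map with an append-only log.
// AppendOnlyLogSketch is a hypothetical name; it is neither Kafka's nor Hazelcast's API.
class AppendOnlyLogSketch {

    private final List<String> records = new ArrayList<>();

    // Appending never overwrites: each record gets the next offset.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Each reader tracks its own offset and can re-read from any earlier point.
    List<String> readFrom(long offset) {
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        AppendOnlyLogSketch log = new AppendOnlyLogSketch();
        Map<String, String> map = new HashMap<>();
        for (String reading : new String[] {"sensor-1=20", "sensor-1=21"}) {
            log.append(reading);
            String[] kv = reading.split("=");
            map.put(kv[0], kv[1]); // the map keeps only the latest value per key
        }
        System.out.println(log.readFrom(0)); // [sensor-1=20, sensor-1=21]
        System.out.println(map);             // {sensor-1=21}
    }
}
```

The map overwrites the first reading, while the log keeps both in order, which is why an ordered stream of records fits a log better than a map.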
Hazelcast IMDG Computing and Jet Questions
What are the differences between Hazelcast Jet and Hazelcast IMDG Fast-Aggregations?
Hazelcast IMDG has native support for aggregation operations on the contents of its distributed data structures – Fast-Aggregations.
When Hazelcast IMDG is a better fit:
Fast-Aggregations are a good fit for simple operations (count, distinct, sum, avg, min, max, etc.).
Hazelcast IMDG Fast-Aggregations may not be sufficient for operations that group data by key and produce results of size O(keyCount). The architecture of Hazelcast aggregations is not well suited to this use case, although it will still work even for moderately sized results (up to 100 MB, as a ballpark figure).
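The difference in result size between a simple aggregation and a group-by-key aggregation can be illustrated with plain `java.util.stream`. This sketch does not use the Hazelcast aggregation API; the order data and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual sketch using plain java.util.stream, not the Hazelcast aggregation API:
// it only illustrates how the result size differs between the two kinds of aggregation.
class AggregationShapeSketch {

    // Simple aggregation (count, sum, min, max, ...): a single value, whatever the input size.
    static long totalAmount(List<Map.Entry<String, Long>> orders) {
        return orders.stream().mapToLong(Map.Entry::getValue).sum();
    }

    // Group-by-key aggregation: one result entry per distinct key, i.e. O(keyCount).
    static Map<String, Long> amountPerCustomer(List<Map.Entry<String, Long>> orders) {
        return orders.stream().collect(
                Collectors.groupingBy(Map.Entry::getKey, Collectors.summingLong(Map.Entry::getValue)));
    }
}
```

With millions of distinct customers, the second result holds millions of entries, which is the case where the text above suggests Fast-Aggregations start to struggle.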
When Hazelcast Jet is a better fit:
Beyond the numbers quoted above, and whenever something more than a single aggregation step is needed, Jet becomes the preferred choice.
For more information see the docs: Jet Compared with New Aggregations.
What are the differences between Hazelcast Jet and Hazelcast IMDG EntryProcessor?
An Entry Processor is a function that executes your code on a map entry atomically. Instead of calling get and then set, you use it to mutate the map entry by running the logic directly on the JVM where the data resides. This turns the update into a single step, which reduces network hops and provides atomicity. It is intended for fast mutating operations.
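The atomicity aspect can be illustrated with an analogy from the JDK. This sketch uses `ConcurrentHashMap.compute`, not Hazelcast's EntryProcessor API, and runs in a single JVM, so it says nothing about the network-hop savings; the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Analogy only, using the JDK's ConcurrentHashMap rather than Hazelcast's EntryProcessor API.
// It shows the atomicity aspect: one atomic read-modify-write vs. a separate get and put.
class EntryUpdateSketch {

    // get-then-put: two separate steps, so concurrent callers can overwrite each other's increments.
    static void incrementWithGetAndPut(ConcurrentMap<String, Integer> map, String key) {
        Integer current = map.get(key);
        map.put(key, current == null ? 1 : current + 1);
    }

    // compute: the update function runs atomically on the entry, so no increment is lost.
    static void incrementAtomically(ConcurrentMap<String, Integer> map, String key) {
        map.compute(key, (k, v) -> v == null ? 1 : v + 1);
    }

    // Runs updatesPerThread increments on each of `threads` threads and returns the final count.
    static int run(boolean atomic, int threads, int updatesPerThread) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    if (atomic) {
                        incrementAtomically(map, "counter");
                    } else {
                        incrementWithGetAndPut(map, "counter");
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.getOrDefault("counter", 0);
    }

    public static void main(String[] args) {
        System.out.println(run(true, 8, 10_000));  // always 80000
        System.out.println(run(false, 8, 10_000)); // may be less than 80000 (lost updates)
    }
}
```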
Consider using Hazelcast IMDG with…
An Entry Processor that performs bulk mutations of an IMap, where the processing function is fast and involves a single map entry per call.
Consider using Hazelcast Jet with…
Processing that involves multiple entries (aggregations, joins, etc.), involves multiple computation steps that should run in parallel, or where the data source and sink are not a single IMap instance.
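A computation of this kind, a join of two data sets followed by an aggregation, can be sketched with plain `java.util.stream`. This is not Jet's API; it only shows why such a job touches many entries per result and so does not fit a single-entry EntryProcessor call. The `Order` type and sample data are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual sketch with plain java.util.stream, not Jet's API: a two-step computation
// (join, then aggregate) whose results each depend on many entries from two data sets.
class JoinAggregateSketch {

    static final class Order {
        final String customerId;
        final long amount;

        Order(String customerId, long amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Step 1: join each order with its customer's country (inner join: unmatched orders are dropped).
    // Step 2: sum the order amounts per country.
    static Map<String, Long> revenuePerCountry(List<Order> orders, Map<String, String> countryByCustomer) {
        return orders.stream()
                .filter((Order o) -> countryByCustomer.containsKey(o.customerId))
                .collect(Collectors.groupingBy(
                        (Order o) -> countryByCustomer.get(o.customerId),
                        Collectors.summingLong((Order o) -> o.amount)));
    }
}
```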
Which API should I use?
There are multiple Hazelcast Jet APIs available: java.util.stream API, High-Level Pipeline API (available in Jet 0.5), and Jet Core API.
|  | java.util.stream | Pipeline API (High-Level API)* | Core API (DAG API) |
|---|---|---|---|
| Declarative (what) vs. imperative (how) | Declarative | Declarative | Imperative |
| Processing bounded data (batch) |  |  |  |
| Processing unbounded data (streaming) |  |  |  |

* Available in Jet 0.5
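The declarative vs. imperative distinction in the table can be illustrated with a word count written both ways. This is a local, JDK-only sketch: the declarative version uses the standard `java.util.stream` API, whose style Jet's distributed java.util.stream API follows, while the imperative loop is only an analogy for the Core API, where you spell out how each processing step works. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Local, single-JVM sketch of the declarative vs. imperative distinction; it uses only the JDK.
class WordCountStyles {

    // Declarative: state *what* to compute; the stream library decides how to execute it.
    static Map<String, Long> declarative(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Imperative: spell out *how* to compute it, step by step.
    static Map<String, Long> imperative(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }
}
```

Both methods return the same counts; the difference is how much of the execution strategy the code has to state explicitly.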
Does Jet support Apache Beam?
No. The performance of Jet is based on optimizations that wouldn’t be available via Beam (for example, Beam assumes a fully general, opaque window assignment policy). Therefore, we chose to build our own Pipeline API instead of Beam as the primary high-level API.
However, depending on user demand, we may add Beam support in the future.
Does Jet support SQL?
No, SQL involves many additional layers beyond the obvious (select from input stream, insert to output stream), and since we are focused on a Java programming audience, we have no plans at this point to add SQL support.
Features and Roadmap Questions
What is the unique value of Jet?
- Performance – For both stream (unbounded data) and batch (bounded data) processing, Jet is able to process data with an impressive throughput capacity, with very low latency even under increasing load. See the benchmarks.
- Integration with IMDG – IMDG provides scalable in-memory data storage to be used during processing (as source, sink, operational storage for cached/temporary data). Hazelcast IMDG and Jet are engineered to work together for high performance.
- Simplicity – Jet is simple to deploy as it shares the lower stack with Hazelcast IMDG. Jet is application-embeddable for OEMs and microservices, making it easier for manufacturers to build and maintain next-generation systems.
What is the current focus of Jet?
The High-Level API (Pipelines), processing guarantees for stream processing, and high-performance Hazelcast integrations.
Are Jet nodes highly-available?
In its current version, Hazelcast Jet can detect a member failure, report it and abort the running jobs. In 0.5, Jet will create regular snapshots and restart processing from the last snapshot in case of member failure.
Is Jet at-least-once or exactly-once?
Jet 0.4 guarantees neither.
Jet 0.5 will provide at-least-once, exactly-once and no guarantee (best effort), based on the user configuration.
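Why restoring from a snapshot matters for these guarantees can be shown with a toy streaming sum. This is a conceptual sketch, not Jet's snapshotting mechanism: the job state is a hypothetical `Snapshot` holding the next source offset and a running total. Restoring the offset and the state from the same snapshot gives an exactly-once result, while replaying records into state that already contains them illustrates one way at-least-once (duplicated) results can arise.

```java
// Conceptual sketch only, not Jet's snapshotting mechanism: a streaming sum whose
// state is the next source offset plus the running total.
class SnapshotSketch {

    // A consistent snapshot: the offset to resume reading from and the sum up to that offset.
    static final class Snapshot {
        final int offset;
        final long sum;

        Snapshot(int offset, long sum) {
            this.offset = offset;
            this.sum = sum;
        }
    }

    // Processes records[start.offset .. end) on top of the given state.
    static Snapshot process(int[] records, Snapshot start, int end) {
        long sum = start.sum;
        for (int i = start.offset; i < end; i++) {
            sum += records[i];
        }
        return new Snapshot(end, sum);
    }

    public static void main(String[] args) {
        int[] records = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        Snapshot snapshot = process(records, new Snapshot(0, 0), 4); // snapshot after 4 records
        Snapshot atFailure = process(records, snapshot, 7);          // 3 more records, then a failure

        // Exactly-once: restore offset and state together, then replay from the snapshot offset.
        Snapshot exactlyOnce = process(records, snapshot, records.length);

        // Duplicates: replay from the snapshot offset into state that already saw values 5, 6 and 7.
        Snapshot duplicated = process(records, new Snapshot(snapshot.offset, atFailure.sum), records.length);

        System.out.println(exactlyOnce.sum); // 55
        System.out.println(duplicated.sum);  // 73 (values 5, 6 and 7 counted twice)
    }
}
```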
Is it possible to rescale a Jet cluster?
A cluster can be dynamically rescaled. The change in the cluster topology affects new jobs and the embedded in-memory storage. Running jobs aren’t affected, however these running jobs won’t make use of the expanded cluster. In the future, we plan to allow running jobs to take advantage of an increased cluster for computation.
When is Jet 1.0 expected?
We plan to introduce API stability from the release of Jet 1.0 onwards. We expect this to happen in early 2018.
Releasing the 0.x versions gives the Jet team more flexibility in changing both the API and the internals.
All the features released to date are of production quality. It’s recommended to use Jet in production if the features available fit your use case.