Microservices orchestration - Zeebe

Zeebe is an open-source microservice orchestration engine, implemented in a highly scalable, highly performant, and fault-tolerant way using state-of-the-art, proven technology building blocks:

  • Using the Raft protocol for consensus and replication

  • Using RocksDB, an embeddable high-performance key-value store, for persistence

  • All process instances are auditable, and audit records can be sent to Elasticsearch or Kafka

  • Using an efficient, high-performance binary protocol (gRPC) for clients to connect and receive tasks for execution (a minimal connection sketch follows this list)
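
The sketch below shows how a client connects to the engine over this gRPC protocol, using the official Zeebe Java client (package io.camunda.zeebe.client in recent releases; older releases use io.zeebe.client). The gateway address and plaintext setting are illustrative assumptions for a local setup.

```java
import io.camunda.zeebe.client.ZeebeClient;

public class GatewayConnection {
    public static void main(String[] args) {
        // Default gateway port for a local broker; use TLS and real addresses in production.
        try (ZeebeClient client = ZeebeClient.newClientBuilder()
                .gatewayAddress("localhost:26500")
                .usePlaintext()
                .build()) {
            // Request the cluster topology to confirm the gRPC connection works.
            client.newTopologyRequest().send().join()
                  .getBrokers()
                  .forEach(broker -> System.out.println(broker.getAddress()));
        }
    }
}
```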

Important features

  • Timers survive node failures without interruption

  • Correlation of incoming messages to existing workflow instances across the cluster nodes (see the message-publishing sketch after this list)

  • Capability to handle non-correlating cases - important for flows that cross payment hub engines

  • Handling of failover situations without interruption (with 3 nodes, 1 can fail; with 5 nodes, 2 can fail) - the Raft protocol provides consensus, “leadership” is handed over when a node fails, and state is replicated
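
As an illustration of message correlation, the sketch below publishes a message that the cluster correlates to whichever workflow instance is waiting on the same correlation key, regardless of which node owns that instance. The message name, correlation key, variables, and time-to-live are assumptions for this example; a message published with a time-to-live is buffered if no instance is waiting yet and dropped once the time-to-live expires without a match.

```java
import io.camunda.zeebe.client.ZeebeClient;

import java.time.Duration;
import java.util.Map;

public class PaymentMessagePublisher {

    // Publishes a message that the engine correlates to the workflow instance
    // waiting on the same correlation key, wherever in the cluster it lives.
    public static void publishPaymentConfirmation(ZeebeClient client, String transferId) {
        client.newPublishMessageCommand()
              .messageName("payment-confirmed")      // must match the BPMN message catch event
              .correlationKey(transferId)            // the key the instance is waiting on
              .timeToLive(Duration.ofMinutes(5))     // buffered if no instance is waiting yet
              .variables(Map.of("status", "CONFIRMED"))
              .send()
              .join();
    }
}
```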

Focus on system integrators

  • BPMN lets the implementation team focus on the business flow changes inside the given DFSP, making adoption and customization much easier

  • Multi-language support for component implementation lets the implementation team create the necessary connector components in any of the supported languages (e.g. when integration with multiple AMSes is required), while the engine orchestrates the steps, using the gRPC protocol for efficient communication (see the worker sketch after this list)

  • Connector components are stateless, which keeps their implementation easy and simple

  • No additional components are required in the real-time engines and there are no databases to maintain: all active process state is stored in the embedded RocksDB data store. Audit logs and other events are sent to Kafka clusters before being persisted

  • Scalability - using partitioning (sharding), the system scales horizontally to create thousands of workflow instances per second

  • Fault tolerance - the system recovers from failures without data loss, and replication defines the number of copies kept of every instance. In case of a failure, “leader election” takes place for the affected partitions so that operations continue uninterrupted (using the Raft consensus and replication protocol)
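
To make the connector model concrete, here is a sketch of a stateless connector component implemented as a job worker with the Java client; any supported client language works the same way. The job type "ams-lookup" and the variable names are hypothetical placeholders for an AMS integration.

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.worker.JobWorker;

import java.util.Map;

public class AmsLookupWorker {
    public static void main(String[] args) {
        ZeebeClient client = ZeebeClient.newClientBuilder()
                .gatewayAddress("localhost:26500")
                .usePlaintext()
                .build();

        // The worker polls the gateway over gRPC for jobs of the given type.
        // It keeps no local state, so any number of replicas can run in parallel.
        JobWorker worker = client.newWorker()
                .jobType("ams-lookup")                 // hypothetical task type from the BPMN model
                .handler((jobClient, job) -> {
                    // Call the external AMS here, then report the result back to the engine.
                    Map<String, Object> result = Map.of("accountStatus", "ACTIVE");
                    jobClient.newCompleteCommand(job.getKey())
                             .variables(result)
                             .send()
                             .join();
                })
                .open();

        // In a real service, keep the process alive and close the worker and client on shutdown.
    }
}
```

Because the worker only reports results back through the client API and holds no state of its own, scaling a connector out is simply a matter of running more copies of the same process.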
