Self-Organizing Distributed Workflow Management

Research Database / FORSCHUNGSDATENBANK

Publication 3177829 | Published

Data Entry: Please note that the research database will be replaced by UNIverse by the end of October 2023. Please enter your data into the system https://universe-intern.unibas.ch. Thanks

Login for users with Unibas email account...

Login for registered users without Unibas email account...

Self-Organizing Distributed Workflow Management

Thesis (Dissertationen, Habilitationen)

ID	3177829
Author	Stojnić, Nenad
Author at UniBasel	Stojnic, Nenad
Year	2015
Title	Self-Organizing Distributed Workflow Management
Type of Thesis	Dissertation
Start of thesis	01.03.2011
End of thesis	15.05.2015
Name of University	University of Basel
Name of Faculty	Philosophisch-Naturwissenschaftliche Fakultät
Supervisor(s) / Fachvertreter/in	Schuldt, Heiko
Abstract	The proliferation of service-oriented architectures in the last decade has brought forward an important class of sophisticated distributed applications that are founded on the idea of composing multiple simple services into a complex, coherent whole. Such applications spanning multiple service invocations can be most effectively realized by means of workflows. When it comes to high performance workflow execution, distribution (outscaling) of services is a key concept and also a very straightforward advantage of the workflow paradigm. Concretely, both the constituent services of the workflow and the system that manages their invocations have to be distributed across an environment of computational devices. In a wide spectrum of applications, that entail heterogeneity of the encompassed computational devices, e.g., modern emergency management, invocations of optimal service instances in conjunction to their reliability are fundamental prerequisites of distributed workflow management. At the center of this thesis is a formal model that defines the distributed (i.e., scalable) execution of workflows. To extend this model for reliability in a novel way, which does not affect the scalability of execution, the Safety-Ring system service is presented. The idea behind Safety-Ring is to offer recovery for a wide range of node failures, which host active services of running workflows. To this end, the Safety-Ring provides a scalable, reliable, and consistent data store that is used for the storage of workflow execution state. The novel failure-recovery mechanism features effective reliability such that can be applied on the nodes that host the Safety-Ring service themselves, thus we say the Safety-Ring is self-healing. To apply the reliable (and distributed) execution model, enhanced by Safety-Ring, to heterogeneous node environments, that are predominantly composed of mobile devices, this thesis presents the Compass data access protocol. In providing scalable data lookup for its maintained data, the Safety-Ring assumes network runtime characteristics which are rather stable, and thus Safety-Ring implicitly optimizes for the number of queried nodes. Especially in mobile applications, where node network connectivity dynamically changes, data access protocols should aim at reducing the overall data lookup latency, rather than the number of queried nodes. Compass introduces latency optimal paths to each node, which dynamically adapt to changing network characteristics. The scalable data lookup of Safety-Ring is not affected. In case distributed execution of workflows spans services of continuous (stateful) type, their reliability is decoupled from the Safety-Ring. Since such service types are predominantly featured by devices of limited resources, novel approaches to resource conservative recovery of failures for continuous services have to be provided. This thesis builds on proven recovery techniques, such as passive-standby, so as to enhance them for redundancy of the continuous state and thus improve the overall reliability of the system. In doing so the redundancy of state is enforced by means of a lightweight, in terms of network overhead, consistency protocol which allows for its application in resource limited node environments. In order to improve the execution performance of distributed workflows, in terms of throughput, this thesis offers a novel concept to services distribution. At the heart of our approach lie decentralized controllers that autonomously perform dynamic reconfiguration of the execution environment in terms of available services. This primarily affects the Safety-Ring service and all application services. Thereby, the goals of the controllers are to prevent workflow execution bottlenecks and unnecessary service deployments that waste resources. Since the controllers are equipped at any node in the system and can affect any other node of the system we say that the distributed workflow execution model is self-optimizing. Finally, all the presented concepts are implemented within the context of the OSIRIS distributed workflow engine and quantitatively evaluated in a series of experiments. The results of experiments confirm the benefits of our concepts for the distributed workflow execution model.
Full Text on edoc

03/05/2024

Research Database / FORSCHUNGSDATENBANK