QoSFlow: Ensuring Service Quality of Distributed Workflows Using Interpretable Sensitivity Models

New research partitions workflow configuration spaces to help meet Quality of Service constraints, with recommendations that perform 27.38% better than the best standard heuristic.

Deep Dive

A research team led by Md Hasanur Rashid and five collaborators has introduced QoSFlow, a novel performance modeling method designed to help distributed scientific workflows meet Quality of Service (QoS) constraints. As distributed computing environments grow more complex, meeting QoS constraints such as an execution-time limit or a restriction to a subset of resources has become critical, yet it is difficult because workflow behavior is hard to predict. QoSFlow addresses this by partitioning a workflow's execution configuration space into regions with similar performance characteristics. Schedulers can then reason systematically about QoS without exhaustively testing every possible configuration. Because its sensitivity models are interpretable, QoSFlow also shows why certain configurations perform better, which makes it particularly valuable for scientific and enterprise applications where predictable performance is essential.
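
To make the partitioning idea concrete, here is a minimal Python sketch under stated assumptions; it is not QoSFlow's implementation. The `Config` fields, the `toy_runtime` cost model, and the helpers `partition_by_runtime` and `recommend` are all hypothetical, and fixed-width runtime bands stand in for the sensitivity-based regions the paper describes. A QoS query (a deadline plus a restriction to certain storage tiers) is answered by scanning regions from fastest to slowest instead of testing configurations one by one.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """Hypothetical execution configuration for a workflow."""
    nodes: int
    tasks_per_node: int
    storage: str  # "ssd" or "hdd" in this toy example


def toy_runtime(c: Config) -> float:
    """Hypothetical cost model: compute scales with total tasks, storage adds I/O cost."""
    io_cost = 20.0 if c.storage == "hdd" else 5.0
    return 1200.0 / (c.nodes * c.tasks_per_node) + io_cost


def partition_by_runtime(runtimes: dict, width: float) -> dict:
    """Group configurations whose runtimes fall in the same fixed-width band.

    Each band index stands in for one performance "region".
    """
    regions: dict = {}
    for cfg, t in runtimes.items():
        regions.setdefault(int(t // width), []).append(cfg)
    return regions


def recommend(regions: dict, width: float, deadline: float, allowed_storage: set) -> list:
    """Return allowed configurations from the fastest region that meets the deadline."""
    for band in sorted(regions):
        if (band + 1) * width > deadline:
            break  # this band (and every slower one) may exceed the deadline
        fits = [c for c in regions[band] if c.storage in allowed_storage]
        if fits:
            return fits
    return []


configs = [Config(n, t, s) for n, t, s in product((1, 2, 4), (4, 8), ("ssd", "hdd"))]
runtimes = {c: toy_runtime(c) for c in configs}
regions = partition_by_runtime(runtimes, width=50.0)
print(recommend(regions, 50.0, deadline=120.0, allowed_storage={"hdd"}))
```

Every configuration in a band runs faster than the band's upper edge, so checking that edge against the deadline is enough to accept the whole region. With the toy numbers above, the query returns the three HDD configurations in the 50 to 100 second band.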

The technical approach groups configurations with comparable execution times according to their statistical sensitivity, so schedulers can make informed decisions from analytical models rather than by trial and error. In evaluations across three diverse workflows, QoSFlow's recommended configurations delivered a 27.38% performance improvement over the best-performing standard heuristic. In empirical validation, the recommended configurations consistently matched measured execution outcomes across different QoS constraints. Scheduled for publication at the 40th IEEE International Parallel & Distributed Processing Symposium (IPDPS) in 2026, QoSFlow is a significant step forward in performance-aware scheduling for distributed systems and could change how organizations manage complex computational workflows in cloud and high-performance computing environments.
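
The exact sensitivity statistic QoSFlow uses is not described here, so the sketch below substitutes a generic main-effect measure: for each configuration parameter, the population variance of the mean runtime across that parameter's values. The parameter names, the `toy_runtime` model, and the helpers `main_effect_sensitivity` and `split_on_most_sensitive` are hypothetical. The point is interpretability: the scores say which knob moves runtime most, and configurations can then be grouped along that knob.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical configuration parameters; each configuration is a tuple in this order.
PARAMS = ("nodes", "tasks_per_node", "storage")


def toy_runtime(cfg: tuple) -> float:
    """Hypothetical cost model: compute scales with total tasks, storage adds I/O cost."""
    nodes, tasks_per_node, storage = cfg
    io_cost = 20.0 if storage == "hdd" else 5.0
    return 1200.0 / (nodes * tasks_per_node) + io_cost


def main_effect_sensitivity(runtimes: dict) -> dict:
    """Score each parameter by the variance of mean runtime across its values.

    A larger score means changing that parameter moves runtime more.
    """
    scores = {}
    for i, name in enumerate(PARAMS):
        by_level: dict = {}
        for cfg, t in runtimes.items():
            by_level.setdefault(cfg[i], []).append(t)
        scores[name] = pvariance([mean(ts) for ts in by_level.values()])
    return scores


def split_on_most_sensitive(runtimes: dict) -> tuple:
    """Group configurations by the value of the highest-scoring parameter."""
    scores = main_effect_sensitivity(runtimes)
    top = max(scores, key=scores.get)
    i = PARAMS.index(top)
    groups: dict = {}
    for cfg in runtimes:
        groups.setdefault(cfg[i], []).append(cfg)
    return top, groups


runtimes = {cfg: toy_runtime(cfg) for cfg in product((1, 2, 4), (4, 8), ("ssd", "hdd"))}
scores = main_effect_sensitivity(runtimes)
top, groups = split_on_most_sensitive(runtimes)
print(sorted(scores, key=scores.get, reverse=True))
print(top, {level: len(cfgs) for level, cfgs in groups.items()})
```

With this toy model, `nodes` ranks first and `storage` last, which is the kind of human-readable explanation an interpretable sensitivity model can offer. A more rigorous variance-based measure, such as Sobol indices, would also capture interactions between parameters that this one-factor summary ignores.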

Key Points
  • QoSFlow partitions workflow configurations into performance-similar regions using interpretable sensitivity models
  • Outperforms the best standard scheduling heuristic by 27.38% across evaluations on three diverse workflows
  • Recommended configurations consistently matched measured execution outcomes across different QoS constraints

Why It Matters

Enables predictable performance for distributed AI training, scientific simulations, and enterprise workflows in cloud environments.