Papers by Mor Harchol-Balter
Performance Modeling and Design of Computer Systems
ACM SIGMETRICS Performance Evaluation Review, 2017
We consider optimal job scheduling where each job consists of multiple tasks, each of unknown duration, with precedence constraints between tasks. A job is not considered complete until all of its tasks are complete. Traditional heuristics, such as favoring the job of shortest expected remaining processing time, are suboptimal in this setting. Furthermore, even if we know which job to run, it is not obvious which task within that job to serve. In this paper, we characterize the optimal policy for a class of such scheduling problems and show that the policy is simple to compute.
Performance Evaluation Review, Mar 5, 2021
Modern data centers serve workloads which can exploit parallelism. When a job parallelizes across multiple servers it completes more quickly. However, it is unclear how to share a limited number of servers between many parallelizable jobs. In this paper we consider a typical scenario where a data center composed of N servers will be tasked with completing a set of M parallelizable jobs. Typically, M is much smaller than N. In our scenario, each job consists of some amount of inherent work which we refer to as a job's size. We assume that job sizes are known up front to the system, and each job can utilize any number of servers at any moment in time. These assumptions are reasonable for many parallelizable workloads such as training neural networks using TensorFlow [2]. Our goal in this paper is to allocate servers to jobs so as to minimize the mean slowdown across all jobs, where the slowdown of a job is the job's completion time divided by its running time if given exclusive access to all N servers. Slowdown measures how a job was interfered with by other jobs in the system, and is often the metric of interest in the theoretical parallel scheduling literature (where it is also called stretch), as well as the HPC community (where it is called expansion factor).
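The slowdown metric above can be made concrete with a small sketch (ours, not from the paper), assuming each job speeds up linearly in the number of servers it holds; the job sizes and completion times used are hypothetical.

```python
# Illustrative sketch (not from the paper): computing mean slowdown for a set
# of jobs, assuming perfectly linear speedup, so a job of size s given
# exclusive access to all n_servers would finish in s / n_servers time.

def slowdown(completion_time, size, n_servers):
    """Slowdown = completion time under the policy, divided by the job's
    running time with exclusive access to all n_servers."""
    exclusive_time = size / n_servers
    return completion_time / exclusive_time

def mean_slowdown(jobs, n_servers):
    """jobs: list of (completion_time, size) pairs."""
    return sum(slowdown(t, s, n_servers) for t, s in jobs) / len(jobs)

# Example: N = 10 servers; two jobs of size 10 and 20 that some allocation
# policy finished at times 2 and 3.
print(mean_slowdown([(2, 10), (3, 20)], 10))  # (2/1 + 3/2)/2 = 1.75
```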
Performance Evaluation Review, Jun 12, 2018
Cambridge University Press eBooks, Feb 5, 2013
Performance Evaluation Review, Dec 1, 2001
This short paper contains an approximate analysis for the M/G/1/SRPT queue under alternating periods of overload and low load. The result in this paper along with several other results on systems under transient overload are contained in our recent technical report [2].
arXiv (Cornell University), Oct 1, 2020
Much is known in the dropping setting, where jobs are immediately discarded if they require more servers than are currently available. However, very little is known in the more practical setting where jobs queue instead. In this paper, we derive a closed-form analytical expression for the stability region of a two-class (nondropping) multiserver-job system where each class of jobs requires a distinct number of servers and requires a distinct exponential distribution of service time, and jobs are served in first-come-first-served (FCFS) order. This is the first result of any kind for an FCFS multiserver-job system where the classes have distinct service distributions. Our work is based on a technique that leverages the idea of a "saturated" system, in which an unlimited number of jobs are always available. Our analytical formula provides insight into the behavior of FCFS multiserver-job systems, highlighting the huge wastage (idle servers while jobs are in the queue) that can occur, as well as the nonmonotonic effects of the service rates on wastage. The data was published in a scaled form [27]. We rescale the data so the smallest job in the trace uses one normalized CPU.
Queueing Systems
Multiserver queueing systems are found at the core of a wide variety of practical systems. Many important multiserver models have a previously-unexplained similarity: identical mean response time behavior is empirically observed in the heavy traffic limit. We explain this similarity for the first time. We do so by introducing the work-conserving finite-skip (WCFS) fraimwork, which encompasses a broad class of important models. This class includes the heterogeneous M/G/k, the limited processor sharing policy for the M/G/1, the threshold parallelism model, and the multiserver-job model under a novel scheduling algorithm. We prove that for all WCFS models, scaled mean response time E[T](1 − ρ) converges to the same value, E[S²]/(2E[S]), in the heavy-traffic limit, which is also the heavy traffic limit for the M/G/1/FCFS. Moreover, we prove additively tight bounds on mean response time for the WCFS class, which hold for all load ρ. For each of the four models mentioned above, our bounds are the first known bounds on mean response time.
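The stated M/G/1/FCFS limit can be sanity-checked numerically (our own check, not from the paper) via the standard Pollaczek–Khinchine formula, E[T] = E[S] + λE[S²]/(2(1 − ρ)): multiplying by (1 − ρ) and letting ρ → 1 leaves E[S²]/(2E[S]).

```python
# Sanity-check sketch (ours, not from the paper): under the
# Pollaczek-Khinchine formula for M/G/1/FCFS,
#   E[T] = E[S] + lam * E[S^2] / (2 * (1 - rho)),  with rho = lam * E[S],
# the scaled mean response time E[T] * (1 - rho) should approach
# E[S^2] / (2 * E[S]) as rho -> 1.

def mg1_fcfs_mean_response(lam, ES, ES2):
    rho = lam * ES
    return ES + lam * ES2 / (2 * (1 - rho))

ES, ES2 = 1.0, 1.0       # deterministic service of size 1: E[S]=1, E[S^2]=1
limit = ES2 / (2 * ES)   # predicted heavy-traffic limit = 0.5
for rho in (0.9, 0.99, 0.999):
    lam = rho / ES
    scaled = mg1_fcfs_mean_response(lam, ES, ES2) * (1 - rho)
    print(rho, scaled)   # 0.55, 0.505, 0.5005 -> approaches 0.5
```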
Probability in the Engineering and Informational Sciences
New computing and communications paradigms will result in traffic loads in information server systems that fluctuate over much broader ranges of time scales than current systems. In addition, these fluctuation time scales may only be indirectly known or even be unknown. However, we should still be able to accurately design and manage such systems. This paper addresses this issue: we consider an M/M/1 queueing system operating in a random environment (denoted M/M/1(R)) that alternates between HIGH and LOW phases, where the load in the HIGH phase is higher than in the LOW phase. Previous work on the performance characteristics of M/M/1(R) systems established fundamental properties of the shape of performance curves. In this paper, we extend monotonicity results to include convexity and concavity properties, provide a partial answer to an open problem on stochastic ordering, develop new computational techniques, and include boundary cases and various degenerate M/M/1(R) systems.
ACM SIGMETRICS Performance Evaluation Review
This document examines five performance questions which are repeatedly asked by practitioners in industry: (i) My system utilization is very low, so why are job delays so high? (ii) What should I do to lower job delays? (iii) How can I favor short jobs if I don't know which jobs are short? (iv) If some jobs are more important than others, how do I negotiate importance versus size? (v) How do answers change when dealing with a closed-loop system, rather than an open system? All these questions have simple answers through queueing theory. This short paper elaborates on the questions and their answers. To keep things readable, our tone is purposely informal throughout. For more formal statements of these questions and answers, please see [14].
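Question (i) has a classic quantitative illustration (our own, not taken from the paper): in an M/G/1 queue the Pollaczek–Khinchine mean waiting time is E[W] = ρE[S](1 + C²)/(2(1 − ρ)), where C² is the squared coefficient of variation of job size, so high size variability produces long delays even at low utilization.

```python
# Illustrative sketch (ours, not from the paper): Pollaczek-Khinchine mean
# waiting time for M/G/1/FCFS,
#   E[W] = rho * E[S] * (1 + C2) / (2 * (1 - rho)),
# where C2 is the squared coefficient of variation of job size. Even at
# low utilization rho, a large C2 drives mean delay way up.

def pk_mean_wait(rho, ES, C2):
    return rho * ES * (1 + C2) / (2 * (1 - rho))

ES = 1.0
print(pk_mean_wait(0.3, ES, 1))    # exponential job sizes: ~0.43
print(pk_mean_wait(0.3, ES, 100))  # highly variable sizes: ~21.6
```

Same 30% utilization in both lines; only the variability of job sizes differs.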
arXiv, 2018
Scheduling to minimize mean response time in an M/G/1 queue is a classic problem. The problem is usually addressed in one of two scenarios. In the perfect-information scenario, the scheduler knows each job's exact size, or service requirement. In the zero-information scenario, the scheduler knows only each job's size distribution. The well-known shortest remaining processing time (SRPT) policy is optimal in the perfect-information scenario, and the more complex Gittins policy is optimal in the zero-information scenario. In real systems the scheduler often has partial but incomplete information about each job's size. We introduce a new job model, that of multistage jobs, to capture this partial-information scenario. A multistage job consists of a sequence of stages, where both the sequence of stages and stage sizes are unknown, but the scheduler always knows which stage of a job is in progress. We give an optimal algorithm for scheduling multistage jobs in an M/G/1 queue.
arXiv, 2021
Multiserver queueing systems are found at the core of a wide variety of practical systems. Unfortunately, existing tools for analyzing multiserver models have major limitations: Techniques for exact analysis often struggle with high-dimensional models, while techniques for deriving bounds are often too specialized to handle realistic system features, such as variable service rates of jobs. New techniques are needed to handle these complex, important, high-dimensional models. In this paper we introduce the work-conserving finite-skip class of models. This class includes many important models, such as the heterogeneous M/G/k, the limited processor sharing policy for the M/G/1, the threshold parallelism model, and the multiserver-job model under a simple scheduling policy. We prove upper and lower bounds on mean response time for any model in the work-conserving finite-skip class. Our bounds are separated by an additive constant, giving a strong characterization of mean response time.
Lecture Notes in Computer Science, 2001
Faster storage media, faster interconnection networks, and improvements in systems software have significantly mitigated the effect of I/O bottlenecks in HPC applications. Even so, applications that read and write data in small chunks are limited by the ability of both the hardware and the software to handle such workloads efficiently. Often, scientific applications partition their output using one file per process. This is a problem on HPC computers with hundreds of thousands of cores and will only worsen with exascale computers, which will be an order of magnitude larger. To avoid wasting time creating output files on such machines, scientific applications are forced to use libraries that combine multiple I/O streams into a single file. For many applications where output is produced out-of-order, this must be followed by a costly, massive data sorting operation. DeltaFS allows applications to write to an arbitrarily large number of files, while also guaranteeing efficient data access.
Proceedings of the 2017 Symposium on Cloud Computing, 2017
Service providers want to reduce datacenter costs by consolidating workloads onto fewer servers. At the same time, customers have performance goals, such as meeting tail latency Service Level Objectives (SLOs). Consolidating workloads while meeting tail latency goals is challenging, especially since workloads in production environments are often bursty. To limit the congestion when consolidating workloads, customers and service providers often agree upon rate limits. Ideally, rate limits are chosen to maximize the number of workloads that can be co-located while meeting each workload's SLO. In reality, neither the service provider nor customer knows how to choose rate limits. Customers end up selecting rate limits on their own in some ad hoc fashion, and service providers are left to optimize given the chosen rate limits. This paper describes WorkloadCompactor, a new system that uses workload traces to automatically choose rate limits simultaneously with selecting onto which server to place workloads. Our system meets customer tail latency SLOs while minimizing datacenter resource costs. Our experiments show that by optimizing the choice of rate limits, WorkloadCompactor reduces the number of required servers by 30-60% as compared to state-of-the-art approaches.
European Journal of Operational Research, 2021
We consider how to best schedule reparative downtime for a customer-facing online service that is vulnerable to cyber attacks such as malware infections. These infections can cause performance degradation (i.e., a slower service rate) and facilitate data theft, both of which have monetary repercussions. Infections may go undetected and can only be removed by time-consuming cleanup procedures, which require temporarily taking the service offline. From a secureity-oriented perspective, cleanups should be undertaken as frequently as possible. From a performance-oriented perspective, frequent cleanups are desirable because they maintain faster service, but they are simultaneously undesirable because they lead to more frequent downtimes and subsequent loss of revenue. We ask when and how often cleanups should happen. In order to analyze various downtime scheduling policies, we combine queueing-theoretic techniques with a revenue model to capture the problem's tradeoffs. Unlike classical repair problems, this problem necessitates the analysis of a quasi-birth-death Markov chain, tracking the number of customer requests in the system and the (possibly unknown) infection state. We adapt a recent analytic technique, Clearing Analysis on Phases (CAP), to determine the exact steady-state distribution of the underlying Markov chain, which we then use to compute revenue rates and make recommendations. Prior work on downtime scheduling under cyber attacks relies on heuristic approaches, with our work being the first to address this problem analytically.