Management Science
Vol. 68, No. 6, June 2022, pp. 4261–4278
Microsoft Research, Redmond, Washington 98052
Contact:, (GaA); (GeA); rberry@eecs., (RB); (NI);, (KM); (AS)
Received: June 14, 2018
Revised: September 11, 2019;
Accepted: April 5, 2021
Published Online in Articles in Advance: October 5, 2021
Copyright: © 2021 INFORMS

Abstract. Lean and agile models of product development organize the flexible capacity to rapidly update individual products in response to customer feedback. Although agile op-
explaining when firms choose to become agile are validated and understood. We study these questions using data on the development of mobile apps, which occurs through the dynamic release of new versions into the mobile app marketplace, and the apps’ customer ratings. We develop a structural model estimating the dependence of product versioning on (a) market feedback in the form of customer ratings against (b) project and work-based
straints. In contrast to when they actually benefit from operational agility, firms become agile when launching riskier products (in terms of uncertainty in initial customer reception) and less agile when they are able to exploit scale economies from coordinating development over a portfolio of apps. Agile operations increase firm payoffs by margins of 20% to 80%, and interestingly, partial agility is often sufficient to capture the bulk of these returns. Finally, turning to a question of marketplace design, we study how the mobile app marketplace should design the display of ratings to incentivize quality (increasing app categories’ average user satisfaction rates by as much as 22%).
Keywords: agile product development • empirical operations management • mobile apps • online marketplace • product quality and reviews •
product versioning and innovation • structural estimation
Allon et al.: When to Be Agile: Ratings and Version Updates in Mobile Apps
4262 Management Science, 2022, vol. 68, no. 6, pp. 4261–4278, © 2021 INFORMS
agile developer is responsive to customers. Specifically, when customers are dissatisfied, the developer recognizes their dissatisfaction and utilizes its agile capacity to expedite an improved product release. In this way, operational agility lies at the intersection of product development and supply operations. Whereas past literature has called for organizing the supply chain to produce flexibly and responsively when market demand is substantially uncertain (see, e.g., Fisher 1997), operational agility mandates that product improvements correspond to customer feedback.

Our empirical study uses data from the marketplace for mobile apps. This market is economically significant in its own right, surpassing $82 billion in 2017 revenues and projecting $120 billion in 2019.³ We collect data for a sample of apps from Apple’s iOS App Store, which is one of the two major mobile app stores.⁴ Across the various genre categories, our panel dataset records apps’ weekly reviews and ratings, including version information, in addition to app-level characteristics. As a third-party firm separate from the mobile app store, the developer typically releases an app into a mobile app store, and then adds, removes, and edits features of the app in subsequent version updates. Importantly for our study, the iOS platform makes its entire selection of products and reviews available in a single marketplace, therefore allowing us to observe both when the developer releases new versions and the concurrent ratings of apps.

Applying econometric and structural empirical methods to our data, we consider four basic questions. (1) Can the empirical pattern and timing of a developer’s app versioning be used to characterize and quantify operational agility? (2) What factors drive the choice to be agile in practice? (3) By comparing agile and nonagile models, what is the value of agility both for the firm and in its broader effect on product quality? How much agility should a firm pursue knowing that agility can be expensive to pursue for organizations? (4) Lastly, supposing that product quality is improved by agile versioning, can the marketplace operator (e.g., Apple, in the role of managing the iOS ecosystem and App Store) incentivize agility?

1.1. Related Literature
In operations, the most extensive literature regarding versioning addresses the static assortments or product lines (see, e.g., Shapiro and Varian 1998, Ghose and Sundararajan 2005, Netessine and Taylor 2007, Bhargava and Choudhary 2008) produced by firms in response to market characteristics. A significantly narrower body of papers focuses on intertemporal product versioning and upgrades. Both the early modeling literature (see, e.g., Dhebar 1994, Fudenberg and Tirole 1998) and more recent modeling and empirical work (see, e.g., Zhang and Seidmann 2010, Brecko 2017) in this area seek to explain pricing and innovation rates through consumer demand features. However, although the existing literature largely assumes that firms are essentially agile, in reality, firms institute organizational constraints around their internal processes and development timelines (see, e.g., Krishnan and Ulrich 2001). Even in the software and app development industry, top-down constraints are commonplace; for example, the ridesharing app Lyft announces a new update every week, whereas the academic course platform Canvas releases versions every three weeks. In other cases, firms schedule around development “sprints” that can be more flexibly planned.

More closely reflecting these issues, a second stream of literature explains how firms’ product development processes should be tailored to the market and production uncertainties they face. MacCormack and Verganti (2003) empirically supported the value of early market feedback, and Terwiesch and Loch (1999) explained how the early resolution of uncertainty allows firms to boost project-level efficiencies by planning overlapping development activities. Loch et al. (2001), Erat and Kavadias (2008), and Sommer et al. (2009) showed that choosing sequential product innovation is more appropriate than parallel innovation when feedback is imperfect but the innovation space is constrained in its complexity. Krishnan and Ramachandran (2011) considered implications for modular design, and Nagarajan et al. (2018) relatedly address the product design of apps. MacCormack et al. (2001) document that flexibility in the development process significantly complements access to early market response. A broader literature within operations similarly links organizational design and strategy to the ability to react to information over time (see, e.g., Mendelson 2000, Petruzzi and Dada 2001, Lurie and Swaminathan 2009). Some recent literature (Gupta et al. 2020, Yoo et al. 2020) analytically studies operationalization of agile methods.

In relation to these literatures, our contribution addresses operational agility in theory and in practice. In increasingly many contexts, firms collect continuous product feedback, placing greater stress on their organizational infrastructures. Using data from the app economy, we characterize the extent to which agile organization has been, and moreover should be, adopted.

Methodologically, we study firms’ product versioning decisions by developing and estimating a structural model. We add to the fast-growing structural empirical literature on consumer-facing operations and services, including Allon et al. (2011), Akşin et al. (2013), Li et al. (2014), Yu et al. (2016), and Moon et al. (2018), and on apps (Mendelson and Moon 2016, 2018).

Our contribution lastly informs the informational design problem of how marketplaces should convey ratings to incentivize suppliers. The focus on managing
developers’ incentives stands in contrast to the extensive literature studying ratings from the perspective of consumer choice (see, e.g., Chevalier and Mayzlin 2006, Li and Hitt 2008, Luca 2016). Our study also differs from Dellarocas (2005, 2006), which focuses on robustness to strategic manipulation in designing online marketplace ratings. The design problem we address resembles that of innovative contests, where development activities are likewise delegated (see, e.g., Terwiesch and Xu 2008, Boudreau et al. 2011, Ales et al. 2017). The issues we study extend beyond apps inhabiting mobile devices, for instance, on emerging home assistance platforms, such as Amazon’s Alexa and Google Home.

2. Data and Background
Collected in February 2015, our dataset randomly samples 2,035 apps from Apple’s iOS App Store.⁵ For each app, we record its genre category, its then-current price, and the size and prices of its in-app purchase assortment. Importantly for our study, we record apps’ publicly available histories of reviews and ratings. We describe below how summary ratings are formed from apps’ histories of individual ratings and then used prominently on product pages and in search results to influence future downloads.

We organize an unbalanced long panel dataset, with an average of 95 weeks observed for each app over the overall studied period of 349 weeks. The dataset covers from the release of each app’s first version up through the time we collected the data or, if earlier, the app’s termination, which we define as not having received any reviews for two months.

Table 1. Characteristics of the Sampled iOS Apps

                                             Mean     Median
Entire sample
  Price                                      $1.81    $0
  No. of Text Reviews                        49.8     3
  App size (MB)                              36.7     14
Subsample (18%) offering in-app purchases
  In-app purchase assortment size            3.4      2
  In-app purchase price                      $5.04    $1.99

submitted ratings and the written reviews themselves are made accessible upon further clicking and scrolling on apps’ product pages. In managing the platform’s reviews, Apple actively screens for fake reviews and bars users from leaving multiple reviews. In Online Appendix A, we further describe how ratings and reviews have been managed and used by the iOS App Store, including recent policy changes, and address the limited instances in which previously submitted reviews are overwritten.

From the product page, we collected each app’s history of reviews, including for each the date of submission, the version of the app addressed, its star rating and review text, and the submitting username. Table 1 displays the sample’s summary statistics. Up through the time we collected the data, our apps averaged 50 text reviews, evidencing an upward skew vis-à-vis the sample’s median of three reviews received. In terms of prices, the median app is free with no in-app purchases offered. Eighteen percent of the apps offer in-app purchases, with the median offering comprising two purchase options for $1.99 each.
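The panel construction described in this section (weekly observations from an app’s first release until data collection or, if earlier, termination) can be illustrated with a minimal Python sketch. It assumes each app’s reviews are available as a list of dates and reads “two months” as a review-free gap of more than 60 days. The function names, the 60-day threshold, and the choice to date termination at the last review before the gap are illustrative assumptions, not details taken from the paper.

```python
from __future__ import annotations
from datetime import date

GAP_DAYS = 60  # assumption: "two months" without reviews, read as more than 60 days


def panel_end(release: date, review_dates: list[date], collected: date,
              gap_days: int = GAP_DAYS) -> date:
    """Last observed day: the collection date or, if earlier, the assumed termination date."""
    last_activity = release
    # Walk through the reviews in order, treating the collection date as a final checkpoint.
    for d in sorted(review_dates) + [collected]:
        if (d - last_activity).days > gap_days:
            return last_activity  # a long review-free gap: the app is treated as terminated
        last_activity = d
    return collected


def weekly_review_counts(release: date, review_dates: list[date], collected: date,
                         gap_days: int = GAP_DAYS) -> list[int]:
    """Number of reviews in each observed week, with week 0 starting at the first release."""
    end = panel_end(release, review_dates, collected, gap_days)
    counts = [0] * ((end - release).days // 7 + 1)
    for d in review_dates:
        if release <= d <= end:
            counts[(d - release).days // 7] += 1
    return counts
```

Because apps enter and leave the sample at different times, the resulting lists have different lengths across apps, which is what makes the panel unbalanced.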
Table 2. Age and Frequency of Versioning Overall and for the 5 Largest Genres

business categories. Our sample’s apps have an average age of approximately 1.8 years.

Versioning typically results in improved user ratings. Across all versioning episodes observed in our data, Figure 1 compares the ratings received by apps on the days immediately prior to and subsequent to the new version’s release. As Figure 1(a) shows, postupdate ratings at 3.62 stars are 2.6% higher (significant at 0.01 level) than the preupdate 3.53 stars on average. More granularly, Figure 1(b) shows apps’ postupdate ratings averages conditional on their preupdate ratings averages, divided into half-star interval bins. Poorly rated mobile apps reaped the greatest benefit from versioning; for instance, a one-star-rated app, conditional on its developer choosing to invest in and release a new version, averages three stars upon updating. Reflecting the difficulty of further improving a high-performing app and the variability of single-day (preupdate) ratings, ratings modestly decline on average for versioning apps already recording four stars or above.

More generally, as shown in Figure 2(a), a version’s ratings peak shortly postrelease and then trend mildly downward over its lifetime. Similarly, Figure 2(b) illustrates that the daily volume of reviews logged for a version drops substantially with the time elapsed since its release. Across versioning episodes, we find that the weekly volume of reviews increases on average by 20% from 3.59 preupdate to 4.33 postupdate.

Note. (a) Daily ratings average; (b) daily reviews volume average.
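The pre/post comparisons discussed in this section can be sketched in a few lines of Python. The sketch assumes each versioning episode is summarized by a pair of preupdate and postupdate rating averages. The function names and the example episodes are hypothetical, and the binning rule (half-star bins keyed by their lower edge) is one plausible reading of the description of Figure 1(b), not the authors’ code.

```python
from collections import defaultdict


def relative_change(pre: float, post: float) -> float:
    """Relative change from the preupdate to the postupdate average."""
    return (post - pre) / pre


def post_mean_by_pre_bin(episodes, width: float = 0.5) -> dict:
    """Average postupdate rating for each half-star bin of the preupdate rating.

    `episodes` is an iterable of (pre_avg, post_avg) pairs on the 1-5 star scale;
    bins are keyed by their lower edge, and a preupdate average of exactly 5.0
    is placed in the top bin.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for pre, post in episodes:
        lower = min(int(pre / width) * width, 5.0 - width)
        totals[lower][0] += post
        totals[lower][1] += 1
    return {edge: total / n for edge, (total, n) in sorted(totals.items())}
```

Applied to the rounded averages reported in the text, `relative_change(3.53, 3.62)` is roughly 0.025 and `relative_change(3.59, 4.33)` is roughly 0.21, in line with the reported 2.6% and 20% once rounding of the displayed averages is taken into account.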
Table 3. Average Portfolio Size, Market Risk, and Competition for the 5 Largest App Genres

newly updated versions of its iOS mobile app. Version updates improve the app’s quality of user experience by adding or polishing features, addressing bugs and deficiencies, and meeting users’ evolving needs. At the same time, prospective new users use the app’s summary ratings to impute a perceived quality level before deciding whether to download. Consequently, our model’s Markov state includes both the app’s underlying quality over time, which affects users’ actual experiences, and ratings that affect its perceived quality.

In the remainder of Section 3, we present the developer’s versioning Markov decision process and its optimal solution. It is worth emphasizing that our model addresses two distinct types of app versioning. The developer’s optimal solution to its Markov decision process captures her agile versioning behavior (namely, her versioning that is highly responsive to the state of ratings). Separately, our model allows the same developer to exhibit a rate of exogenous versioning that is not responsive. We represent a developer’s agility based on imputing its share of versioning of each type.

3.2. Definitions and States
Our Markov decision process (MDP) models developer $i$’s decision to update its mobile app over time. Let us first introduce notation to define the characteristics of mobile app $i$ and its time-varying state.

We partition app $i$’s state in week $t$ into two parts; the first is (mostly) observed in our data, whereas the second set of states is unobserved by the researcher yet needed for our model to fully rationalize $i$’s choices seen in the data. The first component of the state, $X_{it}$, consists of the vector

$$X_{it} := \{\tau_{it}, q_{it}, D_{it}, \tilde{D}_{it}, R_{it}\} \in \mathbb{Z}_+ \times [0, 1] \times \mathbb{Z}_+^3,$$

in which $\tau_{it}$ represents weeks elapsed since $i$’s last versioning update, $q_{it}$ represents $i$’s quality level (which we describe further below), $D_{it}$ the number of downloads made in the weeks from $i$’s last versioning update through week $t$, $\tilde{D}_{it}$ the number of downloads made during week $t$ itself, and $R_{it}$ the number of “satisfied” ratings submitted since $i$’s last versioning update through week $t$. With the exception of $q_{it}$, all of $X_{it}$ is observed directly or by proxy from our data. In particular, we assume that the frequency of downloads can be accurately proxied by the frequency of feedbacks received. See, for example, Cabral and Hortacsu (2010), which follows a similar approach for eBay sellers.

We defer discussing the unobserved portion of the state, $\epsilon_{it} \in \mathbb{R}^2$, until we address the developer’s decisions. Together, the state is denoted $s_{it} := \{X_{it}, \epsilon_{it}\}$, and we use $S$ to denote the state space (i.e., $S = \mathbb{Z}_+ \times [0, 1] \times \mathbb{Z}_+^3 \times \mathbb{R}^2$).

Finally, a set of time stationary parameters governs how the app’s state, $s_{it} \in S$, transitions week to week. These are represented by the vector $\theta_i := \{q_{i,0}, \delta_i, c_i, z_i, p_i\}$. $q_{i,0}$ is app $i$’s initial quality state (as Section 3.4 details, the quality state corresponds to a current “probability of user satisfaction”), and $\delta_i$ is the weekly rate at which such quality decays; $c_i$, $z_i$, and $p_i$ characterize $i$’s costs and benefits from versioning, which we explain below in the appropriate context.

For the remainder of Section 3, which models a single app, we typically drop the subscript $i$. The subsequent estimation will allow apps to differ in their quality and cost parameters $\theta_i$.

3.3. Weekly Rewards and Actions
The developer collects his or her stream of rewards from the app, discounted at the weekly rate $\gamma \in (0, 1)$, over the discrete time horizon $t = 1, 2, 3, \ldots$. The weekly reward $r(s, a)$ accrues in state $s$ when updating action $a$ is taken. This choice is binary, and thus $a_t \in \{0, 1\}$ is an indicator variable for the decision to version in week $t$.

A policy $\pi$ prescribes in each $t$-th week a rule determining $i$’s action $a_t$ in week $t$ based on the history of $i$’s past actions and past and present states. Developer $i$ will be interested in choosing the policy $\pi$ that maximizes the developer’s expected discounted reward:

$$J^{\pi}(s_0) := \mathbb{E}^{\pi}_{s_0}\left[\sum_{t=0}^{\infty} \gamma^{t} \cdot r(s_t, a_t)\right], \tag{1}$$

where $s_0 \in S$ is the initial state, and actions $a_t$ are chosen in accordance with policy $\pi$.

The developer’s weekly reward consists first of the app’s revenue-generating downloads, $\tilde{D}_t$, made in week $t$. (Without loss of generality, the reward’s other components can be viewed as normalized against the revenue value of a download.) Against this, $i$ incurs the developer’s cost to version, $c$, in weeks when the developer decides to version ($a_t = 1$). Lastly, as is common in discrete choice modeling, we provide that certain unobserved states, $\epsilon_{it} \in \mathbb{R}^2$, may influence $i$’s week-$t$ rewards. For week $t$, $\epsilon_t(1)$ and $\epsilon_t(0)$ represent the additively separable variation in $i$’s choice-conditional rewards caused by unobserved states, and they are assumed to independently follow the type-1, extreme value distribution.

Formally, we specify $i$’s reward function $r : S \times \{0, 1\} \to \mathbb{R}$ as follows:

$$r(s, a) = \tilde{D}_t - c \cdot a_t + \epsilon_t(a_t). \tag{2}$$

With relevance for the existence of $i$’s optimal stationary policy, the state space $S$ is complete and separable, the action space is compact, and the reward $r$ is continuous.

3.4. Quality and Ratings
Our study considers how the developer reacts to its app’s ratings and quality. For this reason, we importantly explain upfront how our model defines ratings
and quality as states and how they are permitted to affect the firm’s rewards. Later, Section 3.5 describes how quality and ratings fit within our model’s state transitions.

3.4.1. Ratings. In keeping with prior literature, we distinguish between satisfied and unsatisfied ratings.
mary rating on the iOS App Store, the current version’s fraction of satisfied ratings, $\bar{R}_t$:

$$\bar{R}_t := \frac{R_t}{D_t} \in [0, 1], \quad \text{provided that } D_t > 0. \tag{3}$$

Such visible ratings play an important role in our study because they shape consumer perceptions. Consequently, an app’s ratings determine the arrival rate of its new users, that is, its rate of new downloads made, $\lambda_t$. Empirically, we assume that 4+ stars are considered satisfactory and 3 and below are unsatisfactory. We base the threshold for satisfaction on past literature treating 4- and 5-star reviews as positive (see, e.g., Bhattacharjee and Goel 2005, Forman et al. 2008, Dang et al. 2010, Mudambi and Schuff 2010, Fu et al. 2013, Ho-Dac et al. 2013). As in Ho-Dac et al. (2013), we treat a 3-star review as a negative review because it falls below the ratings mean of the sample. See Mudambi et al. (2014) on the overlap of 4-star and 5-star reviews as clustered outputs when predicting ratings (sentiment) from review text.

The empirical section will describe how we specify the new user arrivals as Poisson with mean rate parameter $\lambda_t$, which is a continuous function of the app’s visible ratings, $\bar{R}_t$. We further allow $\lambda_t$ to depend on $i$’s genre and the time elapsed since $i$’s last versioning update, $\tau_t$, with the mild technical restriction that $\lambda_t$ is continuous and bounded above by some $\bar{\lambda} < \infty$. The sensitivity of $\lambda_t$ to visible ratings, $\bar{R}_t$, reflects in part the competitive intensity of $i$’s genre market.

3.4.2. Quality. Although visible ratings determine the arrival rate of $i$’s users, their experiences are shaped by its actual quality state, $q_t \in [0, 1]$. At time $t$, $q_t$ represents app $i$’s Bernoulli probability to produce a satisfactory experience for each downloading customer. Each customer then records a star rating on the platform, from which we impute his or her (dis)satisfaction level.

3.4.3. Changes in Quality. However, $q_t$ evolves over time. Two types of quality state transitions are possible, and which one occurs depends on $i$’s decision whether to version in week $t$. Absent versioning, the quality level $q_t$ decays at the weekly rate $\delta \in (0, 1)$, that is, $q_{t+1} = \delta \cdot q_t$. Quality depreciates in this way as the app’s features and content become outdated, and its users’ tastes evolve.

When $i$ chooses to version, the share of dissatisfied customers is reduced by the fraction $z \in [0, 1]$. Concretely, suppose that $q_t = 0.7$, implying that in expectation 70% of customers are satisfied, and 30% are dissatisfied. Versioning reduces that 30% group of dissatisfied customers by $z \times 100\%$, meaning that the fraction $z$ of the previously dissatisfied group will be satisfied by
only $(1 - z) \times 30\%$ of customers are dissatisfied for week $t + 1$. More generally, upon versioning, $q_{t+1} = 1 - (1 - q_t) \times (1 - z)$.

3.5. State Transitions
We return to the developer’s MDP, where in each week developer $i$ selects his or her action $a_t$ to collect his or her state-dependent reward $r(s_t, a_t)$. The aforementioned transitions in the app’s quality and ratings, along with their resulting downloads, constitute part of how the state $s_t$ transitions into $s_{t+1}$ between the developer’s weekly decisions.

Recall that the transition probabilities depend in part on $i$’s chosen action. Importantly, if he or she versions, $i$’s quality increments and its summary ratings refresh under the iOS App Store’s display policy. If not, quality decrements. In each case, new downloads transpire, and each download generates both revenue and a new positive or negative rating in the App Store.

The following sequence describes the MDP state transition $q : S \times \{0, 1\} \to S$.

1. Nonagile versioning. First, if $i$ has not selected to version ($a_t = 0$), with probability $p$ a versioning update will still occur in week $t$. The developer still incurs the versioning cost $c$ in this case.
Exogenous versioning rate $p$ flexibly allows for the real possibility that the firm institutes versioning that is not responsive to ratings and quality. Many mobile developers do utilize agile methods that facilitate “sprints” ahead of their product releases, and many extend flexibility into later stages of development (MacCormack et al. 2001). Yet sustained sprints can result in poor productivity (see, e.g., Zoeller 2013), and agile operations may be less advantageous for developers with portfolios of apps. These may benefit from detailed commitment and coordination in timing and feature planning.

2. New downloads and ratings. New downloads, $\tilde{D}_{t+1}$, are realized as a function of displayed ratings $\bar{R}_t$ and interval since versioning $\tau_t$. If versioning occurs in the current period, $D_{t+1} = \tilde{D}_{t+1}$ (i.e., the number of ratings used in the displayed ratings average refreshes), and otherwise, $D_{t+1} = D_t + \tilde{D}_{t+1}$. New satisfied ratings $\tilde{R}_{t+1}$ are drawn from the Bernoulli$(\tilde{D}_{t+1}, q_t)$ distribution. Reflecting the iOS summary ratings display, $R_{t+1} = \tilde{R}_{t+1}$ upon versioning, and otherwise, $R_{t+1} = R_t + \tilde{R}_{t+1}$. Versioning refreshes $\tau_{t+1} = 1$, and otherwise $\tau_{t+1} = \tau_t + 1$.
3. Changes in quality. As described previously, $q_{t+1} = \delta \cdot q_t$ absent an update, and $q_{t+1} = 1 - (1 - q_t) \times (1 - z)$ with an update.

4. Unobserved states. The unobserved states $\epsilon_{t+1}$ are realized. These are independently drawn and in particular depend on neither $s_t$ nor $a_t$.

1−γ < ∞.

We briefly characterize the optimal solution. For this purpose, let $\mathcal{X} := \mathbb{Z}_+ \times [0, 1] \times \mathbb{Z}_+^3$, that is, the state space of the state variables $\{X_t : t = 1, 2, \ldots\}$, and use $C(\mathcal{X} \times \{0, 1\})$ to denote the metric space (supremum norm) of real-valued functions on $\mathcal{X} \times \{0, 1\}$ that are bounded and upper semicontinuous. Notably, $C(\mathcal{X} \times \{0, 1\})$ is complete.

Consider the following monotone operator $T : C(\mathcal{X} \times \{0, 1\}) \to C(\mathcal{X} \times \{0, 1\})$:

$$(Tw)(x, a) = \tilde{r}(x, a) + \int \left[ -c(s') + \gamma \cdot \int \max_{a' \in \{0, 1\}} \left[ w(s', a') + \epsilon(a') \right] \cdot q(d\epsilon) \right] \cdot q(s' \mid x, a) \tag{7}$$

$$= \tilde{r}(x, a) + \int \left[ -c(s') + \gamma \cdot \left( \Gamma + \log\left( \exp\{w(s', 0)\} + \exp\{w(s', 1)\} \right) \right) \right] \cdot q(s' \mid x, a) \tag{8}$$

where $\tilde{r}(x, a) = \tilde{D} - c \cdot a$, that is, reward $r$ omitting the unobserved state $\epsilon(a)$, and $c(s')$ accounts for the versioning cost incurred when $a_t = 0$ yet exogenous versioning occurs. We slightly abuse notation by decomposing state transition $q$. $T$ is straightforwardly a contraction mapping, which implies by the Banach fixed point theorem a unique fixed point $w^* \in C(\mathcal{X} \times \{0, 1\})$ such that

ment. We study this issue in Section 5 by examining the developer characteristics that empirically associate with low shares of agile versioning (i.e., high shares of versioning through $p_i$). Second, $i$’s policy $\pi^*$ dictates his or her agile versioning. It is worth noting that $i$’s agile versioning behavior depends on $i$’s costs, $c_i$, and benefits, $z_i$, from versioning. Thus agile versioning is not necessarily frequent whenever the rate of exogenous versioning $p_i$ is low.

Given $\theta_i$, the associated policy $\pi^*$ attaches to each observed state $X \in \mathcal{X}$ a probability over whether developer $i$ chooses to version, where we account also for his or her weekly rate of exogenous versioning $p_i$:

$$\Pr\{\text{Versioning} \mid X\} := \frac{\exp\{w^*(X, 1)\} + p_i \cdot \exp\{w^*(X, 0)\}}{\exp\{w^*(X, 1)\} + \exp\{w^*(X, 0)\}} \tag{11}$$

Various methods are available to compute the choice-conditional value functions $w^*$, with contraction
mapping $T$ being a candidate algorithm guaranteed to converge. Then expression (11) provides the probability that $i$ releases a new version given the week’s state $X$.

Our data consist of many developers and apps. We treat each app’s characteristics $\theta_i$ as independently drawn from a population distribution $F(\theta; \Theta_0)$ (the discounting rate $\gamma$ is assumed shared). Distribution $F$ is given by a set of hyperparameters $\Theta$, with their true values $\Theta_0$ being our object of estimation. For example, we will allow app $i$’s cost of updating $c_i$ to be drawn from a log-normal distribution, and the parameters of this distribution are part of $\Theta_0$. In Section 4, we estimate $\Theta_0$ from our data; then, any specific $\theta_i$ can be obtained through Bayesian updating.

3.8. Empirical Identification
We briefly outline our model’s empirical identification; $i$’s quality, $q_{i,t}$, is reflected by its observed ratings, each of which is either positive or negative. The rate at which quality deteriorates, $\delta_i$, is captured by the rate of declining positivity in ratings since versioning. The degree by which ratings are observed to improve following versioning episodes pins down $z_i$. Lastly, suppose that two mobile apps, $i$ and $j$, are observationally identical in the above respects and operate in the same genre. However, $i$ releases version updates substantially more frequently than $j$. We then infer that either $i$’s cost to update, $c_i$, is lower than $j$’s, causing frequent agile updating, or that $i$’s rate of exogenous updating, $p_i$, is higher. We distinguish these two rationales by observing how ratings dependent $i$’s version timing tends to be.

4. Estimation
Our estimation proceeds in two parts to arrive at a two-step estimator for the population hyperparameters set forth in Section 3. In the first stage, we nonparametrically estimate how an app’s rate of new downloads depends on its public ratings and on the time elapsed since its last versioning update. Subsequently, we estimate our model of version timing based on observing how developers respond to the incentives estimated in the first stage.

4.1. Demand Effect of Ratings
We flexibly estimate an app’s rate of new downloads in period $t$ as a nonparametric, sieve-based function of its genre, its displayed fraction of satisfied ratings, $\bar{R}_{t-1} \in [0, 1]$, and the time elapsed since its last versioning update, $\tau_{i,t}$.

More explicitly, period-$t$ downloads are distributed as Poisson with the rate

$$\lambda_t = g_N\!\begin{pmatrix} \bar{R}_{i,t-1} \\ \text{genre}_i \\ \tau_{i,t} \end{pmatrix}^{\!\top} \beta, \tag{12}$$

where $g_N$ is a set of sieve-polynomial basis functions, organized as a vector, that grows with the sample size, $N$. In the first stage, we estimate $\beta$. Because an app’s genre is represented in a vector of mutually exclusive dummies, in practical terms, a separate $\beta$ is estimated for each genre. For assessing the reliability of nonparametric prediction of demand (downloads), we carry out fivefold cross-validation. For each of the five folds when held out, we estimate the model and evaluate the $R^2$. Over the five folds, we obtain the decently high average $R^2$ of 0.68.

4.2. Structural Estimation
We estimate our structural model of developer updates. Suppose that we are provided $\beta$ and $\theta_i$, respectively describing a state-dependent distribution over downloads and a set of app and developer characteristics. Then, our model in Section 3 characterizes the developer’s optimal versioning behavior, yielding state-dependent probabilities of observing versioning updates made by the developer in the data. Computing these probabilities, as set out in expression (11), consists primarily of solving the associated Bellman Equations (9). For computational efficiency, our estimation procedure is designed to limit the number of times we are required to resolve such Bellman equations for different $\theta_i$.

First, we draw and fix $\theta_b$, $b = 1, \ldots, B$, independently, from a proposal distribution, $F(\theta; \Theta_{\text{Proposal}})$. For each app developer in our sample, $i \in \{1, \ldots, N\}$, we denote its observed sequence of downloads, ratings, and versioning events as

$$O_i := \{A_{i,t}, D_{i,t}, R_{i,t} : t = 0, \ldots, T_i\}. \tag{13}$$

After solving the Bellman equations once for each $b \in \{1, \ldots, B\}$, we can evaluate and store the $B \times N$ likelihoods, $L(O_i \mid \theta_b)$. This precomputing step need only be performed once, and the $B$ sets of Bellman equations (and subsequently the $B \times N$ likelihood evaluations) can be solved in parallel computations.

As in Ackerberg (2009), we then obtain our maximum simulated likelihood estimator via importance sampling as

$$\hat{\Theta}_{\text{SMLE}} := \arg\max_{\Theta} \sum_{i=1}^{N} \log \sum_{b=1}^{B} L(O_i \mid \theta_b) \cdot \frac{dF(\theta_b; \Theta)}{dF(\theta_b; \Theta_{\text{Proposal}})} \tag{14}$$

where $dF(\theta_b; \Theta)$ is the only component of the maximand that cannot be precomputed and fixed. Note that, by design, $F$ has a common support across alternative hyperparameters.

4.3. Results
Our structural hyperparameter estimates are presented in Table 4. As discussed earlier, our estimated
structural model delineates a distribution over the app population. The marginal distributions resulting from the estimated parameters are shown in Online Appendix C. Starting from the prior defined by these estimates, we carry out Markov chain Monte Carlo simulations to estimate posteriors for each individual app.
Using such posterior estimates, Figure 3 shows how the model estimates a time series path of evolving latent quality to fit one specific app’s ratings and versions data. Consistent with Section 3.8, the initial quality level, rate of quality depreciation, and the size of quality jumps upon versioning are all chosen by the model to best fit the ratings path. The timing and frequency of versioning then pin down the app’s cost to version and rate of nonagile versioning.
Table 5 presents the average posterior estimates of apps’ quality and cost parameters for the five largest genres, covering the cost to version, versioning improvement in user satisfaction, initial quality, rate of quality depreciation, and incidence of nonagile versioning. For the same genres, Figure 4 displays their shares of agile versioning and the resulting average quality.
Consider how apps’ outcomes differ across genres in Figure 4. Notably, entertainment and games stand out for their high quality levels (75% to 80% satisfied), whereas the remaining genres of education, books, and business all register below 70% in satisfaction. Interestingly, the genres of entertainment and games differ starkly in how their apps’ high quality is achieved. Their shares of agile versioning (as opposed to nonresponsive) diverge sharply, at 63% versus 39%, demonstrating that high quality can be sustained with or without being agile. As shown in Table 6, paid and free apps do not appear to differ significantly in their shares of agile versioning or in quality.
Turning back to Table 5, we observe that the average rate of quality depreciation is largely uniform across the genres. However, we subsequently examine

Table 4. Maximum Likelihood Estimates for the Structural Model

                                        α           β           μ          σ
Cost of version update c                -           -           2.1        0.1***
                                                                (0.45)     (0.04)
Probability of nonagile versioning p    0.8***      10.7***     -          -
                                        (0.23)      (0.14)
Initial quality q0                      8.6***      4.0***      -          -
                                        (0.35)      (0.17)
Quality decay rate δ                    67.6***     0.9***      -          -
                                        (10.12)     (0.21)
Versioning quality boost z              26.9**      51.4***     -          -
                                        (13.56)     (7.41)

Notes. Sample of 2,035 apps. Shape parameters α and β characterize the beta prior (with mean α/(α + β)), and μ and σ correspond to the parameters for the log-normal distribution (with mean exp(μ + σ²/2)). See Online Appendix C for marginal distribution plots.
*p < 0.1; **p < 0.05; ***p < 0.01.
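As a reading aid for Table 4, the prior means implied by the formulas in the table notes, α/(α + β) for the beta priors and exp(μ + σ²/2) for the log-normal prior, can be computed directly from the reported point estimates. The sketch below is illustrative only: the dictionary keys and helper names are our own labels rather than anything from the authors’ code, and the cost row is read as μ = 2.1, σ = 0.1.

```python
import math

# Point estimates as read from Table 4 (hyperparameters of the priors).
# The key names are hypothetical labels chosen for this illustration.
BETA_PRIORS = {
    "nonagile_versioning_prob_p": (0.8, 10.7),
    "initial_quality_q0": (8.6, 4.0),
    "quality_decay_rate_delta": (67.6, 0.9),
    "versioning_quality_boost_z": (26.9, 51.4),
}
LOGNORMAL_PRIORS = {"version_cost_c": (2.1, 0.1)}


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def lognormal_mean(mu, sigma):
    """Mean of a log-normal distribution with log-scale parameters mu, sigma."""
    return math.exp(mu + sigma ** 2 / 2)


prior_means = {name: beta_mean(a, b) for name, (a, b) in BETA_PRIORS.items()}
prior_means.update(
    {name: lognormal_mean(m, s) for name, (m, s) in LOGNORMAL_PRIORS.items()}
)

for name, value in prior_means.items():
    print(f"{name}: {value:.3f}")
```

Under these estimates the implied prior means are roughly 0.07 for p, 0.68 for q0, 0.99 for δ, 0.34 for z, and 8.2 for c; these describe the population-level priors, not the app-specific posteriors discussed in the text.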
Notes. The plot shows the mean evolution path of latent quality for a selected app. The plotted path is generated using the estimated posterior
means for the app’s quality and cost parameters. The plot shows both latent quality and average reviewed ratings weekly, and versioning events
are marked by dashed lines. Ratings and versions are from data.
whether varying competition may affect quality depreciation rates for apps within a genre. Strikingly, despite these genres’ high average quality, entertainment and games versioning events are the least effective at reducing their apps’ shares of dissatisfied customers, thus entailing more frequent versioning to remain at the same quality level. Moreover, the average initial quality of apps released in these two categories starts substantially lower, signaling that developers experiment after launching apps in the marketplace.

4.4. Predictive Performance and Robustness
We assess predictive accuracy by reserving 20% of the sampled apps as a test set and estimating the structural model solely on the remaining 80% as the training set. See Online Appendix B for details on the structural model’s area under the curve (AUC) of 0.77.
Online Appendix D provides a robustness check showing that our structural estimates remain essentially unchanged when we carry out estimation separately for paid and free apps. There, we additionally estimate a model extension that allows the level of competition to affect the versioning rate.

Figure 4. (Color online) Versioning Type and Quality by Genre: Bar Plots Denote the Share of Agile Versioning (Left Axis), and the Dashed Lines Denote the Average Quality (Right Axis) in Respective Genres

5. Analysis and Prescriptions
In this section, we first study the factors driving developers’ choices to be agile in practice. We then turn to a comparison of agile and nonagile models to assess agility’s value before finally evaluating the marketplace operator’s influence on the developers’ incentives through ratings design.

5.1. When Are Developers Agile?
Regarding when developers choose to organize themselves to be agile, we focus on three main hypotheses. First, cost economies in software development have long been shown to be significant (Banker and Slaughter 1997), and in recent years, app developers have cited efficiencies from building proprietary development platforms that support features to be shared by their apps. See, for example, Seif El-Nasr and Magy (2013) regarding Zynga’s cloud-based data infrastructure supporting integrated analytics, experimentation, and cross-promotion. We thus hypothesize that scale economies can be attained by coordinating development over a developer’s app portfolio, which undercuts the incentives for agile development of individual products. This first hypothesis also parallels product development in automobiles, where lean and agile manufacturers empirically share fewer components across automobile models than competitors that realize scale economies by sharing components across multiple products (Clark et al. 1991, Fisher et al. 1999, Ramdas 2003).
Second, the operations literature (Fisher et al. 1999,
Ramdas 2003) proposes that firms facing higher levels
of market uncertainty should prioritize operational
responsiveness over cost efficiency and reliability.
Therefore, we examine the salience of initial market risk in influencing firms’ decisions about whether to be agile.
Lastly, we explore how the competitive landscape affects developer agility. We do so in two parts. First, we hypothesize that when facing greater competition, an app must output a higher rate of innovation and content development to satisfy customers. In the context of our model, this dynamic manifests as a higher rate of quality decay for apps facing greater competition. We further hypothesize that, in response, developers launching apps in more competitive landscapes will choose to engage in higher rates of agile and nonagile versioning.
The first hypothesis tests whether a developer’s decision to be agile is significantly explained by the size of its portfolio. We use the following regression specification:
\[
\mathrm{agility}_i = \mathrm{genre}_i + \mathrm{portfolioSize}_{D(i)} + \mathrm{inApp}_i + \epsilon_i \tag{15}
\]
An app’s agility is measured by the share of agile versioning, while portfolioSize refers to the average app portfolio size of the app developer D(i) over i’s lifetime. We additionally control for whether the app has in-app purchases, inApp_i.
However, a potential endogeneity arises if better access to capital and funding (or a similarly omitted factor) enables a developer to both organize itself for greater agility and develop a more expansive portfolio. In this case, estimating (15) by ordinary least squares (OLS) results in a portfolioSize coefficient that is upwardly biased; however, this is a conservative bias, because we expect that the effect is negative (a larger portfolio discourages developer agility). We additionally apply an instrumental variable: the market share of iOS devices six months prior to app i’s launch release on iOS. Although the iOS market share should not directly affect agility (exclusion restriction), it plausibly influences the number of apps launched by the developer on the iOS platform specifically (relevance).
We report our OLS estimates in columns (1) and (2) of Table 7, where specification (2) adds the genre fixed effects. We find that the share of agile versioning of an app is significantly negatively associated with the developer’s portfolio size, supporting our hypothesis that firms with broader portfolios engage in planned and nonagile development that better preserves economies of scale. App developers that offer in-app purchases tend to be more agile in versioning. They may depend more heavily on a continuous revenue stream, which requires them to stay relevant and attractive to consumers. Specification (3) in Table 7 shows results for our instrumental variable regression, which corroborate our OLS findings. The corresponding first-stage regression F-statistic is 14.07 (P < 0.01), which supports that the instrument is relevant.

Table 7. Effect of Developer Portfolio Size on Share of Agile Versioning

                   (1)           (2)           (3)
portfolioSize      −0.0012**     −0.0013       −0.0014***
                   (0.000)       (0.000)       (0.000)
inApp              0.017*        0.016**       0.031***
                   (0.010)       (0.007)       (0.006)
Constant           0.494***      0.342***      0.299***
                   (0.028)       (0.067)       (0.113)
Genre              -
Observations       2,035         2,035         2,035
R2                 0.033         0.693         0.376
Adjusted R2        0.014         0.659         0.085

Note. Standard errors in parentheses.
*p < 0.1; **p < 0.05; ***p < 0.01.

Interestingly, cost economies may have genre-level effects. In discussing Figure 4, we noted that whereas the entertainment and games genres post high quality levels (75% to 80% satisfied), they differ sharply in their shares of agile versioning. This may be explained by game developers maintaining significantly larger app portfolios than entertainment developers.
The second hypothesis proposes that firms are influenced by the initial level of market risk. We use the following regression specification,
\[
\mathrm{agility}_i = \mathrm{genre}_i + \mathrm{risk}_i + \epsilon_i, \tag{16}
\]
in which product i’s market risk is measured by the variance of initial reviews the app receives.
Table 8 shows our OLS regression results with genre fixed effects. We find that more agile versioning associates with higher market risk.
To test our final hypotheses on competitive effects, we use the following regression specifications:
\[
\delta_i = \mathrm{genre}_i + \text{market-competition}_i + \epsilon_i^{(\delta)}, \tag{17}
\]
\[
\mathrm{agility}_i = \mathrm{genre}_i + \text{market-competition}_i + \epsilon_i, \tag{18}
\]
\[
p_i = \mathrm{genre}_i + \text{market-competition}_i + \epsilon_i^{(p)}. \tag{19}
\]
Table 9 shows the estimation results of (17) for the competition’s effects on the pace of updating required to

Table 8. Effect of Market Risk on Share of Agile Versioning

                   (1)
Risk               0.430***
                   (0.007)
Constant           0.015***
Observations       2,035
R2                 0.494
Adjusted R2        0.494

Note. Standard errors in parentheses.
*p < 0.1; **p < 0.05; ***p < 0.01.
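The endogeneity argument for specification (15) and the instrumental-variable check in column (3) of Table 7 can be illustrated with a small two-stage least squares sketch. Everything here is an assumption made for illustration: the data are synthetic, the variable names (`funding`, `ios_share`, `portfolio`) and effect sizes are invented, genre fixed effects are omitted for brevity (they would enter as extra dummy columns in `controls`), and this is not the authors’ estimation code.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(y, endog, instrument, controls):
    """2SLS with one endogenous regressor and one instrument.

    `controls` must already include a constant column.
    Returns coefficients ordered as [endog, controls...].
    """
    Z = np.column_stack([instrument, controls])     # first-stage regressors
    endog_hat = Z @ ols(Z, endog)                   # fitted endogenous regressor
    X_hat = np.column_stack([endog_hat, controls])  # second-stage regressors
    return ols(X_hat, y)


# Synthetic data (NOT the paper's data): an unobserved "funding" factor raises
# both portfolio size and agility, which biases the OLS coefficient upward.
rng = np.random.default_rng(0)
n = 5000
funding = rng.normal(size=n)                 # omitted confounder
ios_share = rng.normal(size=n)               # instrument: shifts portfolio size only
in_app = rng.integers(0, 2, size=n).astype(float)
portfolio = 1.0 * ios_share + 1.0 * funding + rng.normal(size=n)
true_effect = -0.5                           # assumed negative portfolio effect
agility = (true_effect * portfolio + 0.3 * in_app
           + 1.0 * funding + rng.normal(size=n))

controls = np.column_stack([np.ones(n), in_app])
beta_ols = ols(np.column_stack([portfolio, controls]), agility)
beta_iv = two_stage_least_squares(agility, portfolio, ios_share, controls)

print("OLS  portfolio coefficient:", round(beta_ols[0], 3))
print("2SLS portfolio coefficient:", round(beta_iv[0], 3))
```

In this construction the OLS coefficient is pulled toward zero from the assumed negative effect, mirroring the conservative upward bias described for (15), while the instrumented estimate stays close to the assumed value. The point estimates from this manual second stage are valid, but its naive standard errors would not be; a dedicated IV routine should be used for inference.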
Table 9. Effect of Competition on Quality Decay Rate

                           Dependent variable
                           (1)            (2)
market-competition(L)      −0.0002***     −0.0001**
                           (0.0000)       (0.0000)
Genre                      -
[...]                      [...]          [...]
                           (0.0001)       (0.0003)
Observations               2,035          2,035
R2                         0.0208         0.1341
Adjusted R2                0.0203         0.1242

Notes. The table shows impact of competition on app’s quality decay rate. market-competition(L) refers to the log specification for market competition being used.
*p < 0.1; **p < 0.05; ***p < 0.01.

maintain an app’s quality. We consistently find that, within genres, apps facing greater competition face a faster decay rate δ_i in quality when not updated, consistent with customers demanding constant innovation and content development to be satisfied. For illustration, compare apps launched at the 75th percentile level of competition versus the 25th percentile level. The former’s quality decays by 4% more than the latter over 10 weeks if not updated.
In columns (1) and (2) of Table 10, we report our OLS regression results for specification (18), which tests that firms respond to competition by becoming more agile. Specification (2) adds genre fixed effects. Across both specifications, we find that firms launched into more competitive landscapes engage in significantly higher shares of agile versioning. For illustration, apps launched at the 75th percentile level of competition are 22% more agile versus the 25th percentile level.
Columns (3) and (4) of Table 10 report our OLS regression results for (19), which tests whether more competitive landscapes spur developers to also engage in higher rates of nonagile versioning p. We find that firms situated in more competitive landscapes do choose higher rates of nonagile versioning. For illustration, apps launched at the 75th percentile level of competition have 34% more levels of nonagile versioning versus the 25th percentile level.

5.2. Gains from Agile Versioning
To next assess the gains from adopting agile versioning, we compare a firm’s payoffs under the agile and nonagile [...] characterize which developers would benefit most from agility and whether they are the most agile in practice.
Specifically, for each app, we compare the NPV (net present valued) revenue under three different versioning approaches. First, we consider nonagile versioning, where the firm optimizes its frequency of versioning the product, independent of the customer ratings. We optimize over a restricted version of the model in Section 3 where responsive versioning is disallowed. Instead, the developer optimizes over the exogenous versioning rate, p, which cannot depend on ratings or quality. Second, we evaluate the firms’ NPV gains under the completely agile model. Using the model of Section 3, we set p = 0 and solve the resulting MDP. Note that the resulting versioning policy necessarily dominates nonagile versioning in NPV payoffs. Lastly, we include the firm’s NPV revenue collected from its estimated policy as a benchmark for its current decisions in practice.
The panels of Figure 5 present our main findings. First, as shown in Figure 5(a), agile versioning yields substantial benefits over nonagile versioning. Even against the optimal nonagile schedule, agility typically yields a premium (depicted by the height of the dashed line over zero) of 20% to 80% in the NPV revenue stream. Second, some of the apps and developers that could significantly benefit from agility (i.e., those situated where the dashed line is highest) are not very agile. Figure 5(a)’s shaded area conveys the density of apps over their currently chosen shares of agile versioning, and a healthy share of apps resides close to where the widest gap separates the solid line from the dashed line, which represents the NPV revenue
Table 10. Effects of Competition on Share of Agile Versioning and on Rate of Nonagile
Figure 5. (Color online) Agile Versioning’s Net Present Valued Premium over Optimal Nonagile Versioning
Note. (a) All developers; (b) developers with high versioning cost; (c) developers with low versioning cost; (b) and (c) show the top and bottom
quartiles, respectively, of apps in versioning cost c.
gap between apps’ current versioning and fully agile versioning. Various explanations are plausible; for example, in light of the previous subsection, some developers may choose to remain less agile in order to exploit portfolio scale economies. However, because the prospective gains from increasing agility are substantial, managers should carefully consider these tradeoffs in staying less agile. In Figure 5(b) and (c), we separately plot the premia and shares of agile versioning for developers with low and high versioning cost markets. Developers respond to such costs, with developers with low versioning costs being significantly more agile. The latter represent the fullest definition of agility, where responsiveness to customers is coupled with low costs enabling fast development cycles.
Third, apps with costlier versioning receive relatively high gains from becoming partially agile; that is, the gap between the dashed and solid lines is comparatively narrower in Figure 5(b). For these apps, 25% agile versioning captures approximately 51% of the NPV achieved by completely agile versioning and approximately 26% of its NPV premium over the optimal schedule-based regime. The performance gap is narrower virtually everywhere else. By contrast, for apps with low versioning costs, 25% agile versioning captures only about 22% and 12.5% of the agile NPV and NPV premium, respectively. The explanation is intuitive. When versioning is costly, it is infrequent, affording an outsized benefit to being at least somewhat agile and responsive to customer feedback.

5.3. Marketplace Ratings Design
Even as they surpassed many tens or even hundreds of billions in downloads, the leading mobile app marketplaces have relied on sharply contrasting summary rating systems. Relevantly for our purposes, the divergent summary ratings policies differently incentivize developers in their decisions about whether to engage in agile development.
Before the launch of iOS 11 in September 2017, Apple’s App Store completely refreshed an app’s summary ratings whenever the developer released a new version, meaning that the average rating prominently displayed to potential customers exclusively addressed the new version. Google Play’s summary ratings were instead formed by averaging the individual ratings re-

Figure 6. (Color online) Comparing App Quality Under iOS and Google Play Summary Ratings
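The contrast between the two summary-rating policies can be made concrete with a small sketch. It assumes a simplified setting in which each rating is either satisfied or not and is tagged with the version it was left on; the `Rating` class, the function names, and the toy rating history are hypothetical and serve only to illustrate a current-version display versus an all-versions display.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    version: int      # version of the app the rating was left on
    satisfied: bool   # True if the rating counts as a satisfied customer


def share_satisfied(ratings):
    """Fraction of satisfied ratings, or None if there are no ratings."""
    if not ratings:
        return None
    return sum(r.satisfied for r in ratings) / len(ratings)


def current_version_display(ratings, current_version):
    """Reset-on-version policy: only ratings of the current version count."""
    return share_satisfied([r for r in ratings if r.version == current_version])


def all_versions_display(ratings):
    """Cumulative policy: ratings from every version are averaged together."""
    return share_satisfied(ratings)


# Hypothetical app: weak launch (version 1), improved update (version 2).
history = (
    [Rating(1, True)] * 4 + [Rating(1, False)] * 6
    + [Rating(2, True)] * 9 + [Rating(2, False)] * 1
)
print(current_version_display(history, current_version=2))
print(all_versions_display(history))
```

In this toy history the current-version display shows 0.9 while the all-versions display shows 0.65, so the same quality improvement is far less visible when early ratings stay attached to later versions.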
market feedback is critical to improving the initially mixed quality of apps. When poor early ratings remain permanently attached to new versions, developers cannot capture the full marginal benefits of quality-enhancing versioning, and therefore, they underinvest in such efforts. Second, a significant fraction of apps launched in ratings-sensitive markets face relatively high versioning costs. Because these apps version infrequently, it is very difficult to entirely avoid lapsing into lower ratings, and the Apple policy incentivizes agility by forgiving these lapses.
Lastly, thanks to a reviewer suggestion, we address the recent change in Apple’s summary ratings displayed for apps, which occurred in September 2017 after our data were collected. At this time, Apple granted developers the option to decide, each time they version, whether to “reset” their apps’ summary ratings or retain the reviews from past versions as summary rating inputs (Dillet 2017). We analyzed an additional counterfactual to study the effects of Apple’s revised policy. We found that developers of high-quality apps significantly increased their shares of agile versioning by as much as 24.1% (p-value = 1e−6), whereas no statistically significant change was found for the versioning behavior of developers producing low-quality apps. The explanation appears to be that developers of high-quality apps were previously reluctant to quickly release their version updates due to losing the high-quality reviews from previous versions. However, with iOS 11, these developers can version more frequently while electing not to reset apps’ summary ratings.

6. Concluding Remarks
We study operational agility in the versioning-based development of apps. Using panel data on apps’ version releases, reviews and ratings, and developer characteristics, we find that developers become agile when facing initial risk in the marketplace and when managing smaller portfolios of apps. As was the case for manufacturers considering component sharing across

design. Despite logging hundreds of billions of downloads, the two predominantly leading app stores still display their apps’ ratings in opposing ways. When a consumer views an app, Google Play displays its average rating over all previous versions, whereas Apple’s iOS store shows the average rating received by the current version alone. We find that the iOS ratings policy better preserves suppliers’ dynamic incentives to deliver product quality through versioning, with up to 22% higher rates of satisfied ratings in certain markets.

Acknowledgments
The authors thank their editors, three referees, and participants at 2017-2020 INFORMS Annual Meetings, 2018 MSOM Conference, 2019 POMS Annual Conference, and 2019 MSOM Supply Chain Management SIG Meeting for their helpful feedback.

1 For example, in software development, Harter et al. (2000) associate higher product quality with process maturity. The well-known Capability Maturity Model they cite places emphasis on planning and a highly repeatable process that can be matured and made predictable: “Schedules and budgets are based on historical performance and are realistic, the expected results for cost, schedule, functionality, and quality of the product are usually achieved … disciplined process is followed … and the necessary infrastructure exists to support the process” (Paulk et al. 1993, p. O-7).
2 However, this first factor alone is not sufficient for agility. In particular, efficiently planned processes can shorten software development cycle times (see, e.g., Harter et al. 2000).
See (accessed April 5, 2021).
The two largest mobile app stores, Google Play and Apple’s iOS App Store, offered more than 2.4 and 1.9 million mobile apps for download onto the Android and iOS platforms, respectively, as of April 2019. By July 2013, these two mobile app stores had seen more than 50 billion app downloads each, and Apple announced its 100 billionth download milestone in June 2015.
We originally sample 4,938 apps but filter for one or more months of data since release to carry out estimation.
