みらい 未来 (‘mirai’: Japanese for ‘future’)
Minimalist Async Evaluation Framework for R
→ Designed for simplicity, a ‘mirai’ evaluates an R expression asynchronously in a parallel process, locally or distributed over the network.
→ Modern networking and concurrency, built on nanonext and NNG (Nanomsg Next Gen), ensure reliable and efficient scheduling over fast inter-process communications or TCP/IP secured by TLS. Distributed computing can launch remote resources via SSH or cluster managers.
→ A queued architecture readily handles more tasks than available processes, requiring no storage on the file system. Innovative features include event-driven promises, asynchronous parallel map, and automatic serialization of otherwise non-exportable reference objects.
Use mirai() to evaluate an expression asynchronously in a separate, clean R process.
The following mimics an expensive calculation that eventually returns a vector of random values.
library(mirai)
m <- mirai({Sys.sleep(n); rnorm(n, mean)}, n = 5L, mean = 7)
The mirai expression is evaluated in another process and hence must be self-contained, not referring to variables that do not already exist there. Above, the variables n and mean are passed as part of the mirai() call.
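The same applies to functions: a helper defined in the host session does not exist in the other process unless it is passed in. A minimal sketch, where square() is a hypothetical helper used only for illustration:

```r
library(mirai)

# Hypothetical helper, defined only in the host session
square <- function(x) x^2

# Pass the function and its input as named arguments so the
# expression is self-contained in the other process
m <- mirai(square(n), square = square, n = 4)

# Wait for and collect the result
result <- m[]
result
#> [1] 16
```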
A ‘mirai’ object is returned immediately: creating a mirai never blocks the session.
m
#> < mirai [] >
Whilst the async operation is ongoing, attempting to access a mirai’s data yields an ‘unresolved’ logical NA.
m$data
#> 'unresolved' logi NA
To check whether a mirai remains unresolved (yet to complete):
unresolved(m)
#> [1] TRUE
To wait for and collect the return value, use the mirai’s [] method:
m[]
#> [1] 6.288799 7.337810 6.767335 7.435713 7.628763
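Because creating a mirai never blocks, several can be launched before any is collected. The sketch below assumes no daemons have been set, in which case each mirai runs in its own ephemeral process, so the one-second sleeps overlap rather than running back to back (process start-up adds some overhead):

```r
library(mirai)

# Launch three mirai; each call returns immediately
ms <- lapply(1:3, function(i) mirai({Sys.sleep(1); i * 10}, i = i))

# Collect the results in order; [] waits only while a mirai is unresolved
vals <- vapply(ms, function(m) m[], numeric(1))
vals
#> [1] 10 20 30
```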
As a mirai represents an async operation, it is never necessary to wait for it. Once it completes, the return value is automatically available at $data.
while (unresolved(m)) {
# do work here that does not depend on `m`
}
m$data
#> [1] 6.288799 7.337810 6.767335 7.435713 7.628763
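In practice, a polling loop should yield between checks rather than spin. A minimal sketch, in which the counter stands in for real work and the 0.1 second pause is an arbitrary choice:

```r
library(mirai)

m <- mirai({Sys.sleep(1); "done"})

checks <- 0L
while (unresolved(m)) {
  # Placeholder for work that does not depend on `m`
  checks <- checks + 1L
  # Pause briefly instead of spinning the CPU between checks
  Sys.sleep(0.1)
}

m$data
#> [1] "done"
```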
📡 Daemons are persistent background processes for receiving mirai requests, and are created as easily as:
daemons(6)
#> [1] 6
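Once daemons are set, subsequent mirai calls are sent to them instead of launching new processes, and tasks beyond the number of daemons are queued. A minimal sketch with two daemons, using Sys.getpid() to show which process evaluated each task, and daemons(0) to shut them down afterwards:

```r
library(mirai)

# Start two persistent local daemons
daemons(2)

# Submit four tasks; each returns the process ID of the daemon that ran it
ms <- lapply(1:4, function(i) mirai(Sys.getpid()))
pids <- vapply(ms, function(m) m[], integer(1))

# All four tasks were served by at most the two daemon processes
length(unique(pids)) <= 2
#> [1] TRUE

# Shut down the daemons when finished
daemons(0)
```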
Daemons may also be deployed remotely for distributed computing over the network.
🛰️ Launchers can start daemons via (tunnelled) SSH or a cluster resource manager.
🔐 Secure TLS connections can be used for remote daemon connections, with zero configuration required.
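Remote launches are configured by combining a listening URL with a launch configuration. A sketch assuming two SSH-reachable hosts with R and mirai installed; the addresses are placeholders, so the final daemons() call is left commented out:

```r
library(mirai)

# URL on which this host listens for daemons, secured with TLS
url <- host_url(tls = TRUE)

# SSH launch configuration; these addresses are placeholders
remote <- ssh_config(c("ssh://10.75.32.90", "ssh://10.75.32.91"))

# With reachable hosts, this should launch one daemon per address,
# each dialing back to `url` over TLS:
# daemons(url = url, remote = remote)

# Afterwards, reset with:
# daemons(0)
```

Supplying a tls+tcp:// URL without a certificate is what lets mirai generate the credentials itself, which is the zero-configuration TLS described above.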
mirai_map() maps a function over a list or vector, with each element sent as a separate task to be processed in parallel. It also performs a multiple map over the rows of a data frame or matrix.
df <- data.frame(
fruit = c("melon", "grapes", "coconut"),
price = c(3L, 5L, 2L)
)
m <- mirai_map(df, \(...) sprintf("%s: $%d", ...))
A ‘mirai_map’ object is returned immediately; creating one never blocks. Its value may be collected at any time with its [] method, which waits for completion and returns a list, just like purrr::map(). The [] method also provides options for flatmap (.flat), early stopping (.stop) and/or progress indicators (.progress).
m
#> < mirai map [3/3] >
m[.flat]
#> [1] "melon: $3" "grapes: $5" "coconut: $2"
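For a multiple map over a data frame, each row supplies the arguments to the function (the column names here match the argument names), while values shared by every call can be supplied through .args. A sketch with hypothetical order data; daemons are set first and reset afterwards:

```r
library(mirai)

daemons(2)

# Hypothetical order data: one row per call
orders <- data.frame(qty = c(2L, 1L, 3L), price = c(3, 5, 2))

# Each row supplies qty and price; tax is a constant passed via .args
m <- mirai_map(
  orders,
  function(qty, price, tax) qty * price * (1 + tax),
  .args = list(tax = 0.1)
)

# Wait for all results and flatten them to a numeric vector
totals <- m[.flat]
totals

daemons(0)
```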
All errors are returned as ‘errorValues’, facilitating recovery from partial failure. There are further advantages over alternative map implementations.
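A sketch of both collection modes on a map where one element fails (checked_sqrt() is a contrived, hypothetical function): plain [] returns every result, with the failure as an error value that is_error_value() can detect, whereas [.stop] raises an error at the first failure.

```r
library(mirai)

daemons(2)

# Contrived function that fails for negative input
checked_sqrt <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

# Plain []: every element is returned, failures as error values
res <- mirai_map(c(1, -1, 4), checked_sqrt)[]
failed <- vapply(res, is_error_value, logical(1))
failed
#> [1] FALSE  TRUE FALSE

# [.stop]: the first failure raises an error and remaining tasks are cancelled
outcome <- tryCatch(
  mirai_map(c(1, -1, 4), checked_sqrt)[.stop],
  error = function(e) "stopped early"
)

daemons(0)
```

The successful elements in res remain usable, so only the failed inputs need to be revisited.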
mirai is designed from the ground up to provide a production-grade experience.
→ Fast
- 1,000x more responsive vs. common alternatives [1]
- Built for low-latency applications, e.g. real-time inference and Shiny apps
→ Reliable
- No reliance on global options or variables for consistent behaviour
- Explicit evaluation for transparent and predictable results
→ Scalable
- Launch millions of tasks over thousands of connections
- Proven track record for heavy-duty workloads in the life sciences industry
mirai features the following core integrations, with usage examples in the linked vignettes:
Provides the first official alternative communications backend for R, implementing a new parallel cluster type, in response to a feature request by R Core at the R Project Sprint 2023.
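As a sketch of the cluster type, make_cluster() returns a cluster object that the standard parallel functions accept, and stop_cluster() releases it afterwards:

```r
library(mirai)

# A 'parallel' cluster backed by two mirai daemons
cl <- make_cluster(2)

# Use it with the standard parallel API
res <- parallel::parLapply(cl, 1:3, sqrt)
unlist(res)
#> [1] 1.000000 1.414214 1.732051

# Release the cluster when finished
stop_cluster(cl)
```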
Powers the (in development) implementation of parallel map for the purrr functional programming toolkit, one of the core tidyverse packages.
Implements the next generation of completely event-driven promises. ‘mirai’ and ‘mirai_map’ objects may be used interchangeably with ‘promises’, including with the promise pipe %...>%.
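Outside Shiny, promise callbacks only run when the later event loop is serviced. A minimal sketch, assuming the promises package is installed; the cap of 100 iterations is an arbitrary safeguard:

```r
library(mirai)
library(promises)

result <- NULL

# The promise pipe accepts a mirai directly
p <- mirai(16) %...>% sqrt()
invisible(then(p, onFulfilled = function(x) result <<- x))

# Service the 'later' event loop until the callback has run
for (i in 1:100) {
  if (!is.null(result)) break
  later::run_now(timeoutSecs = 0.1)
}

result
#> [1] 4
```

Within a Shiny app the event loop is already running, so no manual loop is needed there.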
Asynchronous parallel / distributed backend, supporting the next level of responsiveness and scalability within Shiny, with native support for ExtendedTask.
Asynchronous parallel / distributed backend for scaling Plumber applications in production.
Allows torch tensors and complex objects such as models and optimizers to be used seamlessly across parallel processes.
Allows queries using the Apache Arrow format to be handled seamlessly over ADBC database connections hosted in background processes.
The targets package, a make-like pipeline tool, has adopted crew as its default high-performance computing backend. The crew package is a distributed worker launcher extending mirai to different distributed computing platforms, from traditional clusters including LSF, PBS/TORQUE, SGE and Slurm to cloud services such as AWS Batch.
We would like to thank in particular:
Will Landau for being instrumental in shaping development of the package, from initiating the original request for persistent daemons, through to orchestrating robustness testing for the high performance computing requirements of crew and targets.
Joe Cheng for integrating the ‘promises’ method to work seamlessly within Shiny, and prototyping event-driven promises.
Luke Tierney of R Core, for discussion on L’Ecuyer-CMRG streams to ensure statistical independence in parallel processing, and making it possible for mirai to be the first ‘alternative communications backend for R’.
Henrik Bengtsson for valuable insights leading to the interface accepting broader usage patterns.
Daniel Falbel for discussion around an efficient solution to serialization and transmission of torch tensors.
Kirill Müller for discussion on using parallel processes to host Arrow database connections.
Install the latest release from CRAN:
install.packages("mirai")
The current development version is available from R-universe:
install.packages("mirai", repos = "https://r-lib.r-universe.dev")
◈ mirai R package: https://mirai.r-lib.org/
◈ nanonext R package: https://nanonext.r-lib.org/
mirai is listed in the CRAN High Performance Computing Task View:
https://cran.r-project.org/view=HighPerformanceComputing
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.