[Figure 1: Stochastic Gradient Descent. (Figure labels: data input, inner product (W, b), sigmoid, gradient.)]

optimizing deep learning systems. Section 4 describes research problems in databases where deep learning techniques may help to improve performance. Some final thoughts are presented in Section 5.
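For concreteness, the dataflow in Figure 1 corresponds to the following minimal NumPy sketch of SGD over an inner-product (W, b) plus sigmoid model; the synthetic data and all variable names are ours, for illustration only.

```python
# Minimal NumPy sketch of the SGD loop in Figure 1: a data batch flows
# forward through an inner product (W, b) and a sigmoid, and gradients
# flow back to update the parameters. Names are illustrative only.

import numpy as np

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4,)), 0.0            # model parameters
lr = 0.1                                     # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(100):
    x = rng.normal(size=(8, 4))              # mini-batch: data input
    y = (x @ np.ones(4) > 0).astype(float)   # synthetic labels
    p = sigmoid(x @ W + b)                   # forward: inner product + sigmoid
    grad = p - y                             # backward: d(loss)/d(logit)
    W -= lr * (x.T @ grad) / len(y)          # SGD update for W
    b -= lr * grad.mean()                    # SGD update for b
```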
ing. Both approaches have scalability issues [38]. Recently, there have been studies on training convex models (deep learning models are non-linear and non-convex) using a value-bounded consistency model [41]. Researchers are starting to investigate the influence of consistency models on distributed training [15, 16, 2]. Much research remains to be done on how to provide flexible consistency models for distributed training, and on how each consistency model affects the scalability of the system, including its communication overhead.
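For concreteness, the sketch below illustrates one point in this design space: a clock-based bounded-staleness rule, under which a worker may run at most a fixed number of iterations ahead of the slowest worker (the value-bounded model of [41] bounds parameter divergence instead). All class and method names are illustrative, not from any particular system.

```python
# Minimal sketch of a bounded-staleness consistency model for distributed
# SGD: a worker may run at most `bound` iterations ahead of the slowest
# worker. bound = 0 gives synchronous training; a very large bound
# approaches fully asynchronous training.

import threading

class StalenessController:
    def __init__(self, num_workers, bound):
        self.clocks = [0] * num_workers      # per-worker iteration counters
        self.bound = bound
        self.cond = threading.Condition()

    def tick(self, worker):
        """Called by `worker` after pushing the gradients of one iteration."""
        with self.cond:
            self.clocks[worker] += 1
            self.cond.notify_all()           # a slow worker may unblock others
            while self.clocks[worker] - min(self.clocks) > self.bound:
                self.cond.wait()             # too far ahead: wait for stragglers
```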
3.2.3 Fault Tolerance
Database systems achieve good durability via logging (e.g., command logging) and checkpointing. Current deep learning systems recover training from crashes mainly via checkpoint files [11]. However, frequent checkpointing incurs substantial overhead. In contrast with database systems, which enforce strict consistency in transactions, the SGD algorithm used by deep learning training systems can tolerate a certain degree of inconsistency, so logging is not a must. How to exploit the properties of SGD and the system architecture to implement fault tolerance efficiently is an interesting problem. Considering that distributed training replicates the model status, it is possible to recover from a replica instead of from checkpoint files. Robust frameworks (or concurrency models) such as the actor model could be adopted to implement this kind of failure recovery.
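A minimal sketch of such checkpoint-based recovery is given below; `init_params` and `sgd_step` are hypothetical stand-ins for the actual training logic, and the atomic rename ensures that a crash during a write cannot corrupt the last snapshot.

```python
# Minimal sketch of checkpoint-based crash recovery for SGD training.
# Because SGD tolerates losing the last few updates, training simply
# resumes from the most recent snapshot; no per-update logging is needed.
# `init_params` and `sgd_step` are hypothetical stand-ins.

import os, pickle

CKPT = "model.ckpt"

def save_checkpoint(step, params):
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:                # write to a temp file first so a
        pickle.dump((step, params), f)        # crash here leaves the previous
    os.replace(tmp, CKPT)                     # checkpoint intact (atomic rename)

def train(total_steps, ckpt_every=1000):
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            step, params = pickle.load(f)     # resume after a crash
    else:
        step, params = 0, init_params()
    while step < total_steps:
        params = sgd_step(params)             # one (re-executable) SGD update
        step += 1
        if step % ckpt_every == 0:
            save_checkpoint(step, params)     # coarse-grained durability
```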
3.3 Existing Systems
A summary of existing systems in terms of the above-mentioned optimization aspects is listed in Table 1. Many researchers have extended Caffe [19] with ad hoc optimizations, including memory swapping and communication optimization; however, the official version is not well optimized. Similarly, Torch [6] itself provides limited support for distributed training. MXNet [3] has optimizations for both memory and operation scheduling. Theano [1] is typically used for stand-alone training. TensorFlow [11] has the potential for the aforementioned static optimization based on the dataflow graph.
We are optimizing the Apache incubator SINGA system [28] starting from version 1.0. For stand-alone training, cost models are explored for runtime operation scheduling. Memory optimizations, including dropping, swapping, and garbage collection with a memory pool, will be implemented. OpenCL is supported so that SINGA runs on a wide range of hardware, including GPUs, FPGAs, and ARM. For distributed training, SINGA (V0.3) has already done much work on flexible parallelism and consistency; hence the focus will be on optimizing communication and fault tolerance, which are missing in almost all systems.
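As a rough illustration of the swapping idea (not SINGA's actual memory-pool API), the sketch below keeps only the most recently used tensors in device memory and spills the rest to host memory:

```python
# Illustrative sketch of memory swapping: keep at most `capacity` tensors
# resident in (scarce) device memory, evicting the least recently used ones
# to host memory and bringing them back on demand. The dicts stand in for
# device and host allocations.

from collections import OrderedDict

class SwapPool:
    def __init__(self, capacity):
        self.capacity = capacity             # max tensors resident on device
        self.device = OrderedDict()          # name -> tensor, in LRU order
        self.host = {}                       # swapped-out tensors

    def get(self, name):
        if name not in self.device:
            self.put(name, self.host.pop(name))   # swap in on demand
        self.device.move_to_end(name)             # mark as recently used
        return self.device[name]

    def put(self, name, tensor):
        self.device[name] = tensor
        self.device.move_to_end(name)
        while len(self.device) > self.capacity:
            old, t = self.device.popitem(last=False)  # evict LRU tensor
            self.host[old] = t                        # swap out to host
```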
4. DEEP LEARNING TO DATABASES
Deep learning applications, such as computer vision and NLP, may appear very different from database applications. However, the core idea of deep learning, known as feature (or representation) learning, is applicable to a wide range of applications. Intuitively, once we have effective representations for entities, e.g., images, words, or table rows and columns, we can compute entity similarity, perform clustering, train prediction models, retrieve data across different modalities [40, 39], etc. Below we highlight a few deep learning models that could be adapted for database applications.
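As a toy illustration of this view, once entities are embedded as dense vectors, similarity queries reduce to simple vector arithmetic; the embeddings below are random stand-ins for learned representations.

```python
# Toy sketch: entities (rows, columns, words, images) mapped to dense
# vectors by a learned model can be compared with plain vector arithmetic.
# The embeddings here are random stand-ins, not learned representations.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
embeddings = {name: rng.normal(size=64) for name in ["row_1", "row_2", "col_a"]}

query = embeddings["row_1"]
ranked = sorted(embeddings, key=lambda n: -cosine(query, embeddings[n]))
print(ranked)   # entity names ordered by similarity to row_1
```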
4.1 Query Interface
Natural language query interfaces have been attempted for decades [24] because of their great desirability, particularly for non-expert database users. However, it is challenging for database systems to interpret (or understand) the semantics of natural language queries. Recently, deep learning models have achieved state-of-the-art performance on NLP tasks [13]. Moreover, RNNs have been shown to be able to learn structured output [34, 36]. As one solution, we can apply RNN models to parse natural language queries into SQL queries, and then refine the result using existing database approaches. For instance, heuristic rules could be applied to correct grammar errors in the generated SQL queries.
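Such a two-stage pipeline might look like the sketch below, where `rnn_decode` is a stub standing in for a trained sequence-to-sequence model [34, 36] and `repair` applies one heuristic rule; both function names are hypothetical.

```python
# Sketch of the proposed two-stage pipeline: an RNN decoder emits SQL
# tokens for a natural-language question, and heuristic rules then repair
# obvious grammar errors in the generated query.

def rnn_decode(question):
    # Stand-in output; a real model would condition on `question`.
    return ["SELECT", "name", "users", "WHERE", "age", ">", "30"]

def repair(tokens):
    # Example heuristic rule: insert a missing FROM before the table name.
    if "FROM" not in tokens and "WHERE" in tokens:
        tokens.insert(tokens.index("WHERE") - 1, "FROM")
    return " ".join(tokens)

print(repair(rnn_decode("list the names of users older than 30")))
# -> SELECT name FROM users WHERE age > 30
```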
The