Architectural Issues in Adopting Distributed Shared Memory For Distributed Object Management Systems
0-8186-7125-4/95 $04.00 © 1995 IEEE
Figure 1: Distributed Shared Page Cache Architecture
Figure 2: Distributed Shared Object Cache Architecture
this point, site A has obtained the page in read-only mode, since the DSM memory fault was caused by a read operation.

  • The requested page is fixed in the victim by reading the disk and marking its cache control block. But the disk read operation calls for additional network traffic to get the write ownership of the page. What is worse, all of the shared copies of the page must be invalidated to get exclusive ownership of it.

As this example shows, selecting a remote victim requires higher network overhead than local page replacement. We now suggest two replacement strategies that exploit tight cooperation between DSM and the cache management module of the OMS.^3

The first one is to use a new explicit remote paging interface. The way to make a shared memory segment accessible to an execution site can be implicit or explicit. The implicit method is based on the page fault mechanism of the operating system. The explicit one uses the DSM interface directly. Implicit remote paging incurs the cost of page-fault handling by the virtual memory of the operating system. Explicit remote paging may avoid that cost and can also pass useful hints easily.

Because a victim will be overwritten as soon as it is shipped from its previous owner, transferring the page is not necessary. Thus, a new explicit interface that avoids unnecessary remote paging can be added to reduce the performance degradation. In addition to the ‘get’ primitive, which gets a page in the specified mode from its owner, we propose a new interface, ‘getnew’. It gets a page in exclusive write mode without transferring the page itself. Figure 3 shows how the ‘getnew’ operation works.

The second one we propose is the replacement-cost hints algorithm. This replacement strategy partitions the cache space by ownership and gives a priority to each partition according to its replacement cost. The replacement cost of each set shows how costly it is to replace a victim in the distributed shared memory space.^4

The basic idea underlying replacement-cost hints is the following:

  1. Pages are organized into victim sets, where a victim set consists of all pages that have the same replacement cost. Victim sets are arranged in replacement-cost order.

  2. The cache searches its victim sets in inverse order, starting from the victim set with the lowest replacement cost.

  3. The first favored page found is selected as the victim. A favored page in victim set i is a page that has not been accessed for a certain time $\tau_i$, where $\tau_i \le \tau_{i+1}$, $1 \le i \le n - 1$, and n is the number of victim sets, which are ordered by replacement cost.

In this algorithm, it is clear that a remote victim, which has a higher replacement cost, is not evicted from the cache as long as there is any other favored page. Also, the threshold $\tau_i$ for each set i prevents the pages in a working set from being replaced.

^3 In this work, we assume that all sites share disks.
^4 When using a non-shared disk architecture, it includes the cost of flushing out

2.2 False Sharing

If the memory coherence unit of DSM is larger than the transactional unit of the OMS, it is likely that more than one site will write to a single coherence unit. This is called false sharing and may induce thrashing, where a memory unit moves back and forth at such a high rate that no work can be done[16]. The granularity of DSM sharing should be the same as or smaller than the granularity of locking to avoid this kind of false sharing. Otherwise, mechanisms to reduce thrashing are required to assure reasonable system performance.

Two existing DSM systems[8][3] give solutions to this problem at the DSM level. The Mirage system[8] guarantees that a reader or a writer possesses the sharing unit without interruption for a specific time window Δ. This prevents the sharing unit from being stolen away before any work can be done. Although an optimally tuned value for Δ may give high throughput by decreasing network traffic, it is difficult to choose an appropriate value for Δ dynamically. The Munin system[3] employs another solution to reduce thrashing. It uses a different coherence protocol for each shared data type. Type information specified by a programmer may improve overall system performance, but it imposes a heavy burden on the programmer, who must predict the type of every shared data item.

It is well known that sequential consistency is too restrictive and that weakening the coherence requirement makes adopting DSM more viable. We can also get a performance gain by combining two separate synchronization activities: memory coherence control of DSM and concurrency control of the OMS[14].

While the above two systems do not fully use application-specific knowledge in maintaining coherence of DSM, our new loose coherence protocol exploits the synchronization activities of the transaction manager. This method is grounded on the observation that relaxed coherence semantics allow more efficient shared accesses, while concurrency control synchronizes access to shared data. It is described briefly as follows:

  1. In addition to READ and WRITE modes, there are SHARED-READ and SHARED-WRITE modes in the
     distributed shared memory space. As in the strict coherence protocol, multiple readers may exist at any instant for a single memory coherence unit, but only one writer may exist at any instant. Unlike these two basic modes, there can exist many shared-readers or shared-writers for a unit at the same time.

  2. Initially, the strict coherence protocol is used for each coherence unit. But if thrashing is likely to occur,^5 the coherence protocol is weakened to reduce thrashing. There are two cases where the protocol is loosened.^6

     (a) WRITE - WRITE: When an exclusive owner receives a ‘get-with-writemode’ request for a unit, the owner loosens WRITE mode to SHARED-WRITE and returns SHARED-WRITE ownership to the requesting site instead of exclusive ownership.

     (b) WRITE - READ: As in the first case, the owner changes its mode to SHARED-WRITE and returns SHARED-READ capability to the requesting site.

  3. Once the protocol is weakened, get-with-readmode or get-with-writemode requests are handled by returning SHARED-READ or SHARED-WRITE mode, respectively.

  4. When a transaction ends, all of the SHARED-READ or SHARED-WRITE units should be invalidated. If a unit is owned in SHARED-WRITE mode, it should be merged with the other copies before invalidation. Data merging can be done by the DSM server or by the other shared owners in a distributed manner. It is worth noting that the modified portion of the data can be identified easily by the recovery scheme.

In this protocol, because SHARED-READ or SHARED-WRITE mode does not incur any conflict for a shared unit, each site can service all conflicting accesses to it, but consistency is preserved by the concurrency control mechanism. Thus, on the assumption that all shared data accesses are made under a well-formed concurrency control protocol, the suggested protocol works correctly.

^5 Thrashing can be detected by monitoring the transfer rates for each unit.
^6 This work excludes the ‘READ - WRITE’ case for simplicity, but it can be handled easily in the same way as the other cases.
^7 Data is also transferred.

3 Distributed Shared Recoverable Virtual Memory Architecture

Many researchers have studied issues in using the operating system’s virtual memory as a cache in object management systems[7][17]. In this approach, databases are directly mapped into virtual memory and OMSs use persistent objects as transient memory objects. The DSRVM approach is a natural extension of this approach: mapping databases into DSM[12][13].

In the DSRVM architecture, the OMS never worries about where objects are and how to access them, because object accessibility is governed by the DSM system. Also, it does not have to implement distributed cache management or deal with some aspects of the cache coherency problem. As such, this architecture fully utilizes the advantages of DSM. Moreover, since most typical OMS applications show tighter working sets than traditional ones, only a little performance degradation is expected from using less DBMS-optimized caching.

As data are mapped into DSM and are directly manipulated in DSM, the underlying DSM must support recoverable manipulations of data. This means that any anticipated crash must not violate the consistency of the data in DSM. To be recoverable, the DSM system must be integrated with the log manager and recovery manager in OMSs. In the next subsection, we propose a protocol that integrates the cache coherence, two-phase locking, and write-ahead logging protocols[10].
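To make the recoverability requirement concrete, the following is a minimal sketch of the generic write-ahead rule: a change is recorded in a stable log before the page copy is touched, so committed updates can be redone after a crash. It is an illustration only, not the integrated protocol of the next subsection. The class `RecoverablePageStore`, its method names, the in-memory stand-ins for stable storage, and the redo-only recovery (uncommitted changes are assumed never to reach stable pages) are all assumptions introduced here.

```python
class RecoverablePageStore:
    """Toy redo-only write-ahead logging over an in-memory 'disk' (hypothetical)."""

    def __init__(self):
        self.stable_log = []    # assumed to survive a crash
        self.stable_pages = {}  # assumed to survive a crash
        self.cache = {}         # volatile page copies, lost on a crash

    def write(self, txn, page, value):
        # Write-ahead rule: append the log record before changing the page copy.
        self.stable_log.append(("update", txn, page, value))
        self.cache[page] = value

    def commit(self, txn):
        # A transaction counts as committed once its commit record is logged.
        self.stable_log.append(("commit", txn))

    def crash(self):
        # Simulate a node failure: only the volatile cache is lost.
        self.cache.clear()

    def recover(self):
        # Redo, in log order, the updates of committed transactions only.
        committed = {rec[1] for rec in self.stable_log if rec[0] == "commit"}
        for rec in self.stable_log:
            if rec[0] == "update" and rec[1] in committed:
                _, _, page, value = rec
                self.stable_pages[page] = value
        self.cache = dict(self.stable_pages)


store = RecoverablePageStore()
store.write("T1", "P", "a")
store.commit("T1")
store.write("T2", "P", "b")  # T2 never commits
store.crash()
store.recover()
```

After recovery, page P holds the value written by the committed transaction T1, and the update of the uncommitted T2 is ignored.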
3.1 Transactional DSM for DSRVM
We assume that the system has a client-server architecture, so that a designated server process (the DSM server) knows the global status of DSM pages^8 over all nodes. Also, to provide permanence of DSM, we assume that the server has non-volatile storage for backing DSM memory pages and that it logs the changes to DSM pages.
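One way to picture the global page status kept by the DSM server is a per-page directory entry. The sketch below is a hedged illustration: the roles of page owner, page holder, and invalid-page holder are those used by the server protocol in Section 3.3, while the names `DsmDirectory`, `PageStatus`, `SERVER`, and the `sent` list that stands in for the network are hypothetical. It also assumes that a requester is skipped when invalidations are sent, which the text does not state.

```python
from dataclasses import dataclass, field

SERVER = "server"  # hypothetical id for the DSM server itself


@dataclass
class PageStatus:
    owner: str = SERVER                                # SERVER, or the client holding the page in write mode
    holders: set = field(default_factory=set)          # clients with a valid read-only copy
    invalid_holders: set = field(default_factory=set)  # clients whose copy was invalidated


class DsmDirectory:
    def __init__(self):
        self.pages = {}
        self.sent = []  # (message, destination, page) in send order

    def _status(self, page):
        return self.pages.setdefault(page, PageStatus())

    def _recall(self, st, page, requester):
        # Pull the page back from a client owner; that client becomes a page holder.
        if st.owner not in (SERVER, requester):
            self.sent.append(("recall", st.owner, page))
            st.holders.add(st.owner)
            st.owner = SERVER

    def get_with_read_mode(self, client, page):
        st = self._status(page)
        self._recall(st, page, client)
        st.invalid_holders.discard(client)
        st.holders.add(client)
        self.sent.append(("copy", client, page))

    def get_with_write_mode(self, client, page):
        st = self._status(page)
        self._recall(st, page, client)
        for holder in sorted(st.holders - {client}):  # assumption: requester is skipped
            self.sent.append(("invalidate", holder, page))
            st.invalid_holders.add(holder)
        st.holders.clear()
        st.invalid_holders.discard(client)
        st.owner = client
        self.sent.append(("copy", client, page))

    def discard(self, client, page):
        # On abort, the client's copy is marked invalid and the server owns the page again.
        st = self._status(page)
        st.holders.discard(client)
        st.invalid_holders.add(client)
        st.owner = SERVER


directory = DsmDirectory()
directory.get_with_read_mode("A", "P")   # A becomes a page holder
directory.get_with_write_mode("B", "P")  # A is invalidated, B becomes the page owner
directory.get_with_read_mode("C", "P")   # B is recalled; B and C are page holders
```

In this run the server sends a copy to A, an invalidation to A, a copy to B, a recall to B, and finally a copy to C, which mirrors the order of steps listed for the two request types.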
A client is a node participating in the DSM complex, and it is composed of application transactions (APTs) and an agent transaction (AT) (see Figure 4). An application transaction is an application process enclosed in a transaction boundary. The DSM system provides access control of DSM pages for its APTs. When an APT tries to read (write) a DSM page that it is not permitted to read (write), a page fault is trapped. By this mechanism, the DSM manager provides a transparent way to guarantee cache coherence, serializability, atomicity, and permanence. The AT is a stand-alone process and is the only transaction that interacts with the DSM server, so that it receives (sends) valid copies of DSM pages from (to) the DSM server on behalf of APTs. The status of a page is determined by the lock mode of the DSM page. When the AT holds a lock in exclusive (EX) mode for a page, the node has an invalid page. When the AT has a lock in shared (SH) mode, the node has a valid copy of the page but no APT in the client can write the page. When the AT locks a page in NL mode (i.e., the AT does not hold a lock for the page), the node has a writable valid copy of the page.

^8 It means a memory coherence unit of DSM.

Figure 4: Architecture for Integrated Protocol

3.2 Client Protocol
The following is the scenario of the integrated protocol executed in a client node.

  • On client boot up: AT locks all pages of DSM in EX mode, which implies that all pages are invalid.

  • On APT start: The DSM manager of this process disables access to all DSM pages, so that any access to a DSM page traps a page fault.

  • On read page P fault: The fault handler (FH) tries to lock the page conditionally in SH mode. If the request is not granted immediately and the lock holder is AT, FH sends a get-with-readmode message to AT and locks the page in SH mode again. After the lock is granted, FH marks the page readable and resumes the process.

  • On write page P fault: FH tries to lock the page conditionally in EX mode. If the request is not granted immediately and AT is one of the lock holders, FH sends a get-with-writemode message to AT and locks the page in EX mode again. After the lock is granted, FH marks the page writable (which also means the page is readable) and resumes the process.

  • On APT commit: If the APT has any writable DSM pages, it requests AT to send a commit message to the DSM server with the modified pages. After AT finishes sending those pages, the APT releases all the locks it holds and disables access to all pages.

  • On APT abort: If the APT has any writable DSM pages, it requests AT to send discard messages for those pages. After AT finishes sending the messages, the APT transfers all the exclusive locks it holds to AT and disables access to all DSM pages. By transferring exclusive locks to AT, invalid access from any other APT waiting for a lock can be avoided.

  • When AT receives a get-with-readmode (get-with-writemode) message from an APT: AT
    forwards it to the DSM server. After receiving the valid copy (acknowledgment) from the DSM server, AT downgrades its lock to SH (NL) mode.

  • When AT receives a recall (invalidate) message from the DSM server: AT tries to lock the corresponding page in SH (EX) mode. After the lock is granted, AT sends the page (acknowledgment) to the DSM server.

3.3 DSM Server Protocol
The following is the scenario of the server part of the protocol.

  • On receiving a get-with-readmode message from a client C: The DSM server checks whether it has a valid copy of the page. If it has, it sends the copy to the client C (more precisely, to the AT in the client C). Otherwise, it sends a recall message to the page owner (say C’) and waits for the valid page to arrive. After receiving the valid copy, the server marks C’ as a page holder (instead of page owner) and also marks the client C as a page holder.

  • On receiving a get-with-writemode message from a client C:

    1. If there is a page owner node for the page, the server sends a recall message to that client. After receiving the valid copy, the server marks that client as a page holder.
    2. If there are any page holder nodes for the page, the server sends invalidate messages to those clients. After receiving acknowledgments from all of them, the server marks each of those clients as ‘invalid-page holder’.
    3. The server sends a valid copy of the page to the requesting client C and marks C as the page owner.

  • On receiving a commit message from a client C: The server receives valid pages from the client and saves them into the non-volatile storage of those pages.

  • On receiving a discard message from a client C: The server marks the client C as ‘invalid-page holder’ for that page, and marks itself as the page owner.

Due to space constraints, we omit the correctness arguments of this protocol. But, since locks held by an APT are released only after it commits (or aborts), this protocol is a two-phase locking protocol. Also, since the most recent page will be accessed through the ‘recall’ mechanism, this protocol guarantees sequential memory consistency.

3.4 Pros and Cons of the Protocol
We believe this protocol has the following advantages. First, it integrates the cache coherence protocol and the locking protocol, so that the number of messages between the DSM server and a client can be reduced. When separate protocols are used, almost twice as many messages are required as with the integrated one for an APT to access a page. Second, this protocol takes advantage of data caching and lock caching, which may reduce the number of messages between clients and the server needed to access a page. An application transaction can read or write a DSM page without any server interaction if that page is already cached in the client. Third, this protocol does not require any specific interfaces for the DSM manager except transaction commitment and transaction abort. An application transaction programmer does not have to lock pages or generate log records for DSM page updates. Fourth, this protocol does not require the complex two-phase commit protocol, which makes the transactional protocol complicated in most existing distributed DBMSs.

But this protocol has a few shortcomings. It supports only sequential memory consistency, which may be too restrictive for some applications. But we worry that any relaxed memory consistency will result in database inconsistency in most OMS applications. And sometimes the transaction concept makes a relaxed consistency protocol meaningless because a locking protocol, in general, requires very restrictive memory consistency. This protocol supports only the FORCE buffer strategy[11], which reduces transaction throughput. We understand that most commercial DBMSs having a data shipping architecture use the FORCE buffer strategy. Lastly, if some DSM pages are frequently accessed from several nodes, the overall system is likely to fall into thrashing. We are currently studying how to overcome this problem.

4 Related Works
Most research on DSM has emphasized memory coherence, granularity of sharing, heterogeneity, avoiding thrashing, and so on. Only a few works (Hsu and Tam[14], Hastings[12]) have been done on implementing a DBMS or atomic transactions using DSM.

Hsu and Tam’s work puts an emphasis on performance enhancement by integrating cache coherence (coherent memory) and concurrency control (process synchronization). They show the performance enhancement using a simulation study of two synchronization algorithms: 2PL-MC, which separates transaction synchronization from memory coherence, and 2PL*, which bypasses memory coherence. Based on these simulation results, they argue that significant performance gain can potentially result from bypassing
memory coherence and supporting process synchronization directly on DSM. Hastings’s work was to propose transactional distributed shared memory (TDSM) using the Camelot[7] transaction facility, which provides recoverable virtual memory, and the Mach external memory manager (XMM)[9].

5 Conclusion
In this paper, we proposed two alternative distributed system architectures that attempt to adopt DSM for distributed object management systems, the distributed shared cache (DSC) architecture and the distributed shared recoverable virtual memory (DSRVM) architecture, and addressed some of the major issues.

In the DSC architecture, we explored the tradeoffs in the use of DSM as an object cache relative to DSM as a page cache. We also suggested a new replacement strategy exploiting knowledge of the ownership of data items and provided some feasible solutions to the false sharing problem.

The major advantage of the DSRVM architecture is that it provides transactional facilities for direct manipulation of data in DSM. We presented a new protocol for DSM to support the transaction concept with minor additional interfaces. We also discussed the pros and cons of the proposed protocol.

We are currently studying how to relieve contention for lock and log data by exploiting the semantics of these data. Also, we are working on the development of a DSM-based object storage system, SOPRANO[1].

References
[1] J.-H. Ahn, K.-W. Lee, and H.-J. Kim. “Soprano: Implementation of High Performance Object Storage System Using Distributed Shared Memory”. In preparation, 1995.
[2] R. Ananthanarayanan, S. Menon, and A. Mohindra. “Experiences in Integrating Distributed Shared Memory with Virtual Memory Management”. ACM Operating System Reviews, 26(3):4-26, July 1992.
[3] J. K. Bennett, J. B. Carter, and W. Zwaenepoel. “Munin: Distributed Shared Memory Based on Type-Specific Memory Coherence”. In Proc. 1990 Conf. Principles and Practice Parallel Programming, pages 168-176, 1990.
[4] M. J. Carey, M. J. Franklin, and M. Zaharioudakis. “Fine-Grained Sharing in a Page Server OODBMS”. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 359-370, Minneapolis, Minnesota, May 1994.
[5] R. G. G. Cattell. “Object Data Management”. Addison Wesley, 1991.
[6] W. Effelsberg and T. Haerder. “Principles of Database Buffer Management”. ACM Trans. Database Syst., 9(4), Dec. 1984.
[7] J. L. Eppinger, L. B. Mummert, and A. Z. Spector, editors. Camelot and Avalon: A Distributed Transaction Facility. Data Management Systems. Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1991.
[8] B. D. Fleisch and G. J. Popek. “Mirage: A Coherent Distributed Shared Memory Design”. In Proc. 14th ACM Symposium Operating System Principles, pages 211-223, 1989.
[9] A. Forin, J. Barrera, M. Young, and R. Rashid. “Design, Implementation, and Performance Evaluation of a Distributed Shared Memory Server for Mach”. Technical Report CMU-CS-88-165, School of Computer Science, Carnegie Mellon University, Aug. 1988.
[10] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, Inc., 1993.
[11] T. Haerder and A. Reuter. “Principles of Transaction-Oriented Database Recovery”. ACM Comput. Surv., 15(4):287-317, Dec. 1983.
[12] A. B. Hastings. “Transactional Distributed Shared Memory”. PhD thesis, School of Computer Science, Carnegie Mellon Univ., 1992. CMU-CS-92-167.
[13] M. Hsu and V.-O. Tam. “Managing Databases in Distributed Virtual Memory”. Technical Report TR-07-88, Harvard University, Mar. 1988.
[14] M. Hsu and V.-O. Tam. “Transaction Synchronization in Distributed Shared Virtual Memory Systems”. Technical Report TR-05-89, Aiken Computation Lab., Harvard University, Jan. 1989.
[15] J. E. B. Moss. “Working with Persistent Objects: To Swizzle or Not to Swizzle”. IEEE Trans. Softw. Eng., 18(8):657-673, Aug. 1992.
[16] B. Nitzberg and V. Lo. “Distributed Shared Memory: A Survey of Issues and Algorithms”. In T. L. Casavant and M. Singhal, editors, Readings in Distributed Computing Systems, pages 375-386. IEEE Computer Society Press, 1994.
[17] I. L. Traiger. “Virtual Memory Management for Database Systems”. ACM Operating System Reviews, 16(4):26-48, Oct. 1982.
[18] Y. Wang and L. A. Rowe. “Cache Consistency and Concurrency Control in a Client/Server DBMS Architecture”. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 367-376, Denver, Colorado, May 1991.
[19] S. J. White and D. J. DeWitt. “A Performance Study of Alternative Object Faulting and Pointer Swizzling Strategies”. In The Proceedings of the International Conference on Very Large Data Bases, Aug. 1992.