
The Distributed Computing Model Based on the Capabilities of the Internet


Lukasz Swierczewski
Computer Science and Automation Institute
College of Computer Science and Business Administration in Łomża
Lomza, Poland
luk.swierczewski@gmail.com

Abstract—This paper describes the theoretical and practical aspects of a proposed model that applies distributed computing to the global Internet. Distributed computing is widely used in modern applications, such as research, that demand very high processing power which cannot be concentrated in one centralized point. The presented solution is based on open technologies, with the computations performed mainly on computers provided by volunteer Internet users.

Keywords-distributed computing, architectures and design systems

I. INTRODUCTION

According to the definition, a distributed system is a collection of independent equipment connected together as one seamless logical entity. Most commonly the term equipment refers to desktop computers; however, it can currently also include media tablets as well as handsets. The solution described in this article is based on the Internet, yet less sophisticated communication methods can also be used. Distributed systems create the illusion of a centralized system (single and integrated). This feature is called transparency and it is one of the key characteristics of this class of solutions. The concept of using distributed resources appeared in the late 1970s. Today, thanks to the Internet, it is possible to exploit millions of computers provided by volunteers.

Figure 1. Peer-To-Peer Architecture Diagram

Figure 2. Client/Server Architecture Diagram

II. GENERAL

There are various distributed system models. The most popular ones include Peer-To-Peer, Cloud Computing, Grid Computing and Client/Server. In open-source solutions the Client/Server architecture is the most widespread. A prime example of the Client/Server model is BOINC (Berkeley Open Infrastructure for Network Computing), which started as a resource pooling solution for the SETI@Home project and is constantly being developed at the University of California, Berkeley.

III. THE BOINC WORKFLOW

The principle behind BOINC is quite straightforward. Each Internet user can download the client application, which automatically connects to the server and downloads selected data portions for further computation. After the assigned task is completed, the client uploads the results to the server, where scientists can analyze them. A minimal sketch of this cycle is given below.
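To make the cycle concrete, the following C sketch simulates it end to end. The helper names (fetch_work, compute, report_result) are hypothetical and the network exchange is replaced by local function calls; this is not the actual BOINC client API, only an illustration of the fetch, compute and report loop under those assumptions.

    /* client_loop.c -- schematic sketch of a volunteer client's cycle.
       The server exchange is simulated with local helpers; the
       computation is a placeholder, not a real project application. */
    #include <stdio.h>

    typedef struct { int id; long first; long last; } work_unit_t;

    /* Simulated "server": hands out consecutive number ranges. */
    static work_unit_t fetch_work(int next_id)
    {
        work_unit_t wu = { next_id, next_id * 1000L, next_id * 1000L + 999 };
        return wu;
    }

    /* Placeholder computation: counts even numbers in the range. */
    static long compute(work_unit_t wu)
    {
        long hits = 0;
        for (long n = wu.first; n <= wu.last; n++)
            if (n % 2 == 0)
                hits++;
        return hits;
    }

    /* Simulated upload: a real client would send an HTTP request here. */
    static void report_result(work_unit_t wu, long result)
    {
        printf("work unit %d: result %ld uploaded\n", wu.id, result);
    }

    int main(void)
    {
        for (int i = 0; i < 3; i++) {   /* the client keeps asking for work */
            work_unit_t wu = fetch_work(i);
            report_result(wu, compute(wu));
        }
        return 0;
    }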

Figure 3. Performance comparison of different computational systems.

IV. ADVANTAGES AND DISADVANTAGES OF DISTRIBUTED SYSTEMS

Distributed systems, despite many advantages, also have disadvantages. They are completely different from the supercomputers widely adopted in the academic setting, which are typically organized as computing clusters. Distributed systems are not able to solve all the problems that supercomputers can handle. The most severe restriction is the data transfer rate. Distributed systems need to rely on Internet connections that are significantly slower than the interconnects used in computing clusters (e.g. the InfiniBand interface). Therefore, tasks should not require sending large chunks of data; ideally they should provide long computation times on modern CPUs with relatively small input data. Another significant drawback is calculation uncertainty. One needs to remember that the data are processed on third-party computers, so there is no guarantee that the work will be done correctly. Moreover, there is no assurance that the computation results will be uploaded back to the server, since the user can at any time uninstall the required software or reinstall the operating system.

A significant advantage of distributed systems is their relatively low operating cost. Within the Client/Server architecture only the server administration needs to be handled, and that often means as little as one physical server. The server is used only to assign and manage tasks that are then completed by other computers. The low cost of such solutions has led to high interest from amateur programmers, individual scientists and research facilities.

Fig. 3 presents a comparison of a few of the most important distributed systems with three supercomputers listed on TOP500. With regard to overall performance, distributed systems often do not lag far behind the technology currently used by NASA or the military (which requires significant financial capital to operate). For the supercomputers, the chart uses LINPACK results, which measure the speed of solving a large dense system of linear equations. For the distributed systems, information provided by system administrators was collected; these figures are most commonly based on the number of results uploaded in a given timeframe.

A simple comparison of a supercomputer and a distributed system is shown in Table I.

TABLE I. COMPARING THE CHARACTERISTICS OF A DISTRIBUTED SYSTEM AND CLASSICAL SUPERCOMPUTERS

Characteristic   Distributed system   Supercomputer
Reliability      Low                  High
Independence     Low                  High
Scalability      High                 Mean
Cost             Very low             Very high

There are many additional components that can help to design and build individual distributed platforms. The system can use precompiled middleware solutions that facilitate effective communication between different components of the system. The most commonly used middleware platforms are CORBA (Common Object Request Broker Architecture), RMI (Remote Method Invocation) and DCOM (Distributed Component Object Model).

V. POSSIBLE SERVER DAEMONS

Certain services need to be offered by a distributed system server, i.e. task generation, distribution of the tasks between the clients and analysis of the returned results. The number of daemons, as well as the work scheduling, does not need to be evenly distributed; however, most often it does not differ significantly from the structure used by BOINC.

a) Assimilator: The Assimilator operates on tasks that have been finished and whose results are already known. This service usually saves the relevant data regarding the task to the central database and, when the need arises, it can also delete temporary data from the database.
b) Transitioner: The Transitioner is responsible for analyzing the status of tasks. This is the service that, for example, assigns a task to other computers when the initial one did not return the results within a given timeframe; in the meantime, the old task is cancelled. Moreover, when the same task is sent to a few different computers and different results are returned, the Transitioner sends it to additional computers to verify computation correctness.

c) Validator: The Validator is one of the last daemons handling task-related operations. This service verifies uploaded results. Generally, this daemon also assigns points (credits) to users, which are considered an award for the computation contribution (this is done in BOINC). A minimal sketch of such an agreement check follows this list.

d) Work Generator: The Work Generator is a daemon that generates tasks in a fully automated manner, which are then distributed to other computers. These tasks can be generated taking into account the computational capabilities of various computers (available RAM, free HDD space, type of CPU).

e) Others: Additional daemons can be defined in the server structure. These daemons can be responsible for other operations (e.g. cleaning) on files and database records. Thanks to their work, despite long operating times of the system, the amount of stored data does not grow indefinitely and waste information is not stored.
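The replication-and-agreement idea behind the Transitioner and the Validator can be sketched in a few lines of C. The quorum logic below is an assumed, simplified version (real BOINC validators are project-specific and usually compare structured output files); it only illustrates how a canonical result can be accepted once enough replicas agree.

    /* validator_sketch.c -- simplified agreement check for replicated results.
       A work unit is accepted only when at least `quorum` returned values
       agree; otherwise more replicas would have to be issued.             */
    #include <stdio.h>

    /* Returns 1 and stores the canonical value when a quorum agrees, else 0. */
    static int validate(const long results[], int n, int quorum, long *canonical)
    {
        for (int i = 0; i < n; i++) {
            int votes = 0;
            for (int j = 0; j < n; j++)
                if (results[j] == results[i])
                    votes++;
            if (votes >= quorum) {
                *canonical = results[i];
                return 1;
            }
        }
        return 0;   /* disagreement: the transitioner would issue more replicas */
    }

    int main(void)
    {
        long replicas[] = { 42, 42, 7 };   /* results returned by three hosts */
        long value;
        if (validate(replicas, 3, 2, &value))
            printf("work unit valid, canonical result = %ld\n", value);
        else
            printf("no quorum yet, re-sending the task\n");
        return 0;
    }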

Distributed systems frequently include a browser-based interface that allows users to view task and computer related statistics. Often, there are other services intended to cooperate with that kind of interface.


Figure 4. Separate server daemons at the Berkeley Open Infrastructure for Network Computing.

VI. SCALABILITY

Within the term scalability in distributed systems a few key aspects can clearly be distinguished; Figure 5 presents them. Size scalability can pose a significant challenge. A large number of users may join a system that is available on the Internet. The number of clients and computers cannot increase indefinitely, so at a certain point the server will become the bottleneck. Furthermore, computers can be located in different parts of the world, sometimes even in regions where downloading larger amounts of data results in long waiting times. Due to the large physical separation, issues related to the Internet backbone can also become visible. The term scalability also covers system administration and maintenance: despite the fact that the system is distributed, it should be perceived by the users as one logically consistent system.

Figure 5. Main problems concerning the scalability of distributed systems.

VII. SECURITY

Similarly to the scalability issue, in the security analysis one can distinguish smaller, distinct components. Data processed using distributed systems are more vulnerable to illegal intrusions. The platform should maximize security; however, it will always be lower than with supercomputers that are operated only by selected entities.

Figure 6. Main problems regarding distributed systems security.

VIII. APPLICATIONS AND ALGORITHMS

Distributed systems have potential uses not only in traditional computing tasks. Other applications using these principles are network routing, distributed databases, or ICS (Industrial Control Systems). Such systems are also used in aircraft flight control systems, where the exchange of information between heterogeneous systems is widespread.

Algorithms executed in high performance computing systems can be classified as follows:
1) Parallel algorithms in the shared-memory model (e.g. SMP architecture computers),
2) Parallel algorithms in the message-passing model (e.g. high-performance clusters),
3) Distributed algorithms.
The main difference between these classes of algorithms lies in their communication capabilities. In the first case the programmer can use shared memory, which is very convenient and allows a large number of problems to be implemented in a simple manner (e.g. with the OpenMP library); a minimal example is sketched below. In the message-passing model (the MPI library), the developer no longer has as much freedom, but may define his own logical network structure that will be used to transfer the data. In the case of distributed algorithms, however, the designer needs to accept the fact that the network structure may be the weakest link.
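As a minimal illustration of the first, shared-memory class, the OpenMP fragment below parallelizes a simple summation. It is a generic example, not code taken from the described project; a message-passing counterpart would instead split the index range across MPI processes and combine the partial sums with a call such as MPI_Reduce.

    /* shared_sum.c -- minimal sketch of the shared-memory model (OpenMP).
       Compile with: gcc -fopenmp shared_sum.c -o shared_sum            */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        const long n = 10000000L;
        double sum = 0.0;

        /* All threads work on the same shared loop; the reduction clause
           merges the per-thread partial sums into one shared variable.  */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 1; i <= n; i++)
            sum += 1.0 / (double)i;

        printf("harmonic sum H(%ld) = %.6f using up to %d threads\n",
               n, sum, omp_get_max_threads());
        return 0;
    }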

IX. GOLDBACH CONJECTURE PROJECT


In the course of this research, a self-developed platform operating under BOINC was used. The goal was to verify the correctness of the Goldbach conjecture. This famous number theory problem states that every even natural number greater than 2 is the sum of two primes. For more than 250 years, no one has managed to confirm or disprove this hypothesis. The problem is very well suited for distributed computing, due to the easy division into tasks and the complete independence of their execution; a sketch of one such task is given below. The project was supported by about 15,000 computers connected by nearly 1,000 users. The system was not promoted by any additional advertisements, which clearly shows the interest of users in such endeavors.

Figure 7. Average server CPU utilization (Intel Celeron 1200 MHz).
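The natural task division can be sketched as follows: one work unit simply checks every even number in an assigned interval, and different intervals can be processed by different volunteers in any order. The interval bounds and the naive trial-division test below are illustrative only and do not reproduce the project's actual work-unit sizes or algorithms.

    /* goldbach_range.c -- sketch of one work unit: confirm the Goldbach
       conjecture for every even number in a small, illustrative interval. */
    #include <stdio.h>
    #include <stdbool.h>

    static bool is_prime(long n)
    {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    /* An even n > 2 passes when some prime p exists with n - p also prime. */
    static bool has_goldbach_pair(long n)
    {
        for (long p = 2; p <= n / 2; p++)
            if (is_prime(p) && is_prime(n - p))
                return true;
        return false;
    }

    int main(void)
    {
        for (long n = 4; n <= 10000; n += 2)   /* one independent "task" */
            if (!has_goldbach_pair(n)) {
                printf("counterexample found: %ld\n", n);
                return 1;
            }
        printf("all even numbers in [4, 10000] verified\n");
        return 0;
    }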
The main problem with the BOINC server was the MySQL database server, which was heavily utilized. If old results were not removed from the database for several days, storing a large amount of historical data resulted in inefficient query handling. Due to the available resources, only a standard hard drive was used; perhaps if an array of modern SSD drives had been used, with the tables distributed among them, the problem would not have been visible. RAM is another component that is essential for a server; for larger projects, 4 GB of RAM is the absolute minimum. The monthly maintenance cost of a dedicated server at a hosting company is approximately $50 (without the activation fee). To run a distributed data processing server at home, a link offering high throughput is required. Most of the connections offered to individual users may not be sufficient because of their asymmetrical properties (too low an upload speed), and there are also problems with maintaining a large number of simultaneous connections on such a link. Disadvantages of this kind can often dissuade volunteers from connecting and sharing their resources with the project, which results in an overall decrease in system performance. One of the less important server components during the Goldbach Conjecture Project was the CPU: it was not extensively utilized by any of the running services. Despite this fact, the official BOINC requirements for the CPU are high, e.g. for particularly demanding validators of the returned results. The average load on the CPU (an Intel Celeron 1200 MHz) of the Goldbach Conjecture Project server is shown in Fig. 7. The average system performance was approximately 10 TeraFLOPS. It was calculated from the number of tasks returned during a given time frame and the average amount of computation required to execute one of them. A performance of 10 TeraFLOPS is an astonishing result, especially when taking into account such low operating costs. Platform performance measurements are presented in Fig. 8.

Figure 8. Average performance of the Goldbach Conjecture Project platform.

It is easy to see that performance remains fairly constant over time. Intuition suggests that during night-time the data processing speed should drop, because many users would turn off their computers. However, one should note that the users and their computers are scattered around the globe and thus are located in different time zones, which is the very reason why the performance level is steady. The most severe drops in performance are always caused by server problems; the whole system depends on its reliability. In addition, one needs to keep in mind that users participating in the project collect points, which are stored in the database. Therefore, particular care must be taken not to lose their achievements, since these are often the key to volunteer motivation. On average, 1.5 users per day joined the Goldbach Conjecture Project; from another perspective, the number of computers increased every day by approximately 2. This is shown in Fig. 9 and Fig. 10, respectively. These values were measured for a system already running for longer than 6 months; in the case of a new BOINC project, they are initially much higher. The Goldbach Conjecture Project did not solve Goldbach's conjecture. However, working with the server infrastructure developed by the University of California illustrated the great potential that resides in this solution.
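To make the 10 TeraFLOPS estimate above concrete, here is a purely hypothetical worked example (the figures below are illustrative, not taken from the project's logs): if clients return R results per day and a single result represents roughly W floating-point operations, the sustained rate is about R × W / 86 400 FLOPS, since a day has 86 400 seconds. For instance, 86 400 results per day at 10^13 operations each would correspond to 10^13 FLOPS, i.e. 10 TeraFLOPS.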
X. CONCLUSIONS
This paper discusses the possibility of using a distributed system as, in some cases, a very good alternative to centralized systems. Traditional supercomputers require significantly larger capital and operating expenditures. Nowadays, solutions based on distributed systems are becoming increasingly popular, and Internet users willingly share their computer resources to support computations carried out by academic institutions around the globe. In the coming years this technology will certainly continue to grow rapidly.

However, one needs to remember that with the development of the Internet and other technologies, distributed systems will have to face new problems and challenges.

Figure 9. Number of registered users during certain days.

Figure 10. Number of computers added during certain days
