Computer Science: Source: Distributed Systems

Definition of a Distributed System

The definition of a distributed system is one which has independent and self-sufficient – often heterogeneous or autonomous – spatially-separated components which must us a common interconnect to exchange information in order to coordinate information, and to make the whole system appear to its user as a single coherent system.

To further this more a distributed system is the result of collaboration between separate, independent processes. In order for separate, independent processes to collaborate, then they must interact.

If this interaction is by means of message passing then each component must be aware of the mechanisms for interprocess communication (IPC). As well as the protocols for using these mechanisms.

Event Synchronization

Each component within a distributed system must be aware that its internal events are ordered by a timeline that is different (distinct) from that of every other component on the same distributed system.

Therefore, since there is no global clock, then this interaction event via some IPC mechanism requires event synchronization of one kind or another. For example, one component A must be in a position to receive what another component B wants to send.

Distributed Computing Architectures

Different distributed computing (DC) architectures arise from the different ways IPC and event synchronization are handled within a distributed system that follows that paradigm.

Currently there are 5 DC paradigms in wide use. These are:

Direct Message Exchange (DME)

Mediated Message Exchange (MME)

Remote Procedure Call (RPC)

Early-Binding (E-B) Remote Method Invocation (RMI)

Late-Binding (L-B) RMI

Direct Message Exchange (DME)

Within DME architecture, IPC follows a simple pattern. A process must be in a position to receive a message from some other process. This implies that a process may send a message to some other process and that whenever a process receives a message, then it may process that message and sends a response message.

P₂ is in the receiving state.

P₁ sends a message M₁ to process P₂. P₁ is now in the receiving state.

P₂ receives message M₁ and processes it.

P₂ now sends a response message M₂ back to P₁. P₂ enters receiving state.

In DME architectures, event synchronization is unmediated. i.e., the interacting processes themselves must take care of it. There are two basic types of DME:

Asymmetric

Symmetric

Asymmetric DME

In an asymmetric DME, event synchronization is simpler because one process (server) can just wait for requests and the other process (client) can just wait for a response. An asymmetric DME also allows a process to be assigned special roles, i.e., a process may act as a server to other processes acting as clients.

So we can say that a client sends a request message to the server, and then waits in receiving mode. The server has been waiting in receiving mode for a request message from the client. So on receiving said message, the server processes it and sends the client the response message.

Within an asymmetric DME, it is common for the server to take on heavier loads, this requires balancing, and can prevent scaling.

Some typical examples of asymmetric DME’s are the Web and email.

[caption id="attachment_265" align="aligncenter" width="383" caption="Asymmetric Interconnect"] asymmetric intefconnect

[/caption]

Symmetric DME

In a symmetric DME the problem is much less acute. No special role is assigned to any process in a symmetric DME, unlike in an asymmetric DME. This time processes are peers, hence the term peer-to-peer (P2P) architecture. P2P architectures are seen as leading naturally to balanced loads, and graceful scaling

No process simply waits for a request message in order to respond, and no process simply makes a request message and waits for a response. This is why event synchronization is less simple in a symmetric DME.

An example is Skype:

Skype is a P2P VoIP (Voice over IP) application

Skype uses super-peers which are basically just peers which take on a specialized role, in order to form a hierarchy amongst peers.

Skype is therefore not a pure symmetric DME.

[caption id="attachment_266" align="aligncenter" width="333" caption="Symmetric Interconnect"]

[/caption]

Mediated Message Exchange (MME)

Within an MME, IPC is not direct, unlike in DME. This means that processes do not send and receive messages directly, this is because they have to communicate directly with a middleware component first. This form of middleware is referred to as message-oriented middleware or MOM.

The MOM component is interposed between processes and it is used to mediate IPC. It also allows event synchronization to be decoupled in time without threading or forking.

[caption id="attachment_267" align="aligncenter" width="466" caption="Message-Oriented Middleware"]

[/caption]

Messages are sent to the MOM in a queue, which then sends them to the receiving process. If a message comes from one source process and goes to one destination process, the MME architecture is said to be point-to-point.

In some MME architectures a message can go to more than one destination process, such as publish-subscribe systems.

In this case, a process subscribes with the MOM expressing an interest in certain events from other processes. Then, when a process publishes an event with the MOM, it notifies the subscribing process.

Remote-Call Approaches (RPC)

Approaches that are based on message exchange need to consider event synchronization explicitly. They also need to submit explicit IPC discipline by means of protocols such as HTTP, SMTP, VoIP etc.

An alternative way is to view both IPC and event synchronization as implicit. One paradigm for this is known as remote procedure call (RPC). In RPC architecture, IPC and event synchronization are triggered by a call to a remote process. The information that is exchanged is no longer seen as a message, but as a parameter passing.

RPC can also be middleware-based as well, similar to that of MME.

Remote-Call Approach – An Object-Oriented Case

Sometimes, RPC can be expresses using Object-Oriented, this is known as remote method invocation (RMI).

Because of inheritance, the object abstraction implies that there may be options as to which remote object is being invoked. If the invoking process knows which remote object to invoke then one can refer to this as an early-binding (E-B) RMI architecture.

Put quite simply it is possible to find out which method within a remote object is being invoked from an invoking process.

Another RMI architecture arises if the invoking process delays the decision as to which remote object to invoke. This can be referred to as late-binding (L-B) RMI architecture.

In this case the invoking process may search a directory to select the remote object it wishes to invoke. This means that different objects may be selected at different times. This can be seen as introducing time-decoupling in the IPC approach.

Abstraction Levels in DC Architectures

DME architectures are the most prevalent within a DC.

The benefits of middleware-based architectures such as MME, RPC and RMI are that they raise the level of abstraction at which we can engineer distributed systems.

All architectures build on low-level IPC and event synchronization.

============================================

Abbreviations

IPC = InterProcess Communication

DME = Direct Message Exchange

MME = Mediated Message Exchange

RPC = Remote Procedure Call

MOM = Message-Oriented Middleware

RMI = Remote Method Invocation

E-B = Early-Binding

L-B = Late-Binding

DC = Distributed Computing

============================================

Thursday, 10 September 2009

Distributed Systems - Architectural Paradigms