Computer Science: Source: Distributed Systems

Definition of a Distributed System

A distributed system is one in which independent, self-sufficient – often autonomous or heterogeneous – spatially-separated components must use a common interconnect to exchange information and coordinate actions, so that the whole appears to the user as a single coherent system.

Interaction

Interaction is for coordination and cooperation. This means that the relationships between events in the timeline of each interacting process matter. Interaction may also lead to competition, which in turn implies the need for management if global goals are to be achieved.

This interaction can be achieved in several ways: by sharing an address space, by synchronous communication mechanisms (eg, remote procedure calls (RPC)), or by message passing. In the last case, protocols at different abstraction levels are relied upon to deal with transport, routing, integrity, security and reliability.

If components are spatially-separated, then interaction implies an interconnect, which centralized systems do not need to take into account explicitly. The crucial new element in distributed computation compared to the centralized case is the need to consider an interconnect between different elements.

Interconnection Types

No Interconnection

The most basic model is one that uses no interconnect. That is, a single algorithm can only access a single data-repository, and this access is cost-neutral. Within computer architecture, this model is often referred to as SISD – Single Instruction Single Data.

Algorithm

↓

Data

Interconnecting Storage

Now we look at a basic model that does use an interconnect. With this, we can apply an algorithm to each distinct partition of a data-repository that we have distributed for such a purpose. However, access to the data is no longer cost-neutral, because there is a cost in going through the interconnect. Within computer architecture, this model is often referred to as SIMD – Single Instruction Multiple Data.

Algorithm (ALGO)
↓
D	A	T	A
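
The partitioning idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `partition` and `apply_to_partitions` are hypothetical, and the partitions are processed one after another on a single machine, so the sketch shows how the data is divided rather than any real parallel speed-up or interconnect cost.

```python
def partition(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def apply_to_partitions(algorithm, data, parts):
    """Apply the same algorithm to every partition of the data."""
    return [algorithm(chunk) for chunk in partition(data, parts)]


# One algorithm (sum) applied to four partitions of the numbers 1..10.
partial_sums = apply_to_partitions(sum, list(range(1, 11)), 4)
total = sum(partial_sums)  # combining partial results is a separate step
```

Each partial result would, in a real system, have to travel back over the interconnect before being combined, which is where the access cost mentioned above is paid.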

Interconnecting Processors

Now we see another model. This time, rather than applying one algorithm to partitions of the data, we apply several algorithms to the same data. As before, this model is not cost-neutral, as there is a cost in going through the interconnect. In computer architecture, this model is often referred to as MISD – Multiple Instruction Single Data.

A	L	G	O
↓	↓	↓	↓
DATA

This type of model, however, is generally less useful than the SIMD model.

This approach is generally used in graphics, where an example could be that a single pixel can be operated on in many different ways.
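
As a rough illustration of the pixel example, the sketch below applies several algorithms to the same datum. The three operations (`brighten`, `invert`, `to_grey`) and the helper `apply_all` are hypothetical names chosen for this example, and a pixel is assumed to be an (R, G, B) tuple of values from 0 to 255.

```python
def brighten(pixel, amount=40):
    """Raise every channel by `amount`, capped at 255."""
    return tuple(min(255, c + amount) for c in pixel)


def invert(pixel):
    """Replace every channel by its complement."""
    return tuple(255 - c for c in pixel)


def to_grey(pixel):
    """Replace every channel by the integer average of the three."""
    level = sum(pixel) // 3
    return (level, level, level)


def apply_all(algorithms, datum):
    """Apply several algorithms to the same datum."""
    return {f.__name__: f(datum) for f in algorithms}


results = apply_all([brighten, invert, to_grey], (200, 100, 30))
```

Here the data stays in one place and it is the algorithms that multiply, which is the reverse of the partitioning used in the previous model.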

Interconnecting Machines

This final model is an inherently-distributed system. That is, one in which an interconnect binds together centralized systems taken as autonomous components. The interconnect is still not cost-neutral; however, the grain of functionality is much bigger. The interconnect cost is paid for complete computations and not just for data accesses. In computer architecture, this model is often referred to as MIMD – Multiple Instructions Multiple Data.

A	L	G	O
↓	↓	↓	↓
D	A	T	A

Interconnects at Various Scales

Microchips:

The buses on a microchip, such as the data bus and the address bus, are used to connect the multiple cores of a system.

Parallel Machines / Appliances:
These use a very fast type of interconnect, such as InfiniBand. The interconnect binds together multiple, specially-assembled computing and storage resources.

Clusters Of Workstations (COW):
These types of interconnects are used across racks that pile up commodity machines.

Network Of Workstations (NOW):
Here the interconnect is a Local Area Network (LAN), for example machines linked by Ethernet cable.

Web / Grid:
Here the interconnect is a Wide Area Network (WAN). The obvious example is the Internet.

Networks as Interconnects

In the real world, the most common model of a distributed system is MIMD – Multiple Instructions Multiple Data. Systems of this kind can also be called Shared-Nothing architectures. A further feature of the MIMD model is that the interconnect in this case is a full-blown network, and not just a piece of electronics or advanced cabling technology.

Because of this, the network exists independently as a complex physical fabric, the use of which must be mediated by sophisticated software.

The Physical Fabric and its Components

In this section, we shall ignore wireless elements such as wireless networks, so all LANs and WANs are assumed to be wired networks.

We start off with the basics: individual machines, or networked machines in a household. These machines all connect to a server at the Internet Service Provider (ISP). This is done using a dial-up line (DS0), an ISDN line (ISDN = 2xDS0) or a Digital Subscriber Line (DSL).

Note: ISDN stands for Integrated Services Digital Network.

Now we move on to a larger scale. This time let's look at LANs that are formed within an organization using Ethernet cables. The Ethernet is the fabric of the LAN, which is connected to a Point-Of-Presence (POP) through an optical T1 trunk (T1 = 24xDS0).

Then, finally, we have T3 trunks (T3 = 28xT1). These give access to the Network Access Points (NAP) that are linked by the very-high-speed Backbone Network Service (vBNS), which contains links such as OC3 (OC3 = 84xT1) and even faster ones. OC stands for Optical Carrier, and the 3 is the level of the signal carried across the fiber-optic network: an OC3 link runs at three times the base OC1 rate.

Figure: interconnects example
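
The trunk multiples above can be turned into channel counts and payload rates with a little arithmetic. The sketch below assumes the standard figure of 64 kbit/s for one DS0 channel, which is not stated in the text, and it counts payload channels only: real line rates are somewhat higher because of framing overhead. The names `TRUNKS_IN_DS0` and `payload_kbits` are hypothetical.

```python
DS0_KBITS = 64  # assumed payload of one DS0 channel, in kbit/s

T1_IN_DS0 = 24  # T1 = 24 x DS0, as given in the text

# Each trunk expressed as a number of DS0 channels.
TRUNKS_IN_DS0 = {
    "DS0": 1,
    "ISDN": 2,               # ISDN = 2 x DS0
    "T1": T1_IN_DS0,
    "T3": 28 * T1_IN_DS0,    # T3 = 28 x T1
    "OC3": 84 * T1_IN_DS0,   # OC3 = 84 x T1
}


def payload_kbits(trunk):
    """Payload capacity of a trunk in kbit/s, ignoring framing overhead."""
    return TRUNKS_IN_DS0[trunk] * DS0_KBITS


for name in TRUNKS_IN_DS0:
    print(f"{name:>4}: {TRUNKS_IN_DS0[name]:>4} x DS0 = {payload_kbits(name)} kbit/s")
```

Under these assumptions a T3 carries 672 DS0 channels and an OC3 carries 2016, which shows how quickly capacity grows from the household line to the backbone.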

Forming Networks

The Internet is a Packet Switched Network. This is a network in which data is transmitted in units called packets. These packets are routed individually over the most efficient network connection and reassembled to form a complete message at the destination point. To help these packets reach their destination across various autonomous physical networks, control data is added to them, typically in the form of headers.

On their way, packets normally pass through routers. A router examines a packet's destination point, via the header, and takes traffic volumes into account before it forwards the packet.

The packet is forwarded to another router that is closer to the destination and that can be reached over a route that appears to be lightly loaded.

Routers are therefore crucial in avoiding congestion in networks, which is due to competition between many sources for a route to one target.
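
The forwarding decision described above can be sketched as a small function. This is a toy model, not how any real router is implemented: real routers consult routing tables built by routing protocols. The function name `choose_next_hop`, the tuple layout and the load values are all assumptions made for the illustration.

```python
def choose_next_hop(neighbours, current_distance):
    """Pick a neighbouring router for the next hop.

    neighbours: list of (name, distance_to_destination, load) tuples,
                where load is a fraction between 0.0 and 1.0.
    current_distance: this router's own distance to the destination.

    Only neighbours that are closer to the destination are considered;
    among those, the most lightly loaded wins (ties go to the closer one).
    Returns None when no neighbour is closer.
    """
    closer = [n for n in neighbours if n[1] < current_distance]
    if not closer:
        return None
    name, _distance, _load = min(closer, key=lambda n: (n[2], n[1]))
    return name


# R3 has the lightest load but leads away from the destination,
# so the choice is between R1 and R2, and R2 is less busy.
neighbours = [("R1", 2, 0.9), ("R2", 3, 0.2), ("R3", 5, 0.1)]
next_hop = choose_next_hop(neighbours, current_distance=4)
```

The design choice here is to treat "closer" as a hard requirement and "lightly loaded" as a preference, which matches the order in which the text introduces the two criteria.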

Application

↓

Transport

↓

Internet

↓

Link

↓

Physical

This is an example of the Software Stack that is used over the Fabric.

Protocol Layers

The Internet can be understood as having four main protocol layers above the physical fabric. These layers are:

Application layer: This is the only layer that deals with messages in the form of SEND and RECEIVE.

Transport Layer: This is the layer that breaks a message down into its packet form, and then wraps these packets with control information – headers.

Internet Layer: This layer then breaks data even further, in order to fulfil its main task, which is to route the packets forward to their destination.

Link Layer: This layer oversees the working of the Physical Fabric and may request and send confirmation of error-free transmission.
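
The way a message moves down and back up through these layers can be sketched as nested wrapping. The dictionaries below are simplified stand-ins for real headers, and the function names, field names, addresses and the segment size of 4 bytes are all assumptions made for the illustration.

```python
def send(message, src, dst, size=4):
    """Pass a message down the stack and return the frames put on the fabric."""
    # Transport layer: break the message into numbered segments.
    segments = [
        {"seq": i, "payload": message[offset:offset + size]}
        for i, offset in enumerate(range(0, len(message), size))
    ]
    # Internet layer: add the addressing needed to route each segment.
    packets = [{"src": src, "dst": dst, "segment": s} for s in segments]
    # Link layer: wrap each packet in a frame for the physical fabric.
    return [{"link": "frame", "packet": p} for p in packets]


def receive(frames):
    """Pass frames back up the stack and return the original message."""
    packets = [f["packet"] for f in frames]        # link layer unwraps
    segments = [p["segment"] for p in packets]     # internet layer unwraps
    segments.sort(key=lambda s: s["seq"])          # transport layer reorders
    return b"".join(s["payload"] for s in segments)


frames = send(b"hello world", "10.0.0.1", "10.0.0.2")
# Frames may arrive in any order; the sequence numbers restore the message.
message = receive(list(reversed(frames)))
```

Only the application sees the whole message; every layer below it works on its own wrapped unit and ignores what is inside.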

Protocols at each Layer

Application	HTTP
↓	↓
Transport	TCP
↓	↓
Internet	IP
↓	↓
Link	MAC
↓	↓
Physical	802.11b

The last protocol, ie 802.11b, is also known as WiFi. It is an IEEE (Institute of Electrical and Electronics Engineers) standard for wireless LANs (WLANs), that is, a technology standard for wireless transmission of data over radio frequencies.

Protocol Instances

Each protocol layer can be instantiated by many distinct, specific protocols. The best known application layer protocol is the HyperText Transfer Protocol, referred to above as HTTP. This protocol is used to retrieve hypertext documents via link traversal.

The main protocols that are used for email are SMTP and POP3/IMAP, and VoIP protocols are used for audio.

The best known transport protocol is the Transmission Control Protocol (TCP); however, the User Datagram Protocol (UDP) is widely used where it is better suited.

The Internet Protocol, more commonly known as IP, is the universal Internet layer protocol for routing in the Internet.

As can be seen, the transport layer has two main protocols, namely TCP and UDP. Now the question is… which is better? Here are some of the basic differences between TCP and UDP:

TCP is connection-oriented, whilst UDP is not. This implies that TCP establishes a connection between communicating nodes before it will transmit the required data. This is similar to the synchronous SEND and RECEIVE architectures.

TCP is reliable, because it ensures that the message payload will be:

Complete: by requesting acknowledgement of receipt of each packet, and resending the packet if it fails to receive that acknowledgement.

Intact: by adding a checksum to each packet that reveals en-route corruption if it happens during transmission.

In the right order and without duplication: by adding a sequence number to each packet, so that the receiver can reorder packets and discard duplicates, even when the sender resends unnecessarily.
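
The three guarantees above can be sketched from the receiver's side. This is a simplified model and not TCP itself: `zlib.crc32` stands in for TCP's real checksum, and the `Receiver` class, `make_packet` helper and packet layout are hypothetical. A missing acknowledgement (`None`) is what would prompt the sender to resend.

```python
import zlib


def make_packet(seq, payload):
    """Build a packet with a sequence number and a checksum of its payload."""
    return {"seq": seq, "payload": payload, "checksum": zlib.crc32(payload)}


class Receiver:
    def __init__(self):
        self.accepted = {}  # seq -> payload, at most one entry per seq

    def receive(self, packet):
        """Return the sequence number to acknowledge, or None if corrupt."""
        if zlib.crc32(packet["payload"]) != packet["checksum"]:
            return None  # intact: corruption detected, no acknowledgement
        # No duplication: a repeated seq does not overwrite or add anything.
        self.accepted.setdefault(packet["seq"], packet["payload"])
        return packet["seq"]

    def message(self):
        """Right order: join the payloads by ascending sequence number."""
        return b"".join(self.accepted[s] for s in sorted(self.accepted))


packets = [make_packet(i, c) for i, c in enumerate([b"dis", b"tri", b"buted"])]
damaged = dict(packets[1], payload=b"xxx")  # payload altered en route

rx = Receiver()
# Out of order, one corrupted copy, then a resend that arrives twice.
arrivals = (packets[2], damaged, packets[0], packets[1], packets[1])
acks = [rx.receive(p) for p in arrivals]
```

After these arrivals, `rx.message()` returns the original byte string even though the packets arrived out of order, once corrupted and once duplicated.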

UDP only adds a checksum, so that corruption of the data during transmission can be detected. It does not guarantee delivery, ordering or the absence of duplicates.

UDP is often used when the payload can be kept small, when communication must be set up quickly, and when the loss of the odd packet is not problematic, such as in audio and/or video transmission.
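
The lack of connection set-up in UDP can be seen with Python's standard `socket` module. The sketch below sends a single datagram over the loopback interface; the function name `udp_round_trip`, the 2-second timeout and the 2048-byte buffer are assumptions made for the illustration. On loopback the datagram will normally arrive, but nothing in UDP guarantees that it does.

```python
import socket


def udp_round_trip(payload, timeout=2.0):
    """Send one datagram to a local UDP socket and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # Port 0 asks the operating system for any free port.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(timeout)  # give up rather than wait forever
        # No connect or handshake: the datagram is simply addressed and sent.
        sender.sendto(payload, receiver.getsockname())
        data, _address = receiver.recvfrom(2048)
        return data
```

Calling `udp_round_trip(b"frame-1")` should return the same bytes. A TCP version would first need `listen`, `connect` and `accept` calls before any data could be sent, which is the set-up cost that UDP avoids.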

Thursday, 10 September 2009

Distributed Systems - Interconnects