
Cluster systems in scientific computing

Cluster technologies have long been available to ordinary organizations. This became possible thanks to the use of inexpensive Intel-based servers, standard communication tools and widespread operating systems in entry-level clusters. Cluster solutions on Microsoft platforms are aimed primarily at coping with operator errors and with failures of equipment and software, and they are an effective means of solving these problems.

As computer technology develops, the degree of its integration into the business processes of enterprises and the activities of organizations has increased dramatically. The problem of increasing the time during which computing resources are available has emerged and is becoming ever more relevant. Server reliability is becoming one of the key factors in the successful operation of companies with a developed network infrastructure. This is especially important for large enterprises in which special systems support real-time production processes, for banks with an extensive branch network, or for telephone operators' service centers that use decision support systems. All such enterprises need servers that work continuously and provide information around the clock, without interruptions.

The cost of idle equipment is constantly growing for the enterprise, since it is made up of the value of lost information, lost profit, the cost of technical support and recovery, customer dissatisfaction, and so on. How does one create a reliable system, and how much does solving this problem cost? There are a number of techniques for calculating the cost of a minute of downtime for a given enterprise; based on this calculation, you can then choose the most suitable solution with the best ratio of price and functionality.

There are many options and means for building a reliable computing system. RAID disk arrays and redundant power supplies, for example, "insure" part of the system's equipment against the failure of other, similar components and allow the processing of requests for information to continue when one of them fails. Uninterruptible power supplies maintain the operation of the system in the event of power grid failures. Multiprocessor system boards ensure that the server keeps functioning if one processor fails. However, none of these options helps if the entire computing system fails as a whole. This is where clustering comes to the rescue.

Historically, the first step toward clusters is considered to be the "hot standby" systems that were widespread in their time. One or two such systems, included in a network of several servers, do not perform any useful work but are ready to start functioning as soon as one of the main systems fails. Thus, the servers duplicate each other in case one of them fails or breaks down. It would be desirable, however, to combine several computers so that they not only duplicate each other but also perform other useful work, distributing the load among themselves. In many such cases, clusters are a better fit.

Initially, clusters were used only for powerful computing and to support distributed databases, especially where increased reliability is required. Later they began to be used for Web services. The decline in cluster prices, however, has led to such solutions being used increasingly for other needs as well. Cluster technologies have finally become available to ordinary organizations, in particular thanks to the use of low-cost Intel-based servers, standard communication tools and common operating systems (OS) in entry-level clusters.

Cluster solutions on Microsoft platforms are aimed primarily at coping with failures of equipment and software. Failure statistics for such systems are well known: only 22% of incidents are directly caused by failures of equipment, the OS, power supply and the like. To exclude these factors, various technologies for increasing server fault tolerance are applied (redundant and hot-swappable disks, power supplies, boards in PCI connectors, etc.). However, the remaining 78% of incidents are usually caused by application failures and operator errors. Cluster solutions are an effective means of solving this problem.

Clusters make it possible to build a unique architecture that has sufficient performance and resistance to failures of hardware and software. Such a system is easily scaled and upgraded by universal means, is based on standard components, and comes at a reasonable price, significantly lower than the price of a unique fault-tolerant computer or a system with massive parallelism.

The term "cluster" means both fault tolerance, and scalability, and handling. You can give a classic cluster definition: "The cluster is a parallel or distributed system consisting of several computers related to each other and at the same time used as a unified, unified computer resource." The cluster is a combination of several computers, which at a certain level of abstraction are controlled and used as a single integer. On each node of the cluster (the node is usually a computer included in the cluster) is its own copy of the OS. Recall that systems with SMP and NUMA architecture having one common copycannot be considered clusters. The cluster node can be both single-processor and multiprocessor computer, and within one cluster, computers may have a different configuration (a different number of processors, different volumes of RAM and disks). Cluster nodes are connected to each other or using conventional network connections (Ethernet, FDDI, Fiber Channel) or by non-standard special technologies. Such intraclastic, or interstal connections allow nodes to interact with each other independently of the external network medium. On intra-cluster channels nodes not only communicate with information, but also control each other's efficiency.

There is also a broader definition of a cluster: "A cluster is a system acting as a single whole, guaranteeing high reliability, having centralized management of all resources and a shared file system and, in addition, providing configuration flexibility and ease of resource expansion."

As already noted, the main purpose of a cluster is to provide a high level of readiness (otherwise called the level of availability, High Availability, HA) compared with a disparate set of computers or servers, as well as a high degree of scalability and ease of administration. Improving the system's readiness ensures that the applications critical for the user keep working for as long as possible. Critical applications include all those on which the company's ability to make a profit, provide a service or perform other vital functions directly depends. As a rule, using a cluster ensures that if a server or an application stops functioning normally, another server in the cluster, while continuing to perform its own tasks, takes over the role of the faulty server (or runs a copy of the faulty application) in order to minimize user downtime caused by the malfunction.

Readiness is usually measured as the percentage of time the system spends in a working state out of the total operating time. Different applications require different levels of readiness from the computing system. Readiness can be increased by various methods; the choice of method depends on the cost of the system and on the cost of downtime to the enterprise. There are fairly cheap solutions that, as a rule, focus mainly on reducing downtime after a malfunction has occurred. More expensive ones keep the system functioning normally and continue serving users even when one or more of its components have failed. As the system's readiness grows, its price increases nonlinearly, and so does the cost of supporting it. Systems with relatively low cost have an insufficiently high level of fault tolerance, no more than 99% (this means that for approximately four days a year the enterprise's information structure will be inoperable). This is not so much, provided the figure also includes planned downtime related to preventive maintenance or reconfiguration.
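
For reference, the relationship between a readiness percentage and annual downtime is simple arithmetic; the short C sketch below (illustrative only) tabulates the downtime implied by the commonly quoted availability levels:

```c
#include <stdio.h>

/* Annual downtime implied by a given availability level.
   Illustrative arithmetic only: downtime = (1 - availability) * one year. */
int main(void)
{
    const double minutes_per_year = 365.0 * 24.0 * 60.0;   /* 525600 minutes */
    const double levels[] = { 0.99, 0.999, 0.9999, 0.99999 };

    for (int i = 0; i < 4; i++) {
        double down_min = (1.0 - levels[i]) * minutes_per_year;
        printf("availability %.3f%% -> about %.0f minutes (%.1f hours) of downtime per year\n",
               levels[i] * 100.0, down_min, down_min / 60.0);
    }
    return 0;
}
```

For 99% this gives roughly 5250 minutes, i.e. close to four days per year, and for 99.999% about 5 minutes, matching the figures discussed in the text.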

High availability (readiness) implies a solution that can continue to function, or restore functioning, after most errors occur without operator intervention. The most advanced (and naturally the most expensive) fault-tolerant solutions are capable of providing 99.999% system reliability (i.e., no more than 5 minutes of downtime per year).

Between single-server systems with mirrored disk subsystems (or RAID disk arrays) and fault-tolerant systems, cluster solutions provide the "golden mean": in terms of availability they approach fault-tolerant systems at incomparably lower cost. Such solutions are ideal when only very minor unplanned downtime can be tolerated.

In the event of a failure, recovery of the cluster system is managed by special software and hardware. The cluster software automatically detects a single hardware or software failure, isolates it and restores the system. Specially designed routines can choose the fastest recovery path and bring services back up in minimal time. Using the built-in development tools and the programming interface, you can create special programs that detect, isolate and eliminate failures occurring in applications developed by the user.

An important advantage of clustering is scalability. A cluster allows you to flexibly increase the computing power of the system by adding new nodes to it, without interrupting users' work. Modern cluster solutions provide automatic load distribution between cluster nodes, as a result of which one application can run on several servers and use their computing resources. Typical applications run on clusters are:

  • databases;
  • enterprise resource planning (ERP) systems;
  • message processing and mail systems;
  • Web servers and tools for processing transactions via the Web;
  • customer relationship management (CRM) systems;
  • file and print sharing services.

So, a cluster combines several servers interconnected by a special communication channel, frequently called the system network. The cluster nodes monitor each other's operability and exchange specific "cluster" information, such as the cluster configuration, and also transmit data between the shared drives and coordinate their use.

Operability is monitored using a special heartbeat ("pulse") signal. The cluster nodes transmit this signal to one another to confirm their normal functioning. In small clusters, heartbeat signals are transmitted over the same channels as the data; in large cluster systems, dedicated lines are allocated for them. The cluster software must receive the "pulse" signal from each server within a certain time interval; if it does not arrive, the server is considered failed and the cluster is automatically reconfigured. Conflicts between servers are also resolved automatically: when the cluster starts, the problem arises of selecting a "master" server or server group whose task is to form the new cluster.
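
The idea behind heartbeat monitoring can be sketched in a few lines of C. The sketch below assumes a peer node sends a small UDP datagram every second on a dedicated link; the port number and miss threshold are hypothetical, and real cluster software layers its own protocol, quorum and reconfiguration logic on top of this basic pattern:

```c
/* Sketch of a heartbeat monitor: a peer node is expected to send a small UDP
   datagram (its "pulse") every second. If several intervals pass with no pulse,
   the node is declared failed and failover logic would be triggered. */
#include <stdio.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define HEARTBEAT_PORT 9000   /* assumed private cluster-interconnect port */
#define MISS_LIMIT     3      /* pulses missed before the peer is considered dead */

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(HEARTBEAT_PORT);
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    struct timeval tv = { 1, 0 };                        /* one heartbeat interval */
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    int missed = 0;
    char buf[64];
    while (missed < MISS_LIMIT) {
        if (recvfrom(s, buf, sizeof(buf), 0, NULL, NULL) > 0)
            missed = 0;                  /* pulse received, peer is alive */
        else
            missed++;                    /* timeout: one interval without a pulse */
    }
    printf("peer node silent for %d intervals - initiating failover\n", MISS_LIMIT);
    close(s);
    return 0;
}
```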

To organize the cluster communication channel, ordinary network technologies (Ethernet, Token Ring, FDDI, ATM), shared I/O buses (SCSI or PCI), the high-speed Fibre Channel interface, or specialized technologies such as CI (Computer Interconnect), DSSI (Digital Storage System Interconnect) or Memory Channel can be used.

The DSSI interface is designed for access to drives and for interaction of systems with each other. It is similar to the multi-host SCSI-2 protocol but has higher performance and the ability to organize interaction between computers. DSSI clusters support tools for increasing system reliability, resource sharing, a distributed file system and transparency. From the point of view of management and security, a DSSI cluster is represented as a single domain.

The CI interface is a dual serial bus with a data rate of up to 70 Mbit/s. It is connected to the computer's I/O system by means of an intelligent controller capable of working with either the dual or a single bus, depending on the access reliability requirements for the particular computer. All CI communication lines are connected to a CI integrator, a special device that tracks connections with nodes and cluster configurations.

Memory Channel technology allows you to create a highly efficient communication environment that provides high-speed (up to 100 MB/s) message exchange between the servers in a cluster.

The requirements for the speed of the communication channel depend on the degree of integration of the cluster nodes and on the nature of the applications. If, for example, applications on different nodes do not interact with each other and do not access disk drives simultaneously, the nodes exchange only control messages confirming their operability, as well as information about changes in the cluster configuration, i.e. the addition of new nodes, redistribution of disk volumes, and so on. This type of exchange does not require significant internode link resources and may well be satisfied by a simple 10 Mbit/s Ethernet channel.

There is a huge variety of real cluster configurations. There are solutions that are combinations of several clusters, possibly with additional devices. Each option meets the requirements of the corresponding applications and, naturally, the options differ both in cost and in complexity of implementation. Cluster topologies such as star, ring, N-to-N and others are widely used. But no matter how complex and exotic a cluster is, it can be characterized by two criteria:

  • the organization of the RAM of the cluster nodes;
  • the degree of availability of I/O devices, primarily disks.

As for RAM, there are two options: either all cluster nodes have independent RAM, or they have a common shared memory. The degree of availability of cluster I/O devices is mainly determined by the ability to use external memory with shared disks, which implies that any node has transparent access to the shared disk space. Besides the shared disk subsystem, cluster nodes can have local disks, but in that case they are used mainly for loading the OS on the node. Such a cluster must have a special subsystem called a distributed lock manager (DLM) to eliminate conflicts when writing to files from different cluster nodes. In systems without a DLM, applications cannot work in parallel with the same data, and the shared disk memory, if any, is assigned to one of the nodes at any given moment.
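
The discipline a DLM enforces (acquire a lock, modify the shared data, release the lock) can be illustrated with ordinary POSIX advisory file locks. This is only a sketch: the file path is hypothetical, and advisory locks coordinate different nodes only if the shared file system actually propagates them, whereas a real DLM coordinates locks over the cluster interconnect:

```c
/* Sketch of the lock-before-write discipline a DLM enforces. Ordinary POSIX
   advisory locks (fcntl) stand in here for distributed locks; across nodes this
   works only if the shared file system propagates such locks, which is an
   assumption of this example, not a property of every cluster. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int lock_whole_file(int fd, short type)      /* F_WRLCK or F_UNLCK */
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;            /* lock the entire file */
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLKW, &fl);   /* F_SETLKW blocks until the lock is granted */
}

int main(void)
{
    /* hypothetical file on a shared cluster file system */
    int fd = open("/shared/cluster-data.log", O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    lock_whole_file(fd, F_WRLCK);               /* "acquire": no other node may write */
    const char *rec = "update from this node\n";
    write(fd, rec, strlen(rec));                /* the protected modification */
    lock_whole_file(fd, F_UNLCK);               /* "release": other nodes may proceed */

    close(fd);
    return 0;
}
```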

In clusters that do not support simultaneous access to external memory, all nodes are completely autonomous servers. In the case of two nodes, access to the shared memory on disks is carried out using a shared I/O bus (Fig. 1). For each node such a bus terminates at the disk array. At any moment only one node owns the shared file system. If one of the servers fails, control over the bus and the shared disks passes to the other node.

Fig. 1. Building a cluster of two nodes.

For companies with an integrated information system, where only part of the resources is used to run applications critical to reliability, the "active-backup" cluster scheme can be applied (Fig. 2). In the simplest case such a system includes an active server that runs the most important applications and a backup machine that solves less critical tasks. If the active server fails, all its applications are automatically transferred to the backup one, where the lower-priority applications stop functioning. This configuration makes it possible to avoid slowing down the critical applications: users simply will not notice any changes (a special case of this scheme is the "passive-backup" configuration, in which the backup server carries no load and is in standby mode).

Fig. 2. Building a cluster of the "active-backup" type.

There is also an "active-active" configuration, in which all cluster servers run specific applications of equally high priority, so the computing resources of the backup server are used in everyday work. The advantage of this approach is that the user has at his disposal a highly available system (the server is duplicated) and at the same time can use all of the cluster's computing resources. This reduces the overall cost of the system per unit of computing power. In the event of a failure, applications are transferred from the failed machine to the remaining ones, which, of course, affects overall performance. "Active-active" clusters can exist only as dedicated systems, on which low-priority tasks such as office work cannot be run. In addition, when building clusters with an active backup server, one can have fully duplicated servers with their own individual disks. In this case the data must be constantly copied from the main server to the backup one; this ensures that in the event of a failure the backup server will have correct data. Since the data is fully duplicated, a client can access either server, which allows one to speak of load balancing in such a cluster. Moreover, the nodes of such a cluster can be geographically separated, which makes the configuration disaster-tolerant. This approach provides a very high level of availability, but it has a number of drawbacks:

  • the need to constantly copy data, which means that part of the computing and network resources will be continuously spent on synchronization;
  • even the fastest network interface between servers within a cluster does not exclude delays in transmitting information, which can ultimately lead to desynchronization: if one server fails, not all the transactions performed on its disk may be reflected on the disk of the second server.

In a cluster without resource sharing (Fig. 3), the servers are connected to one disk array, but each of them controls its own set of disks. If one of the nodes fails, the remaining server takes over control of its disks. This method eliminates the need for constant data synchronization between servers and thereby frees up additional computing and network resources. But in this configuration the disks become a single point of failure, so drives based on RAID technology are used.

Fig. 3. Building a cluster without shared resources.

In systems with full resource sharing (Fig. 4), all servers in the cluster have simultaneous access to the same disks. This approach implies carefully developed software that provides multiple access to the same medium. As in the previous case, the disks here can be a single point of failure, so the use of RAID arrays is also desirable. In this variant there is no need for constant data synchronization between servers, and thus additional computing and network resources are freed.

Fig. 4. Building a cluster with shared resources.

All programs executed by a cluster can be conditionally divided into several categories. On any cluster node you can run almost any ordinary program, and the same program can even be run on different cluster nodes. However, each copy of the program must use its own resource (file system), since the file system is assigned to a specific node. Besides ordinary software, there are so-called truly cluster applications. Such programs are distributed across the cluster nodes, and interaction is organized between the parts of the program running on different nodes. True cluster programs make it possible to parallelize the load on the cluster. An intermediate position is occupied by applications designed to work in a cluster. Unlike true cluster programs, they do not use explicit parallelism; in effect the program is an ordinary one, but it can use some capabilities of the cluster, primarily those associated with resource migration.

It is special software that combines servers into clusters. Many modern corporate applications and operating systems have built-in clustering support, but only special middleware can guarantee uninterrupted functioning and transparency of the cluster. It is responsible for:

  • the coordinated operation of all servers;
  • resolving conflicts arising in the system;
  • forming and reconfiguring the cluster after failures;
  • distributing the load across the cluster nodes;
  • restoring the applications of failed servers on the available nodes (failover, the migration procedure);
  • monitoring the state of the hardware and software environments;
  • allowing any application to be run on the cluster without prior adaptation to the new hardware architecture.

Cluster software typically has several predefined system recovery scenarios and may also give the administrator the ability to configure such scenarios. Recovery after failures can be supported both for the node as a whole and for individual components: applications, disk volumes, etc. This function is initiated automatically in the event of a system failure and can also be run by the administrator if, for example, one of the nodes must be shut down for reconfiguration.

Besides increased reliability and speed, several additional requirements are imposed on cluster solutions in modern computing systems:

  • they must provide a single external representation of the system;
  • ensure high-speed backup and data recovery;
  • provide parallel access to the database;
  • be able to transfer the load from failed nodes to serviceable ones;
  • have tools for configuring a high level of readiness and guarantee recovery after a disaster.

Of course, using several cluster nodes that simultaneously access the same data increases the complexity of backing up and subsequently restoring information. Transferring the load from a failed node to a serviceable one is the main mechanism for ensuring continuous operation of applications, provided the cluster resources are used optimally. For effective cooperation between cluster systems and a DBMS, the system must have a distributed lock manager that ensures consistent changes to the database when sequences of requests arrive from different cluster nodes. Setting up a cluster configuration while simultaneously ensuring high availability of applications is a rather complicated process (this is due to the complexity of defining the rules by which particular applications are transferred from failed cluster nodes to healthy ones). A cluster system must make it easy to transfer applications from one cluster node to another, as well as to restore a failed application on another node. The user of the system is not required to know that he is working with a cluster system, so the cluster should look like a single computer: it must have a single file system for all nodes, a single IP address and a single system kernel.

The most reliable are distributed clusters. Even the most reliable systems may fail if a fire, an earthquake, a flood or a terrorist attack occurs. Given the global scale of modern business, such events must not harm it, so a cluster may (or must) be distributed.

All leading computer companies (Compaq, Dell, Hewlett-Packard, IBM, Sun Microsystems) offer their own cluster solutions. The leading position in the UNIX cluster segment is held by IBM, which actively promotes its DB2 database, while Sun actively promotes its Sun Cluster solution. One of the most active players (both in the number of platforms certified for clusters and in the variety of cluster solutions themselves) is Compaq Corporation, which offers practically a full range of clusters on Windows platforms: for a department or remote branch, for applications in the corporate infrastructure, and for large data processing centers. The Compaq TrueCluster Server solution meets companies' current requirements for such technology to the greatest extent. The new software allows, for example, a database to be installed on several servers connected together. The need for such a combination arises, for example, when large database capacity is required or when downtime in the event of a server failure must be reduced, which is achieved by transferring operations to another server in the cluster. This makes it possible to significantly reduce spending on hardware platforms, making it economically justified to build clusters from low-cost servers of standard architecture even for relatively small enterprises. Compaq and Oracle actively cooperate in technology and business, which will make it possible to create a more scalable, manageable, reliable and cost-effective cluster database platform. In addition, Oracle has begun cooperating with Dell and Sun Microsystems, which offer customers preconfigured and tested systems working with Oracle clustering software. Dell, for example, supplies the cluster software on tested servers running Windows and Linux.

In the corporate systems market, clusters play one of the key roles. In many cases, cluster solutions simply have no worthy alternative. The real high availability and broad scalability of cluster information systems allow them to successfully solve increasingly complex tasks and, as needs grow, to easily increase the computing power of the platform at a cost acceptable to the enterprise.

Cluster (group of computers)

Load distribution clusters

The principle of their operation is based on distributing requests through one or more input nodes, which redirect them for processing to the remaining, computing, nodes. The initial goal of such a cluster is performance, but methods that increase reliability are often used in them as well. Such structures are called server farms. The software can be either commercial (OpenVMS, MOSIX, Platform LSF HPC, Solaris Cluster, Moab Cluster Suite, Maui Cluster Scheduler) or free (openMosix, Sun Grid Engine, Linux Virtual Server).
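
At its core, request distribution in such a cluster can be as simple as a round-robin choice of the next computing node. The sketch below uses hypothetical node addresses; production balancers such as Linux Virtual Server additionally weigh nodes and exclude failed ones:

```c
#include <stdio.h>

/* Round-robin selection of the node that will process the next request.
   Node addresses are hypothetical; a real balancer would also check node
   health (e.g. via heartbeats) and skip failed nodes. */
static const char *nodes[] = { "10.0.0.11", "10.0.0.12", "10.0.0.13" };
static const int node_count = 3;

static const char *next_node(void)
{
    static int current = 0;
    const char *chosen = nodes[current];
    current = (current + 1) % node_count;
    return chosen;
}

int main(void)
{
    for (int request = 1; request <= 6; request++)
        printf("request %d -> forwarded to node %s\n", request, next_node());
    return 0;
}
```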

Computing clusters

Clusters are used for computational purposes, in particular in scientific research. For computing clusters, the significant indicators are high floating-point processor performance (FLOPS) and low latency of the interconnect network; the speed of I/O operations, which matters more for databases and Web services, is less significant. Computing clusters make it possible to reduce calculation time compared with a single computer by breaking the task into parallel branches that exchange data over the interconnect. A typical configuration is a set of computers assembled from publicly available components, with the Linux operating system installed, connected by Ethernet, Myrinet, InfiniBand or other relatively inexpensive networks. Such a system is commonly called a Beowulf cluster. High-performance clusters are singled out specially (denoted by the English abbreviation HPC cluster, High-Performance Computing cluster). A list of the most powerful high-performance computers (also denoted by the abbreviation HPC) can be found in the world ranking Top500. In Russia, a rating of the most powerful computers of the CIS is maintained.

Distributed Computing Systems (GRID)

Such systems are not considered clusters, but their principles are largely similar to cluster technology. They are also called grid systems. The main difference is the low availability of each node, that is, the impossibility of guaranteeing its operation at a given moment in time (nodes connect and disconnect during operation), so the task must be divided into a number of processes independent of each other. Such a system, unlike a cluster, does not resemble a single computer but serves as a simplified means of distributing computations. The instability of the configuration is compensated for by the large number of nodes.

Cluster servers organized programmatically

Cluster systems occupy a worthy place in the list of the fastest computers while costing significantly less than supercomputers. As of July 2008, the SGI Altix ICE 8200 cluster (Chippewa Falls, Wisconsin, USA) held 7th place in the Top500 ranking.

A relatively cheap alternative to supercomputers are clusters based on the Beowulf concept, built from ordinary inexpensive computers on the basis of free software. One practical example of such a system is the Stone Soupercomputer (Oak Ridge, Tennessee, USA).

The largest cluster belonging to a private individual (consisting of 1000 processors) was built by John Koza.

History

The history of cluster creation is inextricably linked with early developments in the field of computer networks. One of the reasons for the emergence of high-speed communication between computers was the hope of pooling computing resources. In the early 1970s, the TCP/IP protocol development team and the Xerox PARC laboratory established standards for network interaction. The Hydra operating system appeared for PDP-11 computers made by DEC, and the cluster created on this basis was named C.mmp (Pittsburgh, Pennsylvania, USA). However, only later were mechanisms created that made it easy to distribute tasks and files through the network; for the most part these were developments in SunOS (a BSD-based operating system from Sun Microsystems).

The first commercial cluster project was ARCnet, created by Datapoint. It did not become profitable, and therefore cluster construction did not take off until DEC built its VAXcluster based on the VAX/VMS operating system. ARCnet and VAXcluster were designed not only for joint computation but also for sharing the file system and peripherals while preserving data integrity and unambiguity. VAXcluster (now called VMScluster) is an integral component of the OpenVMS operating system for Alpha and Itanium processors.

Two other early cluster products that gained recognition are the Tandem Himalaya (an HA-class system) and the IBM S/390 Parallel Sysplex (1994).

The history of creating clusters from ordinary personal computers owes much to the Parallel Virtual Machine (PVM) project. This software for combining computers into a virtual supercomputer opened up the possibility of creating clusters instantly. As a result, the total performance of all the cheap clusters created at that time surpassed the combined capacity of "serious" commercial systems.

The creation of clusters based on cheap personal computers united by a data transmission network was continued by the American aerospace agency NASA, and then Beowulf clusters, specially designed on the basis of this principle, were developed. The successes of such systems spurred the development of grid networks, which had existed since the creation of UNIX.

Software

A widely used tool for organizing internode interaction is the MPI library, which supports the languages C and Fortran. It is used, for example, in the MM5 weather modeling program.
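
A minimal MPI program in C illustrates the style of internode interaction such a library provides: each copy of the program learns its rank, and the ranks exchange messages explicitly. With MPICH, for example, such a program is typically compiled with mpicc and started with mpirun -np <number of processes>:

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal MPI example: rank 0 sends a number to every other rank,
   and each rank reports what it received. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        for (int dest = 1; dest < size; dest++) {
            int payload = 100 + dest;
            MPI_Send(&payload, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
        printf("rank 0 of %d: sent work to %d peers\n", size, size - 1);
    } else {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d: received %d from rank 0\n", rank, payload);
    }

    MPI_Finalize();
    return 0;
}
```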

The Solaris operating system provides the Solaris Cluster software, which is used to ensure high availability and fault tolerance of servers running Solaris. For OpenSolaris there is an open-source implementation called OpenSolaris HA Cluster.

Several programs are popular among GNU/Linux users:

  • distcc, MPICH, etc. - specialized tools for parallelizing programs; distcc allows parallel compilation with the GNU Compiler Collection.
  • Linux Virtual Server, Linux-HA - node software for distributing requests among computing servers.
  • MOSIX, openMosix, Kerrighed, OpenSSI - full-featured cluster environments built into the kernel that automatically distribute tasks among homogeneous nodes; OpenSSI, openMosix and Kerrighed create a single-system-image environment between the nodes.

Cluster mechanisms are planned to be built into the DragonFly BSD kernel, which forked from FreeBSD 4.8 in 2003. In the long term there are also plans to turn it into a single-operating-system environment.

Microsoft produces an HA cluster for the Windows operating system. It is believed to be based on Digital Equipment Corporation technology; it supports up to 16 nodes in a cluster (since 2010), as well as operation in a SAN (Storage Area Network). A set of APIs is used to support distributed applications, and there are provisions for working with programs that were not designed to run in a cluster.

Windows Compute Cluster Server 2003 (CCS), released in June 2006, is designed for high-end applications that require cluster computing. The product is intended to be deployed on a multitude of computers gathered into a cluster to achieve supercomputer capacity. Each cluster on Windows Compute Cluster Server consists of one or more control machines that distribute tasks and several subordinate machines that perform the main work. In November 2008, Windows HPC Server 2008 was introduced to replace Windows Compute Cluster Server 2003.


Department 29 "Managing Intelligent systems"

Abstract on the topic:

Cluster systems

Prepared by:

student of group K9-292

Popov I.A.

Moscow 2001.

1. Introduction

2. Basic classes of modern parallel computers

3. Cluster architecture of parallel computers

4. Goals of creating cluster systems

5. Failover clusters

6. High-performance clusters

7. Project BEOWULF.

8. Conclusion

9. Literature

Introduction

Development of multiprocessor computing systems

The development of traditional architectures for building computing systems, such as SMP, MPP and vector-parallel systems, is proceeding quite rapidly. Performance grows, and reliability and fault tolerance increase. However, these architectures have one drawback: the cost of the systems being created is sometimes unaffordable for many of their potential users, such as educational and research organizations. It is very high because of the complexity of the hardware and software components required to provide such rates of performance growth. Meanwhile, the need for computing resources is currently very acute in many areas of scientific and practical activity, and the resources of traditional supercomputer systems are insufficient.

Cluster systems arose as a cheaper solution to the problem of the lack of computing resources; they are based on the use of widespread and relatively cheap technologies in their architecture, hardware and software: PCs, Ethernet, Linux, etc. The use of mass-market technologies in cluster systems became possible thanks to significant progress in the development of the components of conventional computing systems, such as central processors, operating systems and communication media.

Since cluster systems are, architecturally, a development of massively parallel MPP systems, the main role in their evolution is played by progress in the field of network technologies. By now, inexpensive but effective communication solutions have appeared. This predetermined the rapid appearance and development of cluster computing systems. Other factors have also contributed to the progress of cluster systems.

The performance of personal computers based on Intel processors has also increased significantly in recent years. Such computers began to pose serious competition to workstations based on more expensive and powerful RISC processors. At the same time, the Linux OS, a freely distributed version of UNIX, began to gain increasing popularity. Moreover, scientific organizations and universities, where most cluster systems are developed, usually have Linux specialists.

The high degree of development of cluster systems today is shown by the fact that the Top500 list of the most powerful supercomputers in the world includes 11 cluster installations.


Basic classes of modern parallel computers

Cluster systems are a development of parallel systems. To determine the place of cluster systems among other types of parallel computing architectures, they need to be classified. Parallel systems can be classified by various criteria.

From a hardware point of view, the main parameter for classifying parallel computers is the presence of shared (SMP) or distributed (MPP) memory. Something in between SMP and MPP is the NUMA architecture, where memory is physically distributed but logically shared.

Symmetric multiprocessor systems

An SMP system consists of several homogeneous processors and an array of shared memory. One approach frequently used in SMP to form a scalable shared-memory system consists in organizing uniform memory access via a scalable processor-memory channel:

Each memory access operation is interpreted as a transaction on the processor-memory bus. Cache coherence is maintained in hardware.

In SMP, each processor has at least one cache memory of its own (and possibly several).

It can be said that an SMP system is one computer with several equal processors. Everything else exists in a single instance: one memory, one I/O subsystem, one operating system. The word "equal" means that each processor can do everything any other can: each processor has access to all of memory, can perform any I/O operation, interrupt other processors, and so on.

The disadvantage of this architecture is the need to organize a processor-memory channel with very high bandwidth.
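
This "one memory, several equal processors" model is exactly what ordinary multithreaded programs rely on. A small POSIX-threads sketch, in which several threads update one shared counter under a mutex, shows the shared-memory style of parallelism used inside an SMP node (as opposed to the message passing used between cluster nodes):

```c
#include <pthread.h>
#include <stdio.h>

/* Shared-memory parallelism in the SMP style: all threads see the same
   memory, so they must coordinate access to it (here with a mutex). */
#define THREADS 4

static long counter = 0;                         /* one shared memory location */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                               /* every processor updates the same data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[THREADS];
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);

    printf("final counter: %ld (expected %d)\n", counter, THREADS * 100000);
    return 0;
}
```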

Massively parallel systems

A massively parallel MPP system consists of homogeneous computing nodes, each including:

  • one or more central processors (usually RISC);
  • local memory (direct access to the memory of other nodes is impossible);
  • a communication processor or network adapter;
  • hard disks and/or other I/O devices.

Special I/O components and control nodes can be added to the system. The nodes are connected through some communication medium (a high-speed network, a switch, etc.).

Systems with non-uniform memory access (NUMA)

NUMA (Non-Uniform Memory Access), unlike the usual SMP shared-memory architecture, consists of several separate processors, each of which, in addition to its own cache, also has local memory:

In such an architecture, the processor and memory modules are closely integrated, so access to local memory is much faster than access to the memory of a "neighboring" processor. I/O subsystems can be part of each node or consolidated on dedicated I/O nodes. If cache coherence is maintained throughout the whole system, such an architecture is called cc-NUMA.

The easiest way to describe a NUMA system is to imagine a large SMP system divided into several parts, with these parts connected by a communication backbone attached to the system buses, and each part including its own main memory and I/O subsystem. That is NUMA: a large SMP broken into a set of smaller, simpler SMPs. The main problem of NUMA is ensuring cache coherence. The hardware makes it possible to work with all the separate main memory devices of the system's components (usually called nodes) as with a single giant memory.

Cluster architecture

Let us consider the place of the cluster architecture of computing systems within this classification.

A cluster is a linked set of full-fledged computers used as a single resource. A "full-fledged computer" means a complete computer system with everything required for its operation, including processors, memory, an I/O subsystem, as well as the operating system, subsystems, applications, etc. Personal computers or parallel systems that may have SMP or even NUMA architecture are usually suitable for this. Clusters are loosely coupled systems; the nodes are linked by one of the standard network technologies (Fast/Gigabit Ethernet, Myrinet) based on a bus architecture or a switch. Therefore they are a cheaper-to-build modification of the MPP architecture.

Clustered architecture of parallel computers

General principles

As already mentioned, a computing cluster is a set of computers combined within a certain network to solve one problem (Fig. 3), which is presented to the user as a single resource. The concept of such a cluster was first proposed and implemented in the early 1980s by Digital Equipment Corporation, which develops this technology to this day.

The concept of "Unified Resource" means the availability of software that gives you the opportunity to users, administrators and application programs to assume that there is only one entity with which they work is a cluster. For example, the cluster packet processing system allows you to send a task for the cluster processing, and not some separate computer. A more complex example are database systems. Almost all manufacturers of database systems have versions operating in parallel mode on several cluster machines. As a result of the application using the database, do not take care of where their work is performed. The DBMS is responsible for synchronizing parallel action and maintain the integrity of the database.

The computers forming the cluster, the so-called cluster nodes, are always relatively independent, which allows any of them to be stopped or switched off for preventive maintenance or installation of additional equipment without disrupting the operability of the whole cluster.

Single-processor personal computers and two- or four-processor SMP servers are usually used as computing nodes in a cluster. Each node runs its own copy of the operating system, most often a standard one: Linux, NT, Solaris, etc. The composition and power of the nodes can vary even within one cluster, making it possible to create inhomogeneous systems. The choice of a particular communication medium is determined by many factors: the characteristics of the class of problems being solved, the need for subsequent cluster expansion, etc. It is possible to include specialized computers, such as a file server, in the configuration, and, as a rule, remote access to the cluster via the Internet is provided.

From the definition of the architecture of cluster systems it follows that it covers a very wide range of systems. Considering the extreme points, a cluster can be a pair of PCs connected by a local 10 Mbit/s Ethernet network, or the computing system created as part of the Cplant project at Sandia National Laboratories: 1400 workstations based on Alpha processors connected by a high-speed Myrinet network.

Thus there are many different options for building clusters. At the same time, the communication technologies and standards used are of great importance in cluster architecture: they largely determine the range of problems for which clusters built on these technologies can be used.

Communication technologies for building clusters

Clusters can be built either on the basis of specialized high-speed data transmission buses or on the basis of mass-market network technologies. Among the mass communication standards, Ethernet or, more often, its more productive variant Fast Ethernet is now used most frequently, as a rule on the basis of switches. However, the large overhead for sending messages within Fast Ethernet leads to serious restrictions on the range of problems that can be effectively solved on such a cluster. If a cluster requires greater performance and versatility, faster and more specialized technologies must be used. These include SCI, Myrinet, cLAN, ServerNet, etc. Comparative characteristics of these technologies are given in Table 1.

                         SCI        Myrinet                cLAN       ServerNet   Fast Ethernet
Latency (MPI)            5.6 us     n/a                    n/a        n/a         160-180 us
Bandwidth (MPI)          80 MB/s    n/a                    n/a        180 MB/s    10 MB/s
Bandwidth (hardware)     400 MB/s   160 MB/s               150 MB/s   n/a         12.5 MB/s
MPI implementation       n/a        HPVM, MPICH-GM, etc.   n/a        n/a         n/a

Table 1.

The performance of communication networks in cluster systems is determined by several numerical characteristics. The two main ones are latency, the initial delay when sending a message, and network bandwidth, which determines the speed of information transfer over the communication channels. What matters here is not so much the peak characteristics stated in the standard as the real ones achieved at the level of user applications, for example at the level of MPI applications. In particular, after a user calls the message-sending function send(), the message successively passes through a whole set of layers determined by the organization of the software and the hardware before leaving the processor; hence there is a considerable spread in latency values relative to those given in the standards. The presence of latency means that the maximum transfer rate over the network cannot be achieved on messages of small length.

The speed of data transmission over the network within the Fast Ethernet and Scalable Coherent Interface (SCI) technologies depends on the message length. Fast Ethernet is characterized by high latency, 160-180 μs, while the latency of SCI is about 5.6 μs. The maximum transfer rates for these technologies are 10 MB/s and 80 MB/s, respectively.
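
This dependence on message length follows from a simple model: transferring n bytes takes roughly T(n) = latency + n / peak_bandwidth, so the effective rate n / T(n) approaches the peak only for long messages. A small sketch using the figures quoted above (an illustrative model that ignores protocol details):

```c
#include <stdio.h>

/* Effective transfer rate as a function of message length for the simple
   model T(n) = latency + n / bandwidth. Figures are those quoted in the text
   (Fast Ethernet: ~170 us, ~10 MB/s at MPI level; SCI: ~5.6 us, ~80 MB/s). */
static double effective_mb_per_s(double n_bytes, double latency_s, double peak_mb_s)
{
    double t = latency_s + n_bytes / (peak_mb_s * 1e6);
    return (n_bytes / t) / 1e6;
}

int main(void)
{
    const double sizes[] = { 64, 1024, 65536, 1048576 };    /* message length, bytes */
    for (int i = 0; i < 4; i++) {
        double n = sizes[i];
        printf("%8.0f bytes:  Fast Ethernet %6.2f MB/s   SCI %6.2f MB/s\n",
               n,
               effective_mb_per_s(n, 170e-6, 10.0),
               effective_mb_per_s(n, 5.6e-6, 80.0));
    }
    return 0;
}
```

For 64-byte messages this model gives well under 1 MB/s on Fast Ethernet, while megabyte-sized messages come close to the 10 MB/s peak, which is why latency matters so much for fine-grained parallel codes.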

Goals of creating cluster systems

The developers of cluster architectures have pursued various goals when creating them. The first was Digital Equipment with its VAX/VMS clusters. The purpose of that machine was to increase the reliability of the system and to ensure high availability and fault tolerance. Currently there are many systems with a similar architecture from other manufacturers.

Another goal of creating cluster systems is to build cheap high-performance parallel computing systems. One of the first projects that gave its name to a whole class of parallel systems, the Beowulf cluster, originated at the NASA Goddard Space Flight Center to provide the computing resources needed by the Earth and Space Sciences project. The Beowulf project began in the summer of 1994, and soon a 16-processor cluster was assembled on Intel 486DX4/100 MHz processors. Each node had 16 MB of RAM and 3 Ethernet network adapters installed. This system turned out to be very successful in terms of price/performance, so such an architecture began to be developed and widely used in other scientific organizations and institutes.

Each class of clusters is characterized by its own architectures and hardware. Let us consider them in more detail.

Failover clusters

Principles of construction

To ensure the reliability and fault tolerance of computing systems, many different hardware and software solutions are applied. For example, all elements subject to failure (power supplies, processors, main and external memory) can be duplicated in the system. Such fault-tolerant systems with redundant components are used to solve problems for which the reliability of conventional computing systems, currently estimated at a probability of trouble-free operation of 99%, is not enough. Such tasks require a probability of 99.999% or higher. This reliability can be achieved by applying different methods of increasing fault tolerance. Depending on the level of readiness of the computing system, four types of systems are distinguished:

Readiness level, %    Max. downtime           System type
99                    3.5 days per year       Conventional
99.9                  8.5 hours per year      High Availability
99.99                 1 hour per year         Fault Resilient
99.999                5 minutes per year      Fault Tolerant

Table 2.

In contrast to fault-tolerant systems with redundant components, as well as various variants of multiprocessing, clusters combine relatively independent machines, each of which can be stopped for maintenance or reconfiguration without breaking the operability of the cluster as a whole. High cluster availability and minimal application downtime are achieved thanks to the fact that:

  • in the event of a power failure on one of the nodes, the application continues to function or is automatically restarted on other cluster nodes;
  • the failure of one of the nodes (or several) does not lead to the collapse of the entire cluster system;
  • preventive and repair work, reconfiguration or a change of software versions can, as a rule, be carried out on the cluster nodes one at a time, without interrupting the operation of the other nodes.

An integral part of the cluster is special software which, in fact, solves the problem of restoring a node after a failure and also handles other tasks. Cluster software typically has several predefined system recovery scenarios and may also give the administrator the ability to configure such scenarios. Recovery after failures can be supported both for the node as a whole and for individual components: applications, disk volumes, etc. This function is initiated automatically in the event of a system failure and can also be run by the administrator if, for example, one of the nodes needs to be shut down for reconfiguration.

Clusters may have shared memory on external disks, usually on a RAID disk array. A RAID disk array is a server I/O subsystem for storing large amounts of data. RAID arrays use a significant number of disks of relatively low capacity to store large volumes of data and to provide higher reliability and redundancy. Such an array is perceived by the computer as a single logical device.

Cluster nodes monitor each other's operability and exchange specific "cluster" information, such as the cluster configuration, and also transmit data between the shared drives and coordinate their use. Operability monitoring is carried out using a special signal that cluster nodes transmit to each other to confirm their normal functioning. The cessation of signals from one of the nodes tells the cluster software that a failure has occurred and that the load must be redistributed to the remaining nodes. As an example, consider the fault-tolerant VAX/VMS cluster.

VAX / VMS cluster

DEC first announced the concept of a cluster system in 1983, defining it as a group of interconnected computing machines constituting a single information-processing unit. Essentially, a VAX cluster is a loosely coupled multi-machine system with shared external memory that provides a unified mechanism for management and administration.

The VAX cluster has the following properties:

Resource sharing. VAX computers in a cluster can share access to common tape and disk drives. All VAX computers in the cluster can access individual data files as if they were local.

High readiness. If one of the VAX computers fails, its users' tasks can automatically be transferred to another computer in the cluster. If there are several HSC controllers in the system and one of them fails, the other HSC controllers automatically take over its work.

High throughput. A number of application systems can take advantage of the ability to execute tasks in parallel on several cluster computers.

Ease of system maintenance. Shared databases can be served from a single location. Application programs can be installed only once on the shared cluster disks and shared among all the cluster computers.

Extensibility. The computing power of the cluster is increased by connecting additional VAX computers to it. Additional drives on magnetic disks and magnetic tapes become available to all the computers in the cluster.

The operation of a VAX cluster is determined by two main components. The first is a high-speed communication mechanism; the second is the system software that provides clients with transparent access to system services. Physically, the connections inside the cluster are implemented using three different bus technologies with different performance characteristics.

The basic communication methods in a VAX cluster are presented in Fig. 4.

Fig. 4. VAX/VMS cluster.

The CI (Computer Interconnect) bus operates at a speed of 70 Mbit/s and is used to connect VAX computers and HSC controllers via the Star Coupler switch. Each CI connection has dual redundant lines, two for transmitting and two for receiving, using basic CSMA technology with node-specific delays to eliminate collisions. The maximum length of a CI link is 45 meters. The star-shaped Star Coupler switch can support the connection of up to 32 CI buses, each of which is intended for connecting a VAX computer or an HSC controller. The HSC controller is an intelligent device that controls the operation of disk and tape drives.

VAX computers can also be combined into a cluster via a local Ethernet network using NI (Network Interconnect), the so-called local VAX clusters, but the performance of such systems is relatively low because of the need to share the bandwidth of the Ethernet network between the cluster computers and other network clients.

Clusters can also be built on the DSSI (Digital Storage System Interconnect) bus. Up to four lower- and mid-range VAX computers can be combined on a DSSI bus, and each computer can support several DSSI adapters. A single DSSI bus operates at a speed of 4 MB/s (32 Mbit/s) and allows up to 8 devices to be connected. The following device types are supported: DSSI system adapter, RF-series disk controller and TF-series tape controller. DSSI limits the distance between the nodes in the cluster to 25 meters.

System software of VAX clusters

To guarantee the correct interaction of processors with each other when accessing shared resources, such as disks, DEC uses the Distributed Lock Manager (DLM). A very important function of the DLM is to ensure a coherent state of the disk caches for I/O operations of the operating system and application programs. In relational DBMS applications, for example, the DLM is responsible for maintaining a consistent state between the database buffers on the various cluster computers.

The task of maintaining the coherence of the I/O cache memory between the processors in a cluster is similar to the problem of maintaining cache coherence in a tightly coupled multiprocessor system built around a certain bus. Data blocks can appear simultaneously in several caches, and if one processor modifies one of these copies, the other existing copies do not reflect the current state of the data block. The concept of block ownership is one of the ways of managing such situations: before a block can be modified, ownership of the block must be acquired.

Working with the DLM involves significant overhead. In the VAX/VMS environment the overhead can be large, requiring the transmission of up to six messages over the CI bus for a single I/O operation. The overhead can reach 20% for each processor in the cluster.

High-performance clusters

Principles of construction

The architecture of high-performance clusters appeared as a development of the principles of building MPP systems from less productive, mass-market components controlled by a general-purpose operating system. Clusters, like MPP systems, consist of loosely coupled nodes, which can be either homogeneous or, in contrast to MPP, heterogeneous. Special attention in the design of a high-performance cluster architecture is paid to ensuring the high efficiency of the communication bus connecting the cluster nodes. Since mass-market, relatively low-performance buses are often used in clusters, a number of measures have to be taken to exclude the effect of their low bandwidth on cluster performance and to organize effective parallelization within the cluster. For example, the bandwidth of one of the fastest such technologies, Fast Ethernet, is orders of magnitude lower than that of the interconnects in modern supercomputers of MPP architecture.

Several methods are used to solve the problem of low network performance:

The cluster is divided into several segments, within which the nodes are connected by a high-performance bus such as Myrinet, while the nodes of different segments are connected by low-performance Ethernet/Fast Ethernet networks. This makes it possible, while reducing the cost of the cluster, to significantly increase its performance when solving problems with intensive data exchange between processes.

The use of so-called "trunking," i.e. combining several Fast Ethernet channels into one common high-speed channel connecting several switches. The obvious drawback of this approach is the "loss" of some of the ports, which are taken up by the interconnection of the switches.

To improve performance, special information-exchange protocols for such networks are created that make it possible to use the channel bandwidth more efficiently and to remove some of the restrictions imposed by standard protocols (TCP/IP, IPX). This method is often used in Beowulf-class systems.

The main quality that should have a high-performance cluster will be horizontal scalability, since one of the main advantages that the cluster architecture provides the ability to increase power existing system By simply adding new nodes into the system. Moreover, the increase in power occurs almost in proportion to the power of the added resources and can be carried out without stopping the system during its operation. In systems with another architecture (in particular, MPP), only vertical scalability is usually possible: adding memory, increasing the number of processors in multiprocessor systems or add new adapters or disks. It allows you to temporarily improve system performance. However, the system will be set to the maximum supported number of memory, processors or disks, system resources will be exhausted, and to increase productivity, you will have to create a new system or significantly process the old. The cluster system also admits vertical scalability. Thus, by vertical and horizontal scaling, the cluster model provides greater flexibility and simplicity of increasing system performance.

The BEOWULF project.

Beowulf is a Scandinavian epic telling of events of the 7th to the first third of the 8th century, whose participant is the hero of the same name, who won glory for himself in battles.

One example of a cluster system of this kind is the Beowulf cluster. The BEOWULF project united about one and a half dozen organizations (mainly universities) in the United States; the leading developers of the project are specialists of the NASA agency. The following main features can be identified in this class of clusters:

a Beowulf cluster consists of several separate nodes united in a common network; shared resources of the cluster nodes are not used;

it is considered optimal to build clusters on the basis of dual-processor SMP systems;

to reduce the overhead of interaction between nodes, full-duplex 100 Mbit/s Fast Ethernet is used (SCI is less common), several network segments are created, or the cluster nodes are connected through a switch;

Linux and freely distributed communication libraries (PVM and MPI) are used as the software.

History of the BEOWULF project.

The project began in the summer of 1994 at the NASA Goddard Space Flight Center (GSFC), more precisely at CESDIS (Center of Excellence in Space Data and Information Sciences), created on its basis.

The first Beowulf cluster was built from Intel-architecture computers running Linux. It was a system of 16 nodes (486DX4/100 MHz processors, 16 MB of memory and 3 network adapters per node, connected by 3 "parallel" 10 Mbit Ethernet cables). It was created as a computing resource for the Earth and Space Sciences (ESS) project.

Later, other, more powerful clusters were assembled at GSFC and other NASA divisions. For example, theHIVE (Highly-parallel Integrated Virtual Environment) cluster contains 64 nodes with 2 Pentium Pro/200 MHz processors and 4 GB of memory each, and 5 Fast Ethernet switches. The total cost of this cluster is approximately $210 thousand. The BEOWULF project also developed a number of high-performance and specialized network drivers (in particular, a driver for using several Ethernet channels simultaneously).

BEOWULF architecture.

Cluster nodes.

These are either single-processor PCs or SMP servers with a small number of processors (2-4, possibly up to 6). For some reason it is considered optimal to build clusters on the basis of dual-processor systems, even though in this case setting up the cluster will be somewhat more complicated (mainly because relatively inexpensive motherboards for 2 Pentium II/III processors are available). It is worth installing 64-128 MB of RAM per node (64-256 MB for dual-processor systems).

One of the machines should be set aside as the central (head) node; a sufficiently large hard disk should be installed there, and possibly a more powerful processor and more memory than in the remaining (worker) nodes. It makes sense to provide a (secure) connection of this machine to the outside world.

When the worker nodes are being configured, it is quite possible to do without hard disks: these nodes will boot the OS over the network from the central machine, which, apart from saving money, allows the OS and everything needed to be configured only once (on the central machine). If these nodes are not used as user workstations at the same time, there is no need to install video cards and monitors on them. The nodes can be mounted in racks (rackmounting), which reduces the space they occupy but costs somewhat more.

It is possible to organize clusters on the basis of existing networks of workstations, i.e. user workstations can be used as cluster nodes at night and on weekends. Systems of this type are sometimes called COW (Cluster of Workstations).

The number of nodes should be chosen on the basis of the required computing resources and the available financial means. It should be understood that with a large number of nodes, more complex and more expensive network equipment will also have to be installed.

The main types of local networks used in the BEOWULF project are Gigabit Ethernet, Fast Ethernet and 100VG-AnyLAN. In the simplest case a single Ethernet segment is used (10 Mbit/s over twisted pair). However, the cheapness of such a network turns, because of collisions, into large overheads for inter-node exchanges, and good performance of such a cluster can be expected only on tasks with a very simple parallel structure and very rare interactions between processes (for example, enumeration of variants).

To obtain good inter-node exchange performance, full-duplex Fast Ethernet at 100 Mbit/s is used. In this case, to reduce the number of collisions, either several "parallel" Ethernet segments are set up, or the cluster nodes are connected through a switch.

A more expensive but also popular option is the use of Myrinet switches (1.28 Gbit/s, full duplex).

Less popular, but also actually used in building clusters, are the cLAN, SCI and Gigabit Ethernet network technologies.

Sometimes several communication channels are used in parallel for communication between the cluster nodes - so-called channel bonding, which is usually applied to Fast Ethernet. In this case each node is connected to the Fast Ethernet switch by more than one channel. To achieve this, the nodes are equipped either with several network cards or with multiport Fast Ethernet boards. The use of channel bonding on nodes running Linux allows the receive/transmit load to be distributed uniformly between the corresponding channels.

System software

Operating system. Linux is normally used, in versions specially optimized for distributed parallel computing. The Linux 2.0 kernel was reworked: in the process of building clusters it turned out that the standard network device drivers in Linux are very inefficient. Therefore new drivers were developed, first of all for Fast Ethernet and Gigabit Ethernet networks, and the possibility of logically combining several parallel network connections between personal computers (similar to hardware channel bonding) was added, which makes it possible to build a network with high aggregate bandwidth out of cheap local networks with low bandwidth.

As in any cluster, a copy of the OS kernel runs on each cluster node. Thanks to the modifications, the uniqueness of process identifiers is ensured within the entire cluster, and not within individual nodes.

Communication libraries. The most common parallel programming interface in the message-passing model is MPI. The recommended free implementation of MPI is the MPICH package developed at Argonne National Laboratory. For clusters based on the Myrinet switch, the HPVM system has been developed, which also includes an implementation of MPI.

For the effective organization of parallelism within a single SMP system, two options are possible:

  1. For each processor in the SMP machine a separate MPI process is started. MPI processes within this system exchange messages through shared memory (MPICH must be configured accordingly).
  2. Only one MPI process is started on each machine. Inside each MPI process, parallelization is carried out in the shared-memory model, for example with OpenMP directives (a minimal sketch of this hybrid scheme is given below).
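The following sketch illustrates the second, hybrid option in C: one MPI process per node, with OpenMP threads inside it. It is a minimal illustration rather than part of the Beowulf software itself, and it assumes an MPI implementation such as MPICH and a compiler with OpenMP support.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;

        MPI_Init(&argc, &argv);                 /* one MPI process per cluster node */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* inside the node, OpenMP spreads the work over the SMP processors */
        #pragma omp parallel
        {
            #pragma omp critical
            printf("MPI process %d of %d: OpenMP thread %d of %d\n",
                   rank, nprocs, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }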

After installing the MPI implementation, it makes sense to test the real performance of the network transfers; a minimal "ping-pong" test of the kind commonly used for this is sketched below.
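A possible sketch of such a test in C measures the round-trip time and bandwidth between two MPI processes; the message size and iteration count are arbitrary choices made for this example.

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define NITER 1000
    #define MSGSIZE 65536                       /* 64 KB test message */

    int main(int argc, char **argv)
    {
        static char buf[MSGSIZE];
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0) fprintf(stderr, "run with at least two processes\n");
            MPI_Finalize();
            return 1;
        }
        memset(buf, 0, sizeof(buf));
        MPI_Barrier(MPI_COMM_WORLD);

        if (rank == 0) {
            double t0 = MPI_Wtime();
            for (int i = 0; i < NITER; i++) {
                MPI_Send(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            double t = (MPI_Wtime() - t0) / NITER;          /* average round trip */
            printf("round trip %.1f us, bandwidth %.1f MB/s\n",
                   t * 1e6, 2.0 * MSGSIZE / t / 1e6);
        } else if (rank == 1) {
            for (int i = 0; i < NITER; i++) {
                MPI_Recv(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }

Run, for example, with mpirun -np 2, placing the two processes on different nodes so that the measurement reflects the cluster network rather than shared memory.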

In addition to MPI, there are other libraries and parallel programming systems that can be used on clusters.

An example of a Beowulf cluster implementation: Avalon

In 1998, at Los Alamos National Laboratory, astrophysicist Michael Warren and other scientists from the theoretical astrophysics group built the Avalon supercomputer, a Beowulf cluster based on DEC Alpha/533 MHz processors. Avalon originally consisted of 68 processors and was later expanded to 140. Each node has 256 MB of RAM, a 3.2 GB EIDE hard disk and a Kingston network adapter (the total cost of a node is $1700). The nodes are connected using four 36-port Fast Ethernet switches and a central 12-port Gigabit Ethernet switch from 3Com.

The total cost of Avalon is $313 thousand, and its LINPACK performance of 47.7 GFLOPS allowed it to take 114th place in the 12th edition of the Top500 list (next to a 152-processor IBM SP2 system). The 70-processor configuration of Avalon showed on many tests the same performance as the 64-processor SGI Origin2000/195 MHz system, which costs more than $1 million.

Avalon is currently actively used in astrophysical, molecular and other scientific calculations. At the SC'98 conference, Avalon's creators presented a paper entitled "Avalon: An Alpha/Linux Cluster Achieves 10 Gflops for $150k" and won the 1998 Gordon Bell Price/Performance Prize.

Conclusion

The leading microprocessor manufacturers - Sun Microsystems, Dell and IBM - hold the same view of the future of the supercomputer industry: individual independent supercomputers will be replaced by groups of high-performance servers combined into clusters. Distributed cluster systems are already ahead of modern classical supercomputers in performance: the most powerful computer in the world, IBM ASCI White, has a capacity of 12 teraflops, while the performance of the SETI@home distributed network is estimated at about 15 teraflops. At the same time, IBM ASCI White was sold for $110 million, whereas over the entire history of SETI@home only about 500 thousand dollars have been spent.


MIMD computers

A MIMD computer has N processors that independently execute N instruction streams and process N data streams. Each processor operates under the control of its own instruction stream, i.e. a MIMD computer can execute completely different programs in parallel.


The MIMD architecture is further classified according to the physical organization of memory: either a processor has its own local memory and accesses other memory blocks through an interconnection network, or the interconnection network connects all processors to a common memory. Based on the organization of memory, the following types of parallel architecture are distinguished:

  • Computers with distributed memory (distributed memory)
    A processor can access its local memory and can send and receive messages over the network connecting the processors. Messages are used for communication between processors or, equivalently, for reading and writing remote memory blocks. In an idealized network, the cost of sending a message between two nodes does not depend on the location of the nodes or on the network traffic, but does depend on the length of the message.

  • Computers with shared memory (true shared memory)
    All processors jointly access a common memory, usually through a bus or a hierarchy of buses. In the idealized PRAM (Parallel Random Access Machine) model, often used in theoretical studies of parallel algorithms, any processor can access any memory cell in the same amount of time. In practice, the scalability of this architecture usually leads to some form of memory hierarchy. The frequency of accesses to the common memory can be reduced by keeping copies of frequently used data in a cache associated with each processor. Access to this cache is much faster than direct access to the common memory.

  • Computers with virtual shared memory (virtual shared memory)
    There is no common memory. Each processor has its own local memory and can access the local memory of other processors using a "global address". If the "global address" points not to its local memory, the memory access is implemented by means of messages sent over the communication network.

Examples of machines with shared memory are:

  • Sun Microsystems (multiprocessor workstations)
  • Silicon Graphics Challenge (multiprocessor workstations)
  • Sequent Symmetry
  • Convex
  • Cray 6400

The following computers belong to the class of machines with distributed memory:

  • IBM SP1/SP2
  • Parsytec GC
  • CM-5 (Thinking Machines Corporation)
  • Cray T3D
  • Paragon (Intel Corp.)
  • nCUBE
  • Meiko CS-2
  • AVX (Alex Parallel Computers)
  • IMS B008

MIMD architectures with distributed memory can also be classified according to the bandwidth of the interconnection network. For example, in an architecture in which processor-memory pairs (processor elements) are connected by a network with a mesh topology, each processor has the same number of network connections regardless of the number of processors in the computer, and the total bandwidth of such a network grows linearly with the number of processors. In an architecture with a hypercube network topology, on the other hand, the number of connections of a processor to the network is a logarithmic function of the number of processors, and the network bandwidth grows faster than linearly with respect to the number of processors. In a clique topology every processor must be connected to every other processor.
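The scaling behaviour just described can be made concrete with a few lines of C that compute, for several machine sizes, the number of links per node and the total number of links in a 2D torus, a hypercube and a clique (the chosen sizes are arbitrary illustrations):

    /* compile with: cc topo.c -lm */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int sizes[] = { 16, 64, 256, 1024 };            /* powers of two, for the hypercube */
        for (int i = 0; i < 4; i++) {
            int n = sizes[i];
            int deg_torus = 4;                          /* constant node degree */
            int deg_hyper = (int)round(log2(n));        /* log2(N) links per node */
            int deg_clique = n - 1;                     /* a link to every other node */
            long links_torus  = 2L * n;                 /* grows linearly in N */
            long links_hyper  = (long)n * deg_hyper / 2;
            long links_clique = (long)n * (n - 1) / 2;  /* grows quadratically */
            printf("N=%4d: torus %d/%ld, hypercube %d/%ld, clique %d/%ld (degree/links)\n",
                   n, deg_torus, links_torus, deg_hyper, links_hyper,
                   deg_clique, links_clique);
        }
        return 0;
    }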


Figure: network with 2D mesh (torus) topology

Figure: network with 2D torus topology


Figure: network with clique topology

National Center for Supercomputing Applications (University of Illinois at Urbana-Champaign)

MPI: The Message Passing Interface

The name "message passing interface" speaks for itself. It is a well-standardized mechanism for building parallel programs in the message-passing model. There are standard "bindings" of MPI to the C/C++ and Fortran 77/90 languages. Free and commercial implementations exist for almost all supercomputer platforms, as well as for networks of UNIX and Windows NT workstations. MPI is currently the most widely used and most dynamically developing interface of its class.

BEOWULF - clusters based on the Linux OS

Mikhail Kuzminsky

"Open systems"

On the threshold of the millennium we have every chance of witnessing the monopolization of the computer industry, which may cover both microprocessors and operating systems. We are talking, of course, about microprocessors from Intel (Merced threatens to displace RISC-architecture processors) and about the operating systems from Microsoft.

In both cases success is largely determined by the power of the marketing machine, and not only by the "consumer" qualities of the products. In my opinion, the computer community has not yet grasped the scale of the possible consequences.

Some experts compare the potential monopolization of the computer market with the monopoly domination of IBM observed in the 1970s, both in mainframes and in operating systems. I have worked with this equipment for a long time and, as UNIX spreads in our country, I become more and more aware of many advantages of the IBM MVS operating system. Nevertheless, I share the common view that such a monopoly did not speed up progress.

Western universities, which at one time were among the first to switch to UNIX, still rely on this system in their forward-looking developments, and Linux is increasingly chosen as the platform. This article is devoted to one such instructive academic development.

Linux as a social phenomenon

It no longer surprises anyone that Linux has become a notable phenomenon of computer life. Combined with the rich set of freely distributed GNU software, this operating system has become extremely popular with non-commercial users both here and abroad, and its popularity keeps growing. Linux versions exist not only for the Intel x86 platform but also for other processor architectures, including DEC Alpha, and are widely used for Internet applications as well as for computational tasks. In short, Linux has become a kind of "people's operating system". It cannot be said, however, that Linux has no weak points; one of them is insufficient support for SMP architectures.

The cheapest way to build up computing resources, including computing power, is to assemble a cluster. Massively parallel supercomputers with physically and logically distributed memory can also be regarded as peculiar clusters. The most vivid example of such an architecture is the famous IBM SP2.

The whole question is what links the computers (nodes) of a cluster together. In "real" supercomputers, specialized and therefore expensive hardware is used, designed to provide high bandwidth. In clusters, as a rule, ordinary network standards are used: Ethernet, FDDI, ATM or HiPPI.

Cluster technologies using the Linux operating system began to develop several years ago and became available long before Wolfpack for Windows NT appeared. Thus the Beowulf project arose in the mid-1990s.

Hero of the epic poem

Beowulf is a Scandinavian epic telling of events of the 7th to the first third of the 8th century, whose participant is the hero of the same name, who won glory for himself in battles. It is not known whether the authors of the project thought about whom Beowulf would fight this time (probably Windows NT?), but the heroic image allowed about one and a half dozen organizations (mainly universities) in the United States to unite into a consortium. It cannot be said that supercomputer centers dominate among the project participants, yet the Loki and Megalon clusters are installed in such world-famous high-performance computing centers as Los Alamos and the Sandia laboratory of the US Department of Energy, and the leading developers of the project are NASA specialists. In general, without exception, the clusters created by the project participants receive loud names.

Besides Beowulf, one more closely related cluster technology is known - NOW. In NOW, the personal computers usually contain information about themselves and about the tasks assigned to them, and the duties of the system administrator of such a cluster include forming this information. Beowulf clusters are simpler in this respect (that is, from the system administrator's point of view): there the individual nodes know nothing about the cluster configuration. Only one dedicated node contains the configuration information, and only it is connected by network to the outside world. All the other cluster nodes are united by a local network, and only a "thin bridge" from the control node connects them with the outside world.

The nodes in Beowulf technology are PC motherboards. Usually the nodes also have local hard disks. Standard types of local networks are used to connect the nodes. We will consider this question below; first let us dwell on the software.

Its foundation in Beowulf is the ordinary commercially available Linux OS, which can be bought on CD-ROM. At first most of the project participants relied on the CDs published by Slackware; now the Red Hat version is preferred.

In the ordinary Linux OS one can install well-known means of parallelization in the message-passing model (LAM MPI 6.1, PVM 3.3.11 and others). One can also use the POSIX threads standard and the standard interprocess communication facilities included in any UNIX System V. Serious additional development work was carried out within the BEOWULF project.

First of all, the reworking of the Linux 2.0 kernel should be noted. In the process of building clusters it turned out that the standard network device drivers in Linux are very inefficient. Therefore new drivers were developed (most of them by Donald Becker), first of all for Fast Ethernet and Gigabit Ethernet networks, and the possibility of logically combining several parallel network connections between personal computers was provided, which makes it possible to build a network with a high aggregate bandwidth out of cheap local networks with more than modest speed.

As in every cluster, each node runs its own copy of the OS kernel. Thanks to the modifications, the uniqueness of process identifiers within the whole cluster, rather than within individual nodes, is ensured, as well as "remote delivery" of Linux signals.

In addition, network booting (netbooting) should be mentioned; it works with Intel PR440FX motherboards and can also be used with other motherboards equipped with an AMI BIOS.

Very interesting possibilities are offered by the network virtual memory mechanisms, or distributed shared memory (DSM), which make it possible to create a certain "illusion" of memory shared by the nodes.

The network is a delicate matter

Since the parallelization of supercomputer applications in general, and of cluster applications in particular, requires high bandwidth and low latency in the exchange of messages between nodes, the network characteristics become the parameters that determine cluster performance. The choice of microprocessors for the nodes is obvious: standard Intel processors; but with the cluster topology, the type of network and the network cards one can experiment. It is in this area that the main research was carried out.

When analyzing the various PC network cards on the market today, special attention was paid to such characteristics as effective support for broadcasting (multicasting), support for large packet sizes, and so on. The main types of local networks used in the BEOWULF project are Gigabit Ethernet, Fast Ethernet and 100VG-AnyLAN. (The capabilities of ATM technology were also actively studied but, as far as the author knows, outside the framework of this project.)

How to assemble a supercomputer

Analyzing the results achieved in the BEOWULF project, one can come to the following conclusion: the solutions found make it possible to assemble on one's own a high-performance cluster based on standard PC components and to use ordinary software. Among the largest examples one cannot fail to mention the 50-node cluster at CESDIS, which includes 40 data-processing nodes (based on single- and dual-processor Pentium Pro/200 MHz boards) and 10 scaling nodes (a dual-processor Pentium Pro/166 MHz board). The cost/peak-performance ratio of such a cluster is very favorable. The question is how efficiently applications can be parallelized, in other words, what the real, rather than peak, performance will be. The project participants are now working on this problem.

It should be noted that building clusters from ordinary PCs is becoming quite fashionable in the scientific community. Some academic institutions in our country are also planning to create similar clusters.

When computers of different power or of different architectures are combined in a cluster, one speaks of heterogeneous (non-uniform) clusters. Cluster nodes can simultaneously be used as user workstations. If this is not needed, the nodes can be considerably lightened and/or mounted in a rack.

Standard workstation operating systems are used, most often freely distributed ones - Linux/FreeBSD - together with special tools for supporting parallel programming and load distribution. Programming is done, as a rule, within the message-passing model (most often MPI). It is discussed in more detail in the next section.

The history of the development of cluster architecture.

DEC was the first to announce the concept of a cluster system in 1983, defining it as a group of interconnected computing machines that constitute a single information-processing unit.

One of the first projects that gave its name to a whole class of parallel systems, Beowulf clusters, originated at the NASA Goddard Space Flight Center to provide the computing resources needed by the Earth and Space Sciences project. The Beowulf project started in the summer of 1994, and a 16-processor cluster on Intel 486DX4/100 MHz processors was soon assembled. Each node had 16 MB of RAM and 3 Ethernet network adapters. To operate in this configuration, special drivers were developed that distribute traffic between the available network cards.

Later, the theHIVE (Highly-parallel Integrated Virtual Environment) cluster was built at GSFC; its structure is shown in Fig. 2. This cluster consists of four subclusters E, B, G and DL, combining 332 processors and two dedicated host machines. All the nodes of this cluster run Red Hat Linux.

In 1998, at Los Alamos National Laboratory, astrophysicist Michael Warren and other scientists from the theoretical astrophysics group built the Avalon supercomputer, a Linux cluster based on Alpha 21164A processors with a clock frequency of 533 MHz. Initially Avalon consisted of 68 processors and was later expanded to 140. Each node had 256 MB of RAM, a 3 GB hard disk and a Fast Ethernet network adapter. The total cost of the Avalon project was 313 thousand dollars, and the performance shown on the LINPACK test, 47.7 GFLOPS, allowed it to take 114th place in the 12th edition of the Top500 list, next to a 152-processor IBM RS/6000 SP system. In the same year, 1998, at the most prestigious conference in the field of high-performance computing, Supercomputing'98, the creators of Avalon presented the paper "Avalon: An Alpha/Linux Cluster Achieves 10 Gflops for $150k", which won first prize in the "Best Price/Performance" nomination.

In April of this year a cluster called Velocity+ was installed at Cornell University for biomedical research; it consists of 64 nodes with two Pentium III/733 MHz processors and 2 GB of RAM each and a total disk capacity of 27 GB. The nodes run Windows 2000 and are united by a Giganet cLAN network.

The Lots of Boxes on Shelves (LoBoS) project was implemented at the US National Institutes of Health in April 1997 and is interesting for its use of Gigabit Ethernet as the communication medium. At first the cluster consisted of 47 nodes with two Pentium Pro/200 MHz processors, 128 MB of RAM and a 1.2 GB disk on each node. In 1998 the next stage of the project, LoBoS2, was implemented, during which the nodes were converted into desktop computers while keeping them united in a cluster. Now LoBoS2 consists of 100 computing nodes containing two Pentium II/450 MHz processors, 256 MB of RAM and 9 GB of disk memory each. In addition, 4 control computers with a total RAID array capacity of 1.2 TB are connected to the cluster.

One of the recent cluster developments is the AMD Presto III supercomputer, a Beowulf cluster of 78 Athlon processors. The machine is installed at the Tokyo Institute of Technology. To date, AMD has built 8 supercomputers assembled as clusters by the Beowulf method and running Linux.

IBM clusters

RS / 6000.

IBM offers several types of loosely coupled systems based on the RS/6000, combined into clusters and running the High-Availability Clustered Multiprocessor/6000 (HACMP/6000) software.

Cluster nodes operate in parallel, sharing access to logical and physical resources using the capabilities of the lock manager included in HACMP/6000.

Since its announcement in 1991 the HACMP/6000 product has evolved continuously. It includes a parallel resource manager, a distributed lock manager and a parallel logical volume manager, the latter providing load balancing at the level of the entire cluster. The maximum number of nodes in a cluster has grown to eight. Cluster nodes with symmetric multiprocessing, built using the Data Crossbar Switch technology, which provides linear performance growth as the number of processors increases, have now appeared.

RS/6000 clusters are built on the basis of Ethernet, Token Ring or FDDI local networks and can be configured in various ways from the point of view of increasing reliability:

  • Hot standby, or simple failover. In this mode the active node performs the application tasks, while the standby node can perform non-critical tasks that can be stopped when the active node fails and its work has to be taken over.
  • Symmetric standby. Similar to hot standby, but the roles of primary and standby node are not fixed.
  • Mutual takeover, or load-sharing mode. In this mode each node of the cluster can "pick up" tasks running on any other node of the cluster.

IBM SP2.

IBM SP2 leads the list of the largest Top500 supercomputers in terms of number of installations (141 installations; in total, 8275 of these computers are in operation, with an overall number of nodes above 86 thousand). The basis of these supercomputers is the cluster approach to the architecture, using a powerful central switch; IBM has been using this approach for many years.

General SP2 architecture.

A general view of the SP2 architecture is given in Fig. 1. Its main architectural feature is the use of a high-speed, low-latency switch to connect the nodes to each other. This outwardly extremely simple scheme has, as experience has shown, proved extremely flexible. At first the SP2 nodes were single-processor; then nodes with SMP architecture appeared.

Actually, all the details are hidden in the structure of the nodes. Moreover, the nodes are of different types, and even the processors in neighboring nodes can differ. This provides great flexibility in the choice of configurations. The total number of nodes in a computing system can reach 512. SP2 nodes are in fact independent computers, and their direct counterparts are sold by IBM under their own names. The most striking example is the four-processor SMP server RS/6000 44P-270 with POWER3-II microprocessors, which by itself can be classed as a mid-range computer or even a mini-supercomputer.

The microprocessors installed in SP2 nodes have developed along two architectural lines: Power - Power2 - Power3 - Power3-II, and the PowerPC line up to the 604e model with a clock frequency of 332 MHz.

Traditional for the SP2 are "thin" (thin node) and "wide" (wide node) nodes with SMP architecture. They can be equipped with either PowerPC 604e (from two to four processors) or Power3-II (up to four) processors. The RAM capacity of the nodes ranges from 256 MB to 3 GB (up to 8 GB when Power3-II is used). The main differences between thin and wide nodes concern the I/O subsystem. Wide nodes are intended for tasks requiring more powerful I/O capabilities: they have ten PCI slots (including three 64-bit ones) against two slots in thin nodes. Accordingly, the number of mounting bays for disk devices in wide nodes is also larger.

The switch is characterized by low latency: 1.2 µs (up to 2 µs when the number of nodes is above 80). This is an order of magnitude better than what can be obtained in modern Beowulf Linux clusters. The peak bandwidth of each port is 150 MB/s in one direction (that is, 300 MB/s in duplex transmission). The switch adapters installed in the SP2 nodes have the same bandwidth. IBM also quotes excellent figures for latency and bandwidth.

The most powerful SP2 nodes are the "high" nodes (high node). A high node is a complex consisting of a computing node with up to six attached I/O expansion units. Such a node also has an SMP architecture and contains up to 8 Power3 processors with a clock frequency of 222 or 375 MHz.

In addition, a node of this type contains an I/O board connected to the system board. The I/O board contains two symmetric SABER logic blocks through which data is transferred to external devices such as disks and telecommunications equipment. The I/O board has four 64-bit PCI slots and one 32-bit slot (33 MHz), as well as integrated UltraSCSI and 10/100 Mbit/s Ethernet controllers, three serial ports and one parallel port.

With the advent of the high nodes and the Power3-II/375 MHz microprocessors, IBM SP2 systems reached a performance of 723.4 GFLOPS on the Linpack Parallel test. This result was achieved with 176 nodes (704 processors). Considering that up to 512 nodes can be installed, it shows that serially produced IBM SP2 systems are potentially close to the 1 TFLOPS mark.

Sun Microsystems cluster solutions

Sun Microsystems offers cluster solutions based on its SPARCcluster PDB Server product, in which multiprocessor SMP servers SPARCserver 1000 and SPARCcenter 2000 are used as nodes; a SPARCserver 1000 can hold up to eight processors, and a SPARCcenter 2000 up to 20 SuperSPARC processors. The package includes the following components: two cluster nodes based on the SPARCserver 1000/1000E or SPARCcenter 2000/2000E, two SPARCstorage Array disk arrays, as well as a kit for building the cluster, including duplicated communication equipment, a cluster management console, the SPARCcluster PDB Software and a cluster service package.

To ensure high performance and availability of communications, the cluster supports full duplication of all data paths. The cluster nodes are linked by SunFastEthernet channels with a bandwidth of 100 Mbit/s. The disk subsystems are connected through a fiber-optic Fibre Channel interface with a bandwidth of 25 Mbit/s, which allows the drives and the nodes to be up to 2 km apart. All links between the nodes, and between the nodes and the disk subsystems, are duplicated at the hardware level. The hardware, software and network facilities of the cluster ensure that there is no single point in the system whose failure would bring the whole system down.

University projects

An interesting development of the University of Kentucky is the KLAT2 cluster (Kentucky Linux Athlon Testbed 2). The KLAT2 system consists of 64 diskless nodes with AMD Athlon/700 MHz processors and 128 MB of RAM each. The software, compilers and mathematical libraries (SCALAPACK, BLACS and ATLAS) were reworked to make efficient use of the 3DNow! technology of the AMD processors, which made it possible to increase performance. Considerable interest lies in the network solution used, called the Flat Neighborhood Network (FNN). Each node has four Fast Ethernet network adapters from Smartlink, and the nodes are connected using nine 32-port switches. For any two nodes there is always a direct connection through one of the switches, but there is no need to connect all nodes through a single switch. Thanks to the optimization of the software for the AMD architecture and to the FNN topology, a record price/performance ratio of $650 per GFLOPS was achieved.

The idea of dividing a cluster into partitions received an interesting embodiment in the Chiba City project carried out at Argonne National Laboratory. The main partition contains 256 computing nodes, each with two Pentium III/500 MHz processors, 512 MB of RAM and a local 9 GB disk. Besides the computing partition, the system includes a visualization partition (32 IBM IntelliStation personal computers with Matrox Millennium G400 graphics, 512 MB of RAM and 300 GB disks), a storage partition (8 IBM Netfinity 7000 servers with Xeon/500 MHz processors and 300 GB disks) and a control partition (12 IBM Netfinity 500 computers). They are all connected by a Myrinet network, which is used to support parallel applications, and by Gigabit Ethernet and Fast Ethernet networks for management and service purposes. All partitions are divided into "towns" of 32 computers. Each town has its own "mayor", which serves its "town" locally, reducing the load on the service network and providing fast access to local resources.

Cluster projects in Russia

In Russia there has always been a high demand for high-performance computing resources, and the relatively low cost of cluster projects gave a serious impetus to the wide spread of such solutions in our country. One of the first to appear was the "Parity" cluster, assembled at the Institute of High-Performance Computing and Databases (IVViBD) and consisting of eight Pentium II processors linked by a Myrinet network. In 1999 a cluster solution based on the SCI network was tested at NICEVT, which was essentially the pioneer of using SCI technology for building parallel systems in Russia.

A high-performance cluster based on an SCI communication network is installed at the Research Computing Center of Moscow State University. The NIVC cluster includes 12 dual-processor "Eximer" servers based on Intel Pentium III/500 MHz, 24 processors in total, with a total peak performance of 12 billion operations per second. The total cost of the system is about 40 thousand dollars, or about 3.33 thousand dollars per GFLOPS.

The computing nodes of the cluster are connected by unidirectional SCI network channels into a two-dimensional 3x4 torus and are simultaneously connected to the central server through an auxiliary Fast Ethernet network and a 3Com SuperStack switch. The SCI network is the core of the cluster, making this system a unique computing installation of supercomputer class, oriented towards a wide class of tasks. The maximum data exchange rate over the SCI network in user applications is more than 80 MB/s, and the latency is about 5.6 µs. The cluster was built using the integrated WulfKit solution developed by Dolphin Interconnect Solutions and Scali Computer (Norway).

The main means of parallel programming on the cluster is MPI (Message Passing Interface) in the ScaMPI 1.9.1 version. On the LINPACK test, when solving a system of linear equations with a 16000x16000 matrix, the actual performance obtained was more than 5.7 GFLOPS. On the NPB benchmark suite the performance of the cluster is comparable to, and sometimes exceeds, that of supercomputers of the Cray T3E family with the same number of processors.
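As a rough sanity check of such LINPACK figures, solving a dense n x n system by LU factorization takes about 2/3*n^3 floating-point operations, so the matrix size and the sustained rate quoted above determine the expected run time. The short C fragment below does this arithmetic; the operation-count formula is the standard LU estimate, not a figure from the text.

    #include <stdio.h>

    int main(void)
    {
        double n = 16000.0;                     /* matrix size used in the test */
        double flops = 2.0 / 3.0 * n * n * n;   /* approximate LU operation count */
        double rate = 5.7e9;                    /* sustained rate, FLOP/s */
        printf("about %.2e operations, roughly %.0f s at 5.7 GFLOPS\n",
               flops, flops / rate);
        return 0;
    }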

The main area of application of the MSU NIVC computing cluster is the support of fundamental scientific research and of the educational process.

Among other interesting projects, one should note the solution implemented at St. Petersburg University on the basis of Fast Ethernet technology: the assembled clusters can be used both as full-fledged independent training classes and as a single computing installation solving a single task. At the Samara Scientific Center the path chosen was to create an inhomogeneous computing cluster that includes computers based on Alpha and Pentium III processors. At St. Petersburg Technical University an installation based on Alpha processors and a Myrinet network is being assembled, without local disks on the computing nodes. At Ufa State Aviation Technical University a cluster based on twelve Alpha stations, a Fast Ethernet network and Linux is being designed.

  • 2. Arithmetic logic units. Structure, design approach, the basic ALU equations
  • 3. Organization of carry chains within an ALU section. Bit-width extension, carry-lookahead schemes
  • 4. The register ALU as the basic structure of a microprocessor. Options for building register structures. Control and synchronization tasks
  • 7. The microprogram control unit. Structure, ways of forming control signals, microinstruction addressing
  • 8. The instruction set and operand addressing modes. The pipelined principle of instruction execution
  • 9. Structural hazards and ways to minimize them. Data hazards, pipeline stalls and implementation of the bypass mechanism
  • 10. Reducing the cost of executing branch instructions and minimizing control hazards
  • 11. Classification of memory systems. Organization of memory systems in microprocessor systems
  • 12. Principles of cache memory. Ways of mapping data from RAM into the cache
  • 13. Direct memory access modes. Structures of DMA controllers
  • 14. Principles of virtual memory
  • 15. Typical structures and principles of functioning of microprocessor systems
  • 16. The main operating modes of a microprocessor system. Execution of the main program, calling subroutines
  • 17. The main operating modes of a microprocessor system. Handling of interrupts and exceptions
  • 18. Systems with cyclic polling. The priority interrupt unit
  • 19. Information exchange between the elements of microprocessor systems. The bus arbiter
  • Part 2
  • 20. Classification of modern microprocessor architectures. Architectures with full and reduced instruction sets, superscalar architecture
  • 21. Classification of modern microprocessor architectures. Princeton (von Neumann) and Harvard architectures
  • 22. The structure of modern 8-bit microcontrollers with RISC architecture
  • 22 (?). The structure of modern 32-bit microcontrollers with RISC architecture
  • 23. Digital signal processors: principles of organization, generalized structure
  • 24. General-purpose processors using the example of the Intel P6 architecture
  • 25. Classification of architectures of parallel computing systems. Systems with shared memory
  • 26. Classification of architectures of parallel computing systems. Systems with distributed memory
  • 27. Vector pipeline computing systems. Interleaved memory. Features of the structure of the Cray-1 system
  • 28. Matrix computing systems. Features of building memory systems and switches
  • 29. Dataflow machines. Principles of operation and features of their construction. The graphical method of program representation
  • 30. Systems with a programmable structure. Homogeneous computing environments
  • 31. Systolic computing systems
  • 32. Cluster computing systems: definition, classification, topology
  • 32. Cluster computing systems: definition, classification, topology

    Cluster computing systems became a continuation of the development of the ideas embodied in the architecture of MPP systems. While in an MPP system the processor module is a complete computing node, in cluster systems serially produced computers are used as the computing nodes.

    A cluster is a connected set of full-fledged computers used as a single computing resource. Both identical (homogeneous clusters) and different (heterogeneous clusters) computing machines can serve as cluster nodes. By its architecture, a cluster computing system is loosely coupled. To create clusters, either simple single-processor personal computers or two- or four-processor SMP servers are usually used; no restrictions are imposed on the composition and architecture of the nodes.

    At the hardware level, a cluster is a set of independent computing systems united by a network.

    The simplest classification of cluster systems is based on the way disk arrays are used: shared or separate.

    Cluster configuration without shared disks:

    Cluster configuration with shared disks:

    The figures show the structures of clusters of two nodes, whose coordination is ensured by a high-speed link used for message exchange. This can be a local network that is also used by computers not belonging to the cluster, or a dedicated line. In the case of a dedicated line, one or more of the cluster nodes will have access to the local or global network, thereby providing connectivity between the server cluster and remote client systems.

    The difference between the clusters shown is that in the case of a local network the nodes use local disk arrays, while in the case of a dedicated line the nodes share a single redundant array of independent disks, the so-called RAID (Redundant Array of Independent Disks). A RAID consists of several disks controlled by a controller, interconnected by high-speed channels and perceived by the external system as a single whole. Depending on the type of array used, various degrees of fault tolerance and performance can be provided.

    Classification of clusters by the clustering methods used, which determine the main functional features of the system:

    ∙ clustering with passive standby;

    ∙ clustering with active standby;

    ∙ independent servers;

    ∙ servers with connection to all disks;

    ∙ servers with shared disks.

    Clustering with passive standby is the oldest and most universal method. One of the servers takes on the entire computing load, while the other remains inactive but ready to take over the computation if the main server fails. The active (primary) server periodically sends heartbeat messages to the standby (secondary) server. In the absence of heartbeat messages, which is interpreted as a failure of the primary server, the secondary server takes control.
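    A minimal sketch in C of the standby side of such a scheme is shown below: it listens for UDP heartbeat datagrams and treats prolonged silence as a failure of the primary. The port number, the timeout and the takeover action are invented for the example; a real cluster product would use its own protocol.

        #include <netinet/in.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/socket.h>
        #include <sys/time.h>
        #include <sys/types.h>
        #include <unistd.h>

        #define HB_PORT 5405          /* hypothetical heartbeat port */
        #define TIMEOUT_SEC 3         /* silence longer than this means failure */

        int main(void)
        {
            int s = socket(AF_INET, SOCK_DGRAM, 0);
            struct sockaddr_in addr;
            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(HB_PORT);
            bind(s, (struct sockaddr *)&addr, sizeof(addr));

            struct timeval tv = { TIMEOUT_SEC, 0 };
            setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

            char buf[64];
            for (;;) {
                ssize_t n = recvfrom(s, buf, sizeof(buf), 0, NULL, NULL);
                if (n > 0)
                    continue;                     /* heartbeat arrived: primary is alive */
                printf("no heartbeat for %d s, taking over\n", TIMEOUT_SEC);
                /* here the standby would start the application services */
                break;
            }
            close(s);
            return 0;
        }

    The primary side simply sends a short datagram to this port once a second; when the loop above breaks out, the secondary assumes the primary has failed and takes over its role.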

    Passive standby is not typical of clusters. The term "cluster" refers to a set of interconnected nodes actively participating in the computing process and jointly creating the illusion of one powerful computing machine. Such configurations usually use the concept of a system with an active secondary server, and three clustering methods are distinguished here: independent servers, servers without disk sharing, and servers with disk sharing.

    In the first method, each cluster node is treated as an independent server with its own disks, and none of the disks in the system are shared.

    To reduce communication overhead, most clusters currently consist of servers connected to common disks, usually represented by a RAID disk array. In one variant of this approach the disks are not actually shared: the common disks are divided into partitions, and each cluster node is allocated its own partition. If one of the nodes fails, the cluster can be reconfigured so that the access rights to its partition of the common disk are transferred to another node. In another variant, several servers share access to the common disks over time, so that any node has access to all partitions of all common disks. This approach requires some kind of locking facility that guarantees that at any moment only one of the servers has access to a given piece of data.

    Topology of cluster pairs:

    The cluster-pair topology is used when organizing two- or four-node clusters. The nodes are grouped in pairs, the disk arrays are attached to both nodes of a pair, and each node of a pair has access to all the disk arrays of that pair. One of the nodes of a pair is used as a backup for the other.

    A four-node cluster pair is a simple extension of the two-node topology. From the point of view of administration and configuration, both cluster pairs are treated as a single whole.

    Topology N+1:

    The N+1 topology allows clusters of two, three or four nodes to be created. Each disk array is connected to only two cluster nodes. The disk arrays are organized according to the RAID 1 scheme (mirroring). One server has a connection to all the disk arrays and serves as a backup for all the other (primary, or active) nodes. The backup server can be used to provide a high degree of availability in a pair with any of the active nodes.

    The N×N topology, like the N+1 topology, allows clusters of two, three or four nodes to be created, but unlike it, it has greater flexibility and scalability. Only in this topology do all cluster nodes have access to all disk arrays, which, in turn, are built according to the RAID 1 scheme (mirroring). The scalability of the topology manifests itself in the ease of adding further nodes and disk arrays to the cluster without changing the connections in the system.

    This topology makes it possible to organize a cascaded fault-tolerance system, in which processing is transferred from a failed node to a backup node, and if that fails, to the next backup node, and so on. On the whole, the N×N topology has better fault tolerance and flexibility than the other topologies.

    Topology N×N:

    Topology with fully separated access:

    The topology with fully separate access allows each disk array to be connected to only one cluster node. It is recommended only for applications characterized by an architecture of fully separate access.

    Blue Gene/L and the SGI Altix family.

    Windows Compute Cluster Server (CCS) 2003 is considered as the basic software for organizing computing on cluster systems. Its general characteristics and the set of services running on the cluster nodes are described.

    At the beginning of this section the rules for working with the CCS launch and management console are given, and the operation of the CCS scheduler when executing jobs on the cluster is described in detail.

    1.1. Architecture of high-performance processors and cluster systems

    In the history of the development of computer processor architecture, two major stages can be distinguished:

    • 1st stage - increasing the clock frequency of processors (up to 2000),
    • 2nd stage - the appearance of multi-core processors (after 2000)

    Thus, the SMP (symmetric multiprocessing) approach, which developed in the construction of high-performance servers in which several processors share the system resources, first of all the RAM (see Fig. 1.1), has shifted down to the level of the cores inside the processor.


    Fig. 1.1.

    On the way to multi-core processors, Hyper-Threading technology appeared first; it was first applied in 2002 in Intel Pentium 4 processors:


    Fig. 1.2.

    In this technology two virtual processors share all the resources of one physical processor, namely the caches, the execution pipeline and the individual execution units. If one virtual processor has taken a shared resource, the second waits for it to be released. Thus a Hyper-Threading processor can be compared with a multitasking operating system, which provides each process running in it with its own virtual computer with a full set of facilities and schedules the order and time of their operation on the physical hardware; in the case of Hyper-Threading, all this happens at a much lower, hardware level. Nevertheless, two instruction streams allow the execution units of the processor to be loaded more efficiently. The real increase in processor performance from the use of Hyper-Threading technology is estimated at 10 to 20 percent.

    A full-fledged dual-core processor (see Fig. 1.3) demonstrates a performance gain of 80 to 100 percent on individual tasks.


    Fig. 1.3.

    Thus, a dual-core and, in the general case, a multi-core processor can be regarded as an SMP system in miniature, which does not require the use of complex and expensive multiprocessor motherboards.

    Moreover, each core can (as, for example, in the Intel Pentium Extreme Edition 840 processor) support Hyper-Threading technology, so such a dual-core processor can execute four program threads simultaneously.
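    How many logical processors a multi-core (and possibly Hyper-Threading) CPU exposes can be checked directly from a program. The short OpenMP example in C below prints the number of logical processors seen by the runtime and typically starts one thread per logical processor; it is a generic illustration, not tied to any particular processor mentioned above.

        #include <omp.h>
        #include <stdio.h>

        int main(void)
        {
            /* for a dual-core CPU with Hyper-Threading this would typically print 4 */
            printf("logical processors: %d\n", omp_get_num_procs());

            #pragma omp parallel
            {
                #pragma omp critical
                printf("hello from thread %d of %d\n",
                       omp_get_thread_num(), omp_get_num_threads());
            }
            return 0;
        }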

    In early 2007 Intel introduced an 80-core single-chip processor called the Teraflops Research Chip (http://www.intel.com/research/platform/terascale/teraflops.htm). This processor can reach a performance of 1.01 teraflops at a minimum core clock frequency of 3.16 GHz and a voltage of 0.95 V, while the total power consumption of the chip is only 62 W.

    According to Intel's forecasts, commercial versions of processors with a large number of cores will appear within the next 5 years, and by 2010 a quarter of all servers shipped will have teraflops performance.

    Cluster computing systems and their architecture

    A cluster is a local (geographically located in one place) computing system consisting of a set of independent computers and a network connecting them. In addition, a cluster is a local system because it is managed within a separate administrative domain as a single computer system.

    The computing nodes of which it consists are standard, general-purpose (personal) computers used in various fields and for a wide variety of applications. A computing node may contain either a single microprocessor or several, forming in the latter case a symmetric (SMP) configuration.

    The network component of a cluster can be either an ordinary local network or built on special network technologies that provide extremely fast data transfer between the cluster nodes. The cluster network is intended for integrating the cluster nodes and is usually separated from the external network through which the users access the cluster.

    Cluster software consists of two components:

    • development / programming tools and
    • resource management tools.

    The development tools include compilers for programming languages, libraries for various purposes, performance measurement tools and debuggers, which all together make it possible to build parallel applications.

    The resource management software includes tools for installation, administration and job scheduling.

    Although there are many programming models for parallel processing, the dominant approach at the moment is the message-passing model, standardized as MPI (Message Passing Interface). MPI is a library of functions with which programs in C or Fortran can pass messages between parallel processes and control these processes; a minimal example in C is given below.
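    A possible minimal illustration: process 0 sends an integer to process 1. The value 42 and the message tag are arbitrary; the program must be run with at least two MPI processes.

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, value;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {
                value = 42;                             /* data produced by process 0 */
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("process 1 received %d from process 0\n", value);
            }

            MPI_Finalize();
            return 0;
        }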

    An alternative to this approach is offered by languages based on the so-called global partitioned address space (GPAS), typical representatives of which are HPF (High Performance Fortran) and UPC (Unified Parallel C).