
Introduction to Distributed Systems. Distributed Application Architecture. The Distributed Computing Paradigm

According to the well-known computer scientist A. Tanenbaum, there is no generally accepted and at the same time strict definition of a distributed system. Some wits say that a distributed system is one in which the failure of a computer whose very existence the users never suspected brings all their work to a halt. A significant share of distributed computing systems, unfortunately, satisfies this definition, but formally it applies only to systems with a single point of failure.

Often, when defining a distributed system, the emphasis is placed on the division of its functions among several computers. On this approach, any computing system in which data processing is split between two or more computers is distributed. Following A. Tanenbaum, a somewhat narrower definition can be given: a distributed system is a set of independent computers connected by communication channels which, from the point of view of a user of some piece of software, looks like a single whole.

This approach to defining a distributed system has its drawbacks. For example, all the software used in such a distributed system could run on a single computer, yet by the above definition the system would then cease to be distributed. Therefore, the concept of a distributed system should probably be based on an analysis of the software that forms such a system.

As a basis for describing the interaction of two entities, consider the general model of client-server interaction, in which one of the parties (the client) initiates the exchange of data by sending a request to the other party (the server). The server processes the request and, if necessary, sends a response to the client (Fig. 1.1).


Fig. 1.1.

Interaction within the client-server model can be either synchronous, when the client waits for the server to finish processing its request, or asynchronous, when the client sends the request to the server and continues executing without waiting for the server's response. The client-server model can be used as a basis for describing a variety of interactions. For this course, what matters is the interaction of the constituent parts of the software that forms a distributed system.
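To make the synchronous variant concrete, here is a minimal C sketch in which a client exchanges one request and one response with a server over a socketpair(); the request/response structures and the echo "protocol" are invented purely for illustration. In the asynchronous variant, the client would send the request and postpone the blocking read until it actually needed the result.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>

    struct request  { int op; char arg[64]; };      /* client -> server */
    struct response { int status; char data[80]; }; /* server -> client */

    int main(void)
    {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return 1;

        if (fork() == 0) {                   /* server side */
            struct request  rq;
            struct response rs = { 0, "" };
            read(sv[1], &rq, sizeof rq);     /* wait for a request */
            snprintf(rs.data, sizeof rs.data, "echo: %s", rq.arg);
            write(sv[1], &rs, sizeof rs);    /* process it and respond */
            _exit(0);
        }

        struct request  rq = { 1, "hello" };
        struct response rs;
        write(sv[0], &rq, sizeof rq);        /* client sends the request */
        read(sv[0], &rs, sizeof rs);         /* synchronous: client blocks here */
        printf("%s\n", rs.data);             /* prints "echo: hello" */
        return 0;
    }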


Fig. 1.2.

Consider a typical application which, in accordance with modern concepts, can be divided into the following logical levels (Fig. 1.2): the user interface (UI), the application logic (AL), and data access (DA), which works with the database (DB). The user interacts with the system through the user interface, the database stores the data describing the application domain, and the application logic level implements all the algorithms related to the subject area.

Since, in practice, different users of a system are usually interested in access to the same data, the simplest division of the functions of such a system among several computers is the separation of the application's logical levels between a single server part, responsible for data access, and client parts located on several computers and implementing the user interface. Application logic can be assigned to the server, to the clients, or shared between them (Fig. 1.3).


Fig. 1.3.

The architecture of applications built on this principle is called client-server or two-tier. In practice, such systems are often not classified as distributed, but formally they can be considered the simplest representatives of distributed systems.

The development of the client-server architecture is a three-tier architecture, in which the user interface, application logic and data access are separated into independent components of the system that can run on independent computers (Fig. 1.4).


Fig. 1.4.

The user's request in such systems is sequentially processed by the client part of the system, the application logic server and the database server. However, a distributed system is usually understood as a system with a more complex architecture than a three-tier one.

Principles of creating an enterprise-wide information processing system

The history of computer technology (and, accordingly, of software) began with separate, autonomous systems. Scientists and engineers were occupied with building the first computers and mostly puzzled over how to make those racks of vacuum tubes work. This state of affairs did not last long: the idea of combining computing power was quite obvious and hung in the air, saturated with the hum of the metal cabinets of the first ENIACs and Marks. After all, the idea of combining the efforts of two or more computers to solve tasks beyond the power of any one of them lies on the surface.

Fig. 1. Scheme of distributed computing

However, the practical implementation of the idea of connecting computers into clusters and networks was held back by the lack of technical solutions and, above all, by the need to create communication standards and protocols. As is well known, the first computers appeared in the late 1940s, while the first computer network, ARPANET, which connected several machines in the United States, appeared only in 1969, almost twenty years later. Of course, such a combination of computing capabilities only vaguely resembled a modern distributed architecture, but it was nonetheless the first step in the right direction.

The emergence of local area networks in time led to a new area of software development: the creation of distributed applications. This had to be done from scratch, as they say, but fortunately large companies, whose business structure demanded such solutions, immediately showed interest in such applications. It was at the stage of creating corporate distributed applications that the basic requirements were formulated and the main architectures of such systems, still in use today, were developed.

Gradually, mainframes and terminals evolved toward the client-server architecture, which was essentially the first version of a distributed architecture, that is, a two-tier distributed system. Indeed, it was in client-server applications that part of the computational operations and business logic moved to the client side, which became the hallmark of this approach.

It was during this period that it became apparent that the main advantages of distributed applications are:

· Good scalability - if necessary, the computing power of a distributed application can be easily increased without changing its structure;

· The ability to manage load - intermediate levels of a distributed application make it possible to manage the flows of user requests and redirect them to less loaded servers for processing;

· Globality - a distributed structure allows the spatial distribution of business processes to be mirrored, with client workstations created at the points where they are most convenient.

As time went on, small islands of university, government and corporate networks expanded, merged into regional and national systems. And then the main player appeared on the scene - the Internet.

Praise for the World Wide Web has long been commonplace in publications on computing. Indeed, the Internet has played a pivotal role in the development of distributed computing and has made this rather specialized area of software development the focus of an army of professional programmers. Today it significantly broadens the use of distributed applications, allowing remote users to connect and making application functions available everywhere.

This is the history of the issue. Now let's take a look at what distributed applications are.

Distributed computing paradigm

Imagine a fairly large manufacturing facility, trading company, or service provider. All of their divisions already have their own databases and specific software. The central office somehow collects information about the current activities of these departments and provides managers with information on the basis of which they make management decisions.

Let's go further and suppose that the organization we are considering is successfully developing, opening branches, developing new types of products or services. Moreover, at the last meeting, progressive-minded executives decided to organize a network of remote workstations from which customers could receive some information about the fulfillment of their orders.

In the situation described, one can only pity the head of the IT department if he has not taken care of building a company-wide workflow management system in advance, for without it it will be very difficult to ensure the organization's effective development. Moreover, an enterprise-wide information processing system cannot be dispensed with: it must be designed for the growing load and, furthermore, must match the main business flows, since all divisions must perform not only their own tasks but also, when necessary, process requests from other divisions and even (a nightmare for a project manager!) from customers.

So, we are ready to formulate the basic requirements for modern enterprise-scale applications dictated by the very organization of the production process.

Spatial separation. The divisions of the organization are dispersed in space and often run poorly unified software.

Structural Compliance. The software should adequately reflect the information structure of the enterprise - it should correspond to the main data streams.

Orientation to external information. Modern enterprises are forced to pay increased attention to working with customers, so enterprise software must be able to serve a new type of user and their needs. Such users deliberately have limited rights and access only to a strictly defined class of data.

All of the above requirements for enterprise-wide software are met by distributed systems - the computation distribution scheme is shown in Fig. 1.

Of course, distributed applications are not free of flaws. First, they are expensive to operate; second, creating such applications is a laborious and complex process, and the cost of an error at the design stage is very high. Nonetheless, the development of distributed applications is flourishing: the game is worth the candle, because such software helps to improve the efficiency of the organization.

So, the paradigm of distributed computing implies the presence of several centers (servers) for storing and processing information, implementing different functions and spaced apart from one another. In addition to serving requests from the system's clients, these centers must also fulfill requests from each other, since in some cases the solution of a single task may require the joint efforts of several servers. Managing complex requests and the functioning of the system as a whole requires specialized control software. And finally, the entire system must be "immersed" in some kind of transport environment that ensures the interaction of its parts.

Distributed computing systems share common properties such as:

· Manageability - implies the ability of the system to effectively control its components. This is achieved through the use of control software;

· Performance - ensured by the possibility of redistributing the load among the servers of the system by means of the control software;

· Scalability - if a physical increase in performance is required, a distributed system can easily integrate new computing resources into its transport environment;

· Extensibility - new components (server software) with new functions can be added to distributed applications.

Access to data in a distributed application is possible both from the client software of the system itself and from other distributed systems, and can be organized at various levels - from client software and transport protocols to the protection of the database servers.

Fig. 2. The main levels of the architecture of a distributed application

The listed properties of distributed systems are sufficient reason to put up with the complexity of their development and high cost of maintenance.

Distributed Application Architecture

Consider an architecture of a distributed application that allows it to perform complex and varied functions. Different sources offer different options for building distributed applications, and all of them have a right to exist, since such applications solve the widest range of problems in many subject areas, and the relentless development of tools and technologies pushes toward continuous improvement.

Nevertheless, there is a most general architecture of a distributed application, according to which it is divided into several logical levels of data processing. Applications, as is well known, exist to process information, and three main functions can be distinguished here (a toy sketch in C follows the list):

· Data presentation (user level). Here application users can view the necessary data, send a request for execution, enter new data into the system or edit it;

· Data processing (intermediate level, middleware). At this level, the business logic of the application is concentrated, the data flows are controlled and the interaction of the application parts is organized. It is the concentration of all data processing and control functions at one level that is considered the main advantage of distributed applications;

· Data storage (data layer). This is the database server tier. The servers themselves, databases, data access tools, and various auxiliary tools are located here.
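As a toy illustration of the three levels just listed, the C program below compresses each level into a single function: the "database" is one hard-coded row, the middle layer applies one business rule, and main() plays the role of the presentation level. All names are invented for the sketch.

    #include <stdio.h>
    #include <string.h>

    /* data storage level: a trivial in-memory "database" */
    const char *db_lookup(const char *key)
    {
        return strcmp(key, "order42") == 0 ? "status: shipped" : NULL;
    }

    /* data processing level: business rule - only non-empty keys are valid */
    const char *process_request(const char *key)
    {
        if (key == NULL || *key == '\0')
            return "error: empty request";
        const char *row = db_lookup(key);
        return row ? row : "error: not found";
    }

    /* presentation level: formats the answer for the user */
    int main(void)
    {
        printf("%s\n", process_request("order42"));
        return 0;
    }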

This architecture is often called three-level, or three-tier. Very often the structure of the developed application is built on these "three pillars". It should be noted that each level can be further subdivided into several sublevels. For example, the user level can be broken down into the user interface proper and the rules for validating and processing input data.

Of course, if we take into account the possibility of splitting into sublevels, any distributed application can be fitted into the three-tier architecture. But one cannot ignore another characteristic feature inherent in distributed applications: data management. The importance of this feature is obvious, since it is very difficult to create a real-world distributed application (with all its client stations, middleware, database servers, and so on) that does not manage its requests and responses. Therefore, a distributed application must have one more logical layer: the data management layer.

Fig. 3. Distribution of business logic across the levels of a distributed application

It is therefore advisable to divide the intermediate level into two independent ones: the data processing level (keeping its important advantage, the concentration of the business rules of data processing) and the data management level. The latter controls the execution of requests, maintains work with data streams, and organizes the interaction of the parts of the system.

Thus, there are four main layers of a distributed architecture (see Fig. 2):

· Data presentation (user level);

· Business logic rules (data processing layer);

· Data management (data management layer);

· Data storage (data storage layer).

Three of the four levels, all but the first, are directly involved in data processing, while the data presentation level allows users to visualize and edit the data. Through this level, users receive data from the data processing level, which in turn retrieves information from the storage and performs all the necessary data transformations. After new information is entered or existing data edited, the data streams flow in the opposite direction: from the user interface through the business rules level to the storage.

The remaining layer, data management, stands apart from this data backbone, but it ensures the smooth operation of the entire system by managing requests and responses and the interaction of the application's parts.

Separately, it is necessary to consider the option of viewing data in the "read-only" mode. In this case, the data processing layer is not used in the general data transfer scheme, since there is no need to make any changes. And the flow of information itself is unidirectional - from the storage to the data presentation level.

Physical structure of distributed applications

Now let us turn to the physical levels of distributed applications. The topology of a distributed system implies a division into several database servers, data processing servers, and a set of local and remote clients. They can all be located anywhere: in the same building or on another continent. In any case, the parts of a distributed system must be connected by reliable and secure communication lines. As for the required data transfer rate, it depends largely on how important the connection between two given parts of the system is for data processing and transmission, and to a lesser extent on their distance apart.

Distribution of business logic across distributed application tiers

Now is the time to move on to a detailed description of the levels of a distributed system, but first let's say a few words about the distribution of application functionality across levels. Business logic can be implemented at any of the levels of the three-tier architecture.

Database servers can not only store data in databases, but also contain part of the application's business logic in stored procedures, triggers, etc.

Client applications can also implement data processing rules. If the set of rules is minimal and comes down mainly to procedures for checking the correctness of data entry, we are dealing with a "thin" client. In contrast, a thick client contains a large proportion of the application's functionality.

The level of data processing is actually intended to implement the business logic of the application, and all the basic rules for data processing are concentrated here.

Thus, in the general case, the functionality of the application is "smeared" across the whole application. The whole variety of distributions of business logic across application levels can be represented as a smooth curve showing the share of data processing rules concentrated in a particular place. The curves in Fig. 3 are qualitative, but they nevertheless show how changes in the application's structure can affect the distribution of rules.

And practice confirms this conclusion. After all, there are always a couple of rules that need to be implemented in the stored procedures of the database server, and it is very often convenient to transfer some initial operations with data to the client side - at least to prevent processing of incorrect requests.

Presentation layer

The data presentation layer is the only one available to the end user. This layer comprises the client workstations of a distributed application and the corresponding software. The capabilities of a client workstation are determined primarily by the capabilities of its operating system. Depending on the type of user interface, client software is divided into two groups: clients that use the capabilities of a GUI (for example, Windows), and Web clients. In any case, the client application must provide the following functions:

· Receiving data;

· Presentation of data for viewing by the user;

· Data editing;

· Checking the correctness of the entered data;

· Saving the changes made;

· Handling exceptions and displaying information about errors for the user.

It is desirable to concentrate all business rules at the data processing level, but in practice this is not always possible. Then they talk about two types of client software. The thin client contains a minimal set of business rules, while the thick client implements a significant portion of the application logic. In the first case, the distributed application is much easier to debug, modernize and expand, in the second, you can minimize the costs of creating and maintaining the data management layer, since some of the operations can be performed on the client side, and only data transfer falls on the middleware.

Data processing layer

The data processing layer combines the parts that implement the business logic of the application and is an intermediary between the presentation layer and the storage layer. All data passes through it and undergoes, within it, the changes required by the problem being solved (see Fig. 2). The functions of this level include the following:

· Processing of data streams in accordance with business rules;

· Interacting with the data presentation layer to receive requests and return responses;

· Interaction with the data storage layer to send requests and receive responses.

Most often the data processing layer is equated with the middleware of a distributed application. This is fully true only for an "ideal" system and only partially for real applications (see Fig. 3). In the latter, the middleware contains a large share of the data processing rules, but some of them are implemented in the SQL servers as stored procedures or triggers, and some are included in the client software.

Such "blurring" of business logic is justified, since it allows to simplify some of the data processing procedures. Let's take a classic example of an order statement. It can include the names of only those products that are in stock. Therefore, when adding a certain item to the order and determining its quantity, the corresponding number must be subtracted from the remainder of this item in the warehouse. Obviously, the best way to implement this logic is through the DB server — either a stored procedure or a trigger.

Data management layer

The data management layer is needed so that the application remains coherent, resilient, and reliable, and can be modernized and scaled. It ensures the execution of system tasks; without it, the parts of the application (database servers, application servers, middleware, clients) would not be able to interact with each other, and connections broken when the load rises could not be restored.

In addition, various system services of the application can be implemented at the data management level. After all, there are always functions common to the entire application that are necessary for the operation of all levels of the application, therefore, they cannot be located on any of the other levels.

For example, a time service provides all parts of the application with system timestamps that keep them synchronized. Imagine that a distributed application has a server that sends clients tasks with a specific deadline. If the deadline is missed, the task must be registered along with a calculation of the delay. If the client workstations are in the same building as the server, or on a neighboring street, there is no problem: the accounting algorithm is simple. But what if the clients are in different time zones - in other countries or even overseas? In that case, the server must be able to compute the difference with allowance for time zones when sending tasks and receiving responses, and the clients would be required to add service information about local time and time zone to their reports. If a single time service is part of the distributed application, this problem simply does not arise.
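The point of the single time service is that all parts of the application compare absolute timestamps, so time zones drop out of the arithmetic entirely. A minimal sketch, assuming the service hands out ordinary UTC time_t values:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        /* both stamps are absolute (UTC) seconds issued by the time
           service, so the delay is a plain subtraction regardless of
           the client's time zone */
        time_t deadline = time(NULL) + 3600;  /* deadline set by the server        */
        time_t done     = time(NULL) + 4000;  /* completion reported by a client   */

        double late = difftime(done, deadline);
        if (late > 0)
            printf("task is %.0f seconds late\n", late);
        return 0;
    }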

In addition to the time service, the data management level can contain services for storing general information (information about the application as a whole), generating consolidated reports, and so on.

So, the functions of the data management layer include:

· Managing parts of a distributed application;

· Management of connections and communication channels between parts of the application;

· Control of data flows between clients and servers and between servers;

· Load control;

· Implementation of system services of the application.

It should be noted that the data management layer is often built from ready-made solutions supplied to the software market by various vendors. If the developers have chosen the CORBA architecture for their application, it includes an Object Request Broker (ORB); if the platform is Windows, a variety of tools is at their service: COM+ (a development of Microsoft Transaction Server, MTS), the MSMQ message queuing technology, Microsoft BizTalk, and so on.

Data storage layer

The storage tier brings together the SQL servers and databases used by the application. It provides a solution to the following tasks:

· Storing data in a database and keeping them in working order;

· Processing requests from the data processing level and returning the results;

· Implementation of a part of the business logic of a distributed application;

· Management of distributed databases using administrative tools of database servers.

In addition to the obvious functions of storing data and processing queries, this layer can contain part of the application's business logic in stored procedures, triggers, constraints, etc. Moreover, the very structure of the application's database (tables and their fields, indexes, foreign keys, etc.) is itself an implementation of the data structure with which the distributed application works, as well as of some business logic rules. For example, using a foreign key in a database table imposes a corresponding restriction on data manipulation: records in the master table cannot be deleted while records referencing them through the foreign key remain in the detail table.

Most database servers support a variety of administration procedures, including distributed database management. These include data replication, remote archiving, tools for accessing remote databases, etc. The ability to use these tools should be considered when developing the structure of your own distributed application.

Connections to the databases of SQL servers are made primarily through the server's own client software. In addition, various data access technologies can be used, for example ADO (ActiveX Data Objects) or ADO.NET. But when designing a system, it must be borne in mind that, functionally, intermediate data access technologies do not belong to the data storage level.

Base Level Extensions

The levels of distributed application architecture described above are basic. They form the structure of the created application as a whole, but they cannot, of course, by themselves cover the implementation of every application - the subject areas and tasks are too vast and diverse. For such cases, the architecture of a distributed application can be extended with additional levels designed to reflect the specifics of the application being created.

Among others, two extensions of the base levels are used most often.

The business interface layer is located between the user interface layer and the data processing layer. It hides from client applications the details of the structure and implementation of business rules of the data processing layer, providing abstraction of the client application code from the implementation features of the application logic.

As a result, developers of client applications use a certain set of necessary functions - an analogue of an application programming interface (API). This makes the client software independent from the implementation of the data processing layer.

Of course, when serious changes are made to the system, global alterations cannot be avoided, but the business interface level makes it possible to avoid them unless absolutely necessary.

The data access layer is located between the data storage layer and the data processing layer. It allows you to make the structure of the application independent of a specific data storage technology. In such cases, the software objects of the data processing layer send requests and receive responses using the means of the chosen data access technology.

When implementing applications on the Windows platform, ADO data access technology is most often used because it provides a universal way to access a wide variety of data sources - from SQL servers to spreadsheets. For applications on the .NET platform, ADO.NET technology is used.

In the previous chapter we looked at tightly coupled multiprocessor systems with shared memory, shared kernel data structures, and a common pool of processes available for execution. Often, however, it is desirable to allocate processors so that they are autonomous in operation while still sharing resources. Suppose, for example, that a user of a personal computer needs to access files located on a larger machine while retaining control of the personal computer. Although some programs, such as uucp, support network file transfer and other network functions, their use is not hidden from the user, since the user knows that he is working over the network. Moreover, programs such as text editors do not work with remote files as they do with ordinary local ones. Users should have the standard set of UNIX system functions at their disposal and, apart from a possible loss of performance, should not feel that they are crossing machine boundaries. Thus, for example, the system functions open and read should work with files on remote machines in the same way as with files belonging to local systems.

The architecture of a distributed system is shown in Figure 13.1. Each computer in the figure is a self-contained unit consisting of a CPU, memory, and peripherals. The model does not break down even if a computer has no local file system: it must then have peripheral devices for communicating with other machines, and all files belonging to it can be located on another computer. The physical memory available to each machine is independent of the processes running on other machines. In this respect distributed systems differ from the tightly coupled multiprocessor systems discussed in the previous chapter. Accordingly, the kernel of the system on each machine functions independently of the external operating conditions of the distributed environment.

Figure 13.1. Distributed architecture system model


Distributed systems, well described in the literature, traditionally fall into the following categories:

Peripheral systems, which are groups of machines that have a strong commonality and are tied to one (usually larger) machine. The peripheral processors share their load with the central processor and forward all their calls to the operating system to it. The goal of a peripheral system is to increase overall network performance and to provide the ability to dedicate a processor to a single process in a UNIX operating environment. The system boots as a single module; unlike other models of distributed systems, peripheral systems have no real autonomy, except in matters of process dispatching and local memory allocation.

Distributed systems such as "Newcastle", allowing remote communication by the names of remote files in the library (the name is taken from the article "The Newcastle Connection" - see). Deleted files have a BOM (distinguished name) that, in the search path, contains special characters or an optional name component that precedes the file system root. The implementation of this method does not involve making changes to the system kernel, and therefore it is simpler than the other methods discussed in this chapter, but less flexible.

Completely transparent distributed systems, in which standard compound names are sufficient to refer to files located on other machines; it is up to the kernel to recognize such files as remote. The file search paths specified in compound names cross machine boundaries at mount points, however many such points are formed when file systems are mounted on disks.

In this chapter we will look at the architecture of each model; all the information provided is based not on the results of specific commercial developments but on material published in various technical articles. It is assumed that protocol modules and device drivers take care of addressing, routing, flow control, and error detection and correction - in other words, that each model is independent of the underlying network. The examples of the use of system functions shown in the next section for peripheral systems work in a similar way for Newcastle-type systems and for completely transparent systems, which are discussed later; therefore we consider them in detail once, and in the sections devoted to the other types of systems we focus mainly on the features that distinguish those models from the others.

13.1 PERIPHERAL PROCESSORS

The architecture of a peripheral system is shown in Figure 13.2. The goal of this configuration is to improve overall network performance by redistributing running processes between the CPU and the peripheral processors. Each peripheral processor has at its disposal no local peripheral devices other than those it needs to communicate with the central processor. The file system and all devices are at the disposal of the central processor. Assume that all user processes execute on the peripheral processors and do not migrate between them; once assigned to a processor, a process remains there until completion. A peripheral processor contains a lightweight version of the operating system designed to handle local system calls, interrupt handling, memory allocation, network protocols, and the driver of the device used for communication with the central processor.

When the system is initialized on the central processor, the kernel loads the local operating system onto each of the peripheral processors over the communication lines. Any process running on the periphery is associated with a satellite process belonging to the central processor (see); when a process running on a peripheral processor calls a system function that requires the services of the central processor, the peripheral process communicates with its satellite and the request is sent to the central processor for processing. The satellite process performs the system function and sends the results back to the peripheral processor. The relationship between a peripheral process and its satellite resembles the client-server relationship discussed in detail in Chapter 11: the peripheral process acts as a client of its satellite, which supports the file system functions. Here the remote server process has only one client. In Section 13.4 we will look at server processes with several clients.


Figure 13.2. Peripheral system configuration


Figure 13.3. Message formats

When a peripheral process calls a system function that can be processed locally, the kernel does not need to send a request to the satellite process. For example, to obtain additional memory a process can call sbrk for local execution. However, if the services of the central processor are required - for example, to open a file - the kernel encodes the parameters passed to the called function and the process's execution conditions into a message sent to the satellite process (Figure 13.3). The message includes a flag indicating that the system function is being performed by the satellite process on behalf of the client, the parameters passed to the function, and data about the process's execution environment (for example, user and group identification codes), which differ from function to function. The remainder of the message is variable-length data (for example, a compound file name or data to be written by the write function).

The satellite process waits for requests from the peripheral process; when a request arrives, it decodes the message, determines the type of system function, executes it, and converts the results into a response sent to the peripheral process. The response, in addition to the results of executing the system function, includes an error message (if any), a signal number, and a variable-length data array containing, for example, information read from a file. The peripheral process is suspended until the response arrives; on receiving it, it decodes the response and passes the results to the user. This is the general scheme for handling operating system calls; let us now turn to a more detailed consideration of individual functions.
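In C, the request and response messages of Figure 13.3 might be laid out roughly as follows. Every field name here is illustrative, invented for the sketch rather than taken from any actual implementation; for a write request, for instance, params[0] could carry the descriptor and params[1] the byte count, with the data itself in the variable part.

    /* hypothetical message layouts for the peripheral-satellite exchange */
    struct sat_request {
        int  syscall_no;   /* flag identifying the system function        */
        int  uid, gid;     /* process execution environment               */
        long params[4];    /* parameters passed to the function           */
        int  datalen;      /* length of the variable part that follows:   */
    };                     /* e.g. a compound file name or write() data   */

    struct sat_response {
        long retval;       /* result of the system function               */
        int  err;          /* error code, if any                          */
        int  signo;        /* number of a signal posted to the client     */
        int  datalen;      /* length of the variable part that follows:   */
    };                     /* e.g. data read from a file                  */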

To explain how a peripheral system works, consider a number of functions: getppid, open, write, fork, exit, and signal. The getppid function is quite straightforward, since it involves simple request and response forms exchanged between the periphery and the CPU. The kernel on the peripheral processor generates a message with a flag indicating that the requested function is getppid, and sends the request to the central processor. The satellite process on the central processor reads the message from the peripheral processor, decodes the type of system function, executes it, and obtains the identifier of its parent. It then generates a response and passes it to the waiting peripheral process at the other end of the communication line. When the peripheral processor receives the response, it passes it to the process that called the getppid system function. If the peripheral process stores data (such as the process ID of the parent) in local memory, it does not have to communicate with its satellite at all.

When the open system function is called, the peripheral process sends its satellite a message that includes the file name and other parameters. If successful, the satellite process allocates an inode and a file table entry, allocates an entry in the user file descriptor table in its own space, and returns the file descriptor to the peripheral process. All this time, at the other end of the communication line, the peripheral process waits for the response. It has no structures at its disposal that would store information about the file being opened; the descriptor returned by open is an index into the satellite process's user file descriptor table. The results of executing the function are shown in Figure 13.4.


Figure 13.4. Calling the open function from a peripheral process

When the write system function is called, the peripheral processor generates a message consisting of a flag identifying the write function, the file descriptor, and the amount of data to be written. It then copies the data from the peripheral process's space to the satellite process over the communication line. The satellite process decodes the received message, reads the data from the communication line, and writes it to the corresponding file (the descriptor contained in the message is used to locate the file table entry and, through it, the file's inode); all these actions are performed on the central processor. On completing the work, the satellite process sends the peripheral process a message confirming receipt and giving the number of bytes successfully copied to the file. The read operation is similar: the satellite informs the peripheral process of the number of bytes actually read (when reading from a terminal or a pipe, this number does not always coincide with the amount specified in the request). Performing either function may require sending messages over the network several times, depending on the amount of data and the size of the network packets.
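A hedged sketch of the satellite's side of that exchange, reusing the illustrative sat_request layout above; conn stands for the communication line to the peripheral process:

    #include <unistd.h>

    struct sat_request {
        int  syscall_no;
        int  uid, gid;
        long params[4];
        int  datalen;
    };

    void serve_write(int conn, const struct sat_request *rq)
    {
        char buf[4096];
        int  fd   = (int)rq->params[0];  /* descriptor from the message */
        long left = rq->params[1];       /* number of bytes to copy     */
        long done = 0;

        while (left > 0) {               /* data may arrive in several  */
            long want = left < (long)sizeof buf ? left : (long)sizeof buf;
            ssize_t n = read(conn, buf, (size_t)want);   /* network packets */
            if (n <= 0)
                break;
            done += write(fd, buf, (size_t)n);   /* write to the real file */
            left -= n;
        }
        write(conn, &done, sizeof done); /* acknowledge bytes written   */
    }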

The only function that needs to be changed while running on the CPU is the fork system function. When a process executes fork on the CPU, the kernel selects a peripheral processor for it and sends a message to a special server process, informing it that it is about to offload the current process. Assuming the server accepts the request, the kernel uses fork to create a new peripheral process, allocating a process table entry and address space. The central processor offloads a copy of the process that called fork to the peripheral processor, overwriting the newly allocated address space, spawns a local satellite to communicate with the new peripheral process, and sends the periphery a message to initialize the program counter of the new process. The satellite process (on the CPU) is a child of the process that called fork; the peripheral process is technically a child of the server process, but logically it is a child of the process that called fork. The server process has no logical connection with the child when fork completes; the server's only job is to help offload the child. Because of the tight coupling between the system components (peripheral processors have no autonomy), the peripheral process and the satellite process have the same identification code. The relationship between the processes is shown in Figure 13.5: the solid line shows the parent-child relationship, and the dotted line the relationship between peers.


Figure 13.5. Executing a fork function on the CPU

When a process executes the fork function on a peripheral processor, it sends a message to its satellite on the CPU, which then performs the whole sequence of actions described above. The satellite selects a new peripheral processor and makes the necessary preparations for offloading the image of the old process: it sends the parent peripheral process a request to read its image, in response to which transfer of the requested data begins at the other end of the communication channel. The satellite reads the transmitted image and copies it to the peripheral child. When offloading of the image is finished, the satellite process calls fork, creating its child on the CPU, and passes the value of the program counter to the peripheral child so that the latter knows at which address to begin execution. It would obviously be better if the child of the satellite process were assigned to the peripheral child as its parent, but in our case the spawned processes are able to run on other peripheral processors, not only on the one on which they were created. The relationship between the processes on completion of fork is shown in Figure 13.6. When the peripheral process finishes its work, it sends a corresponding message to the satellite process, which then terminates as well. The satellite process cannot initiate the shutdown itself.


Figure 13.6. Executing a fork function on a peripheral processor

In both multiprocessor and uniprocessor systems a process must respond to signals in the same way: the process either completes the execution of the system function before checking for signals or, on the contrary, on receiving a signal immediately wakes from the suspended state and abruptly interrupts the system function, if this is consistent with the priority at which it was suspended. Since the satellite process performs system functions on behalf of the peripheral process, it must react to signals in coordination with the latter. If, in a uniprocessor system, a signal causes a process to abort a function, the satellite process in the multiprocessor system must behave in the same way. The same applies when a signal prompts a process to terminate by means of the exit function: the peripheral process terminates and sends the corresponding message to the satellite process, which, of course, terminates as well.

When a peripheral process calls the signal system function, it stores the current information in local tables and sends its satellite a message saying whether the specified signal is to be caught or ignored. It makes no difference to the satellite process whether the signal is to be caught or the default action taken. A process's reaction to a signal depends on three factors (Figure 13.7): whether the signal arrives while the process is executing a system function, whether the signal function was used to specify that the signal be ignored, and whether the signal originates on the same peripheral processor or on some other one. Let us move on to considering the various possibilities.


algorithm sighandle                  /* signal handling algorithm */
input:  none
output: none
{
    if (current process is someone's satellite or has a prototype)
    {
        if (the signal is ignored)
            return;                   /* do nothing */
        if (the signal arrived during execution of a system function)
            post the signal to the satellite process;
        else
            send a signal message to the peripheral process;
    }
    else                              /* peripheral process */
    {
        /* whether or not the signal arrived during execution
           of a system function */
        send the signal to the satellite process;
    }
}

algorithm satellite_end_of_syscall   /* completion of a system function
                                        called by a peripheral process */
input:  none
output: none
{
    if (an interrupt was received during execution of the system function)
        send an interrupt message, signal, to the peripheral process;
    else                              /* execution was not interrupted */
        send the reply: set the flag showing the arrival of the signal;
}

Figure 13.7. Signal processing in the peripheral system


Suppose that a peripheral process has suspended its work while its satellite process performs a system function on its behalf. If the signal originates elsewhere, the satellite process detects it before the peripheral process does. Three cases are possible.

1. If, while waiting for some event, the satellite process has not entered a suspended state from which a signal would wake it, it performs the system function to completion, sends the results to the peripheral process, and indicates which of the signals it received.

2. If the process has specified that this type of signal be ignored, the satellite continues to follow the system function's execution algorithm without exiting the suspended state via longjmp. The response sent to the peripheral process will contain no message about a signal having been received.

3. If, on receiving the signal, the satellite process interrupts the execution of the system function (via longjmp), it informs the peripheral process of this and tells it the signal number (the longjmp pattern is sketched after this list).
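The "abort via longjmp" behavior in cases 2 and 3 follows the classic setjmp/longjmp pattern; the user-level sketch below illustrates the idea of a signal handler aborting a blocked read, and is of course not the kernel's actual mechanism.

    #include <stdio.h>
    #include <unistd.h>
    #include <signal.h>
    #include <setjmp.h>

    static sigjmp_buf env;

    static void on_int(int signo)
    {
        siglongjmp(env, signo);           /* abort the interrupted read() */
    }

    int main(void)
    {
        signal(SIGINT, on_int);
        if (sigsetjmp(env, 1) == 0) {
            char buf[128];
            ssize_t n = read(0, buf, sizeof buf);  /* may block on the terminal */
            printf("read %zd bytes\n", n);
        } else {
            printf("system call aborted by a signal\n");
        }
        return 0;
    }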

The peripheral process looks in the received response for information about the arrival of signals and, if there is any, processes the signals before exiting from the system function. Thus the behavior of a process in a multiprocessor system corresponds exactly to its behavior in a uniprocessor system: it either terminates without leaving kernel mode, or calls a user-defined signal handling function, or ignores the signal and completes the system function successfully.


Figure 13.8. Interrupt during execution of a system function

Suppose, for example, that a peripheral process calls the read function on a terminal connected to the central processor and suspends its work while the satellite process performs the function (Figure 13.8). If the user presses the break key, the CPU kernel sends a signal to the satellite process. If the satellite was suspended waiting for a portion of data from the terminal, it immediately exits that state and terminates the read function. In its response to the peripheral process's request, the satellite reports an error code and the signal number corresponding to the interrupt. The peripheral process analyzes the response and, since the message says that an interrupt has arrived, sends the signal to itself. Before exiting from the read function, the peripheral kernel checks for signals, detects the interrupt signal received from the satellite process, and processes it in the usual way. If, as a result of receiving the interrupt signal, the peripheral process terminates by means of the exit function, this function takes care of killing the satellite process. If the peripheral process catches interrupt signals, it calls the user-defined signal handling function and, on exit from the read function, returns an error code to the user. On the other hand, if the satellite is executing the stat system function on behalf of the peripheral process, it does not interrupt its execution on receiving a signal (the stat function is guaranteed to exit any pause, since its wait for resources is bounded in time). The satellite completes the execution of the function and returns the signal number to the peripheral process. The peripheral process sends the signal to itself and receives it on exiting from the system function.

If a signal occurs on the peripheral processor during the execution of a system function, the peripheral process cannot know whether control will soon return to it from the satellite process or whether the latter will remain suspended indefinitely. The peripheral process sends the satellite a special message informing it of the occurrence of the signal. The kernel on the CPU decodes the message and sends the signal to the satellite, whose reaction to receiving it is described in the preceding paragraphs (abnormal termination of the function or completion of its execution). The peripheral process cannot send the message to the satellite directly, because the satellite is busy executing the system function and is not reading data from the communication line.

Returning to the read example, it should be noted that the peripheral process has no idea whether its satellite is waiting for input from the terminal or performing other actions. The peripheral process sends the satellite a signal message: if the satellite is suspended at an interruptible priority, it immediately exits that state and aborts the system function; otherwise, the function runs to successful completion.

Finally, consider the arrival of a signal at a time not connected with the execution of a system function. If the signal originated on another processor, the satellite receives it first and sends a signal message to the peripheral process, whether or not the signal concerns the peripheral process. The peripheral kernel decodes the message and sends the signal to the process, which reacts to it in the usual way. If the signal originated on the peripheral processor, the process performs the standard actions without resorting to the services of its satellite.

When a peripheral process sends a signal to other peripheral processes, it encodes a kill-call message and sends it to its satellite process, which executes the called function locally. If some of the processes for which the signal is intended reside on other peripheral processors, their satellites receive the signal (and react to it as described above).

13.2 COMMUNICATION TYPE NEWCASTLE

In the previous section we considered a type of tightly coupled system characterized by sending to a remote (central) processor all calls to the functions of the file management subsystem that arise on a peripheral processor. We now turn to more loosely coupled systems, consisting of machines that make calls to files located on other machines. In a network of personal computers and workstations, for example, users often access files located on a large machine. In the next two sections we consider system configurations in which all system functions are performed in local subsystems, but at the same time it is possible to access files (through the functions of the file management subsystem) located on other machines.

These systems use one of the following two ways of identifying remote files. In some systems a special character is added to the compound file name: the name component preceding this character identifies the machine, and the rest of the name identifies the file on that machine. Thus, for example, the compound name


"sftig! / fs1 / mjb / rje"


identifies the file "/fs1/mjb/rje" on the machine "sftig". This file identification scheme follows the uucp convention for transferring files between UNIX-like systems. In another scheme, remote files are identified by adding a special prefix to the name, for example:


/../sftig/fs1/mjb/rje


where "/../" is a prefix indicating that the file is deleted; the second component of the filename is the name of the remote machine. This scheme uses the familiar UNIX file name syntax, so unlike the first scheme, user programs do not need to adapt to the use of names with unusual construction (see).


Figure 13.9. Formulating requests to the file server (processor)


We will devote the rest of this section to a model of a system using the Newcastle connection, in which the kernel is not concerned with recognizing remote files; that function is assigned entirely to subroutines from the standard C library, which in this case play the role of the system interface. These routines analyze the first component of the file name, which in both of the identification methods described contains a sign that the file is remote. This is a departure from the usual practice, in which library routines do not parse file names. Figure 13.9 shows how requests to a file server are formulated. If the file is local, the kernel of the local system processes the request in the normal way. Consider the opposite case:


open ("/../ sftig / fs1 / mjb / rje / file", O_RDONLY);


The open subroutine in the C library parses the first two components of the compound file name and thus learns that the file should be sought on the remote machine "sftig". To know whether the process has previously had a connection to the given machine, the subroutine maintains a special structure in which it records this fact and, if there is no connection yet, establishes one with the file server running on the remote machine. When the process formulates its first request for remote processing, the remote server confirms the request, records the user and group identification codes in the appropriate fields if necessary, and creates a satellite process that will act on behalf of the client process.
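Putting the pieces together, a Newcastle-style library open might branch roughly as sketched below; remote_connect() and remote_open() are hypothetical stand-ins for the connection-tracking structure and the RPC exchange with the satellite.

    #include <fcntl.h>
    #include <string.h>

    extern int remote_connect(const char *machine);                /* hypothetical */
    extern int remote_open(int conn, const char *path, int oflag); /* hypothetical */

    int lib_open(const char *name, int oflag)
    {
        if (strncmp(name, "/../", 4) != 0)
            return open(name, oflag);     /* local file: trap to the kernel */

        const char *rest = strchr(name + 4, '/');  /* path on the remote machine */
        if (rest == NULL)
            return -1;                    /* malformed remote name */
        char machine[64];
        size_t len = (size_t)(rest - (name + 4));
        if (len == 0 || len >= sizeof machine)
            return -1;
        memcpy(machine, name + 4, len);
        machine[len] = '\0';

        int conn = remote_connect(machine);    /* reuse or establish the link */
        if (conn < 0)
            return -1;
        return remote_open(conn, rest, oflag); /* descriptor tracked by the library */
    }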

To fulfill the client's requests, the satellite must have the same file access rights on the remote machine as the client. In other words, the user "mjb" must have the same access rights to remote files as to local ones. Unfortunately, the client identification code of "mjb" may coincide with the identification code of a different client on the remote machine. Thus, the system administrators of the machines operating in the network must either ensure that each user is assigned an identification code unique across the whole network, or perform code conversion at the moment a network service request is formulated. If this is not done, the satellite process will have the rights of another client on the remote machine.

A more delicate issue is that of superuser rights over remote files. On the one hand, a superuser client should not have the same rights over the remote system, so as not to undermine the remote system's security controls. On the other hand, some programs simply cannot work unless they are granted superuser rights. An example of such a program is mkdir (see Chapter 7), which creates a new directory. The remote system would not allow the client to create a new directory, since superuser rights do not extend to remote operation. The problem of creating remote directories is a serious reason to revise the mkdir system function, extending its ability to establish automatically all the connections the user needs. Nevertheless, the fact that setuid programs (such as mkdir) acquire superuser privileges over remote files remains a general problem. Perhaps the best solution would be to give files additional characteristics describing access to them by remote superusers; unfortunately, this would require changes to the structure of the disk inode (namely, adding new fields) and would create too much disorder in existing systems.

If the open subroutine succeeds, the local library leaves a corresponding note about this in a user-accessible structure containing the address of the network node, the process ID of the satellite process, the file descriptor, and other similar information. The library routines read and write determine from the descriptor whether the file is remote and, if so, send a message to the satellite. The client process interacts with its satellite in all cases where access to system functions requires the services of the remote machine. If a process accesses two files located on the same remote machine, it uses a single satellite; if the files are located on different machines, two satellites are used, one on each machine. Two satellites are also used when two processes access a file on a remote machine. When invoking a system function via the satellite, the process generates a message that includes the function number, the search path name, and other necessary information, similar to that included in the message structure of the system with peripheral processors.

The mechanism for performing operations on the current directory is more complex. When a process selects a remote directory as its current one, the library routine sends a message to the satellite, which changes the current directory, and the routine remembers that the current directory is remote. In all cases where a search path name begins with a character other than slash (/), the routine sends the name to the remote machine, where the satellite process resolves it relative to the current directory. If the current directory is local, the routine simply passes the search path name to the local system kernel. The chroot system function on a remote directory works similarly, but its execution goes unnoticed by the local kernel; strictly speaking, the process could even ignore the operation, since only the library records its execution.

When a process calls fork, the corresponding library routine sends messages to each satellite. The satellite processes fork and send their child ids to the parent client. The client process then runs the fork system function, which transfers control to the child it spawns; the local child is in dialogue with the remote satellite child whose address has been stored by the library routine. This interpretation of fork makes it easier for satellite processes to keep control of open files and current directories. When a process working with remote files exits (by calling the exit function), the routine sends messages to all of its remote satellites so that they do the same upon receiving the message. Certain aspects of the implementation of the exec and exit system functions are touched on in the exercises.
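Under the same assumptions as the sketches above, the fork() interpretation described here might look roughly as follows: notify every satellite, let each fork and report its child's id, then fork locally and hand the child the table of its remote counterparts. The helpers (conn_count, send_fork_request, remember_child_satellite) are placeholders.

    #include <unistd.h>
    #include <sys/types.h>

    #define MAXCONN 16

    /* assumed helpers over the connection table from the open() sketch */
    extern int  conn_count(void);
    extern int  conn_sock(int i);
    extern int  send_fork_request(int sock);    /* returns child satellite id */
    extern void remember_child_satellite(int i, int sat_id);

    pid_t lib_fork(void)
    {
        int child_ids[MAXCONN];
        int n = conn_count();

        /* Ask every satellite to fork first, so that both future
         * branches of the local process know whom to talk to. */
        for (int i = 0; i < n; i++)
            child_ids[i] = send_fork_request(conn_sock(i));

        pid_t pid = fork();
        if (pid == 0)                   /* child: adopt the child satellites */
            for (int i = 0; i < n; i++)
                remember_child_satellite(i, child_ids[i]);
        return pid;
    }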

The advantage of a Newcastle connection is that a process's access to remote files becomes transparent (invisible to the user), and no changes to the system kernel are required. However, the design has a number of disadvantages. First, it can reduce system performance. Because of the extended C library, the amount of memory used by each process increases even if the process never accesses remote files; the library duplicates kernel functions and requires additional memory space. Larger processes take longer to start up and create more contention for memory resources, leading to more frequent swapping and paging of tasks. Local requests execute more slowly because each call into the kernel takes longer, and the processing of remote requests is also slowed, since the cost of sending them across the network is added. The additional processing of remote requests at user level increases the number of context switches and of swapping and paging operations. Finally, to access remote files, programs must be recompiled with the new libraries; old programs and object modules supplied in binary form cannot work with remote files without this. All of these disadvantages are absent from the system described in the next section.

13.3 "TRANSPARENT" DISTRIBUTED FILE SYSTEMS

The term "transparent distribution" means that users on one machine can access files located on another machine without realizing that they are crossing machine boundaries, just as they cross mount points when moving from one file system to another on their own machine. The names by which processes refer to files on remote machines are similar to the names of local files: they contain no distinguishing characters. In the configuration shown in Figure 13.10, the directory "/usr/src" belonging to machine B is mounted on the directory "/usr/src" belonging to machine A. This is convenient because both machines then see the same system source code, traditionally kept in the "/usr/src" directory. Users working on machine A can access files located on machine B using the familiar file-name syntax (for example, "/usr/src/cmd/login.c"), and the kernel itself decides whether a file is remote or local. Users working on machine B have access to their local files (unaware that users of machine A can access the same files), but in turn have no access to files located on machine A. Other configurations are of course possible, in particular one in which all remote systems are mounted at the root of the local system, so that users can access all files on all systems.


Figure 13.10. File systems after remote mount

The similarity between mounting local file systems and granting access to remote file systems has prompted the adaptation of the mount function to remote file systems. In this case, the kernel has an extended-format mount table at its disposal. When executing the mount function, the kernel establishes a network connection with the remote machine and stores the information characterizing this connection in the mount table.
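One possible shape for such an "extended format" entry is sketched below: alongside the usual fields, the kernel keeps the network connection to the remote machine. The field names are illustrative, not taken from any particular implementation.

    #include <sys/types.h>

    struct inode;                    /* kernel in-core inode (opaque here) */

    struct mount_entry {
        dev_t         dev;           /* mounted device (local mounts) */
        struct inode *mount_point;   /* inode mounted on */
        struct inode *root;          /* root inode of the mounted system */
        int           remote;        /* nonzero for a remote mount */
        char          machine[32];   /* remote host name, e.g. "B" */
        int           netchan;       /* open network connection to it */
    };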

An interesting problem has to do with path names that include "..". If a process makes its current directory a directory in a remote file system, using the ".." characters in a name may return the process to the local file system rather than merely moving above the current directory within the remote one. Returning to Figure 13.10: if a process on machine A, having previously selected the current directory "/usr/src/cmd" located in the remote file system, executes the command

cd ../../..

the current directory will become the root directory of machine A, not of machine B. The namei algorithm running in the kernel of the remote system, upon receiving the ".." sequence, checks whether the calling process is an agent of a client process and, if so, whether the client treats the current working directory as the root of the remote file system; if it does, control and the remainder of the path name are returned to the local machine.
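A short sketch of that check, with simplified stand-in structures (rnamei_state and both fields are invented for illustration):

    struct inode;                    /* in-core inode (opaque here) */

    struct rnamei_state {
        int agent;                   /* running on behalf of a remote client? */
        struct inode *cur;           /* current directory of the lookup */
        struct inode *export_root;   /* root of the exported file system */
    };

    enum step { STEP_LOCAL, STEP_RETURN_TO_CLIENT };

    enum step handle_dotdot(struct rnamei_state *s)
    {
        /* For an agent sitting at the exported root, ".." would escape
         * the remote file system: hand the rest of the lookup back to
         * the client, which continues on its side of the mount point. */
        if (s->agent && s->cur == s->export_root)
            return STEP_RETURN_TO_CLIENT;
        return STEP_LOCAL;           /* ordinary in-kernel ".." handling */
    }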

Communication with a remote machine takes one of two forms: remote procedure call or remote system-function call. In the first form, each kernel procedure dealing with inodes checks whether the inode refers to a remote file and, if so, sends a request to the remote machine to perform the specified operation. This scheme fits naturally into the abstract structure for supporting file systems of various types described in the final part of Chapter 5. Thus, an access to a remote file can trigger the transfer of several messages across the network, their number determined by the number of implied operations on the file, with a corresponding increase in response time that includes the waiting times imposed by the network. Each set of remote operations includes, at a minimum, inode locking, reference counting, and the like. To improve this model, various optimizations have been proposed, such as combining several operations into one request (message) and buffering the most important data.


Figure 13.11. Opening a remote file


Consider a process that opens the remote file "/usr/src/cmd/login.c", where "src" is the mount point. Parsing the file name (using the namei-iget scheme), the kernel detects that the file is remote and sends a request to the machine holding it for the locked inode. Having received the desired response, the local kernel creates an in-memory copy of the inode corresponding to the remote file. The kernel then checks for the necessary access rights to the file (for reading, say) by sending another message to the remote machine. The open algorithm continues exactly according to the plan outlined in Chapter 5, sending messages to the remote machine as needed, until the algorithm completes and the inode is freed. The relationship between the kernel data structures upon completion of the open algorithm is shown in Figure 13.11.

If the client calls the read system function, the client kernel locks the local inode, sends a request to lock the remote inode, sends the read request, copies the data into local memory, sends a request to free the remote inode, and frees the local inode. This scheme is consistent with the semantics of the existing uniprocessor kernel, but the frequency of network use (several calls per system function) reduces the performance of the whole system. To reduce the flow of messages over the network, however, multiple operations can be combined into a single request. In the read example, the client can send the server a single general "read" request, and the server itself decides to grab and release the inode while executing it. Network traffic can also be reduced by using remote buffers (as discussed above), but care must be taken that the system file functions using these buffers execute properly.
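A sketch of that optimization: instead of separate messages for "lock inode", "read", and "unlock inode", the client sends one aggregate read request and the server brackets the read with the lock and unlock itself. The message layout and helpers (lock_inode, file_read) are hypothetical.

    #include <stdint.h>
    #include <sys/types.h>

    #define REQ_READ 1

    struct read_request {
        uint32_t opcode;        /* REQ_READ: one round trip, not three */
        uint32_t remote_inode;  /* inode number on the serving machine */
        uint64_t offset;        /* file offset to read from */
        uint32_t count;         /* bytes wanted */
    };

    /* Server side, schematically (lock/unlock/file_read assumed): */
    extern void    lock_inode(uint32_t ino);
    extern void    unlock_inode(uint32_t ino);
    extern ssize_t file_read(uint32_t ino, uint64_t off, void *buf, size_t n);

    ssize_t serve_read(const struct read_request *rq, void *buf)
    {
        ssize_t n;
        lock_inode(rq->remote_inode);          /* grab ...              */
        n = file_read(rq->remote_inode, rq->offset, buf, rq->count);
        unlock_inode(rq->remote_inode);        /* ... and release locally */
        return n;                              /* one reply carries the data */
    }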

In the second form of communication with a remote machine (a remote system-function call), the local kernel detects that the system function refers to a remote file and sends the parameters specified in the call to the remote system, which executes the function and returns the results to the client. The client machine receives the results and returns from the call. Most system functions can be performed with a single network request and a response received after a reasonable time, but not all functions fit this model. For example, upon receiving certain signals the kernel creates a file for the process called "core" (Chapter 7). The creation of this file is not associated with a single system function; it amounts to several operations, such as creating the file, checking permissions, and performing a series of writes.

In the case of the open system function, the request sent to the remote machine includes the part of the file name remaining after the path-name components that distinguish the remote file are stripped off, as well as various flags. In the earlier example of opening "/usr/src/cmd/login.c", the kernel sends the name "cmd/login.c" to the remote machine. The message also includes credentials, such as the user and group identification codes, needed to verify file permissions on the remote machine. If a response arrives from the remote machine indicating that the open succeeded, the local kernel allocates a free in-memory inode on the local machine, marks it as the inode of a remote file, stores the information about the remote machine and the remote inode, and allocates a new entry in the file table in the usual way. Compared with the real inode on the remote machine, the inode held by the local machine is formal, and the overall configuration broadly matches the one used with remote procedure calls (Figure 13.11). If a function called by a process accesses the remote file by its descriptor, the local kernel knows from the (local) inode that the file is remote, formulates a request that includes the called function, and sends it to the remote machine. The request contains a pointer to the remote inode, by which the satellite process can identify the remote file itself.
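Possible layouts for these two request messages are sketched below: one for open (residual path plus credentials), one for descriptor-based functions (remote inode reference plus credentials). The exact fields in real systems differ; everything here is illustrative.

    #include <stdint.h>

    #define RPATH_MAX 256

    struct rsyscall_open {
        uint32_t func;               /* which system function: SYS_OPEN */
        uint32_t uid, gid;           /* credentials for the permission check */
        uint32_t oflag;              /* open flags */
        char     path[RPATH_MAX];    /* residual name, e.g. "cmd/login.c" */
    };

    /* A later read by descriptor instead carries the remote inode
     * reference saved in the local, "formal" inode: */
    struct rsyscall_fd {
        uint32_t func;               /* SYS_READ, SYS_WRITE, ... */
        uint32_t uid, gid;
        uint64_t remote_inode_ref;   /* lets the server find the real file */
        uint64_t offset;
        uint32_t count;
    };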

Having received the result of any system function, the kernel may need the services of a special routine to process it (after which the kernel finishes handling the function), since the local processing of results used in a uniprocessor system is not always suitable for a system with several processors. As a consequence, changes in the semantics of the system algorithms are possible, aimed at supporting the execution of remote system functions. At the same time, only a minimal flow of messages circulates in the network, ensuring the minimum response time of the system to incoming requests.

13.4 DISTRIBUTED MODEL WITHOUT TRANSFER PROCESSES

The use of transfer processes (satellite processes) in a transparent distributed system makes it easier to keep track of remote files, but the process table of the remote system becomes overloaded with satellite processes that sit idle most of the time. In other schemes, special server processes are used to handle remote requests. The remote system has a pool of server processes that it assigns, as needed, to handle incoming remote requests. After handling a request, the server process returns to the pool, ready to handle other requests. The server does not preserve user context between two calls, because it may handle requests from several processes in turn. Therefore, every message arriving from a client process must include information about the client's execution environment, namely: user identification codes, current directory, signal dispositions, and so on.
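Since pool servers keep no per-client state between calls, every request must carry the client's execution environment, roughly as sketched here; the field set follows the enumeration in the text, and the structure names are invented.

    #include <stdint.h>

    struct request_ctx {
        uint32_t uid, gid;          /* user and group identification codes */
        char     cwd[256];          /* current directory on the serving side */
        uint32_t sigmask;           /* signal disposition information */
        uint32_t umask;             /* and similar per-process settings */
    };

    struct server_request {
        uint32_t           func;    /* requested system function */
        struct request_ctx ctx;     /* full context, resent on every call */
        /* ... function-specific arguments follow ... */
    };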

When a process opens a remote file, the remote kernel assigns an inode for subsequent references to the file. The local machine maintains the usual set of tables: the user file descriptor table, the file table, and the inode table, where the inode table entry identifies the remote machine and the remote inode. In cases where a system function (for example, read) uses the file descriptor, the kernel sends a message referring to the previously assigned remote inode and passes along the process-related information: user identification code, maximum file size, and so on. If the remote machine has a server process at its disposal, the interaction with the client takes the form described earlier; however, the connection between client and server is established only for the duration of the system function.

Using servers instead of satellite processes can make it more difficult to manage data traffic, signals, and remote devices. If a large number of requests arrive at a remote machine and there are not enough servers, the requests must be queued. This requires a higher-layer protocol than the one used on the underlying network. In the satellite model, by contrast, such saturation cannot occur, because all client requests are processed synchronously: a client can have at most one request pending.

The handling of signals that interrupt the execution of a system function is also complicated with servers, since the remote machine has to locate the particular server executing the function. It is even possible that, because all servers are busy, a request for a system function is still waiting to be processed. Race conditions also arise when the server returns the result of a system function to the calling process and the server's response involves sending a corresponding signaling message across the network. Each message must be marked so that the remote system can recognize it and, if necessary, terminate the server processes. With satellites, the process that handles a client's request is identified automatically, and upon arrival of a signal it is easy to check whether the request has been processed or not.

Finally, if a system function called by the client causes the server to pause indefinitely (for example, while reading data from a remote terminal), the server cannot handle other requests, and the server pool is depleted. If several processes access remote devices at once and the number of servers is bounded from above, a quite tangible bottleneck results. This does not happen with satellites, since a satellite is allocated to each client process. Another problem with using servers for remote devices is discussed in Exercise 13.14.

Despite the advantages that satellite processes provide, in practice the need for free entries in the process table becomes so acute that in most cases server processes are still used to handle remote requests.


Figure 13.12. Conceptual diagram of interaction with remote files at the kernel level

13.5 CONCLUSIONS

In this chapter we have considered three schemes for working with files located on remote machines, which treat remote file systems as an extension of the local one. The architectural differences between them are shown in Figure 13.12. All of them, in turn, differ from the multiprocessor systems described in the previous chapter in that the processors here do not share physical memory. A system with peripheral processors consists of a tightly coupled set of processors sharing the file resources of the central processor. A Newcastle-type connection provides hidden ("transparent") access to remote files, not by means of the operating system kernel but through a special C library. For this reason all programs intending to use this kind of connection must be recompiled, which is, in general, a serious drawback of the scheme. The remoteness of a file is indicated by a special sequence of characters naming the machine on which the file is located, and this is another factor limiting the portability of programs.

In transparent distributed systems, a modification of the mount system function is used to access remote files. Inodes on the local system are marked as belonging to remote files, and the local kernel sends the remote system a message describing the requested system function, its parameters, and the remote inode. Communication in a "transparent" distributed system is supported in two forms: as a remote procedure call (a message containing a list of operations on the inode is sent to the remote machine) and as a remote system-function call (the message describes the requested function). The final part of the chapter discussed the handling of remote requests by satellite processes and by servers.

13.6 EXERCISES

*1. Describe the implementation of the exit system function in a system with peripheral processors. How does this case differ from the one in which the process exits upon receiving an uncaught signal? How should the kernel dump the contents of memory?

2. Processes cannot ignore SIGKILL signals; explain what happens in the peripheral system when a process receives such a signal.

*3. Describe the implementation of the exec system function on a system with peripheral processors.

*4. How should the central processor distribute processes among the peripheral processors in order to balance the overall load?

*5. What happens if the peripheral processor does not have enough memory to accommodate all the processes offloaded to it? How should the unloading and swapping of processes in the network be done?

6. Consider a system in which requests to a remote file server are sent if the file name contains a special prefix. Let a process call execl("/../sftig/bin/sh", "sh", 0); The executable file is on a remote machine but must run on the local system. Explain how the remote module is migrated to the local system.

7. If the administrator needs to add new machines to an existing system with a connection like Newcastle, then what is the best way to inform the C library modules about this?

*8. During the execution of the exec function, the kernel overwrites the address space of the process, including the library tables used by the Newcastle connection to track references to remote files. After the function executes, the process must retain the ability to access these files by their old descriptors. Describe the implementation of this point.

*9. As shown in Section 13.2, calling the exit system function on systems with a Newcastle connection results in a message being sent to the satellite process, forcing the latter to terminate as well. This is done at the level of the library routines. What happens when a local process receives a signal telling it to exit while in kernel mode?

*10. In a system with a Newcastle connection, where remote files are identified by prefixing the name with a special prefix, how can a user, specifying ".." (parent directory) as a file name component, traverse the remote mount point?

11. We know from Chapter 7 that various signals cause a process to dump the contents of memory into the current directory. What should happen if the current directory belongs to a remote file system? What answer would you give if the system uses a Newcastle-type connection?

*12. What implications would it have for local processes if all satellite or server processes were removed from the system?

*13. Consider how to implement, in a transparent distributed system, the link algorithm, whose parameters may be two remote file names, and the exec algorithm, which involves performing several internal read operations. Consider both forms of communication: remote procedure call and remote system-function call.

*14. When accessing a device, a server process can enter a suspended state from which it will be woken by the device driver. Naturally, if the number of servers is limited, the system may no longer be able to satisfy requests from the local machine. Devise a reliable scheme in which not all server processes can be suspended waiting for device-related I/O to complete, so that a system function does not remain unfinished merely because all servers are busy.


Figure 13.13. Terminal Server Configuration

*15. When a user logs into the system, the terminal line discipline records the fact that the terminal is a control terminal leading a process group. For this reason, when the user presses the "break" key on the terminal keyboard, all processes in the group receive the interrupt signal. Consider a system configuration in which all terminals are physically connected to one machine, but user sessions are logically implemented on other machines (Figure 13.13). In each case, the system creates a getty process for the remote terminal. If requests to the remote system are handled by a pool of server processes, note that while executing the open procedure the server waits for a connection; when the open function completes, the server returns to the pool, severing its connection to the terminal. How, then, is the interrupt signal triggered by pressing the "break" key delivered to the processes belonging to the same group?

*16. Memory sharing is a feature inherent in local machines. Logically, however, a common area of physical memory (local or remote) could be allocated to processes belonging to different machines. Describe the implementation of this point.

*17. The process swapping and paging algorithms discussed in Chapter 9 assume a local swap device. What changes should be made to these algorithms to support remote swap devices?

*18. Suppose the remote machine (or the network) suffers a fatal failure and the local network-layer protocol records this fact. Develop a recovery scheme for a local system that makes requests to a remote server. Also develop a recovery scheme for a server system that has lost contact with its clients.

*19. When a process accesses a remote file, the path lookup may traverse several machines. Take the name "/usr/src/uts/3b2/os" as an example, where "/usr" is a directory on machine A, "/usr/src" is a mount point for the root of machine B, and "/usr/src/uts/3b2" is a mount point for the root of machine C. Passing through several machines on the way to the final destination is called a multihop. However, if there is a direct network connection between machines A and C, sending the data through machine B would be inefficient. Describe the features of multihop handling in a system with a Newcastle connection and in a "transparent" distributed system.


Architecture of a distributed control system based on the L-Net reconfigurable multi-pipeline computing environment

One of the urgent tasks in the field of control systems is the development of software for distributed fault-tolerant control systems. The solutions that exist in this area today are proprietary and, as a result, expensive and not always effective.

These solutions do not provide for efficient use of redundant resources, both hardware and software, which negatively affects both the fault tolerance and the scalability of such solutions. If the network architecture is violated, there is no possibility of dynamically reconfiguring either the information processing processes or the transmission of data streams (control streams as well as information streams). The use of specific microcontrollers and of DCS/SCADA complicates the development and support of such systems and the extension of their functionality.

Distributed control system architecture

The generalized typical architecture of a distributed control system (DCS) includes three hierarchically related levels: the operator level, the control level and the I/O level (see Fig. 1).

The main task of the operator level is to provide a human-machine interface (HMI) for setting up and monitoring the functioning of the entire system. The control level is responsible for receiving and processing data from sensors, transmitting data to the operator level and generating control actions on the actuators. The I/O level represents the sensors and actuators directly connected to the controlled object.

The task of the software, within the generalized DCS architecture, is to ensure the functioning of the operator level and its connection with the control level of the system. Consequently, the fundamental level for software design, and for resolving questions of its interaction with the hardware, is the operator level. The software should make the most efficient use of the available hardware resources of the system while remaining as independent as possible of the internal architecture of the hardware.

The hardware provides the computing resources, memory, and communication media between the nodes of the system. When designing the general architecture of the system, the specific I/O-level nodes that will be connected in a particular implementation are not considered; only the operator level and the control level appear in the general architecture. The hardware must be widely available, comply with modern standards, and possess all the properties and capabilities needed to implement the architecture.

DCS requirements

The requirements for a DCS apply not only to the system as a whole but also to its hardware and software components separately, since the specific approaches to meeting these requirements can differ fundamentally between the two. A DCS must be, first of all, fault-tolerant. The simplest method of increasing fault tolerance is redundancy (duplication) of functional units or of their aggregates. The second important property is scalability, which rests on the implementation of special algorithms in the software and on the hardware's ability to replace nodes or their components and to add new ones. At the same time, the system should remain simple to operate, to develop new nodes or modules for, and to modify architecturally.

DCS Architectures Overview

For the review of DCS architectures we chose Siemens SIMATIC PCS 7, as one of the systems most in demand on the market, and RTS S3, as a DCS implemented on the basis of the QNX RTOS.

Siemens SIMATIC PCS 7

The system architecture has all the properties of the generic DCS architecture. The operator stations are computers based on the x86 processor architecture running the Windows operating system and the Siemens WinCC package, which provides the HMI. There are servers with databases. Operator stations, engineering stations and servers are linked by an Ethernet-based local area network. The operator level is linked to the control level by a redundant Industrial Ethernet network. At the control level there are programmable logic controllers (PLCs) with the possibility of redundancy through duplication of functionality. It is possible to connect to external systems and networks and to organize remote access to the system.

RTS S3

This architecture is similarly composed of the levels of the generalized DCS structure. The operator stations are based on the same hardware platform as in the SIMATIC DCS, but can run under either Windows or Linux. The engineering stations are combined with the operator stations, and the system provides a unified application development environment. Ethernet connects the nodes within the operator level, and connects the operator level to the control level, using the TCP/IP protocol stack. At the control level there are industrial computers running the QNX OS, each with its own database and the possibility of redundancy through duplication of a node's functionality.

Disadvantages of the described systems

The systems described above use different hardware/software platforms for the operator level and for the control level. Within the operator level, only one processor architecture can be used, and a special engineering station is required to configure and develop the control level. As a way of increasing fault tolerance, these DCSs offer only hardware redundancy with duplication of the functionality of the redundant node, which is an irrational use of the redundant hardware.

Characteristics and functional features of the L-Net system

When developing the L-Net system, the task was to create a control system that would have the following characteristics:

  • Dynamic reconfiguration with full recovery and minimal losses in the event of a node failure or a disruption of the network topology.
  • Efficient distribution of tasks among the available healthy network nodes.
  • Duplication of communication channels between nodes, with dynamic reconfiguration of the data transmission streams.
  • Ease of use and scalability of the system.
  • Portability and operability of the system on any hardware platform intended for building control systems and embedded systems.

To build a system with the above characteristics, an operating system is required that is primarily intended for creating control systems and embedded systems. Analysis of existing operating systems showed that the most suitable is QNX 6 (Neutrino), which has very efficient resource allocation and networking capabilities. Wide networking capabilities are provided by the Qnet network protocol, which solves the problems of reliability and dynamic load balancing of communication channels but does not solve the problem of fault tolerance of the system as a whole. As a result, an innovative control system was developed based on a distributed reconfigurable multi-pipeline computing environment. The developed system has a peer-to-peer architecture comprising three logical blocks: the input-output block, the general-purpose switch block, and the reconfigurable computing environment (RCE) block (see Fig. 2).

The main advantages of this architecture are:

  • Peer-to-peer type
  • Decentralization
  • Scalability
  • Spatial distribution

Functional features of this architecture:

  • Pipelined data processing
  • Hardware redundancy
  • Load distribution
  • On-the-fly reconfiguration

At the first level of the architecture is the input-output (I/O) block, which comprises the I/O nodes, the I/O node switch, the I/O interface, and the sensors and actuators. The block is responsible for the basic mechanisms of generating control actions based on data from local sensors and data received from the other levels of the control system. The assigned tasks are distributed among the healthy I/O nodes according to their current relative performance, or manually by the operator. Sensors and actuators are connected via a bus to all I/O nodes in the block, which allows any node to poll any sensor or act on any actuator. The I/O node switch provides communication among all the I/O nodes, so that they can exchange data with one another and with the other levels of the system architecture to obtain control and information data. Given suitable hardware, nodes can communicate directly with each other and with the nodes and switches at the other levels of the system, which reduces response time in the network. Direct links between nodes, together with a given loading of the nodes in the current operating mode of the I/O block, make it possible to organize within the block the pipelined computations it needs without resorting to external computing power, and to make effective use, at the moment of a failure, of the free resources provided for redundancy of the I/O block's nodes.

The general-purpose switch block, located at the second level of the architecture, organizes the communication lines between the I/O blocks, the RCE block, and external systems. Each switch can interconnect various blocks and switches throughout the control system. The number of communication lines is determined by the hardware capabilities of the nodes and switches in the blocks. Since the Qnet network allows data flows to be distributed dynamically, this block scales by simply connecting new devices, with no configuration required; if one of the switches fails, data transfer between nodes is not interrupted, as long as another switch provides a similar connection between the nodes or they are directly connected. In that case, one must take care that the network bandwidth is sufficient to back up a failed switch.

The reconfigurable computing environment (RCE) block, located at the third level of the architecture, provides the control system with high computing power for solving complex problems of information processing, decision-making, recognition, and so on. The block is responsible for initializing the entire control system: checking the operability of switches and nodes, checking network integrity, building the network graphs of the whole system, and setting the starting parameters for the operation of the I/O blocks. The nodes of this block archive both their own data and the data from the I/O blocks. Each node of this block can play the role of an operator machine, intended for monitoring the operation of the system, making adjustments to the programs of this node or of all the nodes of the system, and performing reconfiguration on demand.

Load distribution

One of the main tasks of the L-Net system is to distribute the computational load across the network nodes. This problem is solved by constructing computational pipelines. To build a computational pipeline, a task graph is first constructed: a scheme of the exchange of data streams from source to receiver. The sensors act as sources and the actuators as receivers. The computational pipeline itself is a mapping of the task graph (see Fig. 3) onto the graph of the computer network (see Fig. 4), taking into account the task's requirements on the computational resources of the system and the system's current state.

The solution is a service that provides the receiver with comprehensive information about the current hardware, its state, and the available data sources, and that performs the work with the network and task graphs. As a result, performance increases thanks to the pipelining of computations, and rational use is made of all the computational resources available to the system.
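To make the idea of mapping a task graph onto the node graph concrete, here is a minimal sketch that assigns each pipeline stage to the healthy node with the most spare relative performance. The real L-Net logic (Qnet discovery, graph construction) is far richer; the structures and numbers here are invented for illustration.

    #include <stdio.h>

    #define NODES  4
    #define STAGES 5

    struct node {
        int    healthy;     /* node currently operational? */
        double capacity;    /* relative performance */
        double load;        /* load already assigned */
    };

    /* Greedy choice: the healthy node with the largest spare capacity. */
    int assign_stage(struct node *nodes, int n, double cost)
    {
        int best = -1;
        double best_spare = 0.0;
        for (int i = 0; i < n; i++) {
            double spare = nodes[i].capacity - nodes[i].load;
            if (nodes[i].healthy && spare >= cost && spare > best_spare) {
                best = i;
                best_spare = spare;
            }
        }
        if (best >= 0)
            nodes[best].load += cost;
        return best;                 /* -1: no node can take the stage */
    }

    int main(void)
    {
        struct node nodes[NODES] = {
            {1, 4.0, 0}, {1, 2.0, 0}, {0, 8.0, 0} /* failed */, {1, 3.0, 0},
        };
        double stage_cost[STAGES] = {1.0, 2.0, 1.5, 0.5, 1.0};

        for (int s = 0; s < STAGES; s++)
            printf("stage %d -> node %d\n", s,
                   assign_stage(nodes, NODES, stage_cost[s]));
        return 0;
    }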

Fault tolerance

The main problem in the functioning of such a system is the complete disruption of a computational pipeline upon the failure of any node of the pipeline or of the data transfer between nodes. The basic means of the Qnet protocol restore the connections between nodes after a partial disruption, using the backup lines provided by the architecture. The L-Net system restores operability after a complete failure of a computing-system node by dynamically reconfiguring the computational pipeline, i.e. by using healthy resources to replace the failed block. The system provides three recovery (reconfiguration) scenarios, which differ in the response time to the failure, the recovery time, and the hardware resources used: on failure, with passive standby, and with active standby (see the sketch after the list below).

  • Reconfiguration on failure: after a failure is detected, available hardware is located and included in the task graph.
  • Reconfiguration with passive standby: the redundant hardware is determined in advance, and a process implementing a vertex of the task graph is started on the standby node and its connections are established, but the process does not handle data unless the primary node fails.
  • Reconfiguration with active standby: a vertex of the task graph is implemented on several nodes, which process the data in parallel and transmit the result.
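A sketch of how the three scenarios might look in code; the trade-off is failure-response time versus hardware spent in advance. The functions called are placeholders, not L-Net APIs.

    enum reconf_mode {
        RECONF_ON_FAILURE,       /* find spare hardware only after failure */
        RECONF_PASSIVE_STANDBY,  /* spare process started, idle until then */
        RECONF_ACTIVE_STANDBY    /* replica processes data in parallel */
    };

    extern int  find_spare_node(void);
    extern void start_vertex(int node, int vertex, int process_data);
    extern void activate_vertex(int node, int vertex);
    extern void promote_replica(int vertex);

    void on_node_failure(enum reconf_mode mode, int vertex, int standby_node)
    {
        switch (mode) {
        case RECONF_ON_FAILURE:
            /* slowest recovery, no resources reserved beforehand */
            start_vertex(find_spare_node(), vertex, 1);
            break;
        case RECONF_PASSIVE_STANDBY:
            /* process and links already exist; just start processing */
            activate_vertex(standby_node, vertex);
            break;
        case RECONF_ACTIVE_STANDBY:
            /* replica has been computing all along; keep its output */
            promote_replica(vertex);
            break;
        }
    }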

As a result, the system provides flexible preparedness for failures at both the software and the hardware level, and the ability to change the configuration of nodes without interrupting operation or losing performance, regardless of how the network, the computational pipeline, and the node are implemented.

Conclusion

The developed L-Net system, unlike existing analogs, presupposes the use of DCS nodes with a wide range of hardware characteristics while keeping them fully software-compatible. Since the nodes run under a single operating system (QNX Neutrino), they can be built on various processor architectures (x86, ARM, MIPS, etc.) with a wide variety of interfaces and peripheral devices. The nodes can be implemented as desktop PCs, industrial PCs, wearable PCs, or single-board computers. All components of the software package of the developed DCS can be launched on any of its nodes running QNX, while nodes with a different operating system can still be used. This approach allows each node to be used for solving both operator-level and control-level tasks. Consequently, there is a flexible system of interaction between peer nodes without the rigid hierarchy of levels inherent in the generalized DCS architecture and in the systems that take that architecture as a base. A peer-to-peer network simplifies the deployment, operation, scaling, and debugging of the system.

To realize the computational potential of the redundant hardware, the developed system proposes algorithms for dynamic configuration and reconfiguration based on the Qnet network protocol and the L-Net network software. The dynamic configuration algorithm is based on distributing the computational load across all nodes by pipelining and parallelizing tasks and on dynamically balancing the load on the data transmission channels between nodes. The reconfiguration algorithm assumes three scenarios for restoring operability after a failure, depending on the available hardware, the priorities, and the tasks assigned to the system: on failure, with passive standby (resource allocation), and with active standby (resource use). The dynamic configuration and reconfiguration algorithms improve performance and reliability by using the hardware reserves present in the system.

An important advantage of the system is the maximum transparency of both the hardware and the software technologies used in it, which makes it possible to substantially simplify technical support of the system and the development of new modules for it.

Conclusions

The developed architectural solutions make it possible to improve such indicators of distributed control systems as reliability, performance, cost, scalability, and simplicity, thanks to the possibility of using a wide range of hardware, the implementation of dynamic configuration algorithms, and the rational use of system resources.

  1. http://kazanets.narod.ru/DCSIntro.htm.
  2. http://kazanets.narod.ru/PCS7Overview.htm.
  3. http://www.rts.ua/rus/news/678/0/409.
  4. Zyl S. QNX Momentics: Application Basics. St. Petersburg: BHV-Petersburg, 2005.
  5. Krten R. Introduction to QNX Neutrino: A Guide to Developing Real-Time Applications. St. Petersburg: BHV-Petersburg, 2011.

Keywords: distributed control system, information support for control systems, distributed reconfigurable systems.

Architecture of a distributed control system based on reconfigurable multi-pipeline computing environment L-Net

Sergey Yu. Potomskiy, Assistant Professor at the National Research University Higher School of Economics.

Nikita A. Poloyko, fifth-year student at the National Research University Higher School of Economics; teaching assistant and programmer. Field of study: "Control and Informatics in Technical Systems".

Abstract. The article is devoted to a distributed control system based on a reconfigurable multi-pipeline computing environment. The architecture of the system is given, along with its basic characteristics and functional properties. The article presents the rationale for the choice of operating system and shows the basic advantages of the system in comparison with existing similar developments.

Keywords: distributed control system, software support for control systems, distributed reconfigurable systems.



(Site material http://se.math.spbu.ru)

Introduction.

Nowadays, virtually all large software systems are distributed. A distributed system is a system in which information processing is concentrated not on one computer but distributed among several. In the design of distributed systems, which has much in common with software design in general, some specifics must still be taken into account.

There are six main characteristics of distributed systems.

  1. Resource sharing. Distributed systems allow the sharing of both hardware resources (hard drives, printers) and software resources (files, compilers).
  2. Openness. This is the ability to expand the system by adding new resources.
  3. Concurrency. In distributed systems, multiple processes can run simultaneously on different computers on the network, and these processes can interact while running.
  4. Scalability. Scalability is understood as the possibility of adding new properties and methods to the system.
  5. Fault tolerance. The presence of multiple computers allows duplication of information and tolerance of some hardware and software faults. Distributed systems can retain partial functionality after an error; a complete failure of the system occurs only in the event of network errors.
  6. Transparency. Users are given full access to the resources of the system, while information about the distribution of resources throughout the system is hidden from them.

Distributed systems also have a number of disadvantages.

  1. Complexity. It is much harder to understand and evaluate the properties of distributed systems in general, and they are harder to design, test, and maintain. System performance depends on the speed of the network rather than of individual processors, and reallocating resources can significantly change the speed of the system.
  2. Security. Typically the system can be accessed from several different machines, and messages on the network can be monitored and intercepted. It is therefore much harder to maintain security in a distributed system.
  3. Manageability. The system can consist of different types of computers with different versions of operating systems installed. Errors on one machine can propagate to other machines in unpredictable ways.
  4. Unpredictability. The reaction of a distributed system to certain events is unpredictable, depending on the total load on the system, its organization, and the network load. Since these parameters can change constantly, the response time to a request can differ significantly from the expected time.

These shortcomings show that the design of distributed systems raises a number of problems that developers must take into account.

  1. Resource identification. Resources in distributed systems are located on different computers, so the resource naming scheme should be designed so that users can easily access and refer to the resources they need. An example is the URL (Uniform Resource Locator) scheme, which defines the names of Web pages.
  2. Communication. For most distributed systems, the universal availability of the Internet and the efficient implementation of the TCP/IP protocols offer the most effective way of organizing communication between computers. However, where special performance or reliability is required, specialized tools can be used.
  3. Quality of system service. This parameter reflects performance, availability, and reliability. The quality of service is affected by a number of factors: the distribution of processes, resources, and hardware, and the adaptability of the system.
  4. Software architecture. The software architecture describes the distribution of system functions among the components of the system, and the distribution of those components among processors. If a high quality of system service must be maintained, choosing the right architecture is critical.

The challenge for distributed systems designers is to design the software and hardware so as to provide all the required characteristics of a distributed system. This requires knowing the advantages and disadvantages of the various distributed system architectures. There are three types of distributed system architecture.

  1. Client/server architecture. In this model, the system can be thought of as a set of services provided by servers to clients. In such systems, servers and clients differ significantly from each other.
  2. Three-tier architecture. In this model, the server provides services to clients not directly but through a business logic server.

The first two models have been discussed more than once; let us dwell on the third in more detail.

  3. Distributed object architecture. In this case there is no distinction between servers and clients, and the system can be thought of as a set of interacting objects whose location does not really matter. There is no distinction between the provider of a service and its users.

This architecture is widely used today and is also called the Web services architecture. A Web service is an application accessible over the Internet that provides some services whose form is independent of the provider (since a universal data format, XML, is used) and of the platform it runs on. Currently there are three different technologies supporting the concept of distributed object systems: EJB, CORBA, and DCOM.

First, a few words about what XML is. XML is a universal data format used to provide Web services. Web services are based on open standards and protocols: SOAP, UDDI, and WSDL.

  1. SOAP (Simple Object Access Protocol), developed by the W3C, defines the format of requests to Web services. Messages between a Web service and its user are packaged in so-called SOAP envelopes (sometimes also called XML envelopes). The message itself can contain either a request to perform some action or a response, the result of that action (a schematic envelope is sketched after this list).
  2. WSDL (Web Services Description Language). The interface of a Web service is described in WSDL documents (WSDL documents are themselves XML). Before deploying a service, the developer composes its description in WSDL, specifying the address of the Web service, the supported protocols, the list of allowed operations, and the formats of requests and responses.
  3. UDDI (Universal Description, Discovery and Integration) is a protocol for discovering Web services on the Internet (http://www.uddi.org/). It is a business registry in which Web service providers register their services and developers find the services they need to include in their applications.
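To make the "SOAP envelope" idea concrete, here is a minimal, schematic request that a client might send to a hypothetical getQuote Web service. The namespace URI is the standard SOAP 1.1 envelope namespace; the service, operation, and payload are invented.

    #include <stdio.h>

    int main(void)
    {
        const char *envelope =
            "<?xml version=\"1.0\"?>\n"
            "<soap:Envelope"
            " xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">\n"
            "  <soap:Body>\n"
            "    <getQuote xmlns=\"http://example.com/stock\">\n"
            "      <symbol>QNX</symbol>\n"      /* the request payload */
            "    </getQuote>\n"
            "  </soap:Body>\n"
            "</soap:Envelope>\n";
        fputs(envelope, stdout);    /* would be POSTed to the service URL */
        return 0;
    }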

From the above it may seem that Web services are the best and uncontested solution, the only question being the choice of development tools. However, this is not so. There is an alternative to Web services, the Semantic Web, which WWW creator Tim Berners-Lee first spoke about five years ago.

If the goal of Web services is to facilitate communication between applications, the Semantic Web is designed to solve a much more complex problem: using metadata mechanisms to increase the usefulness of the information that can be found on the Web. This can be done by abandoning the document-oriented approach in favor of an object-oriented one.
