

UDC 004.38.032.26

O. V. Konyukhova, K. S. Lapochkina

O. V. KONYUKHOVA, K. S. LAPOCHKINA

The use of neural networks in the economy and the relevance of their use in the preparation of a short-term budget forecast

Application of Neural Networks in the Economy and the Relevance of Their Use in Drawing Up a Short-Term Budget Forecast

This article describes the use of neural networks in the economy. The process of forecasting the budget of the Russian Federation and the relevance of using neural networks for drawing up a short-term budget forecast are considered.

Keywords: economics, budget of the Russian Federation, budget prediction, neural networks, genetic algorithms.

This article describes the application of neural networks in the economy. The process of forecasting the budget of the Russian Federation and the relevance of using neural networks for drawing up a short-term budget forecast are considered.

Keywords: economy, budget of the Russian Federation, budget forecasting, neural networks, genetic algorithms.

4) automatic grouping of objects.

One of the interesting attempts to create a mechanism for rational management of the economy belongs to the English cybernetician Stafford Beer. He proposed principles of management, now well known, based on neurophysiological mechanisms. Models of production systems were considered as very complex relationships between inputs (resource flows), internal, unobservable elements, and outputs (results). The inputs of the models were fairly generalized indicators, the main of which promptly reflected the volume of output of specific products, the need for resources, and productivity. Decisions intended to make such systems function effectively were taken after all options possible in the given situation had been found and discussed, and the best solution was chosen by a majority of the votes of the managers and experts participating in the discussion. For this purpose the system included a situation room equipped with appropriate technical means. The approach to building a management system proposed by S. Beer proved effective for managing not only large manufacturing associations, such as a steel corporation, but also the economy of Chile in the 1970s.

Similar principles were used in the group method of data handling (GMDH), developed by the Ukrainian cybernetician A. G. Ivakhnenko, for modeling the economy of Great Britain. Together with economists (Parks and others), who proposed more than two hundred independent variables affecting gross income, several (five or six) main factors were identified that determine the value of the output variable with a high degree of accuracy. On the basis of these models, various options for the development of the economy were worked out with the aim of increasing economic growth under different savings rates, inflation levels, and unemployment levels.

The group method of data handling is based on the principle of self-organization of models of complex systems, in particular economic ones, and makes it possible to determine complex hidden dependencies in data that are not detected by standard statistical methods. This method was successfully used by A. G. Ivakhnenko to assess the state of the economy and forecast its development in countries such as the United States, the United Kingdom, Bulgaria, and Germany. A large number of independent variables (from fifty to two hundred) were used, describing the state of the economy and affecting gross income in the countries studied. Based on the analysis of these variables by the group method of data handling, the main, significant factors were identified, determining the value of the output variable (gross income) with a high degree of accuracy.

Studies in this direction had a stimulating effect on the development of neural network methods, which have been used intensively in recent years due to their ability to extract experience and knowledge from a small training sample. After training on such samples, neural networks are able to solve complex, poorly formalizable tasks in the same way that experts do on the basis of their knowledge and intuition. These advantages become particularly significant in a transition economy, which is characterized by uneven development rates, varying inflation rates, short observation series, and incomplete and inconsistent knowledge about economic phenomena.

Well-known work has successfully applied the principles of self-organization of models of complex economic systems to building neural networks for analyzing and modeling the development of the economies of Mordovia and the Penza region.

A characteristic example of the successful application of neural computing in the financial sphere is credit risk management. As is known, before issuing a loan, banks perform complex statistical calculations of the financial reliability of the borrower in order to assess the probability of their own losses from late repayment of funds. Such calculations are usually based on an assessment of the credit history, the dynamics of the company's development, the stability of its main financial indicators, and many other factors. One well-known US bank tried the method of neural computing and concluded that the same task is solved faster and more accurately by calculations of this kind. For example, in one case involving the assessment of 100 thousand bank accounts, a new system built on the basis of neural computing identified over 90% of potential non-payers.

Another very important area of application of neural computing in the financial sphere is prediction of the situation on the stock market. The standard approach to this task is based on a rigidly fixed set of "rules of the game", which over time lose their effectiveness because of changes in the conditions of trading on the exchange. In addition, systems built on this approach turn out to be too slow for situations requiring instant decision-making. That is why the leading Japanese companies operating in the securities market decided to apply the method of neural computing. Into a typical system based on a neural network, information with a total volume of 33 years of business activity of several organizations was entered, including turnover, previous share prices, income levels, etc. Self-trained on real examples, the neural network system showed higher prediction accuracy and better speed: compared with the statistical approach, it gave an overall improvement in performance of 19%.

One of the most advanced techniques related to neural computing is genetic algorithms, which imitate the evolution of living organisms. They can therefore be used as an optimizer of neural network parameters. A similar system for predicting the results of contracts for long-term securities of increased reliability was developed and installed on a Sun workstation at Hill Samuel Investment Management. When modeling the strategies of several bidders, it achieved 57% accuracy in predicting the direction of market movement. The insurance company TSB General Insurance (Newport) uses a similar method to predict the level of risk when insuring private loans. This neural network is self-trained on statistical data on the state of unemployment in the country.

Although the financial market in Russia has not yet stabilized and, speaking mathematically, its model keeps changing — on the one hand because of the expected gradual development of the securities market and the growth of the stock market's share driven by the inflow of both domestic and foreign investment capital, and on the other because of the instability of the political course — one can nevertheless note the appearance of firms that need to use statistical methods other than the traditional ones, as well as the appearance on the software and hardware market of neuropackages for emulating neural networks on IBM-series computers and even of specialized neuro-boards based on custom-made neurochips.

In particular, one of the first powerful neurocomputers for financial applications is already successfully operating in Russia: the CNAPS PC/128, based on four neuro-chips from Adaptive Solutions. According to the Tora-Centre company, the organizations using neural networks to solve their tasks include the Central Bank, the Ministry of Emergency Situations, the Tax Inspectorate, more than 30 banks, and more than 60 financial companies. Some of these organizations have already published the results of their activities in the field of neurocomputer use.

It follows from the foregoing that the use of neural networks in the preparation of a short-term budget forecast is currently a relevant topic for research.

In conclusion, it should be noted that the use of neural networks in all areas of human activity, including financial applications, continues to grow — partly out of necessity and because of the broad opportunities for some, because of prestige for others, and because of interesting applications for still others.

BIBLIOGRAPHY

1. Federal Law of the Russian Federation of 01.01.2001 (as amended on 01.01.2001) "On State Forecasting and Programs of Socio-Economic Development of the Russian Federation" [Text].

2. Beer S. Brain of the Firm [Text] / S. Beer. - M.: Radio i Svyaz, 1993. - 524 p.

3. Galushkin. Neurocomputers in financial activity [Text]. - Novosibirsk: Nauka, 2002. - 215 p.

4. Müller. Predictive models [Text]. - Kiev: Tekhnika, 1985. - 225 p.

5. Forecasting methods in the budget process [Text] // Electronic journal "Corporate Finance", 2011. - No. 3 (19). - P. 71-78.

6. Rutkovskaya M., Pilinsky L. Neural networks, genetic algorithms and fuzzy systems: trans. from Polish [Text] / M. Rutkovskaya, L. Pilinsky. - Hot Line - Telecom, 20--.

7. Decision making on neural networks of optimal complexity [Text] // Automation and Modern Technologies, 1998. - No. 4. - P. 38-43.

Federal State Educational Institution of Higher Professional Education "State University - Teaching, Scientific and Production Complex", Orel

Candidate of Technical Sciences, Associate Professor, Associate Professor of the Department "Information Systems"

E-mail: ***** @ *** RU

Lapochkina Kristina Sergeevna

Federal State Educational Institution of Higher Professional Education "State University - Training and Scientific and Production Complex", Orel

Student group 11-PI (m)

Ministry of Agriculture of the Russian Federation

FGBOU VPO "Voronezh State Agrarian University named after Emperor Peter I"

Department of Information Support and Modeling of Agro-Economic Systems

Course project

on the topic: "Designing an automated information system for analyzing the efficiency of enterprises (on the example of enterprises of the Kalacheevsky district of the Voronezh region and the enterprise OOO SP 'Breeding Poultry Farm Ocherodnyskoye')"

Performed: Student BF-2-7 (BE)

Maksimova A.I.

Supervisor: Assistant

Mistukova S.V.

Voronezh

Introduction

1 Neural networks in the economy

1.1 Concepts and foundations of artificial neural networks

1.2 Properties and classification of neural networks

1.3 Types of neural network architectures

1.4 Using neural networks in economic tasks

2 Designing an automated information system for analyzing the efficiency of enterprises (on the example of enterprises of the Kalacheevsky district of the Voronezh region and the enterprise OOO SP "Breeding Poultry Farm Raidensky")

2.1 Explanatory note

2.2 Designing forms of documents

2.3 Information-logical model

2.4 Algorithm of information system functioning

2.5 Instructions for the user

Conclusions and suggestions

List of used literature

Applications


Introduction

Neural networks are a new and very promising computing technology that offers new approaches to the study of dynamic problems in the field of economics. Initially, neural networks opened up new possibilities in the field of pattern recognition; later, statistical and artificial-intelligence-based methods of decision support and problem solving in economics were added to this.

The ability to model nonlinear processes, to work with noisy data, and to adapt makes it possible to apply neural networks to a wide class of tasks. In the past few years, many software systems based on neural networks have been developed for applications in such areas as operations on the commodity market, assessment of the probability of bank bankruptcy, assessment of creditworthiness, investment control, and loan placement.

The purpose of this course project is to develop an automated information system for analyzing the efficiency of enterprises.

When creating an AIS for analyzing the efficiency of enterprises, the following tasks must be solved:



1. Consider the concept, properties, classification, types and economic use of neural networks.

2. Examine the composition and function of automated information systems; explore the theoretical foundations of the design of AIS;

3. Learn to work with the main types of application software used to implement an AIS;

4. Design forms of input, intermediate, and output documents;

5. Build an information-logical model;

6. Develop an algorithm for functioning;

7. Create instructions for the user.

In the course of the project, such scientific methods as modeling, description, analysis, synthesis, and the calculation method are used.

The technical means used to achieve the goal are a personal computer with the Windows XP operating system, a keyboard, and a mouse.

The AIS was developed in the MS Excel spreadsheet processor. The description of the work performed was prepared in the MS Word text processor.

Neural networks in the economy

Fig. 13.28. Overall data processing scheme

The daily practice of financial markets stands in interesting contradiction with the academic point of view, according to which changes in the prices of financial assets occur instantly, effectively reflecting all available information without any effort. The existence of hundreds of market makers, traders, and fund managers, whose job is to make a profit, suggests that market participants make a certain contribution to the overall pool of information. Moreover, since this work is expensive, the volume of information they contribute should be significant.

The existence of hundreds of market makers, traders, and fund managers in the financial markets suggests that they all process financial information and make decisions.

It is more difficult to answer the question of how exactly information that can bring profit arises and is used in financial markets. Studies almost always show that hardly any fixed trading strategy gives consistent profits, at least once the costs of making transactions are also taken into account. It is also well known that market participants (and the market as a whole) can make completely different decisions on the basis of similar or even identical information.

In their work, market participants are apparently not limited to linear decision-making rules: they have several scenarios of action, and which of them is triggered sometimes depends on external, not always obvious signs. One possible approach to the multidimensional and often nonlinear information series of the financial market is to imitate the behavior patterns of market participants, using such artificial intelligence methods as expert systems or neural networks.

A lot of effort has been spent on modeling decision-making processes with these methods. It turned out, however, that expert systems work well in difficult situations only when the system possesses internal stationarity (i.e., when for each input vector there is a unique response that does not change over time). Tasks such as complex classification or loan allocation fit this description to some extent, but it seems completely unsuitable for financial markets with their continuous structural changes. In the case of financial markets, it is hardly possible to claim that complete, or at least to some extent adequate, knowledge of the subject area can be achieved, whereas for expert systems with rule-based algorithms this is a normal requirement.

Neural networks offer completely new, promising opportunities for banks and other financial institutions, which by the nature of their activities have to solve problems under conditions of little a priori knowledge about the environment. The nature of financial markets has changed dramatically since, owing to the weakening of control, privatization, and the emergence of new financial instruments, national markets merged into global ones and the freedom of financial transactions increased in most sectors of the market. Obviously, the very foundations of risk and income management could not fail to undergo changes, since the possibilities of diversification and risk-hedging strategies have changed beyond recognition.

For a number of leading banks, one of the applications of neural networks has been the problem of predicting changes in the position of the US dollar on the foreign exchange market from a large number of objective indicators. Such applications are facilitated by the fact that huge bases of economic data exist, since complex models are always voracious with respect to information.

Bonds and arbitrage quotations are another area where the problems of widening and narrowing of spreads, differences in interest rates and liquidity, and the depth and liquidity of the market provide favorable material for powerful computational methods.

Another problem whose importance has recently been growing is the modeling of flows of funds between institutional investors. The fall in interest rates played a decisive role in increasing the attractiveness of open-end investment funds and index funds, and the existence of options and futures on their shares makes it possible to purchase them with a full or partial guarantee.

Obviously, the optimization task in conditions where the number of partial equilibrium constraints is infinite (for example, on the futures and cash markets of any product this role is played by cross differences in interest rates in any sector of the market) becomes a problem of extraordinary complexity, increasingly beyond the capabilities of any trader.

Under such circumstances, traders — and therefore any systems seeking to describe their behavior — will at each moment of time have to focus on reducing the dimensionality of the problem. A phenomenon such as a security in heightened demand is well known.

As far as the financial sphere is concerned, it is safe to say that the first results obtained with neural networks are very encouraging, and research in this area should be continued. As was already the case with expert systems, several years may be needed before financial institutions become sufficiently confident in the capabilities of neural networks and start using them at full power.

The nature of developments in the field of neural networks differs fundamentally from that of expert systems: the latter are built on rules of the form "if ..., then ...", which are developed as a result of lengthy sessions of training the system, and progress is achieved mainly through more successful use of formal logical structures. Neural networks are based on a predominantly behavioral approach to the task being solved: the network "learns from examples" and adjusts its parameters using so-called learning algorithms through a feedback mechanism.

Different types of artificial neurons

An artificial neuron (Fig. 13.1) is a simple element that first calculates the weighted sum V of its input values:

V = \sum_{i=1}^{n} w_i x_i   (13.1)

Here n is the dimension of the input signal.

The resulting sum is then compared with a threshold value (or bias), and the activation function F is applied to it. The coefficients w_i in the weighted sum (13.1) are usually called synaptic coefficients, or weights. The weighted sum V itself is called the neuron's potential. The output signal then has the form F(V).

The threshold value can be regarded as one more weight coefficient attached to a constant input signal. In this case one speaks of an extended input space: a neuron with an n-dimensional input has n + 1 weight coefficients, and its output can be written as

y = F\left( \sum_{i=1}^{n} w_i x_i - \theta \right)   (13.2)
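As an illustration of formulas (13.1)-(13.2), here is a minimal sketch of such a neuron in Python; the function name, the step activation, and the sample values are illustrative assumptions, not taken from the text.

```python
import numpy as np

def neuron_output(x, weights, bias):
    """Weighted sum (the neuron's potential V) followed by a threshold activation."""
    v = np.dot(weights, x)             # V = sum_i w_i * x_i  (13.1)
    return 1.0 if v >= bias else 0.0   # output F(V) compared with the threshold

# toy usage: a 3-input neuron
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.3, 0.8, 0.1])
print(neuron_output(x, w, bias=0.2))
```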

Depending on the method of signal conversion and the character of the activation function, various types of neural structures arise. We will consider only deterministic neurons (in contrast to probabilistic neurons, whose state at time t is a random function of the potential and of the state at time t-1). We will further distinguish static neurons, in which the signal is transmitted without delay, and dynamic ones, where the possibility of such delays is taken into account ("synapses with delay").

Various types of activation function

The function F can be of various types:

Formula "src \u003d" http://hi-edu.ru/e-books/xbook725/files/20.gif "border \u003d" 0 "align \u003d" absmiddle "alt \u003d" (! Lang:, Krutizna B can be taken into account through the magnitude of the scales and thresholds, and without limiting the generality, it can be assumed to be equal to one.

It is also possible to use neurons without saturation, whose output takes a continuous set of values. In classification tasks, the output value can be determined by a threshold (when a single binary decision is made) or be probabilistic (when class membership is determined). To take into account the specifics of a particular task, various other activation functions can be chosen: Gaussian, sinusoidal, wavelets, etc.
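A short sketch of some of the activation functions mentioned above (step threshold, sigmoid with steepness B, Gaussian); the parameter names and default values are illustrative.

```python
import numpy as np

def step(v, theta=0.0):
    return np.where(v >= theta, 1.0, 0.0)

def sigmoid(v, b=1.0):
    # the steepness b can be absorbed into the weights, so b = 1 loses no generality
    return 1.0 / (1.0 + np.exp(-b * v))

def gaussian(v, sigma=1.0):
    return np.exp(-(v ** 2) / (2 * sigma ** 2))

v = np.linspace(-3, 3, 7)
print(step(v), sigmoid(v), gaussian(v), sep="\n")
```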

Feed-forward neural networks

We will consider two types of neural networks: static, which are also often called feed-forward networks, and dynamic, or recurrent, networks. In this section we deal with static networks. Networks of the other kind will be briefly reviewed later.

Feed-forward neural networks consist of static neurons, so the signal at the network output appears at the same moment the input signals are applied. The structure (topology) of the network may vary. If not all of its neurons are output neurons, the network is said to contain hidden neurons. The most general type of architecture is obtained when all neurons are connected with each other (but without feedback). In specific tasks, neurons are usually grouped into layers. Fig. 13.2 shows a typical scheme of a feed-forward neural network with one hidden layer.

It is interesting to note that, according to theoretical results, feed-forward networks with sigmoid functions are a universal means of function approximation. More precisely, any real-valued function of several variables on a compact domain can be approximated arbitrarily well by a three-layer network. At the same time, however, we know neither the size of the network that will be required for this, nor the weight values. Moreover, from the proof of these results it can be seen that the number of hidden elements grows without bound as the accuracy of the approximation increases. Feed-forward networks can indeed serve as a universal means of approximation, but there is no rule that allows one to find the optimal network topology for a given task.
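A minimal sketch of the forward pass of such a feed-forward network with one hidden layer (as in Fig. 13.2); the layer sizes, random initialization, and sigmoid hidden units are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, w_hidden, w_out):
    """One hidden layer with sigmoid units, linear output layer."""
    h = 1.0 / (1.0 + np.exp(-w_hidden @ x))   # hidden activations
    return w_out @ h                           # network output

n_in, n_hidden, n_out = 4, 6, 1
w_hidden = rng.normal(scale=0.5, size=(n_hidden, n_in))
w_out = rng.normal(scale=0.5, size=(n_out, n_hidden))
print(forward(rng.normal(size=n_in), w_hidden, w_out))
```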

Thus, the task of constructing a neural network is nontrivial. The questions of how many hidden layers to take, how many elements to put in each of them, how many connections are needed, and which learning parameters to use are, as a rule, barely addressed in the existing literature.

At the training stage, the synaptic coefficients are computed in the process of the neural network solving tasks (classification, prediction of time series, etc.) in which the required response is determined not by rules but by examples grouped into a training set. Such a set consists of a number of examples, each with a specified value of the output parameter that it would be desirable to obtain. The actions taking place here can be called supervised learning: the "teacher" feeds an input vector of source data to the network and tells the output node the desired value of the result of the computation. Supervised training of a neural network can be viewed as the solution of an optimization problem. Its goal is to minimize the error function, or residual, E on the given set of examples by choosing the values of the weights W.

Error criteria

The purpose of the minimization procedure is to find the global minimum; reaching it is called convergence of the training process. Since the residual depends on the weights nonlinearly, the solution cannot be obtained in analytical form, and the search for the global minimum is carried out by an iterative process — a so-called training algorithm — which explores the surface of the residual and tries to locate the global minimum point on it. Usually the mean squared error (MSE) is taken as the measure of error; it is defined as the sum of the squared differences between the desired output value d_k and the actual network output y_k for each example k:

E = \frac{1}{2} \sum_k (d_k - y_k)^2

example "\u003e Criterion of the maximum of believing:

example "\u003e" Epochs "). Changing weights occurs in the direction opposite to the direction of the greatest steepness for the cost function:

- user-defined parameter called The value of the gradient step or the training coefficient.

Another possible method is called stochastic gradient.

In it, the weights are recalculated after the presentation of each individual example from the training set, using the corresponding partial cost function, say for the k-th example:

\Delta w = -\eta \, \frac{\partial E_k}{\partial w}

Backpropagation of error

Let us now consider the most common algorithm for training feed-forward neural networks — the error backpropagation algorithm (backpropagation, BP), which is a development of the so-called generalized delta rule. The algorithm was rediscovered and popularized in 1986 by Rumelhart and McClelland from the well-known group studying parallel distributed processing at the Massachusetts Institute of Technology. In this section we consider the mathematical essence of the algorithm in more detail. It is a gradient descent algorithm minimizing the total squared error:

formula "src \u003d" http://hi-edu.ru/e-books/xbook725/files/24.gif "border \u003d" 0 "align \u003d" absmiddle "alt \u003d" (! lang:. Calculation of private derivatives is carried out according to the rules of the chain: the weight of the input of the J-th neuron, which comes from the J-R neuron, is recalculated by the formula

formula "src \u003d" http://hi-du.ru/e-books/xbook725/files/23.gif "border \u003d" 0 "align \u003d" absmiddle "alt \u003d" (! lang:- The length of the step towards the gradient.

If the k-th sample is considered separately, the corresponding weight change is

\Delta_k w_{ij} = \eta \, \delta_{jk} \, x_{jk},

where the local error \delta_{jk} of a hidden neuron is calculated through the analogous factors of the subsequent layer, and the error is thus propagated in the backward direction.

For the output neurons we get:

\delta_{jk} = (d_{jk} - y_{jk}) \, F'(V_{jk}),

while for a hidden neuron the local error is determined as follows:

\delta_{jk} = F'(V_{jk}) \sum_m \delta_{mk} w_{mj}   (13.14)

where the sum is taken over the neurons m of the next layer. In the stochastic variant the weights are recalculated each time after the next sample is processed, while in the "epoch" (off-line) version the weights are changed after a pass through the entire training set.

Another frequently used technique consists in adding, when determining the search direction, a correction to the current gradient: the weight-change vector of the previous step, taken with some coefficient. One may say that the accumulated momentum of the motion is taken into account. The final formula for the weight change looks like this:

\Delta w(t) = -\eta \, \frac{\partial E}{\partial w} + \mu \, \Delta w(t-1),

where \mu is a number in the interval (0, 1) specified by the user.
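A minimal sketch of this momentum update rule on a toy one-dimensional cost; the values of eta and mu are illustrative.

```python
def momentum_step(w, grad, dw_prev, eta=0.05, mu=0.9):
    """One weight update with momentum: dw(t) = -eta*grad + mu*dw(t-1)."""
    dw = -eta * grad + mu * dw_prev
    return w + dw, dw

w, dw = 1.0, 0.0
for _ in range(100):
    grad = 2 * w          # gradient of the toy cost E(w) = w**2
    w, dw = momentum_step(w, grad, dw)
print(w)                  # approaches the minimum at w = 0
```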

Other learning algorithms

Finally, so-called genetic algorithms have recently been used with success; in them the set of weights is treated as an individual subject to mutation and crossover, and the error criterion is taken as a measure of its "fitness". As new generations are bred, the appearance of an optimal individual becomes more and more probable.
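A toy sketch of a genetic algorithm in this spirit: weight vectors are the individuals, crossover and mutation produce offspring, and the (negative) MSE of a linear neuron serves as fitness; the population size and mutation scale are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def fitness(weights, X, d):
    """Negative MSE of a single linear neuron: higher is better."""
    return -np.mean((d - X @ weights) ** 2)

def evolve(X, d, pop_size=30, n_gen=100, sigma=0.1):
    dim = X.shape[1]
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(n_gen):
        scores = np.array([fitness(w, X, d) for w in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]     # selection of the fittest half
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(dim) < 0.5                        # crossover
            child = np.where(mask, a, b) + rng.normal(scale=sigma, size=dim)  # mutation
            children.append(child)
        pop = np.array(children)
    return max(pop, key=lambda w: fitness(w, X, d))

X = rng.normal(size=(40, 3))
d = X @ np.array([1.0, -2.0, 0.5])
print(evolve(X, d))
```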

In financial applications the data are, as a rule, strongly contaminated by noise. For example, transactions may be registered in the database with a delay, and in different cases with different delays. Missing values or incomplete information are also sometimes treated as noise: in such cases the average or the most plausible value is substituted, and this, of course, introduces additional distortion into the database. An incorrect class label in recognition tasks adversely affects training: it worsens the system's ability to generalize when working with new objects (i.e., objects not included among the samples).

Cross-validation

In order to eliminate the arbitrariness of the database split, resampling methods can be applied. Let us consider one of these methods, called cross-validation. Its idea is to randomly partition the database into Q pairwise disjoint subsets. Training is then performed Q times, each time on (Q-1) of the subsets, and the error is calculated on the remaining subset. If Q is large enough, for example 10, each training run uses most of the source data. If the learning procedure is reliable, the results of the Q different models should be very close to each other. After that, the final characteristic is defined as the average of all the error values obtained. Unfortunately, the volume of computation in this method is often very large, since Q training runs are required, and in a real application of high dimension it may be infeasible. In the limiting case, when Q = P, where P is the total number of examples, the method is called leave-one-out cross-validation. This estimation method has a bias, and the "jackknife" method has been developed to reduce this shortcoming at the cost of even more computation.
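A sketch of Q-fold cross-validation, assuming a toy least-squares model as the learner; the fold count and the scoring function are illustrative.

```python
import numpy as np

def cross_validation_error(X, d, train_and_score, q=10, seed=0):
    """Average error over q folds: train on q-1 parts, test on the held-out part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, q)
    errors = []
    for i in range(q):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(q) if j != i])
        errors.append(train_and_score(X[train], d[train], X[test], d[test]))
    return float(np.mean(errors))

def ls_fit_score(Xtr, dtr, Xte, dte):
    """Toy model: least-squares fit, scored by MSE on the held-out fold."""
    w, *_ = np.linalg.lstsq(Xtr, dtr, rcond=None)
    return np.mean((dte - Xte @ w) ** 2)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
d = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(cross_validation_error(X, d, ls_fit_score))
```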

The next class of neural networks that we consider are dynamic, or recurrent, networks. They are built of dynamic neurons whose behavior is described by differential or difference equations, as a rule of first order. The network is organized so that each neuron receives input information from other neurons (possibly from itself) and from the environment. This type of network is important, since it can model nonlinear dynamic systems. This is a very general model that can potentially be used in a wide variety of applications, for example: associative memory, nonlinear signal processing, modeling of finite automata, system identification, and control tasks.

Neural networks with time delay

Before describing the actual dynamic networks, let us consider how a feed-forward network is used to process time series. The method is to break the time series into several segments and thus obtain a statistical sample to feed to a multilayer feed-forward network. This is done by means of a so-called tapped delay line (see Fig. 13.3).

The architecture of such a time-delay neural network allows one to model any finite temporal dependence of the form

y(t) = F\big(x(t), x(t-1), \dots, x(t-n)\big).

Hopfield networks

With the help of recurrent Hopfield networks it is possible to process unordered samples (handwritten letters), samples ordered in time (time series), or samples ordered in space (graphs, grammars) (Fig. 13.4). The recurrent neural network of the simplest form was introduced by Hopfield; it is built of N neurons, each connected to every other, and all the neurons are output neurons.

Networks of this design are used mainly as associative memory, as well as in problems of nonlinear filtering of data and grammatical inference. Recently they have also been used for prediction and for recognizing patterns in the behavior of stock prices.
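A minimal sketch of a Hopfield network used as associative memory, with Hebbian weights and synchronous updates; the stored patterns are illustrative.

```python
import numpy as np

def hopfield_train(patterns):
    """Hebbian weights for a Hopfield associative memory; patterns are +/-1 vectors."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / n

def hopfield_recall(W, x, steps=10):
    for _ in range(steps):                       # synchronous update for simplicity
        x = np.where(W @ x >= 0, 1, -1)
    return x

patterns = np.array([[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]])
W = hopfield_train(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])           # corrupted first pattern
print(hopfield_recall(W, noisy))
```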

The "self-organizing sign of signs" introduced by Kohonen can be considered as an option of a neural network. The network of this type is designed for selfeducation: During learning to inform her the right answers optionally. In the process of training on the entrance of the network, various samples are served. The network catches the features of their structure and separates samples on 436 clusters, and the network has already received each newly incoming example to one of the clusters, guided by some criterion of "intimacy."

The network consists of one input and one output layer. The number of elements in the output layer directly determines how many clusters the network can recognize. Each output element receives the entire input vector. As in every neural network, each connection is assigned some synaptic weight. In most cases each output element is also connected with its neighbors. These internal connections play an important role in the training process, since the adjustment of weights occurs only in the neighborhood of the element that responds best to the current input.

The output elements compete with each other for the right to fire and to "receive the lesson". The winner is the one whose weight vector is closest to the input vector in the sense of some distance, defined, for example, by the Euclidean metric; for the winning element this distance is smaller than for all the others. At the current training step, only the winning element (and perhaps its immediate neighbors) is allowed to change its weights; the weights of the remaining elements are frozen. The winning element modifies its weight vector, moving it slightly toward the input vector. After training on a sufficient number of examples, the set of weight vectors comes with great accuracy into correspondence with the structure of the input examples — the weight vectors literally model the distribution of the input samples.

Fig. 13.5. Self-organizing Kohonen network. Only the connections going to the i-th node are shown. The neighborhood of the node is shown by the dotted line.

Obviously, for the network to reflect the input distribution correctly, each network element must become the winner roughly the same number of times: the weight vectors must be equiprobable.

Before a Kohonen network starts working, two things must be done (a small sketch follows the list below):

the weight vectors must be randomly distributed on the unit sphere;

all weight and input vectors must be normalized to unit length.
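A minimal sketch of Kohonen training under these two conditions (weights initialized randomly on the unit sphere, inputs normalized to unit length); the one-dimensional neighborhood, learning rate, and number of epochs are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def train_som(data, n_nodes=10, epochs=20, lr=0.3, radius=2):
    data = normalize(data)                                    # inputs normalized to unit length
    W = normalize(rng.normal(size=(n_nodes, data.shape[1])))  # weights random on the unit sphere
    for _ in range(epochs):
        for x in data:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))  # competition for the input
            for j in range(n_nodes):                           # 1-D neighborhood on a line
                if abs(j - winner) <= radius:
                    W[j] += lr * (x - W[j])                    # pull the node towards the input
        W = normalize(W)
    return W

data = rng.normal(size=(60, 3))
print(train_som(data).shape)
```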

A counterpropagation network (CPN) combines the properties of Kohonen's self-organizing network and Grossberg's outstar concept. In this architecture the elements of the Kohonen layer have no direct output to the outside world but serve as inputs for the output layer, in which the connections are adaptively weighted in the Grossberg manner. This scheme arose from the works of Hecht-Nielsen. The CPN network is aimed at the gradual construction of the desired mapping of inputs to outputs based on examples of such a mapping. The network solves well those tasks where it is required to adaptively construct a mathematical mapping from its exact values at individual points.

Networks of this kind are successfully applied in such financial and economic applications as consideration of loan applications and prediction of price trends of stocks, commodities, and exchange rates. Generally speaking, one can expect successful application of CPN networks in tasks where knowledge has to be extracted from large amounts of data.

Practical application of neural networks for classification tasks (clustering)

Solving the classification task is one of the most important applications of neural networks. The classification task is the task of assigning a sample to one of several pairwise disjoint sets. Examples of such tasks are determining the creditworthiness of a bank's client; medical tasks in which it is necessary to determine, for example, the outcome of a disease; securities portfolio management tasks (sell, buy, or "hold" a stock depending on the market situation); and the task of distinguishing viable firms from those prone to bankruptcy.

Purpose of classification

When solving classification tasks, it is necessary to assign static samples (characteristics of the market situation, medical data, client information) to certain classes. There are several ways of representing the data. The most common is the one in which a sample is represented by a vector. The components of this vector are the various characteristics of the sample that affect the decision about which class the sample belongs to. For example, in medical problems the components of this vector may be data from a patient's medical record. Thus, on the basis of some information about the example, it is necessary to determine which class it can be assigned to. The classifier thus assigns an object to one of the classes in accordance with a certain partition of the N-dimensional space, which is called the input space; the dimension of this space is the number of vector components.

First of all, it is necessary to determine the level of complexity of the system. In real tasks the situation often arises where the number of samples is limited, which complicates the assessment of the complexity of the task. Three main levels of complexity can be distinguished. The first (the simplest) is when the classes can be separated by straight lines (or hyperplanes, if the input space has dimension greater than two) — so-called linear separability. In the second case the classes cannot be separated by lines (planes), but can be separated by a more complex boundary — nonlinear separability. In the third case the classes intersect, and one can speak only of probabilistic separability.

Fig. 13.6. Linear and nonlinearly separable classes

In the ideal case, after preprocessing we should obtain a linearly separable task, since then the construction of the classifier is greatly simplified. Unfortunately, in real problems we have a limited number of samples on whose basis the classifier is built, and we cannot perform a preprocessing of the data that would achieve linear separability of the samples.

Using neural networks as a classifier

Feed-forward networks are a universal tool for function approximation, which allows them to be used in solving classification tasks. As a rule, neural networks turn out to be the most effective means of classification, because they effectively generate a large number of regression models (of the type used in solving classification tasks by statistical methods).

Unfortunately, a number of problems arise when applying neural networks to practical tasks. First, it is not known in advance what complexity (size) of network may be required to implement the mapping accurately enough. The required complexity may turn out to be excessively high, demanding a complex network architecture. Thus, Minsky in his work "Perceptrons" proved that the simplest single-layer neural networks are able to solve only linearly separable tasks. This limitation is overcome by using multilayer neural networks. In general, it can be said that in a network with one hidden layer the vector corresponding to the input sample is transformed by the hidden layer into some new space, which may have a different dimension, and then the hyperplanes corresponding to the neurons of the output layer divide it into classes. Thus the network recognizes not only the characteristics of the source data, but also "characteristics of characteristics" formed by the hidden layer.

Preparation of source data

To build a classifier, it is necessary to determine which parameters affect the decision about which class a sample belongs to. Two problems may arise here. First, if the number of parameters is insufficient, a situation may occur in which the same set of source data corresponds to examples from different classes. Then it is impossible to train the neural network, and the system will not work correctly (it is impossible to find a minimum corresponding to such a set of source data). The source data must be consistent. To solve this problem, it is necessary to increase the dimension of the feature space (the number of components of the input vector corresponding to the sample). But as the dimension of the feature space grows, a situation may occur in which the number of examples becomes insufficient for training the network, and instead of generalizing it will simply memorize the examples from the training sample and be unable to function correctly. Thus, when determining the features, it is necessary to find a compromise regarding their number.

Next, it is necessary to determine the way of representing the input data for the neural network, i.e. to determine the method of normalization. Normalization is necessary because neural networks operate with data represented as numbers in the range 0..1, while the source data may have an arbitrary range or even be non-numeric. Various methods are possible here, ranging from a simple linear transformation into the required range to multidimensional analysis of the parameters and nonlinear normalization depending on the influence of the parameters on each other.
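A sketch of the simplest of these options — a linear (min-max) transformation of a feature into the range 0..1; the sample values are illustrative.

```python
import numpy as np

def minmax_normalize(column):
    """Simple linear transformation of a feature into the range 0..1."""
    lo, hi = column.min(), column.max()
    if hi == lo:                      # constant feature: map to zeros
        return np.zeros_like(column, dtype=float)
    return (column - lo) / (hi - lo)

raw = np.array([120.0, 300.0, 75.0, 410.0])   # e.g. some financial indicator
print(minmax_normalize(raw))
```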

Coding output values

The classification task in the presence of two classes can be solved by a network with one neuron in the output layer, which can take one of two values, 0 or 1, depending on which class the sample belongs to. If there are several classes, the problem arises of how to represent these data at the network output. The simplest way of representing the output data in this case is a vector whose components correspond to the different class numbers. The i-th component of the vector corresponds to the i-th class; all other components are set to 0. Then, for example, the second class will correspond to a 1 on the second network output and 0 on the rest. When interpreting the result, it is usually considered that the class number is determined by the number of the network output with the maximum value. For example, if a network with three outputs produces the output vector (0.2; 0.6; 0.4), we see that the second component of the vector has the maximum value, so the class to which this example belongs is 2. With this coding method, the concept of the network's confidence that an example belongs to a given class is sometimes introduced. The simplest way to determine the confidence is to take the difference between the maximum output value and the value of the output closest to the maximum. For the example considered above, the network's confidence that the example belongs to the second class is determined as the difference between the second and third components of the vector and equals 0.6 - 0.4 = 0.2. Accordingly, the higher the confidence, the greater the probability that the network gave the correct answer. This coding method is the simplest, but not always the optimal, way of representing the data.
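A sketch of this coding and of the confidence measure described above (class = index of the maximal output, confidence = difference between the two largest outputs); the example reproduces the (0.2; 0.6; 0.4) vector from the text.

```python
import numpy as np

def one_hot(class_index, n_classes):
    v = np.zeros(n_classes)
    v[class_index] = 1.0
    return v

def decode_with_confidence(outputs):
    """Class = index of the maximal output; confidence = max minus the runner-up."""
    order = np.argsort(outputs)
    best, second = order[-1], order[-2]
    return best, outputs[best] - outputs[second]

print(one_hot(1, 3))                                        # class 2 of 3 -> [0, 1, 0]
print(decode_with_confidence(np.array([0.2, 0.6, 0.4])))    # (1, 0.2) as in the text
```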

Other ways are also known. For example, the output vector can be the cluster number written in binary form. Then, if there are 8 classes, we need a vector of 3 elements, and, say, class 3 will correspond to the vector 011. However, if an incorrect value is obtained on one of the outputs, we may get an incorrect classification (an incorrect cluster number), so it makes sense to increase the distance between two clusters by coding the output with a Hamming code, which will increase the reliability of the classification.

Another approach is to break the task with K classes into K*(K-1)/2 subtasks with two classes each (2-by-2 coding). A subtask here means that the network determines the presence of one of two components of the vector. That is, the original vector is divided into groups of two components each in such a way that all possible combinations of the output vector components enter them. The number of such groups can be determined as the number of unordered samples of two out of the original components, as in the table below:

352 "Border \u003d" 0 "\u003e

Substitutional number (exit) Output components 1 1-2 2 1-3 3 1-4 4 2-3 5 2-4 6 3-4

Here a 1 at an output indicates the presence of one of the components. We can then pass from the network outputs to the class number as follows: we determine which combinations received an output value of one (more precisely, close to one), i.e. which subtasks were activated, and take the class number to be the one that entered the greatest number of activated subtasks (see the table above).
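A sketch of this 2-by-2 coding for K = 4 classes: enumerating the K*(K-1)/2 pairwise subtasks and decoding the class as the one entering the largest number of activated subtasks; the threshold 0.5 and the sample outputs are illustrative.

```python
from itertools import combinations
import numpy as np

def pairwise_subtasks(n_classes):
    """All unordered pairs of classes: K*(K-1)/2 two-class subtasks."""
    return list(combinations(range(1, n_classes + 1), 2))

def decode_pairwise(outputs, n_classes, threshold=0.5):
    """Count, for each class, how many activated subtasks it enters; pick the most frequent."""
    votes = np.zeros(n_classes + 1)
    for (a, b), out in zip(pairwise_subtasks(n_classes), outputs):
        if out > threshold:
            votes[a] += 1
            votes[b] += 1
    return int(np.argmax(votes))

print(pairwise_subtasks(4))                # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(decode_pairwise([0.9, 0.8, 0.7, 0.1, 0.2, 0.1], n_classes=4))   # -> class 1
```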


In many tasks this coding gives a better result than the classical coding method.

Probabilistic classification

In statistical pattern recognition, the optimal classifier assigns a sample x to the class with the largest posterior probability P(C_i | x). This rule is optimal in the sense that it minimizes the average number of incorrect classifications. If the decision functions are monotone transformations of the posterior probabilities, then the Bayesian relation between the prior and posterior probabilities remains in force, and such functions can therefore be used as simplified decision functions. It makes sense to do this if these functions are easy to construct and compute.

Although the rule looks very simple, it turns out to be difficult to apply in practice, since the posterior probabilities (or even the values of simplified decision functions) are unknown. Their values can, however, be estimated: by virtue of Bayes' theorem, the posterior probabilities can be expressed through the prior probabilities and the class-conditional probability density functions p(x | C_i):

P(C_i \mid x) = \frac{p(x \mid C_i) \, P(C_i)}{\sum_j p(x \mid C_j) \, P(C_j)}

Pattern classifiers

The probability density can be estimated in different ways. In parametric methods it is assumed that the probability density function (PDF) is a function of a certain form with unknown parameters. For example, one can try to approximate the PDF by a Gaussian function. In order to perform classification, it is necessary first to obtain estimates of the mean vector and the covariance matrix for each class of data and then to use them in the decision rule. The result is a polynomial decision rule containing only squares and pairwise products of the variables. The whole procedure described is called quadratic discriminant analysis (QDA). Under the assumption that the covariance matrices of all classes are equal, QDA reduces to linear discriminant analysis (LDA).

In methods of another type — non-parametric — no preliminary assumptions about the probability density are required. In the "k nearest neighbors" (NN) method, the distances between the newly arrived sample and the vectors of the training set are computed, after which the sample is assigned to the class to which the majority of its nearest neighbors belong. As a result, the boundaries separating the classes turn out to be piecewise linear. In various modifications of this method, different distances and special methods of finding neighbors are used. Sometimes, instead of the whole set of samples, the set of centroids corresponding to clusters is taken, as in the learning vector quantization (LVQ) method.
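A minimal sketch of the nearest-neighbors rule with the Euclidean distance; k and the toy two-class data are illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """Assign x to the class of the majority of its k nearest neighbors (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(6)
X_train = np.vstack([rng.normal(0, 1, size=(20, 2)), rng.normal(4, 1, size=(20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
print(knn_predict(np.array([3.5, 3.8]), X_train, y_train))   # -> 1
```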

In other methods, the classifier splits the data into groups according to a tree scheme. At each step a subgroup is split in two, and as a result a hierarchical binary tree structure is obtained. The separating boundaries are usually piecewise linear and correspond to classes consisting of one or more leaves of the tree. This method is good in that it generates a classification method based on logical decision rules. The ideas of tree classifiers are used in methods of constructing self-organizing neural classifiers.

A feed-forward neural network as a classifier

Since feed-forward networks are a universal tool for function approximation, they can be used to estimate the posterior probabilities in a given classification task. Owing to the flexibility of the constructed mapping, such accuracy of approximation of the posterior probabilities can be achieved that they practically coincide with the values computed by the Bayes rule (the so-called optimal classification procedures).

The task of analyzing time series

A time series is an ordered sequence of real numbers x_1, x_2, ..., x_t, .... For analysis it is often represented by vectors of its time-shifted values in n-dimensional space — the so-called delay space.

The purpose of time series analysis is to extract useful information from this sequence. To do this, it is necessary to build a mathematical model of the phenomenon. Such a model should explain the essence of the process generating the data, in particular describe the nature of the data (random, having a trend, periodic, stationary, etc.). After that, various data filtering methods (smoothing, outlier removal, etc.) can be applied, with the ultimate goal of predicting future values.

Thus, this approach is based on the assumption that the time series has some mathematical structure (which, for example, may be a consequence of the physical essence of the phenomenon). This structure exists in the so-called phase space, whose coordinates are the independent variables describing the state of the dynamic system. Therefore the first task encountered in modeling is to define the phase space suitably. To do this, some characteristics of the system are chosen as phase variables. After that, the question of prediction or extrapolation can be raised. As a rule, time series obtained as a result of measurements contain random fluctuations and noise in varying proportions. The quality of the model is therefore largely determined by its ability to approximate the intended structure of the data while separating it from the noise.

Statistical analysis of time series

A detailed description of the methods of statistical analysis of time series is beyond the scope of this book. We will briefly review the traditional approaches, highlighting the circumstances directly related to the subject of our presentation. Starting with the pioneering work of Yule, the central place in the statistical analysis of time series has been occupied by linear ARIMA models. Over time this area took shape as a complete theory with a set of methods — the Box-Jenkins methodology.

The presence of an autoregressive term in the ARIMA model expresses the fact that the current value of the variable depends on its past values. Such models are called univariate. Often, however, the values of the target variable under study are related to several different time series.

Fig. 13.7. Implementation of ARIMA(p, q) models on the simplest neural network

This will be the case, for example, if the target variable is a currency exchange rate and the other participating variables are interest rates (in each of the two currencies).

The corresponding methods are called multivariate. The mathematical structure of linear models is quite simple, and the calculations can be performed without much difficulty with the help of standard packages of numerical methods. The next step in the analysis of time series was the development of models capable of taking into account the nonlinearities present, as a rule, in real processes and systems. One of the first such models was proposed by Tong and is called the threshold autoregressive model (TAR).

In it, at certain (pre-established) threshold values, a switch from one linear AR model to another occurs. Thus the system has several regimes of operation.

Later, STAR, or "smooth" TAR, models were proposed. Such a model is a linear combination of several models taken with coefficients that are continuous functions of time.
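A sketch of a threshold autoregressive model of order 1: the AR coefficient switches between two regimes depending on whether the previous value is above or below the threshold; all parameter values are illustrative.

```python
import numpy as np

def tar_simulate(n, threshold=0.0, phi_low=0.8, phi_high=-0.5, sigma=0.1, seed=7):
    """Threshold AR(1): the AR coefficient switches when the previous value crosses the threshold."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    for t in range(1, n):
        phi = phi_low if y[t - 1] <= threshold else phi_high   # regime switching
        y[t] = phi * y[t - 1] + sigma * rng.normal()
    return y

print(tar_simulate(10))
```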

Models based on feed-forward neural networks

It is curious to note that all the models described in the previous paragraph can be implemented by neural networks: any dependence of this form can be represented by a suitable feed-forward network (Fig. 13.8).

The actions at the first stage — preliminary data processing — obviously depend strongly on the specifics of the problem. One needs to choose the right number and type of indicators characterizing the process, including the delay structure. After that, the network topology must be chosen. If feed-forward networks are used, the number of hidden elements must be determined. Next, to find the model parameters, an error criterion and an optimization (learning) algorithm must be selected. Then, using diagnostic tools, various properties of the model should be checked. Finally, the output information of the network must be interpreted and perhaps passed to some other decision support system. Below we consider the questions that have to be solved at the preprocessing, optimization, and analysis stages.

DATA COLLECTION

The most important decision the analyst must make is the choice of the set of variables describing the simulated process. To imagine the possible relations between different variables, one needs to understand the essence of the problem well. In this regard it is very useful to talk with an experienced specialist in the subject area. Concerning the chosen variables, one must understand whether they are meaningful in themselves, or whether they only reflect other, truly essential variables. Significance checking includes cross-correlation analysis. With its help one can, for example, identify a delay-type (lag) temporal relation between two series. The extent to which the phenomenon can be described by a linear model is checked by regression using ordinary least squares (OLS).
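A sketch of such a lag-type cross-correlation check: correlating y(t) with x(t - lag) for several lags to reveal a delay relation between two series; the toy data and maximum lag are illustrative.

```python
import numpy as np

def lagged_correlation(x, y, max_lag=5):
    """Correlation of y(t) with x(t - lag) for several lags, to reveal delay-type relations."""
    out = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            out[lag] = np.corrcoef(x, y)[0, 1]
        else:
            out[lag] = np.corrcoef(x[:-lag], y[lag:])[0, 1]
    return out

rng = np.random.default_rng(8)
x = rng.normal(size=200)
y = np.roll(x, 3) + 0.2 * rng.normal(size=200)    # y lags x by 3 steps (toy data)
print(lagged_correlation(x, y))                   # the correlation peaks at lag = 3
```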


Neural networks as a means of data extraction

Sometimes the task arises of analyzing data that can hardly be represented in a mathematical numerical form. This is the case when it is necessary to extract data whose selection principles are specified only vaguely: to single out reliable partners, determine a promising product, etc. Let us consider a situation typical for tasks of this kind — the prediction of bankruptcies. Suppose we have information about the activities of several dozen banks (their published financial statements) over a certain period of time. At the end of this period we know which of these banks went bankrupt, which had their licenses revoked, and which continue to work stably (as of the end of the period). Now we need to decide in which of the banks it is worth placing funds. Naturally, we are unlikely to want to place funds in a bank that may soon go bankrupt. This means that we need somehow to solve the task of analyzing the risks of investing in various commercial structures.

At first glance the problem seems easy to solve — after all, we have data on the work of the banks and the results of their activities. But in fact the task is not so simple. The problem is that the data we have describe the past period, whereas we are interested in what will happen in the future. Thus, on the basis of the available a priori data we need to obtain a forecast. Various methods can be used to solve this task.

The most obvious is the use of methods of mathematical statistics. But here a problem arises with the amount of data, because statistical methods work well with a large amount of a priori data, while we may have a limited amount. In this case statistical methods cannot guarantee a successful result.

Another way to solve this task can be the use of neural networks that can be trained on a set of data. In this case, these financial reports of various banks are used as the initial information, and the result is the result of their activities as a target field. But when using the methods described above, we impose the result without trying to find patterns in the source data. In principle, all bankrupt banks are similar to each other at least what they went bankrupt. It means that there should be something more common in their activities, which led them to this outcome, and you can try to find these patterns in order to use them in the future. And here we have the question of how to find these patterns. For this, if we use the methods of statistics, we need to determine which criteria for "similar" to us to use what may require any additional knowledge of us about the nature of the task.

However, there is a method that allows all these actions of searching for regularities to be automated: analysis using self-organizing Kohonen maps. Let us consider how such tasks are solved and how Kohonen maps find patterns in the source data. For generality we will use the term "object" (for example, an object can be a bank, as in the example above, but the described technique is suitable without changes for solving other tasks as well, such as analyzing the creditworthiness of a client or finding an optimal strategy of behavior on the market). Each object is characterized by a set of parameters that describe its condition. In our example the parameters are data from the financial reports. These parameters usually have a numeric form or can be brought to it. Thus, based on an analysis of the object parameters, we need to select similar objects and present the result in a form convenient for perception.

All these tasks are solved by self-organizing Kohonen maps. Let us consider in more detail how they work. To simplify the discussion we will assume that objects have 3 features (in fact there may be any number).

Now imagine that these three parameters of the objects are their coordinates in three-dimensional space (the very space that surrounds us in everyday life). Then each object can be represented as a point in this space, which is what we will do (so that we have no problems with different scales on the axes, we normalize all the features to a common interval in any suitable way), as a result of which all the points fall into a unit cube (Fig. 13.9). Let us display these points. Looking at this picture, we can see how the objects are located in space, and it is easy to notice areas where objects are grouped, i.e. have similar parameters, which means that these objects themselves most likely belong to one group. We need a way to convert this representation into one that is simple to perceive, preferably two-dimensional (because even a three-dimensional picture cannot be displayed correctly on a plane), so that objects that are close in the feature space also end up close in the resulting picture. For this purpose a self-organizing Kohonen map is used. In a first approximation it can be represented as a net made of rubber (Fig. 13.10).

Having first "crumpled" this net, we throw it into the feature space where our objects already are, and then proceed as follows: we take one object (a point in this space) and find the network node nearest to it. After that this node is pulled toward the object (and since the net is "rubber", the neighboring nodes are pulled along with it, but with a smaller force).

Then another object (point) is selected, and the procedure is repeated. As a result we get a map whose nodes coincide in location with the main accumulations of objects in the source space (Fig. 13.11). In addition, the resulting map has the following remarkable property: its nodes are arranged so that objects similar to each other correspond to adjacent nodes of the map. Now we determine which objects fall into which nodes of the map. This is also determined by the nearest node: the object falls into the node closest to it. As a result of all these operations, objects with similar parameters fall into one node or into adjacent nodes. Thus, we can assume that we have solved the task of finding similar objects and grouping them.
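The "rubber net" training step described above can be sketched in a few lines. This is a toy illustration assuming a small two-dimensional grid and three-feature objects scaled to [0, 1]; it is not the exact algorithm of any particular package.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, radius0=3.0):
    """Toy self-organizing map: data has shape (n_objects, 3)."""
    rng = np.random.default_rng(0)
    weights = rng.random((grid[0], grid[1], data.shape[1]))      # node positions in feature space
    gy, gx = np.mgrid[0:grid[0], 0:grid[1]]                      # node coordinates on the map itself
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)                          # learning rate decays over time
        radius = max(radius0 * (1 - epoch / epochs), 0.5)        # neighbourhood shrinks over time
        for obj in data:
            dist = np.linalg.norm(weights - obj, axis=2)         # distance from the object to every node
            by, bx = np.unravel_index(dist.argmin(), dist.shape) # nearest node ("best matching unit")
            influence = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * radius ** 2))
            weights += lr * influence[..., None] * (obj - weights)  # pull nodes toward the object
    return weights
```

After training, each object is assigned to its nearest node, and similar objects end up in the same or adjacent nodes, which is exactly the grouping property used for the risk-zone coloring below.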

But the capabilities of Kohonen maps do not end there. They also allow the obtained information to be presented in a simple and visual form by applying coloring. To do this, we paint the resulting map (more precisely, its nodes) with colors corresponding to the features of interest to us. Returning to the example with the classification of banks, one can paint in one color those nodes that contain at least one bank whose license was withdrawn. Then, after applying the coloring, we get a zone that can be called the risk zone, and if a bank of interest to us falls into this zone, this speaks of its unreliability.

But that is not all. We can also get information about dependencies between parameters. Applying to the map colorings that correspond to different report items, we obtain a so-called atlas that stores information about the state of the market. By comparing the arrangement of colors on the colorings generated by different parameters, one can get a complete picture of the financial portrait of losing banks, prosperous banks, and so on.

With all this, the described technology is a universal method of analysis. With it one can analyze various strategies of activity, analyze the results of marketing research, check the creditworthiness of customers, etc.

Having a map and knowing information about some of the objects under study, we can reliably judge the objects with which we are not yet familiar. Need to know what a new partner is like? Display it on the map and look at its neighbors. As a result, information can be extracted from the database based on fuzzy characteristics.

Cleaning and transforming the data

Preliminary conversion of the data using standard statistical techniques, before feeding them to the network input, can significantly improve both the learning parameters (duration, complexity) and the operation of the system. For example, if the input series has a distinctly exponential form, then after taking its logarithm a simpler series is obtained, and if complex dependencies are present in it, they will be much easier to detect. Very often non-normally distributed data are first subjected to a nonlinear transformation: the initial series of values of a variable is converted by some function, and the series obtained at the output is taken as a new input variable. Typical transformations are raising to a power, taking roots, reciprocals, exponentials or logarithms.
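For example, the logarithm of an exponentially growing series is roughly linear, and differences of log values give returns, which are often easier for a network to model. A minimal sketch with hypothetical numbers:

```python
import numpy as np

prices = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 31.0, 40.0])  # hypothetical exponential-looking series
log_prices = np.log(prices)       # approximately linear after taking the logarithm
log_returns = np.diff(log_prices) # relative changes, often used as the new input variable
```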

In order to improve the information structure of the data, certain combinations of variables can be useful: products, ratios, etc. For example, when trying to predict price changes in the options market, the ratio of the number of put options (options to sell) to the number of call options (options to buy) may be more informative than either of these indicators separately. In addition, with the help of such intermediate combinations it is often possible to obtain a simpler model, which is especially important when the number of degrees of freedom is limited.

Finally, for some transfer functions implemented in the output node, problems arise with scaling. The sigmoid is defined on the segment [0, 1], so the output variable must be scaled so that it takes values in this interval. There are several ways to scale: shifting by a constant, proportionally changing the values to a new minimum and maximum, centering by subtracting the mean, bringing the standard deviation to one, and standardization (the last two actions together). It makes sense to arrange for the values of all input and output variables of the network to always lie, for example, in the interval [0, 1] (or [-1, 1]); then any transfer functions can be used without risk.
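Two of the scaling options listed above, written out as a sketch (function names are illustrative, not from the source):

```python
import numpy as np

def scale_to_range(x, lo=-1.0, hi=1.0):
    """Proportional rescaling of a series into [lo, hi], here [-1, 1]."""
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Centre by the mean and bring the standard deviation to one."""
    return (x - x.mean()) / x.std()
```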

Building the model

The values of the target series (this is the quantity to be found, for example income per day one step ahead) depend on N factors, among which there may be combinations of variables, past values of the target variable, and encoded qualitative indicators.

The quality of the model is usually evaluated with goodness-of-fit criteria such as the mean squared error (MSE) or its square root (RMSE). These criteria show how close the predicted values were on the training, validation or test set.

In linear time series analysis it is possible to obtain an unbiased estimate of the ability to generalize by examining the results on the training set (MSE), the number of free parameters (W) and the size of the training set (N). Estimates of this type are called information criteria (IC) and include a goodness-of-fit component and a penalty component that accounts for the complexity of the model. The following information criteria have been proposed: the normalized Akaike criterion (NAIC), the normalized Bayesian criterion (NBIC) and the final prediction error (FPE).
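The formulas themselves were lost in conversion. The commonly used forms of these criteria, given here as a reconstruction rather than the exact notation of the source, are

$$\mathrm{NAIC} = \ln(\mathrm{MSE}) + \frac{2W}{N}, \qquad
\mathrm{NBIC} = \ln(\mathrm{MSE}) + \frac{W\ln N}{N}, \qquad
\mathrm{FPE} = \mathrm{MSE}\cdot\frac{N+W}{N-W},$$

where in each case the first term rewards goodness of fit and the second term penalizes model complexity relative to the amount of training data.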

subtitle "\u003e

SOFTWARE

To date, many software packages implementing neural networks have been developed. Here are some of the best-known neural network simulators on the software market: Nestor, Cascade Correlation, Neudisk, Mimenice, Nu Web, Brain, Dana, NeuralWorks Professional II Plus, Brain Maker, HNet, Explorer, Explorenet 3000, Neuro Solutions, Propagator, MATLAB Neural Network Toolbox. It is also worth mentioning the simulators distributed freely through university servers (for example, SNNS (Stuttgart) or Nevada Quickprop). An important quality of a package is its compatibility with other programs involved in data processing. A friendly interface and performance, which can reach many megaflops (millions of floating-point operations per second), are also important. Accelerator boards make it possible to reduce learning time when working on ordinary personal computers. However, to obtain reliable results using neural networks, a powerful computer is usually required.

The established paradigms of financial science, such as the random walk model and the efficient market hypothesis, suggest that financial markets react to information rationally and smoothly. In this case one can hardly come up with anything better than linear relationships and stationary behavior with a mean-reverting trend. Unfortunately, in the real behavior of financial markets we see not just reversals of trends, but constantly emerging price inconsistencies, volatility that clearly does not correspond to the incoming information, and periodic jumps in prices and volatility. To describe the behavior of financial markets some new models have been developed, and they have had a certain success.

Financial analysis in the securities market

Financial analysis of the securities market using neural network technologies is carried out in this work with respect to oil and petroleum products.

Macroeconomic growth and the welfare of the country depend to a great extent on the level of development of basic industries, among which the oil-producing and oil-refining industry plays an extremely important role. The situation in the oil industry largely determines the state of the entire economy of Russia. Due to the established price situation in the global oil market, exports are the most profitable side of the oil industry's activities for Russia. Oil exports are one of the most important and fastest sources of foreign exchange earnings. One of the best representatives of the oil industry is the oil company LUKOIL. NK LUKOIL is a leading vertically integrated oil company in Russia, specializing in the extraction and processing of oil and the production and sale of petroleum products. The company operates not only in Russia but also abroad, actively participating in promising projects.

The company's financial and production activities are described in Table 13.1.

Table 13.1

The main financial and production indicators for 1998


Oil production (including gas condensate): 64192 thousand tons/year (1284 thousand barrels/day)
Commercial gas production: 3748 million cubic meters/year (369 million cubic feet/day)
Oil refining (own refineries, including foreign): 17947 thousand tons/year (359 thousand barrels/day)
Oil exports: 24711 thousand tons/year
Exports of petroleum products: 3426 thousand tons/year
Net revenue: 81660 million rubles (8393 million USD*)
Profit from sales: 5032 million rubles (517 million USD*)
Profit before taxation (according to the report): 2032 million rubles (209 million USD*)
Profit before taxation (excluding exchange-rate differences): 5134 million rubles (528 million USD*)
Retained earnings (according to the report): 118 million rubles (12 million USD*)
Retained earnings (excluding exchange-rate differences): 3220 million rubles (331 million USD*)
Assets (at the end of the year): 136482 million rubles (6638 million USD*)

Due to the continued fall in world prices in 1998, exports of petroleum products amounted to 3.4 million tons against 6.3 million in 1997. To preserve the company's positions in the global petroleum products market, it is planned to bring exports up to 5-6 million tons in 1999 as market conditions improve. A priority is to create conditions that stimulate export growth and the extraction of the maximum possible profit.

An important component of the process of selling oil and petroleum products for export, covering all forms of contracts, the procedure for setting prices, the responsibility of the parties and so on, is the commodity exchange. It concentrates all the processes occurring at the stage of purchase and sale of this product and helps to insure against the accompanying risks.

The exchanges on which oil and petroleum product futures contracts are traded are the New York Mercantile Exchange (NYMEX) and the International Petroleum Exchange in London (IPE). An exchange is a wholesale market legally organized as an association of traders. The development of mechanisms for trading futures contracts and their introduction for all assets that commodity, futures and currency exchanges had previously traded led to the erasure of the differences between these types of exchanges and to the appearance of either futures exchanges, on which only futures contracts are traded, or universal exchanges, on which both futures contracts and traditional exchange assets such as stocks, currency and even individual goods are traded.

The functions of an exchange are as follows:

    organization of exchange sessions for public trading;

    development of exchange contracts;

    exchange arbitration, or resolution of disputes arising from exchange transactions concluded in the course of exchange trading;

    the price-setting function of the exchange. This function has two aspects. The first is that the task of the exchange is to reveal "truly" market prices and at the same time to regulate them in order to prevent illegal price manipulation on the exchange. The second is the price-forecasting role of the exchange;

    the hedging function, or exchange insurance of participants in exchange trading against price fluctuations unfavorable for them. The hedging function is based on the mechanism of trading futures contracts. Its essence is that the trader-hedger (that is, the one who insures himself) must be both a seller and a buyer at the same time. In this case any change in the price of his product is neutralized, since the seller's gain is simultaneously the buyer's loss and vice versa. This is achieved by the hedger who, occupying, for example, the position of a buyer in the ordinary market, takes the opposite position, in this case that of a seller, in the market of exchange futures contracts. Usually producers of goods hedge against a fall in the prices of their products, and buyers against a rise in the prices of the products they purchase;

    speculative exchange activity;

    the function of guaranteeing transactions, achieved with the help of exchange clearing and settlement systems;

    the information function of the exchange.

The main sources of information about the state and prospects of the global oil and petroleum products market are the publications of the price-reporting agencies Platt's (a structural division of the largest American publishing corporation McGraw-Hill) and Petroleum Argus (an independent company, United Kingdom).

Quotations give an idea of the price range for a specific grade of oil on a certain day. Accordingly, they consist of a minimum price (the minimum transaction price or the minimum weighted average bid price for the given grade of oil) and a maximum price (the maximum transaction price or the maximum weighted average offer price).

The accuracy of the quotations depends on the amount of information collected. The first data on quotations are given in real time (they can be obtained given access to the appropriate equipment) at 21:00-22:00 Moscow time. These data can be adjusted if new information on transactions clarifying the preliminary quotations is received by the end of the day. The final version of the quotations is given in the official printed publications of these agencies.

Quotations are given for transactions with immediate delivery, the "spot" price (delivery within two weeks, and for some grades of oil within three weeks), and for transactions with deferred delivery (for the key grades of oil), the "forward" prices (delivery in one, two and three months).

Information on spot and forward quotations is a key element of oil trading on the free market. Spot quotations are used to assess the correctness of the price chosen in a previously concluded forward transaction; to issue invoices for deliveries whose settlements are carried out on the basis of formulas built on the spot quotations at the time of shipment of the goods; and also as the starting point from which counterparties begin to discuss the price terms for the next quotation day.

Forward quotations, reflecting the fixed prices of deferred-delivery transactions, are essentially the market participants' forecast estimate for one, two and three months ahead. In combination with spot quotations, forward quotations show the most likely trend, at the given moment, of the price change for the given grade of oil over a horizon of one, two and three months.

Quotations are given for oil of the standard quality class. If the quality of a particular batch of oil differs from the standard, then when a transaction is concluded the price of the batch is set on the basis of the quotations, taking into account a discount or premium for quality.

The size of the discount or premium for quality depends on the extent to which the netback price of the particular batch of goods differs from the netback price of oil of this class of standard quality.

Summarizing all of the above, we note that in order to ensure efficient oil exports the supplier should have data on spot and forward quotations, prices for petroleum products and futures positions, information on netback prices, freight and insurance rates, and the dynamics of spreads and oil stocks. The minimum information requirements come down to knowledge of the spot and forward quotations for the exported oil and competing grades of oil, the dynamics of spreads, and freight and insurance rates. The main types of derivative contracts concluded on the exchange include the following.

A futures contract is a contract for the purchase and sale of a commodity in the future at a price fixed at the moment the transaction is concluded.

An option is a contract that gives the right, but not the obligation, to buy or sell a futures contract for oil or petroleum products in the future at the desired price. Options are traded on the same exchanges where futures contracts are traded.

A forward deal is a deal whose term of execution does not coincide with the moment of its conclusion on the exchange and is stipulated in the contract.

A "spot" deal is characterized by the fact that the moment of its conclusion coincides with the period of execution, and in such a transaction the asset must be delivered immediately (as a rule, no later than two working days after the conclusion of the transaction).

When concluding a contract, a special role is played by the accuracy of the forecast of the market situation for the given type of goods, as well as the price forecast for it. Therefore we consider it important to examine the role of forecast estimates in achieving an effect from trading oil and petroleum products.

In performing the listed transactions there is one key point: the accuracy of forecasts. Of course, from the point of view of theory it might seem that it does not matter to us where prices will be in the future: by opening a position we have fixed the sale price of the oil for ourselves, and for us it can no longer be either higher or lower. In fact, an accurate forecast gives us the option of taking the necessary actions when the price changes, while an inaccurate forecast means losses. There are many ways to forecast the market, but only some of them deserve special attention. Over the years, forecasting of financial markets has been based on the theory of rational expectations, time series analysis and technical analysis.

According to the theory of rational expectations, prices increase or decrease because investors respond to new information rationally and immediately: any differences between investors with respect, for example, to investment goals or the information available to them are ignored as statistically insignificant. This approach is based on the assumption of full informational openness of the market, i.e. the assumption that no participant possesses information that other participants do not have. Under these conditions there can be no competitive advantage, since without information inaccessible to others it is impossible to increase one's chances of profit.

The purpose of time series analysis is to identify, using statistical methods, a certain number of factors affecting the change in prices. This approach allows market development trends to be identified; however, if the data series lack repeatability or homogeneous cycles, its use may be associated with serious difficulties.

Technical analysis is a collection of methods for analysis and decision-making based only on the study of the internal parameters of the exchange market: prices, transaction volumes and open interest (the number of open contracts to buy and to sell). The whole variety of forecasting methods in technical analysis can be divided into two large groups: graphical methods and analytical methods.

Graphical technical analysis is the analysis of various market graphical models formed by certain regularities of price movement on charts, with the aim of estimating the likelihood of continuation or change of the existing trend. Let us consider the main types of charts:

Line charts. On line charts only the closing price of each successive period is marked. They are recommended for short intervals (down to a few minutes).

Bar charts. Each bar shows the maximum price (the top of the column), the minimum price (the bottom of the column), the opening price (the notch to the left of the vertical column) and the closing price (the notch to the right of the vertical column). They are recommended for time periods of 5 minutes and longer.

Japanese candlesticks (built by analogy with bars).

Point-and-figure charts ("noughts and crosses"). There is no time axis, and a new price column is built when a new price movement appears. A cross is drawn if prices have fallen by a certain number of points (the reversal criterion); if prices have risen by a certain number of points, a nought is drawn.

Arithmetic and logarithmic scales. For some types of analysis, especially the analysis of long-term trends, it is convenient to use a logarithmic scale. On an arithmetic scale the distance between divisions is unchanged; on a logarithmic scale the same distance corresponds to the same percentage change.

Volume charts.

The postulates of this type of technical analysis are the following basic concepts: trend lines, levels of market resistance and support, and levels of correction of the current trend. For example:

Resistance lines:

arise when buyers can no longer, or no longer want to, buy the given product at higher prices. Pressure from sellers exceeds pressure from buyers; as a result growth stops and is replaced by a fall;

connect important maxima (tops) of the market.

Support lines:

connect important minima (bottoms) of the market;

arise when sellers can no longer, or no longer want to, sell the given product at lower prices. At this price level the desire to buy is strong enough and can resist pressure from the sellers. The fall is halted, and prices begin to rise again.

When broken downward, a support line turns into resistance. When broken upward, a resistance line turns into support.

If prices fluctuate between two parallel straight lines (channel lines), one can speak of the presence of an ascending (descending or horizontal) channel.

Two types of graphical models are distinguished:

1. Reversal models are patterns formed on charts which, when certain conditions are met, can anticipate a change of the trend existing on the market. They include such models as "head and shoulders", "double top", "double bottom", "triple top" and "triple bottom".

Consider some of them.

"Head - shoulders" - confirms the trend turn.

Fig.13.22. 1-first vertex; 2 second vertex; 3-line neck

Headshoulders - head - shoulders.

Fig.13.23. 1 pin left shoulder; 2 pin; 3 pin of the right shoulder; 4-line neck.

2. Models continue the trend - the models formed on schedules, which, when performing certain conditions, it makes it necessary to assert that there is a chance of continuing the current trend. Perhaps the tendency developed too quickly and temporarily entered into a state of overbought or oversold. Then after the intermediate correction, it will continue its development towards the former trend. In this group, there are such models such as "triangles", "diamonds", "flags", "pennants" and others. For example:

As a rule, these figures finish their formation at a distance from the vertex P (stern) equal:

i define "\u003e

Triangle

Triangles in the market should be treated with caution. P is the price base, T is the time base. The breakout of the figure takes place at a certain distance from the base.

Forecasting on the exchange as a whole includes the following stages:

collecting and storing the data that can participate in the forecast (either as a criterion, or as a predicted value, or as both);

defining, for the trend, a set of criteria (the data stored directly in the database cannot always be used as they are; it is often necessary to perform some transformations, for example it is often rational to use relative changes of values as criteria);

detecting the dependence between the predicted value and the set of criteria in the form of some function;

calculating the value of interest in accordance with the found function, the values of the criteria at the predicted moment and the type of forecast (short-term or long-term).

In the practical part of the work, on the basis of historical data for some instrument over some time interval (a month, a year, several years), presented with some time discretization (minute, 5-minute, half-hour, daily quotations, etc.), we need to obtain a forecast of the development of the quotations several time steps ahead. Information on asset quotations is represented by all or part of the standard parameters describing the quotations for a time step: opening and closing prices, maximum, minimum, trading volume at the moment of closing, and open interest.

The use of neural networks to obtain a quick and high-quality forecast is shown in Fig. 13.27, "Technological scheme of forecasting on the stock market using neural networks".

For a full forecast of the trends of the three markets most developed in our country, covering many financial instruments, a sufficient amount of source data for prediction is necessary. As can be seen from the scheme, the following sources of information are currently used:

    information and trading data from the agencies Reuters, Dow Jones Telerate and Bloomberg;

    trading data from the MICEX and RTS trading floors;

    other data via manual input.

All the necessary data go into the database (MS SQL Server). Next comes the selection and preparation of data to participate in the forecast. At this preliminary stage the task is to select, from more than 200 types of information and trading data, the criteria most significant for forecasting the value of interest of some financial instrument or group of financial instruments. The primary choice of criteria is made by the analyst and depends on the latter's experience and intuition. The analyst is given technical analysis tools, presented in the form of charts, by analyzing which hidden interrelations can be caught. A time range for the prediction is also chosen.

The processed data then go into the STATISTICA Neural Networks package, where 5-day periods are recognized by a trained perceptron. To each of the periods the network assigns one of four indicators characterizing the change of the trend (like the charts in technical analysis): stable, ascending, descending or indefinite. On the basis of the processed data the network builds a forecast, but to refine the results obtained we complicate the forecasting process. Further processing takes place in the STATISTICA system. The data do not need to be converted, since they are of the same type.

In the course of processing the time series in the STATISTICA package, in the Time Series/Forecasting module, a trend is extracted using exponential smoothing, and the series is divided into equal 5-day periods for subsequent short-term forecasting. The trend is fitted by one of the four available methods (linear, exponential, horizontal, polynomial). For our experiment we chose the exponential method. The trend and the data on its smoothing were obtained. These data are fed back to the neural network, a multilayer perceptron. Training is performed on the exponentially smoothed data, as a result of which the network confirms the correctness of the previously obtained forecast. The results are viewed using the archiving feature.
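The exponential smoothing applied here can be written out directly; a minimal sketch is given below, with hypothetical prices and a smoothing constant chosen for illustration rather than taken from the STATISTICA settings used in the experiment.

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# hypothetical daily weighted-average prices; the smoothed series approximates the trend
prices = [310.0, 312.5, 308.0, 315.0, 320.5, 318.0, 325.0]
trend = exponential_smoothing(prices, alpha=0.3)
```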

The obtained forecast values are analyzed by a trader, on the basis of which the right decision on operations with securities is made.

One approach to solving the problem of analyzing and forecasting the stock market is based on the cyclical nature of the development of economic processes. Cyclicity manifests itself in the wave-like development of economic periods. When forecasting time series in economics it is impossible to assess the situation correctly and make a sufficiently accurate forecast without taking into account that cyclic oscillations are superimposed on the trend line. In modern economic science more than 1380 types of cyclicity are known. In practice, economics mostly operates with the following four:

    Kitchin cycles - inventory cycles. Kitchin (1926) focused on the study of short waves 2 to 4 years long on the basis of an analysis of financial accounts and sales prices in the movement of inventories.

    Juglar cycles. This cycle has other names: the business cycle, the industrial cycle, etc. The cycles were discovered in the study of the nature of industrial fluctuations in France, Great Britain and the United States on the basis of a fundamental analysis of interest rates and prices. As it turned out, these fluctuations coincided with the investment cycle, which in turn initiated changes in GNP, inflation and employment.

    Kuznets cycles. J. Riggleman, W. Newman and some other analysts built the first statistical indices of the cumulative annual volume of housing construction in the 1930s and found in them successive long-term intervals of rapid growth and deep recession or stagnation. It was then that the term "building cycles" first appeared.

    Kondratiev cycles. Large cycles can be considered as a disturbance and restoration of economic equilibrium over a long period. Their cause lies in the mechanism of accumulation and dispersal of capital sufficient to create new basic productive forces. The action of this main cause is amplified by secondary factors. In accordance with the above, the development of a large cycle proceeds as follows. The beginning of the upswing coincides with the moment when accumulation reaches such a level that profitable investment of capital becomes possible for the purpose of creating new productive forces and radically re-equipping technology. According to Kondratiev's main "empirical truths", during the upswing wave of a large cycle the medium and short waves are characterized by brief depressions and intense rises, while during the downswing wave of a large cycle the reverse picture is observed.

In the stock market these oscillations manifest themselves in rises and declines in business activity over a certain period of time: the peak of the cycle, the decline, the lowest point, and the revival phase.

In this paper we proceed from the fact that price fluctuations in the securities market are the result of a superposition of the various waves described above and a series of random, stochastic factors. An attempt is made to identify the presence of cycles and to determine the phase in which the process is located. Depending on this, a forecast of the further development of the process is constructed using ARIMA tools under suitable assumptions about the process parameters.
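As a hedged illustration of the ARIMA step, the sketch below fits a model to a synthetic price series with the statsmodels library and produces a short-horizon forecast; the order (p, d, q) is an arbitrary placeholder and would in practice be chosen after analyzing the identified cycle and its phase.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# synthetic price series standing in for real quotations
prices = np.cumsum(np.random.default_rng(1).normal(0.1, 1.0, size=250)) + 100.0

model = ARIMA(prices, order=(2, 1, 1))   # (p, d, q) chosen here only for illustration
fitted = model.fit()
forecast = fitted.forecast(steps=5)      # forecast several time steps ahead
print(forecast)
```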

The state transitions of the system are a superposition of waves of different lengths. As is known, waves have several phases that succeed one another: a phase of revival, of recession or of stagnation. If these phases are assigned the symbolic values A, B, C, then they can be represented as a sequence of primitives (similar to the charts in technical analysis) and, by recognizing these sequences (which are themselves periods of rise, recession and stagnation, i.e. A, B, C, only on a smaller scale), we can, based on the rules of a recognizing grammar, identify the wave with some probability. We can then also consider sequences of the form AAABBCD and so on. It turns out that we have recognized not only the wave itself, but also its phase.

Now we can make not only a more accurate short-term forecast; we can also trace the overall dynamics of the stock market in the future (by determining the phase of a long wave we can judge the character of the next one, because the phases follow one another in a definite sequence). In our experiment we tried to train the perceptron to recognize the phases of the waves (A, B, C, D).

For the experiment the results of RTS trading in LUKOIL shares (LKOH) were taken for the period from June 1, 1998 to December 31, 1999. The following variables were included in the original database: the weighted average purchase price, the weighted average selling price, the maximum price of the day, the minimum price of the day, and the number of transactions. The database with the values of the listed variables was imported into the Excel environment from the Internet and then transferred to the SNW package. This procedure is illustrated in Fig. 13.28 (the weights assigned to individual observations of the series).

The width of the smoothing interval was taken equal to 4 observations. Then the numerical expression of the trend (Fig. 13.30) was added to the database as a new variable, smoothing.
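A window of 4 observations can be reproduced with a rolling mean; this is only a stand-in for whatever exact smoothing the package applied, with hypothetical prices and column names.

```python
import pandas as pd

# hypothetical weighted-average purchase prices
prices = pd.Series([310.0, 312.5, 308.0, 315.0, 320.5, 318.0, 325.0, 322.0])
smooth = prices.rolling(window=4).mean()             # smoothing interval of 4 observations
df = pd.DataFrame({"avg_buy": prices, "smoothing": smooth})  # the new trend variable
```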

Thus, we formed a database with the variables that will be fed to the neural network input. And since SNW is fully compatible with SNN, no special import of the data into SNN was necessary. At the output it was supposed to obtain values of the type A, B, C, D; for this purpose the perceptron had to recognize the phase in which we are and, on this basis, make a more accurate short-term forecast. In other words, it must examine the sequences of primitives and identify them with the phases of the cycle. However, the perceptron does not output the cycle phase of type A, B, C, D directly.

To implement the task of training the perceptron, a moving window 5 days wide was used. One time window consists of a sequence of primitives A, B, C or D. Thus, a more accurate forecast can be obtained by piecewise approximation of the numerical trend of the weighted average purchase price.

To teach the perceptron to recognize five-day sequences and identify them as A, B, C or D, we had to determine the phase ourselves for a certain number of cases and add the results as a new variable (state) to the original database. Thus, the finally formed database contains the values of the following variables: the weighted average purchase price, the weighted average selling price, the maximum price of the day, the minimum price of the day, the number of transactions, the trend extracted relative to the weighted average purchase price and, finally, the variable defining the state of the economic process. All variables except the last were fed to the input. The last was to be obtained only at the output, so that during training the perceptron would not react to the value of this variable but would instead adjust its weights so that only four values could be produced: A, B, C, D. Then, according to the recognized state, and also taking into account that the rise phase is followed by a phase of constancy and then again by recession, it would be capable of making a short-term forecast. Thus, all the data for the forecast have been collected. Only one question remains: what parameters to choose for the network and how to train it. In this regard a number of experiments were carried out, and the following conclusions were drawn.
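The construction of the training examples from a 5-day moving window can be sketched as follows; the function name and the choice of labelling each window by its last day are illustrative assumptions, not details stated in the source.

```python
import numpy as np

def make_windows(features, states, width=5):
    """Slice the series into 5-day windows; each window gets the phase label (A/B/C/D) of its last day."""
    X, y = [], []
    for start in range(len(features) - width + 1):
        X.append(features[start:start + width].ravel())  # flatten the window into one input vector
        y.append(states[start + width - 1])              # the phase assigned to the window
    return np.array(X), np.array(y)
```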

Initially it was intended to train a multilayer perceptron by the backpropagation method. 7 variables were fed to the input (they are listed above), and only one, state, was taken at the output. In addition to the input and output layers, one intermediate layer was built, consisting first of 6 and then of 8 neurons. The learning error was approximately 0.2-0.4; however, the perceptron reacted weakly. Therefore we decided first to increase the number of neurons in the middle layer to 14, and then changed the perceptron's learning method (to conjugate gradients). The error began to fluctuate in the range 0.12-0.14, and the whole set of variable values was used as the training set.

As a result of the experiments, the optimal network turned out to be a neural network with the following parameters: 7 variables are fed to the input (smoothly, average, open_buy, voltrad, val_q, min_pr, max_pr) and state is taken at the output. Training was carried out in increments of 6, by the method of conjugate gradients, with 3 layers in total (7 neurons in the first, 14 in the second, 3 in the third) (Fig. 13.29). As a result the perceptron clearly reacted to the state of the trend (ascending - 1st neuron of the output layer, descending - 2nd neuron of the output layer, horizontal - 3rd neuron) (Fig. 13.31).
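A present-day approximation of this 7-14-3 setup could look like the sketch below. It uses scikit-learn, which has no conjugate-gradient solver, so the batch optimizer "lbfgs" is used as a stand-in, and the data are random placeholders; this is not the network actually built in the experiment.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# placeholder data: 7 input variables per example, 3 trend classes
# (0 ascending, 1 descending, 2 horizontal)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
y = rng.integers(0, 3, size=200)

net = MLPClassifier(hidden_layer_sizes=(14,),  # one hidden layer of 14 neurons
                    solver="lbfgs",            # batch optimizer in place of conjugate gradients
                    max_iter=500)
net.fit(X, y)
print(net.predict(X[:5]))   # predicted trend class for the first few windows
```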

As a result of the studies, the data that could serve as objects of the forecast were selected, the predicted values and sets of criteria were identified, and the relationship between them was revealed.

In the course of the experiment it was found that extracting the trend increases the rate of learning of the multilayer perceptron, and with appropriate tuning the network recognizes ascending, descending and horizontal trends.

The positive results obtained make it possible to move on to a deeper study of cyclic dependencies in the markets and to use other neural technologies (Kohonen maps) in financial operations.

Good day, my name is Natalia Efremova, and I am a research scientist at NtechLab. Today I will talk about the types of neural networks and their applications.

First I will say a few words about our company. The company is new; maybe many of you do not know what we do. Last year we won the MegaFace contest, an international face recognition competition. Our company opened the same year, so we have been on the market for about a year, even a little more. Accordingly, we are one of the leading companies in face recognition and biometric image processing.

The first part of my talk is addressed to those who are unfamiliar with neural networks. I work directly on deep learning; I have been working in this area for more than 10 years. Although deep learning itself appeared slightly less than a decade ago, there used to be neural networks that were similar to deep learning systems.

In the last 10 years deep learning and computer vision have developed at an incredible pace. Everything significant that has been done in this area has happened in the last 6 years.

I will talk about the practical aspects: where, when and what to apply in terms of deep learning for image and video processing and for image and face recognition, since I work in a company that does exactly this. I will tell you a little about emotion recognition and about the approaches used in games and robotics. I will also talk about non-standard uses of deep learning, things that are only just coming out of research institutions and are so far little applied in practice, how they can be applied, and why it is difficult to do so.

The talk will consist of two parts. Since most of you are familiar with neural networks, first I will quickly explain how neural networks work, what biological neural networks are, why it is important for us to know how they work, what artificial neural networks are, and which architectures are applied in which areas.

I apologize in advance: I will jump a little into English terminology, because for most of it I do not even know what it is called in Russian. Perhaps you do not either.

So, the first part of the talk will be devoted to convolutional neural networks. I will explain how a convolutional neural network (CNN) works and how image recognition works, using face recognition as an example. I will also say a little about recurrent neural networks (RNN) and about reinforcement learning, using deep learning systems as an example.

As a non-standard use of neural networks, I will describe how CNNs work in medicine for recognizing voxel images, and how neural networks are used to recognize poverty in Africa.

What are neural networks

The prototype for the creation of neural networks was, oddly enough, biological neural networks. Perhaps many of you know how to program a neural network, but where it came from, I think, some do not know. Two-thirds of all the sensory information that comes to us arrives through the visual organs of perception. More than a third of the surface of our brain is occupied by the two most important visual zones: the dorsal visual pathway and the ventral visual pathway.

The dorsal visual pathway begins in the primary visual zone, at the back of the head, and continues upward, while the ventral pathway begins in the same region and ends approximately at the ears. All the important image recognition that happens in us, everything meaningful that we become aware of, takes place exactly there, behind the ears.

Why is this important? Firstly, because it is often needed for understanding neural networks, everyone talks about it, and I am used to starting here; and secondly, because all the zones used in neural networks for image recognition came to us from the ventral visual pathway, where every small zone is responsible for its own strictly defined function.

The image reaches us from the retina of the eye, passes through a series of visual zones and ends up in the temporal area.

In the distant 1960s, when the study of the visual zones of the brain began, the first experiments were carried out on animals, because there was no fMRI. The brain was investigated using electrodes implanted in various visual zones.

The first visual zone was investigated by David Hubel and Torsten Wiesel in 1962. They conducted experiments on cats. The cats were shown various moving objects. What the brain cells reacted to was the stimulus that the animal recognized. Even now many experiments are carried out in these drastic ways. Nevertheless, it is the most effective way to find out what every smallest cell in our brain does.

In the same way, many more important properties of the visual zones that we now use in deep learning were discovered. One of the most important is the increase in the receptive fields of the cells as we move from the primary visual zones toward the temporal lobes, that is, to the later visual zones. The receptive field is the part of the image that each cell of our brain processes. Each cell has its own receptive field. The same property is preserved in neural networks, as you probably all know.

Also, along with the increase in receptive fields, the complexity of the stimuli that are recognized increases.

Here you see examples of the complexity of stimuli: various two-dimensional forms that are recognized in zones V2 and V4 and various parts of the temporal fields in the macaque. There are also a number of MRI experiments.

Here you see how such experiments are conducted. This is a part of the IT cortex zone of a macaque during the recognition of various objects, colored according to what is recognized.

To sum up: the important property we want to adopt from the visual zones is that the size of the receptive fields increases, and the complexity of the recognized objects increases along with it.

Computer vision

Before we learned how to apply this to computer vision, computer vision as such essentially did not exist. In any case, it worked far less well than it does now.

All these properties were transferred to the neural network, and then it started to work, if we do not count a small digression about datasets, which I will discuss later.

But first a little about the simplest perceptron. It, too, is formed in the image and likeness of our brain. The simplest element resembling a brain cell is the neuron. It has input elements, which by convention are drawn from left to right, occasionally from bottom to top. On the left are the input parts of the neuron, on the right the output parts.

The simplest perceptron is able to perform only the simplest operations. In order to perform more complex calculations, we need a structure with a larger number of hidden layers.

In the case of computer vision we need even more hidden layers. Only then will the system be able to recognize what it sees.

So, I will explain what happens during image recognition using faces as an example.

For us, looking at this picture and saying that it shows the face of a statue is quite enough. However, until 2010 this was an incredibly difficult task for computer vision. Those who dealt with this question before then probably know how hard it was to describe, without words, the object we want to find in a picture.

We had to do it in some geometric way: describe the object, describe the relationships between its parts and how they relate to each other, then find this pattern in the object and compare them, and the usual result was that we recognized it poorly. Usually it was only a little better than tossing a coin, slightly better than chance level.

Now it is done differently. We divide our image either into pixels or into patches: 2x2, 3x3, 5x5, 11x11 pixels, whatever is convenient for the creators of the system, and these serve as the input layer of the neural network.

The signals from these input layers are transmitted from layer to layer via synapses; each of the layers has its own coefficients. So we pass from layer to layer until we get the result: we have recognized the face.

Conditionally, all these parts can be divided into three classes; let us denote them x, W and y, where x is our input image, y is the set of labels, and we need to obtain the weights W. How do we calculate W?

Given our x and y, it seems simple. However, what is denoted by the asterisk is a very complex nonlinear operation which, unfortunately, has no inverse. Even having 2 of the 3 components of the equation, it is very difficult to calculate the third. Therefore we need, gradually, by trial and error in the selection of the weights W, to make the error as small as possible, ideally equal to zero.

This process happens iteratively: we keep reducing the error until we find a value of the weights W that suits us well enough.
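The iterative weight search described here is, in its simplest form, gradient descent. A toy single-layer sketch, with made-up data and a sigmoid as the nonlinear operation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# synthetic data: y is generated from known weights so the error can actually shrink
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = sigmoid(x @ true_w)

w = np.zeros(3)          # start from arbitrary weights
lr = 0.5                 # learning rate
for step in range(1000):
    pred = sigmoid(x @ w)
    error = pred - y
    grad = x.T @ (error * pred * (1 - pred)) / len(x)  # gradient of the mean squared error
    w -= lr * grad                                      # move the weights so the error decreases
```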

By the way, not a single neural network I have worked with has ever reached an error of zero, but they worked quite well nonetheless.

Before you is the first network that won the international image recognition competition in 2012, the so-called AlexNet. This is the network that first announced that convolutional neural networks exist, and since then, at all international competitions, convolutional neural networks have never given up their positions.

Despite the fact that this network is quite small (it has only 7 hidden layers), it contains 650 thousand neurons with 60 million parameters. In order for it to iteratively learn to find the right weights, we need a lot of examples.

A neural network learns from examples of a picture and a label. Just as in childhood we are told "this is a cat, and this is a dog", neural networks are trained on a large number of pictures. But the fact is that until 2010 there was no dataset large enough to teach such a number of parameters to recognize images.

The largest databases that existed before that time were PASCAL VOC, which had only 20 categories of objects, and Caltech 101, which was created at the California Institute of Technology. The latter had 101 categories, and that was a lot. Those who could not find their objects in either of these databases had to build their own, which, I would say, is terribly painful.

However, in 2010 the ImageNet database appeared, containing 15 million images divided into 22 thousand categories. It solved our problem of training neural networks. Now anyone with an academic address can simply go to the database site, request access and receive this database to train their neural networks. In my opinion, they respond quite quickly, by the next day.

Compared to previous datasets, this is a very large database.

The example shows how little there was of everything that came before it. Simultaneously with the ImageNet database, the ImageNet competition appeared, an international challenge in which all teams wishing to compete can take part.

This year the winning network was created in China; it had 269 layers. I do not know how many parameters it has; I suspect a lot.

Deep neural network architecture

Conditionally, it can be divided into 2 parts: the layers that learn and those that do not.

The parts that do not learn are marked in black; all the other layers are capable of learning. There are many definitions of what is inside each convolutional layer. One of the accepted notations is one layer with three components divided into the convolution stage, the detector stage and the pooling stage.

I will not go into details; there will be many other talks that discuss in detail how this works. I will explain it with an example.

Since the organizers asked me not to mention many formulas, I left them out entirely.

So, the input image enters a series of layers, which can be called filters, of different sizes and different complexity of the elements they recognize. These filters produce a certain feature map, or set of features, which then goes into a classifier. This is usually either an SVM or an MLP (a multilayer perceptron), whichever is convenient.
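One convolutional "layer" in the three-stage sense mentioned above (convolution, detector, pooling) could be sketched as follows; the channel counts and patch size are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

# one convolutional layer as convolution stage + detector stage + pooling stage
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # convolution stage
    nn.ReLU(),                                                            # detector stage
    nn.MaxPool2d(kernel_size=2),                                          # pooling stage
)

image = torch.randn(1, 3, 32, 32)   # a single 32x32 RGB patch
features = block(image)             # feature map of shape (1, 16, 16, 16)
```

Several such blocks stacked one after another, followed by a classifier, give the overall architecture described in this section.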

In the image and likeness of the biological neural network, objects of varying complexity are recognized. As the number of layers grew, all of this lost its direct connection with the cortex, because the number of zones in the biological visual system is limited, whereas a network may have 269 or very many levels of abstraction; what is preserved is only the growth in complexity, in the number of elements and in receptive fields.

If we consider the example of face recognition, the receptive field of the first layer will be small, then a little larger, and larger still, until we can finally recognize the entire face.

In terms of what is inside our filters, at first there will be slanted lines plus a little color, then parts of faces, and then the entire face will be recognized by each cell of the layer.

There are people who claim that a person always recognizes better than a network. Is that so?

In 2014 scientists decided to check how well we recognize objects in comparison with neural networks. They took the 2 best networks of the moment, AlexNet and the network of Matthew Zeiler and Fergus, and compared their responses with the responses of different zones of the macaque brain, which had also been trained to recognize certain objects. The objects were from the animal world so that the monkey would not get confused, and experiments were carried out to see who recognizes better.

Since it is impossible to get an explicit answer from a monkey, electrodes were implanted and the response of each neuron was measured directly.

It turned out that under normal conditions the brain cells responded as well as the state-of-the-art model of the time, that is, Zeiler's network.

However, with an increase in the speed at which objects are shown and in the amount of noise and the number of objects in the image, the recognition speed and quality of our brain and of the primate brain fall sharply. Even the simplest convolutional neural network recognizes objects better. That is, officially, neural networks work better than our brain.

Classic tasks of convolutional neural networks

There are actually not that many of them; they fall into three classes. Among them are tasks such as object identification, semantic segmentation, face recognition, recognition of human body parts, semantic boundary detection, detection of salient objects in an image, and surface normal estimation. They can be divided into 3 levels: from the lowest-level tasks to the highest-level tasks.

Using this image as an example, let us consider what each of these tasks does.

  • Boundary detection is the lowest-level task, for which convolutional neural networks are already classically applied.
  • Estimating the normal vector allows us to reconstruct a three-dimensional image from a two-dimensional one.
  • Saliency, the detection of objects of attention, is what a person would pay attention to when looking at this picture.
  • Semantic segmentation allows objects to be divided into classes by their structure while knowing nothing about these objects, that is, even before they are recognized.
  • Semantic boundary detection is the detection of boundaries, divided into classes.
  • Recognition of human body parts.
  • And the highest-level task is the recognition of the objects themselves, which we will now consider using face recognition as an example.

Face Recognition

The first thing we do is run a face detector on the image in order to find the face. Next, we normalize and center the face and send it for processing to the neural network. After that we get a set, or vector, of features that uniquely describes the features of this face.

Then we can compare this feature vector with all the feature vectors stored in our database and get a reference to a specific person: their name, their profile, whatever is stored in the database.
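A minimal sketch of such a comparison, assuming cosine similarity between embeddings; the vector length, the similarity measure and the profile names are assumptions for illustration, not details of the product described below.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical 128-dimensional feature vectors produced by the network
query = np.random.default_rng(0).normal(size=128)
database = {
    "person_a": np.random.default_rng(1).normal(size=128),
    "person_b": np.random.default_rng(2).normal(size=128),
}

scores = {name: cosine_similarity(query, vec) for name, vec in database.items()}
best_match = max(scores, key=scores.get)   # identification: the closest profile in the database
```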

It is thus our product FindFace works - this is a free service that helps look for people's profiles in the Basic "VKontakte".

In addition, we have an API for companies that they want to try our products. We provide services for detecting persons, verification and user identification.

We now have 2 scenarios. The first is identification, search in the database. The second is verification, this is a comparison of two images with a certain probability that this is the same person. In addition, we have now in developing emotion recognition, image recognition on video and Liveness Detection is an understanding, whether a person lives in front of the camera or a photo.

Some statistics. When identifying, when searching for 10 thousand photos, we have accuracy of about 95% depending on the quality of the base, 99% verification accuracy. And besides this, this algorithm is very resistant to change - we are not necessarily looking into the chamber, we can have some blinking items: glasses, sunglasses, beard, medical mask. In some cases, we can even defeat such incredible difficulties for computer vision, like glasses, and mask.

The search is very fast: processing 1 billion photos takes 0.5 seconds, thanks to a unique fast search index we have developed. We can also work with low-quality images obtained from CCTV cameras, and we can process all of this in real time. You can upload photos through a web interface or via Android and iOS and search among 100 million users and their 250 million photos.

As I said, we took first place in the MegaFace competition, an analogue of ImageNet but for face recognition. It has been held for several years; last year we were the best among 100 teams from all over the world, including Google.

Recurrent neural networks

We use recurrent neural networks when recognizing a single image is not enough. In cases where it is important to follow a sequence, where we need the order of what is happening, we use ordinary recurrent neural networks.

They are used for natural language recognition and video processing, and they are even used for image recognition.

I will not talk about natural language recognition: after my talk there will be two more devoted to it. So I will explain how recurrent networks work using emotion recognition as an example.

What are recurrent neural networks? They are roughly the same as ordinary neural networks, but with feedback. The feedback is needed to pass the previous state of the system to the input of the neural network or to some of its layers.

Suppose we are processing emotions. Even in a smile, one of the simplest emotions, there are several stages: from a neutral facial expression to the moment of a full smile, and they flow into one another. To understand this well, we need to be able to observe how it happens and to pass what was in the previous frame into the next step of the system.
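
Here is a minimal sketch of that feedback idea, in plain NumPy and with made-up dimensions: each frame's feature vector is mixed with the hidden state left over from the previous frame, so the final state summarizes the whole sequence rather than a single image.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim, hidden_dim = 64, 32          # assumed sizes of frame features / hidden state

W_in  = rng.standard_normal((hidden_dim, feature_dim)) * 0.1
W_rec = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
b     = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One recurrent step: mix the current frame's features with the previous state."""
    return np.tanh(W_in @ x_t + W_rec @ h_prev + b)

# A short "video": a sequence of per-frame feature vectors (e.g. produced by a CNN).
frames = [rng.standard_normal(feature_dim) for _ in range(10)]

h = np.zeros(hidden_dim)                  # initial state: nothing has been seen yet
for x_t in frames:
    h = rnn_step(x_t, h)                  # the feedback: h is passed back in at every frame

print("final state describing the whole sequence:", h[:5])
```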

In 2005, a team from Montreal built a recurrent system that looked very simple for the Emotion Recognition in the Wild challenge. It had only a few convolutional layers and worked exclusively with video. This year they also added audio recognition, combining the frame-by-frame data obtained from convolutional neural networks with the audio data in a recurrent neural network (with state feedback), and took first place in the competition.

Reinforcement learning

The next type of neural network, which has been used very often lately but has not received as much publicity as the previous two types, is deep reinforcement learning.

The thing is that in the previous two cases we use datasets: we have either face data, image data, or emotion data from videos. What if we don't have them, what if we cannot collect them: how do we teach a robot to pick up objects? We humans do this automatically and do not even know how it works. Another example: compiling large datasets for computer games is difficult, and unnecessary, since it can be done much more simply.

Everyone has probably heard about the successes of deep reinforcement learning in Atari and in Go.

Who has heard about Atari? Well, some of you, good. Everyone has heard about AlphaGo, I think, so I will not even explain what exactly happens there.

What happens in Atari? The architecture of this neural network is shown on the left. It learns by playing against itself in order to get the maximum reward. The maximum reward is the best possible outcome of the game with the highest possible score.

At the top right is the last layer of the neural network, depicting the whole set of game states of a network that played against itself for just two hours. Red shows the desired outcomes of the game with the maximum reward, and blue shows the undesired ones. The network builds a kind of field and moves through its trained layers toward the state it wants to reach.
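
The reward-maximization principle can be sketched without any game engine. Below is a tabular Q-learning update, a heavily simplified stand-in for the deep Q-network used for Atari; the states, actions and rewards are made up for the illustration.

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.99          # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """Move Q(s, a) toward the observed reward plus the best predicted future reward."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Toy experience tuples (state, action, reward, next_state) instead of real Atari frames.
for s, a, r, s_next in [(0, 1, 0.0, 1), (1, 0, 0.0, 2), (2, 1, 1.0, 3)]:
    q_update(s, a, r, s_next)

print(Q)
```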

In robotics the situation is a little different. Why? Here we have several difficulties. First, we do not have that many datasets. Second, we need to coordinate three systems at once: the robot's perception, its actions with its manipulators, and its memory of what was done in the previous step and how. All of this is very difficult.

The fact is that no neural network, even deep learning, can currently cope with this task efficiently enough, so deep learning covers only small pieces of what is needed to build robots. For example, Sergey Levine recently presented a system that teaches a robot to grasp objects.

Shown here are the experiments he ran on his 14 robotic manipulators.

What is going on here? In the bins you see in front of you there are various objects: pens, erasers large and small, rags of different textures and different stiffness. It is unclear how to teach a robot to grasp them. For many hours, and even, it seems, weeks, the robots trained to grasp these items, and datasets were compiled along the way.

Datasets are a kind of response from the environment that we need to accumulate in order to be able to train a robot to do something in the future. Later, robots will be trained on this accumulated set of system states.

Non-standard use of neural networks

Unfortunately, we are nearing the end and I do not have much time. I will talk about the non-standard solutions that exist now and which, according to many forecasts, will find certain applications in the future.

So, Stanford scientists recently came up with a very unusual application of convolutional neural networks: poverty prediction. What did they do?

In fact, the idea is very simple. The point is that in Africa the level of poverty exceeds all imaginable and unimaginable limits. There is not even the possibility of collecting socio-demographic data. As a result, since 2005 there have been no data at all about what is happening there.

The scientists collected daytime and nighttime maps from satellites and fed them to their neural network for some time.

The neural network was pretrained on ImageNet. That is, the first filter layers were already configured to recognize simple things, for example the roofs of houses, so as to find settlements on the daytime maps. Then the daytime maps were matched against the nighttime maps: the illumination of the same patch of the surface indicates how much money the population has, at least enough to light their homes at night.
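
As a rough illustration of the second step (an assumption about the setup, not the authors' code): once the pretrained network has turned each daytime tile into a feature vector, a simple regressor can be fitted from those features to the night-time light intensity of the same tile. Here random arrays stand in for both the features and the luminosity values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: feature vectors extracted from daytime satellite tiles by a
# CNN pretrained on ImageNet (random stand-ins here), and the average
# night-light intensity of the same tiles.
n_tiles, n_features = 200, 512
day_features = rng.standard_normal((n_tiles, n_features))
night_light  = rng.random(n_tiles)

# Ridge-style least squares: learn to predict night-time luminosity from daytime features.
lam = 1.0
A = day_features.T @ day_features + lam * np.eye(n_features)
w = np.linalg.solve(A, day_features.T @ night_light)

predicted = day_features @ w
print("training RMSE:", np.sqrt(np.mean((predicted - night_light) ** 2)))
```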

Here you see the results of the forecast built by the neural network. The forecast was made at different resolutions. And the last frame shows the real data collected by the Ugandan government in 2005.

It can be seen that the neural network produced a fairly accurate forecast, even with a small shift relative to 2005.

Of course there were side effects. Scientists working on deep learning are always surprised to discover various side effects. For example, the network learned to recognize water, forests, large construction sites, and roads, all without supervision and without pre-built datasets, entirely on its own. There were individual layers that reacted, for example, to roads.

And the last application I would like to talk about is semantic segmentation of 3D images in medicine. In general, medical imaging is a difficult area to work with.

There are several reasons for this.

  • We have very few datasets. It is not easy to find an image of a brain, let alone a damaged one, and it is also impossible to simply take one.
  • Even if we have such an image, a physician has to annotate all the multi-layer images manually, which is very slow and extremely inefficient. Not all doctors have the resources for this.
  • Very high accuracy is required. A medical system cannot afford to be wrong. If, when recognizing cats, one is missed, nothing terrible happens; but if we fail to recognize a tumor, that is very bad. The reliability requirements here are especially fierce.
  • The images consist of three-dimensional elements, voxels, rather than pixels, which creates additional difficulties for system developers.
How was this problem worked around in this case? The CNN was split into two pathways: one part processed the image at normal resolution, the other at a somewhat degraded resolution, in order to reduce the number of layers that have to be trained. Thanks to this, training time was somewhat reduced.
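
A minimal sketch of that two-pathway input (made-up sizes, plain NumPy, not the actual architecture): the same voxel volume is fed once at full resolution and once after average pooling, so the low-resolution pathway provides context with far fewer voxels to process.

```python
import numpy as np

def avg_pool_3d(volume, factor=2):
    """Downsample a voxel volume by averaging non-overlapping blocks."""
    d, h, w = (s // factor for s in volume.shape)
    v = volume[:d * factor, :h * factor, :w * factor]
    return v.reshape(d, factor, h, factor, w, factor).mean(axis=(1, 3, 5))

volume = np.random.rand(64, 64, 64)        # a toy 3D scan in voxels

full_res_input = volume                     # pathway 1: normal resolution
low_res_input  = avg_pool_3d(volume, 2)     # pathway 2: degraded resolution

print(full_res_input.shape, low_res_input.shape)   # (64, 64, 64) (32, 32, 32)
```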

Where it is applied: to assess damage after an injury, to search for brain tumors, and in cardiology to determine how the heart is working.

Here is an example of determining the volume of the placenta.

It works automatically and reasonably well, but not well enough to be released into production, so this is only the beginning. There are several startups building such medical vision systems. In general, there will be many deep learning startups in the near future. It is said that venture capitalists allocated more budget to deep learning startups in the last six months than in the previous five years.

This area is developing actively, and there are many interesting directions. We live in interesting times. If you work in deep learning, then you probably ought to open your own startup.

Well, on that note I will probably wrap up. Thank you very much.

Neural networks (NNs) are one of the newest scientific approaches to studying the behavior of markets. The idea of a neural network is to model (reproduce) the behavior of various processes on the basis of historical information.

Recently, active attempts have been made to combine artificial neural networks and expert systems. In such a system, the artificial neural network can respond to the majority of relatively simple cases, while all the others are passed on to the expert system. As a result, complex cases are handled at a higher level, where additional data can be gathered or experts can even be involved.

The structure of a neural network is chosen in accordance with the features and complexity of the task. If the task cannot be reduced to any of the known types, the developer has to solve the difficult problem of synthesizing a new configuration.

A neural network itself is a set of special mathematical functions with many parameters that are tuned in the course of training on past data. The trained neural network then processes real input data and gives its forecast of the future behavior of the system under study. The main drawback of programs based on neural networks is the difficulty of training the network properly and avoiding overfitting, which can strongly affect the adequacy of the market model.

The advantage of neurocomputing is the unified principle of training a neural network: minimizing the empirical error. The error function that evaluates a given network configuration is set from outside, depending on the goal being pursued. The network then gradually modifies its configuration, the state of all its synaptic weights, so as to minimize this error. As a result, in the course of training the network copes better and better with the task assigned to it.
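
As a minimal illustration of this principle, the sketch below adjusts the weights of a single linear neuron by gradient descent so that the empirical (mean squared) error on made-up data decreases; it is only a toy, not a full neuro-package.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 3))                 # inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(100)   # targets with noise

w = np.zeros(3)                                   # synaptic weights to be adjusted
lr = 0.1
for step in range(200):
    error = X @ w - y                             # empirical error on the training data
    grad = X.T @ error / len(y)                   # gradient of the mean squared error
    w -= lr * grad                                # move the weights to reduce the error

print("learned weights:", w)                      # should approach true_w
```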

There are many different learning algorithms, which fall into two large classes: deterministic and stochastic. In the former, the adjustment of the weights follows a rigid sequence of actions; in the latter, it is made on the basis of actions obeying some random process.

Multilayer neural networks (several single-layer networks connected to each other) began to be used much later than single-layer ones, because no techniques for training such networks existed earlier. Multilayer networks can recognize more complex objects, that is, they have better approximating abilities than single-layer ones. Even a three-layer neural network can recognize any image! If a recurrent neural network (with feedback between layers) is created, the network begins to work independently: to train such a network it is enough to feed it an input signal, and it will be able to classify the object defined by those signals.

In the economic sphere, neural networks running on neurocomputers are used to solve the following tasks: forecasting time series by neural processing methods (exchange rates, demand, stock quotes, etc.); insurance activities of banks; predicting bankruptcies with a neural network system; determining the prices of bonds and shares of enterprises for the purpose of investing in them; applying neural networks to stock-market tasks; and forecasting the economic effectiveness of financing economic and innovative projects.
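
The first of these tasks, time-series forecasting, can be illustrated with a small sketch: a tiny two-layer network is trained to predict the next value of a made-up, standardized series (think of an exchange rate or a budget indicator) from a window of its previous values. The sizes and learning rate are arbitrary choices for the example, not a recommended configuration.

```python
import numpy as np

rng = np.random.default_rng(3)
raw = np.cumsum(rng.standard_normal(300))         # a made-up "exchange rate" series
series = (raw - raw.mean()) / raw.std()           # standardize for stable training
window = 5                                        # how many past values feed the forecast

# Build (lagged values -> next value) training pairs.
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

# A tiny two-layer network trained by gradient descent on the mean squared error.
W1 = rng.standard_normal((window, 8)) * 0.1
b1 = np.zeros(8)
W2 = rng.standard_normal(8) * 0.1
b2 = 0.0
lr = 0.05

for epoch in range(2000):
    h = np.tanh(X @ W1 + b1)                      # hidden layer
    pred = h @ W2 + b2                            # forecast of the next value
    err = pred - y
    W2_grad = h.T @ err / len(y)
    b2_grad = err.mean()
    h_grad = np.outer(err, W2) * (1.0 - h ** 2)   # backpropagate through tanh
    W1_grad = X.T @ h_grad / len(y)
    b1_grad = h_grad.mean(axis=0)
    W1 -= lr * W1_grad; b1 -= lr * b1_grad; W2 -= lr * W2_grad; b2 -= lr * b2_grad

next_value = np.tanh(series[-window:] @ W1 + b1) @ W2 + b2
print("forecast of the next (standardized) value:", next_value)
```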

All calculations in neural networks are carried out with specially developed application packages. Neuro-packages have by now become a more or less classic tool in the field of new computing and information technologies, so many firms developing new products use neuro-technologies. Having such a product, you install it, then train it and run it. The packages themselves are updated several times a year, so all of them are quite up to date.

Neural network application packages developed by a number of companies allow users to work with different types of neural networks and different ways of training them. They can be either specialized (for example, for predicting stock prices) or fairly universal.

In particular, there is the Statistica Neural Networks package. A notable advantage of this package is that it is naturally integrated into the huge arsenal of statistical analysis and data visualization methods of the Statistica system.

NeuroShell DayTrader is the best-known program for building neural networks for market analysis. In addition to neural networks, it contains classical tools and indicators of technical analysis, and it understands the MetaStock format.

Excel Neural Package is a Russian program for creating neural networks and analyzing them in Microsoft Excel.
