
The long-delayed project of the decade: a study of the AMD Bulldozer processor architecture

Sysadmin (short for system administrator) is the abbreviated name of a profession. It has recently become very popular among young people, and not only among them: people train for it, work in it, and are paid well for it. This is due to the rapid development of computer technology and its penetration into every sphere of human life. The word "sysadmin" is used everywhere: in everyday speech, in job postings, and in resumes. The rest of this section describes what the profession involves.

Today, practically anyone who services and maintains a working computer network, including all of its hardware and/or software components, can be called a system administrator. Those components may include:

  • Personal computers such as workstations and servers;
  • Network equipment, such as switches, routers, firewalls and much more;
  • Web servers, mail servers, database servers, and others.

In some cases the system administrator is also responsible for ensuring proper information security.

Depending on specialization, a system administrator may be engaged in the following activities:

  • Workstation and server administration. This most often means maintaining both hardware (failed motherboards, burnt-out power supplies) and software (Windows will not boot, commas will not print in Word, and so on).
  • Administration of a corporate network based on an Active Directory domain. This is a very common job, given the prevalence of Windows operating systems and the need to manage them centrally. Such a specialist should be able to create and edit users, assign them to groups, and grant them the appropriate rights in the AD domain, and should also be able to manage group policies for users, their computers, and the groups they belong to.
  • Administration of networks and network equipment. The responsibilities include knowing network topologies, working with both unmanaged and managed network equipment, planning a local area network, and joining several geographically distant workplaces into one network by setting up NAT and VPNs. Access control within the network and beyond it, including proxy configuration, should not be forgotten either.
  • Web server administration. The administrator should at least be able to install, configure, and maintain one of the following web servers: Apache, IIS, or Nginx, and look after the hosting (which may be located inside the organization's network or outside it). In addition, a good administrator should be able to configure proper resource distribution under high load, clustering, and many other specific things.
  • Mail server administration. This is also a common task for a sysadmin. It involves working with popular solutions such as Exim, Microsoft Exchange, Postfix, and Sendmail, or with corporate mail services from Google or, for example, Yandex. Besides the obvious account management (creation, deletion, configuration), the administrator must also be able to configure an antispam system and so on.
  • Site administration. These duties may amount to simply filling the site with content, but since we are talking about a system administrator, in theory they should also be able to configure hosting (including the web server mentioned above) and install and configure the site itself, for example a content management system (CMS).
  • Video surveillance. Creating or maintaining a video surveillance system falls to the system administrator fairly rarely. The tasks include installing and configuring cameras, responding to various events, and saving and playing back recordings. This is only loosely related to system administration and usually comes as an add-on to other duties.

Beyond the categories of system administrator described above, there are also database administration (Microsoft SQL Server, MySQL and its many forks, Oracle, etc.), 1C administration (not to be confused with a "1C programmer"), PBX administration, and much more.

The recent announcement of the newest AMD processors became one of the most notable events of the current year. The tense anticipation, fueled by numerous leaks and secret slides, gripped not only fans of the white-and-green camp but also users of competing products. The leaked performance figures were highly contradictory: from an overwhelming advantage over competitors to complete failure. No one will argue with the statement that the Stars microarchitecture underlying all current AMD desktop solutions is badly outdated. The capabilities of the heirs of the legendary K8, the AMD Phenom II and Athlon II processors, no longer meet modern demands. That is why bringing processors based on the fundamentally new Bulldozer architecture to market was so necessary. It would allow AMD to compete with, or even overtake, competing solutions in performance and energy efficiency. The speed advantage is supposed to come from a fundamentally new eight-core architecture, while the finer 32 nm process, together with advanced control of the voltages and frequencies of individual functional blocks, promises a significant reduction in power consumption compared with the previous generation.

Finally, on October 12, the veil of secrecy was lifted: that day saw the long-awaited announcement of the AMD FX processors, which are based on the Bulldozer microarchitecture. The chipmaker presented a whole line of CPUs built on this microarchitecture, including four-, six-, and eight-core models. AMD also revived the "FX" trademark, which in the past was carried by products for enthusiasts. Indeed, all AMD FX processors of the current generation have an unlocked multiplier, which in theory should make them attractive to overclockers. By varying the number of functional blocks and the operating frequencies, AMD managed to cover almost all major market niches, from low-cost gaming systems to configurations in the upper price range. The full lineup of the newest AMD processors, compared with the four- and six-core Phenom II, looks like this:

| Parameter | FX 8150 | FX 8120 | FX 6100 | FX 4100 | Phenom II X6 | Phenom II X4 |
|---|---|---|---|---|---|---|
| Core | Zambezi | Zambezi | Zambezi | Zambezi | Thuban | Deneb |
| Socket | AM3 / AM3+ | AM3 / AM3+ | AM3 / AM3+ | AM3 / AM3+ | AM2+ / AM3 | AM2+ / AM3 |
| Process technology, nm | 32 | 32 | 32 | 32 | 45 | 45 |
| Transistors, million | 2000 | 2000 | 2000 | 2000 | 904 | 758 |
| Die area, mm² | 315 | 315 | 315 | 315 | 346 | 243 |
| Number of cores | 8 | 8 | 6 | 4 | 6 | 4 |
| Nominal frequency, MHz | 3600 | 3100 | 3300 | 3600 | 2600-3300 | 3200-3700 |
| Turbo Core frequency, MHz | 3900/4200* | 3400/4000* | 3600/3900* | 3700/3800* | 3100-3700 | - |
| NB frequency, MHz | 2200 | 2200 | 2200 | 2200 | 2000 | 2000/1800 |
| L1 cache, KB | 16 x 8 + 64 x 4 | 16 x 8 + 64 x 4 | 16 x 6 + 64 x 3 | 16 x 4 + 64 x 2 | 128 x 6 | 128 x 4 |
| L2 cache, KB | 2048 x 4 | 2048 x 4 | 2048 x 3 | 2048 x 2 | 512 x 6 | 512 x 4 |
| L3 cache, MB | 8 | 8 | 8 | 8 | 6 | 6 |
| Multiplier | 18 | 15.5 | 16.5 | 18 | 13-16.5 | 16-18.5 |
| Memory channels | 2 | 2 | 2 | 2 | 2 | 2 |
| Supported memory type | DDR3 1333/1600/1866 | DDR3 1333/1600/1866 | DDR3 1333/1600/1866 | DDR3 1333/1600/1866 | DDR2 800/1066, DDR3 1333/1600 | DDR2 800/1066, DDR3 1333/1600 |
| Bus to chipset | HyperTransport 3.1 | HyperTransport 3.1 | HyperTransport 3.1 | HyperTransport 3.1 | HyperTransport 3.0 | HyperTransport 3.0 |
| HyperTransport frequency, MHz | 5200 | 5200 | 5200 | 5200 | 4000 | 4000 |
| Operating voltage, V | 0.825-1.4 | 0.825-1.4 | 0.825-1.4 | 0.825-1.4 | 0.825-1.4 | 0.825-1.4 |
| TDP, W | 125 | 125 | 95 | 95 | 125 | 125 |
| Recommended price, $ | 245 | 205 | 165 | 115 | 165-205 | 117-185 |
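The multiplier and the nominal clock frequency are tied together: the core clock is the multiplier times the reference clock. Below is a minimal sketch of that arithmetic. The 200 MHz reference clock is an assumption (the table does not state it), and the function name is hypothetical.

```python
# Assumption: a 200 MHz reference clock; the table itself does not state this value.
REFERENCE_CLOCK_MHZ = 200


def core_frequency_mhz(multiplier: float,
                       reference_mhz: float = REFERENCE_CLOCK_MHZ) -> float:
    """Core clock = multiplier x reference clock."""
    return multiplier * reference_mhz


# Multipliers for the four FX models, taken from the table.
multipliers = {"FX 8150": 18, "FX 8120": 15.5, "FX 6100": 16.5, "FX 4100": 18}

for model, mult in multipliers.items():
    print(f"{model}: {core_frequency_mhz(mult):.0f} MHz")
```

Because the FX multiplier is unlocked, overclocking amounts to raising the first argument while the reference clock stays where it is.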

Setting aside the number of cores, the FX processors gained over their predecessors a faster HyperTransport 3.1 bus, support for high-speed DDR3-1866 memory, and a level 3 cache enlarged to 8 MB. Note also the fairly high clock frequencies, which come close to, and in some cases even cross, the 4000 MHz mark. Going by the recommended prices, the quad-core FX 4100 must compete with dual-core Sandy Bridge chips and the lower-end Phenom II X4; the rivals of the six-core FX 6100 will be the lower-end Core i5 models and the six-core Phenom II X6. The eight-core FX 8120 and FX 8150 play in the "major league", ruled by the higher-end Core i5 and Core i7, which have so far shown a magnificent level of performance. As you can see, the positioning of the new AMD FX processors obliges them to keep up with very serious rivals, so the newcomers will have a hard time!

The Bulldozer microarchitecture: structure and operation

First of all, note that the AMD FX chips are pure central processors and do not include a graphics core. One could, of course, accuse AMD of inconsistency here, since promoting the APU (Accelerated Processing Unit) is one of the company's main strategic initiatives. Instead of a built-in video adapter, users get full AMD FX compatibility with the high-performance Socket AM3/AM3+ platform, for which many excellent motherboards have been released and which supports all current expansion options. AMD has released an updated 9-series chipset family specifically for the FX processors.


Recall the main features of the flagship AMD 990FX chipset. It allows building AMD CrossFireX and NVIDIA SLI graphics configurations and, thanks to the SB950 southbridge, supports the SATA 6 Gb/s standard, but it lacks native support for USB 3.0 devices. As for Socket AM3 motherboards based on chipsets of previous generations, after a firmware update they should also support Bulldozer, but that depends on the specific model.

One of the key features of processors based on the Bulldozer microarchitecture is the move to a 32 nm lithographic process, which the main competitor, Intel, has been using very successfully for almost two years. Besides the potential reduction in power consumption and better overclocking potential, this has a positive effect on the production cost of the dies. AMD can no longer be called a newcomer to the 32 nm process: the fairly successful Llano APUs are manufactured on it, and although they have not won recognition among enthusiasts, they are great for building inexpensive, compact all-purpose PCs. Thanks to modern production technology, the chip turned out very compact even though it contains almost 2000 million transistors. The eight-core AMD FX 8150 has a die area of only 315 mm², which is less than that of the previous-generation flagship, the Phenom II X6, whose die occupies 346 mm². However, AMD FX processors are still far from the figures of the quad-core Sandy Bridge, whose die, despite the built-in graphics accelerator, takes only 216 mm².

The main innovations in the Bulldozer microarchitecture concern the way multi-threaded computation is performed. For a long time, central processors could execute only a single thread at any given moment. The apparent simultaneous operation of several programs was achieved with an interrupt handler: the tasks of different applications each received short-term access to processor resources in turn. This is what made multitasking operating systems possible. Needless to say, the speed in this mode was low. Meanwhile, CPU developers noticed that under load some functional blocks of the processor may simply sit idle while others are busy processing the thread. This led them to the idea of sharing the same processor resources among several threads. Intel implemented this capability in its processors under the name Hyper-Threading back in 2002. The technique gives some gain in certain types of tasks. AMD's approach to multi-threaded computing, by contrast, remained unchanged for a long time: each thread should run on a separate core. Now, after optimizing the performance of individual processor units and carefully analyzing their load, AMD's developers concluded that some units are fast enough to serve two independent threads at once. This approach greatly reduced the number of transistors while maintaining high performance. With growing demands for speed at acceptable power consumption, developers are forced to look for ways to increase the number of instructions executed per clock.

So, at the heart of every AMD FX processor is a semiconductor die consisting of four compute modules, each equipped with its own level 2 cache array, plus a level 3 cache with a total capacity of 8 MB, a dual-channel DDR3 memory controller, HyperTransport bus controllers, and a built-in northbridge.


Obviously, the lower-end models are obtained from full chips by disabling individual functional blocks. Looking at the structure of the Zambezi die, one gets the impression that this is an ordinary quad-core processor. In fact it is not, and this is best demonstrated by the structure of the compute module, the building block of AMD FX processors.

The compute module includes two integer execution blocks (ALU), each capable of executing up to four instructions per clock and equipped with its own level 1 cache. All other blocks, such as the branch predictor, the instruction decoder, the instruction buffers, and the 2 MB level 2 cache array, exist in a single instance. Evidently the developers decided that the performance of these blocks is enough to serve both integer blocks.


In addition, each compute module has a floating-point unit (FPU), which has also been significantly reworked. The standard SIMD extensions have been supplemented with the SSE4.1 and SSE4.2 sets, as well as the XOP, AES, and AVX instructions, which can significantly improve speed provided that software supports them. Of particular interest is the ability to execute 256-bit AVX instructions: for this, the resources of two blocks are used at once, each capable of processing 128-bit FMAC commands. Alternatively, the FPU can execute two 128-bit AVX instructions at the same time.

As can be seen, the Bulldozer microarchitecture has very advanced computing capabilities, especially compared with AMD processors of previous generations. The price of this technological advantage, however, is the need to carefully optimize program code. Otherwise, especially in old applications, performance may fall far short of expectations.

A few words should be said about the cache organization of AMD FX, which is a champion not only in the number of cores but also in total cache capacity. As already mentioned, each integer block has a 16 KB data cache, and both data caches can be used by the FPU. To store instructions, each compute module has a separate 64 KB L1 cache, while intermediate data accumulates in the level 2 cache, which is an impressive 2 MB per module. The level 3 cache shared by all four compute modules has a capacity of 8 MB and is 64-way set associative. Because levels 2 and 3 are organized exclusively, one can speak of a combined capacity of 16 MB. It is not surprising that the Bulldozer die turned out so complex: the lion's share of the transistor budget goes to the processor's internal memory. Note that the L3 cache operates at 2000 MHz or 2200 MHz, depending on the processor model.
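The capacities quoted above can be added up directly. The sketch below does that arithmetic for a full four-module die; the constant and variable names are illustrative only.

```python
KB = 1024
MB = 1024 * KB

MODULES = 4            # compute modules on a full die
CORES_PER_MODULE = 2   # integer blocks per module

l1d_total = MODULES * CORES_PER_MODULE * 16 * KB  # one 16 KB data cache per integer block
l1i_total = MODULES * 64 * KB                     # one 64 KB instruction cache per module
l2_total = MODULES * 2 * MB                       # one 2 MB L2 per module
l3_total = 8 * MB                                 # shared by all modules

# With an exclusive organization, L2 and L3 do not hold duplicate lines,
# so their capacities can simply be summed.
l2_l3_combined = l2_total + l3_total

print(f"L1D total: {l1d_total // KB} KB")
print(f"L1I total: {l1i_total // KB} KB")
print(f"L2 + L3:   {l2_l3_combined // MB} MB")
```

The L1 totals match the "16 x 8 + 64 x 4" entry in the specification table for the eight-core models.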

As this brief description of the core design shows, the Bulldozer microarchitecture, for all its innovations, is not without drawbacks. Each compute module still has only one branch predictor, one instruction fetch unit, and one instruction decoder, which, incidentally, can process no more than four instructions per clock. We will see how AMD FX behaves in real applications, but intuition suggests that in applications that use the FPU heavily but are not optimized for the new SIMD instruction sets, the newest processors will show the performance level of quad-core models.

Besides the architecture, the power management mechanisms have also changed. Despite the larger number of transistors and the high clock frequencies, even the top eight-core AMD FX models have a thermal design power of no more than 125 W. The 32 nm process certainly played a role, keeping the nominal supply voltage at or below 1.4 V, but the main credit goes to advanced mechanisms for adjusting clock frequencies and supply voltages. The first generation of this concept was implemented in the Phenom II X6, where, under a load of no more than three threads, the frequency of three active cores could rise by 400 MHz. AMD FX processors offer a much more flexible approach to controlling the key performance parameters. Thanks to power-gating transistors, the processor's power manager can switch off entire functional blocks. When idle, a compute module together with its level 2 cache array can be completely powered down, freeing part of the TDP budget. The clock frequency and voltage of the active compute modules can then rise, and the frequency increase in Max Turbo mode reaches a solid 900 MHz. You must agree that such an aggressive automatic overclocking algorithm has not been seen before. Moreover, with a uniform load on all compute modules, the clock frequency can still rise by about 300 MHz. This is Turbo Core proper, and it stays active as long as the processor's power consumption stays within the thermal envelope. In other words, the very notion of a "nominal clock frequency" loses its original meaning for AMD FX.
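The two boost levels described above can be illustrated with a toy model. This is a simplified sketch, not AMD's actual algorithm: the function name is hypothetical, and the rule that Max Turbo needs at least half of the modules idle is an assumption made here for illustration. The 300 MHz and 900 MHz steps are the figures quoted in the text.

```python
def boosted_frequency_mhz(base_mhz: int,
                          active_modules: int,
                          total_modules: int = 4,
                          within_tdp: bool = True,
                          turbo_step_mhz: int = 300,
                          max_turbo_step_mhz: int = 900) -> int:
    """Toy model of Turbo Core / Max Turbo frequency selection."""
    if not 0 <= active_modules <= total_modules:
        raise ValueError("active_modules must be between 0 and total_modules")
    # No boost once the power budget is exhausted, or when nothing is running.
    if not within_tdp or active_modules == 0:
        return base_mhz
    # Assumption: Max Turbo requires at least half of the modules to be
    # power-gated, so that their share of the TDP budget is freed.
    if active_modules <= total_modules // 2:
        return base_mhz + max_turbo_step_mhz
    # All (or most) modules loaded: the smaller Turbo Core step applies.
    return base_mhz + turbo_step_mhz


# Example with a 3100 MHz base clock.
print(boosted_frequency_mhz(3100, active_modules=2))                    # 4000
print(boosted_frequency_mhz(3100, active_modules=4))                    # 3400
print(boosted_frequency_mhz(3100, active_modules=4, within_tdp=False))  # 3100
```

A real power manager works from measured or estimated power draw rather than a boolean flag; the flag only stands in for "consumption is still inside the thermal envelope".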


And all would be well, were it not so sad. The fact is that the Windows process scheduler is not sufficiently optimized for AMD FX processors. Two threads of one application may end up running on integer blocks of different modules, which prevents the processor from entering Max Turbo mode and requires data and instructions to be reloaded into the cache. Ideally, the operating system scheduler should take the architectural features of Bulldozer into account; then the combination of Turbo Core and Max Turbo should give the maximum benefit.
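The placement problem can be made concrete with a small sketch. It assumes that logical cores 2k and 2k+1 belong to module k; this numbering is an assumption for illustration, and both function names are hypothetical.

```python
from typing import Iterable


def module_of(core: int, cores_per_module: int = 2) -> int:
    """Map a core index to its module (assumed numbering: cores 2k and 2k+1 share module k)."""
    return core // cores_per_module


def modules_in_use(cores: Iterable[int], cores_per_module: int = 2) -> int:
    """Count how many distinct modules a set of busy cores keeps awake."""
    return len({module_of(c, cores_per_module) for c in cores})


# Two threads of one application on an eight-core (four-module) chip:
same_module = [0, 1]    # both threads share one module
split_modules = [0, 2]  # the threads land on two different modules

print(modules_in_use(same_module))    # 1 -> three modules can be powered down
print(modules_in_use(split_modules))  # 2 -> only two modules can be powered down
```

A scheduler that is aware of the module layout would prefer the first placement for two cooperating threads, since it leaves more modules idle and keeps shared data in one L2 cache.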


It is already known that the task scheduler of the upcoming Microsoft Windows 8 will be optimized for Bulldozer processors. As for today, an update for current operating systems may be released, or AMD's programmers may finally develop a "miracle driver"...

What determines a processor's performance? A formula used to be in circulation that described speed as the product of the number of instructions executed per clock and the clock frequency at which the processor operates. Now a third factor has appeared in this formula: the number of cores. A processor developer wishing to release a fast product therefore has several paths available.
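The three-factor formula can be written down as a short sketch. The function name and the sample numbers are illustrative assumptions, not measurements of any real processor.

```python
def peak_instructions_per_second(ipc: int, frequency_hz: int, cores: int) -> int:
    """Theoretical peak = instructions per clock x clock frequency x number of cores."""
    return ipc * frequency_hz * cores


# Illustrative only: two hypothetical designs that reach the same peak
# by different routes, one with many simple cores, one with fewer wide cores.
many_simple_cores = peak_instructions_per_second(ipc=2, frequency_hz=3_600_000_000, cores=8)
few_wide_cores = peak_instructions_per_second(ipc=4, frequency_hz=3_600_000_000, cores=4)

print(many_simple_cores)  # 57600000000
print(few_wide_cores)     # 57600000000
```

The equal peaks hide an important difference: the first design reaches its figure only when all eight cores are busy, which is exactly the trade-off discussed in the rest of the article.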

However, it is not that simple. Increasing the number of instructions a core executes per clock is a rather difficult task. Classic x86 code assumes sequential execution of commands, so to process them in parallel the processor needs highly efficient branch prediction and instruction reordering blocks, whose implementation requires considerable engineering effort. At the same time, a more complex microarchitecture increases the physical size of the die and limits how many cores can be added. So if a manufacturer intends to make a processor with a large number of cores, the microarchitecture should, on the contrary, be simplified. Clock frequency is not easy either. Betting on its growth again requires changes to the processor's internal blocks and a longer execution pipeline. The upshot is that for a processor to win a medal for performance, its developers must work hard on optimizing several parameters at once.

The problem is also that any chosen way of improving processor performance may succeed only in particular cases. Not all programs can work effectively with a large number of cores. Some algorithms do not lend themselves to accurate branch prediction or instruction reordering. And in some cases performance does not grow with clock frequency because there are other bottlenecks in the system.
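One standard way to quantify the first of these limits is Amdahl's law. It is not taken from the article and is shown here only as an illustration: if a fraction p of a program's work can run in parallel, the speedup on n cores is 1 / ((1 - p) + p / n). The function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# A program that is only half parallel gains little from going 4 -> 8 cores.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.5, n), 2))
```

For a program that is only 50% parallel the bound never reaches 2x regardless of core count, which is why extra cores help only software written to use them.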

Finding the optimal balance is not easy, and what should be taken as the criterion of optimality? We can only compare the performance of processors in end-user programs and choose the fastest for each particular case. Yet this does not guarantee that a different set of tests will not give completely opposite results. This lengthy introduction is here because today we are getting acquainted with the new AMD FX processor series, AMD's flagship product, widely known under the code name Zambezi. The basis of this processor is the very controversial Bulldozer microarchitecture, which has already collected quite a bouquet of unflattering reviews. But the point is not that this microarchitecture is bad. In choosing the balance of characteristics, the developers misjudged the needs of most users and put the main emphasis on the wrong factor of the "basic formula". As a result, the original idea of releasing a high-performance next-generation solution went awry, and the breakthrough that AMD adherents had been anticipating did not happen. But is this a serious and objective reason for disappointment? That is what we will discuss in this article.

⇡ Counting the cores: eight or four?

Working on a new design for high-performance processors, AMD decided to make the number of cores the cornerstone. This is a logical choice: multi-threaded software becomes more common every year, and a microarchitecture intended to last for many years should be designed around observed trends rather than the current state of the market. The eight cores of the base version of the new processor are what AMD intended to conquer the market with, a market where the maximum number of cores in available chips was limited to six. (Editor's note: this refers to desktop computers only.)

At the same time, the cores of the old K10 microarchitecture did not suit the developers. They are too large physically and, judging by Llano, are not inclined to run at high clock frequencies even after the move to the modern 32 nm process. In addition, they do not support many modern features, such as AVX instructions. So for its eight-core assembly AMD created a new microarchitecture, Bulldozer. Company representatives prefer to say that it was developed from a clean sheet, but in fact the Bulldozer cores contain many echoes of another microarchitecture introduced this year, Bobcat, which targets compact and energy-efficient devices. The kinship between Bulldozer and Bobcat is rather distant, however, and we mention it only to make the overall idea clear: Bulldozer combines many relatively simple cores.

This is not, however, a primitive arrangement of eight simple cores on one die. In that scenario the resulting processor would have very low single-threaded performance, which would be a serious problem, since there are still quite a few programs that do not split their load into several threads. So, first, the cores were optimized for high clock frequencies. Second, they were paired into dual-core modules that can share their resources to serve a single thread. The result is a rather curious design: the front end of the execution pipeline in such a dual-core module is shared, and further on the instructions are distributed between two sets of execution units.

The basis of the Bulldozer design: the so-called dual-core module

Recall that data processing in a modern processor includes several stages: fetching x86 instructions from cache, decoding them (translating them into internal macro-operations), execution, and writing back the results. The first two stages in a Bulldozer module are performed jointly for the pair of cores. Execution of integer instructions is then distributed between the two integer clusters (cores), while floating-point arithmetic is handled by a floating-point unit shared by both cores.

Bulldozer modules are designed to process four instructions per clock, and thanks to macro-op fusion, some pairs of x86 instructions can be treated by the processor as a single operation. This means that, overall, a dual-core Bulldozer module is similar in throughput to one core of a modern Intel processor, which can also process four instructions per clock and also supports macro-op fusion.

However, there are significant differences between a Bulldozer module and a Sandy Bridge core that cast doubt on their roughly equal theoretical speed. Because a module of the new AMD processors contains the remnants of two equal cores, it can reach maximum performance only when processing a pair of threads. Under a single-threaded load, its speed is limited by the number of execution units within one cluster. And given AMD's desire to simplify the individual cores, there are not many of them: one and a half times fewer than in processors with the Sandy Bridge or K10 microarchitecture. That is, two arithmetic units (ALUs) and two address generation units (AGUs).

This is the functional layout of a module built on the Bulldozer microarchitecture. Only two sets of integer execution units remain of the two cores.

The floating-point unit shared by the module is also relatively simple. It includes two 128-bit FMAC execution units, which can be combined into a single unit to process 256-bit instructions. It might seem that there are not many execution units here, especially since they are shared between a pair of cores. But they are more versatile than in previous and competing microarchitectures, which use separate multipliers and adders. Thanks to this, in certain cases when working with floating-point numbers, a dual-core Bulldozer module can deliver comparable or even higher performance than, for example, one Sandy Bridge core.

A similar idea of combining 128-bit units to handle 256-bit instructions is used in Sandy Bridge

However, the Bulldozer module should show its strongest side under a two-threaded load. A single Sandy Bridge core can also process two threads thanks to Hyper-Threading technology. But all instructions there go to one set of execution units, which in practice causes numerous conflicts. The Bulldozer module retains two independent integer clusters that can execute threads in parallel, and the total number of execution units in them exceeds that of a Sandy Bridge core by one and a half times.

On the left is the Bulldozer module; on the right, a generic competing core with Hyper-Threading support. It does not actually look much like Sandy Bridge, but the essence of the problem is conveyed.

As a result, the Bulldozer module has higher peak performance than a Sandy Bridge core, but unlocking that performance is somewhat harder. A Sandy Bridge core loads its own resources intelligently thanks to advanced internal logic, analyzing single-threaded code on its own and executing it in parallel across its full set of execution units. In Bulldozer, the task of using the execution units effectively is partly shifted to the programmer, who must split the code into two threads; only then can the full power of the module be used.

And here is what is telling. In examining the dual-core Bulldozer module, we kept comparing it with a single Sandy Bridge core, and the parallels we drew were quite fair. This makes one wonder: should the "eight cores" of the new microarchitecture be considered a product of the marketers' imagination? AMD says that cores should be counted by the number of integer clusters, arguing that a module can deliver up to 80% of the performance of two independent cores. We should not forget, however, that Bulldozer cores are significantly simpler than the cores of other processors. Therefore the number of dual-core modules is a much more adequate measure of Bulldozer's performance.
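AMD's 80% figure translates into a simple estimate of how many "independent core equivalents" a chip offers. The sketch below just does that arithmetic; the function name is hypothetical, and the result is an upper bound under AMD's own claim, not a measured value.

```python
def independent_core_equivalents(modules: int,
                                 cores_per_module: int = 2,
                                 module_scaling: float = 0.8) -> float:
    """Estimate throughput in units of fully independent cores.

    module_scaling is the claimed fraction (up to 80%) of the performance
    of two independent cores that one dual-core module delivers.
    """
    return modules * cores_per_module * module_scaling


for modules in (2, 3, 4):
    nominal = modules * 2
    print(f"{nominal} nominal cores -> "
          f"{independent_core_equivalents(modules):.1f} core equivalents")
```

By this estimate the eight-core chip behaves at best like 6.4 independent cores of the same design, which sits between the "eight" of the marketing count and the "four" of the module count.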

Find the maximum number of processor cores and get a job in AMD's marketing department

⇡ Cache memory

The cache organization in Bulldozer processors is also tied not so much to individual cores as to dual-core modules. In fact, each core has only its own L1 data cache; all other cache levels belong either to the module as a whole or to the processor:

  • Each core has its own L1 data cache. It is 16 KB in size and 4-way set associative. This cache uses a write-through policy, which implies that it is inclusive.
  • The L1 instruction cache exists in a single copy per dual-core module. It is 64 KB in size and 2-way set associative.
  • The L2 cache is likewise a single instance per module. It is an impressive 2 MB in size, 16-way set associative, and exclusive.
  • In addition, the eight-core processor as a whole has an 8 MB L3 cache with 64-way associativity. A peculiarity of this cache is that it runs at a frequency significantly lower than that of the processor itself, about 2 GHz.
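The sizes and associativities listed above determine how many sets each cache has: sets = size / (ways x line size). The sketch below computes this, assuming a 64-byte cache line; the line size is not given in the article and is an assumption here, as is the function name.

```python
KB = 1024
MB = 1024 * KB

# Assumption: 64-byte cache lines (not stated in the article).
LINE_SIZE_BYTES = 64


def cache_sets(size_bytes: int, ways: int, line_size: int = LINE_SIZE_BYTES) -> int:
    """Number of sets in a set-associative cache."""
    if size_bytes % (ways * line_size) != 0:
        raise ValueError("size must be a multiple of ways * line_size")
    return size_bytes // (ways * line_size)


# (size, associativity) pairs from the list above.
caches = {
    "L1D": (16 * KB, 4),
    "L1I": (64 * KB, 2),
    "L2": (2 * MB, 16),
    "L3": (8 * MB, 64),
}

for name, (size, ways) in caches.items():
    print(f"{name}: {cache_sets(size, ways)} sets")
```

A low way count on a cache shared by two threads, as with the 2-way L1 instruction cache here, means that lines mapping to the same set compete for very few slots.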

The following table compares the cache sizes of the eight-core Bulldozer, the quad-core Sandy Bridge, and Thuban (the six-core Phenom II X6, built on the K10 microarchitecture).

Cache type | Bulldozer (8 cores / 4 modules) | Sandy Bridge (4 cores) | Thuban (6 cores)
L1I (instructions) | 4 × 64 KB | 4 × 32 KB | 6 × 64 KB
L1D (data) | 8 × 16 KB | 4 × 32 KB | 6 × 64 KB
L2 | 4 × 2 MB | 4 × 256 KB | 6 × 512 KB
L3 | 8 MB, 2.0-2.2 GHz | 8 MB, runs at the processor frequency | 6 MB, 2.0 GHz
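
To make the comparison easier to read, the per-level figures from the table can be summed into total capacities. The sketch below is only an illustration: the dictionary layout and the helper name `total_kb` are assumptions of this example, and the numbers are copied from the table.

```python
# Cache configuration as (instance_count, size_in_KB) pairs, copied from the table.
CACHES = {
    "Bulldozer":    {"L1I": (4, 64), "L1D": (8, 16), "L2": (4, 2048), "L3": (1, 8192)},
    "Sandy Bridge": {"L1I": (4, 32), "L1D": (4, 32), "L2": (4, 256),  "L3": (1, 8192)},
    "Thuban":       {"L1I": (6, 64), "L1D": (6, 64), "L2": (6, 512),  "L3": (1, 6144)},
}


def total_kb(cpu, levels=("L1I", "L1D", "L2", "L3")):
    """Sum the capacity (in KB) of the selected cache levels for one processor."""
    return sum(count * size for level, (count, size) in CACHES[cpu].items()
               if level in levels)


for cpu in CACHES:
    upper_mb = total_kb(cpu, levels=("L2", "L3")) / 1024
    print(f"{cpu}: {total_kb(cpu)} KB in total, {upper_mb:.0f} MB in L2 + L3")
```

By this count Bulldozer carries 16 MB of combined L2 and L3, against 9 MB for each of the other two processors.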

As the table shows, AMD bet on capacious upper-level caches, which can indeed be useful under heavy multi-threaded load. However, cache memory in the new processors is, on the whole, slower than in previous and competing products. This is easy to see when measuring practical latency.

The long data access delays in Bulldozer can be compensated only by a high clock frequency. That, in fact, was the original plan: the new eight-core chips were supposed to exceed Phenom II in frequency by 30%. However, AMD was unable to design semiconductor dies capable of running stably at such high frequencies. As a result, the high cache latency can do noticeable harm to Bulldozer-based systems.

AMD processors with the fundamentally new Bulldozer architecture have been long awaited, not only by fans of the company's products but also by many users who follow progress in IT. In the past few years AMD has offered solutions with an attractive price / performance ratio while concentrating mainly on entry-level and mid-range devices. By reviving the FX line, the company clearly expects to attract more discerning enthusiasts as well, those who are ready to experiment and who demand maximum speed. We will study the capabilities of the new family using the world's first eight-core desktop processor, the AMD FX-8150. Let's see whether the manufacturer can live up to the expectations of its fans.

Unlike its main competitor, which can afford to follow a pendulum principle of CPU development and alternate new architectures and manufacturing processes on an annual schedule, AMD does not set a fixed time frame for its projects, relying instead on market conditions and its own technological potential. The story of the Bulldozer architecture began a long time ago. It was expected to be introduced back in 2009, but due to various circumstances its bold engineering solutions could be put into silicon only now.

For AMD, Bulldozer is a serious, long-term commitment. For the next few years this microarchitecture will be the basis for future processors in various segments: server, desktop and mobile. This applies to both discrete CPUs and hybrid ones, since APUs are also planned to move to Bulldozer over time. Only for compact systems does AMD intend to use chips based on the economical Bobcat and its upgraded versions. With the announcement of Bulldozer, the company decided to revive a legendary series, introducing the AMD FX line of processors, which received the new architecture and are manufactured on the most advanced 32-nanometer process.

Architecture features

Bulldozer chips are based on modules with two x86 computing blocks. The latter are not completely autonomous: some resources are shared by both cores, namely the instruction fetch unit, the instruction decoder, the FPU and the second-level cache (L2). The monolithic dual-core module can execute two threads simultaneously, but with certain reservations. According to the manufacturer's calculations, this approach is fully justified and yields about 80% of the efficiency of two full physical cores. In exchange, the number of transistors, and with it the die area and power consumption, is significantly reduced.
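
AMD's 80% figure can be turned into a rough "equivalent core" count. The following is a minimal sketch under that single assumption; the function name and the idea of a linear scaling factor are ours, not AMD's, and real workloads will deviate from it.

```python
def equivalent_full_cores(modules, module_efficiency=0.8):
    """Estimate how many fully independent cores a set of modules is worth.

    Assumes, per AMD's claim quoted above, that one dual-core module delivers
    up to `module_efficiency` of the throughput of two independent cores.
    """
    cores_per_module = 2
    return modules * cores_per_module * module_efficiency


for modules in (2, 3, 4):
    print(f"{modules} modules ~ {equivalent_full_cores(modules):.1f} full cores")
```

Under this assumption a four-module, nominally eight-core chip behaves at best like about 6.4 independent cores of the same design.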

To suit the new structure, the internal architecture was reworked very thoroughly, which affected practically all execution units. There is almost no similarity with K10, which was used for Phenom II and Athlon II chips. AMD introduced support for AVX, SSE 4.2 and AES-NI instructions and added its own FMA4 and XOP sets.

Like the top Phenom processors, FX chips received a three-level cache system. However, its organization differs noticeably from that of their predecessors. The L1 data cache shrank from 64 KB to 16 KB, while its bandwidth increased significantly. The 2 MB L2 is shared by both cores of each module. Depending on the number of modules, the total second-level cache capacity of an AMD FX processor ranges from 4 to 8 MB. Its latency has grown somewhat, which is the price of optimizing for operation at higher frequencies. Chips with the Bulldozer architecture are also equipped with an 8 MB L3 cache. Given the exclusive scheme of operation, the total buffer size is rather impressive for desktop models. The improved data prefetch algorithm gives reason to hope that the memory subsystem will be faster. As for RAM itself, FX CPUs support DDR3-1866 modules in dual-channel mode.

AMD FX chips are manufactured on a 32-nanometer process with SOI technology, the same one used for Llano APUs. The chips are produced at the facilities of the affiliated company GlobalFoundries. The CPU is based on an eight-core die with an area of 315 mm². Most of its layout is given over to cache memory, so it is not surprising that the total transistor count here is an impressive 2 billion. For comparison, the six-core Phenom II X6 (Thuban) contains "only" 904 million transistors, but because of the 45-nanometer process its die area is 346 mm². Given the difference in area, it can be assumed that FX chips cost less to produce than their predecessors. However, the transition to 32 nm has not been easy for GlobalFoundries. AMD has already reported difficulties with the yield of usable dies, because of which the company cannot fully meet demand for the hybrid Llano. Let's hope this will not affect the availability of FX and that everyone who wants one will be able to buy it.
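
The effect of the finer process is easiest to see as transistor density. The short calculation below uses only the transistor counts and die areas quoted above; the helper name is an assumption of this sketch.

```python
def density_m_per_mm2(transistors_millions, area_mm2):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_millions / area_mm2


fx = density_m_per_mm2(2000, 315)     # AMD FX, 32 nm: about 2 billion transistors
thuban = density_m_per_mm2(904, 346)  # Phenom II X6 (Thuban), 45 nm

print(f"AMD FX: {fx:.2f} M/mm2")
print(f"Thuban: {thuban:.2f} M/mm2")
print(f"Density ratio: {fx / thuban:.2f}x")
```

With these figures the FX die packs roughly 2.4 times as many transistors per unit of area as Thuban. Part of that gain comes from the large share of dense cache arrays on the die, not from the process shrink alone.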

The same die will be used for four- and six-core models, which makes it possible to put chips with certain defects to good use. It is also likely that fully working dies with deactivated modules will be used to produce these CPUs. In that case we can count on another lottery with unlocking the disabled cores. That would be an excellent way to stir up interest in AMD FX processors.

Technical characteristics of processors

Model | FX-8150 | Phenom II X6 1075T | Phenom II X4 975 | Core i7-2600K | Core i5-2500K
Code name | Bulldozer | Thuban | Deneb | Sandy Bridge | Sandy Bridge
Cores / threads | 8/8 | 6/6 | 4/4 | 4/8 | 4/4
Base clock frequency, GHz | 3.6 | 3.0 | 3.6 | 3.4 | 3.3
Clock frequency with automatic overclocking, GHz | 3.9 / 4.2 | 3.5 | - | 3.8 | 3.7
L2 / L3 cache size, MB | 8 / 8 | 6 × 0.5 / 6 | 4 × 0.5 / 6 | 4 × 0.25 / 8 | 4 × 0.25 / 6
Manufacturing process, nm | 32 | 45 | 45 | 32 | 32
Processor socket | AM3+ | AM3 | AM3 | LGA1155 | LGA1155
Power consumption (TDP), W | 125 | 125 | 125 | 95 | 95
Recommended price, $ | 245 | 181 (162*) | 175 (160*) | 317 (315*) | 216 (225*)

* According to the Hotline.ua catalog.

Turbo Core

AMD previously used Turbo Core, its dynamic frequency boost technology, in the six-core Thuban and in Llano APUs. FX processors have a new mechanism and operating algorithm for this function. When the chip's power consumption stays within its TDP and the temperature does not exceed a set value, the frequency can rise automatically (by 100-300 MHz) even when all cores are active (All Core Boost). If at least half of the modules are idle, AMD FX can switch to Max Turbo Boost mode, raising the supply voltage and, very substantially, the clock frequency of the working blocks (by up to 900 MHz).
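
The decision logic described above can be sketched as a small function. This is a simplified model built only from the conditions named in the text, not AMD's actual power management algorithm; the function name, its parameters and the 70 °C limit in the example calls are hypothetical.

```python
def turbo_core_frequency(base_ghz, all_core_ghz, max_turbo_ghz,
                         active_modules, total_modules,
                         power_w, tdp_w, temp_c, temp_limit_c):
    """Pick a clock frequency following the rules described in the text."""
    # No power or thermal headroom: stay at the base frequency.
    if power_w > tdp_w or temp_c > temp_limit_c:
        return base_ghz
    # At least half of the modules idle: Max Turbo Boost is allowed.
    idle_modules = total_modules - active_modules
    if idle_modules * 2 >= total_modules:
        return max_turbo_ghz
    # Otherwise the smaller All Core Boost applies.
    return all_core_ghz


# FX-8150 frequency formula from the article: 3.6 / 3.9 / 4.2 GHz, TDP 125 W.
# The 70 °C temperature limit is an assumed value used only for illustration.
fx8150 = dict(base_ghz=3.6, all_core_ghz=3.9, max_turbo_ghz=4.2,
              total_modules=4, tdp_w=125, temp_limit_c=70)

print(turbo_core_frequency(active_modules=4, power_w=110, temp_c=55, **fx8150))
print(turbo_core_frequency(active_modules=2, power_w=90, temp_c=50, **fx8150))
print(turbo_core_frequency(active_modules=4, power_w=130, temp_c=55, **fx8150))
```

The three calls cover a fully loaded chip with headroom, a half-idle chip, and a chip that has exceeded its TDP.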

AMD also worked on improving the power efficiency of the new chips. Given the increased number of computing cores, relying solely on the benefit of a finer manufacturing process was not an option. When neither core within a module is loaded and both have been switched to the C6 power-saving state, power-gating transistors can cut power to that module, reducing overall CPU consumption.

Chipset support

As in the previous AMD desktop platform, the PCI Express 2.0 bus controller remains the prerogative of the chipset's northbridge and has not moved under the processor's lid. The number of supported lanes of this interface, and consequently the ability to build configurations with multiple video cards, is the defining difference between the new chipsets for Zambezi chips. The top-end AMD 990FX has 42 lanes at its disposal, which can be arranged for graphics as 2 × x16 or 4 × x8. AMD 990X has 26 lanes and allows only two video cards in CrossFireX or SLI mode in a 2 × x8 configuration. AMD 970, with the same number of PCI-E lanes, offers only a single adapter. In all cases the peripherals are served by the SB950 southbridge, which brings no interesting innovations: six SATA 6 Gb/s ports with RAID support (0, 1, 5, 10), up to 14 USB 2.0 ports, and PCI support. Alas, unlike the AMD A75 chipset for the FM1 platform, it has no support for the high-speed USB 3.0 bus.

AM3+ platform

FX series processors require a motherboard with an AM3+ socket. This can be either a model on the "new" AMD 9xx chipset or a product with a chipset of previous generations. Compatibility with AM3 is theoretically possible, but guaranteed neither by AMD itself nor by motherboard manufacturers. The latter may release firmware for their top models, but these will be isolated cases. Even then, FX chips will run with slower Turbo Core and Cool'n'Quiet state switching. All possible problems with system operation will fall on the users' shoulders, so a trouble-free upgrade should not be counted on here.

AM3+ boards are easy to tell apart by their black processor socket, whereas the AM3 socket is white. Fortunately, the design of the mounting elements has not changed, so any cooler compatible with AM2 / AM2+ / AM3 is suitable for cooling AMD FX.

The lineup

[Benchmark charts: 3DMark 11, CPU (Physics) test, points; 3DMark Vantage, points; PCMark 7, Computation test, points; Cinebench 11.5, points; x264 HD Benchmark 4.0, frames/s; 7-Zip 9.20, MIPS; Far Cry 2, 1920 × 1080, DX10, high quality, frames/s; Hard Reset, 1920 × 1080, High mode, frames/s; Metro 2033, 1920 × 1080, DX11, PhysX, high quality, frames/s; Colin McRae: DiRT 3, 1920 × 1080, high quality, frames/s; Lost Planet 2, 1920 × 1080, DX11, high quality, Test B, frames/s; Crysis 2, 1920 × 1080, DX9, high quality, Downtown test, frames/s; system power consumption, W]

Thanks to the modular structure of its processors, the company can easily build out its model range by offering devices with different numbers of computing blocks and different clock frequencies. At launch, the desktop chip line, code-named Zambezi, includes four CPUs. The flagship is the eight-core FX-8150 with a frequency formula of 3.6 / 3.9 / 4.2 GHz, 8 MB each of L2 and L3 cache, and a TDP of 125 W. The FX-8120 is similarly equipped and differs only in its frequencies: 3.1 / 3.4 / 4.0 GHz. The six-core FX-6100 has 6 MB of second-level cache and the same 8 MB L3, but its thermal package is 95 W. The most affordable version, the FX-4100 with two modules and four x86 computing blocks, runs at 3.6 / 3.7 / 3.8 GHz and makes do with 4 MB of L2, a capacious L3 (8 MB) and a TDP of 95 W. As for cost, the recommended wholesale prices for the listed models are $245 / 205 / 165 / 115, respectively.
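
The L2 figures in the lineup follow directly from the module count, since each module carries 2 MB of shared L2. A minimal check is sketched below; the module counts are derived from the stated core counts (two cores per module), and the data layout is an assumption of this example.

```python
L2_PER_MODULE_MB = 2  # each Bulldozer module has 2 MB of shared L2

# model: (modules, L2 size in MB as quoted in the text)
LINEUP = {
    "FX-8150": (4, 8),
    "FX-8120": (4, 8),
    "FX-6100": (3, 6),
    "FX-4100": (2, 4),
}


def l2_total_mb(modules):
    """Total L2 capacity implied by the number of modules."""
    return modules * L2_PER_MODULE_MB


for model, (modules, quoted_l2) in LINEUP.items():
    status = "matches" if l2_total_mb(modules) == quoted_l2 else "differs from"
    print(f"{model}: {modules} modules -> {l2_total_mb(modules)} MB L2 "
          f"({status} the quoted {quoted_l2} MB)")
```

The L3 size, by contrast, stays at 8 MB for every model because it belongs to the die as a whole and not to the modules.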

Overclocking

Unhindered overclocking is one of the key features of FX chips, and AMD puts special emphasis on it. An unlocked multiplier is available on every model in the line, and it can be changed on any AM3+ board.

The FX architecture was designed from the outset to run at high clock frequencies. Enthusiasts armed with vessels of liquid nitrogen managed to capture a CPU-Z screenshot with the processor running at almost 8.5 GHz. Admittedly, only one of the four modules was active at the time. With all eight cores enabled, the chip managed to run at 8.1 GHz. Previously, comparable frequencies had been reached only by certain Intel Celeron models for LGA775. Now enthusiasts have a much more interesting subject for overclocking experiments.

With air cooling, one has to settle for more modest results. With the supply voltage raised to 1.45 V, the CPU ran stably at 4.6 GHz. That may not be so impressive, but the potential is clearly better than that of the 45-nanometer Phenom II chips.

Results

The performance test results are presented in the charts. The picture is clear enough to form a general opinion of what AMD's new design can do. As expected, FX processors gained speed in multi-threaded tasks: archiving, HD video encoding and rendering. Here the eight-core chip is fully able to compete with the Core i5-2500K and even with the more expensive Core i7-2600K. However, as soon as it comes to applications that are poorly optimized for parallel execution, AMD FX loses ground: the per-core performance of its x86 blocks is even somewhat lower than that of products with the K10 architecture. In games, which use 3-4 threads at best, Intel processors have a noticeable advantage. At maximum graphics quality settings, where the video card becomes the bottleneck, the systems' results even out, but the real potential of the CPU cannot be judged under such conditions.

The transition to the 32-nanometer process has, if anything, served to keep power consumption at the previous level while speed increased. Performance was probably the priority here, not improved CPU economy.

Judging by the pricing of AMD FX alone, it is obvious that the company plans first of all to establish itself in the mid-price category, consciously ceding the segment of expensive top-end solutions to Intel. Under current conditions the manufacturer objectively cannot compete in the "super heavyweight" league. Having bet on multi-core computing, it finds outstanding results in poorly optimized software very hard to achieve. At the same time, only five years ago we sincerely wondered who might need a quad-core processor on the desktop and how the resources of such a CPU could be used effectively. Today that is commonplace, and the benefits of chips with that many computing blocks no longer raise questions. Perhaps eight-core models will earn the same recognition some time later.

Fortunately, AMD will not sit idly by and watch what fate befalls its processors. The announced plans for further development inspire cautious, but still real, optimism. The company will keep actively refining the current architecture, improving both the energy efficiency and the performance of its CPUs, but the stated pace of 10-15% per year is not very impressive. With such figures, a fundamental change in the situation can be expected only if Intel slows the development of its own products, and there are no signs of that: the "tick-tock" mechanism has not yet failed. In the spring of 2012, Ivy Bridge chips will be introduced, made on a 22-nanometer process and using 3D transistors.

The final assessment of the architecture and of the AMD FX-8150 processor based on it is ambiguous, and that in itself suggests that no revolution has taken place. At least at this stage, it is invisible to the end user. There is a qualitative leap in performance in well-parallelized applications, whereas none is observed in single-threaded tasks. The high expectations placed on Bulldozer were justified only in part. AMD still has work to do if it wants to offer interesting solutions and compete for a place in the hearts of demanding enthusiasts.