
ASCII encoding (American Standard Code for Information Interchange) - the basic Latin text encoding

Hello, dear blog readers. Today we will talk about where garbled characters (krakozyabry, or mojibake) come from on websites and in programs, about which text encodings exist and which of them should be used. Let us consider in detail the history of their development, starting with the basic ASCII and its extended versions CP866, KOI8-R and Windows 1251, and ending with the modern encodings of the Unicode consortium, UTF-16 and UTF-8.

To some, this information may seem unnecessary, but you would be surprised how many questions I get that concern precisely this mojibake (an unreadable set of characters). Now I will be able to refer everyone to the text of this article so they can find their own mistakes. Well, get ready to absorb the information and try to follow the narration.

ASCII - the basic Latin text encoding

The development of text encodings went hand in hand with the formation of the IT industry, and during this time they managed to undergo quite a few changes. Historically, it all started with EBCDIC (rather awkward to pronounce in Russian), which made it possible to encode the letters of the Latin alphabet, Arabic numerals, punctuation marks and control characters.

Still, the starting point for the development of modern text encodings should be considered the famous ASCII (American Standard Code for Information Interchange, which in Russian is usually pronounced "aski"). It describes the first 128 characters most commonly used by English-speaking users: Latin letters, Arabic numerals and punctuation marks.

These 128 characters described in ASCII also include some service symbols such as brackets, hash signs, asterisks, and so on. Actually, you can see them yourself:

It is these 128 characters from the initial version of ASCII that became the standard, and in any other encoding you will definitely meet them, and they will occupy the same positions.

But the fact is that one byte of information can encode not 128 but as many as 256 different values (two to the power of eight equals 256), so after the basic version of ASCII a whole series of extended ASCII encodings appeared, in which, besides the 128 basic characters, national characters (for example, Russian ones) could also be encoded.

Here it is probably worth saying a little more about the number systems used in the description. First, as you all know, a computer works only with numbers in the binary system, namely with zeros and ones ("Boolean algebra", if anyone studied it at university or at school). One byte consists of eight bits, each of which represents a power of two, starting from zero and going up to two to the seventh:

It is not difficult to see that there can only be 256 possible combinations of zeros and ones in such a construction. Converting a number from the binary system to decimal is quite simple: you just need to add up all the powers of two over which there are ones.

In our example, this turns out to be 1 (2 to the power of zero) plus 8 (two to the power of 3), plus 32 (two to the fifth power), plus 64 (to the sixth), plus 128 (to the seventh). The total is 233 in the decimal number system. As you can see, everything is very simple.
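
To double-check this arithmetic, here is a minimal Python sketch (the bit string 11101001 is the byte from the example above, with ones in positions 7, 6, 5, 3 and 0):

    # Sum the powers of two that have a one above them (the rightmost bit is 2**0).
    bits = "11101001"

    value = sum(2 ** power
                for power, bit in enumerate(reversed(bits))
                if bit == "1")

    print(value)            # 233
    print(int(bits, 2))     # 233 - the built-in conversion agrees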

But if you look at a table with ASCII characters, you will see that they are presented in hexadecimal notation. For example, the "asterisk" corresponds to the hexadecimal number 2A. You probably know that in the hexadecimal number system, besides the decimal digits, the Latin letters from A (meaning ten) to F (meaning fifteen) are used.

Well, to convert binary numbers to hexadecimal, people resort to the following simple and visual method: each byte of information is split into two halves of four bits, as shown in the screenshot above. Each half of a byte can encode only sixteen values (two to the fourth power), which can easily be represented by a single hexadecimal digit.

Moreover, in the left half of the byte the powers have to be counted again starting from zero, and not as shown in the screenshot. As a result, after some simple calculations, we get that the number E9 is encoded in the screenshot. I hope that the course of my reasoning and the solution of this puzzle were clear to you. Well, now let us continue talking about text encodings.
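
The same split into two halves (nibbles) can be sketched in Python, using the byte 11101001 from the example above:

    byte = 0b11101001

    high = byte >> 4          # left nibble:  1110 -> 14 -> E
    low  = byte & 0b1111      # right nibble: 1001 ->  9 -> 9

    print(format(high, "X"), format(low, "X"))  # E 9
    print(format(byte, "02X"))                  # E9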

Extended versions of ASCII - the CP866 and KOI8-R encodings with pseudographics

So, we started talking about ASCII, which was, so to speak, the starting point for the development of all modern encodings (Windows 1251, Unicode, UTF-8).

Initially it contained only 128 characters: the Latin alphabet, Arabic numerals and a few other things, but in the extended version it became possible to use all 256 values that can be encoded in one byte of information. In other words, it became possible to add the letters of your own language to ASCII.

Here we need to digress once again to clarify why text encodings are needed at all and why it is so important. The characters on your computer screen are formed on the basis of two things: sets of vector shapes (representations) of all kinds of characters (they are located in font files) and a code that allows you to pull out of this set of vector shapes (the font file) exactly the character that needs to be inserted in the right place.

It is clear that the fonts are responsible for the vector shapes themselves, while the operating system and the programs used in it are responsible for the encoding. I.e., any text on your computer is a set of bytes, each of which encodes one single character of this text.

The program that displays this text on the screen (a text editor, a browser, etc.), when parsing the code, reads the encoding of the next character and looks for the corresponding vector shape in the required font file connected to display this text document. Everything is simple and banal.

So, to encode any character we need (for example, from a national alphabet), two conditions must be met: the vector shape of this character must exist in the font used, and the character must be representable in an extended ASCII encoding within one byte. Because of this, there is a whole bunch of such options. For encoding the characters of the Russian language alone, there are several varieties of extended ASCII.

For example, the first to appear was CP866, in which it was possible to use the characters of the Russian alphabet; it was an extended version of ASCII.

I.e., its upper part completely coincided with the basic version of ASCII (128 characters of the Latin alphabet, digits and everything else), which is shown in the screenshot a little above, while the lower part of the CP866 encoding table had the form shown in the screenshot slightly below and allowed encoding another 128 characters (Russian letters and all sorts of pseudographics):

You see, in the right column the numbers begin with 8, because the numbers from 0 to 7 refer to the basic part of ASCII (see the first screenshot). Thus, the Russian letter "М" in CP866 has the code 8C (it sits at the intersection of the row with 8 and the column with the number C in the hexadecimal system), which can be written in one byte of information; and if there is a suitable font with Russian characters, this letter will be displayed in the text without any problems.
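
As a small illustration, here is a Python sketch (Python's bundled cp866 codec is assumed) showing that the letter really fits into a single byte:

    # The Cyrillic capital letter "М" occupies exactly one byte in CP866.
    data = "М".encode("cp866")

    print(data)                   # b'\x8c' -> code 8C, one byte
    print(len(data))              # 1
    print(data.decode("cp866"))   # М - restored, provided we decode with the same table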

Where did such a quantity of pseudographics in CP866 come from? The thing is that this encoding for Russian text was developed back in those shaggy years when graphical operating systems were nowhere near as widespread as they are now. And in DOS and similar text-mode operating systems, pseudographics made it possible to at least somehow diversify the design of texts, which is why CP866, as well as all its other peers from the category of extended ASCII versions, abounds with it.

CP866 was distributed by IBM, but besides it a number of other encodings were developed for Russian characters; for example, KOI8-R can be attributed to the same type (extended ASCII):

The principle of its operation remained the same as that of the CP866 described a little earlier: each text character is encoded by one single byte. The screenshot shows the second half of the KOI8-R table, because the first half fully corresponds to the basic ASCII, which is shown in the first screenshot in this article.

Among the features of the KOI8-R encoding, it can be noted that the Russian letters in its table are not in alphabetical order, as was done, for example, in CP866.

If you look at the very first screenshot (of the basic part, which is included in all extended encodings), you will notice that in KOI8-R the Russian letters are located in the same cells of the table as the similar-sounding letters of the Latin alphabet from the first part of the table. This was done for the convenience of switching from Russian characters to Latin ones by discarding just one bit (two to the seventh power, or 128).
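
A tiny Python sketch (assuming the standard koi8-r codec) shows this trick with discarding the eighth bit:

    # In KOI8-R the Russian letter sits 128 positions above the Latin letter
    # that sounds like it, so clearing bit 7 maps one onto the other.
    code = "б".encode("koi8-r")[0]   # lowercase Russian "b"

    print(hex(code))                 # 0xc2
    print(chr(code & 0x7F))          # 'B' - the similar-sounding Latin letter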

Windows 1251 - a modern version of extended ASCII, and why mojibake appears

The further development of text encodings was due to the fact that graphical operating systems were gaining popularity and the need to use pseudographics in them disappeared. As a result, a whole group of encodings arose that, in essence, were still extended versions of ASCII (one text character is encoded with just one byte of information), but without the use of pseudographic characters.

They belonged to the so-called ANSI encodings, developed by the American National Standards Institute. In everyday speech, the name "Cyrillic" was also used for the variant with Russian language support. An example of such an encoding is Windows 1251.

It differed favourably from the previously used CP866 and KOI8-R in that the place of the pseudographic characters in it was taken by the missing symbols of Russian typography (apart from the accent mark), as well as characters used in Slavic languages close to Russian (Ukrainian, Belarusian, etc.):

Because of such an abundance of encodings for the Russian language, font manufacturers and software manufacturers constantly had headaches, and you, dear readers, often got that very notorious mojibake whenever the version used in the text got mixed up.

Very often it appeared when sending and receiving messages by e-mail, which led to the creation of very complex conversion tables that, in fact, could not solve this problem at the root; and users often wrote their correspondence in transliteration (Latin letters) in order to avoid the notorious mojibake that came with Russian encodings such as CP866, KOI8-R or Windows 1251.

In essence, the mojibake that appears instead of Russian text is the result of using the wrong encoding for this language, one that does not match the encoding in which the text message was originally encoded.

Suppose you try to display characters encoded with CP866 using the Windows 1251 code table; then this very mojibake (a meaningless set of characters) will appear, completely replacing the text of the message.
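
A short Python sketch of exactly this mix-up (the word "Привет" is an arbitrary example):

    # Bytes written in one Russian encoding but read with another table turn into mojibake.
    original = "Привет"

    wrong = original.encode("cp866").decode("cp1251")
    print(wrong)   # a meaningless jumble (something like 'ЏаЁўҐв') instead of the word

    # Decoding with the table the text was actually written in restores it.
    print(original.encode("cp866").decode("cp866"))   # Привет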

A similar situation very often occurs on forums or blogs, when text with Russian characters is mistakenly saved not in the encoding used on the site by default, or is created in a text editor that adds its own extras to the code, invisible to the naked eye.

In the end, many people got tired of this situation with a multitude of encodings and constantly creeping mojibake, and the prerequisites appeared for creating a new universal variation that would replace all the existing ones and finally solve the problem of unreadable texts at the root. In addition, there was the problem of languages such as Chinese, which have far more than 256 characters.

Unicode - the universal encodings UTF-8, 16 and 32

Those thousands of characters of the Southeast Asian language group could not possibly be described in the one byte of information allotted for encoding characters in the extended versions of ASCII. As a result, a consortium called Unicode (the Unicode Consortium) was created through the collaboration of many leaders of the IT industry (those who produce software, who design hardware, who create fonts) who were interested in the emergence of a universal text encoding.

The first variation released under the auspices of the Unicode Consortium was UTF-32. The number in the encoding's name is the number of bits used to encode one character. 32 bits are 4 bytes of information, which is what is needed to encode one single character in the new universal UTF encoding.

As a result, the same file with text encoded in an extended version of ASCII and in UTF-32 will, in the latter case, have a size (weight) four times larger. This is bad, but now with UTF we can encode a number of characters equal to two to the thirty-second power (billions of characters, which covers any really needed value with a colossal margin).

But many countries with languages of the European group did not need such a huge number of characters in an encoding at all, yet when using UTF-32 they received, for nothing, a fourfold increase in the weight of text documents and, as a result, an increase in Internet traffic and the volume of stored data. This is a lot, and nobody could afford such waste.
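
A rough size comparison in Python (the string "Hello" is an arbitrary example; utf-32-le is used so that no byte order mark is added):

    # The same English text weighs four times more in UTF-32 than in ASCII.
    text = "Hello"

    print(len(text.encode("ascii")))       # 5 bytes - one per character
    print(len(text.encode("utf-32-le")))   # 20 bytes - four per character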

As a result of the further development of Unicode, UTF-16 appeared, which turned out to be so successful that its 16-bit space was adopted by default as the basic space for all the characters we use. It uses two bytes to encode one character. Let's see how this thing looks.

In the Windows operating system, you can go along the path "Start" - "Programs" - "Accessories" - "System Tools" - "Character Map". As a result, a table opens with the vector shapes of all the fonts installed in your system. If you select the Unicode character set in the "Advanced view" options, you can see, for each font separately, the entire range of characters included in it.

By the way, by clicking on any of them, you can see its two-byte code in UTF-16 format, consisting of four hexadecimal digits:

How many characters can be encoded in UTF-16 using 16 bits? 65,536 (two to the power of sixteen), and it is this number that was adopted as the basic space in Unicode. In addition, there are ways to encode about two million characters with it, but they limited themselves to an extended space of about a million text characters.
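
The same four hexadecimal digits shown by the Character Map can be obtained in Python (the letter "Ж" is an arbitrary example):

    # A character from Unicode's basic space and its four-digit code.
    ch = "Ж"                       # Cyrillic capital letter Zhe

    print(format(ord(ch), "04X"))  # 0416 - its Unicode / UTF-16 code point
    print(2 ** 16)                 # 65536 characters fit into the basic space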

But even this successful version of the Unicode encoding did not bring much satisfaction to those who wrote, for example, programs only in English, because after switching from the extended version of ASCII to UTF-16 the weight of their documents doubled (one byte per character in ASCII and two bytes per the same character in UTF-16).

It was precisely to satisfy everyone and everything that the Unicode Consortium decided to come up with a variable-length encoding. It was called UTF-8. Despite the eight in its name, it really does have a variable length, i.e. each character of the text can be encoded into a sequence of one to six bytes.

In practice, UTF-8 uses only the range from one to four bytes, because beyond four bytes of code it is no longer even theoretically possible to represent anything. All Latin characters in it are encoded in one byte, just like in the good old ASCII.

What is noteworthy, if only Latin characters are encoded, even programs that do not understand Unicode will still read what is encoded in UTF-8. I.e., the basic part of ASCII simply carried over into this brainchild of the Unicode Consortium.

Cyrillic characters in UTF-8 are encoded in two bytes, and, for example, Georgian ones in three bytes. After creating UTF-16 and UTF-8, the Unicode Consortium solved the main problem: now we have a single code space in our fonts. All that remains for their manufacturers is to fill it with the vector shapes of text characters according to their strengths and capabilities.
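
A small Python sketch illustrating the variable length (the characters are arbitrary examples: Latin, Cyrillic, Georgian and an emoji from outside the basic space):

    # UTF-8 spends a different number of bytes on characters from different scripts.
    for ch in ["A", "Ж", "ა", "😀"]:
        print(ch, len(ch.encode("utf-8")), "byte(s)")

    # A 1 byte(s)
    # Ж 2 byte(s)
    # ა 3 byte(s)
    # 😀 4 byte(s)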

In the character table below you can see that different fonts support a different number of characters. Some feature-rich Unicode fonts can weigh quite a lot. But now they differ not in that they were created for different encodings, but in whether the font manufacturer has or has not filled the single code space with particular vector shapes to the end.

Mojibake instead of Russian letters - how to fix it

Let's now see how mojibake appears instead of text or, in other words, how the correct encoding is chosen for Russian text. Actually, it is set in the program in which you create or edit this very text, or code using text fragments.

For editing and creating text files I personally use Notepad++, which is, in my opinion, very good. It can highlight the syntax of a good hundred programming and markup languages and can be extended with plugins. You can read a detailed review of this wonderful program at the link.

In the Notepad++ top menu there is an "Encoding" item, where you can convert an existing variant to the one that is used on your site by default:

In the case of a site on Joomla 1.5 and higher, as well as a blog on WordPress, you should choose the UTF-8 without BOM option in order to avoid the appearance of mojibake. And what is this BOM prefix?

The fact is that when the UTF-16 encoding was being developed, for some reason it was decided to attach to it the ability to write a character's code both in direct byte order (for example, 0A15) and in reverse (150A). And for programs to understand in which order to read the codes, the BOM (Byte Order Mark or, in other words, a signature) was invented, which amounts to adding a few extra bytes (two for UTF-16, three for the UTF-8 signature) to the very beginning of the document.

In the UTF-8 encoding, no BOM was provided for by the Unicode Consortium, and therefore adding a signature (those notorious extra three bytes at the beginning of the document) simply prevents some programs from reading the code. Therefore, when saving files in UTF, we must always select the option without BOM (without signature). This way you protect yourself from mojibake in advance.

What is noteworthy, some programs in Windows cannot do this (cannot save text in UTF-8 without BOM), for example, the same notorious Windows Notepad. It saves the document in UTF-8 but still adds the signature (three extra bytes) to its beginning. Moreover, these bytes will always be the same: the code is read in direct order. But on servers, because of this little thing, a problem can arise: mojibake appears.
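
What the signature actually looks like can be sketched in Python (the standard utf-16, utf-8-sig and utf-8 codecs are assumed):

    # The first bytes of a file saved with and without a BOM / signature.
    print("hi".encode("utf-16"))      # b'\xff\xfeh\x00i\x00' - two-byte mark FF FE
    print("hi".encode("utf-8-sig"))   # b'\xef\xbb\xbfhi'      - three-byte signature
    print("hi".encode("utf-8"))       # b'hi'                  - plain UTF-8, no BOM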

Therefore, never use the regular Windows Notepad to edit the documents of your site if you do not want mojibake to appear. I consider the already mentioned Notepad++ editor the best and simplest option; it has practically no drawbacks and consists only of advantages.

In Notepad++, when choosing an encoding, you can also convert text to the UCS-2 encoding, which is very close in essence to the Unicode standard. It is also possible to encode text in ANSI, i.e., in relation to the Russian language, this will be Windows 1251, which we already described just above. Where does this information come from?

It is written in the registry of your Windows operating system: which encoding to choose in the case of ANSI, and which to choose in the case of OEM (for the Russian language it will be CP866). If you set another default language on your computer, these encodings will be replaced with similar ones from the ANSI or OEM category for that language.

After you save the document in Notepad++ in the encoding you need, or open a document from the site for editing, you can see its name in the lower right corner of the editor:

To avoid mojibake, in addition to the actions described above, it will be useful to write information about this encoding in the header of the source code of all site pages, so that no confusion arises on the server or the local host.

In general, in all hypertext markup languages other than HTML, a special XML declaration is used, which specifies the text encoding.

Before starting to parse the code, the browser finds out which version is used and how exactly it needs to interpret the character codes of that language. But what is noteworthy, if you save the document in the default Unicode, this XML declaration can be omitted (the encoding will be considered UTF-8 if there is no BOM, or UTF-16 if there is a BOM).

In the case of an HTML document, the meta element, which is written between the opening and closing HEAD tags, is used to specify the encoding:

<head> ... <meta charset="utf-8"> ... </head>

This notation differs quite a lot from the one accepted in HTML 4.01, but it fully complies with the gradually introduced HTML 5 standard, and it will be understood absolutely correctly by any browser currently in use.

In theory, the META element indicating the encoding of the HTML document is better placed as high as possible in the document header, so that by the time the first character that is not from the basic ANSI range is encountered in the text (such characters are read correctly in any variation), the browser already has the information on how to interpret the codes of these characters.

Good luck to you! See you soon on the pages of this blog.



Character overlay (overstrike)

Thanks to the BS (backspace) character, one character can be printed over another on a printer. In ASCII this was used to add diacritics to letters, for example:

  • a BS ' → á
  • a BS ` → à
  • a BS ^ → â
  • o BS / → ø
  • c BS , → ç
  • n BS ~ → ñ

Note: in old fonts the apostrophe ' was drawn with a slope to the left, and the tilde ~ was shifted upwards, so that they fit the role of an acute accent and a tilde above the letter quite well.

If the same character is superimposed on a character, the effect of a bold font is obtained, and if an underscore is superimposed on a character, underlined text is obtained.

  • a BS a → a (bold)
  • a BS _ → a (underlined)

Note: this is used, for example, in the man reference system.

National ASCII variants

The ISO 646 (ECMA-6) standard provides for the possibility of placing national characters in place of @ [ \ ] ^ ` { | } ~. In addition, £ may be placed in place of #, and ¤ in place of $. Such a system is well suited for European languages, where only a few additional characters are needed. The version of ASCII without national characters is called US-ASCII, or the "International Reference Version".

Subsequently, it turned out to be more convenient to use 8-bit encodings (code pages), in which the lower half of the code table (0-127) is occupied by US-ASCII characters and the upper half (128-255) by additional characters, including a set of national characters. Thus, until the ubiquitous adoption of Unicode, the upper half of the ASCII table was actively used to represent localized characters, the letters of the local language. The absence of a single standard for placing Cyrillic characters in the ASCII table caused many encoding problems (KOI-8, Windows-1251 and others). Other languages with non-Latin scripts also suffered from the existence of several different encodings.

    .0    .1   .2   .3   .4   .5   .6    .7    .8    .9   .A   .B   .C   .D   .E   .F
0.  NUL   SOM  EOA  EOM  EQT  WRU  RU    BELL  BKSP  HT   LF   VT   FF   CR   SO   SI
1.  DC0   DC1  DC2  DC3  DC4  ERR  SYNC  LEM   S0    S1   S2   S3   S4   S5   S6   S7
4.  blank !    "    #    $    %    &     '     (     )    *    +    ,    -    .    /
5.  0     1    2    3    4    5    6     7     8     9    :    ;    <    =    >    ?
A.  @     A    B    C    D    E    F     G     H     I    J    K    L    M    N    O
B.  P     Q    R    S    T    U    V     W     X     Y    Z    [    \    ]
E.  a     b    c    d    e    f    g     h     i     j    k    l    m    n    o
F.  p     q    r    s    t    u    v     w     x     y    z    ESC  DEL
(rows 2., 3., 6., 7., 8., 9., C. and D. of this table are empty)

On computers where the minimum addressable unit of memory was a 36-bit word, 6-bit characters were initially used (1 word = 6 characters). After the transition to ASCII, such computers began to hold either 5 seven-bit characters in one word (with 1 bit left over) or 4 nine-bit characters.

ASCII codes are also used in programming to determine which key was pressed. For a standard QWERTY keyboard, the code table looks like this:

The set of characters used to write text is called an alphabet.

The number of characters in an alphabet is called its power (cardinality).

The formula for determining the quantity of information: N = 2^b,

where N is the power of the alphabet (the number of characters),

and b is the number of bits (the information weight of one character).

An alphabet with a power of 256 characters can hold practically all the necessary characters. Such an alphabet is called sufficient.

Since 256 = 2^8, the weight of one character is 8 bits.

The unit of measurement of 8 bits was given the name of 1 byte:

1 byte = 8 bits.

The binary code of each character in computer text occupies 1 byte of memory.
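
The same relationship N = 2^b, checked in a couple of lines of Python:

    import math

    # N = 2**b links the size of the alphabet to the bit weight of one character.
    N = 256                     # power of the "sufficient" alphabet
    b = math.log2(N)            # bits per character

    print(b)                    # 8.0 -> one character occupies 8 bits = 1 byte
    print(2 ** 8)               # 256 - and back again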

How is text information represented in the computer's memory?

The convenience of byte-by-byte character encoding is obvious, because a byte is the smallest addressable part of memory, and therefore the processor can address each character separately when processing text. On the other hand, 256 characters is quite enough to represent a wide variety of symbolic information.

Now the question arises which eight-bit binary code to assign to each character.

It is clear that this is a matter of convention; many encoding schemes can be invented.

All the characters of the computer alphabet are numbered from 0 to 255. Each number corresponds to an eight-bit binary code from 00000000 to 11111111. This code is simply the ordinal number of the character written in the binary number system.

A table in which all the characters of the computer alphabet are put in correspondence with their ordinal numbers is called an encoding table.

Different types of computers use different encoding tables.

The ASCII table (read "aski"; American Standard Code for Information Interchange) has become the international standard for PCs.

The ASCII code table is divided into two parts.

Only the first half of the table is the international standard, i.e. the characters with numbers from 0 (00000000) to 127 (01111111).

ASCII Encoding Table Structure

Ordinal numbers: 0 - 31
Codes: 00000000 - 00011111
Characters with numbers from 0 to 31 are called control characters. Their function is to control the process of displaying text on the screen or printing it, sounding an audio signal, marking up text, and so on.

Ordinal numbers: 32 - 127
Codes: 00100000 - 01111111
The standard part of the table (English). It includes lowercase and uppercase letters of the Latin alphabet, decimal digits, punctuation marks, all kinds of brackets, commercial and other symbols. Character 32 is the space, i.e. an empty position in the text. All the others are represented by specific signs.

Ordinal numbers: 128 - 255
Codes: 10000000 - 11111111
The alternative part of the table (Russian). The second half of the ASCII code table, called the code page (128 codes, starting with 10000000 and ending with 11111111), can have different variants, and each variant has its own number. The code page is primarily used to accommodate national alphabets other than Latin. In the Russian national encodings, the characters of the Russian alphabet are placed in this part of the table.

The first half of the ASCII code table


Note that in the encoding table the letters (uppercase and lowercase) are arranged in alphabetical order, and the digits are ordered by increasing value. Such adherence to lexicographic order in the arrangement of characters is called the principle of sequential coding of the alphabet.

For the letters of the Russian alphabet, the principle of sequential coding is also observed.

The second half of the ASCII code table


Unfortunately, there are currently five different Cyrillic encodings (KOI8-R, Windows, MS-DOS, Macintosh and ISO). Because of this, problems often arise with transferring Russian text from one computer to another, from one software system to another.

Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("information exchange code, 8-bit"). This encoding was used back in the 1970s on computers of the ES EVM series, and from the mid-1980s it began to be used in the first Russified versions of the UNIX operating system.

From the early 1990s, the time of the dominance of the MS-DOS operating system, the CP866 encoding remains ("CP" means "Code Page").

Apple computers running the Mac OS operating system use their own Mac encoding.

In addition, the International Organization for Standardization (ISO) approved another encoding, called ISO 8859-5, as a standard for the Russian language.

The most common encoding at present is the Microsoft Windows one, denoted by the abbreviation CP1251.

Since the end of the 1990s, the problem of standardizing character encoding has been solved by the introduction of a new international standard called Unicode. This is a 16-bit encoding, i.e. each character is given 2 bytes of memory. Of course, the amount of memory occupied doubles. But such a code table allows up to 65536 characters to be included. The complete specification of the Unicode standard includes all existing, extinct and artificially created alphabets of the world, as well as many mathematical, musical, chemical and other symbols.

Let's try, using the ASCII table, to imagine what words will look like in the computer's memory.

Internal representation of words in the computer's memory
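
Such a representation can be sketched in a few lines of Python (the English word "file" is an arbitrary example): each letter is replaced by its ASCII code, and that code occupies one byte.

    # Each letter of an ASCII word occupies exactly one byte of memory.
    word = "file"

    for ch in word:
        code = ord(ch)
        print(ch, code, format(code, "02X"), format(code, "08b"))

    # f 102 66 01100110
    # i 105 69 01101001
    # l 108 6C 01101100
    # e 101 65 01100101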

It sometimes happens that text consisting of letters of the Russian alphabet, received from another computer, cannot be read: some kind of "abracadabra" is visible on the monitor screen. This happens because computers use different encodings for the characters of the Russian language.

DEC  HEX  Symbol        DEC  HEX  Symbol
000  00   ctrl NUL      128  80   Ђ
001  01   ctrl SOH      129  81   Ѓ
002  02   ctrl STX      130  82   ‚
003  03   ctrl ETX      131  83   ѓ
004  04   ctrl EOT      132  84   „
005  05   ctrl ENQ      133  85   …
006  06   ctrl ACK      134  86   †
007  07   ctrl BEL      135  87   ‡
008  08   ctrl BS       136  88   €
009  09   ctrl TAB      137  89   ‰
010  0A   ctrl LF       138  8A   Љ
011  0B   ctrl VT       139  8B   ‹
012  0C   ctrl FF       140  8C   Њ
013  0D   ctrl CR       141  8D   Ќ
014  0E   ctrl SO       142  8E   Ћ
015  0F   ctrl SI       143  8F   Џ
016  10   ctrl DLE      144  90   ђ
017  11   ctrl DC1      145  91   '
018  12   ctrl DC2      146  92   '
019  13   ctrl DC3      147  93   "
020  14   ctrl DC4      148  94   "
021  15   ctrl NAK      149  95   •
022  16   ctrl SYN      150  96   –
023  17   ctrl ETB      151  97   —
024  18   ctrl CAN      152  98   (not used)
025  19   ctrl EM       153  99   ™
026  1A   ctrl SUB      154  9A   љ
027  1B   ctrl ESC      155  9B   ›
028  1C   ctrl FS       156  9C   њ
029  1D   ctrl GS       157  9D   ќ
030  1E   ctrl RS       158  9E   ћ
031  1F   ctrl US       159  9F   џ
032  20   SP (space)    160  A0   (no-break space)
033  21   !             161  A1   Ў
034  22   "             162  A2   ў
035  23   #             163  A3   Ј
036  24   $             164  A4   ¤
037  25   %             165  A5   Ґ
038  26   &             166  A6   ¦
039  27   '             167  A7   §
040  28   (             168  A8   Ё
041  29   )             169  A9   ©
042  2A   *             170  AA   Є
043  2B   +             171  AB   «
044  2C   ,             172  AC   ¬
045  2D   -             173  AD   (soft hyphen)
046  2E   .             174  AE   ®
047  2F   /             175  AF   Ї
048  30   0             176  B0   °
049  31   1             177  B1   ±
050  32   2             178  B2   І
051  33   3             179  B3   і
052  34   4             180  B4   ґ
053  35   5             181  B5   µ
054  36   6             182  B6   ¶
055  37   7             183  B7   ·
056  38   8             184  B8   ё
057  39   9             185  B9   №
058  3A   :             186  BA   є
059  3B   ;             187  BB   »
060  3C   <             188  BC   ј
061  3D   =             189  BD   Ѕ
062  3E   >             190  BE   ѕ
063  3F   ?             191  BF   ї
064  40   @             192  C0   А
065  41   A             193  C1   Б
066  42   B             194  C2   В
067  43   C             195  C3   Г
068  44   D             196  C4   Д
069  45   E             197  C5   Е
070  46   F             198  C6   Ж
071  47   G             199  C7   З
072  48   H             200  C8   И
073  49   I             201  C9   Й
074  4A   J             202  CA   К
075  4B   K             203  CB   Л
076  4C   L             204  CC   М
077  4D   M             205  CD   Н
078  4E   N             206  CE   О
079  4F   O             207  CF   П
080  50   P             208  D0   Р
081  51   Q             209  D1   С
082  52   R             210  D2   Т
083  53   S             211  D3   У
084  54   T             212  D4   Ф
085  55   U             213  D5   Х
086  56   V             214  D6   Ц
087  57   W             215  D7   Ч
088  58   X             216  D8   Ш
089  59   Y             217  D9   Щ
090  5A   Z             218  DA   Ъ
091  5B   [             219  DB   Ы
092  5C   \             220  DC   Ь
093  5D   ]             221  DD   Э
094  5E   ^             222  DE   Ю
095  5F   _             223  DF   Я
096  60   `             224  E0   а
097  61   a             225  E1   б
098  62   b             226  E2   в
099  63   c             227  E3   г
100  64   d             228  E4   д
101  65   e             229  E5   е
102  66   f             230  E6   ж
103  67   g             231  E7   з
104  68   h             232  E8   и
105  69   i             233  E9   й
106  6A   j             234  EA   к
107  6B   k             235  EB   л
108  6C   l             236  EC   м
109  6D   m             237  ED   н
110  6E   n             238  EE   о
111  6F   o             239  EF   п
112  70   p             240  F0   р
113  71   q             241  F1   с
114  72   r             242  F2   т
115  73   s             243  F3   у
116  74   t             244  F4   ф
117  75   u             245  F5   х
118  76   v             246  F6   ц
119  77   w             247  F7   ч
120  78   x             248  F8   ш
121  79   y             249  F9   щ
122  7A   z             250  FA   ъ
123  7B   {             251  FB   ы
124  7C   |             252  FC   ь
125  7D   }             253  FD   э
126  7E   ~             254  FE   ю
127  7F   ctrl DEL      255  FF   я

The ASCII + Windows (CP1251) character code table.
Description of the special (control) characters

It should be noted that initially the control characters of the ASCII table were used to organize data exchange over teletype, data input from punched tape, and the simplest control of external devices.
At present, most of the control characters of the ASCII table no longer carry this load and can be used for other purposes.
Code     Description
NUL, 00  Null, empty
SOH, 01  Start Of Heading
STX, 02  Start Of Text
ETX, 03  End Of Text
EOT, 04  End Of Transmission
ENQ, 05  Enquiry ("please confirm")
ACK, 06  Acknowledgment ("I confirm")
BEL, 07  Bell
BS, 08   Backspace, step back one character
TAB, 09  Tab, horizontal tabulation
LF, 0A   Line Feed.
         Now denoted in most programming languages as \n
VT, 0B   Vertical Tab, vertical tabulation
FF, 0C   Form Feed, page feed, new page
CR, 0D   Carriage Return.
         Now denoted in most programming languages as \r
SO, 0E   Shift Out, change the colour of the ink ribbon in the printing device
SI, 0F   Shift In, return the ink ribbon colour back
DLE, 10  Data Link Escape, switch the channel to data transmission
DC1, 11
DC2, 12
DC3, 13
DC4, 14
         Device Control, device control characters
NAK, 15  Negative Acknowledgment ("I do not confirm")
SYN, 16  Synchronization character
ETB, 17  End of Text Block
CAN, 18  Cancel what was transmitted earlier
EM, 19   End of Medium, end of the data carrier
SUB, 1A  Substitute. Placed in the position of a character whose value was lost or corrupted during transmission
ESC, 1B  Escape, start of a control sequence
FS, 1C   File Separator
GS, 1D   Group Separator
RS, 1E   Record Separator
US, 1F   Unit Separator
DEL, 7F  Delete, erase the last character

[8-bit encodings: ASCII, KOI-8R and CP1251] The first encoding tables, created in the USA, did not use the eighth bit of a byte. Text was represented as a sequence of bytes, but the eighth bit was not taken into account (it was used for service purposes).

The ASCII table (American Standard Code for Information Interchange) became the generally accepted standard. The first 32 characters of the ASCII table (from 00 to 1F) were used for non-printable characters. They were intended to control a printing device, etc. The rest, from 20 to 7F, are ordinary (printable) characters.

Table 1 - ASCII encoding

DEC  HEX  OCT  Char  Description
0    00   000  NUL   Null
1    01   001  SOH   Start of Heading
2    02   002  STX   Start of Text
3    03   003  ETX   End of Text
4    04   004  EOT   End of Transmission
5    05   005  ENQ   Enquiry
6    06   006  ACK   Acknowledge
7    07   007  BEL   Bell
8    08   010  BS    Backspace
9    09   011  TAB   Horizontal Tab
10   0A   012  LF    Line Feed (New Line)
11   0B   013  VT    Vertical Tab
12   0C   014  FF    Form Feed (New Page)
13   0D   015  CR    Carriage Return
14   0E   016  SO    Shift Out
15   0F   017  SI    Shift In
16   10   020  DLE   Data Link Escape
17   11   021  DC1   Device Control 1
18   12   022  DC2   Device Control 2
19   13   023  DC3   Device Control 3
20   14   024  DC4   Device Control 4
21   15   025  NAK   Negative Acknowledge
22   16   026  SYN   Synchronous Idle
23   17   027  ETB   End of Transmission Block
24   18   030  CAN   Cancel
25   19   031  EM    End of Medium
26   1A   032  SUB   Substitute
27   1B   033  ESC   Escape
28   1C   034  FS    File Separator
29   1D   035  GS    Group Separator
30   1E   036  RS    Record Separator
31   1F   037  US    Unit Separator
32   20   040        Space
33   21   041  !
34   22   042  "
35   23   043  #
36   24   044  $
37   25   045  %
38   26   046  &
39   27   047  '
40   28   050  (
41   29   051  )
42   2A   052  *
43   2B   053  +
44   2C   054  ,
45   2D   055  -
46   2E   056  .
47   2F   057  /
48   30   060  0
49   31   061  1
50   32   062  2
51   33   063  3
52   34   064  4
53   35   065  5
54   36   066  6
55   37   067  7
56   38   070  8
57   39   071  9
58   3A   072  :
59   3B   073  ;
60   3C   074  <
61   3D   075  =
62   3E   076  >
63   3F   077  ?
DEC  HEX  OCT  Char
64   40   100  @
65   41   101  A
66   42   102  B
67   43   103  C
68   44   104  D
69   45   105  E
70   46   106  F
71   47   107  G
72   48   110  H
73   49   111  I
74   4A   112  J
75   4B   113  K
76   4C   114  L
77   4D   115  M
78   4E   116  N
79   4F   117  O
80   50   120  P
81   51   121  Q
82   52   122  R
83   53   123  S
84   54   124  T
85   55   125  U
86   56   126  V
87   57   127  W
88   58   130  X
89   59   131  Y
90   5A   132  Z
91   5B   133  [
92   5C   134  \
93   5D   135  ]
94   5E   136  ^
95   5F   137  _
96   60   140  `
97   61   141  a
98   62   142  b
99   63   143  c
100  64   144  d
101  65   145  e
102  66   146  f
103  67   147  g
104  68   150  h
105  69   151  i
106  6A   152  j
107  6B   153  k
108  6C   154  l
109  6D   155  m
110  6E   156  n
111  6F   157  o
112  70   160  p
113  71   161  q
114  72   162  r
115  73   163  s
116  74   164  t
117  75   165  u
118  76   166  v
119  77   167  w
120  78   170  x
121  79   171  y
122  7A   172  z
123  7B   173  {
124  7C   174  |
125  7D   175  }
126  7E   176  ~
127  7F   177  DEL

It is easy to notice that this encoding contains only Latin letters, and only those used in English. There are also arithmetic signs and other service characters. But there are neither Russian letters nor even the special Latin letters needed for German or French. This is easy to explain: the encoding was developed specifically as an American standard. When computers began to be used all over the world, it became necessary to encode other characters.

To do this, it was decided to use the eighth bit in each byte. Thus, 128 more values became available (from 80 to FF in hexadecimal), which could be used to encode characters. The first of the eight-bit tables, "extended ASCII" (Extended ASCII), included various variants of Latin characters used in some languages of Western Europe. It also had other additional characters, including pseudographics.

Pseudographic characters make it possible, while displaying only text characters on the screen, to provide some semblance of graphics. For example, the file management program Far Manager draws its interface with the help of pseudographics.

There were no Russian letters in the extended ASCII table. In Russia (formerly the USSR) and in other states, their own encodings were created that made it possible to represent specific "national" characters in 8-bit text files: the Latin letters of the Polish and Czech languages, Cyrillic (including Russian letters) and other alphabets.

In all the encodings that became widespread, the first 127 characters (i.e., byte values with the eighth bit equal to 0) coincide with ASCII. Therefore, an ASCII file works in any of these encodings; the letters of the English language are represented in them identically.

The ISO organization (International Organization for Standardization) adopted the ISO 8859 group of standards. It defines 8-bit encodings for different groups of languages. Thus, ISO 8859-1 is Extended ASCII, a table for the USA and Western Europe, while ISO 8859-5 is a table for Cyrillic (including Russian).

However, for historical reasons, the ISO 8859-5 encoding did not take root. In reality, the following encodings are used for the Russian language:

- Code Page 866 (CP866), also known as "DOS" or the "alternative GOST encoding". It was widely used until the mid-1990s; now its use is limited. It is practically never used for distributing texts on the Internet.
- KOI-8. Developed in the 1970s and 1980s. It is the generally accepted standard for sending mail messages on the Russian Internet. It is widely used in operating systems of the Unix family, including Linux. The variant of KOI-8 designed for Russian is called KOI-8R; there are versions for other Cyrillic languages (for example, KOI8-U is the variant for the Ukrainian language).
- Code Page 1251, CP1251, Windows-1251. Developed by Microsoft to support the Russian language in the Windows system.

The main advantage of CP866 was that it kept the pseudographic characters in the same places as in Extended ASCII; therefore foreign text-mode programs, for example the famous Norton Commander, could work without any changes. CP866 is now used for Windows programs running in text windows or in full-screen text mode, including Far Manager.

Texts in CP866 have been quite rare in recent years (although it is used to encode Russian file names in Windows). Therefore we will dwell in more detail on two other encodings: KOI-8R and CP1251.



As you can see, in the CP1251 encoding table Russian letters are arranged in alphabetical order (with the exception, however, of the letter Ё). Thanks to this arrangement, computer programs can sort alphabetically very easily.

But in KOI-8R the order of the Russian letters seems random. In fact, it is not.

In many old programs, the 8th bit was lost when processing or transmitting text. (Now such programs have practically "died out", but in the late 1980s and early 1990s they were widespread.) To get a 7-bit value from an 8-bit value, it is enough to subtract 8 from the most significant hexadecimal digit; for example, E1 turns into 61.

Now compare KOI-8R with the ASCII table (Table 1). You will find that the Russian letters are placed in clear correspondence with the Latin ones. If the eighth bit disappears, lowercase Russian letters turn into uppercase Latin ones, and uppercase Russian letters into lowercase Latin ones. Thus, E1 in KOI-8 is the Russian capital "А", while 61 in ASCII is the Latin lowercase "a".

So, KOI-8 keeps Russian text readable even when the 8th bit is lost. "Привет всем" ("Hello everyone") turns into "pRIWET WSEM".
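
This effect can be reproduced in a couple of lines of Python (the standard koi8-r codec is assumed):

    # Encode a Russian phrase in KOI-8R, then clear bit 7 of every byte,
    # as the old 7-bit channels effectively did.
    phrase = "Привет всем"
    stripped = bytes(b & 0x7F for b in phrase.encode("koi8-r"))

    print(stripped.decode("ascii"))   # pRIWET WSEM - still readable as a transliteration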

Recently, both the alphabetical order of characters in the encoding table and readability after the loss of the 8th bit have lost their crucial importance. The eighth bit in modern computers is not lost during transmission or processing, and sorting alphabetically is done taking the encoding into account rather than by a simple comparison of codes. (By the way, the CP1251 codes are not completely in alphabetical order: the letter ё is not in its place.)

Because two widespread encodings ended up in use, when working with the Internet (mail, browsing websites) you can sometimes see a meaningless set of letters instead of Russian text, for example "Я СБЮФЕМХЕЛ". These are just the words "с уважением" ("with respect"), but they were encoded in CP1251 and the computer decoded the text using the KOI-8 table. If the same words are, on the contrary, encoded in KOI-8 and the computer decodes the text using the CP1251 table, the result will be "У ХЧБЦЕНИЕМ".

Sometimes it happens that the computer decodes Russian-language letters using a table that is not intended for the Russian language at all. Then, instead of Russian letters, a meaningless set of characters appears (for example, Latin letters of Eastern European languages); these are often called "krakozyabry" (mojibake).

In most cases, modern programs cope with determining the encodings of Internet documents (e-mail and Web pages) on their own. But sometimes they "misfire", and then you can see strange sequences of Russian letters or mojibake. As a rule, in such a situation it is enough to select the encoding manually in the program menu to display this text.

The page http://open-office.edusite.ru/textProcessor/p5aa1.html was used in preparing this article.

The material is taken from the site: