
ASCII encoding (American Standard Code for Information Interchange) - the basic Latin text encoding

Hello, dear blog readers. Today we will talk about where garbled characters (krakozyabry, or mojibake) come from on websites and in programs, about which text encodings exist and which of them should be used. Let us consider in detail the history of their development, starting with the basic ASCII and its extended versions CP866, KOI8-R and Windows 1251, and ending with the modern encodings of the Unicode consortium, UTF-16 and UTF-8.

To some, this information may seem unnecessary, but you would be surprised how many questions I get that concern precisely this mojibake (an unreadable set of characters). Now I will be able to refer everyone to the text of this article so they can find their own mistakes. Well, get ready to absorb the information and try to follow the narration.

ASCII - the basic Latin text encoding

The development of text encodings went hand in hand with the formation of the IT industry, and during this time they managed to undergo quite a few changes. Historically, it all started with EBCDIC (rather awkward to pronounce in Russian), which made it possible to encode the letters of the Latin alphabet, Arabic numerals, punctuation marks and control characters.

Still, the starting point for the development of modern text encodings should be considered the famous ASCII (American Standard Code for Information Interchange, which in Russian is usually pronounced "aski"). It describes the first 128 characters most commonly used by English-speaking users: Latin letters, Arabic numerals and punctuation marks.

These 128 characters described in ASCII also include some service symbols such as brackets, hash signs, asterisks, and so on. Actually, you can see them yourself:

It is these 128 characters from the initial version of ASCII that became the standard, and in any other encoding you will definitely meet them, and they will occupy the same positions.

But the fact is that one byte of information can encode not 128 but as many as 256 different values (two to the power of eight equals 256), so after the basic version of ASCII a whole series of extended ASCII encodings appeared, in which, besides the 128 basic characters, national characters (for example, Russian ones) could also be encoded.

Here it is probably worth saying a little more about the number systems used in the description. First, as you all know, a computer works only with numbers in the binary system, namely with zeros and ones ("Boolean algebra", if anyone studied it at university or at school). One byte consists of eight bits, each of which represents a power of two, starting from zero and going up to two to the seventh:

It is not difficult to see that there can only be 256 possible combinations of zeros and ones in such a construction. Converting a number from the binary system to decimal is quite simple: you just need to add up all the powers of two over which there are ones.

In our example, this turns out to be 1 (2 to the power of zero) plus 8 (two to the power of 3), plus 32 (two to the fifth power), plus 64 (to the sixth), plus 128 (to the seventh). The total is 233 in the decimal number system. As you can see, everything is very simple.
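
To double-check this arithmetic, here is a minimal Python sketch (the bit string 11101001 is the byte from the example above, with ones in positions 7, 6, 5, 3 and 0):

    # Sum the powers of two that have a one above them (the rightmost bit is 2**0).
    bits = "11101001"

    value = sum(2 ** power
                for power, bit in enumerate(reversed(bits))
                if bit == "1")

    print(value)            # 233
    print(int(bits, 2))     # 233 - the built-in conversion agrees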

But if you look at a table with ASCII characters, you will see that they are presented in hexadecimal notation. For example, the "asterisk" corresponds to the hexadecimal number 2A. You probably know that in the hexadecimal number system, besides the decimal digits, the Latin letters from A (meaning ten) to F (meaning fifteen) are used.

Well, to convert binary numbers to hexadecimal, people resort to the following simple and visual method: each byte of information is split into two halves of four bits, as shown in the screenshot above. Each half of a byte can encode only sixteen values (two to the fourth power), which can easily be represented by a single hexadecimal digit.

Moreover, in the left half of the byte the powers have to be counted again starting from zero, and not as shown in the screenshot. As a result, after some simple calculations, we get that the number E9 is encoded in the screenshot. I hope that the course of my reasoning and the solution of this puzzle were clear to you. Well, now let us continue talking about text encodings.
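
The same split into two halves (nibbles) can be sketched in Python, using the byte 11101001 from the example above:

    byte = 0b11101001

    high = byte >> 4          # left nibble:  1110 -> 14 -> E
    low  = byte & 0b1111      # right nibble: 1001 ->  9 -> 9

    print(format(high, "X"), format(low, "X"))  # E 9
    print(format(byte, "02X"))                  # E9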

Extended versions of ASCII - the CP866 and KOI8-R encodings with pseudographics

So, we started talking about ASCII, which was, so to speak, the starting point for the development of all modern encodings (Windows 1251, Unicode, UTF-8).

Initially it contained only 128 characters: the Latin alphabet, Arabic numerals and a few other things, but in the extended version it became possible to use all 256 values that can be encoded in one byte of information. In other words, it became possible to add the letters of your own language to ASCII.

Here we need to digress once again to clarify why text encodings are needed at all and why it is so important. The characters on your computer screen are formed on the basis of two things: sets of vector shapes (representations) of all kinds of characters (they are located in font files) and a code that allows you to pull out of this set of vector shapes (the font file) exactly the character that needs to be inserted in the right place.

It is clear that the fonts are responsible for the vector shapes themselves, while the operating system and the programs used in it are responsible for the encoding. I.e., any text on your computer is a set of bytes, each of which encodes one single character of this text.

The program that displays this text on the screen (a text editor, a browser, etc.), when parsing the code, reads the encoding of the next character and looks for the corresponding vector shape in the required font file connected to display this text document. Everything is simple and banal.

So, to encode any character we need (for example, from a national alphabet), two conditions must be met: the vector shape of this character must exist in the font used, and the character must be representable in an extended ASCII encoding within one byte. Because of this, there is a whole bunch of such options. For encoding the characters of the Russian language alone, there are several varieties of extended ASCII.

For example, the first to appear was CP866, in which it was possible to use the characters of the Russian alphabet; it was an extended version of ASCII.

I.e., its upper part completely coincided with the basic version of ASCII (128 characters of the Latin alphabet, digits and everything else), which is shown in the screenshot a little above, while the lower part of the CP866 encoding table had the form shown in the screenshot slightly below and allowed encoding another 128 characters (Russian letters and all sorts of pseudographics):

You see, in the right column the numbers begin with 8, because the numbers from 0 to 7 refer to the basic part of ASCII (see the first screenshot). Thus, the Russian letter "М" in CP866 has the code 8C (it sits at the intersection of the row with 8 and the column with the number C in the hexadecimal system), which can be written in one byte of information; and if there is a suitable font with Russian characters, this letter will be displayed in the text without any problems.
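
As a small illustration, here is a Python sketch (Python's bundled cp866 codec is assumed) showing that the letter really fits into a single byte:

    # The Cyrillic capital letter "М" occupies exactly one byte in CP866.
    data = "М".encode("cp866")

    print(data)                   # b'\x8c' -> code 8C, one byte
    print(len(data))              # 1
    print(data.decode("cp866"))   # М - restored, provided we decode with the same table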

Where did such a quantity of pseudographics in CP866 come from? The thing is that this encoding for Russian text was developed back in those shaggy years when graphical operating systems were nowhere near as widespread as they are now. And in DOS and similar text-mode operating systems, pseudographics made it possible to at least somehow diversify the design of texts, which is why CP866, as well as all its other peers from the category of extended ASCII versions, abounds with it.

CP866 was distributed by IBM, but besides it a number of other encodings were developed for Russian characters; for example, KOI8-R can be attributed to the same type (extended ASCII):

The principle of its operation remained the same as that of the CP866 described a little earlier: each text character is encoded by one single byte. The screenshot shows the second half of the KOI8-R table, because the first half fully corresponds to the basic ASCII, which is shown in the first screenshot in this article.

Among the features of the KOI8-R encoding, it can be noted that the Russian letters in its table are not in alphabetical order, as was done, for example, in CP866.

If you look at the very first screenshot (of the basic part, which is included in all extended encodings), you will notice that in KOI8-R the Russian letters are located in the same cells of the table as the similar-sounding letters of the Latin alphabet from the first part of the table. This was done for the convenience of switching from Russian characters to Latin ones by discarding just one bit (two to the seventh power, or 128).
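
A tiny Python sketch (assuming the standard koi8-r codec) shows this trick with discarding the eighth bit:

    # In KOI8-R the Russian letter sits 128 positions above the Latin letter
    # that sounds like it, so clearing bit 7 maps one onto the other.
    code = "б".encode("koi8-r")[0]   # lowercase Russian "b"

    print(hex(code))                 # 0xc2
    print(chr(code & 0x7F))          # 'B' - the similar-sounding Latin letter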

Windows 1251 - a modern version of extended ASCII, and why mojibake appears

The further development of text encodings was due to the fact that graphical operating systems were gaining popularity and the need to use pseudographics in them disappeared. As a result, a whole group of encodings arose that, in essence, were still extended versions of ASCII (one text character is encoded with just one byte of information), but without the use of pseudographic characters.

They belonged to the so-called ANSI encodings, developed by the American National Standards Institute. In everyday speech, the name "Cyrillic" was also used for the variant with Russian language support. An example of such an encoding is Windows 1251.

It differed favourably from the previously used CP866 and KOI8-R in that the place of the pseudographic characters in it was taken by the missing symbols of Russian typography (apart from the accent mark), as well as characters used in Slavic languages close to Russian (Ukrainian, Belarusian, etc.):

Because of such an abundance of encodings for the Russian language, font manufacturers and software manufacturers constantly had headaches, and you, dear readers, often got that very notorious mojibake whenever the version used in the text got mixed up.

Very often it appeared when sending and receiving messages by e-mail, which led to the creation of very complex conversion tables that, in fact, could not solve this problem at the root; and users often wrote their correspondence in transliteration (Latin letters) in order to avoid the notorious mojibake that came with Russian encodings such as CP866, KOI8-R or Windows 1251.

In essence, the mojibake that appears instead of Russian text is the result of using the wrong encoding for this language, one that does not match the encoding in which the text message was originally encoded.

Suppose you try to display characters encoded with CP866 using the Windows 1251 code table; then this very mojibake (a meaningless set of characters) will appear, completely replacing the text of the message.
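
A short Python sketch of exactly this mix-up (the word "Привет" is an arbitrary example):

    # Bytes written in one Russian encoding but read with another table turn into mojibake.
    original = "Привет"

    wrong = original.encode("cp866").decode("cp1251")
    print(wrong)   # a meaningless jumble (something like 'ЏаЁўҐв') instead of the word

    # Decoding with the table the text was actually written in restores it.
    print(original.encode("cp866").decode("cp866"))   # Привет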

A similar situation very often occurs on forums or blogs, when text with Russian characters is mistakenly saved not in the encoding used on the site by default, or is created in a text editor that adds its own extras to the code, invisible to the naked eye.

In the end, many people got tired of this situation with a multitude of encodings and constantly creeping mojibake, and the prerequisites appeared for creating a new universal variation that would replace all the existing ones and finally solve the problem of unreadable texts at the root. In addition, there was the problem of languages such as Chinese, which have far more than 256 characters.

Unicode - the universal encodings UTF-8, 16 and 32

Those thousands of characters of the Southeast Asian language group could not possibly be described in the one byte of information allotted for encoding characters in the extended versions of ASCII. As a result, a consortium called Unicode (the Unicode Consortium) was created through the collaboration of many leaders of the IT industry (those who produce software, who design hardware, who create fonts) who were interested in the emergence of a universal text encoding.

The first variation released under the auspices of the Unicode Consortium was UTF-32. The number in the encoding's name is the number of bits used to encode one character. 32 bits are 4 bytes of information, which is what is needed to encode one single character in the new universal UTF encoding.

As a result, the same file with text encoded in an extended version of ASCII and in UTF-32 will, in the latter case, have a size (weight) four times larger. This is bad, but now with UTF we can encode a number of characters equal to two to the thirty-second power (billions of characters, which covers any really needed value with a colossal margin).

But many countries with languages of the European group did not need such a huge number of characters in an encoding at all, yet when using UTF-32 they received, for nothing, a fourfold increase in the weight of text documents and, as a result, an increase in Internet traffic and the volume of stored data. This is a lot, and nobody could afford such waste.
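
A rough size comparison in Python (the string "Hello" is an arbitrary example; utf-32-le is used so that no byte order mark is added):

    # The same English text weighs four times more in UTF-32 than in ASCII.
    text = "Hello"

    print(len(text.encode("ascii")))       # 5 bytes - one per character
    print(len(text.encode("utf-32-le")))   # 20 bytes - four per character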

As a result of the further development of Unicode, UTF-16 appeared, which turned out to be so successful that its 16-bit space was adopted by default as the basic space for all the characters we use. It uses two bytes to encode one character. Let's see how this thing looks.

In the Windows operating system, you can go along the path "Start" - "Programs" - "Accessories" - "System Tools" - "Character Map". As a result, a table opens with the vector shapes of all the fonts installed in your system. If you select the Unicode character set in the "Advanced view" options, you can see, for each font separately, the entire range of characters included in it.

By the way, by clicking on any of them, you can see its two-byte code in UTF-16 format, consisting of four hexadecimal digits:

How many characters can be encoded in UTF-16 using 16 bits? 65,536 (two to the power of sixteen), and it is this number that was adopted as the basic space in Unicode. In addition, there are ways to encode about two million characters with it, but they limited themselves to an extended space of about a million text characters.
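
The same four hexadecimal digits shown by the Character Map can be obtained in Python (the letter "Ж" is an arbitrary example):

    # A character from Unicode's basic space and its four-digit code.
    ch = "Ж"                       # Cyrillic capital letter Zhe

    print(format(ord(ch), "04X"))  # 0416 - its Unicode / UTF-16 code point
    print(2 ** 16)                 # 65536 characters fit into the basic space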

But even this successful version of the Unicode encoding did not bring much satisfaction to those who wrote, for example, programs only in English, because after switching from the extended version of ASCII to UTF-16 the weight of their documents doubled (one byte per character in ASCII and two bytes per the same character in UTF-16).

It was precisely to satisfy everyone and everything that the Unicode Consortium decided to come up with a variable-length encoding. It was called UTF-8. Despite the eight in its name, it really does have a variable length, i.e. each character of the text can be encoded into a sequence of one to six bytes.

In practice, UTF-8 uses only the range from one to four bytes, because beyond four bytes of code it is no longer even theoretically possible to represent anything. All Latin characters in it are encoded in one byte, just like in the good old ASCII.

What is noteworthy, if only Latin characters are encoded, even programs that do not understand Unicode will still read what is encoded in UTF-8. I.e., the basic part of ASCII simply carried over into this brainchild of the Unicode Consortium.

Cyrillic characters in UTF-8 are encoded in two bytes, and, for example, Georgian ones in three bytes. After creating UTF-16 and UTF-8, the Unicode Consortium solved the main problem: now we have a single code space in our fonts. All that remains for their manufacturers is to fill it with the vector shapes of text characters according to their strengths and capabilities.
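
A small Python sketch illustrating the variable length (the characters are arbitrary examples: Latin, Cyrillic, Georgian and an emoji from outside the basic space):

    # UTF-8 spends a different number of bytes on characters from different scripts.
    for ch in ["A", "Ж", "ა", "😀"]:
        print(ch, len(ch.encode("utf-8")), "byte(s)")

    # A 1 byte(s)
    # Ж 2 byte(s)
    # ა 3 byte(s)
    # 😀 4 byte(s)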

In the character table below you can see that different fonts support a different number of characters. Some feature-rich Unicode fonts can weigh quite a lot. But now they differ not in that they were created for different encodings, but in whether the font manufacturer has or has not filled the single code space with particular vector shapes to the end.

Mojibake instead of Russian letters - how to fix it

Let's now see how mojibake appears instead of text or, in other words, how the correct encoding is chosen for Russian text. Actually, it is set in the program in which you create or edit this very text, or code using text fragments.

For editing and creating text files I personally use Notepad++, which is, in my opinion, very good. It can highlight the syntax of a good hundred programming and markup languages and can be extended with plugins. You can read a detailed review of this wonderful program at the link.

In the Notepad++ top menu there is an "Encoding" item, where you can convert an existing variant to the one that is used on your site by default:

In the case of a site on Joomla 1.5 and higher, as well as a blog on WordPress, you should choose the UTF-8 without BOM option in order to avoid the appearance of mojibake. And what is this BOM prefix?

The fact is that when the UTF-16 encoding was being developed, for some reason it was decided to attach to it the ability to write a character's code both in direct byte order (for example, 0A15) and in reverse (150A). And for programs to understand in which order to read the codes, the BOM (Byte Order Mark or, in other words, a signature) was invented, which amounts to adding a few extra bytes (two for UTF-16, three for the UTF-8 signature) to the very beginning of the document.

In the UTF-8 encoding, no BOM was provided for by the Unicode Consortium, and therefore adding a signature (those notorious extra three bytes at the beginning of the document) simply prevents some programs from reading the code. Therefore, when saving files in UTF, we must always select the option without BOM (without signature). This way you protect yourself from mojibake in advance.

What is noteworthy, some programs in Windows cannot do this (cannot save text in UTF-8 without BOM), for example, the same notorious Windows Notepad. It saves the document in UTF-8 but still adds the signature (three extra bytes) to its beginning. Moreover, these bytes will always be the same: the code is read in direct order. But on servers, because of this little thing, a problem can arise: mojibake appears.
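
What the signature actually looks like can be sketched in Python (the standard utf-16, utf-8-sig and utf-8 codecs are assumed):

    # The first bytes of a file saved with and without a BOM / signature.
    print("hi".encode("utf-16"))      # b'\xff\xfeh\x00i\x00' - two-byte mark FF FE
    print("hi".encode("utf-8-sig"))   # b'\xef\xbb\xbfhi'      - three-byte signature
    print("hi".encode("utf-8"))       # b'hi'                  - plain UTF-8, no BOM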

Therefore, never use the regular Windows Notepad to edit the documents of your site if you do not want mojibake to appear. I consider the already mentioned Notepad++ editor the best and simplest option; it has practically no drawbacks and consists only of advantages.

In Notepad++, when choosing an encoding, you can also convert text to the UCS-2 encoding, which is very close in essence to the Unicode standard. It is also possible to encode text in ANSI, i.e., in relation to the Russian language, this will be Windows 1251, which we already described just above. Where does this information come from?

It is written in the registry of your Windows operating system: which encoding to choose in the case of ANSI, and which to choose in the case of OEM (for the Russian language it will be CP866). If you set another default language on your computer, these encodings will be replaced with similar ones from the ANSI or OEM category for that language.

After you save the document in Notepad++ in the encoding you need, or open a document from the site for editing, you can see its name in the lower right corner of the editor:

To avoid mojibake, in addition to the actions described above, it will be useful to write information about this encoding in the header of the source code of all site pages, so that no confusion arises on the server or the local host.

In general, in all hypertext markup languages other than HTML, a special XML declaration is used, which specifies the text encoding.

Before starting to parse the code, the browser finds out which version is used and how exactly it needs to interpret the character codes of that language. But what is noteworthy, if you save the document in the default Unicode, this XML declaration can be omitted (the encoding will be considered UTF-8 if there is no BOM, or UTF-16 if there is a BOM).

In the case of an HTML document, the meta element, which is written between the opening and closing HEAD tags, is used to specify the encoding:

<head> ... <meta charset="utf-8"> ... </head>

This notation differs quite a lot from the one accepted in HTML 4.01, but it fully complies with the gradually introduced HTML 5 standard, and it will be understood absolutely correctly by any browser currently in use.

In theory, the META element indicating the encoding of the HTML document is better placed as high as possible in the document header, so that by the time the first character that is not from the basic ANSI range is encountered in the text (such characters are read correctly in any variation), the browser already has the information on how to interpret the codes of these characters.

Good luck to you! See you soon on the pages of this blog.



Character overlay (overstrike)

Thanks to the BS (backspace) character, one character can be printed over another on a printer. In ASCII this was used to add diacritics to letters, for example:

  • a BS ' → á
  • a BS ` → à
  • a BS ^ → â
  • o BS / → ø
  • c BS , → ç
  • n BS ~ → ñ

Note: in old fonts the apostrophe ' was drawn with a slope to the left, and the tilde ~ was shifted upwards, so that they fit the role of an acute accent and a tilde above the letter quite well.

If the same character is superimposed on a character, the effect of a bold font is obtained, and if an underscore is superimposed on a character, underlined text is obtained.

  • a BS a → a (bold)
  • a BS _ → a (underlined)

Note: this is used, for example, in the man reference system.

National ASCII variants

The ISO 646 (ECMA-6) standard provides for the possibility of placing national characters in place of @ [ \ ] ^ ` { | } ~. In addition, £ may be placed in place of #, and ¤ in place of $. Such a system is well suited for European languages, where only a few additional characters are needed. The version of ASCII without national characters is called US-ASCII, or the "International Reference Version".

Subsequently, it turned out to be more convenient to use 8-bit encodings (code pages), in which the lower half of the code table (0-127) is occupied by US-ASCII characters and the upper half (128-255) by additional characters, including a set of national characters. Thus, until the ubiquitous adoption of Unicode, the upper half of the ASCII table was actively used to represent localized characters, the letters of the local language. The absence of a single standard for placing Cyrillic characters in the ASCII table caused many encoding problems (KOI-8, Windows-1251 and others). Other languages with non-Latin scripts also suffered from the existence of several different encodings.

    .0    .1   .2   .3   .4   .5   .6    .7    .8    .9   .A   .B   .C   .D   .E   .F
0.  NUL   SOM  EOA  EOM  EQT  WRU  RU    BELL  BKSP  HT   LF   VT   FF   CR   SO   SI
1.  DC0   DC1  DC2  DC3  DC4  ERR  SYNC  LEM   S0    S1   S2   S3   S4   S5   S6   S7
4.  blank !    "    #    $    %    &     '     (     )    *    +    ,    -    .    /
5.  0     1    2    3    4    5    6     7     8     9    :    ;    <    =    >    ?
A.  @     A    B    C    D    E    F     G     H     I    J    K    L    M    N    O
B.  P     Q    R    S    T    U    V     W     X     Y    Z    [    \    ]
E.  a     b    c    d    e    f    g     h     i     j    k    l    m    n    o
F.  p     q    r    s    t    u    v     w     x     y    z    ESC  DEL
(rows 2., 3., 6., 7., 8., 9., C. and D. of this table are empty)

On computers where the minimum addressable unit of memory was a 36-bit word, 6-bit characters were initially used (1 word = 6 characters). After the transition to ASCII, such computers began to hold either 5 seven-bit characters in one word (with 1 bit left over) or 4 nine-bit characters.

ASCII codes are also used in programming to determine which key was pressed. For a standard QWERTY keyboard, the code table looks like this:

The set of characters used to write text is called an alphabet.

The number of characters in an alphabet is called its power (cardinality).

The formula for determining the quantity of information: N = 2^b,

where N is the power of the alphabet (the number of characters),

and b is the number of bits (the information weight of one character).

An alphabet with a power of 256 characters can hold practically all the necessary characters. Such an alphabet is called sufficient.

Since 256 = 2^8, the weight of one character is 8 bits.

The unit of measurement of 8 bits was given the name of 1 byte:

1 byte = 8 bits.

The binary code of each character in computer text occupies 1 byte of memory.
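
The same relationship N = 2^b, checked in a couple of lines of Python:

    import math

    # N = 2**b links the size of the alphabet to the bit weight of one character.
    N = 256                     # power of the "sufficient" alphabet
    b = math.log2(N)            # bits per character

    print(b)                    # 8.0 -> one character occupies 8 bits = 1 byte
    print(2 ** 8)               # 256 - and back again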

How is text information represented in the computer's memory?

The convenience of byte-by-byte character encoding is obvious, because a byte is the smallest addressable part of memory, and therefore the processor can address each character separately when processing text. On the other hand, 256 characters is quite enough to represent a wide variety of symbolic information.

Now the question arises which eight-bit binary code to assign to each character.

It is clear that this is a matter of convention; many encoding schemes can be invented.

All the characters of the computer alphabet are numbered from 0 to 255. Each number corresponds to an eight-bit binary code from 00000000 to 11111111. This code is simply the ordinal number of the character written in the binary number system.

A table in which all the characters of the computer alphabet are put in correspondence with their ordinal numbers is called an encoding table.

Different types of computers use different encoding tables.

The ASCII table (read "aski"; American Standard Code for Information Interchange) has become the international standard for PCs.

The ASCII code table is divided into two parts.

Only the first half of the table is the international standard, i.e. the characters with numbers from 0 (00000000) to 127 (01111111).

ASCII Encoding Table Structure

Ordinal numbers: 0 - 31
Codes: 00000000 - 00011111
Characters with numbers from 0 to 31 are called control characters. Their function is to control the process of displaying text on the screen or printing it, sounding an audio signal, marking up text, and so on.

Ordinal numbers: 32 - 127
Codes: 00100000 - 01111111
The standard part of the table (English). It includes lowercase and uppercase letters of the Latin alphabet, decimal digits, punctuation marks, all kinds of brackets, commercial and other symbols. Character 32 is the space, i.e. an empty position in the text. All the others are represented by specific signs.

Ordinal numbers: 128 - 255
Codes: 10000000 - 11111111
The alternative part of the table (Russian). The second half of the ASCII code table, called the code page (128 codes, starting with 10000000 and ending with 11111111), can have different variants, and each variant has its own number. The code page is primarily used to accommodate national alphabets other than Latin. In the Russian national encodings, the characters of the Russian alphabet are placed in this part of the table.

The first half of the ASCII code table


Note that in the encoding table the letters (uppercase and lowercase) are arranged in alphabetical order, and the digits are ordered by increasing value. Such adherence to lexicographic order in the arrangement of characters is called the principle of sequential coding of the alphabet.

For the letters of the Russian alphabet, the principle of sequential coding is also observed.

The second half of the ASCII code table


Unfortunately, there are currently five different Cyrillic encodings (KOI8-R, Windows, MS-DOS, Macintosh and ISO). Because of this, problems often arise with transferring Russian text from one computer to another, from one software system to another.

Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("information exchange code, 8-bit"). This encoding was used back in the 1970s on computers of the ES EVM series, and from the mid-1980s it began to be used in the first Russified versions of the UNIX operating system.

From the early 1990s, the time of the dominance of the MS-DOS operating system, the CP866 encoding remains ("CP" means "Code Page").

Apple computers running the Mac OS operating system use their own Mac encoding.

In addition, the International Organization for Standardization (ISO) approved another encoding, called ISO 8859-5, as a standard for the Russian language.

The most common encoding at present is the Microsoft Windows one, denoted by the abbreviation CP1251.

Since the end of the 1990s, the problem of standardizing character encoding has been solved by the introduction of a new international standard called Unicode. This is a 16-bit encoding, i.e. each character is given 2 bytes of memory. Of course, the amount of memory occupied doubles. But such a code table allows up to 65536 characters to be included. The complete specification of the Unicode standard includes all existing, extinct and artificially created alphabets of the world, as well as many mathematical, musical, chemical and other symbols.

Let's try, using the ASCII table, to imagine what words will look like in the computer's memory.

Internal representation of words in the computer's memory
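
Such a representation can be sketched in a few lines of Python (the English word "file" is an arbitrary example): each letter is replaced by its ASCII code, and that code occupies one byte.

    # Each letter of an ASCII word occupies exactly one byte of memory.
    word = "file"

    for ch in word:
        code = ord(ch)
        print(ch, code, format(code, "02X"), format(code, "08b"))

    # f 102 66 01100110
    # i 105 69 01101001
    # l 108 6C 01101100
    # e 101 65 01100101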

It sometimes happens that text consisting of letters of the Russian alphabet, received from another computer, cannot be read: some kind of "abracadabra" is visible on the monitor screen. This happens because computers use different encodings for the characters of the Russian language.

DEC  HEX  Symbol        DEC  HEX  Symbol
000  00   ctrl NUL      128  80   Ђ
001  01   ctrl SOH      129  81   Ѓ
002  02   ctrl STX      130  82   ‚
003  03   ctrl ETX      131  83   ѓ
004  04   ctrl EOT      132  84   „
005  05   ctrl ENQ      133  85   …
006  06   ctrl ACK      134  86   †
007  07   ctrl BEL      135  87   ‡
008  08   ctrl BS       136  88   €
009  09   ctrl TAB      137  89   ‰
010  0A   ctrl LF       138  8A   Љ
011  0B   ctrl VT       139  8B   ‹
012  0C   ctrl FF       140  8C   Њ
013  0D   ctrl CR       141  8D   Ќ
014  0E   ctrl SO       142  8E   Ћ
015  0F   ctrl SI       143  8F   Џ
016  10   ctrl DLE      144  90   ђ
017  11   ctrl DC1      145  91   '
018  12   ctrl DC2      146  92   '
019  13   ctrl DC3      147  93   "
020  14   ctrl DC4      148  94   "
021  15   ctrl NAK      149  95   •
022  16   ctrl SYN      150  96   –
023  17   ctrl ETB      151  97   —
024  18   ctrl CAN      152  98   (not used)
025  19   ctrl EM       153  99   ™
026  1A   ctrl SUB      154  9A   љ
027  1B   ctrl ESC      155  9B   ›
028  1C   ctrl FS       156  9C   њ
029  1D   ctrl GS       157  9D   ќ
030  1E   ctrl RS       158  9E   ћ
031  1F   ctrl US       159  9F   џ
032  20   SP (space)    160  A0   (no-break space)
033  21   !             161  A1   Ў
034  22   "             162  A2   ў
035  23   #             163  A3   Ј
036  24   $             164  A4   ¤
037  25   %             165  A5   Ґ
038  26   &             166  A6   ¦
039  27   '             167  A7   §
040  28   (             168  A8   Ё
041  29   )             169  A9   ©
042  2A   *             170  AA   Є
043  2B   +             171  AB   «
044  2C   ,             172  AC   ¬
045  2D   -             173  AD   (soft hyphen)
046  2E   .             174  AE   ®
047  2F   /             175  AF   Ї
048  30   0             176  B0   °
049  31   1             177  B1   ±
050  32   2             178  B2   І
051  33   3             179  B3   і
052  34   4             180  B4   ґ
053  35   5             181  B5   µ
054  36   6             182  B6   ¶
055  37   7             183  B7   ·
056  38   8             184  B8   ё
057  39   9             185  B9   №
058  3A   :             186  BA   є
059  3B   ;             187  BB   »
060  3C   <             188  BC   ј
061  3D   =             189  BD   Ѕ
062  3E   >             190  BE   ѕ
063  3F   ?             191  BF   ї
064  40   @             192  C0   А
065  41   A             193  C1   Б
066  42   B             194  C2   В
067  43   C             195  C3   Г
068  44   D             196  C4   Д
069  45   E             197  C5   Е
070  46   F             198  C6   Ж
071  47   G             199  C7   З
072  48   H             200  C8   И
073  49   I             201  C9   Й
074  4A   J             202  CA   К
075  4B   K             203  CB   Л
076  4C   L             204  CC   М
077  4D   M             205  CD   Н
078  4E   N             206  CE   О
079  4F   O             207  CF   П
080  50   P             208  D0   Р
081  51   Q             209  D1   С
082  52   R             210  D2   Т
083  53   S             211  D3   У
084  54   T             212  D4   Ф
085  55   U             213  D5   Х
086  56   V             214  D6   Ц
087  57   W             215  D7   Ч
088  58   X             216  D8   Ш
089  59   Y             217  D9   Щ
090  5A   Z             218  DA   Ъ
091  5B   [             219  DB   Ы
092  5C   \             220  DC   Ь
093  5D   ]             221  DD   Э
094  5E   ^             222  DE   Ю
095  5F   _             223  DF   Я
096  60   `             224  E0   а
097  61   a             225  E1   б
098  62   b             226  E2   в
099  63   c             227  E3   г
100  64   d             228  E4   д
101  65   e             229  E5   е
102  66   f             230  E6   ж
103  67   g             231  E7   з
104  68   h             232  E8   и
105  69   i             233  E9   й
106  6A   j             234  EA   к
107  6B   k             235  EB   л
108  6C   l             236  EC   м
109  6D   m             237  ED   н
110  6E   n             238  EE   о
111  6F   o             239  EF   п
112  70   p             240  F0   р
113  71   q             241  F1   с
114  72   r             242  F2   т
115  73   s             243  F3   у
116  74   t             244  F4   ф
117  75   u             245  F5   х
118  76   v             246  F6   ц
119  77   w             247  F7   ч
120  78   x             248  F8   ш
121  79   y             249  F9   щ
122  7A   z             250  FA   ъ
123  7B   {             251  FB   ы
124  7C   |             252  FC   ь
125  7D   }             253  FD   э
126  7E   ~             254  FE   ю
127  7F   ctrl DEL      255  FF   я

The ASCII + Windows (CP1251) character code table.
Description of the special (control) characters

It should be noted that initially the control characters of the ASCII table were used to organize data exchange over teletype, data input from punched tape, and the simplest control of external devices.
At present, most of the control characters of the ASCII table no longer carry this load and can be used for other purposes.
Code     Description
NUL, 00  Null, empty
SOH, 01  Start Of Heading
STX, 02  Start Of Text
ETX, 03  End Of Text
EOT, 04  End Of Transmission
ENQ, 05  Enquiry ("please confirm")
ACK, 06  Acknowledgment ("I confirm")
BEL, 07  Bell
BS, 08   Backspace, step back one character
TAB, 09  Tab, horizontal tabulation
LF, 0A   Line Feed.
         Now denoted in most programming languages as \n
VT, 0B   Vertical Tab, vertical tabulation
FF, 0C   Form Feed, page feed, new page
CR, 0D   Carriage Return.
         Now denoted in most programming languages as \r
SO, 0E   Shift Out, change the colour of the ink ribbon in the printing device
SI, 0F   Shift In, return the ink ribbon colour back
DLE, 10  Data Link Escape, switch the channel to data transmission
DC1, 11
DC2, 12
DC3, 13
DC4, 14
         Device Control, device control characters
NAK, 15  Negative Acknowledgment ("I do not confirm")
SYN, 16  Synchronization character
ETB, 17  End of Text Block
CAN, 18  Cancel what was transmitted earlier
EM, 19   End of Medium, end of the data carrier
SUB, 1A  Substitute. Placed in the position of a character whose value was lost or corrupted during transmission
ESC, 1B  Escape, start of a control sequence
FS, 1C   File Separator
GS, 1D   Group Separator
RS, 1E   Record Separator
US, 1F   Unit Separator
DEL, 7F  Delete, erase the last character

[8-bit encodings: ASCII, KOI-8R and CP1251] The first encoding tables, created in the USA, did not use the eighth bit of a byte. Text was represented as a sequence of bytes, but the eighth bit was not taken into account (it was used for service purposes).

The ASCII table (American Standard Code for Information Interchange) became the generally accepted standard. The first 32 characters of the ASCII table (from 00 to 1F) were used for non-printable characters. They were intended to control a printing device, etc. The rest, from 20 to 7F, are ordinary (printable) characters.

Table 1 - ASCII encoding

DEC  HEX  OCT  Char  Description
0    00   000  NUL   Null
1    01   001  SOH   Start of Heading
2    02   002  STX   Start of Text
3    03   003  ETX   End of Text
4    04   004  EOT   End of Transmission
5    05   005  ENQ   Enquiry
6    06   006  ACK   Acknowledge
7    07   007  BEL   Bell
8    08   010  BS    Backspace
9    09   011  TAB   Horizontal Tab
10   0A   012  LF    Line Feed (New Line)
11   0B   013  VT    Vertical Tab
12   0C   014  FF    Form Feed (New Page)
13   0D   015  CR    Carriage Return
14   0E   016  SO    Shift Out
15   0F   017  SI    Shift In
16   10   020  DLE   Data Link Escape
17   11   021  DC1   Device Control 1
18   12   022  DC2   Device Control 2
19   13   023  DC3   Device Control 3
20   14   024  DC4   Device Control 4
21   15   025  NAK   Negative Acknowledge
22   16   026  SYN   Synchronous Idle
23   17   027  ETB   End of Transmission Block
24   18   030  CAN   Cancel
25   19   031  EM    End of Medium
26   1A   032  SUB   Substitute
27   1B   033  ESC   Escape
28   1C   034  FS    File Separator
29   1D   035  GS    Group Separator
30   1E   036  RS    Record Separator
31   1F   037  US    Unit Separator
32   20   040        Space
33   21   041  !
34   22   042  "
35   23   043  #
36   24   044  $
37   25   045  %
38   26   046  &
39   27   047  '
40   28   050  (
41   29   051  )
42   2A   052  *
43   2B   053  +
44   2C   054  ,
45   2D   055  -
46   2E   056  .
47   2F   057  /
48   30   060  0
49   31   061  1
50   32   062  2
51   33   063  3
52   34   064  4
53   35   065  5
54   36   066  6
55   37   067  7
56   38   070  8
57   39   071  9
58   3A   072  :
59   3B   073  ;
60   3C   074  <
61   3D   075  =
62   3E   076  >
63   3F   077  ?
DEC  HEX  OCT  Char
64   40   100  @
65   41   101  A
66   42   102  B
67   43   103  C
68   44   104  D
69   45   105  E
70   46   106  F
71   47   107  G
72   48   110  H
73   49   111  I
74   4A   112  J
75   4B   113  K
76   4C   114  L
77   4D   115  M
78   4E   116  N
79   4F   117  O
80   50   120  P
81   51   121  Q
82   52   122  R
83   53   123  S
84   54   124  T
85   55   125  U
86   56   126  V
87   57   127  W
88   58   130  X
89   59   131  Y
90   5A   132  Z
91   5B   133  [
92   5C   134  \
93   5D   135  ]
94   5E   136  ^
95   5F   137  _
96   60   140  `
97   61   141  a
98   62   142  b
99   63   143  c
100  64   144  d
101  65   145  e
102  66   146  f
103  67   147  g
104  68   150  h
105  69   151  i
106  6A   152  j
107  6B   153  k
108  6C   154  l
109  6D   155  m
110  6E   156  n
111  6F   157  o
112  70   160  p
113  71   161  q
114  72   162  r
115  73   163  s
116  74   164  t
117  75   165  u
118  76   166  v
119  77   167  w
120  78   170  x
121  79   171  y
122  7A   172  z
123  7B   173  {
124  7C   174  |
125  7D   175  }
126  7E   176  ~
127  7F   177  DEL

It is easy to notice that this encoding contains only Latin letters, and only those used in English. There are also arithmetic signs and other service characters. But there are neither Russian letters nor even the special Latin letters needed for German or French. This is easy to explain: the encoding was developed specifically as an American standard. When computers began to be used all over the world, it became necessary to encode other characters.

To do this, it was decided to use the eighth bit in each byte. Thus, 128 more values became available (from 80 to FF in hexadecimal), which could be used to encode characters. The first of the eight-bit tables, "extended ASCII" (Extended ASCII), included various variants of Latin characters used in some languages of Western Europe. It also had other additional characters, including pseudographics.

Pseudographic characters make it possible, while displaying only text characters on the screen, to provide some semblance of graphics. For example, the file management program Far Manager draws its interface with the help of pseudographics.

There were no Russian letters in the extended ASCII table. In Russia (formerly the USSR) and in other states, their own encodings were created that made it possible to represent specific "national" characters in 8-bit text files: the Latin letters of the Polish and Czech languages, Cyrillic (including Russian letters) and other alphabets.

In all the encodings that became widespread, the first 127 characters (i.e., byte values with the eighth bit equal to 0) coincide with ASCII. Therefore, an ASCII file works in any of these encodings; the letters of the English language are represented in them identically.

The ISO organization (International Organization for Standardization) adopted the ISO 8859 group of standards. It defines 8-bit encodings for different groups of languages. Thus, ISO 8859-1 is Extended ASCII, a table for the USA and Western Europe, while ISO 8859-5 is a table for Cyrillic (including Russian).

However, for historical reasons, the ISO 8859-5 encoding did not take root. In reality, the following encodings are used for the Russian language:

- Code Page 866 (CP866), also known as "DOS" or the "alternative GOST encoding". It was widely used until the mid-1990s; now its use is limited. It is practically never used for distributing texts on the Internet.
- KOI-8. Developed in the 1970s and 1980s. It is the generally accepted standard for sending mail messages on the Russian Internet. It is widely used in operating systems of the Unix family, including Linux. The variant of KOI-8 designed for Russian is called KOI-8R; there are versions for other Cyrillic languages (for example, KOI8-U is the variant for the Ukrainian language).
- Code Page 1251, CP1251, Windows-1251. Developed by Microsoft to support the Russian language in the Windows system.

The main advantage of CP866 was that it kept the pseudographic characters in the same places as in Extended ASCII; therefore foreign text-mode programs, for example the famous Norton Commander, could work without any changes. CP866 is now used for Windows programs running in text windows or in full-screen text mode, including Far Manager.

Texts in CP866 have been quite rare in recent years (although it is used to encode Russian file names in Windows). Therefore we will dwell in more detail on two other encodings: KOI-8R and CP1251.



As you can see, in the CP1251 encoding table Russian letters are arranged in alphabetical order (with the exception, however, of the letter Ё). Thanks to this arrangement, computer programs can sort alphabetically very easily.

But in KOI-8R the order of the Russian letters seems random. In fact, it is not.

In many old programs, the 8th bit was lost when processing or transmitting text. (Now such programs have practically "died out", but in the late 1980s and early 1990s they were widespread.) To get a 7-bit value from an 8-bit value, it is enough to subtract 8 from the most significant hexadecimal digit; for example, E1 turns into 61.

Now compare KOI-8R with the ASCII table (Table 1). You will find that the Russian letters are placed in clear correspondence with the Latin ones. If the eighth bit disappears, lowercase Russian letters turn into uppercase Latin ones, and uppercase Russian letters into lowercase Latin ones. Thus, E1 in KOI-8 is the Russian capital "А", while 61 in ASCII is the Latin lowercase "a".

So, KOI-8 keeps Russian text readable even when the 8th bit is lost. "Привет всем" ("Hello everyone") turns into "pRIWET WSEM".
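
This effect can be reproduced in a couple of lines of Python (the standard koi8-r codec is assumed):

    # Encode a Russian phrase in KOI-8R, then clear bit 7 of every byte,
    # as the old 7-bit channels effectively did.
    phrase = "Привет всем"
    stripped = bytes(b & 0x7F for b in phrase.encode("koi8-r"))

    print(stripped.decode("ascii"))   # pRIWET WSEM - still readable as a transliteration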

Recently, both the alphabetical order of characters in the encoding table and readability after the loss of the 8th bit have lost their crucial importance. The eighth bit in modern computers is not lost during transmission or processing, and sorting alphabetically is done taking the encoding into account rather than by a simple comparison of codes. (By the way, the CP1251 codes are not completely in alphabetical order: the letter ё is not in its place.)

Because two widespread encodings ended up in use, when working with the Internet (mail, browsing websites) you can sometimes see a meaningless set of letters instead of Russian text, for example "Я СБЮФЕМХЕЛ". These are just the words "с уважением" ("with respect"), but they were encoded in CP1251 and the computer decoded the text using the KOI-8 table. If the same words are, on the contrary, encoded in KOI-8 and the computer decodes the text using the CP1251 table, the result will be "У ХЧБЦЕНИЕМ".

Sometimes it happens that the computer decodes Russian-language letters using a table that is not intended for the Russian language at all. Then, instead of Russian letters, a meaningless set of characters appears (for example, Latin letters of Eastern European languages); these are often called "krakozyabry" (mojibake).

In most cases, modern programs cope with determining the encodings of Internet documents (e-mail and Web pages) on their own. But sometimes they "misfire", and then you can see strange sequences of Russian letters or mojibake. As a rule, in such a situation it is enough to select the encoding manually in the program menu to display this text.

The page http://open-office.edusite.ru/textProcessor/p5aa1.html was used in preparing this article.

The material is taken from the site: