
What file size means in computer science. Units for measuring the volume of information

Topic: “Measuring information”

Formulas

To determine the information volume of a message, two formulas are required:

1. \(N= 2^i\)

N - alphabet power

\(I = k \cdot i\)

I - information volume of the message

k - number of characters in the message

i - information volume of one character in the alphabet

Formula for finding k: \(k = \frac{I}{i}\)

Formula for finding i: \(i = \frac{I}{k}\)
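As a quick check, these relationships are easy to express in a few lines of Python (a minimal sketch; the function names are illustrative, not from the source):

```python
import math

def char_weight(alphabet_power: int) -> int:
    """i: the information weight of one character, from N = 2^i."""
    return int(math.log2(alphabet_power))

def message_volume(num_chars: int, alphabet_power: int) -> int:
    """I = k * i: the information volume of a message, in bits."""
    return num_chars * char_weight(alphabet_power)

print(char_weight(128))         # 7 bits per character
print(message_volume(30, 128))  # 210 bits
```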

Tasks

Task No. 1. A message written in letters of a 128-character alphabet contains 30 characters. Find the information volume of the entire message.

Solution.

\(N = 128\), \(k = 30\)

\(I = ?\)

\(N = 2^i\), and \(128 = 2^7\)

\(i = 7\) bits: the weight of one character is the exponent to which 2 must be raised to obtain the power of the alphabet. Next, we determine the information volume of the message using the formula:

\(I = k \cdot i = 30 \cdot 7 = 210\) bits

Answer: 210 bits
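The same arithmetic can be checked with a standalone Python sketch:

```python
import math

N, k = 128, 30         # alphabet power and number of characters
i = int(math.log2(N))  # 128 = 2^7, so i = 7 bits
I = k * i              # 30 * 7 = 210 bits
print(I)
```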

Task No. 2. A 4 KB information message contains 4096 characters. How many characters does the alphabet in which this message was written contain?

Solution. Let us write down what is given according to the conditions of the problem and what needs to be found:

\(I = 4\) KB

\(k = 4096\)

\(N = ?\), \(i = ?\)

It is very important to convert all numbers to powers of two:

1 KB = \(2^{13}\) bits

\(I = 4\) KB \(= 2^2 \cdot 2^{13} = 2^{15}\) bits

\(k = 4096 = 2^{12}\)

First, let's find the weight of one character using the formula:

\(i = \frac{I}{k} = 2^{15} : 2^{12} = 2^3 = 8\) bits

\(N = 2^i = 2^8 = 256\)

Answer: 256 characters in the alphabet.
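The unit conversion and the division of powers of two can be verified in Python (a standalone sketch):

```python
I_bits = 4 * 1024 * 8  # 4 KB in bits: 2^2 * 2^13 = 2^15
k = 4096               # number of characters: 2^12
i = I_bits // k        # 2^15 / 2^12 = 8 bits per character
N = 2 ** i             # 256 characters in the alphabet
print(N)
```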

Task No. 3. How many characters does a message written using a 16-character alphabet contain if its size is 1/16 MB?

Solution. Let us write down what is given according to the conditions of the problem and what needs to be found:

\(I = \frac{1}{16}\) MB

\(k = ?\)

\(i = ?\)

Let us express \(I = \frac{1}{16}\) MB as a power of two:

1 MB = \(2^{23}\) bits

\(I = \frac{1}{16}\) MB \(= 2^{23} : 2^4 = 2^{19}\) bits.

First, let's find the weight of one character using the formula:

\(N = 2^i\): \(16 = 2^4\), so \(i = 4\) bits \((4 = 2^2)\)

Now let's find the number of characters k in the message:

\(k = \frac{I}{i} = 2^{19} : 2^2 = 2^{17} = 131072\)

Answer: the message contains 131072 characters.
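Again, the whole chain of powers of two can be reproduced in a short sketch:

```python
import math

I_bits = 1024 * 1024 * 8 // 16  # 1/16 MB in bits: 2^23 / 2^4 = 2^19
N = 16                          # power of the alphabet
i = int(math.log2(N))           # 4 bits per character
k = I_bits // i                 # 2^19 / 2^2 = 131072 characters
print(k)
```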

Amount of information

The amount of information as a measure of reducing knowledge uncertainty.
(Substantive approach to determining the amount of information)

The process of cognition of the surrounding world leads to the accumulation of information in the form of knowledge (facts, scientific theories, etc.). Obtaining new information leads to an expansion of knowledge or, as is sometimes said, to a reduction in the uncertainty of knowledge. If some message leads to a decrease in the uncertainty of our knowledge, then we can say that such a message contains information.

For example, after taking a pass/fail test or writing a graded test, you are tormented by uncertainty: you do not know what grade you received. Finally, the teacher announces the results, and for the pass/fail test you receive one of two information messages, "pass" or "fail", while for the graded test you receive one of four information messages: "2", "3", "4" or "5".

An information message about the grade for the pass/fail test reduces the uncertainty of your knowledge by half, since one of two possible information messages is received. An information message about the grade for the graded test reduces the uncertainty of your knowledge fourfold, since one of four possible information messages is received.

It is clear that the more uncertain the initial situation is (the more information messages are possible), the more new information we will receive when receiving an information message (the more times the uncertainty of knowledge will decrease).

The amount of information can be considered a measure of the reduction of knowledge uncertainty upon receiving information messages.

The approach to information discussed above as a measure of reducing the uncertainty of knowledge allows us to quantitatively measure information. There is a formula that relates the number of possible information messages N and the amount of information I carried by the received message:

\(N = 2^I\)  (1.1)

Bit. To measure any quantity, you must first define a unit of measurement. Thus, to measure length the meter is chosen as the unit, to measure mass the kilogram, and so on. Similarly, to determine the amount of information, a unit of measurement must be introduced.

The unit of information quantity is taken to be the amount of information contained in an information message that reduces the uncertainty of knowledge by half. This unit is called the bit.

If we return to the receipt of an information message about the test results discussed above, then here the uncertainty is reduced by half and, therefore, the amount of information that the message carries is equal to 1 bit.

Derived units for measuring the amount of information. The smallest unit of measurement of the amount of information is the bit; the next larger unit is the byte:

1 byte = 8 bits = \(2^3\) bits.

In computer science, the system for forming multiple units of measurement is somewhat different from that accepted in most sciences. Traditional metric systems of units, for example the International System of Units (SI), use a factor of \(10^n\) for multiple units, where n = 3, 6, 9, etc., corresponding to the decimal prefixes "kilo" (\(10^3\)), "mega" (\(10^6\)), "giga" (\(10^9\)), etc.

In a computer, information is encoded using the binary sign system, and therefore multiple units of measurement of the amount of information use a factor of \(2^n\).

Thus, units of measurement of the amount of information that are multiples of a byte are entered as follows:

1 kilobyte (KB) = \(2^{10}\) bytes = 1024 bytes;

1 megabyte (MB) = \(2^{10}\) KB = 1024 KB;

1 gigabyte (GB) = \(2^{10}\) MB = 1024 MB.

Control questions

    1. Give examples of information messages that lead to a reduction in knowledge uncertainty.
    2. Give examples of information messages that carry 1 bit of information.

Determining the amount of information

Determining the number of information messages. Using formula (1.1), you can easily determine the number of possible information messages if the amount of information is known. For example, in an exam you draw an exam card, and the teacher tells you that the visual information message about its number carries 5 bits of information. To determine the number of exam tickets, it is enough to find the number of possible information messages about their numbers using formula (1.1):

\(N = 2^I = 2^5 = 32\)

Thus, the number of exam tickets is 32.
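In code, formula (1.1) in this direction is a one-line check:

```python
I = 5       # bits carried by the message about the ticket number
N = 2 ** I  # number of possible messages, formula (1.1)
print(N)    # 32 exam tickets
```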

Determining the amount of information. On the contrary, if the possible number of information messages N is known, then to determine the amount of information carried by the message, it is necessary to solve the equation for I.

Imagine that you control the movement of a robot and can set the direction of its movement using information messages: "north", "northeast", "east", "southeast", "south", "southwest", "west" and "northwest" (Fig. 1.11). How much information will the robot receive after each message?

There are 8 possible information messages, so formula (1.1) takes the form of an equation for I:

\(8 = 2^I\)

Let's factor the number 8 on the left side of the equation and present it as a power of two:

\(8 = 2 \times 2 \times 2 = 2^3\)

Our equation then becomes:

\(2^3 = 2^I\)

The equality of the left and right sides of the equation holds if the exponents of the number 2 are equal. Thus, I = 3 bits, i.e., the amount of information that each information message carries to the robot is 3 bits.
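The same result follows from a base-2 logarithm (a minimal sketch):

```python
import math

N = 8             # eight possible direction messages
I = math.log2(N)  # solve 8 = 2^I for I
print(I)          # 3.0 bits per message
```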

Alphabetical approach to determining the amount of information

With the alphabetical approach to determining the amount of information, one abstracts from the content of the information and considers the information message as a sequence of signs of a certain sign system.

Information capacity of a sign. Let's imagine that it is necessary to transmit an information message through an information transmission channel from the sender to the recipient. Let the message be encoded using a sign system whose alphabet consists of N characters (1, ..., N). In the simplest case, when the length of the message code is one character, the sender can send one of N possible messages “1”, “2”, ..., “N”, which will carry the amount of information I (Fig. 1.5).

Fig. 1.5. Transfer of information

Formula (1.1) relates the number of possible information messages N and the amount of information I carried by the received message. In the situation under consideration, N is the number of signs in the alphabet of the sign system, and I is the amount of information carried by each sign:

\(N = 2^I\)

Using this formula, you can, for example, determine the amount of information that a sign carries in the binary sign system:

\(N = 2 \Rightarrow 2 = 2^I \Rightarrow 2^1 = 2^I \Rightarrow I = 1\) bit.

Thus, in the binary sign system, one sign carries 1 bit of information. Interestingly, the unit of measurement of the amount of information, the "bit", got its name from the English phrase "Binary digiT" ("binary digit").

The information capacity of the sign of the binary sign system is 1 bit.

The greater the number of signs the alphabet of a sign system contains, the greater the amount of information carried by one sign. As an example, we will determine the amount of information carried by a letter of the Russian alphabet. The Russian alphabet includes 33 letters, but in practice, only 32 letters are often used to convey messages (the letter “ё” is excluded).

Using formula (1.1), we determine the amount of information carried by a letter of the Russian alphabet:

\(N = 32 \Rightarrow 32 = 2^I \Rightarrow 2^5 = 2^I \Rightarrow I = 5\) bits.

Thus, a letter of the Russian alphabet carries 5 bits of information (with an alphabetic approach to measuring the amount of information).
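A standalone check of this value (sketch):

```python
import math
print(math.log2(32))  # 5.0 bits carried by one letter of a 32-letter alphabet
```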

The amount of information a sign carries depends on the likelihood of its receipt. If the recipient knows in advance exactly what sign will come, then the amount of information received will be equal to 0. On the contrary, the less likely it is to receive a sign, the greater its information capacity.

In Russian written speech, the frequency of use of letters in a text varies: on average, per 1000 characters of meaningful text there are 200 letters "a" and a hundred times fewer letters "f" (only 2). Thus, from the point of view of information theory, the information capacities of the characters of the Russian alphabet differ (the letter "a" has the smallest and the letter "f" the largest).

The amount of information in the message. A message consists of a sequence of characters, each of which carries a certain amount of information.

If the signs carry the same amount of information, then the amount of information \(I_c\) in the message can be calculated by multiplying the amount of information \(I_z\) carried by one sign by the code length (number of characters in the message) K:

\(I_c = I_z \times K\)

Thus, each digit of a binary computer code carries 1 bit of information. Consequently, two digits carry 2 bits of information, three digits 3 bits, etc. The amount of information in bits is equal to the number of digits of the binary computer code (Table 1.1).

Table 1.1. The amount of information carried by a binary computer code

To measure length there are units such as millimeter, centimeter, meter, kilometer. It is known that mass is measured in grams, kilograms, centners and tons. The passage of time is expressed in seconds, minutes, hours, days, months, years, centuries. The computer works with information and there are also corresponding units of measurement to measure its volume.

We already know that the computer perceives all information in the form of binary code.

Bit is the minimum unit of measurement of information corresponding to one binary digit (“0” or “1”).

A byte consists of eight bits. Using one byte, you can encode one character out of 256 possible (256 = \(2^8\)). Thus, in such an encoding one character corresponds to one byte, that is, 8 bits:

1 character = 8 bits = 1 byte.

Letters, digits, and punctuation marks are all characters: one letter is one character, one digit is one character, one punctuation mark (a period, a comma, a question mark, etc.) is one character, and one space is also one character.

The study of computer literacy involves consideration of other, larger units of measurement of information.

Byte table:

1 byte = 8 bits

1 KB (1 Kilobyte) = \(2^{10}\) bytes = 2·2·2·2·2·2·2·2·2·2 bytes = 1024 bytes (approximately 1 thousand bytes, \(10^3\) bytes)

1 MB (1 Megabyte) = \(2^{20}\) bytes = 1024 kilobytes (approximately 1 million bytes, \(10^6\) bytes)

1 GB (1 Gigabyte) = \(2^{30}\) bytes = 1024 megabytes (approximately 1 billion bytes, \(10^9\) bytes)

1 TB (1 Terabyte) = \(2^{40}\) bytes = 1024 gigabytes (approximately \(10^{12}\) bytes); a terabyte is sometimes informally called a "ton"

1 PB (1 Petabyte) = \(2^{50}\) bytes = 1024 terabytes (approximately \(10^{15}\) bytes)

1 EB (1 Exabyte) = \(2^{60}\) bytes = 1024 petabytes (approximately \(10^{18}\) bytes)

1 ZB (1 Zettabyte) = \(2^{70}\) bytes = 1024 exabytes (approximately \(10^{21}\) bytes)

1 YB (1 Yottabyte) = \(2^{80}\) bytes = 1024 zettabytes (approximately \(10^{24}\) bytes)
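The exact values in this table can be generated with a short loop (an illustrative sketch):

```python
prefixes = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
for n, prefix in enumerate(prefixes, start=1):
    print(f"1 {prefix} = 2^{10 * n} bytes = {2 ** (10 * n)} bytes")
```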

In the table above, the powers of two (\(2^{10}\), \(2^{20}\), \(2^{30}\), etc.) are the exact values of the kilobyte, megabyte, and gigabyte. The powers of ten (more precisely, \(10^3\), \(10^6\), \(10^9\), etc.) are approximate values, rounded down. Thus \(2^{10} = 1024\) bytes is the exact value of a kilobyte, while \(10^3 = 1000\) bytes is its approximate value.

Such approximation (or rounding) is quite acceptable and generally accepted.

Below is a table of bytes with English abbreviations (in the left column):

1 Kb ~ \(10^3\) b = 10·10·10 b = 1000 b – kilobyte

1 Mb ~ \(10^6\) b = 1,000,000 b – megabyte

1 Gb ~ \(10^9\) b – gigabyte

1 Tb ~ \(10^{12}\) b – terabyte

1 Pb ~ \(10^{15}\) b – petabyte

1 Eb ~ \(10^{18}\) b – exabyte

1 Zb ~ \(10^{21}\) b – zettabyte

1 Yb ~ \(10^{24}\) b – yottabyte

Above in the right column are the so-called “decimal prefixes”, which are used not only with bytes, but also in other areas of human activity. For example, the prefix “kilo” in the word “kilobyte” means a thousand bytes, just as in the case of a kilometer it corresponds to a thousand meters, and in the example of a kilogram it equals a thousand grams.

To be continued…

The question arises: is there a continuation of the byte table? In mathematics there is a concept of infinity, which is symbolized as an inverted figure eight: ∞.

It is clear that in the byte table you can continue to add zeros, or rather, powers of 10: \(10^{27}\), \(10^{30}\), \(10^{33}\), and so on ad infinitum. But why is this necessary? In principle, terabytes and petabytes are enough for now. In the future, perhaps even a yottabyte will not be enough.

Finally, a couple of examples of devices that can store terabytes and gigabytes of information.

There is a convenient “terabyte” - an external hard drive that connects via a USB port to the computer. You can store a terabyte of information on it. It is especially convenient for laptops (where changing the hard drive can be problematic) and for backing up information. It is better to back up information in advance, rather than after everything is lost.

Flash drives come in 1 GB, 2 GB, 4 GB, 8 GB, 16 GB, 32 GB, 64 GB and even 1 terabyte.

The purpose of the lesson:

  1. Have an idea of the alphabetical approach to determining the amount of information;
  2. Know the formula for determining the number of information messages, the amount of information in messages;
  3. Be able to solve problems to determine the number of information messages and the amount of information that the received message carries.

Lesson procedure

1. Updating knowledge:

Children, let's look out the window. What can you say about nature? (Winter has come.)
- But why did you decide that winter has come? (It's cold, it's snowing.)
- But nowhere is it written that these are signs of winter. (But we know what all this means: winter has come.)

Therefore, it turns out that the knowledge that we extract from the surrounding reality is information. (slide 1)

Warm up.

Fill out the table and use arrows to show the matches.

Is it possible to measure the amount of information and how to do it? (Yes)

It turns out that information can also be measured and its quantity found.

There are two approaches to measuring information. We will meet one of them today. (See attachment, slide 2)

2. Studying new material.

How can you find the amount of information?

Let's look at an example.

We have a short text written in Russian. It consists of letters of the Russian alphabet, numbers, and punctuation marks. For simplicity, we will assume that characters are present in the text with equal probability.

The set of characters used in a text is called an alphabet.

In computer science, the alphabet means not only letters, but also numbers, punctuation marks, and other special characters.

The alphabet has a size (the total number of characters), which is called the power of the alphabet. With the alphabetical approach, it is considered that each character of the text has a certain "information weight". As the power of the alphabet increases, so does the information weight of the characters of that alphabet.

Let us denote the power of the alphabet by N.

Let's find the relationship between the information weight of the symbol (i) and the power of the alphabet (N). The smallest alphabet contains 2 characters, which are designated “0” and “1”. The information weight of a symbol of the binary alphabet is taken as a unit of information and is called 1 bit. (See attachment slide 3)

N | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256
i | 1 bit | 2 bits | 3 bits | 4 bits | 5 bits | 6 bits | 7 bits | 8 bits

The computer also uses its own alphabet, which can be called the computer alphabet. It includes 256 characters; this is the power of the computer alphabet.

We also found that 256 different characters can be encoded using 8 bits.

8 bits is such a characteristic value that it was given its own name: the byte.

1 byte = 8 bits

Using this fact, you can quickly calculate the amount of information contained in computer text, i.e., text typed on a computer. Since most articles, books, publications, etc. are written using text editors, in this way you can find the information volume of any message created in a similar way.

Let's see the rule for measuring information from the point of view of the alphabetical approach on the slide. (See attachment slide 4)

Example:

Find the information volume of a page of computer text.

Solution:

Let's use the rule.

1. Find the power: N=256
2. Find the information volume of one character: \(N = 2^i\), \(256 = 2^8\), so i = 8 bits = 1 byte.
3. Find the approximate number of characters on the page.

(Find the number of characters in a line and multiply by the number of lines)

Explanation:

Let children choose a random string and count the number of characters in it, taking into account all punctuation marks and spaces.

40 characters * 50 lines = 2000 characters.

4. Find the information volume of the entire page: 2000 * 1 = 2000 bytes
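The estimate can be reproduced in a few lines (a sketch using the same assumed page dimensions):

```python
chars_per_line = 40  # counted in a sample line
lines_per_page = 50
bytes_per_char = 1   # 256-character alphabet: 8 bits = 1 byte
print(chars_per_line * lines_per_page * bytes_per_char)  # 2000 bytes
```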

Agree that a byte is a small unit of information. To measure large amounts of information, the following units are used (see attachment, slide 5).

3. Consolidation of the studied material.

On the desk:

Fill in the blanks with numbers and check for accuracy.

1 KB = ___ byte = ______bit,
2 KB = _____ byte =______ bits,
24576 bits =_____bytes =_____Kbytes,
512 KB = ___ bytes = ____bits.
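For the teacher's reference, the conversions can be checked with a short script (a sketch):

```python
KB = 1024  # bytes per kilobyte
print(1 * KB, 1 * KB * 8)            # 1 KB in bytes and in bits
print(2 * KB, 2 * KB * 8)            # 2 KB in bytes and in bits
print(24576 // 8, 24576 // 8 // KB)  # 24576 bits in bytes and in KB
print(512 * KB, 512 * KB * 8)        # 512 KB in bytes and in bits
```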

Students are offered tasks:

1) The message is written using an alphabet containing 8 characters. How much information does one letter of this alphabet carry?

Solution: N = 8 = \(2^3\), so i = 3 bits

2) A message written in letters from the 128-character alphabet contains 30 characters. How much information does it carry?

1. N = 128, k = 30
2. \(N = 2^i\): \(128 = 2^7\), so i = 7 bits (volume of one character)
3. I = 30 · 7 = 210 bits (volume of the entire message)

4. Creative work.

Type text on your computer whose information volume is 240 bytes.

5. Lesson summary.

What new did we learn in class today?
- How is the amount of information determined from an alphabetical point of view?
- How to find the power of the alphabet?
- What is 1 byte equal to?

6. Homework (See attachment slide 6).

Learn a rule for measuring information in terms of the alphabetical approach.

Learn units of measurement of information.

Solve a problem:

1) The capacity of some alphabet is 64 characters. What will be the amount of information in a text consisting of 100 characters?
2) The information volume of the message is 4096 bits. It contains 1024 characters. What is the power of the alphabet with which this message is composed?

The power of the alphabet, the information volume of a text, and many other concepts are directly connected with each other. Very few users today are well versed in these issues. Let's try to clarify what the power of the alphabet is, how to calculate it, and how to apply it in practice. Without a doubt, this may prove useful later on.

How information is measured

Before we begin to study the question of what the power of the alphabet is, and what it is in general, we should start, so to speak, with the basics.

Surely everyone knows that today there are special systems for measuring any quantities based on reference values. For example, for distances and similar quantities these are meters, for mass and weight - kilograms, for time intervals - seconds, etc.

But how do you measure information in terms of text volume? This is precisely why the concept of alphabet power was introduced.

What is the power of the alphabet: an initial concept

So, if we follow the generally accepted rule that the final value of any quantity is a parameter that determines how many times the reference unit is contained in the measured quantity, we can conclude: the power of the alphabet is the total number of symbols used for a particular language.

To make it clearer, let’s leave the question of how to find the power of the alphabet aside for now, and pay attention to the symbols themselves, naturally, from the point of view of information technology. Roughly speaking, the full list of used characters contains letters, numbers, all kinds of brackets, special characters, punctuation marks, etc. However, if we approach the question of what the power of the alphabet is in a computer way, we should also include a space (a single gap between words or other characters).

Let's take the Russian language, or rather, the keyboard layout, as an example. Based on the above, the complete list contains 33 letters, 10 numbers and 11 special characters. Thus, the total power of the alphabet is 54.

Information weight of characters

However, the general concept of the power of the alphabet does not define the essence of computing information volumes of text containing letters, numbers and symbols. This requires a special approach.

Think about it: what is the minimum alphabet from the point of view of a computer system, and how few characters can it contain? The answer is two, and here is why. Each symbol, be it a letter or a digit, has its own information weight, by which the machine recognizes what is in front of it. But the computer understands only a representation in the form of ones and zeros, on which, in fact, all of computer science is based.

Thus, any character can be represented as a sequence of ones and zeros; the minimal alphabet from which such sequences are built consists of just two signs.

The information weight of a sign of this minimal (binary) alphabet is taken as the standard unit of measurement of information and is called a bit. Accordingly, 8 bits make 1 byte.

Representation of characters in binary code

So, what the power of the alphabet is should already be a little clear. Now let's look at another aspect, namely the practical representation of the power of the alphabet using binary code. As an example, for simplicity, let's take an alphabet containing only 4 characters.

In a two-digit binary code, the sequence of characters and their information representation can be described as follows:

Serial number | Binary code
1 | 00
2 | 01
3 | 10
4 | 11

Hence the simplest conclusion: with the alphabet power N=4, the weight of a single character is 2 bits.

If we use a three-digit binary code for an alphabet with, for example, 8 characters, the number of combinations will be as follows:

Serial number | Binary code
1 | 000
2 | 001
3 | 010
4 | 011
5 | 100
6 | 101
7 | 110
8 | 111

In other words, with the alphabet power N=8, the weight of one symbol for a three-digit binary code will be equal to 3 bits.
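This enumeration is easy to generate for any code width (an illustrative sketch; the function name is hypothetical):

```python
def binary_codes(width: int) -> list[str]:
    """All binary codes of a given width: width=2 gives 00, 01, 10, 11."""
    return [format(n, f"0{width}b") for n in range(2 ** width)]

print(binary_codes(2))  # 4-character alphabet -> 2 bits per character
print(binary_codes(3))  # 8-character alphabet -> 3 bits per character
```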

The power of the alphabet and its use in computing

Now let's try to look at the relationship between the power of the alphabet and the number of digits in the binary code. The formula, where N is the power of the alphabet and b is the number of digits in the binary code, looks like this:

\(N = 2^b\)

That is, \(2^1 = 2\), \(2^2 = 4\), \(2^3 = 8\), \(2^4 = 16\), etc. Roughly speaking, the required number of binary-code digits is itself the weight of a symbol.

Measuring information volume

However, these were just the simplest examples, so to speak, for an initial understanding of what the power of the alphabet is. Let's move on to practice.

At the current stage of development of computer technology, text is typed using 256 characters, taking into account uppercase and lowercase Cyrillic and Latin letters, punctuation marks, brackets, arithmetic symbols, etc. Since 256 is \(2^8\), it is not difficult to see that the weight of each character in such an alphabet is 8 bits, or 1 byte.

Based on all the known parameters, we can easily obtain the desired information volume of any text. For example, suppose we have a computer text of 30 pages, each page containing 50 lines of 60 characters (including spaces).

Thus, one page contains 50 × 60 = 3,000 bytes of information, and the entire text 3,000 × 30 = 90,000 bytes. As you can see, measuring even small texts in bytes is inconvenient. What about entire libraries?

In this case, it is better to convert the volume into larger units: kilobytes, megabytes, gigabytes, etc. Since 1 kilobyte equals 1024 bytes (\(2^{10}\)) and 1 megabyte equals \(2^{10}\) kilobytes (1024 kilobytes), it is easy to calculate that the information volume of the text in our example is 90,000 / 1024 ≈ 87.89 kilobytes, or approximately 0.0858 megabytes.
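The calculation itself, as a sketch with the page dimensions assumed above:

```python
pages, lines, chars = 30, 50, 60
total_bytes = pages * lines * chars * 1  # 1 byte per character
print(total_bytes)                       # 90000 bytes
print(total_bytes / 1024)                # ~87.89 kilobytes
print(total_bytes / 1024 / 1024)         # ~0.0858 megabytes
```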

Instead of an afterword

In general, this is briefly all that concerns the consideration of the question of what the power of the alphabet is. It remains to add that in this description a purely mathematical approach was used. It goes without saying that the semantic load of the text is not taken into account in this case.

But if we approach the matter from the standpoint of what a person can actually comprehend, a set of meaningless combinations or sequences of symbols will carry zero information load, even though, from the point of view of the concept of information volume, a result can still be calculated.

In general, knowledge about the power of the alphabet and related concepts is not so difficult to grasp and can easily be applied in practice. Moreover, any user encounters this almost every day. It is enough to mention the popular Word editor or any other editor of the same level that uses such a character system. But don't confuse it with a plain-text editor such as Notepad, whose effective alphabet is smaller, since plain text does not use, say, formatting characters.