EXISTS and NOT EXISTS operators: using the EXISTS operator and the simplest queries with the SQL EXISTS predicate

Difference between EXISTS and IN in SQL?

What is the difference between the EXISTS and IN clauses in SQL?

When should we use EXISTS, and when should we use IN?

EXISTS is much faster than IN when the subquery returns a very large result.
IN is faster than EXISTS when the subquery returns a very small result.

CREATE TABLE T1 (ID INT, TITLE VARCHAR(20), SOMEINTCOL INT)
GO
CREATE TABLE T2 (ID INT, T1ID INT, SOMEDATA VARCHAR(20))
GO
INSERT INTO T1
SELECT 1, 'Title 1', 5 UNION ALL
SELECT 2, 'Title 2', 5 UNION ALL
SELECT 3, 'Title 3', 5 UNION ALL
SELECT 4, 'Title 4', 5 UNION ALL
SELECT NULL, 'Title 5', 5 UNION ALL
SELECT NULL, 'Title 6', 5
INSERT INTO T2
SELECT 1, 1, 'Data 1' UNION ALL
SELECT 2, 1, 'Data 2' UNION ALL
SELECT 3, 2, 'Data 3' UNION ALL
SELECT 4, 3, 'Data 4' UNION ALL
SELECT 5, 3, 'Data 5' UNION ALL
SELECT 6, 3, 'Data 6' UNION ALL
SELECT 7, 4, 'Data 7' UNION ALL
SELECT 8, NULL, 'Data 8' UNION ALL
SELECT 9, 6, 'Data 9' UNION ALL
SELECT 10, 6, 'Data 10' UNION ALL
SELECT 11, 8, 'Data 11'

Query 1:

SELECT * FROM T1 WHERE NOT EXISTS (SELECT * FROM T2 WHERE T1.ID = T2.T1ID)

Query 2:

SELECT T1.* FROM T1 WHERE T1.ID NOT IN (SELECT T2.T1ID FROM T2)

If ID is NULL in some T1 rows, Query 1 will find them, but Query 2 will not be able to find those NULL rows.

In other words, IN cannot compare anything with NULL, so it produces no result for NULL values. EXISTS only checks whether a matching row exists, so a NULL ID simply means "no match" and NOT EXISTS returns the row.
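The behavior of the two queries can be checked with a small sketch. For portability, it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server; the rows are those of the script above. Besides the NULL IDs in T1, note that T2.T1ID also contains a NULL. That alone is enough to make x NOT IN (...) UNKNOWN for any x that is not in the list.

```python
import sqlite3

# Sketch: SQLite stands in for SQL Server here, purely so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INT, TITLE VARCHAR(20), SOMEINTCOL INT)")
conn.execute("CREATE TABLE T2 (ID INT, T1ID INT, SOMEDATA VARCHAR(20))")

# Same rows as the T-SQL script above; Python None becomes SQL NULL.
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", [
    (1, "Title 1", 5), (2, "Title 2", 5), (3, "Title 3", 5),
    (4, "Title 4", 5), (None, "Title 5", 5), (None, "Title 6", 5),
])
conn.executemany("INSERT INTO T2 VALUES (?, ?, ?)", [
    (1, 1, "Data 1"), (2, 1, "Data 2"), (3, 2, "Data 3"), (4, 3, "Data 4"),
    (5, 3, "Data 5"), (6, 3, "Data 6"), (7, 4, "Data 7"), (8, None, "Data 8"),
    (9, 6, "Data 9"), (10, 6, "Data 10"), (11, 8, "Data 11"),
])

# Query 1 (NOT EXISTS): a T1 row with a NULL ID never matches any T2 row,
# so "no matching row exists" is TRUE and the row is returned.
query1 = [row[0] for row in conn.execute(
    "SELECT TITLE FROM T1 "
    "WHERE NOT EXISTS (SELECT * FROM T2 WHERE T1.ID = T2.T1ID) "
    "ORDER BY TITLE")]

# Query 2 (NOT IN): for a NULL ID the predicate is UNKNOWN, so the row is dropped.
query2 = [row[0] for row in conn.execute(
    "SELECT TITLE FROM T1 WHERE T1.ID NOT IN (SELECT T2.T1ID FROM T2)")]

# A value absent from the list still yields UNKNOWN (None), because T2.T1ID holds a NULL.
probe = conn.execute("SELECT 99 NOT IN (SELECT T1ID FROM T2)").fetchone()[0]

print(query1)  # ['Title 5', 'Title 6']
print(query2)  # []
print(probe)   # None
```

In practice this makes NOT EXISTS the safer anti-join whenever either column is nullable.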

EXISTS is faster than IN. If most of the filter criteria are in the subquery, it is better to use IN; if most are in the main query, it is better to use EXISTS.

If you use the IN operator, the SQL engine scans all records returned by the inner query. If you use EXISTS, on the other hand, the engine stops scanning as soon as it finds a match.

If the subquery returns more than one value, you may need to run the outer query only for rows whose values in the column named in the condition match some value in the subquery's result set. To do this, use the IN keyword.

You can use a subquery to check whether a set of records exists. To do this, use the EXISTS clause with a subquery. The EXISTS keyword always returns TRUE or FALSE.

Which one is faster depends on the number of rows returned by the inner query:

When your inner query returns a thousand rows, EXISTS will be the better choice.
When your inner query returns only a few rows, IN will be faster.

EXISTS evaluates to TRUE or FALSE, whereas IN compares against multiple values. When you do not know whether the record exists, choose EXISTS.

The EXISTS keyword evaluates to TRUE or FALSE, whereas the IN keyword compares all values in the corresponding subquery column. In addition, SELECT 1 can be used with EXISTS. Example:

SELECT * FROM Temp1 WHERE EXISTS (SELECT 1 FROM Temp2 WHERE conditions...)

IN is less efficient, so EXISTS is faster.
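A quick way to see that the select list inside EXISTS is ignored is to run the same correlated subquery with SELECT 1, SELECT * and SELECT NULL. This sketch uses SQLite through Python's sqlite3. The columns id and temp1_id, and all the rows, are hypothetical stand-ins for the "conditions..." in the example.

```python
import sqlite3

# Hypothetical Temp1/Temp2 tables; the join column names (id, temp1_id) are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Temp1 (id INTEGER, name TEXT);
    CREATE TABLE Temp2 (temp1_id INTEGER);
    INSERT INTO Temp1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Temp2 VALUES (1), (3), (3);
""")

def exists_query(select_list):
    # EXISTS only checks whether the subquery yields at least one row,
    # so whatever is written in its SELECT list is never used.
    # (select_list is one of three fixed literals below, never user input.)
    sql = ("SELECT id FROM Temp1 WHERE EXISTS "
           f"(SELECT {select_list} FROM Temp2 WHERE Temp2.temp1_id = Temp1.id) "
           "ORDER BY id")
    return [row[0] for row in conn.execute(sql)]

results = {s: exists_query(s) for s in ("1", "*", "NULL")}
print(results)  # {'1': [1, 3], '*': [1, 3], 'NULL': [1, 3]}
```

Temp1 row 3 is returned only once, even though two Temp2 rows match it. EXISTS only answers whether at least one such row exists.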

Rule-based optimizer:

EXISTS is much faster than IN when the subquery returns a very large result.
IN is faster than EXISTS when the subquery returns a very small result.

Cost-based optimizer:

There is no difference.

As far as I know, when the subquery returns a NULL value and no match is found, the whole IN predicate evaluates to NULL (unknown). In that case we use the EXISTS keyword. If we want to compare against specific values returned by the subquery, we use the IN keyword.

EXISTS is for when you need to match the results of a query against another subquery: rows of query #1 are returned only where matching rows exist in the subquery, much like a join. For example, select the customers from table #1 who have also placed orders in table #2.

IN is for retrieving rows where the value of a specific column is in a list such as (1,2,3,4,5). For example, select the customers in the following zip codes, that is, whose zip_code values are in the list (....).

When should you use one over the other? Whenever it reads appropriately, that is, when it communicates your intent best.

I have found that using the EXISTS keyword is often very slow (this is especially true in Microsoft Access). Instead, I use a join, as follows:

I assume you know what they do, and therefore that they are used differently, so I will read your question as: when would it be a good idea to rewrite SQL to use IN instead of EXISTS, or vice versa?

Is this a fair assumption?

Edit: The reason I ask is that in many cases you can rewrite IN-based SQL to use EXISTS instead, and vice versa, and for some database engines the query optimizer will treat the two differently.

For example:

SELECT * FROM Customers WHERE EXISTS (SELECT * FROM Orders WHERE Orders.CustomerID = Customers.ID)

can be rewritten as:

SELECT * FROM Customers WHERE ID IN (SELECT CustomerID FROM Orders)

or with a join:

SELECT Customers.* FROM Customers INNER JOIN Orders ON Customers.ID = Orders.CustomerID
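The three forms are not always interchangeable. The sketch below uses SQLite via Python's sqlite3, with hypothetical Customers/Orders rows. EXISTS and IN return each customer once, while the inner join returns a customer once per matching order unless DISTINCT is added.

```python
import sqlite3

# Hypothetical sample data: customer 1 has two orders, customer 2 has one,
# customer 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

with_exists = ids("SELECT ID FROM Customers WHERE EXISTS "
                  "(SELECT * FROM Orders WHERE Orders.CustomerID = Customers.ID) "
                  "ORDER BY ID")
with_in = ids("SELECT ID FROM Customers WHERE ID IN "
              "(SELECT CustomerID FROM Orders) ORDER BY ID")
# The join produces one row per matching order, so customer 1 appears twice.
with_join = ids("SELECT Customers.ID FROM Customers INNER JOIN Orders "
                "ON Customers.ID = Orders.CustomerID ORDER BY Customers.ID")
with_distinct_join = ids("SELECT DISTINCT Customers.ID FROM Customers INNER JOIN Orders "
                         "ON Customers.ID = Orders.CustomerID ORDER BY Customers.ID")

print(with_exists, with_in, with_join, with_distinct_join)
# [1, 2] [1, 2] [1, 1, 2] [1, 2]
```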

So my question still stands: is the original poster asking what IN and EXISTS do, and therefore how to use them, or asking whether rewriting SQL that uses IN to use EXISTS instead, or vice versa, would be a good idea?

I think this has a straightforward answer. Why not check it with the people who developed these features in their own systems?

If you are an MS SQL developer, here is the answer from Microsoft:

IN: Determines whether a specified value matches any value in a subquery or a list.

EXISTS: Specifies a subquery to test for the existence of rows.

In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.

IN supports only equality relations (or inequality, when preceded by NOT).
It is a synonym for = ANY / = SOME, e.g.

SELECT * FROM T1 WHERE X IN (SELECT X FROM T2);

EXISTS supports kinds of relationships that cannot be expressed using IN, e.g.:

SELECT * FROM T1 WHERE EXISTS (SELECT NULL FROM T2 WHERE T2.X = T1.X AND T2.Y > T1.Y AND T2.Z LIKE '%' || T1.Z || '%');
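The following sketch runs a query of this shape against hypothetical T1/T2 rows in SQLite (via Python's sqlite3). Only the first T1 row has a T2 row with the same X, a larger Y, and a Z that contains T1.Z. IN alone could not express this combination of conditions.

```python
import sqlite3

# Hypothetical rows, chosen so that each T1 row fails or passes a different condition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (X INTEGER, Y INTEGER, Z TEXT);
    CREATE TABLE T2 (X INTEGER, Y INTEGER, Z TEXT);
    INSERT INTO T1 VALUES (1, 10, 'ab'), (2, 10, 'cd'), (3, 10, 'ef');
    INSERT INTO T2 VALUES (1, 20, 'xaby'), (2, 5, 'cd'), (3, 30, 'zz');
""")

# X=1 passes all three tests; X=2 fails T2.Y > T1.Y; X=3 fails the LIKE test.
# || binds tighter than LIKE, so the pattern is '%' || T1.Z || '%'.
rows = conn.execute("""
    SELECT X FROM T1
    WHERE EXISTS (SELECT NULL FROM T2
                  WHERE T2.X = T1.X
                    AND T2.Y > T1.Y
                    AND T2.Z LIKE '%' || T1.Z || '%')
""").fetchall()
print(rows)  # [(1,)]
```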

And on another note:

The supposed performance and technical differences between EXISTS and IN may stem from a particular vendor's implementation details, limitations, or bugs, but in many cases they are nothing but myths born of an insufficient understanding of database internals.

Table definitions, the accuracy of statistics, the database configuration, and the optimizer version all affect the execution plan and, therefore, performance.

"It used to be easier," I thought, sitting down to optimize yet another query in SQL Management Studio. When I wrote for MySQL, things really were easier: a query either works or it does not; it is either slow or it is not. EXPLAIN solved all my problems, and nothing more was required. Now I have a powerful environment for developing, debugging, and optimizing queries, procedures, and functions, and in my opinion all this jumble only creates more problems. And why? Because the built-in query optimizer is evil. If in MySQL or PostgreSQL I write

SELECT * FROM A, B, C WHERE A.ID = B.ID AND B.ID = C.ID

and each of the tables has at least 5k rows, everything hangs. And thank God! Otherwise the developer at best picks up the habit of writing sloppy queries, and at worst does not understand what he is doing! After all, in MSSQL the same query will run just like

SELECT * FROM A JOIN B ON A.ID = B.ID JOIN C ON B.ID = C.ID

The built-in optimizer will tidy up the sloppy code, and everything will be fine.

It will also decide what is better to use, EXISTS or JOIN, and much more. And everything will run as optimally as possible.

There is just one catch. At some point the optimizer will stumble over a complex query and give up, and then you have a huge problem. You may not hit it right away, but only once the table size reaches critical mass.

So, to the point of the article. EXISTS and IN are very heavy operations. In effect, each one runs a separate subquery for every row of the result. If there is nesting as well, it becomes a real disaster. Everything is fine when 1, 10, or 50 rows are returned. You will not feel the difference, and a JOIN may even be slower. But when 500 rows come out, the problems begin. 500 subqueries within a single query is serious.

Granted, IN and EXISTS are easier for a human to read, but in terms of execution time for queries that return 50+ rows they are unacceptable.

A caveat is in order: naturally, if something shrinks in one place, it grows in another. Yes, JOIN uses more memory, because it has to hold the entire table of values and operate on it. That is costlier than running subqueries row by row and quickly freeing the memory. You have to look at the specific query and measure whether the extra memory use is critical or not.

Here are some fully equivalent examples. Generally speaking, I have not yet come across a query so complex that it could not be unwound into a cascade of JOINs. It may take a while, but everything can be expanded.

SELECT * FROM A WHERE A.ID IN (SELECT ID FROM B)
SELECT * FROM A WHERE EXISTS (SELECT TOP 1 1 FROM B WHERE B.ID = A.ID)
SELECT A.* FROM A JOIN B ON A.ID = B.ID

SELECT * FROM A WHERE A.ID NOT IN (SELECT ID FROM B)
SELECT * FROM A WHERE NOT EXISTS (SELECT TOP 1 1 FROM B WHERE B.ID = A.ID)
SELECT A.* FROM A LEFT JOIN B ON A.ID = B.ID WHERE B.ID IS NULL
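These pairs are interchangeable only under conditions the list leaves implicit. The JOIN form returns duplicates if B.ID is not unique, and NOT IN returns nothing once B.ID contains a NULL. The sketch below checks the NULL point in SQLite via Python's sqlite3. SQLite has no TOP, so SELECT 1 is used inside EXISTS, and the A/B rows are hypothetical.

```python
import sqlite3

# Hypothetical tables A and B; B.ID values start out unique and non-NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (ID INTEGER);
    INSERT INTO A VALUES (1), (2), (3), (4);
    INSERT INTO B VALUES (2), (4);
""")

def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))

def semi_and_anti_joins():
    # SQLite has no TOP, so SELECT 1 replaces SELECT TOP 1 1 inside EXISTS.
    semi = [ids("SELECT * FROM A WHERE A.ID IN (SELECT ID FROM B)"),
            ids("SELECT * FROM A WHERE EXISTS (SELECT 1 FROM B WHERE B.ID = A.ID)"),
            ids("SELECT A.* FROM A JOIN B ON A.ID = B.ID")]
    anti = [ids("SELECT * FROM A WHERE A.ID NOT IN (SELECT ID FROM B)"),
            ids("SELECT * FROM A WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.ID = A.ID)"),
            ids("SELECT A.* FROM A LEFT JOIN B ON A.ID = B.ID WHERE B.ID IS NULL")]
    return semi, anti

before = semi_and_anti_joins()
print(before)  # ([[2, 4], [2, 4], [2, 4]], [[1, 3], [1, 3], [1, 3]])

# A single NULL in B.ID breaks only the NOT IN form.
conn.execute("INSERT INTO B VALUES (NULL)")
after = semi_and_anti_joins()
print(after)   # ([[2, 4], [2, 4], [2, 4]], [[], [1, 3], [1, 3]])
```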

I repeat: the MSSQL optimizer optimizes these examples for maximum performance, and such simple queries will never cause trouble.

Now consider a real query that had to be rewritten because on some data sets it simply hung. The structure is greatly simplified and the concepts are renamed, so do not be alarmed by the less-than-optimal database structure.

We need to pull out all duplicate "products" across different accounts, matching on the parameters of the product, its group, and its parent group, if there is one.

Select d.PRODUCT_ID
from PRODUCT s,
     PRODUCT_GROUP sg
       left join M_PG_DEPENDENCY sd on (sg.PRODUCT_GROUP_ID = sd.M_PG_DEPENDENCY_CHILD_ID),
     PRODUCT d,
     PRODUCT_GROUP dg
       left join M_PG_DEPENDENCY dd on (dg.PRODUCT_GROUP_ID = dd.M_PG_DEPENDENCY_CHILD_ID)
where s.PRODUCT_GROUP_ID = sg.PRODUCT_GROUP_ID
  and d.PRODUCT_GROUP_ID = dg.PRODUCT_GROUP_ID
  and sg.PRODUCT_GROUP_PERSPEC = dg.PRODUCT_GROUP_PERSPEC
  and sg.PRODUCT_GROUP_NAME = dg.PRODUCT_GROUP_NAME
  and s.PRODUCT_NAME = d.PRODUCT_NAME
  and s.PRODUCT_TYPE = d.PRODUCT_TYPE
  and s.PRODUCT_IS_SECURE = d.PRODUCT_IS_SECURE
  and s.PRODUCT_MULTISELECT = d.PRODUCT_MULTISELECT
  and dg.PRODUCT_GROUP_IS_TMPL = 0
  and ((sd.M_PG_DEPENDENCY_CHILD_ID is null and dd.M_PG_DEPENDENCY_CHILD_ID is null)
       or exists (select 1
                  from PRODUCT_GROUP sg1, PRODUCT_GROUP dg1
                  where sd.M_PG_DEPENDENCY_PARENT_ID = sg1.PRODUCT_GROUP_ID
                    and dd.M_PG_DEPENDENCY_PARENT_ID = dg1.PRODUCT_GROUP_ID
                    and sg1.PRODUCT_GROUP_PERSPEC = dg1.PRODUCT_GROUP_PERSPEC
                    and sg1.PRODUCT_GROUP_NAME = dg1.PRODUCT_GROUP_NAME
                    and ...))

This is exactly the case where the optimizer gave up. A heavy EXISTS was executed for every row, which killed the database.

Select d.PRODUCT_ID
from PRODUCT s
join PRODUCT d on s.PRODUCT_TYPE = d.PRODUCT_TYPE
              and s.PRODUCT_NAME = d.PRODUCT_NAME
              and s.PRODUCT_IS_SECURE = d.PRODUCT_IS_SECURE
              and s.PRODUCT_MULTISELECT = d.PRODUCT_MULTISELECT
join PRODUCT_GROUP sg on s.PRODUCT_GROUP_ID = sg.PRODUCT_GROUP_ID
join PRODUCT_GROUP dg on d.PRODUCT_GROUP_ID = dg.PRODUCT_GROUP_ID
                     and sg.PRODUCT_GROUP_NAME = dg.PRODUCT_GROUP_NAME
                     and sg.PRODUCT_GROUP_PERSPEC = dg.PRODUCT_GROUP_PERSPEC
left join M_PG_DEPENDENCY sd on sg.PRODUCT_GROUP_ID = sd.M_PG_DEPENDENCY_CHILD_ID
left join M_PG_DEPENDENCY dd on dg.PRODUCT_GROUP_ID = dd.M_PG_DEPENDENCY_CHILD_ID
left join PRODUCT_GROUP sgp on sgp.PRODUCT_GROUP_ID = sd.M_PG_DEPENDENCY_PARENT_ID
left join PRODUCT_GROUP dgp on dgp.PRODUCT_GROUP_ID = dd.M_PG_DEPENDENCY_PARENT_ID
                           and sgp.PRODUCT_GROUP_NAME = dgp.PRODUCT_GROUP_NAME
                           and isnull(sgp.PRODUCT_GROUP_IS_TMPL, 0) = isnull(dgp.PRODUCT_GROUP_IS_TMPL, 0)
where (sd.M_PG_DEPENDENCY_CHILD_ID is null and dd.M_PG_DEPENDENCY_CHILD_ID is null)
   or (sgp.PRODUCT_GROUP_NAME is not null and dgp.PRODUCT_GROUP_NAME is not null)
go

After these transformations, the query's performance improved exponentially relative to the number of products found. More precisely, the search time became almost independent of the number of matches and was always very small. As it should be.

This is a good example of how trusting the MSSQL optimizer can play a cruel joke on you. Do not trust it, do not be lazy, write your JOINs by hand, and think every time about what is better in the situation at hand: EXISTS, IN, or JOIN.

The SQL EXISTS predicate performs a logical test. In SQL queries, this predicate is used in expressions of the form

EXISTS (SELECT * FROM table_name ...).

This expression returns TRUE when the query finds one or more rows matching the condition, and FALSE when no rows are found.

For NOT EXISTS it is the other way around. The expression

NOT EXISTS (SELECT * FROM table_name ...)

returns TRUE when the query finds no rows, and FALSE when at least one row is found.
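As a minimal sketch (SQLite via Python's sqlite3, with a hypothetical one-column name_table), the predicate's TRUE/FALSE value can be selected directly. SQLite reports TRUE and FALSE as 1 and 0. SQL Server, by contrast, does not allow EXISTS directly in a select list, so there it would be wrapped in CASE WHEN EXISTS (...) THEN 1 ELSE 0 END.

```python
import sqlite3

# Hypothetical table standing in for table_name in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name_table (val INTEGER);
    INSERT INTO name_table VALUES (1), (2);
""")

# SQLite represents TRUE/FALSE as 1/0.
found = conn.execute(
    "SELECT EXISTS (SELECT * FROM name_table WHERE val = 2)").fetchone()[0]
missing = conn.execute(
    "SELECT EXISTS (SELECT * FROM name_table WHERE val = 99)").fetchone()[0]
not_found = conn.execute(
    "SELECT NOT EXISTS (SELECT * FROM name_table WHERE val = 99)").fetchone()[0]
print(found, missing, not_found)  # 1 0 1
```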

The simplest queries with the SQL EXISTS predicate

In the examples, we work with a library database and its tables "Books in use" (Bookinuse) and "User" (User). For now we need only the "Books in use" table (Bookinuse).

Author	Title	Pubyear	Inv_no	User_id
Tolstoy	War and Peace	2005	28	65
Chekhov	The Cherry Orchard	2000	17	31
Chekhov	Selected Stories	2011	19	120
Chekhov	The Cherry Orchard	1991	5	65
Ilf and Petrov	The Twelve Chairs	1985	3	31
Mayakovsky	Poems	1983	2	120
Pasternak	Doctor Zhivago	2006	69	120
Tolstoy	Resurrection	2006	77	47
Tolstoy	Anna Karenina	1989	7	205
Pushkin	The Captain's Daughter	2004	25	47
Gogol	Plays	2007	81	47
Chekhov	Selected Stories	1987	4	205
Pasternak	Selected Works	2000	137	18

Example 1. Determine the IDs of users who have been issued books by Tolstoy and have also been issued books by Chekhov. The outer query selects users who have been issued books by Tolstoy, and the EXISTS predicate sets an additional condition checked in the inner query: users who have been issued books by Chekhov. A further condition in the inner query is that the user IDs of the outer and inner queries match: user_id = tols_user.user_id. The query will be as follows:

This query will return the following result:
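The query text itself is not preserved here. Below is a sketch of one query that matches the description, run against the Bookinuse rows above in SQLite via Python's sqlite3. With the author names written as Tolstoy and Chekhov, it returns user IDs 65 and 205. Only the Author and User_id columns are loaded. The alias tols_user follows the text; the rest of the query is an assumption.

```python
import sqlite3

# (Author, User_id) pairs from the Bookinuse table above; other columns omitted.
BOOKINUSE = [
    ("Tolstoy", 65), ("Chekhov", 31), ("Chekhov", 120), ("Chekhov", 65),
    ("Ilf and Petrov", 31), ("Mayakovsky", 120), ("Pasternak", 120),
    ("Tolstoy", 47), ("Tolstoy", 205), ("Pushkin", 47), ("Gogol", 47),
    ("Chekhov", 205), ("Pasternak", 18),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bookinuse (Author TEXT, User_id INTEGER)")
conn.executemany("INSERT INTO Bookinuse VALUES (?, ?)", BOOKINUSE)

# Outer query: readers of Tolstoy. EXISTS keeps only those who also hold
# a Chekhov book; the unqualified User_id refers to the inner Bookinuse.
rows = conn.execute("""
    SELECT DISTINCT tols_user.User_id
    FROM Bookinuse AS tols_user
    WHERE tols_user.Author = 'Tolstoy'
      AND EXISTS (SELECT 1 FROM Bookinuse
                  WHERE Author = 'Chekhov'
                    AND User_id = tols_user.User_id)
    ORDER BY tols_user.User_id
""")
user_ids = [r[0] for r in rows]
print(user_ids)  # [65, 205]
```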

Differences between the EXISTS and IN predicates

When you first look at queries with the EXISTS predicate, it may seem identical to the IN predicate. This is not true, although the two are very similar. The IN predicate searches for values from the set specified in its argument, and if such values exist, all rows matching that set are selected. The result of the EXISTS predicate is a "yes" or "no" answer to the question of whether any values matching the argument exist at all. In addition, the IN predicate is preceded by the name of the column whose values are matched against the set. Let us look at an example that shows how the EXISTS predicate differs from the IN predicate, using a task solved with the IN predicate.

Example 4. Determine the IDs of users who have been issued books by the authors whose books are issued to the user with ID 31. The query will be as follows:

User_id

120

205

The inner query (after IN) selects the authors: Chekhov; Ilf and Petrov. The outer query selects all users who have been issued books by these authors. We see that, unlike the EXISTS predicate, the IN predicate is preceded by a column name, in this case Author.

Queries with the EXISTS predicate and additional conditions

If, in addition to the EXISTS predicate, the query applies at least one more condition, for example one built with aggregate functions, such queries can be used for simple data analysis. We demonstrate this in the following example.

Example 5. Determine the IDs of users who have been issued at least one book by Pasternak and more than 2 books in total. We write the following query. The first condition is set by the EXISTS predicate with a nested query, and the second condition, with the HAVING operator, must always come after the nested query:

Query result:

User_id

120

As the Bookinuse table shows, a book by Pasternak is also issued to the user with ID 18, but that user has only one book, so the user does not appear in the result. If you apply the COUNT function once more to a similar query, but this time to the selected rows (try it yourself), you can also find out how many users who read Pasternak also read books by other authors. This is already in the realm of data analysis.
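The query for Example 5 is not preserved either. The sketch below is one possible form, again in SQLite via Python's sqlite3 and with only the Author and User_id columns of Bookinuse. It also lists the Pasternak readers before the HAVING filter, to show user 18 being dropped.

```python
import sqlite3

# (Author, User_id) pairs from the Bookinuse table above; other columns omitted.
BOOKINUSE = [
    ("Tolstoy", 65), ("Chekhov", 31), ("Chekhov", 120), ("Chekhov", 65),
    ("Ilf and Petrov", 31), ("Mayakovsky", 120), ("Pasternak", 120),
    ("Tolstoy", 47), ("Tolstoy", 205), ("Pushkin", 47), ("Gogol", 47),
    ("Chekhov", 205), ("Pasternak", 18),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bookinuse (Author TEXT, User_id INTEGER)")
conn.executemany("INSERT INTO Bookinuse VALUES (?, ?)", BOOKINUSE)

# Everyone holding at least one Pasternak book, before any HAVING filter.
readers = [r[0] for r in conn.execute(
    "SELECT DISTINCT User_id FROM Bookinuse "
    "WHERE Author = 'Pasternak' ORDER BY User_id")]

# Example 5: EXISTS checks for a Pasternak book; HAVING, written after the
# subquery, keeps only users holding more than 2 books in total.
heavy_readers = [r[0] for r in conn.execute("""
    SELECT User_id
    FROM Bookinuse AS bk
    WHERE EXISTS (SELECT 1 FROM Bookinuse
                  WHERE Author = 'Pasternak' AND User_id = bk.User_id)
    GROUP BY User_id
    HAVING COUNT(*) > 2
""")]
print(readers, heavy_readers)  # [18, 120] [120]
```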

Queries with the EXISTS predicate on two tables

Queries with the EXISTS predicate can retrieve data from more than one table. Many tasks can be solved with the same result using the JOIN operator. But in some cases, EXISTS lets you write a less cumbersome query. EXISTS is preferable when the columns in the result table come from only one table.

In the following example, from the same database, in addition to the Bookinuse table, the "User" table will also be needed.

The result of the query will be the following table:

Author

Chekhov

Mayakovsky

Pasternak

As with the JOIN operator, when more than one table is involved you should use table aliases to check that the key values linking the tables match. In our example the table aliases are bk and us, and the key linking the tables is user_id.

The EXISTS predicate with joins of more than two tables

Now let us see in more detail why EXISTS is preferable when the columns in the result table come from only one table.

We work with the "Real Estate" database. The Deal table contains data on transactions. For our tasks, the important column in this table is Type, which holds the type of transaction: sale or rent. The Object table contains data on properties. In this table we will need the Rooms column (number of rooms) and the LogBalc column, which indicates whether there is a loggia or balcony in Boolean format: 1 (yes) or 0 (no). The Client, Manager, and Owner tables contain data about clients, company managers, and property owners, respectively. In these tables, FName and LName hold the first name and surname, respectively.

Example 7. Identify the clients who bought or rented properties that have no loggia or balcony. We write the following query, in which the EXISTS predicate is applied to the result of joining two tables:

Since the columns of the Client table are selected with the asterisk (*), all columns of this table are output, in as many rows as there are clients satisfying the condition of the EXISTS predicate. We do not need to output any column from the tables the nested query refers to. Therefore, to save machine time, only one column is retrieved: the number 1 is written after the word SELECT. The same technique is used in the queries in the following examples.
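The query for Example 7 is not preserved. Below is a sketch under stated assumptions: a hypothetical miniature of the Real Estate database in SQLite (via Python's sqlite3), with invented rows. The key columns Client_ID and Obj_ID, which link Deal to Client and Object, are also assumptions. Client 2 only rented a property with a balcony, so only clients 1 and 3 are returned.

```python
import sqlite3

# Hypothetical miniature of the "Real Estate" database; the key columns
# (Client_ID, Obj_ID) and all rows are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Client (Client_ID INTEGER PRIMARY KEY, FName TEXT, LName TEXT);
    CREATE TABLE Object (Obj_ID INTEGER PRIMARY KEY, Rooms INTEGER, LogBalc INTEGER);
    CREATE TABLE Deal (Deal_ID INTEGER PRIMARY KEY, Obj_ID INTEGER,
                       Client_ID INTEGER, Type TEXT);
    INSERT INTO Client VALUES (1, 'Anna', 'Orlova'), (2, 'Boris', 'Popov'),
                              (3, 'Vera', 'Sokolova');
    INSERT INTO Object VALUES (100, 2, 0), (101, 3, 1);
    INSERT INTO Deal VALUES (1, 100, 1, 'sale'), (2, 101, 2, 'rent'),
                            (3, 100, 3, 'rent');
""")

# The outer query reads only Client. EXISTS probes the Deal-Object join and
# selects the constant 1, because none of its columns are needed. Type is not
# filtered, since both purchases and rentals count.
rows = conn.execute("""
    SELECT *
    FROM Client cl
    WHERE EXISTS (SELECT 1
                  FROM Deal de
                  JOIN Object ob ON ob.Obj_ID = de.Obj_ID
                  WHERE de.Client_ID = cl.Client_ID
                    AND ob.LogBalc = 0)
    ORDER BY cl.Client_ID
""").fetchall()
print(rows)  # [(1, 'Anna', 'Orlova'), (3, 'Vera', 'Sokolova')]
```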

Write an SQL query with the EXISTS predicate yourself, and then look at the solution

We continue writing SQL queries with the EXISTS predicate

Example 9. Determine the owners of properties that were rented out. We write the following query, in which the EXISTS predicate is again applied to the result of joining two tables:

As in the previous example, all fields of the table the outer query refers to will be output.

Example 10. Determine the number of owners whose properties were handled by the manager Saveliev. We write a query in which the outer query refers to a join of three tables, and the EXISTS predicate refers to only one table:

All queries have been tested on an existing database. Good luck using them!

Relational databases and the SQL language

Novosibirsk State Academy of Economics and Management

Laboratory workshop for the course

"DATABASES"

Laboratory work No. 7

"The SQL database language: data manipulation commands"

Novosibirsk, 2000

SQL is the abbreviation of Structured Query Language. As the name suggests, its main purpose is to form queries that retrieve information from a database. Data retrieval commands form the basis of the data manipulation language (DML), a component of SQL. However, DML consists not only of commands for retrieving data from the database. There are also data modification commands, data management commands, and others.

This laboratory work covers the basic tools of DML. In performing the laboratory work, we will adhere to the SQL2 standard.

Because SQL is a large language, we will consider only the main commands. Various SQL-specific tools are discussed in subsequent laboratory works.

Performing this laboratory work requires knowledge of the basics of the relational data model, relational algebra and relational calculus, and the principles of working with the MS SQL Server DBMS.

As a result of this laboratory work, you will master ways to manipulate data using SQL commands and examine the dialect of the language implemented in the MS SQL Server DBMS.

Introduction

SQL contains a wide range of data manipulation capabilities, both for writing queries and for updating the database. These capabilities rely only on the logical structure of the database, not on its physical structure, which is consistent with the requirements of the relational model.

Initially, the SQL syntax was based (or at least appeared to be based) on Codd's relational calculus. The only relational algebra operation supported was union.

In SQL2, in addition to the relational-calculus-like syntax developed in the previous standard, the operations of union, intersection, difference, and join are implemented directly. The selection, projection, and product operations were (and continue to be) supported almost directly, while the division and assignment operations are supported in a more cumbersome form.

We first describe the SQL query language, and then its data input and modification operations. Data modification operations are described last, since their structure relies to some extent on the structure of the query language.

Simple queries

For us, a simple query is one that refers to only one database table. Simple queries will help us illustrate the basic structure of SQL.

Simple query. A query that refers to only one database table.

Query: Who are the plasterers?

SELECT Name
FROM Worker
WHERE Skill_Type = 'Plasterer'

Result:

Rickover

This query illustrates the three most commonly used SQL clauses: SELECT, FROM, and WHERE. Although in our example we placed them on separate lines, they can all appear on the same line. They can also be written with different indentation, and words within clauses can be separated by any number of spaces. Let us consider the features of each clause.

SELECT. The SELECT clause lists the columns that must appear in the resulting table. They are always columns of some relational table. In our example, the resulting table consists of one column (Name), but in general it may contain several columns; it can also contain computed values or constants. We will give examples of each of these options. If the resulting table must contain more than one column, all the required columns are listed after the SELECT keyword, separated by commas. For example, the clause SELECT Worker_ID, Name produces a table consisting of the Worker_ID and Name columns.

SELECT clause. Specifies the columns of the resulting table.

FROM. The FROM clause specifies one or more tables that the query refers to. All columns listed in the SELECT and WHERE clauses must exist in one of the tables listed in the FROM clause. In SQL2, these tables can be defined directly in the schema as base tables or views, or they can themselves be unnamed tables produced by SQL queries. In the latter case, the query is written out explicitly in the FROM clause.

FROM clause. Specifies the existing tables that the query refers to.

WHERE. The WHERE clause contains the condition used to select rows from the table (or tables). In our example, the condition is that the Skill_Type column must contain the constant 'Plasterer', enclosed in apostrophes, as text constants always are in SQL. The WHERE clause is the most varied SQL clause; it can contain a wide range of conditions. Much of our presentation will illustrate the various constructs allowed in the WHERE clause.

WHERE clause. Sets the condition used to select rows from the specified tables.

The SQL query above is processed by the system in the following order: FROM, WHERE, SELECT. That is, the rows of the table named in the FROM clause are placed in a work area for processing. Then the WHERE clause is applied to each row. All rows that do not satisfy the WHERE condition are excluded from consideration. Then the rows that do satisfy the WHERE condition are processed by the SELECT clause. In our example, Name is selected from each such row, and all selected values are output as the query result.
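The logical order FROM, WHERE, SELECT can be mimicked in a few lines of plain Python. This only models the logical evaluation order, not how a DBMS physically executes a query. The rows are some of the WORKER rows shown in this section's result tables, the query asks for electricians, and run_query is a hypothetical helper.

```python
# Some WORKER rows taken from this section's result tables (not the full table).
WORKER = [
    {"WORKER_ID": 1235, "NAME": "M.Faraday", "HRLY_RATE": 12.50,
     "SKILL_TYPE": "Electrician", "SUPV_ID": 1311},
    {"WORKER_ID": 1311, "NAME": "H.Columbus", "HRLY_RATE": 15.50,
     "SKILL_TYPE": "Electrician", "SUPV_ID": 1311},
    {"WORKER_ID": 1520, "NAME": "Rickover", "HRLY_RATE": 11.75,
     "SKILL_TYPE": "Plasterer", "SUPV_ID": 1520},
    {"WORKER_ID": 2920, "NAME": "R.Garret", "HRLY_RATE": 10.00,
     "SKILL_TYPE": "Roofer", "SUPV_ID": 2920},
]

def run_query(table, where, select):
    """Hypothetical helper that applies the clauses in logical order."""
    rows = list(table)                    # FROM: take the rows of the named table
    rows = [r for r in rows if where(r)]  # WHERE: drop rows that fail the condition
    return [select(r) for r in rows]      # SELECT: keep only the requested column(s)

names = run_query(WORKER,
                  where=lambda r: r["SKILL_TYPE"] == "Electrician",
                  select=lambda r: r["NAME"])
print(names)  # ['M.Faraday', 'H.Columbus']
```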

Query: Give all data about office buildings.

WHERE Type = 'Office'

Result:

BLDG_ID ADDRESS TYPE QLTY_LEVEL STATUS

312 V. St., 123 Office 2 2

210 Birch St., 1011 Office C 1

111 Aspen St., 1213 Office 4 1

An asterisk (*) in the SELECT clause means "the whole row". This is a convenient shorthand that we will use often.

Query: What is the weekly salary of each electrician?

SELECT Name, 'Weekly salary = ', 40 * HRLY_RATE

WHERE Skill_Type = 'Electrician'

Result:

M.Faraday Weekly salary = 500.00

H.Columbus Weekly salary = 620.00

This query illustrates the use of both character constants (in our example, 'Weekly salary = ') and computations in the SELECT clause. Inside the SELECT clause you can perform calculations that use numeric columns and numeric constants together with the standard arithmetic operators (+, -, *, /), grouped as needed with parentheses. We have also included the new ORDER BY clause, which sorts the query result in ascending alphanumeric order by the specified column. To sort in descending order, add DESC to the clause. The ORDER BY clause can sort the results by several columns, some in ascending and others in descending order. The first column listed is the primary sort key.

Character constant. A constant consisting of letters, digits, and "special" characters.

Query: Who has an hourly rate between 10 and 12 dollars?

WHERE HRLY_RATE >= 10 AND HRLY_RATE <= 12

Result:

WORKER_ID NAME HRLY_RATE SKILL_TYPE SUPV_ID

This query illustrates some additional features of the WHERE clause: comparison operators and the Boolean operation AND. Six comparison operators (=, <> (not equal), <, >, <=, >=) can be used to compare columns with other columns or with constants. The Boolean operations AND, OR, and NOT can be used to build compound conditions or negations. As usual in programming languages, parentheses can be used to group conditions.

Comparison operators: =, <>, <, >, <=, >=.

Boolean operations: AND, OR, and NOT.

This query can also be written with the BETWEEN operator:

WHERE HRLY_RATE BETWEEN 10 AND 12

BETWEEN compares a value with two other values, the first of which is less than the second. The comparison is true if the value equals either of them or lies anywhere between them.
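Here is a small check that BETWEEN includes both bounds. It uses the hourly rates that appear in this section's result tables, loaded into SQLite via Python's sqlite3; for the sketch, the table is trimmed to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Worker (WORKER_ID INTEGER, NAME TEXT, HRLY_RATE REAL)")
# Rates taken from the result tables in this section (other columns omitted).
conn.executemany("INSERT INTO Worker VALUES (?, ?, ?)", [
    (1235, "M.Faraday", 12.50), (1311, "H.Columbus", 15.50),
    (1520, "Rickover", 11.75), (2920, "R.Garret", 10.00),
])

def ids(where):
    # The WHERE text is one of the two fixed conditions below, never user input.
    return [r[0] for r in conn.execute(
        f"SELECT WORKER_ID FROM Worker WHERE {where} ORDER BY WORKER_ID")]

with_comparisons = ids("HRLY_RATE >= 10 AND HRLY_RATE <= 12")
with_between = ids("HRLY_RATE BETWEEN 10 AND 12")
print(with_comparisons, with_between)  # [1520, 2920] [1520, 2920]
```

R.Garret's rate of exactly 10.00 is included, which shows that the lower bound is inclusive.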

Query: List plasterers, roofers, and electricians.

WHERE SKILL_TYPE IN ('Plasterer', 'Roofer', 'Electrician')

Result:

WORKER_ID NAME HRLY_RATE SKILL_TYPE SUPV_ID

1412 K.Nemo 13.75 Plasterer 1520

2920 R.Garret 10.00 Roofer 2920

1520 Rickover 11.75 Plasterer 1520

This query illustrates the use of the IN comparison operator. The WHERE condition is true if the row's specialty type is in the set given in parentheses, that is, if the specialty is plasterer, roofer, or electrician. We will meet the IN operator again in subqueries.

Suppose we cannot remember exactly how the specialty is spelled: "Electrician", "Electritian", or something else. Wildcard characters, which stand for unknown strings of characters, make it easier to search when the exact spelling is uncertain.

Wildcard characters. Characters that stand for unknown strings of characters.

Query: List the employees whose specialty type begins with "Elec".

WHERE SKILL_TYPE LIKE 'Elec%'

Result:

WORKER_ID NAME HRLY_RATE SKILL_TYPE SUPV_ID

1235 M.Faraday 12.50 Electrician 1311

1311 H.Columbus 15.50 Electrician 1311

SQL has two wildcard characters: % (percent) and _ (underscore). The underscore stands for exactly one unknown character. The percent sign stands for any number of characters, including zero. When wildcards are used, the LIKE operator compares character variables with constants. Other examples:

NAME LIKE '__Columbus'

NAME LIKE '__K%'

The condition in the first example is true if NAME consists of two characters followed by "Columbus". In the Worker table, all names begin with a first initial and a period. Thus, this condition finds all employees with the surname "Columbus". The condition in the second example finds all employees whose surnames begin with the letter "K".
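The two wildcards can be tried out in SQLite through Python's sqlite3. The names below are hypothetical, written in the same initial-period-surname form. Note that SQLite's LIKE ignores the case of ASCII letters by default; in SQL Server, case sensitivity depends on the column's collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Worker (NAME TEXT)")
# Hypothetical names, mostly in the text's "initial, period, surname" form.
conn.executemany("INSERT INTO Worker VALUES (?)", [
    ("H.Columbus",), ("HH.Columbus",), ("Columbus",), ("A.Kay",), ("M.Faraday",),
])

def like(pattern):
    # Returns the names matching a LIKE pattern, sorted alphabetically.
    return [r[0] for r in conn.execute(
        "SELECT NAME FROM Worker WHERE NAME LIKE ? ORDER BY NAME", (pattern,))]

print(like("__Columbus"))  # ['H.Columbus']: each '_' is exactly one character
print(like("__K%"))        # ['A.Kay']: third character K, then anything
print(like("%Columbus"))   # ['Columbus', 'H.Columbus', 'HH.Columbus']
```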

Query: Find all assignments that start within the next two weeks.

WHERE START_DATE BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL '14' DAY

Result: (assuming that the current date, CURRENT_DATE, is 10.10)

WORKER_ID BLDG_ID START_DATE NUM_DAYS

1235 312 10.10 5

1235 515 17.10 22

3231 111 10.10 8

1412 435 15.10 15

3231 312 24.10 20

1311 460 23.10 24

This query illustrates the use of the BETWEEN operator with DATE and INTERVAL values. CURRENT_DATE is a function that always returns today's date. The expression

CURRENT_DATE + INTERVAL '14' DAY

adds a two-week interval to the current date. Thus, an assignment is selected (assuming today is 10.10) if its START_DATE value lies between 10.10 and 24.10. This shows that we can add interval values to date fields. Moreover, we can multiply interval values by integers. For example, suppose we want to find out what the date will be in a certain number of weeks (given by the variable NUM_WEEKS). We can do it like this:

CURRENT_DATE + INTERVAL '7' DAY * NUM_WEEKS
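SQLite has no INTERVAL type, so this sketch mirrors the date arithmetic with Python's datetime and timedelta instead. It assumes the dd.mm dates in the result belong to one arbitrary year (2000 here). It also adds one hypothetical assignment just outside the two-week window.

```python
from datetime import date, timedelta

# Assumption: the dd.mm dates in the text belong to one arbitrary year.
today = date(2000, 10, 10)              # stands in for CURRENT_DATE
horizon = today + timedelta(days=14)    # CURRENT_DATE + INTERVAL '14' DAY

assignments = [  # (worker_id, bldg_id, start_date, num_days)
    (1235, 312, date(2000, 10, 10), 5),
    (1235, 515, date(2000, 10, 17), 22),
    (3231, 111, date(2000, 10, 10), 8),
    (1412, 435, date(2000, 10, 15), 15),
    (3231, 312, date(2000, 10, 24), 20),
    (1311, 460, date(2000, 10, 23), 24),
    (9999, 999, date(2000, 10, 25), 3),  # hypothetical row just outside the window
]

# BETWEEN is inclusive on both ends, so 24.10 is kept and 25.10 is not.
upcoming = [a for a in assignments if today <= a[2] <= horizon]

def weeks_from_today(num_weeks):
    # CURRENT_DATE + INTERVAL '7' DAY * NUM_WEEKS
    return today + timedelta(days=7) * num_weeks

print(horizon, len(upcoming), weeks_from_today(3))
# 2000-10-24 6 2000-10-31
```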

2. Multi-fold requests

The ability to link the data elements outside the same table is important for any database language. In relational algebra, this function performs the connection operation. Although a significant part of SQL is based directly on relational calculus, SQL binds data from different tables in the same way as the operation of connecting relational algebra does. Now we will show how it is done. Consider the query:

Inquiry:

The data required for the response is in two tables: Worker and Assignment. To solve in SQL, you must list both tables in the FROM command and set a special type of WHERE condition:

SELECT SKILL_TYPE
FROM WORKER, ASSIGNMENT
WHERE WORKER.WORKER_ID = ASSIGNMENT.WORKER_ID
AND BLDG_ID = 435

What is going on here? We need to consider two stages in how the system processes this query.

1. As usual, the FROM clause is processed first. In this case, however, the clause names two tables, so the system forms the Cartesian product of their rows. This means that (logically) one large table is created. It contains the columns of both tables, and each row of one table is paired with each row of the other. In our example, the WORKER table has five columns and the ASSIGNMENT table has four, so the Cartesian product created by the FROM clause has nine columns. The total number of rows in the Cartesian product is M * N, where M is the number of rows in the WORKER table and N is the number of rows in the ASSIGNMENT table. Since WORKER has 7 rows and ASSIGNMENT has 19, the Cartesian product contains 7 * 19 = 133 rows. If the FROM clause lists more than two tables, the Cartesian product of all the listed tables is created.

Cartesian product. The result of combining each row of one table with each row of another table.

2. After creating this giant relational table, the system applies the WHERE clause as before. Each row of the table created by the FROM clause is checked against the WHERE condition, and rows that do not satisfy it are excluded from further consideration. The SELECT clause is then applied to the remaining rows.
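The two stages above can be sketched with Python's built-in sqlite3 module. The tables below are hypothetical miniatures (worker names such as "W. Ash" are invented), not the WORKER and ASSIGNMENT data of the text. The sketch builds the Cartesian product explicitly, filters it by hand, and compares the result with the SQL join.

```python
import sqlite3

# Hypothetical miniature tables with the column layout described in the text:
# WORKER has five columns, ASSIGNMENT has four.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, "
             "hrly_rate REAL, skill_type TEXT, supv_id INTEGER)")
conn.execute("CREATE TABLE assignment (worker_id INTEGER, bldg_id INTEGER, "
             "start_date TEXT, num_days INTEGER)")
conn.executemany("INSERT INTO worker VALUES (?, ?, ?, ?, ?)", [
    (1, "W. Ash", 12.0, "Plasterer", 3),
    (2, "W. Birch", 10.0, "Roofer", 3),
    (3, "W. Cedar", 15.0, "Electric", 3),
])
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?, ?)", [
    (1, 435, "2024-10-28", 10),
    (2, 435, "2024-10-25", 8),
    (2, 312, "2024-10-01", 5),
    (3, 515, "2024-10-10", 12),
])

# Stage 1 (FROM): the Cartesian product pairs every row with every row.
cursor = conn.execute("SELECT * FROM worker, assignment")
product = cursor.fetchall()
num_columns = len(cursor.description)
print(len(product), num_columns)  # 3 * 4 = 12 rows, 5 + 4 = 9 columns

# Stage 2 (WHERE) done by hand on the product rows, then SELECT skill_type.
# Indices: worker columns are 0-4, assignment columns are 5-8.
manual = sorted(row[3] for row in product if row[0] == row[5] and row[6] == 435)

# The same query written in SQL.
skills = [r[0] for r in conn.execute(
    "SELECT skill_type FROM worker, assignment "
    "WHERE worker.worker_id = assignment.worker_id AND bldg_id = 435 "
    "ORDER BY skill_type")]
print(manual, skills)
```

Real query optimizers rarely materialize the full product. The two-stage picture describes what the result must be, not how the DBMS computes it.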

The WHERE clause in our query contains two conditions:

1. WORKER.WORKER_ID = ASSIGNMENT.WORKER_ID

2. BLDG_ID = 435

The first of these conditions is the join condition. Both the WORKER and ASSIGNMENT tables contain a column named WORKER_ID, so their Cartesian product contains two columns with that name. To distinguish them, we prefix the column name with the name of its source table, separated by a period.

The first condition means that in every selected row, the WORKER_ID value from the WORKER table must equal the WORKER_ID value from the ASSIGNMENT table. In effect, we join the two tables on WORKER_ID. All rows in which these two columns differ are excluded from the product table. Exactly the same thing happens in a natural join in relational algebra. (There is one difference from the natural join, however: SQL does not automatically remove the duplicate WORKER_ID column.) The join of the two tables with the additional condition BLDG_ID = 435 is shown in Fig. 1. Applying the SELECT clause finally yields the following query result:

SKILL_TYPE
Plasterer
Roofer
Electrician

Fig. 1. Join of the WORKER and ASSIGNMENT tables

Now we will show how to join a table to itself in SQL.

Inquiry: List the workers together with the names of their managers.

SELECT A.WORKER_NAME, B.WORKER_NAME
FROM WORKER A, WORKER B
WHERE B.WORKER_ID = A.SUPV_ID

The FROM clause in this example creates two "copies" of the WORKER table and gives them the aliases A and B. An alias is an alternative name given to a table. The WHERE clause then joins copies A and B on the condition that WORKER_ID in B equals SUPV_ID in A. Thus each row of A is joined to the row of B that holds the information about that row's manager (Fig. 2).

Fig. 2. Joining two copies of the WORKER table

Selecting the two worker names from each row, we get the desired list:

A.WORKER_NAME   B.WORKER_NAME
M. Faraday      H. Columbus
K. Eleo         Ryikover
R. Garret       R. Garret
P. Mayson       P. Mayson
Ryikover        Ryikover
H. Columbus     H. Columbus
J. Barrister    P. Mayson

Alias. An alternative name given to a table.

A.WORKER_NAME represents the worker and B.WORKER_NAME represents the manager. Note that some workers are their own managers, as follows from the equality WORKER_ID = SUPV_ID in their rows.
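A self-join like this can be sketched with Python's built-in sqlite3 module. The three WORKER rows are hypothetical, and one of them is recorded as its own manager to mirror the self-managed workers in the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, "
             "hrly_rate REAL, skill_type TEXT, supv_id INTEGER)")
# Hypothetical rows; worker 30 is recorded as his or her own manager.
conn.executemany("INSERT INTO worker VALUES (?, ?, ?, ?, ?)", [
    (10, "W. Ash", 12.0, "Plasterer", 30),
    (20, "W. Birch", 11.0, "Roofer", 30),
    (30, "W. Cedar", 16.0, "Electric", 30),
])

# A and B are two aliases for the same table: A is the employee row,
# B is the row of that employee's manager.
pairs = conn.execute(
    "SELECT A.worker_name, B.worker_name "
    "FROM worker A, worker B "
    "WHERE B.worker_id = A.supv_id "
    "ORDER BY A.worker_id"
).fetchall()
for employee, manager in pairs:
    print(f"{employee} -> {manager}")
```

A worker whose SUPV_ID matches no WORKER_ID (for example, a NULL) simply drops out of this join.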

In SQL you can join more than two tables at once:

Inquiry: List the workers assigned to office buildings.

SELECT WORKER_NAME
FROM WORKER, ASSIGNMENT, BUILDING
WHERE WORKER.WORKER_ID = ASSIGNMENT.WORKER_ID
AND ASSIGNMENT.BLDG_ID = BUILDING.BLDG_ID
AND TYPE = 'Office'

Result:

M. Faraday
Ryikover
J. Barrister

Note that if a column name (for example, WORKER_ID or BLDG_ID) occurs in more than one table, we must prefix it with the name of its source table to avoid ambiguity. If the column name occurs in only one table, like TYPE in our example, there is no ambiguity, so the table name need not be given.

The clauses of this query build one table from three tables of the relational database. The first two tables are joined on WORKER_ID, and the third table is then joined to the result on BLDG_ID. The condition

TYPE = 'Office'

in the WHERE clause excludes all rows except those for office buildings. This meets the requirements of the query.
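Here is a minimal three-table join in Python's built-in sqlite3 module. It assumes hypothetical WORKER, ASSIGNMENT and BUILDING rows. The BUILDING columns other than BLDG_ID, TYPE, QLTY_LEVEL and STATUS are omitted as an assumption. WORKER_ID and BLDG_ID are qualified because they occur in two tables, while TYPE is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, hrly_rate REAL,
                     skill_type TEXT, supv_id INTEGER);
CREATE TABLE assignment (worker_id INTEGER, bldg_id INTEGER,
                         start_date TEXT, num_days INTEGER);
CREATE TABLE building (bldg_id INTEGER, type TEXT, qlty_level INTEGER,
                       status INTEGER);
-- Hypothetical rows, not the tables used in the text.
INSERT INTO worker VALUES (1, 'W. Ash', 12.0, 'Plasterer', 3),
                          (2, 'W. Birch', 11.0, 'Roofer', 3),
                          (3, 'W. Cedar', 16.0, 'Electric', 3);
INSERT INTO assignment VALUES (1, 100, '2024-10-01', 5),
                              (2, 200, '2024-10-02', 7),
                              (3, 100, '2024-10-03', 9);
INSERT INTO building VALUES (100, 'Office', 2, 1),
                            (200, 'Warehouse', 3, 1);
""")

# WORKER_ID and BLDG_ID occur in two tables each, so they are qualified;
# TYPE and WORKER_NAME occur in only one table, so they need no prefix.
office_workers = conn.execute("""
    SELECT worker_name
    FROM worker, assignment, building
    WHERE worker.worker_id = assignment.worker_id
      AND assignment.bldg_id = building.bldg_id
      AND type = 'Office'
    ORDER BY worker_name
""").fetchall()
print(office_workers)
```

If a worker had several office assignments, the name would appear once per assignment. SELECT DISTINCT would remove such repeats.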

3. Subqueries

Subquery. A query inside another query.

Subqueries can be placed in the WHERE clause of a query, which extends what the WHERE clause can express. Consider an example.

Inquiry: What are the skill types of the workers assigned to building 435?

SELECT SKILL_TYPE
FROM WORKER
WHERE WORKER_ID IN
    (SELECT WORKER_ID
     FROM ASSIGNMENT
     WHERE BLDG_ID = 435)

The subquery in this example is

(SELECT WORKER_ID
 FROM ASSIGNMENT
 WHERE BLDG_ID = 435)

A query that contains a subquery is called the outer query, or main query. The subquery produces the following set of worker IDs:

WORKER_ID

Outer query. The main query, which contains all the subqueries.

This set of IDs then takes the place of the subquery in the outer query. From this point on, the outer query is executed using the set produced by the subquery. The outer query processes each row of the WORKER table according to the WHERE condition. If a row's WORKER_ID lies IN the set produced by the subquery, the row's SKILL_TYPE is selected and displayed in the result table:

SKILL_TYPE
Plasterer
Roofer
Electrician

It is essential that the SELECT clause of the subquery contain WORKER_ID and only WORKER_ID. Otherwise the WHERE clause of the outer query, which states that WORKER_ID lies in the set of worker IDs, would make no sense.

Note that the subquery can logically be executed before the main query examines even a single row. In this sense the subquery is independent of the main query: it can be executed on its own as a complete query. We say that this subquery is not correlated with the main query. As we will see shortly, subqueries can also be correlated.
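Because the subquery is uncorrelated, it can be run first and its result substituted into the outer query. The sketch below checks this with Python's built-in sqlite3 module on hypothetical rows. It compares the nested form with a two-step version that builds the IN list from the subquery's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, hrly_rate REAL,
                     skill_type TEXT, supv_id INTEGER);
CREATE TABLE assignment (worker_id INTEGER, bldg_id INTEGER,
                         start_date TEXT, num_days INTEGER);
-- Hypothetical rows.
INSERT INTO worker VALUES (1, 'W. Ash', 12.0, 'Plasterer', 3),
                          (2, 'W. Birch', 11.0, 'Roofer', 3),
                          (3, 'W. Cedar', 16.0, 'Electric', 3);
INSERT INTO assignment VALUES (1, 435, '2024-10-01', 5),
                              (2, 312, '2024-10-02', 7),
                              (3, 435, '2024-10-03', 9);
""")

# Because the subquery is uncorrelated, it can run on its own first...
ids = [r[0] for r in conn.execute(
    "SELECT worker_id FROM assignment WHERE bldg_id = 435 ORDER BY worker_id")]

# ...and the outer query then gives the same answer as the nested form.
nested = conn.execute("""
    SELECT skill_type FROM worker
    WHERE worker_id IN (SELECT worker_id FROM assignment WHERE bldg_id = 435)
    ORDER BY skill_type
""").fetchall()
placeholders = ", ".join("?" for _ in ids)
two_step = conn.execute(
    f"SELECT skill_type FROM worker WHERE worker_id IN ({placeholders}) "
    "ORDER BY skill_type", ids).fetchall()
print(ids, nested, two_step)
```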

Uncorrelated subquery. A subquery whose value does not depend on any outer query.

Here is an example of a subquery inside a subquery.

Inquiry: List the workers assigned to office buildings.

This is the same query we used earlier to study joins.

SELECT WORKER_NAME
FROM WORKER
WHERE WORKER_ID IN
    (SELECT WORKER_ID
     FROM ASSIGNMENT
     WHERE BLDG_ID IN
         (SELECT BLDG_ID
          FROM BUILDING
          WHERE TYPE = 'Office'))

Result:

M. Faraday
Ryikover
J. Barrister

Note that table names never need to be placed before column names here. Each subquery processes one and only one table, so no ambiguity can arise.

The query is executed from the inside out. That is, the innermost (or "lowest") query is executed first, then the subquery that contains it, and finally the outer query.
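The inside-out evaluation can be checked with Python's built-in sqlite3 module on hypothetical rows. In this data one invented worker, W. Ash, works on two office buildings. This shows a practical difference between the forms: the nested IN form lists each qualifying worker once, while the three-table join without DISTINCT repeats the name once per office assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, hrly_rate REAL,
                     skill_type TEXT, supv_id INTEGER);
CREATE TABLE assignment (worker_id INTEGER, bldg_id INTEGER,
                         start_date TEXT, num_days INTEGER);
CREATE TABLE building (bldg_id INTEGER, type TEXT, qlty_level INTEGER,
                       status INTEGER);
-- Hypothetical rows: W. Ash works on two different office buildings.
INSERT INTO worker VALUES (1, 'W. Ash', 12.0, 'Plasterer', 3),
                          (2, 'W. Birch', 11.0, 'Roofer', 3),
                          (3, 'W. Cedar', 16.0, 'Electric', 3);
INSERT INTO assignment VALUES (1, 100, '2024-10-01', 5),
                              (1, 101, '2024-10-08', 4),
                              (2, 200, '2024-10-02', 7),
                              (3, 100, '2024-10-03', 9);
INSERT INTO building VALUES (100, 'Office', 2, 1),
                            (101, 'Office', 3, 1),
                            (200, 'Warehouse', 3, 1);
""")

# Inside out: the innermost query can be run first on its own.
office_ids = [r[0] for r in conn.execute(
    "SELECT bldg_id FROM building WHERE type = 'Office' ORDER BY bldg_id")]

nested = [r[0] for r in conn.execute("""
    SELECT worker_name FROM worker
    WHERE worker_id IN
        (SELECT worker_id FROM assignment
         WHERE bldg_id IN
             (SELECT bldg_id FROM building WHERE type = 'Office'))
    ORDER BY worker_name
""")]

# The three-table join lists W. Ash once per office assignment.
joined = [r[0] for r in conn.execute("""
    SELECT worker_name FROM worker, assignment, building
    WHERE worker.worker_id = assignment.worker_id
      AND assignment.bldg_id = building.bldg_id
      AND type = 'Office'
    ORDER BY worker_name
""")]
print(office_ids, nested, joined)
```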

Correlated subqueries. All the subqueries considered so far were independent of the main queries in which they were used. By independent we mean that each subquery could be executed on its own as a complete query. We now turn to a class of subqueries whose results may depend on the row being considered by the main query. Such subqueries are called correlated subqueries.

Correlated subquery. A subquery whose result depends on the row being considered by the main query.

Inquiry: List the workers whose hourly rates are higher than those of their managers.

SELECT WORKER_NAME
FROM WORKER A
WHERE A.HRLY_RATE >
    (SELECT B.HRLY_RATE
     FROM WORKER B
     WHERE B.WORKER_ID = A.SUPV_ID)

Result:

The logical stages of this query are as follows:

1. The system creates two copies of the WORKER table, copy A and copy B. As we defined them, A refers to the worker and B to the manager.

2. The system then examines each row of A. A row is selected if it satisfies the WHERE condition, that is, if its HRLY_RATE value is greater than the HRLY_RATE produced by the subquery.

3. The subquery selects the HRLY_RATE value from the row of B whose WORKER_ID equals the SUPV_ID of the row of A that the main query is currently considering. This is the manager's HRLY_RATE.

Note that A.HRLY_RATE can be compared with only one value, so the subquery must produce exactly one value. That value changes depending on which row of A is being considered. This is why the subquery is correlated with the main query. We will meet other examples of correlated subqueries later, when we study built-in functions.
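Here is a minimal sketch of the three stages, assuming three hypothetical WORKER rows in Python's built-in sqlite3 module. It runs the correlated query and then repeats the same logic as an explicit loop. The loop looks up the manager's rate for each row of A, which is what "re-evaluated for each row" means.

```python
import sqlite3

rows = [  # hypothetical (worker_id, worker_name, hrly_rate, skill_type, supv_id)
    (1, "W. Ash", 18.0, "Plasterer", 3),
    (2, "W. Birch", 11.0, "Roofer", 3),
    (3, "W. Cedar", 16.0, "Electric", 3),  # manages himself or herself
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, "
             "hrly_rate REAL, skill_type TEXT, supv_id INTEGER)")
conn.executemany("INSERT INTO worker VALUES (?, ?, ?, ?, ?)", rows)

# The subquery refers to A.supv_id, so it is re-evaluated for each row of A.
correlated = [r[0] for r in conn.execute("""
    SELECT A.worker_name FROM worker A
    WHERE A.hrly_rate >
        (SELECT B.hrly_rate FROM worker B WHERE B.worker_id = A.supv_id)
    ORDER BY A.worker_name
""")]

# The same logic as an explicit loop: look up the manager's rate per row.
rate_by_id = {worker_id: rate for worker_id, _, rate, _, _ in rows}
manual = sorted(name for _, name, rate, _, supv_id in rows
                if rate > rate_by_id[supv_id])
print(correlated, manual)
```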

4. EXISTS and NOT EXISTS operators

Suppose we want to find the workers who are not assigned to a particular building. At first glance, such a query seems easy to write by simply negating the affirmative version. Suppose, for example, that we are interested in the building with BLDG_ID 435. Consider the query:

SELECT WORKER_ID
FROM ASSIGNMENT
WHERE BLDG_ID <> 435

Unfortunately, this is an incorrect solution. The query simply returns the IDs of workers assigned to other buildings. Obviously, some of them may also be assigned to building 435.

The correct solution uses the NOT EXISTS operator:

SELECT WORKER_ID
FROM WORKER
WHERE NOT EXISTS
    (SELECT *
     FROM ASSIGNMENT
     WHERE ASSIGNMENT.WORKER_ID = WORKER.WORKER_ID
     AND BLDG_ID = 435)

Result:

WORKER_ID

EXISTS and NOT EXISTS are always placed before a subquery. EXISTS evaluates to true if the set produced by the subquery is not empty, and to false if that set is empty. The NOT EXISTS operator naturally works the opposite way: it is true if the result of the subquery is empty, and false otherwise.

EXISTS operator. Evaluates to true if the result set is not empty.

NOT EXISTS operator. Evaluates to true if the result set is empty.

In this example we used the NOT EXISTS operator. The subquery selects all rows of the ASSIGNMENT table whose WORKER_ID equals that of the row being considered by the main query and whose BLDG_ID is 435. If this set is empty, the WORKER row considered by the main query is selected, because it means that this worker does not work on building 435.
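The contrast between the naive negation and NOT EXISTS is easy to see on a few hypothetical rows in Python's built-in sqlite3 module. Here worker 1 works on both 435 and 312, and worker 4 has no assignments at all. DISTINCT is added to the naive query only so that each ID is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, hrly_rate REAL,
                     skill_type TEXT, supv_id INTEGER);
CREATE TABLE assignment (worker_id INTEGER, bldg_id INTEGER,
                         start_date TEXT, num_days INTEGER);
-- Hypothetical rows: worker 1 works on 435 AND 312; worker 4 has no assignments.
INSERT INTO worker VALUES (1, 'W. Ash', 12.0, 'Plasterer', 3),
                          (2, 'W. Birch', 11.0, 'Roofer', 3),
                          (3, 'W. Cedar', 16.0, 'Electric', 3),
                          (4, 'W. Dogwood', 13.0, 'Framer', 3);
INSERT INTO assignment VALUES (1, 435, '2024-10-01', 5),
                              (1, 312, '2024-10-09', 3),
                              (2, 312, '2024-10-02', 7),
                              (3, 435, '2024-10-03', 9);
""")

# Naive negation: returns workers with SOME assignment other than 435.
naive = [r[0] for r in conn.execute(
    "SELECT DISTINCT worker_id FROM assignment WHERE bldg_id <> 435 "
    "ORDER BY worker_id")]

# NOT EXISTS: keep a WORKER row only if no ASSIGNMENT row links it to 435.
not_exists = [r[0] for r in conn.execute("""
    SELECT worker_id FROM worker
    WHERE NOT EXISTS
        (SELECT * FROM assignment
         WHERE assignment.worker_id = worker.worker_id AND bldg_id = 435)
    ORDER BY worker_id
""")]
print(naive, not_exists)
```

The naive query wrongly includes worker 1 and misses worker 4, who appears in no ASSIGNMENT row at all.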

Our solution used a correlated subquery. If we use NOT IN instead of NOT EXISTS, we can make do with an uncorrelated subquery:

SELECT WORKER_ID
FROM WORKER
WHERE WORKER_ID NOT IN
    (SELECT WORKER_ID
     FROM ASSIGNMENT
     WHERE BLDG_ID = 435)

This solution is simpler than the one with NOT EXISTS. A natural question arises: why do we need EXISTS and NOT EXISTS at all? The answer is that NOT EXISTS is the only means of answering queries whose condition contains the word "every". In relational algebra such queries are solved with the division operation, and in relational calculus with the universal quantifier. Here is an example of a query whose condition contains the word "every":
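The NOT IN and NOT EXISTS versions return the same workers as long as the subquery never produces a NULL WORKER_ID. The sketch below uses Python's built-in sqlite3 module and hypothetical rows to show both the agreement and the standard SQL caveat. Once a NULL appears in the subquery's result, "x NOT IN (...)" can never be true, so NOT IN returns nothing while NOT EXISTS is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, hrly_rate REAL,
                     skill_type TEXT, supv_id INTEGER);
CREATE TABLE assignment (worker_id INTEGER, bldg_id INTEGER,
                         start_date TEXT, num_days INTEGER);
-- Hypothetical rows.
INSERT INTO worker VALUES (1, 'W. Ash', 12.0, 'Plasterer', 3),
                          (2, 'W. Birch', 11.0, 'Roofer', 3),
                          (3, 'W. Cedar', 16.0, 'Electric', 3),
                          (4, 'W. Dogwood', 13.0, 'Framer', 3);
INSERT INTO assignment VALUES (1, 435, '2024-10-01', 5),
                              (2, 312, '2024-10-02', 7),
                              (3, 435, '2024-10-03', 9);
""")

NOT_IN = """SELECT worker_id FROM worker
            WHERE worker_id NOT IN
                (SELECT worker_id FROM assignment WHERE bldg_id = 435)
            ORDER BY worker_id"""
NOT_EXISTS = """SELECT worker_id FROM worker
                WHERE NOT EXISTS
                    (SELECT * FROM assignment
                     WHERE assignment.worker_id = worker.worker_id
                       AND bldg_id = 435)
                ORDER BY worker_id"""


def run(sql):
    """Return the first column of every result row as a list."""
    return [r[0] for r in conn.execute(sql)]


before_null = (run(NOT_IN), run(NOT_EXISTS))

# One ASSIGNMENT row on building 435 with an unknown (NULL) worker_id.
conn.execute("INSERT INTO assignment VALUES (NULL, 435, '2024-10-04', 2)")
after_null = (run(NOT_IN), run(NOT_EXISTS))
print(before_null, after_null)
```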

Inquiry: List the workers assigned to every building.

This query can be expressed in SQL by means of a double negation. We restate the query so that it contains a double negation:

Inquiry: List the workers for whom there does NOT exist a building to which they are NOT assigned.

We have highlighted the double negation. Clearly, this query is logically equivalent to the previous one.

Now we want to formulate the solution in SQL. To make the final solution easier to understand, we first solve a preliminary problem: finding all buildings to which a hypothetical worker 1234 is NOT assigned.

(I) SELECT BLDG_ID
FROM BUILDING
WHERE NOT EXISTS
    (SELECT *
     FROM ASSIGNMENT
     WHERE ASSIGNMENT.BLDG_ID = BUILDING.BLDG_ID
     AND ASSIGNMENT.WORKER_ID = 1234)

We have labeled this query (I) because we will refer to it later. If no building satisfies this query, then worker 1234 is assigned to every building and therefore satisfies the conditions of the original query. To solve the original query, we must generalize query (I) from the specific worker 1234 to the variable WORKER_ID and turn this modified query into a subquery of a larger query. Here is the solution:

(II) SELECT WORKER_ID
FROM WORKER
WHERE NOT EXISTS
    (SELECT BLDG_ID
     FROM BUILDING
     WHERE NOT EXISTS
         (SELECT *
          FROM ASSIGNMENT
          WHERE ASSIGNMENT.BLDG_ID = BUILDING.BLDG_ID
          AND ASSIGNMENT.WORKER_ID = WORKER.WORKER_ID))

Result:

WORKER_ID

Note that the subquery beginning on the fourth line of query (II) is identical to query (I), except that 1234 is replaced by WORKER.WORKER_ID. Query (II) can be read as follows:

Select WORKER_ID from WORKER if there is no building to which that WORKER_ID is not assigned.

This matches the conditions of the original query.
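Queries (I) and (II) can be run with Python's built-in sqlite3 module on hypothetical rows. Worker 2 stands in for the hypothetical worker 1234. A plain set computation cross-checks the "assigned to every building" meaning of the double NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, hrly_rate REAL,
                     skill_type TEXT, supv_id INTEGER);
CREATE TABLE assignment (worker_id INTEGER, bldg_id INTEGER,
                         start_date TEXT, num_days INTEGER);
CREATE TABLE building (bldg_id INTEGER, type TEXT, qlty_level INTEGER,
                       status INTEGER);
-- Hypothetical rows: only worker 1 is assigned to every building.
INSERT INTO worker VALUES (1, 'W. Ash', 12.0, 'Plasterer', 1),
                          (2, 'W. Birch', 11.0, 'Roofer', 1),
                          (3, 'W. Cedar', 16.0, 'Electric', 1);
INSERT INTO building VALUES (100, 'Office', 2, 1), (200, 'Shop', 1, 1);
INSERT INTO assignment VALUES (1, 100, '2024-10-01', 5),
                              (1, 200, '2024-10-02', 7),
                              (2, 100, '2024-10-03', 9);
""")

# Query (I) with worker 2 standing in for the hypothetical worker 1234.
missing_for_2 = [r[0] for r in conn.execute("""
    SELECT bldg_id FROM building
    WHERE NOT EXISTS
        (SELECT * FROM assignment
         WHERE assignment.bldg_id = building.bldg_id
           AND assignment.worker_id = ?)""", (2,))]

# Query (II): the constant is replaced by worker.worker_id.
everywhere = [r[0] for r in conn.execute("""
    SELECT worker_id FROM worker
    WHERE NOT EXISTS
        (SELECT bldg_id FROM building
         WHERE NOT EXISTS
             (SELECT * FROM assignment
              WHERE assignment.bldg_id = building.bldg_id
                AND assignment.worker_id = worker.worker_id))
    ORDER BY worker_id""")]

# Cross-check with sets: a worker qualifies if his or her buildings cover all buildings.
all_bldgs = {r[0] for r in conn.execute("SELECT bldg_id FROM building")}
by_worker = {}
for w, b in conn.execute("SELECT worker_id, bldg_id FROM assignment"):
    by_worker.setdefault(w, set()).add(b)
by_sets = sorted(w for (w,) in conn.execute("SELECT worker_id FROM worker")
                 if all_bldgs <= by_worker.get(w, set()))
print(missing_for_2, everywhere, by_sets)
```

Like the universal quantifier, the double NOT EXISTS is vacuously true when BUILDING is empty: every worker would then qualify.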

We see that NOT EXISTS can be used to formulate those queries that required the division operation in relational algebra and the universal quantifier in relational calculus. In terms of ease of use, however, NOT EXISTS offers no particular advantage. SQL queries that use NOT EXISTS twice are no easier to understand than relational-algebra solutions with division or relational-calculus solutions with the universal quantifier. Further research will be needed to create language constructs that allow such queries to be solved more naturally.

5. Built-in functions

Consider questions like these:

What are the maximum and minimum hourly rates? What is the average number of days that workers are assigned to building 435? What is the total number of days allocated for plastering on building 312? How many different skill types are there?

Answering these questions requires statistical functions that examine a set of table rows and return a single value. SQL has five such functions, called built-in functions or set functions: SUM, AVG (average), COUNT, MAX (maximum) and MIN (minimum).

Built-in function (set function). A statistical function that operates on a set of rows: SUM, AVG (average), COUNT, MAX (maximum), MIN (minimum).

Inquiry: What are the maximum and minimum hourly rates?

SELECT MAX(HRLY_RATE), MIN(HRLY_RATE)
FROM WORKER

Result: 17.40, 8.20

The MAX and MIN functions operate on a single table column. They select the maximum or the minimum value, respectively, from that column. Our query contains no WHERE clause. Most queries will need one, as the next example shows.

Inquiry: What is the average number of days that workers are assigned to building 435?

SELECT AVG(NUM_DAYS)
FROM ASSIGNMENT
WHERE BLDG_ID = 435

Result: 12.33

Inquiry: What is the total number of days allocated for plastering on building 312?

SELECT SUM(NUM_DAYS)
FROM ASSIGNMENT, WORKER
WHERE WORKER.WORKER_ID = ASSIGNMENT.WORKER_ID
AND SKILL_TYPE = 'Plasterer'
AND BLDG_ID = 312

Result: 27

The solution uses a join of the ASSIGNMENT and WORKER tables. The join is necessary because SKILL_TYPE is in the WORKER table, while BLDG_ID is in the ASSIGNMENT table.

Inquiry: How many different skill types are there?

SELECT COUNT(DISTINCT SKILL_TYPE)
FROM WORKER

Result: 4

The same skill type can appear in several different rows, so this query must use the DISTINCT keyword to stop the system from counting the same skill type more than once. DISTINCT can be used with any of the built-in functions, although with MAX and MIN it is, of course, redundant.
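The four aggregate queries above can be reproduced with Python's built-in sqlite3 module. The rows below are hypothetical, so the numbers differ from the results in the text. Two of the invented workers share the skill type 'Plasterer', which shows why COUNT(DISTINCT ...) is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, hrly_rate REAL,
                     skill_type TEXT, supv_id INTEGER);
CREATE TABLE assignment (worker_id INTEGER, bldg_id INTEGER,
                         start_date TEXT, num_days INTEGER);
-- Hypothetical rows: two workers share the skill type 'Plasterer'.
INSERT INTO worker VALUES (1, 'W. Ash', 12.0, 'Plasterer', 3),
                          (2, 'W. Birch', 11.0, 'Roofer', 3),
                          (3, 'W. Cedar', 16.0, 'Electric', 3),
                          (4, 'W. Dogwood', 12.5, 'Plasterer', 3);
INSERT INTO assignment VALUES (1, 435, '2024-10-01', 10),
                              (2, 435, '2024-10-02', 5),
                              (3, 312, '2024-10-03', 7),
                              (4, 312, '2024-10-04', 9);
""")


def one(sql):
    """Return the single row produced by an aggregate query."""
    return conn.execute(sql).fetchone()


max_min = one("SELECT MAX(hrly_rate), MIN(hrly_rate) FROM worker")
avg_days = one("SELECT AVG(num_days) FROM assignment WHERE bldg_id = 435")[0]
plaster_days = one("""
    SELECT SUM(num_days) FROM assignment, worker
    WHERE worker.worker_id = assignment.worker_id
      AND skill_type = 'Plasterer' AND bldg_id = 312""")[0]
# Without DISTINCT the repeated 'Plasterer' would be counted twice.
skills_all, skills_distinct = one(
    "SELECT COUNT(skill_type), COUNT(DISTINCT skill_type) FROM worker")
print(max_min, avg_days, plaster_days, skills_all, skills_distinct)
```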

DISTINCT. An operator that eliminates duplicates.

The SUM and AVG functions should be used only with numeric columns. The other functions can be used with both numeric and character data. All functions except COUNT can be applied to computed expressions. For example:

Inquiry: What is the average weekly wage?

SELECT AVG(40 * HRLY_RATE)
FROM WORKER

Result: 509.14

COUNT can refer to an entire row rather than to a single column:

Inquiry: How many buildings have quality levels?

SELECT COUNT(*)
FROM BUILDING
WHERE

Result: 3
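The computed-expression query and COUNT(*) can be checked with Python's built-in sqlite3 module on hypothetical rows. The sketch also shows a standard SQL fact that separates counting rows from counting a column: COUNT(*) counts every row, while COUNT(column) skips rows in which that column is NULL. The NULL quality level below is invented for the purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, hrly_rate REAL,
                     skill_type TEXT, supv_id INTEGER);
CREATE TABLE building (bldg_id INTEGER, type TEXT, qlty_level INTEGER,
                       status INTEGER);
-- Hypothetical rows; one building has no recorded quality level (NULL).
INSERT INTO worker VALUES (1, 'W. Ash', 12.0, 'Plasterer', 3),
                          (2, 'W. Birch', 11.0, 'Roofer', 3),
                          (3, 'W. Cedar', 16.0, 'Electric', 3),
                          (4, 'W. Dogwood', 12.5, 'Plasterer', 3);
INSERT INTO building VALUES (100, 'Office', 2, 1),
                            (200, 'Shop', NULL, 1),
                            (300, 'Office', 3, 2);
""")

# An aggregate over a computed expression: 40 hours times the hourly rate.
avg_weekly = conn.execute("SELECT AVG(40 * hrly_rate) FROM worker").fetchone()[0]

# COUNT(*) counts rows; COUNT(column) skips rows where that column is NULL.
rows_total, rows_with_level = conn.execute(
    "SELECT COUNT(*), COUNT(qlty_level) FROM building").fetchone()
print(avg_weekly, rows_total, rows_with_level)
```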

As all these examples show, if the SELECT clause contains a built-in function, nothing else may appear in that SELECT clause. The only exception to this rule involves the GROUP BY clause, which we consider next.

6. GROUP BY and HAVING clauses

Management often needs statistical information about each group in a set of groups. For example, consider the following query:

Inquiry: For each manager, find the maximum hourly rate among his or her subordinates.

To solve this problem, we must divide the workers into groups according to their managers and then determine the maximum rate within each group. In SQL this is done as follows:

SELECT SUPV_ID, MAX(HRLY_RATE)
FROM WORKER
GROUP BY SUPV_ID

Result:

SUPV_ID  MAX(HRLY_RATE)

When processing this query, the system first divides the rows of the WORKER table into groups by the following rule: rows are placed in the same group if and only if they have the same SUPV_ID. The SELECT clause is then applied to each group. Each group contains only one SUPV_ID value, so there is no ambiguity about SUPV_ID within a group. For each group, the SELECT clause outputs SUPV_ID and computes and outputs MAX(HRLY_RATE). The result is shown above.

Apart from built-in functions, a SELECT clause that contains built-in functions may include only columns that appear in the GROUP BY clause. Note that SUPV_ID can be used in the SELECT clause because it appears in the GROUP BY clause.

GROUP BY clause. Specifies that rows are to be divided into groups that share the values of the specified column (or columns).

The GROUP BY clause makes certain complex computations possible. For example, we may want to find the average of these maximum rates. However, computations with built-in functions are limited: a built-in function may not be used inside another built-in function. Thus an expression such as

AVG (MAX (HRLY_RATE))

is not allowed. Answering this query takes two steps: first we place the maximum rates in a new table, and then we compute their average.
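The two-step approach can be sketched with Python's built-in sqlite3 module and hypothetical rows. The sketch first shows that SQLite rejects the nested form. It then stores the per-manager maxima in a new table (the table name max_rates is made up for illustration) and averages them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, "
             "hrly_rate REAL, skill_type TEXT, supv_id INTEGER)")
# Hypothetical rows: managers 10, 20 and 30 have maximum rates 14, 18 and 16.
conn.executemany("INSERT INTO worker VALUES (?, ?, ?, ?, ?)", [
    (1, "W. Ash", 12.0, "Plasterer", 10),
    (2, "W. Birch", 14.0, "Roofer", 10),
    (3, "W. Cedar", 11.0, "Electric", 20),
    (4, "W. Dogwood", 18.0, "Plasterer", 20),
    (5, "W. Elm", 16.0, "Roofer", 30),
])

# Nesting one built-in function inside another is rejected.
try:
    conn.execute("SELECT AVG(MAX(hrly_rate)) FROM worker").fetchall()
    nested_rejected = False
except sqlite3.Error:
    nested_rejected = True

# Stage 1: store the per-manager maxima in a new (hypothetical) table.
conn.execute("CREATE TABLE max_rates AS "
             "SELECT supv_id, MAX(hrly_rate) AS max_rate FROM worker GROUP BY supv_id")
# Stage 2: average the stored maxima.
avg_of_max = conn.execute("SELECT AVG(max_rate) FROM max_rates").fetchone()[0]
print(nested_rejected, avg_of_max)
```

SQLite, like many current DBMSs, also accepts a subquery in the FROM clause, so the intermediate table could instead be written inline as `SELECT AVG(max_rate) FROM (SELECT MAX(hrly_rate) AS max_rate FROM worker GROUP BY supv_id)`.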

The WHERE clause can be used together with GROUP BY:

Inquiry: For each building type, find the average quality level among buildings with status 1.

SELECT TYPE, AVG(QLTY_LEVEL)
FROM BUILDING
WHERE STATUS = 1
GROUP BY TYPE

Result:

TYPE                  AVG(QLTY_LEVEL)
Shop                  1
Residential building  3

The WHERE clause is applied before GROUP BY. Thus no group can contain a row whose STATUS differs from 1. The rows with status 1 are grouped by TYPE, and then the SELECT clause is applied to each group.

HAVING clause. Imposes conditions on groups.

We can also apply conditions to the groups created by GROUP BY. This is done with the HAVING clause. Suppose, for example, that we decide to refine one of the previous queries:

Inquiry: For each manager who has more than one subordinate, find the maximum hourly rate among his or her subordinates.

We can express this condition with a corresponding HAVING clause:

SELECT SUPV_ID, MAX(HRLY_RATE)
FROM WORKER
GROUP BY SUPV_ID
HAVING COUNT(*) > 1

Result:

SUPV_ID  MAX(HRLY_RATE)

The difference between the WHERE and HAVING clauses is that WHERE applies to rows, whereas HAVING applies to groups.
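GROUP BY with and without HAVING can be sketched with Python's built-in sqlite3 module on hypothetical rows. A hand-written grouping with collections.defaultdict mirrors the same steps: partition the rows by SUPV_ID, then keep or drop whole groups.

```python
import sqlite3
from collections import defaultdict

rows = [  # hypothetical (worker_id, worker_name, hrly_rate, skill_type, supv_id)
    (1, "W. Ash", 12.0, "Plasterer", 10),
    (2, "W. Birch", 14.0, "Roofer", 10),
    (3, "W. Cedar", 11.0, "Electric", 20),
    (4, "W. Dogwood", 18.0, "Plasterer", 20),
    (5, "W. Elm", 16.0, "Roofer", 30),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, "
             "hrly_rate REAL, skill_type TEXT, supv_id INTEGER)")
conn.executemany("INSERT INTO worker VALUES (?, ?, ?, ?, ?)", rows)

grouped = conn.execute(
    "SELECT supv_id, MAX(hrly_rate) FROM worker "
    "GROUP BY supv_id ORDER BY supv_id").fetchall()
having = conn.execute(
    "SELECT supv_id, MAX(hrly_rate) FROM worker "
    "GROUP BY supv_id HAVING COUNT(*) > 1 ORDER BY supv_id").fetchall()

# The same steps by hand: partition rows by supv_id, then filter whole groups.
groups = defaultdict(list)
for _, _, rate, _, supv_id in rows:
    groups[supv_id].append(rate)
manual = [(s, max(r)) for s, r in sorted(groups.items()) if len(r) > 1]
print(grouped, having, manual)
```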

A query may contain both a WHERE clause and a HAVING clause. In that case the WHERE clause is applied first, because it is applied before the rows are divided into groups. For example, consider the following modification of a query shown earlier:

Inquiry: For each building type, find the average quality level among buildings with status 1. Consider only those building types whose maximum quality level does not exceed 3.

SELECT TYPE, AVG(QLTY_LEVEL)
FROM BUILDING
WHERE STATUS = 1
GROUP BY TYPE
HAVING MAX(QLTY_LEVEL) <= 3

Result:

TYPE                  AVG(QLTY_LEVEL)
Shop                  1
Residential building  3

Note that the clauses are applied in order starting with FROM, and the SELECT clause is applied last. So the WHERE clause is applied to the BUILDING table, and all rows in which STATUS differs from 1 are removed. The remaining rows are grouped by TYPE, so all rows with the same TYPE value fall into the same group. This produces several groups, one for each TYPE value. The HAVING clause is then applied to each group, and the groups whose maximum quality level exceeds 3 are removed. Finally, the SELECT clause is applied to the remaining groups.
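The order FROM, WHERE, GROUP BY, HAVING, SELECT matters. The sketch below uses Python's built-in sqlite3 module and hypothetical BUILDING rows in which one Residential building has status 2 and quality level 4. With WHERE, that row is removed before grouping. Without WHERE, it stays in the Residential group, raises the group's maximum to 4, and HAVING then drops the whole group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE building (bldg_id INTEGER, type TEXT, "
             "qlty_level INTEGER, status INTEGER)")
# Hypothetical rows; building 4 is the only one whose status is not 1.
conn.executemany("INSERT INTO building VALUES (?, ?, ?, ?)", [
    (1, "Shop", 1, 1),
    (2, "Shop", 1, 1),
    (3, "Residential", 3, 1),
    (4, "Residential", 4, 2),
    (5, "Office", 4, 1),
    (6, "Office", 2, 1),
])

# FROM -> WHERE (rows) -> GROUP BY -> HAVING (groups) -> SELECT.
with_where = conn.execute("""
    SELECT type, AVG(qlty_level) FROM building
    WHERE status = 1
    GROUP BY type
    HAVING MAX(qlty_level) <= 3
    ORDER BY type""").fetchall()

# Without WHERE, building 4 stays in the Residential group and raises its MAX to 4.
without_where = conn.execute("""
    SELECT type, AVG(qlty_level) FROM building
    GROUP BY type
    HAVING MAX(qlty_level) <= 3
    ORDER BY type""").fetchall()
print(with_where, without_where)
```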

7. Built-in functions and subqueries

Built-in functions can be used only in the SELECT clause or in the HAVING clause. However, a SELECT clause that contains a built-in function may be part of a subquery. Consider an example of such a subquery:

Inquiry: Which workers have an hourly rate above the average?

SELECT WORKER_NAME
FROM WORKER
WHERE HRLY_RATE >
    (SELECT AVG(HRLY_RATE)
     FROM WORKER)

Result:

H. Columbus

Note that the subquery is not correlated with the main query. The subquery produces exactly one value: the average hourly rate. The main query selects a worker only if that worker's rate is greater than this computed average.
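Because this subquery does not refer to the outer row, it yields one fixed value that can be computed separately. The sketch below uses Python's built-in sqlite3 module with hypothetical rows whose average rate is 14.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE worker (worker_id INTEGER, worker_name TEXT, "
             "hrly_rate REAL, skill_type TEXT, supv_id INTEGER)")
# Hypothetical rows with an average hourly rate of 14.0.
conn.executemany("INSERT INTO worker VALUES (?, ?, ?, ?, ?)", [
    (1, "W. Ash", 12.0, "Plasterer", 3),
    (2, "W. Birch", 11.0, "Roofer", 3),
    (3, "W. Cedar", 18.0, "Electric", 3),
    (4, "W. Dogwood", 15.0, "Plasterer", 3),
])

# The subquery does not mention the outer row, so it yields one fixed value.
average = conn.execute("SELECT AVG(hrly_rate) FROM worker").fetchone()[0]
above = [r[0] for r in conn.execute("""
    SELECT worker_name FROM worker
    WHERE hrly_rate > (SELECT AVG(hrly_rate) FROM worker)
    ORDER BY worker_name""")]
print(average, above)
```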

Built-in functions can also be used in correlated subqueries:

Inquiry: Which workers have an hourly rate above the average hourly rate among the subordinates of the same manager?

In this case, instead of computing one average hourly rate for all workers, we must compute the average rate for each group of workers who report to the same manager. Moreover, this computation must be repeated for each worker considered by the main query:

SELECT A.WORKER_NAME