Site search in PHP. Safe and convenient search in MySQL
Site search with Russian morphology support in PHP, plus a site map
When you build a site, sooner or later you start thinking about giving it a convenient, universal search. There is a simple solution: bolt on a search from a search engine, for example Yandex or Google. The general drawback of this solution is that only the pages the search engine has added to its index will be searchable. In other words, part of your site will not be "searched".
Sad, so we look for another solution. Here it is: Yandex.Server, a product for searching your own site that takes the morphology of the Russian language into account. We download it. On Unix, Yandex.Server runs as a daemon, and on MS Windows as a service. That is, it can only run if you have root access to the server, so it is not suitable for a site on shared (virtual) hosting. :-( The second drawback is that there are no settings, only a single "Start / Stop" button.
We start digging through the Internet. How can this be? Everyone has search on their sites, so people do it somehow. For example, there is the brute-force solution I described long ago: contextual search on the site, which does not take word declension into account and does not index the words on the pages. But the deeper you go into the problem, the more you realize that it is far from trivial.
First, you need to scan the entire site and pick out all the words from it. I already know how to do that: the site map generator has long been coping with this task.
Second, you need to get the base form of every word. There are several options here. For example, you can use a stemmer, which cuts off the prefix, suffix, and ending of a word, or a more complex system based on dictionaries.
Third, all of this needs to be loaded into a database and indexed so that searching takes as little time as possible.
After spending two weeks trying a large number of options and algorithms, I settled on the following:
1. For scanning I use a simplified parser that cuts every href out of the page with a regular expression:
    // Get unique links from the page
    $html = file_get_contents($url);
    if (preg_match("|
Now the external links must be separated from the internal ones, and the parser called recursively with the address of each internal link. This is where the pitfalls begin... Internal links can be written like external ones, as http://domain/address; they can be relative to the current page; they can be relative to the base tag. Next you need to check whether indexing of the page is prohibited in robots.txt and whether the page has already been scanned. You can use the robots.txt analysis example and the SQL search example.
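The link handling described above can be sketched roughly as follows. This is not the author's parser: the function names `extract_links` and `resolve_link` are hypothetical, and the sketch assumes plain `<a href>` links, ignoring the base tag and "../" segments.

```php
<?php
// Hypothetical helper: returns the unique href values found in an HTML string.
function extract_links(string $html): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\'#]+)/i', $html, $m);
    return array_values(array_unique($m[1]));
}

// Hypothetical helper: resolves a link against the page URL and reports
// whether it stays on the same host. Handles absolute URLs, root-relative
// paths and simple page-relative paths only.
function resolve_link(string $link, string $pageUrl): array
{
    $page = parse_url($pageUrl);
    $root = $page['scheme'] . '://' . $page['host'];
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $link)) {
        $url = $link;                          // already has a scheme
    } elseif (strpos($link, '//') === 0) {
        $url = $page['scheme'] . ':' . $link;  // protocol-relative
    } elseif (strpos($link, '/') === 0) {
        $url = $root . $link;                  // relative to the site root
    } else {
        // relative to the directory of the current page
        $dir = preg_replace('~/[^/]*$~', '/', $page['path'] ?? '/');
        $url = $root . $dir . $link;
    }
    $host = (string)parse_url($url, PHP_URL_HOST);
    return ['url' => $url, 'internal' => strcasecmp($host, $page['host']) === 0];
}
```

Only links reported as internal would be passed back to the scanner; the robots.txt check and the "already scanned" check still have to happen before each fetch.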
2. We remove all short words, convert all words to one case, and extract the stem (root) of each word. To extract the root it is best to use the phpMorphy PHP class, which extracts word roots taking into account the morphology of Russian, English, Ukrainian, Estonian, or German. The dictionaries for each language take 10-15 MB. No additional software needs to be installed on the server; everything works on the most ordinary hosting. The drawback is the low speed of root extraction. The library is connected as follows:
The phpMorphy constructor takes three parameters: the first is the path to the dictionaries folder; the second is the language code, "ru_RU" for Russian in UTF-8 or "rus" for Russian in Windows-1251; the third is the options.
The options include an important storage parameter, which takes one of three values:
PHPMORPHY_STORAGE_FILE (the dictionaries are not loaded into memory; this is the slowest option, but the most economical with server resources),
PHPMORPHY_STORAGE_SHM (loads the dictionary file entirely into shared memory; requires the PHP shmop extension), or
PHPMORPHY_STORAGE_MEM (also loads the whole file into memory, but without shmop; the speed does not differ from the previous option).
On shared hosting you will most likely have to use the first option; on a dedicated server the memory-based options are better for speed. Choose according to your task: if the module will be called frequently, the shared-memory option is preferable.
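The word preparation step (drop short words, bring everything to one case, reduce each word to its root) might look like the sketch below. The function name `prepare_words` and the `$stem` callback are assumptions made for illustration; the callback stands in for whatever root extractor is used, and the toy stemmer is for the demo only.

```php
<?php
// Splits text into words, drops words shorter than $minLen, lowercases the
// rest and passes each one through $stem. Returns root => occurrence count.
function prepare_words(string $text, callable $stem, int $minLen = 3): array
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $counts = [];
    foreach ($words as $w) {
        if (mb_strlen($w, 'UTF-8') < $minLen) {
            continue; // too short to be worth indexing
        }
        $root = $stem(mb_strtolower($w, 'UTF-8'));
        $counts[$root] = ($counts[$root] ?? 0) + 1;
    }
    return $counts;
}

// Toy stemmer for the demo only: strips a trailing "s".
$toyStem = function (string $w): string {
    return preg_replace('/s$/', '', $w);
};
$counts = prepare_words('Cats and a cat on the mat', $toyStem);
```

The per-root counts are the kind of value that would be stored for each page and word pair in the search index.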
3. Now we need to create the database tables that will store all the scanning and parsing results:
    // List of site pages: link, title, and announcement
    // (the first 300 characters of the page, shown in search results).
    CREATE TABLE IF NOT EXISTS page (
      `id` INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
      `url` VARCHAR(255) NOT NULL DEFAULT "" UNIQUE,
      `title` VARCHAR(128) NOT NULL DEFAULT "",
      `description` TEXT NOT NULL DEFAULT ""
    )
    // All words of the site.
    // word - what remains after the stemmer (what we called the "root")
    // sound - the result of the Soundex function for this word.
    CREATE TABLE IF NOT EXISTS word (
      `id` INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
      `word` VARCHAR(30) NOT NULL,
      `sound` CHAR(4) NOT NULL DEFAULT "A000"
    )
    CREATE INDEX idx_word_word ON " . $search->word . " (word(8))
    CREATE INDEX idx_word_sound ON " . $search->word . " (sound(4))
    // Each row: the word "word" occurs on the page "page" "cnt" times.
    CREATE TABLE IF NOT EXISTS `index` (
      `page` INT UNSIGNED NOT NULL,
      `word` INT UNSIGNED NOT NULL,
      `cnt` SMALLINT UNSIGNED NOT NULL,
      UNIQUE (page, word)
    )
Now we need to make the search form. The simplest search form looks like this:
Its code is as follows:
Download site search script
I spent a great deal of time building this example, so I want to convert it into money. If you want to repeat my feat, good luck. If you value your time, I will gladly trade time for money. For just 2900 rubles (~$46) you get a complete, open, thoroughly commented search script together with the site map generator.
Archive contents:
phpMorphy/ - library for extracting word roots
stemmer/ - fast algorithm for extracting the word stem
config.php - database settings and general functions, which you may already have and can replace with your own
index.php - search form + search results
install.php - creates the MySQL database tables for the search
link_bar.php - page navigation
search.php - class for working with the search. Contains the methods:
sound_ex($string) - Russian Soundex, returns the sound code of a word
update($url, $scan = 0) - recursively scans all site pages, extracting the page title, body, and description
ParsingWord($url, $words) - analyzes the words and adds them to the search database
GetWords($words) - replaces every word in the passed array with its root
url_short($url, $base = "", $ext = 0) - normalizes a link and separates out external ones
is_robots($url) - checks whether the link is listed in robots.txt
Readurl($site) - reads a site page with cURL, with redirect handling
sitemap.php - builds the sitemap.xml site map
spider_http.php - site scanner based on reading and parsing pages
spider_sitemap.php - site scanner based on parsing sitemap.xml
Installation Instructions:
Unpack the contents of the search.zip archive into the search folder. Allow scripts to write to it by setting permissions to 777.
Edit the database settings in the config.php file and run install.php; the necessary tables will be created.
Run the "/search/spider_http.php" scanner. It fills the database tables: the table of all site pages, which holds the Title, Keywords, and Description, and the table of words, which holds the roots of the words found on the pages. The database can also be built from an existing site map; for this use "/search/spider_sitemap.php".
Place the search form on your pages. Search form:
Edit the output format on the search.php page.
Call include_once "updater.php"; update($url); whenever a page is added, changed, or deleted, where $url is the page you want to update. It is convenient to call this right after saving page changes. If the page returns a 404 error or is empty, it is removed from the database.
Run "/search/sitemap.php" to create the sitemap.xml site map. Do not forget to register the path to the site map in robots.txt: Sitemap: /search/sitemap.xml
Site search script capabilities
Scanning all the pages of the site, taking into account the ban in robots.txt and
Parsing the page text into individual words and counting word statistics
Extracting the Title, Keywords, and Description of each page
Extracting word roots, taking into account the morphology of the Russian language, using the bundled libraries
Extracting word stems with a fast algorithm (not recommended; commented out in the script text)
Russian spell checking during scanning, based on the absence of a word from the dictionary
Four message modes: 0 - work silently, 1 - report site errors only, 2 - report errors and a minimum of information, 3 - detailed reporting while working
Search for similar-sounding words. Russian Soundex.
Sorting search results by relevance. Pages that contain all the search words the greatest number of times are shown first.
Paginated output of the results found
The script is commented in detail in Russian
The script code is written in PHP + MySQL, is fully open, and needs no additional libraries. Everything you need comes in the kit.
A site map generator based on the database built by the scanner
What the script cannot do:
it does not take into account , rel=nofollow
text common to all pages is not excluded from the search.
Agreement on use:
You can use the code you receive in any of your projects; you do not have to credit the source.
You may not resell it, place it in free or restricted access, or publish it in any form.
All other rights are reserved by the author.
You can contact the author with questions, comments, and wishes.
Be careful! For 2900 rubles (~$46) you can choose one of two script versions that differ significantly from each other. The search script for a site in UTF-8 uses the multibyte mb_* functions, parses pages with regular expressions written for UTF-8 (Unicode), and creates the database tables in UTF-8. The search script for a site in Windows-1251 uses the single-byte str* functions only and parses pages with regular expressions written for single-byte encodings.
By pressing the download button you confirm that you accept the terms of use of the script described on this page. The amount of 2900 rubles (~$46) will be debited from your balance and the file will be downloaded.
Brief reference for search implementation: string processing, stripping special characters, building the database query, logic, pagination, relevance.
Part 1: General information
String processing
First of all, the string needs to be truncated by hand.
    $search = substr($search, 0, 64);
64 characters are enough for the user to search with. Now let's strip out all the "abnormal" characters.
In principle, the user should not be allowed to search for words that are too short; among other things, it loads the server heavily. So we allow searching only for words longer than two letters (if the limit is higher, replace "{1,2}" with "{1,number of characters}").
    $good = trim(preg_replace("/\s(\S{1,2})\s/", " ", preg_replace("/ +/", "  ", " $search ")));
After the bad words have been replaced, the doubled spaces must be squeezed back (they were introduced deliberately so that short words are found correctly).
    $good = preg_replace("/ +/", " ", $good);
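As a rough illustration of the same cleanup on a current PHP version, the steps above (truncate, strip unusual characters, drop short words, squeeze spaces) can be folded into one function. The name `clean_search` and its defaults are assumptions, and the sketch assumes UTF-8 input with the mbstring extension available.

```php
<?php
// Truncates the query, replaces everything except letters, digits and
// whitespace with spaces, drops words shorter than $minLen and joins the
// remaining words with single spaces.
function clean_search(string $search, int $minLen = 3, int $maxLen = 64): string
{
    $s = mb_substr($search, 0, $maxLen, 'UTF-8');
    $s = preg_replace('/[^\p{L}\p{N}\s]+/u', ' ', $s);
    $words = preg_split('/\s+/u', $s, -1, PREG_SPLIT_NO_EMPTY);
    $words = array_filter($words, function ($w) use ($minLen) {
        return mb_strlen($w, 'UTF-8') >= $minLen;
    });
    return implode(' ', $words);
}
```

Splitting into words first avoids the doubled-space trick, at the cost of one extra array pass.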
Suppose we want to let the user choose the search logic: look for all of the words or for any one of them. If you want to do it as in Yandex, where two ampersands mean "and" (word1 && word2 && word3) or something similar, I am no adviser here. String wizardry on a small site, IMHO, does not justify the time spent. So I draw a search form:
And in the search script I check once more what the user entered:
It is a good idea to tell the user right away how many table rows were found. For this an additional query is made to the database:
    $query = "SELECT id FROM table WHERE field LIKE '%" . str_replace(" ", "%' OR field LIKE '%", $good) . "%'";
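A possible way to combine the logic choice with the LIKE query is sketched below. The function name `build_like_where` and the default column name are placeholders, and `addcslashes()` is only a stand-in for proper escaping; with a live connection, prepared statements are the safer route.

```php
<?php
// Builds the WHERE condition from an already cleaned word string.
// $logic comes from the form and is forced to either AND or OR.
function build_like_where(string $good, string $logic, string $field = 'field'): string
{
    $glue = strtoupper($logic) === 'AND' ? ' AND ' : ' OR ';
    $parts = [];
    foreach (explode(' ', $good) as $word) {
        // Escape LIKE wildcards, backslashes and single quotes.
        $w = addcslashes($word, "%_\\'");
        $parts[] = "$field LIKE '%$w%'";
    }
    return implode($glue, $parts);
}

$where = build_like_where('cat mat', 'and');
```

Because anything other than "and" falls back to OR, a tampered form value cannot inject its own operator.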
For statistics on individual words, you can do the following:
    $word = explode(" ", $search);
    foreach ($word as $v) {
        if (strlen($v) > 2)
            $stat[] = "$v: " . mysql_num_rows(mysql_query("SELECT id FROM table WHERE field LIKE '%$v%'"));
        else
            $stat[] = "$v: short";
    }
    $wrd_stats = "Word statistics: " . implode(" ", $stat) . " ";
    unset($stat);
Paginated output of results
Once we have the query template and the number of result rows, making the search paginated is a trifle. Check the $page variable (no less than 0, no more than $results_amount / $rows_in_page). In the query that counted the rows (see above), list the fields you need and the sort fields. And then add
(Syntax: LIMIT <row count> or LIMIT <offset rows>, <row count>)
As a result of such a query we get exactly the rows that should be displayed on the page. You can either draw links to the next and previous pages or, more elaborately, build a navigation bar covering several pages.
    if ($page > 0) print("previous page");
    if ($page < $results_amount / $rows_in_page) print("next page");
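The page check and the LIMIT clause can be computed together, for example as in this sketch. The function name `page_limit` and the shape of the returned array are assumptions; pages are numbered from 0, as in the text.

```php
<?php
// Clamps the requested page into the valid range and returns the LIMIT
// clause for it, plus flags for drawing the previous/next links.
function page_limit(int $page, int $resultsAmount, int $rowsInPage): array
{
    $lastPage = max(0, (int)ceil($resultsAmount / $rowsInPage) - 1);
    $page = max(0, min($page, $lastPage));
    return [
        'page'     => $page,
        'limit'    => 'LIMIT ' . ($page * $rowsInPage) . ', ' . $rowsInPage,
        'has_prev' => $page > 0,
        'has_next' => $page < $lastPage,
    ];
}
```

Computing the last page with ceil() avoids showing a "next page" link that leads to an empty page when the row count is an exact multiple of the page size.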
Highlighting
To highlight the words in the text in color or bold, all you need to do is the following:
    $highlight = str_replace(" ", "|", $good);
The spaces (there is exactly one between our words and none anywhere else, since we also trimmed them from the ends) just need to be replaced with a vertical bar, the alternation separator in regular expressions. We do not highlight the "bad" words, because they are not searched for in the database :). In the code that displays the text we write:
After writing this issue I rushed off to write the highlighting myself. Not so fast! My text contains HTML tags, so I had to think hard... I ended up with this (the string with the words to highlight is $words):
    $text = eregi_replace(">([^<]*)($words)", ">\\1\\2", $text);
You have to make sure the word is not inside a tag. However, such a replacement is resource-intensive (my K6-266 spent a full seven seconds on a 5-kilobyte text). Sad.
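One alternative way to keep the replacement out of the tags is to split the HTML into tag and text pieces and touch only the text. This is a different technique from the single regular expression above, shown here as a sketch; the function name `highlight_words` and the choice of `<b>` as the highlight tag are assumptions.

```php
<?php
// Wraps every occurrence of the search words in <b>...</b>, but only in the
// text between tags, so attribute values and tag names are left alone.
function highlight_words(string $html, string $good): string
{
    if ($good === '') {
        return $html; // nothing to highlight
    }
    $alts = implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, explode(' ', $good)));
    // With PREG_SPLIT_DELIM_CAPTURE the pieces alternate: text, tag, text, ...
    $pieces = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) { // even indexes are text between tags
            $pieces[$i] = preg_replace("/($alts)/iu", '<b>$1</b>', $piece);
        }
    }
    return implode('', $pieces);
}
```

Each text piece is scanned once, so the cost grows with the text length rather than with the number of tags multiplied by the number of matches.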
Applying these techniques makes it possible, first, to limit the user's freedom of action and keep them from a) learning the software structure of the site, b) overloading the server (for example, by sending a megabyte of text made up of three-letter words (the phrase came out ambiguous, but I won't rewrite it :) so that the script hits the database 250 times), c) seeing an error message because special query-language characters ended up in the string. Second, they give the user some convenience: pagination and highlighting.
I recall that the "Safe and Convenient Search" article had a phrase like this
Part 2. Briefly about relevance
Oleg Yusov
To output search results by relevance, you need the following:
Create FULLTEXT keys on the required VARCHAR fields or on any of the TEXT field varieties (TINYTEXT, MEDIUMTEXT, etc.):
    ALTER TABLE table ADD FULLTEXT (field)
After that it is even easier:
    $query = "SELECT *, MATCH (field) AGAINST ('$searchwords') AS relev FROM table ORDER BY relev DESC"
Notes:
By default only words of at least 4 characters are searched. To change this, edit #define MIN_WORD_LEN 4 in the source file ft_static.c, although in my opinion there is no need to.
The % characters are not needed in the search form; the words in the search field are split using the list of word separators.
The list of word separators can be edited in the source file ft_static.c.
The table needs at least a dozen records before relevance starts being calculated.
The relev field cannot be used in the WHERE clause:
    SELECT *, MATCH (field) AGAINST ('$searchwords') AS relev FROM table WHERE relev > 0 ORDER BY relev DESC
although this is allowed:
    SELECT *, MATCH (field) AGAINST ('$searchwords') AS relev FROM table WHERE MATCH (field) AGAINST ('$searchwords') > 0 ORDER BY relev DESC
The speed is quite high, in some cases even faster than a LIKE search.
All of the above works starting with MySQL version 3.23.23.
When FULLTEXT indexes cover several fields, two options are possible: one index over all the fields, or a separate index on each field. In the first case, in the query
    SELECT *, MATCH (field1, field2) AGAINST ('$searchwords') AS relev FROM table ORDER BY relev DESC
relevance is calculated over all the fields at once. In the second case such a query gives an error, and relevance is calculated as follows:
    SELECT *, MATCH (field1) AGAINST ('$searchwords') + MATCH (field2) AGAINST ('$searchwords') AS relev FROM table ORDER BY relev DESC
The second option makes the queries somewhat more complicated, but in my opinion it is better because it increases search flexibility: each field can be given an importance coefficient, and when the per-field relevance values are summed they are multiplied by it. A search phrase then counts for more when it appears in fields with a larger coefficient. For example, when searching the indexed pages of a resource directory, the page name field is usually given a larger coefficient than the meta description or keywords fields.
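The per-field coefficients could be applied when the query string is assembled, roughly as below. The function name `weighted_match_query`, the table name, the column names, and the weights are all invented for the example; each column is assumed to have its own FULLTEXT index, and `addslashes()` is only a stand-in for real escaping.

```php
<?php
// $weights maps a column name to its importance coefficient.
function weighted_match_query(string $table, array $weights, string $words): string
{
    $q = addslashes($words);
    $terms = [];
    foreach ($weights as $field => $weight) {
        $terms[] = (float)$weight . " * MATCH ($field) AGAINST ('$q')";
    }
    $relev = implode(' + ', $terms);
    return "SELECT *, $relev AS relev FROM $table ORDER BY relev DESC";
}

$sql = weighted_match_query('pages', ['title' => 3, 'keywords' => 1], 'php search');
```

Keeping the weights in one array makes it easy to tune them without touching the query text.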
Part 3: Exercises with Relevance
First, how to add a FULLTEXT index:
    mysql> ALTER TABLE articlea ADD FULLTEXT (ztext);
    ERROR 1073: BLOB column 'ztext' can't be used in key specification with the used table type
    mysql> ALTER TABLE articlea TYPE = MyISAM;
    Query OK, 36 rows affected (0.60 sec)
    Records: 36  Duplicates: 0  Warnings: 0
    mysql> ALTER TABLE articlea ADD FULLTEXT (ztext);
    Query OK, 36 rows affected (10.00 sec)
    Records: 36  Duplicates: 0  Warnings: 0
Text indexes can only be created on MyISAM tables. The text is taken from the table and copied into the index file, so the database grows. Now about the queries. The relev field cannot be used in the WHERE clause:
    SELECT *, MATCH (field) AGAINST ('$searchwords') AS relev FROM table WHERE relev > 0 ORDER BY relev DESC
although this is allowed:
    SELECT *, MATCH (field) AGAINST ('$searchwords') AS relev FROM table WHERE MATCH (field) AGAINST ('$searchwords') > 0 ORDER BY relev DESC
By the syntax rules a calculated field cannot, of course, be used in WHERE, but it can be used in HAVING:
    SELECT *, MATCH (field) AGAINST ('$searchwords') AS relev FROM table HAVING relev > 0 ORDER BY relev DESC
Search through MATCH, as Oleg wrote, works on whole words only. ... However, you can use relevance for sorting only and do the selection with LIKE (this will of course affect performance; I don't know by how much).
We remove the "relev > 0" condition and keep the sorting. The rest is as before: split the resulting string and turn it into a query with several LIKE operators:
    SELECT *, MATCH (field) AGAINST ('$searchwords') AS relev FROM table WHERE field LIKE '%$word1%' OR field LIKE '%$word2%' ORDER BY relev DESC, datefield DESC
Part 4: Continuing what was started
I continue the topic of search with sorting by relevance in a MySQL database.
In its recent versions MySQL offers FULLTEXT indexing and the MATCH ... AGAINST construct. However, not every server runs the latest MySQL version, and not every hosting provider is willing to update software, for reasons of system reliability.
At one time I assumed that a search sorted by relevance would take several queries and was therefore better not attempted at all. The thought that relevance could be computed in the query itself did occasionally occur to me, but I could not even imagine such a construct.
However, an employee of one of the site-building firms in N-sk showed me the search system they use on their sites. I don't remember the query exactly, so I will try to reproduce it:
    SELECT title, DATE_FORMAT(material_date, "%e.%c.%Y") AS date1,
      IF(text LIKE "%word1 word2 word3%", 3 * 10, 0)
      + IF(text LIKE "%word1%", 9, 0)
      + IF(text LIKE "%word2%", 9, 0)
      + IF(text LIKE "%word3%", 9, 0) AS relevance
    FROM table
    WHERE text LIKE "%word1%" OR text LIKE "%word2%" OR text LIKE "%word3%"
    ORDER BY relevance DESC, material_date DESC
It looks terrible, but it works even on old versions of MySQL. I compared its speed with this query:
    SELECT title, DATE_FORMAT(material_date, "%e.%c.%Y") AS date1,
      MATCH (text) AGAINST ("word1 word2 word3") AS relevance
    FROM table
    WHERE text LIKE "%word1%" OR text LIKE "%word2%" OR text LIKE "%word3%"
    ORDER BY relevance DESC, material_date DESC
On average, the universal query is about half as fast as the one using the new constructs. This is quite logical: the more universal the query, the more resource-intensive it is.
Let's try to build such a query automatically. Cut off the overlong string, as well as all the wrong characters and short words. Build the query.
    $query = "SELECT title, DATE_FORMAT(material_date, '%e.%c.%Y') AS date1,
      IF(text LIKE '%" . $good_words . "%', " . (substr_count($good_words, " ") + 1) . " * 10, 0)
      + IF(text LIKE '%" . str_replace(" ", "%', 9, 0) + IF(text LIKE '%", $good_words) . "%', 9, 0) AS relevance
      FROM table
      WHERE text LIKE '%" . str_replace(" ", "%' OR text LIKE '%", $good_words) . "%'
      ORDER BY relevance DESC, material_date DESC";
Not very difficult. For reliability and protection against flooding, you can limit the number of words in the query.
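Limiting the number of words fits naturally into a small builder function, sketched below. The name `relevance_query` and the default cap of five words are assumptions; `table`, `text`, `title`, and `material_date` are the placeholder names from the queries above, and the input is assumed to be already cleaned (letters, digits, single spaces).

```php
<?php
// Builds the universal relevance query word by word, keeping at most
// $maxWords words from the cleaned search string.
function relevance_query(string $goodWords, int $maxWords = 5): string
{
    $words = array_slice(explode(' ', $goodWords), 0, $maxWords);
    $phrase = implode(' ', $words);
    // The whole phrase scores 10 points per word, each single word 9 points.
    $score = ["IF(text LIKE '%$phrase%', " . (count($words) * 10) . ", 0)"];
    $where = [];
    foreach ($words as $w) {
        $score[] = "IF(text LIKE '%$w%', 9, 0)";
        $where[] = "text LIKE '%$w%'";
    }
    return "SELECT title, DATE_FORMAT(material_date, '%e.%c.%Y') AS date1, "
        . implode(' + ', $score) . " AS relevance FROM table WHERE "
        . implode(' OR ', $where)
        . " ORDER BY relevance DESC, material_date DESC";
}

$q = relevance_query('one two three four', 2);
```

With the cap in place, the number of LIKE comparisons per row is bounded regardless of how many words the visitor types.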
Some additions to previous publications
Total number of rows found in the table. To display search results you of course need the LIMIT clause (so as not to write the construction of this parameter every time, use ready-made functions). If the query does no grouping, it is better to count the rows directly in a query with COUNT(*) rather than through the PHP mysql_num_rows() function. You can check this on large tables. If grouping is used, run a query with COUNT(DISTINCT(<the field being grouped by>)), but without GROUP BY.
Highlighting. If the texts contain no HTML tags, life is easier.
If tags are used in the text, there are three options: a) do not highlight at all; b) since the user never sees it (except perhaps a very curious one), add an index field in which the tags are removed and the characters matching [^\w\x7F-\xFF\s] are replaced by spaces (these characters are stripped from the search string at the very beginning, so they are never searched for), and run both the search and the highlighting against this index field; c) highlight the text from the regular field after stripping the tags with the strip_tags() function.
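Option b) could be prepared when a record is saved, for example with a helper like this. The function name `make_index_text` is hypothetical; the character class is the one quoted above, applied byte-wise as in single-byte encodings.

```php
<?php
// Builds the value for a separate tag-free index column: tags are removed,
// every character that is not a word character, a high-bit byte or
// whitespace becomes a space, and runs of whitespace are collapsed.
function make_index_text(string $html): string
{
    $text = strip_tags($html);
    $text = preg_replace('/[^\w\x7F-\xFF\s]/', ' ', $text);
    return trim(preg_replace('/\s+/', ' ', $text));
}

$indexText = make_index_text('<p>Hello, <b>world</b>!</p>');
```

The result would be stored next to the original text and refreshed whenever the original changes.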
The full version of the search code, as always, in the list of files.
One of the most popular and necessary features of any site is search, implemented through a dedicated form. It lets visitors quickly find the content they need on the site.
Today we want to show how to build a site search using a form that queries database tables and displays information about current staff members. You will learn how to create the database tables that hold information about current staff.
You will develop search forms using PHP and get acquainted with SQL (Structured Query Language), a special language for retrieving, recording, and modifying the information held in databases. Before you start, we recommend downloading the project files.
What you need
Tool to work with MySQL databases.
Local or remote server with PHP support.
Text editor.
Create a database
If you are not sure how to work with databases on your hosting, contact the hosting provider for instructions or help. Once the database is created, you will need to connect to it, create a table, and write the necessary data into it.
The most popular tool for managing MySQL is phpMyAdmin, and it is all we need for this tutorial.
Creating a table
Our table must be created in the following format:
Column Name | Data Type | Length | NULL or NOT NULL | Primary Key? | Auto Increment
ID | INT | 1 | NOT NULL | Yes | Yes
FirstName | VARCHAR | 50 | NOT NULL | No | No
LastName | VARCHAR | 50 | NOT NULL | No | No
Email | VARCHAR | 50 | NOT NULL | No | No
PhoneNumber | VARCHAR | 15 | NOT NULL | No | No
A database table consists of columns and rows, as in Excel. The first column identifies the data by name. Next comes the Data Type column, which indicates the type of data the column holds. The Length field sets the maximum amount of storage for the column. We use variable-length types, which give more flexibility: if a name is shorter than 50 characters, only part of the reserved space is used.
None of the staff fields may hold empty values (NULL). The first row stands out because the ID column is our primary key. The primary key ensures that each record in the database is unique. This column also uses auto-increment, which means that every record in our database is assigned a unique number automatically.
Adding staff members to the table
Once you have sorted out the table, start filling it with data. Six records are enough to fix the procedure in your mind. Below is my own example:
ID | FirstName | LastName | Email | PhoneNumber
2 | Ryan | Butler | [email protected] | 417-854-8547
3 | Brent | Callahan | [email protected] | 417-854-6587
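For readers who prefer SQL to the phpMyAdmin interface, the table layout and sample rows could be expressed roughly as follows. The table name `staff` and the helper `insert_sql` are assumptions, since the tutorial does not name them, and `addslashes()` is only a stand-in for real escaping.

```php
<?php
// Table definition following the column layout described in the tutorial.
$createTable = <<<SQL
CREATE TABLE staff (
  ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  FirstName VARCHAR(50) NOT NULL,
  LastName VARCHAR(50) NOT NULL,
  Email VARCHAR(50) NOT NULL,
  PhoneNumber VARCHAR(15) NOT NULL
)
SQL;

// Builds an INSERT statement for one row given as column => value.
// With a live connection, prepared statements are the safer choice.
function insert_sql(string $table, array $row): string
{
    $cols = implode(', ', array_keys($row));
    $vals = implode(', ', array_map(function ($v) {
        return "'" . addslashes((string)$v) . "'";
    }, $row));
    return "INSERT INTO $table ($cols) VALUES ($vals)";
}

$insert = insert_sql('staff', [
    'FirstName'   => 'Ryan',
    'LastName'    => 'Butler',
    'PhoneNumber' => '417-854-8547',
]);
```

The ID column is left out of the INSERT on purpose, so that auto-increment assigns it.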
Developing the form
To create the search form, open any suitable text editor. I recommend the free PSPad, but any editor with syntax highlighting will do; it greatly simplifies writing and debugging PHP code. When you create the page for the search form, do not forget to save it with the .php extension, otherwise the PHP code will not be processed. Once you have saved the document, copy the following markup into it: