Search strategy

We know how a search system identifies key words. Let us use this knowledge to build an optimal query. First, a few initial assumptions. Suppose we have a source text and want to find documents on the Internet with similar content. Where does the source text come from? A search task cannot arise from nothing, so somewhere there must be information that sparked interest in the problem: a journal article, a book, a website, and so on. It is this information that needs to be organized and put into a form suitable for analysis. If the problem exists only in your head, write a short essay outlining your view of it; that essay becomes the source text.

If we could dissect the source text the way a search engine presumably does, we could get results with the highest relevance. Let us try. Take the source text and analyze it. To automate the process you can use the interactive page www.shipbottle.ru/ir/, which runs an applet put together by the author, or the small program MTAS (mtasprog.exe) (www.sas.upenn.edu/~bkat/dwnld.htm). (Read the instructions carefully: to process Russian text you will have to write a small script file.) If the source text is a file on disk, give the program the path to it and it will calculate all the required parameters. Otherwise, for example when the source text is a page in a magazine, the analysis must be done by hand.

The sequence of actions is as follows:

1. Select the source text. The more clearly it describes the problem, the better and more accurate the result. A hazy, confusing source text will fish equally muddled documents out of the search engine.
2. Remove the stop-words from the text (they can simply be deleted).
3. Count how often each term occurs, ignoring word morphology: "ship" and "ships" are different terms. Case need not be taken into account either; treat all letters as lower case.
4. On a separate sheet, write the terms down in descending order of frequency (the most frequent come first).
5. Select a range of frequencies. It should lie somewhere in the middle: avoid terms that are too frequent or, conversely, too rare. The choice of range is subjective and should be guided by the specific meaning of the text. Having to pick the range by hand should not be troubling, because you are no longer choosing terms from the text itself but from an ordered list built according to a definite rule.
6. Write out the terms from the selected range. For a large text the range may contain quite a lot of words, and you are unlikely to need them all; 10-20 terms are enough. Choose them guided above all by common sense. Do not limit yourself to characteristic terms, even if they seem the most apt: the list must also include common words (preferably taken from the middle of the range).
7. Compose the query from the chosen words, in the order in which they appear in the list of terms. The search engine must interpret the query as a set of words joined by Boolean OR. This is a very important requirement. To keep the result from being distorted, study the query syntax of the particular search engine.
8. Send the query to the search engine. You may get several million hits in response, but do not worry. If the search engine ranks its results (another necessary condition), the first pages will consist almost entirely of relevant documents. Remarkably, the document that served as the source of the query (if an equivalent exists on the Internet) will not necessarily head the list; it may well turn up somewhere on the sidelines.
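For readers who prefer to script the manual analysis, steps 2 through 7 might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the applet or MTAS: the tiny stop-word list, the function names, the frequency bounds, and the literal `OR` operator are all hypothetical, and simple truncation to `limit` terms stands in for the common-sense selection described in step 6.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be far longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "on", "for"}


def term_frequencies(text, stop_words=STOP_WORDS):
    """Steps 2-4: drop stop-words, count terms, sort by descending frequency."""
    # Lower-case everything; no stemming, so "ship" and "ships" stay distinct.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    # Ties are broken alphabetically so the ordering is reproducible.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def mid_range_terms(ranked, min_freq, max_freq, limit=20):
    """Steps 5-6: keep terms whose frequency lies in [min_freq, max_freq].

    The bounds are chosen by the user, as in the text; cutting the list at
    `limit` is a crude stand-in for picking terms by common sense.
    """
    selected = [term for term, freq in ranked if min_freq <= freq <= max_freq]
    return selected[:limit]


def build_or_query(terms):
    """Step 7: join the terms in list order with an explicit OR.

    The operator spelling is an assumption; check the engine's own syntax.
    """
    return " OR ".join(terms)


if __name__ == "__main__":
    sample = ("The ship sails. Ships and a ship in the bottle; "
              "the bottle is on the ship. A model of the sea.")
    ranked = term_frequencies(sample)
    print(ranked)
    print(build_or_query(mid_range_terms(ranked, min_freq=1, max_freq=2, limit=3)))
```

Because the letter-matching pattern is Unicode-aware, the same sketch should also tokenize Russian text, provided a suitable stop-word list is supplied.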
Copyright © 2006-2022 Aha-Soft. All rights reserved.