Automatic analysis of texts
It turns out that all human-written texts are built by the same rules, and no one can get around them. Whatever the language, and whoever the author, a classic or a graphomaniac, the internal structure of the text stays the same. It is described by the laws of G. K. Zipf. He suggested that natural human laziness (a trait of any living creature, in fact) means that longer words occur in a text less often than short ones. Based on this postulate, Zipf derived two universal laws.
Zipf's first law: "rank - frequency"
Choose any word and count the number of times it occurs in the text. This quantity is called the frequency of occurrence of the word. Measure the frequency of every word in the text. Some words will have the same frequency, that is, they occur in the text an equal number of times. Group them, keeping only one value from each group. Arrange the frequencies in descending order and number them. The serial number of a frequency is called its rank. Thus the most common words have rank 1, the next most common have rank 2, and so on. Now pick a word from the text at random and determine the probability that it is the word we chose. This probability is equal to the ratio of the word's frequency of occurrence to the total number of words in the text.
Probability = Frequency of occurrence of the word / Number of words in the text
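The counting steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `rank_frequency` and the sample sentence are made up here, and a "word" is simply taken to be any run of letters, which is a simplification (for example, it splits words at apostrophes).

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (total_words, rows); each row is (rank, frequency, words)."""
    # Assumption: a word is a run of letters; case is ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Keep each distinct frequency once and sort in descending order,
    # so words with equal frequency share one rank.
    distinct = sorted(set(counts.values()), reverse=True)
    rows = []
    for rank, freq in enumerate(distinct, start=1):
        group = sorted(w for w, c in counts.items() if c == freq)
        rows.append((rank, freq, group))
    return len(words), rows

# Hypothetical sample text, chosen only to keep the arithmetic visible.
sample = "the cat and the dog and the bird"
total, rows = rank_frequency(sample)
for rank, freq, group in rows:
    probability = freq / total  # frequency / number of words in the text
    print(rank, freq, round(probability, 3), group)
```

In this sample, "the" gets rank 1, "and" gets rank 2, and the three words that occur once share rank 3.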
Zipf found an interesting pattern. It turns out that if you multiply the probability of finding a word in the text by the rank of its frequency, the resulting value (C) is approximately constant!
C = (Frequency of occurrence of the word x Rank of the frequency) / Number of words in the text
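As a rough sketch, the value C can be computed for every rank of a text as follows. The function name `zipf_constants` and the tiny sample are assumptions made for illustration. A sample this small only shows the arithmetic; the approximate constancy of C can be expected only on a long text.

```python
import re
from collections import Counter

def zipf_constants(text):
    """Return a list of (rank, frequency, C) with C = frequency * rank / total words."""
    # Assumption: a word is a run of letters; case is ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    total = len(words)
    # Distinct frequencies in descending order; position in this list is the rank.
    distinct = sorted(set(Counter(words).values()), reverse=True)
    return [(rank, freq, freq * rank / total)
            for rank, freq in enumerate(distinct, start=1)]

# Hypothetical toy input: frequencies 4, 2 and 1 out of 7 words.
for rank, freq, c in zipf_constants("a a a a b b c"):
    print(rank, freq, round(c, 3))
```

To try it on a real text, read a long file into a string and pass it to `zipf_constants`, then compare the values of C across the first few ranks.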
If we rearrange the formula a little and look it up in a mathematics handbook, we see that it is a function of the form y = k / x, whose graph is an equilateral hyperbola. Consequently, according to Zipf's first law, if the most common word occurs in the text, say, 100 times, then the next most frequent word is unlikely to occur 99 times. The frequency of the second most popular word will, with high probability, be about 50. (Of course, statistics are never perfectly exact: whether it is 50 or 52 is not so important.)
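The hyperbola y = k / x can be turned into a small prediction, sketched here under the assumption that the law holds exactly: if the frequency at rank 1 is known, the frequency expected at rank r is that value divided by r. The function name `predicted_frequencies` is hypothetical.

```python
def predicted_frequencies(top_frequency, ranks):
    """Idealized Zipf prediction: frequency at rank r is top_frequency / r."""
    return [top_frequency / r for r in range(1, ranks + 1)]

# With 100 occurrences at rank 1, rank 2 is predicted at 50, rank 4 at 25.
for rank, freq in enumerate(predicted_frequencies(100, 4), start=1):
    print(rank, round(freq, 1))
```

Real texts only follow this curve approximately, so the numbers are a guide to the expected shape and not exact counts.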
The constant differs from language to language, but within a single language group it remains unchanged, whatever text we take. For example, for English texts the Zipf constant is approximately 0.1. How do Russian texts look from the perspective of Zipf's laws? They are no exception. An analysis of the Russian text files stored on my computer convinced me that the law holds perfectly there as well. For the Russian language the Zipf coefficient came out at 0.06-0.07. Although these studies do not claim to be comprehensive, the universality of Zipf's law suggests that the data obtained are quite reliable.
Copyright © 2006-2022 Aha-Soft. All rights reserved.