Available online at www.sciencedirect.com
ScienceDirect Procedia Computer Science 103 (2017) 244 – 247
An algorithm for constructing feature relations between the classes in the training set
Khamroev Alisher
Senior researcher, Centre for the development of software and hardware-program complexes at the TUIT, Tashkent, Uzbekistan
E-mail: [email protected]
Abstract. In this paper, we propose a new approach for determining feature relations between classes in order to solve classification problems based on a training set of precedents. The main idea of the approach is to select the best features for object classification as the number of classes and features in the training set grows. Two criteria are taken into account: increasing the recognition quality and reducing the number of calculations by removing non-informative features.
1. Introduction
Drawing on the experience of domain experts is an effective way to solve applied problems with the methods and algorithms of pattern recognition. Specialists usually solve problems such as intelligent data analysis and object classification heuristically, relying on long-term observation and experience. Once a specialist's knowledge and experience are built into pattern recognition methods, such difficult and time-consuming problems can be solved quickly by computer.
One of the important problems of pattern recognition is the reduction of the feature space. To address it, we propose a relation-building approach that singles out a feature, or a collection of features, that distinguishes the objects of two classes. Relations between classes can take several forms:
• the relation between the centres of the classes (distance, proximity and level of similarity);
• the analysis of the weights of the classes;
• the degree to which the classes satisfy the compactness hypothesis;
• the degree to which the classes intersect or do not intersect;
• feature relations established between each pair of classes.
The last kind of relation has not been studied sufficiently. To determine it, informative feature collections are extracted when two classes of the training set are compared, either on the basis of the experience of domain specialists or algorithmically.
Solving a classification problem requires following the methods used in the applied domain. Many classification methods analyse the closeness of the object being classified to all objects of each class or to a class prototype. In that case, all objects of the training set are compared on all selected features.
When the number of features and classes is large, relations between classes can be determined. Feature relations between classes are set on the training set and are aimed at reducing the time and other costs of classifying a control object.
1877-0509 © 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of the XIIth International Symposium “Intelligent Systems”. doi:10.1016/j.procs.2017.01.094
Khamroev Alisher / Procedia Computer Science 103 (2017) 244 – 247
When a control object is classified, it is first regarded as a representative of some class (a class prototype); to check whether it belongs to another class instead, only the relations between these two classes are used. In pattern recognition this principle is called “I am on your side, right?”.
Several examples illustrate the problem. In medicine, a diagnosis $K_1$ given by a doctor to a patient $S$ may not be confirmed, i.e. symptoms of another disease $K_u$ may be observed in the patient. In such a case, the features that distinguish the diseases from each other help to resolve the question [1]. In botany, when plants are classified by the morphological features of species within one genus, it is also important to determine the feature relations between species (classes). Species of one genus have similar features, and in most cases singling out a few important features is enough for classification. For example, a botanist can say that a plant approximately belongs to species $K_u$, although it may in fact belong to species $K_c$. The new plant is therefore taken to be of species $K_u$ and is then compared and checked using the feature relations between species $K_u$ and $K_c$.
The approach was first tested on the well-known Iris data set [1]. The presence of feature relations was then established and confirmed on the basis of the experience, knowledge and results gained from classifying species of the genus Tulipa L.
Feature relations between classes can be analysed both for intersecting and for non-intersecting classes. This work considers feature relations only between non-intersecting classes. If classes intersect, their closeness to the object has to be taken into account; this case will be treated in future work. Feature relations between classes are determined and fixed during training on the training set.

2. The statement of the problem
Let a general collection $S$ be given, determined by studying the objects of the subject domain, and let a training set $\hat{S}$ be drawn from it. The training set contains $m$ objects and is written as $\hat{S} = \{S_1, S_2, \dots, S_m\}$. Each object $S_j$ is described by $N$ characteristics (features) $x_i(S_j) \in X_i$, $X_i \in X$ ($i = \overline{1,N}$, $j = \overline{1,m}$). If the $i$-th feature of object $S_j$ is denoted $x_i(S_j) = a_{ji}$, the object can be written as $S_j = (a_{j1}, a_{j2}, \dots, a_{jN})$. The objects of the training set $\hat{S}$ are divided into $l$ non-intersecting classes $K_1, K_2, \dots, K_l$.
To determine the feature relations between classes, the parameters of the recognition algorithm $A$ are tuned and an optimal solution is sought on the given training set. This is done through the training process of precedent-based recognition methods, which require numerous calculations [4]. In this work we use the algorithms for calculating estimates (ACE), which are based on the principle of partial precedence [5]. The best algorithm $A^*$ has to be selected from the ACE family. It determines whether an object $S_j$ belongs to class $K_u$ [6]:
$A^*(S_j, K_u) = \alpha_u(S_j)$, $\alpha_u(S_j) \in \{0, 1, \Delta\}$.
Here $\alpha_u(S_j) = 0$ means that object $S_j$ does not belong to class $K_u$, $\alpha_u(S_j) = 1$ means that it belongs to $K_u$, and $\alpha_u(S_j) = \Delta$ means that the algorithm refuses to classify the object.
In view of the above, an algorithm $A^*$ has to be built such that a comparative relation $R(K_u, K_c)$ is set for every pair of classes, where $u, c = 1, 2, \dots, l$, $c \neq u$, $l \ge 2$. Hence, to compare the objects of classes $K_u$ and $K_c$, the problem of analysing informative features has to be solved.

3. The solution of the problem
If $l$ is the number of classes, the number of class pairs is $v = C_l^2$.
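The pair count above can be checked by direct enumeration. The following is a minimal Python sketch, not part of the original method: the function name `class_pairs` is illustrative, and each unordered pair of classes is listed once.

```python
from itertools import combinations
from math import comb


def class_pairs(l):
    """Return all unordered class pairs (u, c) with 1 <= u < c <= l."""
    return list(combinations(range(1, l + 1), 2))


# Example with l = 4 classes (an illustrative value).
pairs = class_pairs(4)
print(pairs)
# The number of enumerated pairs equals the binomial coefficient C(l, 2).
print(len(pairs) == comb(4, 2))
```

Enumerating pairs once, with u < c, keeps exactly one relation per pair of classes.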
If the relation between the classes of each pair is denoted $R_{uc}$, the relations can be listed as follows:
For class 1: $R_{1,2};\ R_{1,3};\ \dots;\ R_{1,l}$;
For class 2: $R_{2,3};\ R_{2,4};\ \dots;\ R_{2,l}$;
...
For class $l-2$: $R_{l-2,l-1};\ R_{l-2,l}$;
For class $l-1$: $R_{l-1,l}$.
Here $R_{uc} = R_{cu}$. If the requirements $K_u \cap K_c = \emptyset$, $K_u, K_c \neq \emptyset$, $n \ge 1$, $l \ge 2$ are satisfied, where $n$ is the number of features, then the following lemmas hold.
Lemma 1. The number of relations between classes $K_u$ and $K_c$ satisfies $1 \le v \le C_l^2$.
Proof. Let the requirements above be satisfied. If $n = 1$, the relations between all classes are $R_{uc} = \{X_1\}$, i.e. $|R_{uc}| = 1$. If $n \ge l$ and $R_{uc} \neq R_{jt}$ ($j \neq u$, $c \neq t$), then all possible relations can be built: $v \to C_l^2$.
Lemma 2. $R(K_u, K_c) \neq \emptyset$.
Proof. In pattern recognition theory, when $K_u \cap K_c = \emptyset$ and $K_u, K_c \neq \emptyset$ hold as in Lemma 1, at least one feature is needed to satisfy all the requirements. Hence $R(K_u, K_c) \neq \emptyset$.
Lemma 3. If every feature $X_i \in \{X\}$ participates in the relations $R_{uc}$ at least once, then $|R_{1,2} \cup R_{1,3} \cup \dots \cup R_{l-1,l}| = n$.
Proof. The elements of each relation $R_{uc}$ are features $X_i$, $i = \overline{1,n}$, and every feature $X_i$ participates in the relations $R_{uc}$ at least once. If each distinct element is taken once from the collection of relations, the total number of elements is equal to $n$.
The relation $R_{uc}$ is determined by separating the “informative”, “reserve” and “non-informative” features of the feature collection $X$ in the training set $\hat{S}$. “Informative” features are used to compare the objects of the pair $K_u$ and $K_c$, “reserve” features are used when necessary, and the use of “non-informative” features is not recommended.
If the training set contains class pairs whose objects can be distinguished by one or more informative or reserve features, then the following theorem holds.
Theorem 1. If $\hat{S} \in K_u$ and $K_c \neq \emptyset$, then for $\hat{S} \notin K_c$ the participation of at least one informative or reserve feature $X^*$ is necessary and sufficient.
Proof.
If $\hat{S} \in K_u$ and $K_u \cap K_c = \emptyset$ in the training set, then there is a feature $X^*$ (or a collection of features) which ensures that the object $\hat{S}$ does not belong to class $K_c$, and by Lemmas 1 and 2 this feature (or collection of features) is an informative or reserve feature that participates in the feature relation of classes $K_u$ and $K_c$.
In relations between classes, the labels “informative”, “reserve” and “non-informative” are treated as linguistic terms, so they can be handled with fuzzy set theory. For the relation $R(K_u, K_c)$ a vector $w_t = (g_1, g_2, \dots, g_n)$ is built, which expresses whether each available feature is informative, reserve or non-informative for the relation $R_{uc}$. If $1 \ge g_i \ge h_1$, the $i$-th feature is informative; if $h_1 > g_i \ge h_2$, the $i$-th feature is reserve; if $h_2 > g_i \ge 0$, the $i$-th feature is non-informative. Here $h_1$ and $h_2$ ($h_1 > h_2$) are boundary values. It is expedient to determine these boundary values from the training set.

4. Experimental Studies
One of the motivations for this approach is the problem of identifying species of the genus Tulipa L. by their morphological features. The aim is to develop an identification system that replaces the identification keys used in botany for many years. Classifying plant species by taxonomic rank is a difficult problem. Plants are analysed not only from samples in natural conditions but also from herbarium specimens, in which morphological features may deteriorate; this complicates their identification [2].
The developed approach was applied to a training set for the genus Tulipa L. The training set contains 4 classes, each consisting of 20 objects described by 16 features.
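Table 1 gives the results used to determine the relations between the objects of every class pair $K_u$ and $K_c$ ($u, c = \overline{1,4}$) for each feature. The results in Table 1 were obtained with the partial-precedent algorithms of the PRASK-2 software complex [4].

Table 1. Determining the relations between classes of Tulipa (columns 1 to 16 are feature numbers)

| Relation | Class | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ε | | 2.5 | 0 | 1 | 1 | 0 | 1 | 2 | 1 | 0 | 0.2 | 1 | 1 | 1 | 1 | 1 | 1 |
| R(K1, K2) | K1 | 0.95 | 0 | 1 | 0.9 | 0 | 0 | 1 | 1 | 0 | 0.7 | 0 | 1 | 1 | 1 | 1 | 0.95 |
| | K2 | 1 | 0 | 0.95 | 0.95 | 0 | 0 | 1 | 0.95 | 0 | 0.1 | 0 | 1 | 1 | 1 | 1 | 0.9 |
| R(K1, K3) | K1 | 0.95 | 0 | 0 | 0.9 | 0 | 0 | 1 | 1 | 0 | 0.7 | 0 | 1 | 1 | 0.9 | 1 | 0.8 |
| | K3 | 0.6 | 0 | 0 | 0.5 | 0 | 0 | 1 | 1 | 0 | 0.6 | 0 | 1 | 1 | 1 | 1 | 0.95 |
| R(K1, K4) | K1 | 0.9 | 0 | 1 | 0.85 | 0 | 1 | 1 | 1 | 0 | 0.7 | 1 | 1 | 1 | 0.9 | 1 | 0.7 |
| | K4 | 0.15 | 0 | 1 | 0.85 | 0 | 0.1 | 1 | 1 | 0 | 0.9 | 1 | 1 | 1 | 1 | 1 | 0.9 |
| R(K2, K3) | K2 | 0.85 | 0 | 0.95 | 0.95 | 0 | 0 | 0.55 | 0.95 | 0 | 0.4 | 0 | 0 | 0 | 1 | 0 | 0.9 |
| | K3 | 0.4 | 0 | 1 | 0.5 | 0 | 0 | 1 | 1 | 0 | 0.6 | 0 | 0 | 0 | 1 | 0 | 1 |
| R(K2, K4) | K2 | 1 | 0 | 0.95 | 0.95 | 0 | 1 | 1 | 0.95 | 0 | 0.7 | 1 | 1 | 0 | 1 | 0 | 0.9 |
| | K4 | 0.9 | 0 | 1 | 1 | 0 | 0.1 | 1 | 1 | 0 | 0.9 | 1 | 1 | 0 | 1 | 0 | 1 |
| R(K3, K4) | K3 | 0.6 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0.5 | 1 | 1 | 0 | 1 | 0 | 0.65 |
| | K4 | 0.8 | 0 | 1 | 1 | 0 | 0.1 | 1 | 1 | 0 | 0.9 | 1 | 1 | 0 | 0.35 | 0 | 0.45 |
| LIF (%) | | 75.83 | 0 | 82.1 | 86.25 | 0 | 27.5 | 96.25 | 98.75 | 0 | 64.2 | 50 | 83.3 | 50 | 92.9 | 50 | 84.2 |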
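The text does not give a formula for the LIF row of Table 1. Its values agree, up to rounding, with the mean of the twelve class rows in each feature column, expressed as a percentage. The Python sketch below illustrates that reading; the column-mean formula and the names `TABLE_1` and `lif_percent` are assumptions made for illustration, not the author's stated definition.

```python
# Class rows of Table 1, two rows per class pair, in the order
# (K1,K2), (K1,K3), (K1,K4), (K2,K3), (K2,K4), (K3,K4).
TABLE_1 = [
    [0.95, 0, 1, 0.9, 0, 0, 1, 1, 0, 0.7, 0, 1, 1, 1, 1, 0.95],
    [1, 0, 0.95, 0.95, 0, 0, 1, 0.95, 0, 0.1, 0, 1, 1, 1, 1, 0.9],
    [0.95, 0, 0, 0.9, 0, 0, 1, 1, 0, 0.7, 0, 1, 1, 0.9, 1, 0.8],
    [0.6, 0, 0, 0.5, 0, 0, 1, 1, 0, 0.6, 0, 1, 1, 1, 1, 0.95],
    [0.9, 0, 1, 0.85, 0, 1, 1, 1, 0, 0.7, 1, 1, 1, 0.9, 1, 0.7],
    [0.15, 0, 1, 0.85, 0, 0.1, 1, 1, 0, 0.9, 1, 1, 1, 1, 1, 0.9],
    [0.85, 0, 0.95, 0.95, 0, 0, 0.55, 0.95, 0, 0.4, 0, 0, 0, 1, 0, 0.9],
    [0.4, 0, 1, 0.5, 0, 0, 1, 1, 0, 0.6, 0, 0, 0, 1, 0, 1],
    [1, 0, 0.95, 0.95, 0, 1, 1, 0.95, 0, 0.7, 1, 1, 0, 1, 0, 0.9],
    [0.9, 0, 1, 1, 0, 0.1, 1, 1, 0, 0.9, 1, 1, 0, 1, 0, 1],
    [0.6, 0, 1, 1, 0, 1, 1, 1, 0, 0.5, 1, 1, 0, 1, 0, 0.65],
    [0.8, 0, 1, 1, 0, 0.1, 1, 1, 0, 0.9, 1, 1, 0, 0.35, 0, 0.45],
]


def lif_percent(rows):
    """Column mean of the class rows as a percentage (assumed LIF formula)."""
    n_rows = len(rows)
    return [100 * sum(column) / n_rows for column in zip(*rows)]


lif = lif_percent(TABLE_1)
print([round(value, 2) for value in lif])
```

Under this reading, a feature with a high LIF separates the classes well in most pairs, while a feature with LIF equal to 0 contributes to no pair.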
In Table 1, LIF denotes the level of informativity of a feature and $\varepsilon$ is the bound for the closeness of feature values.
From Table 1 the feature relations are determined as follows (for example, for $R_{12}$):
$R_{12} = \{X_7, X_{12}, X_{13}, X_{14}, X_{15}\}$: informative features;
$R_{12} = \{X_1, X_3, X_4, X_7, X_8, X_{12}, X_{13}, X_{14}, X_{15}, X_{16}\}$: informative and reserve features.
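The two sets listed for $R_{12}$ can be reproduced from the K1 and K2 rows of Table 1 with a simple threshold rule. The sketch below is illustrative only. It assumes that the value $g_i$ for a pair is the smaller of the two class values and that the boundaries are $h_1 = 1$ and $h_2 = 0.9$. Neither choice is stated in the text; both were picked because they reproduce the sets above. The name `label_features` is hypothetical.

```python
# Rows of Table 1 for the pair (K1, K2), features 1..16.
K1_ROW = [0.95, 0, 1, 0.9, 0, 0, 1, 1, 0, 0.7, 0, 1, 1, 1, 1, 0.95]
K2_ROW = [1, 0, 0.95, 0.95, 0, 0, 1, 0.95, 0, 0.1, 0, 1, 1, 1, 1, 0.9]

H1 = 1.0  # assumed boundary between informative and reserve features
H2 = 0.9  # assumed boundary between reserve and non-informative features


def label_features(row_u, row_c, h1, h2):
    """Label each feature of a class pair by thresholding g_i.

    g_i is taken as min(row_u[i], row_c[i]); this aggregation is an
    assumption made for the example, not a rule given in the paper.
    """
    labels = {}
    for i, (a, b) in enumerate(zip(row_u, row_c), start=1):
        g = min(a, b)
        if g >= h1:
            labels[i] = "informative"
        elif g >= h2:
            labels[i] = "reserve"
        else:
            labels[i] = "non-informative"
    return labels


labels = label_features(K1_ROW, K2_ROW, H1, H2)
informative = [i for i, name in labels.items() if name == "informative"]
usable = [i for i, name in labels.items() if name != "non-informative"]
print(informative)
print(usable)
```

With other boundary values the same rule yields different sets, which is why the boundaries are best tuned on the training set.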
5. Conclusion
In conclusion, fuzzy set theory may be used to make the distinction between informative, reserve and non-informative feature relations more precise.
References
[1] Khamroev A.Sh. A heuristic approach for defining relations between classes. The republican scientific-technical conference "The information and telecommunication problems". Tashkent, 10-11 March 2016. Pp. 187-190. (in Uzbek).
[2] Kamilov M.M., Khudayberdiev M.X., Minglikulov Z.B. The problem of identifying plant objects. The republican scientific-technical conference. Jizzakh, 2015. Pp. 188-190. (in Uzbek).
[3] Krasinsky V.I. Recognition method of fuzzy situations on a set of quantitative and qualitative characteristics. The first regional forum "Siberian industry of information systems". Novosibirsk, 2002: http://www-sbras.nsc.ru/win/telecom/forum_2002/krasinsky/krasinsky.html (in Russian).
[4] Zhuravlev Y.I., Ryazanov V.V., Senko O.V. "RECOGNITION". Mathematical methods. Software system. Practical applications. M.: "Fazis", 2005, P. 103 (in Russian).
[5] Zhuravlev Y.I., Kamilov M.M., Tulyaganov Sh.E. Algorithms for calculating estimates and their application. Tashkent, 1974. 124 pp. (in Russian).
[6] Zhuravlev Y.I. Selected scientific works. M.: Master, 1998. 420 pp. (in Russian).