Constant interaction with the outside world and wildlife raises the problem of identifying plants quickly and accurately, since plants play a significant role not only in agriculture but also in the fuel and energy sector, medicine, ecology, interior design, the study of the environment, and so on. This paper proposes an online information system for identifying plants from their photographs. The software was developed as a client-server architecture using modern technologies (the C#, TypeScript, HTML and CSS languages; the ASP.NET MVC, ASP.NET WebApi and Angular frameworks) together with powerful image recognition and machine learning algorithms. The mathematical basis is formed by key point detection algorithms such as SURF, SIFT and FREAK, the BOVW (Bag-Of-Visual-Words) model, and the support vector machine. The system has a simple, concise and clear interface and a responsive layout, provides convenient interaction with the user from any device with Internet access, and allows the image library to be extended easily. The developed program was tested on different datasets (photographs of tree leaves and flowers, including ones taken with a mobile phone, which is the most typical scenario of user interaction with the system), and a comparative analysis of these algorithms was carried out (examining the influence of the amount of data, the number of words in the BOVW model, background noise and clutter, the type of kernel function in the support vector machine, invariance to rotation and scale, etc.). Even on fairly small training datasets the system demonstrates decent recognition quality.

An online information system for identifying plants from their photographs has been developed. The mathematical basis is formed by key point detection algorithms such as SURF, SIFT and FREAK, the BOVW model, and the Support Vector Machine algorithm. The developed program was tested, and a comparative analysis of the algorithms was carried out.
Keywords: information technology, image recognition, machine learning, plant identification from photo images, online-system, SURF, SIFT, FREAK, BOVW.

Introduction.
Interaction with the outside world and nature makes up a large part of modern life. Plants play a major role not only in agriculture but also in the fuel and energy sector, medicine, ecology, interior design, environmental studies, and more. Under these conditions, both identifying a given plant and building a flexible, functional system with convenient access to it become essential problems. An online plant identification system can simplify the lives of scientists, travelers, farmers and ordinary people alike. Anyone with a phone, tablet or computer that has Internet access and can take a photo of a flower or leaf can obtain full information about the plant in front of them.
Literature review and problem statement. An analysis of existing software for recognizing plants in images shows that it does not always produce good, or even approximately correct, results. Web applications are virtually absent from this field.
The closest analogues are the following. "What Mushroom Is Growing In My Yard" [1] is one of the services of the Urban Mushrooms website. It addresses mushroom identification and serves as an online encyclopedia. Its main drawback is that the analysis is insufficiently intelligent: the user can only filter species by habitat and must then pick the most similar images from the resulting list. It works only with mushrooms.
"Garden Answers Plant Identification" [2] is a mobile application that recognizes plants from a photo. The app finds similar plants and lets the user choose from a list; in practice, it most often returns plants of a similar color. It works well with flowers but does not give satisfactory results on leaves. Functionality is limited in the free version.
"Cam Find" [3] is another mobile application that provides image recognition. Because it is not specialized for plants, its results for plants are inaccurate. Thus, developing an online system for recognizing plants from photographs is a relevant and practically important task.
The purpose of this work is to create the mathematical basis and software for an online system that identifies plants from their photographs, has a clear interface, provides convenient interaction with the user, and allows the image library to be expanded.
Materials and methods. Identifying a leaf or a flower in an image involves a number of factors that must be handled for the task to succeed: scale, the location of the object within the image, background and noise, projection, rotation, and viewing angle. Since all of these factors can radically affect the classification result, describing the image through its key points is the most suitable approach to identifying a plant. The most popular key point algorithms currently in use are SURF, SIFT, and FREAK [4][5][6].
A key point algorithm produces a set of descriptor vectors or, in the case of binary descriptors, binary strings that describe the image. The size of this set differs from image to image, depending on the number of key points found, which is a problem for machine learning methods that expect a fixed-length input. This problem is solved by representing the image as a Bag-Of-Visual-Words (BOVW) [7]; this model was implemented in this work. The Support Vector Machine algorithm is then applied to the resulting representation to identify the plant.
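The step from a variable number of descriptors to a fixed-length vector can be illustrated with a minimal sketch. It assumes real-valued descriptors compared by Euclidean distance and a vocabulary of visual words that has already been built (for example, by clustering training descriptors); the function names `squaredDistance`, `nearestWord` and `bovwHistogram` are illustrative and are not taken from the described system.

```typescript
// Squared Euclidean distance between two equal-length descriptor vectors.
function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the visual word (vocabulary entry) closest to the descriptor.
function nearestWord(descriptor: number[], vocabulary: number[][]): number {
  let best = 0;
  let bestDist = Infinity;
  for (let w = 0; w < vocabulary.length; w++) {
    const dist = squaredDistance(descriptor, vocabulary[w]);
    if (dist < bestDist) {
      bestDist = dist;
      best = w;
    }
  }
  return best;
}

// Fixed-length image signature: the normalized histogram of visual-word
// counts. Its length equals the vocabulary size, whatever the number of
// key points found in the image.
function bovwHistogram(descriptors: number[][], vocabulary: number[][]): number[] {
  const histogram: number[] = [];
  for (let w = 0; w < vocabulary.length; w++) {
    histogram.push(0);
  }
  for (let d = 0; d < descriptors.length; d++) {
    histogram[nearestWord(descriptors[d], vocabulary)] += 1;
  }
  const count = descriptors.length;
  if (count === 0) {
    return histogram; // no key points: all-zero histogram
  }
  return histogram.map((value) => value / count);
}
```

Whether an image yields three descriptors or three hundred, the output has one entry per visual word, which is what allows a classifier with a fixed input size to be trained on it. For binary descriptors such as FREAK, the Hamming distance would normally replace the Euclidean one.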
The software consists of two parts, server and client, developed with the following technologies:
• server part: the C# language and the ASP.NET MVC and ASP.NET WebApi frameworks;
• client part: the TypeScript, HTML and CSS languages and the Angular framework.
An advantage of this system is that its data can be dynamic: photo sets, classes and class data can be modified or expanded without recompiling the software. Each photo set describing a particular class of plants should be accompanied by a JSON file describing that class. This format not only adds flexibility to the system but also greatly facilitates its potential localization.
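The paper does not give the layout of the class description file, so the following is only a sketch of how such a file might look and be read on the client. Every field name (`id`, `names`, `descriptions`) and the locale-keyed layout are assumptions made for illustration, not the system's actual schema.

```typescript
// Hypothetical shape of a class description file (all field names assumed).
interface PlantClassInfo {
  id: string;
  names: Record<string, string>;        // locale code -> display name
  descriptions: Record<string, string>; // locale code -> description text
}

// Illustrative file content; in the system it would sit next to the photo set.
const exampleJson = `{
  "id": "oak",
  "names": { "en": "Oak", "uk": "Дуб" },
  "descriptions": { "en": "Example description text." }
}`;

// True when the value is a plain object whose values are all strings.
function isStringMap(value: unknown): value is Record<string, string> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const map = value as Record<string, unknown>;
  return Object.keys(map).every((key) => typeof map[key] === "string");
}

// Parse and validate a class description, rejecting malformed files.
function parsePlantClass(json: string): PlantClassInfo {
  const raw = JSON.parse(json);
  if (
    typeof raw !== "object" || raw === null ||
    typeof raw.id !== "string" ||
    !isStringMap(raw.names) ||
    !isStringMap(raw.descriptions)
  ) {
    throw new Error("Invalid plant class description");
  }
  return { id: raw.id, names: raw.names, descriptions: raw.descriptions };
}

// Pick the text for the requested locale, falling back to a default locale.
function localizedText(texts: Record<string, string>, locale: string, fallback: string = "en"): string {
  if (texts[locale] !== undefined) {
    return texts[locale];
  }
  return texts[fallback] !== undefined ? texts[fallback] : "";
}
```

With a layout like this, adding a language means adding one more key to each map, with no change to the code, which is one way the localization benefit mentioned above could be realized.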
The system interface and examples of identifying plants from photos of their leaves are shown in Figures 1 and 2. Figure 3 shows the key points detected by the SURF and FREAK algorithms. The photos of tree leaves were taken with a mobile phone, which is the most common scenario of user interaction with the system.
The developed software can also be applied to the recognition of flowers and other structural elements of plants; an example is shown in Figure 4.
The results of the comparative analysis and of the study of the quality of the implemented algorithms in the plant identification problem are given below. The dataset collected for training the model consists of three classes of leaf images: poplar (26 images), maple (27 images) and oak (24 images). Using the SURF method, the chi-square kernel function in the Support Vector Machine algorithm and 36 words in the Bag-Of-Visual-Words model, the system achieves 70% correct identification, with accuracy verified by cross-validation. That is, even with a fairly small training dataset the system shows a satisfactory result. Even a slight increase in the number of images improves recognition quality: for example, with poplar (29 images), maple (27 images) and oak (38 images) the quality is already 73%. Important properties of the SURF algorithm in this task are invariance to rotation and scale, as well as better speed compared to the SIFT algorithm.
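The paper states only that cross-validation was used to verify accuracy, not which scheme. The sketch below therefore shows one common variant, stratified k-fold cross-validation with pooled accuracy, purely as an assumption; `stratifiedFolds`, `Trainer` and `crossValidatedAccuracy` are hypothetical names, and the classifier is passed in as a callback so that any model can be plugged in.

```typescript
// Assign sample indices to k folds, dealing each class out round-robin so
// that every fold keeps roughly the same class proportions (stratification).
function stratifiedFolds(labels: string[], k: number): number[][] {
  const folds: number[][] = [];
  for (let f = 0; f < k; f++) {
    folds.push([]);
  }
  const seenPerClass: { [label: string]: number } = {};
  for (let i = 0; i < labels.length; i++) {
    const seen = seenPerClass[labels[i]] || 0;
    folds[seen % k].push(i);
    seenPerClass[labels[i]] = seen + 1;
  }
  return folds;
}

// Trains on the given sample indices and returns a predictor that maps a
// sample index to a predicted class label.
type Trainer = (trainIndices: number[]) => (sampleIndex: number) => string;

// Pooled accuracy: each fold is held out once, the model is trained on the
// remaining folds, and correct predictions are counted over all samples.
function crossValidatedAccuracy(labels: string[], k: number, train: Trainer): number {
  const folds = stratifiedFolds(labels, k);
  let correct = 0;
  for (let f = 0; f < k; f++) {
    const trainIndices: number[] = [];
    for (let g = 0; g < k; g++) {
      if (g !== f) {
        for (let j = 0; j < folds[g].length; j++) {
          trainIndices.push(folds[g][j]);
        }
      }
    }
    const predict = train(trainIndices);
    for (let j = 0; j < folds[f].length; j++) {
      const i = folds[f][j];
      if (predict(i) === labels[i]) {
        correct++;
      }
    }
  }
  return labels.length === 0 ? 0 : correct / labels.length;
}
```

With small classes such as the 24 to 27 images per species used here, stratification keeps every species represented in each held-out fold, so a single fold cannot distort the estimate by missing a class entirely.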
Let us now consider how the number of words in the BOVW model influences the identification result. With the SURF method, the average classification quality is 56% for 18 words; increasing the number of words to 64 raises the quality to 82%, while a further increase to 200 words brings no improvement and even slightly worsens the result (81%). With the FREAK method, 18, 36 and 64 words give classification quality of 56%, 61% and 70%, respectively. Because FREAK runs much longer than SURF with large vocabularies while its quality stays approximately the same as with 64 words, such a configuration is not a rational use of time.
A comparative analysis of different kernel functions in the Support Vector Machine algorithm shows that, in the plant identification task, the chi-square kernel gives the most stable and highest-quality results.
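For reference, one widely used form of the chi-square kernel for histograms is the exponential one, K(x, y) = exp(-γ Σ (x_i - y_i)² / (x_i + y_i)). The paper does not say which variant the system uses, so the sketch below is an assumption; the function name `chiSquareKernel` and the parameter `gamma` are chosen here for illustration.

```typescript
// Exponential chi-square kernel between two non-negative histograms:
//   K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))
// Bins that are empty in both histograms contribute nothing to the sum.
function chiSquareKernel(x: number[], y: number[], gamma: number = 1): number {
  if (x.length !== y.length) {
    throw new Error("Histograms must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < x.length; i++) {
    const sum = x[i] + y[i];
    if (sum > 0) {
      const diff = x[i] - y[i];
      distance += (diff * diff) / sum;
    }
  }
  return Math.exp(-gamma * distance);
}
```

The kernel equals 1 for identical histograms and decreases as they diverge. Dividing each squared difference by the bin total makes a given absolute difference weigh more in sparsely populated bins, which suits normalized word-count histograms such as those of the BOVW model.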
Experiments also show that noise and a cluttered background have a strong impact on recognition quality. The user can get better results by using a plain white sheet of paper as a background. This is especially important for the FREAK descriptor, which is more sensitive to noise.
Conclusions. An online information system for identifying plants from their photographs has been created. The software was developed using modern technologies and powerful image recognition and machine learning algorithms. The system has a simple, concise and intuitive interface and a responsive layout, provides convenient interaction with the user from any device with Internet access, and allows the image library to be expanded. A number of computational experiments and a comparative analysis of the implemented algorithms in the plant identification task were conducted. Even with relatively small training datasets, the system demonstrates decent recognition quality.