Wednesday, February 12, 2025, 15:00
MLIT Conference Hall, online seminar via MTS Link

Evgeniy Antonov (NRNU MEPhI)
Component Architecture of the Software Complex for Intellectual Analysis of Scientific and Technical Information
(based on the PhD thesis)

Abstract: Modern methods of intelligent data analysis (IDA) based on machine learning, natural language processing, and visualization technologies must be adapted to the specifics of scientific and technical information (STI), which comes in a variety of formats and consists largely of unstructured and poorly structured data. The present work develops a component architecture for an STI IDA software complex that scales horizontally to handle large volumes of data. The author presents specialized algorithms for data extraction and enrichment (saturation) that take into account the peculiarities of scientific publications. These include the extraction of text keywords, physical quantities and units of measurement, chemical elements, tables, and images, the unification of affiliation and country names, and the identification of intergovernmental associations. The software complex is composed of four main blocks: a client-server module, a distributed workflow management module, a data processing and saturation module, and a data warehouse. The architecture is flexible, so its functionality can be extended by integrating current technological solutions. The system works with diverse data sources, including PDF documents, web pages, and databases, and provides interactive analytical dashboards for visualizing results. The theoretical and practical significance of the work lies in the advancement of existing approaches to IDA and the application of the developed solutions in real-world projects.
In particular, the system has been successfully implemented in several projects, including the creation of a database of properties and structures of irradiated materials, the digitalization of experimental data, and the establishment of a repository of scientific publications from the Joint Institute for Nuclear Research.

Connecting to MTS Link. Information on the seminar and the link to connect are available at Indico.
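
The abstract mentions extracting physical quantities and units of measurement from publication text. The announcement does not describe how the author's algorithms do this, so the following is only a minimal illustrative sketch of one common approach, a regular expression over a fixed unit vocabulary. The names `Quantity`, `UNITS`, and `extract_quantities`, as well as the short unit list, are assumptions made for this example and are not taken from the thesis.

```python
import re
from typing import NamedTuple


class Quantity(NamedTuple):
    """A numeric value paired with the unit string found next to it."""
    value: float
    unit: str


# Hypothetical, deliberately tiny unit vocabulary (an assumption for this
# sketch); a production system would need a much larger, curated list.
UNITS = ("keV", "MeV", "GeV", "dpa", "K", "nm", "mm", "cm")

# Try longer unit names first so that a short unit never shadows a longer one.
_UNIT_ALTERNATION = "|".join(
    sorted((re.escape(u) for u in UNITS), key=len, reverse=True)
)

# A number (optional sign, decimals, exponent), optional whitespace, then a
# known unit ending on a word boundary. The lookbehind stops the match from
# starting in the middle of a longer token.
_PATTERN = re.compile(
    r"(?<![\w.])(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*("
    + _UNIT_ALTERNATION
    + r")\b"
)


def extract_quantities(text: str) -> list[Quantity]:
    """Return every (value, unit) pair found in the text, in reading order."""
    return [
        Quantity(float(m.group(1)), m.group(2))
        for m in _PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Samples were irradiated to 2.5 dpa at 573 K with 3 MeV ions."
    for quantity in extract_quantities(sample):
        print(quantity.value, quantity.unit)
```

A fixed vocabulary keeps false positives low at the cost of missing unlisted units; handling compound units, prefixes, and value ranges would require a richer grammar than this sketch attempts.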