Workload prediction for DIRAC distributed computing infrastructure

Monday, May 24, 2021 - Wednesday, June 2, 2021     

MLIT employee Igor Pelevanyuk is offering a new project within the framework of the 4th Wave of the INTEREST online scientific program, which is organized by the JINR University Centre and will be held from May 24 to July 2, 2021.
DIRAC Interware is a platform for integrating different computing resources into a unified system. At JINR it is used to run various jobs related to Monte Carlo generation and data reconstruction. So far, almost a million jobs have been executed successfully, consuming 500 years of wall time. With the monitoring data available, it is possible to predict the duration of jobs running on the integrated infrastructure. This would allow the load to be planned more precisely.
This project involves learning the terminology of High-Throughput Computing and making practical use of the collected data. Please include a link to your GitHub (GitLab, Bitbucket) account if you already have completed projects in Python.

Tasks
1. Analyze previously executed workloads.
2. Design a model of the system.
3. Apply existing parameters to the model.
4. Predict future workloads.
5. Compare the predictions with real data.
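
The five steps above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the project's actual method: the job records are invented, the field layout (site, number of events, wall time) is hypothetical and does not reflect the real DIRAC monitoring schema, and the "model" is a deliberately naive baseline that assumes wall time grows linearly with the number of events at each site.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, hand-made job records: (site, number of events, wall time in seconds).
# Real DIRAC monitoring data would have a different and richer structure.
HISTORY = [
    ("site_a", 1000, 2000.0),
    ("site_a", 2000, 4000.0),
    ("site_b", 1000, 3000.0),
    ("site_b", 3000, 9000.0),
]

# Jobs that finished later; used only to check the predictions.
NEW_JOBS = [
    ("site_a", 1500, 3100.0),
    ("site_b", 2000, 5800.0),
]


def fit_rates(jobs):
    """Steps 1-3: analyze past jobs and estimate mean seconds per event per site."""
    per_site = defaultdict(list)
    for site, events, wall_time in jobs:
        per_site[site].append(wall_time / events)
    return {site: mean(rates) for site, rates in per_site.items()}


def predict(rates, site, events):
    """Step 4: predict the wall time of a job from its site and event count."""
    return rates[site] * events


def mean_absolute_error(rates, jobs):
    """Step 5: compare predictions with the wall times that were actually measured."""
    return mean(abs(predict(rates, site, events) - wall_time)
                for site, events, wall_time in jobs)


rates = fit_rates(HISTORY)
mae = mean_absolute_error(rates, NEW_JOBS)
print("seconds per event:", rates)
print("mean absolute error (s):", mae)
```

A per-site average is chosen here only because it is easy to verify by hand. A real model would likely need more features (job type, input size, worker node performance) and a proper train/test split.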
Preliminary schedule by topics/tasks
The schedule will be established after the project begins.
Required skills
Strong Python programming skills and a good knowledge of OOP are required. Knowledge of design patterns will be an advantage.
Acquired skills and experience
Practical experience in product development with Python. Data analysis experience.
Recommended literature
Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (the "Gang of Four").