A Lightweight Execution Manager for Training TensorFlow Models under the Slurm Queuing System

Abstract
Artificial neural networks are currently the flagship of Machine Learning
and have spread to many fields beyond Computer Science. This kind of computational
model generally needs massive amounts of data and high-performance computing
resources. The availability of graphical processing units is especially relevant. Thus, only
institutional computing platforms and clusters satisfy such a high demand for
computational power and storage resources. These systems rely on resource managers
capable of handling multiple users and computing resources. However, the users interested
in working with artificial neural networks, especially those without a background in
Computer Engineering, may not be proficient in system administration. For them, planning their
executions within the framework of a resource manager focused on high-performance
computing is problematic. This work presents S-TFManager, an easy-to-use open-source
web manager for launching and controlling the execution of TensorFlow models consisting
of artificial neural networks in a heterogeneous cluster with a Slurm queuing system. Both
TensorFlow and Slurm are arguably the most widely used tools in their respective fields, so
the proposed tool is of public interest. The tool, written in Python, includes built-in
batching and visualization capabilities, and its simplicity makes it easy to extend.
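
To illustrate the kind of work such a manager takes off the user's hands, the following is a minimal sketch of how a Python tool might render a Slurm batch script for one TensorFlow training run. It is not taken from S-TFManager: the `TrainingJob` class, the `build_sbatch_script` function, and the script name `train.py` are hypothetical, and only standard `sbatch` directives (`--job-name`, `--gres`, `--time`, `--output`) are assumed.

```python
from dataclasses import dataclass


@dataclass
class TrainingJob:
    """Hypothetical description of one training run (not S-TFManager's data model)."""
    name: str                     # used as the Slurm job name
    script: str                   # path to the TensorFlow training script
    gpus: int = 1                 # number of GPUs requested through --gres
    time_limit: str = "01:00:00"  # wall-clock limit, HH:MM:SS


def build_sbatch_script(job: TrainingJob) -> str:
    """Render a Slurm batch script that requests GPUs and runs the training script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --gres=gpu:{job.gpus}",
        f"#SBATCH --time={job.time_limit}",
        f"#SBATCH --output={job.name}-%j.out",  # %j expands to the Slurm job ID
        "",
        f"python {job.script}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated script; on a cluster it would be saved and passed to sbatch.
    print(build_sbatch_script(TrainingJob(name="mnist", script="train.py", gpus=2)))
```

On a cluster, the generated text would typically be written to a file and submitted with `sbatch <file>`. A web manager of the kind described in the abstract would hide this step behind its interface.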
- Title and subtitle
- A Lightweight Execution Manager for Training TensorFlow Models under the Slurm Queuing System
- Author
- Lupión, Marcos
- Cruz, C. Nicolás
- Romero, Felipe
- Sanjuan, F. Juan
- Ortigosa, M. Pilar
- Date of publication
- 2025
- Access level
- Open access
- ISSN, e-ISSN
- 1785-8860
- Language
- en
- Extent
- 16 p.
- Subject keywords
- machine learning, TensorFlow, Slurm, Resource Management
- Version
- Publisher's version
- Other identifiers
- DOI: 10.12700/APH.22.3.2025.3.4
- Title of the document containing the article/book chapter
- Acta Polytechnica Hungarica
- Year of the source journal
- 2025
- Volume of the source journal
- Vol. 22
- Issue of the source journal
- No. 3
- Genre
- Scientific article
- Field of science
- Engineering sciences - multidisciplinary engineering sciences
- University
- Óbudai Egyetem