Computer Architectures for Training LLM Systems – Past, Present, and Challenges of Future Systems

Training Large Language Models (LLMs) requires very large-scale infrastructure, such as a dedicated cloud or even a group of clouds. In addition, training such models in a reasonable time requires dedicated hardware. In my talk, I will explain why specialized hardware for training is needed, present some of the currently proposed solutions, and discuss the challenges that future systems still need to address.