The update signifies a broader shift within ROCm, as AMD is streamlining components and transitioning certain tools to new repositories to enhance integration and compatibility with existing systems. Key changes include the migration of AMD SMI to the AMDTools repository and the phasing out of tools like ROCTracer and ROCProfiler. The update also emphasizes the deprecation of several features, including AMDGPU wavefront size compiler macros and specific ROCm Object Tooling tools, with plans to integrate their functionalities into newer solutions.
In the realm of deep learning, ROCm continues to support various frameworks, including Taichi and Megablocks, enhancing its ecosystem for high-performance computing. The release notes detail the updates, known issues, and upcoming changes, ensuring developers are well-informed about the evolving landscape of ROCm.
Looking ahead, users can expect further enhancements in future releases, particularly regarding the HIP runtime API, which aims to achieve greater alignment with CUDA APIs and improve overall efficiency. For detailed information on compatibility, changes, and tutorials, developers are encouraged to consult the ROCm documentation.
In summary, ROCm 6.4.3 marks a significant step forward in AMD's commitment to advancing the capabilities of its GPU ecosystem for deep learning and HPC applications. The ongoing updates and strategic changes reflect AMD's focus on providing developers with robust tools and support, paving the way for innovative applications and improved performance in computational tasks.
ROCm 6.4.3 released
AMD has released ROCm 6.4.3, a significant release that addresses multiple issues and features updates for AMD Radeon PRO and Radeon GPU drivers, enhancements to ROCm SMI, and improvements to the ROCm documentation. It fixes a latency increase that degraded the performance of communication operations in certain RCCL applications. It also fixes an issue in the AMDGPU driver's scheduler constraints that could cause queue preemption to fail during workload execution. The ROCm documentation continues to be updated to offer clearer and more comprehensive guidance for a wide range of user needs and use cases.
The release includes five new tutorials for AI developers, covering inference (ChatQnA vLLM deployment and performance evaluation, text-to-video generation with ComfyUI, DeepSeek Janus Pro on CPU or GPU, and DeepSeek-R1 with vLLM V1) and GPU development and optimization. AMD ROCm offers a robust ecosystem for deep learning development, featuring support for Taichi and for Megablocks, a lightweight library designed for mixture-of-experts training, along with updated information on hardware and library support. Operating system and hardware support is unchanged in this release.