New Alluxio Enterprise AI Innovations Accelerate GPUs Anywhere with 97%+ GPU Utilization
July 09 2024 - 8:00AM
Alluxio, the developer of the open-source data platform, today
announced the immediate availability of the latest enhancements in
Alluxio Enterprise AI. Version 3.2 showcases the platform's
ability to use GPU resources wherever they are available, improved
I/O performance, and end-to-end performance competitive with HPC
storage. It also introduces a new Python interface and
sophisticated cache management features. These advancements empower
organizations to fully exploit their AI infrastructure, ensuring
peak performance, cost-effectiveness, flexibility and
manageability.
AI workloads face several challenges, including the mismatch
between data access speed and GPU computation, which leads to
underutilized GPUs due to slow data loading in frameworks like Ray,
PyTorch and TensorFlow. Alluxio Enterprise AI 3.2 addresses this by
enhancing I/O performance and achieving over 97% GPU utilization.
Additionally, while HPC storage provides good performance, it
demands significant infrastructure investments. Alluxio Enterprise
AI 3.2 offers comparable performance using existing data lakes,
eliminating the need for extra HPC storage. Lastly, managing
complex integrations between compute and storage is challenging,
but the new release simplifies this with a Pythonic filesystem
interface. With POSIX, S3, and Python access all supported, the
platform is easy for different teams to adopt.
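To make the data-loading mismatch concrete, here is a minimal sketch of a simplified model. It is not Alluxio's benchmark methodology: it assumes every training step needs a fixed data-loading time and a fixed GPU compute time, and the function name and example timings are hypothetical.

```python
def gpu_utilization(compute_s: float, load_s: float, overlapped: bool = True) -> float:
    """Fraction of each training step the GPU spends computing.

    Simplified, hypothetical model: every step needs `load_s` seconds of
    data loading and `compute_s` seconds of GPU work.
    """
    if compute_s <= 0 or load_s < 0:
        raise ValueError("compute_s must be positive and load_s non-negative")
    if overlapped:
        # Prefetching hides loading behind compute; the slower stage sets the pace.
        step_s = max(compute_s, load_s)
    else:
        # No prefetching: the GPU waits for each batch before computing.
        step_s = compute_s + load_s
    return compute_s / step_s


# Illustrative numbers only (not measurements from the release).
slow_io = gpu_utilization(compute_s=0.10, load_s=0.25)  # loading is the bottleneck
fast_io = gpu_utilization(compute_s=0.10, load_s=0.08)  # loading keeps up with compute
print(f"slow I/O: {slow_io:.0%}, fast I/O: {fast_io:.0%}")
```

In this model, utilization only approaches 100% once per-step loading time drops to or below per-step compute time, which is why faster I/O translates directly into less idle GPU time.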
"At Alluxio, our vision is to serve data to all data-driven
applications, including the most cutting-edge AI applications,"
said Haoyuan Li, Founder and CEO, Alluxio. "With our latest
Enterprise AI product, we take a significant leap forward in
empowering organizations to harness the full potential of their
data and AI investments. We are committed to providing cutting-edge
solutions that address the evolving challenges in the AI landscape,
ensuring our customers stay ahead of the curve and unlock the true
value of their data."
Alluxio Enterprise AI includes the following key features:
- Leverage GPUs Anywhere for Speed and Agility -
Alluxio Enterprise AI 3.2 empowers organizations to run AI
workloads wherever GPUs are available, ideal for hybrid and
multi-cloud environments. Its intelligent caching and data
management bring data closer to GPUs, ensuring efficient
utilization even with remote data. The unified namespace simplifies
access across storage systems, enabling seamless AI execution in
diverse and distributed environments, allowing for scalable AI
platforms without data locality constraints.
- Comparable Performance to HPC Storage - MLPerf
benchmarks show Alluxio Enterprise AI 3.2 matches HPC storage
performance, utilizing existing data lake resources. In tests like
BERT and 3D U-Net, Alluxio delivers comparable model training
performance on various A100 GPU configurations, proving its
scalability and efficiency in real production environments without
needing additional HPC storage infrastructure.
- Higher I/O Performance and 97%+ GPU
Utilization - Alluxio Enterprise AI 3.2 enhances I/O
performance, achieving up to 10GB/s throughput and 200K IOPS with a
single client, scaling to hundreds of clients. This performance
fully saturates 8 A100 GPUs on a single node, showing over 97% GPU
utilization in large language model training benchmarks. New
checkpoint read/write support optimizes the training of
recommendation engines and large language models, preventing GPU
idle time.
- New Filesystem API for Python Applications -
Version 3.2 introduces the Alluxio Python FileSystem API, an FSSpec
implementation, enabling seamless integration with Python
applications. This expands Alluxio's interoperability within the
Python ecosystem, allowing frameworks like Ray to easily access
local and remote storage systems.
- Advanced Cache Management for Efficiency and
Control - The 3.2 release offers advanced cache management
features, providing administrators precise control over data. A new
RESTful API facilitates seamless cache management, while an
intelligent cache filter optimizes disk usage by caching hot data
selectively. The cache free command offers granular control,
improving cache efficiency, reducing costs, and enhancing data
management flexibility.
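The idea of caching hot data selectively, described in the cache management item above, can be sketched as a toy admission filter. This is a hypothetical illustration only and not Alluxio's implementation or API: the class name, the `min_hits` threshold, and the `free` method are all assumptions made for the example.

```python
from collections import Counter, OrderedDict


class HotDataCache:
    """Toy cache that admits a key only after it has been read `min_hits` times.

    Hypothetical illustration of selective caching; not Alluxio's implementation.
    """

    def __init__(self, capacity: int, min_hits: int = 2):
        self.capacity = capacity
        self.min_hits = min_hits
        self.hits = Counter()        # access counts per key
        self.store = OrderedDict()   # cached data, least recently used first

    def read(self, key, load_from_storage):
        self.hits[key] += 1
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        data = load_from_storage(key)    # cache miss: go to the backing store
        if self.hits[key] >= self.min_hits:
            self.store[key] = data       # admit only data that has proven hot
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used
        return data

    def free(self, key) -> bool:
        """Explicitly drop one entry; returns True if something was removed."""
        if key in self.store:
            del self.store[key]
            return True
        return False


# Usage: "a" is read twice and gets cached; "b" is read once and does not.
cache = HotDataCache(capacity=2)
fetch = lambda k: f"bytes of {k}"
cache.read("a", fetch)
cache.read("a", fetch)
cache.read("b", fetch)
print(list(cache.store))
```

Requiring a second access before admission keeps one-off reads from displacing frequently used data, at the cost of one extra trip to the backing store for each item that turns out to be hot.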
"The latest release of Alluxio Enterprise AI is a game-changer
for our customers, delivering unparalleled performance,
flexibility, and ease of use," said Adit Madan, Director of Product
at Alluxio. "By achieving comparable performance to HPC storage and
enabling GPU utilization anywhere, we're not just solving today's
challenges – we're future-proofing AI workloads for the next
generation of innovations. With the introduction of our Python
FileSystem API, Alluxio empowers data scientists and AI engineers
to focus on building groundbreaking models without worrying about
data access bottlenecks or resource constraints."
“We have successfully deployed a secure and efficient data lake
architecture built on Alluxio. This strategic initiative has
significantly enhanced the performance of our compute engines and
simplified data engineering workflows, making data processing and
analysis seamless and more efficient,” said Hu Zhicheng, Data
Architect at Geely (parent company of Volvo). “We are honored to
collaborate with Alluxio in creating an industry-leading data and
AI platform, driving the future of data-driven intelligent
development.”
Availability
Alluxio Enterprise AI version 3.2 is immediately available for
download here: https://www.alluxio.io/download/.
Supporting Resources
- Download a trial version: https://www.alluxio.io/download/
- Product announcement blog:
https://www.alluxio.io/blog/whats-new-in-3-2/
- Webinar registration link:
https://us06web.zoom.us/webinar/register/WN_Hg7hQoBBTHObfbH8dTI3Hw#/registration
- Documentation:
https://docs.alluxio.io/ee-ai/user/stable/en/Overview.html
- GPU utilization rate testing tool:
https://www.alluxio.io/gpu-test-tool/
About Alluxio
Alluxio, a leading provider of the high-performance data
platform for analytics and AI, accelerates time-to-value of data
and AI initiatives and maximizes infrastructure ROI. Uniquely
positioned at the intersection of compute and storage systems,
Alluxio has a universal view of workloads on the data platform
across stages of a data pipeline. This enables Alluxio to provide
high-performance data access regardless of where the data resides,
simplify data engineering, optimize GPU utilization, and reduce
cloud and storage costs. With Alluxio, organizations can achieve
orders-of-magnitude faster model training and serving without the need for
specialized storage, and build AI infrastructure on existing data
lakes. Backed by leading investors, Alluxio powers technology,
internet, financial services, and telecom companies, including 9
out of the top 10 internet companies globally. To learn more, visit
www.alluxio.io.
Media Contact:
Beth Winkowski
Winkowski Public Relations, LLC for Alluxio
978-649-7189
beth@alluxio.com