Top Five Things to Know About Data Storage for AI and Deep Learning



    In this blog series, I’ve focused on how NetApp can help you streamline your artificial intelligence projects. With technologies and services for managing data everywhere, NetApp is well positioned to solve your AI data challenges.

     

    With the announcement of NetApp® ONTAP® AI, we’re taking our commitment to your AI success a step further. Built on our partnership with NVIDIA and powered by NVIDIA DGX supercomputers and NetApp all-flash storage, ONTAP AI lets you simplify, accelerate, and scale your AI data pipeline to gain deeper understanding in less time.

     

    Combining Data Fabric-enabled NetApp storage with GPU-accelerated NVIDIA computing systems results in capabilities that aren’t available from other turnkey AI solutions, on-premises or in the cloud. Here are five of the key advantages of ONTAP AI.

    1. Industry-Leading Scale and Efficiency

    Digital transformation projects combined with the Internet of Things are creating an unprecedented amount of data. Data pipelines have to scale to accommodate petabytes of data—or even exabytes in some cases. If you choose a solution that can’t scale to meet your capacity requirements, your operations will stall and have to be rearchitected.

     

    The highly scalable NetApp AFF A800 included in ONTAP AI can scale out to 79PB of raw capacity—25x more than other turnkey AI solutions. ONTAP AI also provides industry-leading data reduction capabilities, combining deduplication, compression, and compaction—a trifecta not available from competing solutions. Although data reduction depends on the types of data being stored, if you achieve an average data reduction ratio of 6:1, effective capacity jumps to more than 450PB. Either way, ONTAP AI scales to support capacities that are orders of magnitude higher than competing solutions. Add to that NetApp’s superior cloud connectivity, and you can extend storage capacity even further.
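    The capacity math above is easy to sanity-check. The sketch below is purely illustrative arithmetic (not a NetApp tool); the 79PB raw figure and the assumed 6:1 average reduction ratio both come from the paragraph above.

```python
# Back-of-the-envelope effective-capacity math for the figures cited above.
# Actual data reduction always depends on the data being stored.

def effective_capacity_pb(raw_pb: float, reduction_ratio: float) -> float:
    """Effective capacity = raw capacity x average data reduction ratio."""
    return raw_pb * reduction_ratio

raw_pb = 79   # AFF A800 maximum raw capacity, in petabytes (per the post)
ratio = 6     # assumed 6:1 average from dedupe + compression + compaction

print(effective_capacity_pb(raw_pb, ratio))  # 474.0 PB, i.e. "more than 450PB"
```

    At a more modest 3:1 ratio the same cluster would still yield 237PB effective, which is the point of the trifecta: the reduction ratio multiplies whatever raw capacity you deploy.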

    2. Superior Performance

    NetApp ONTAP AI is designed to deliver balanced compute and I/O performance that grows with your needs. Compute performance scales through the addition of NVIDIA DGX-1 servers. The NVIDIA DGX-1 is a fully integrated, turnkey hardware and software system, purpose-built for deep learning workflows.

     

    Processing petabytes of data through an AI pipeline requires tremendous I/O bandwidth. The ability to scale I/O bandwidth is an area where ONTAP AI shines. Each AFF A800 HA controller pair delivers bandwidth up to 25GBps. You scale I/O bandwidth by adding controller pairs. With up to 12 HA pairs in a single cluster, that’s a maximum bandwidth of 300GBps—up to 6x the I/O performance of the closest competitor. These numbers are conservative: we measured maximum bandwidth while maintaining sub-millisecond latency, and more relaxed latency thresholds will support even higher bandwidth.
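    Because bandwidth scales linearly with controller pairs, the cluster maximum follows directly from the per-pair figure quoted above. A minimal sketch of that scaling (illustrative only; the 25GBps-per-pair and 12-pair figures are from the paragraph above):

```python
# I/O bandwidth scales linearly with the number of HA controller pairs.

GBPS_PER_HA_PAIR = 25   # AFF A800 bandwidth per HA controller pair
MAX_HA_PAIRS = 12       # maximum HA pairs in a single cluster

def cluster_bandwidth_gbps(ha_pairs: int) -> int:
    """Aggregate cluster bandwidth for a given number of HA pairs."""
    if not 1 <= ha_pairs <= MAX_HA_PAIRS:
        raise ValueError(f"cluster supports 1-{MAX_HA_PAIRS} HA pairs")
    return ha_pairs * GBPS_PER_HA_PAIR

print(cluster_bandwidth_gbps(12))  # 300 GBps at full scale
```

    The practical upshot is that you can start with a single HA pair and grow I/O bandwidth in 25GBps increments as the training workload demands, without rearchitecting.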

     

    We’ve done a lot of work to characterize the performance of ONTAP AI, using metrics that are relevant to both data scientists and IT architects. Expect to see more on performance in an upcoming blog post.

    3. Proven Architecture

    Deploying a new AI project brings plenty of challenges. Battling infrastructure complexities should not be one of them. You need a proven architecture that accommodates your chosen AI software framework and scales from pilot to production with no surprises. Both the NVIDIA and NetApp technologies in ONTAP AI have a proven ability to deliver.

     

    With a substantial head start in GPU computing for HPC and AI, NVIDIA has emerged as the clear leader for AI computing infrastructure. NVIDIA GPU-powered instances are now offered by all leading hyperscale cloud platforms, and NVIDIA supports a broad ecosystem of AI software. The NVIDIA DGX platform packages together multiple GPUs in a high-I/O configuration for demanding AI clusters.

     

    The default DGX file system is NFS. As I explained in an earlier blog, NFS has become the de facto file system for AI. I think it’s safe to say that NetApp is the undisputed NFS leader; we virtually invented the NFS appliance and NAS, and NetApp engineers have played a key role in the continuing evolution of NFS over the past 20+ years. No other NFS system matches NetApp flash storage when it comes to delivering both sequential and random I/O.

     

    A final differentiator for ONTAP AI is our proven experience with converged infrastructure. FlexPod® from NetApp and Cisco was a pioneer in the converged infrastructure space and remains a leader today. As a result, NetApp has a track record for developing reference architectures that exceed customer expectations. We’ve also built a team of global channel partners dedicated to customer success with converged infrastructure; many are also long-time NVIDIA partners.

    4. Edge to Core to Cloud

    A consistent theme in this blog series has been the need for an AI pipeline capable of spanning from edge to core to cloud. As described earlier, NVIDIA GPUs are available in the cloud as well as the desktop and data center. NVIDIA is also investing in technology to deliver and accelerate AI at the edge for a range of use cases, including autonomous cars.

     

    The ONTAP operating environment provides superior data management from edge to core to cloud. You can deploy the NetApp software-defined storage offering, ONTAP Select, at the edge on almost any x86 platform, including a variety of rugged and semirugged options that are suitable for harsh environments. These devices can collect and process data in offline environments; the resulting data can then be moved seamlessly to the on-premises data center or to the cloud.

     

    The NetApp Cloud Volumes Service makes NetApp’s NFS capabilities available in AWS, Azure, and Google Cloud Platform, with other international cloud players to follow. Cloud Volumes ONTAP gives you NetApp data management with both file and block storage in the cloud. A significant part of your data footprint may be cold at any given time. With NetApp FabricPool, cold data can be tiered to the public cloud or to on-premises object storage and automatically recalled as needed, extending your available storage capacity.

     

    NetApp’s strength at the edge, in the core, and in the cloud means that you can move, manage, and process data in any location using the same interfaces and tools. This gives NetApp an almost unfair advantage over competing vendors, none of whom offer anything comparable at the edge or in the cloud.

    5. Advanced Support for the Data Sources That Matter

    A final differentiator for NetApp is support for—and long experience with—the diverse data sources that remain critical to many enterprises. This includes workloads such as SAP, Oracle, DB2, Splunk, MongoDB, Cassandra, MarkLogic, Hadoop, and Spark—to name just a few—all of which can be hosted on a single, unified platform. With these data sources residing on ONTAP, smart data movement can be used to efficiently transfer data into an AI training platform such as ONTAP AI. The result is more streamlined and effective AI training across the broadest set of enterprise data sources with fewer hassles and compatibility issues.

     

    Check out these resources to learn more about NetApp AI.

    Previous blogs in this series:

    1. Is Your IT Infrastructure Ready to Support AI Workflows in Production?
    2. Accelerate I/O for Your Deep Learning Pipeline
    3. Addressing AI Data Lifecycle Challenges with Data Fabric
    4. Choosing an Optimal Filesystem and Data Architecture for Your AI/ML/DL Pipeline
    5. NVIDIA GTC 2018: New GPUs, Deep Learning, and Data Storage for AI

    The post Top Five Things to Know About Data Storage for AI and Deep Learning appeared first on NetApp Blog.



    https://blog.netapp.com/top-five-things-to-know-about-data-storage-for-ai-and-deep-learning/


© Lightnetics 2024