In the fast-paced world of artificial intelligence (AI), as companies and tech enthusiasts marvel at the work of DeepMind or Google's AlphaGo victory over Lee Sedol, an old question has returned with a new twist: when it comes to AI infrastructure, is self-hosting the pioneering way to go, or is it better to take the tried-and-tested route of cloud solutions? Sid Premkumar, founder of the AI startup Lytix, recently reignited this debate with a blog post showing that running an open-source AI model on Lytix's own infrastructure, rather than on Amazon Web Services (AWS), could cost less.
Premkumar's instructive post sheds light on the financial trade-offs of running the Llama-3 (8B) model, pitting an AWS g4dn.16xlarge instance head-to-head against a self-hosted setup. The AWS instance has impressive listed specs (i.e., Nvidia Tesla T4 GPUs, 384 GiB of memory and 112 vCPUs), but its on-demand price can look eye-wateringly high next to the upfront and operational costs of self-hosting. Premkumar found staggering potential savings, but they depend critically on the assumption of 100 per cent hardware utilisation – a scenario that is more theoretical than practical for most applications.
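The arithmetic behind that utilisation caveat can be sketched in a few lines. Every number below – the cloud hourly rate, hardware price, lifetime and power cost – is a hypothetical placeholder chosen for illustration, not a figure from Premkumar's post:

```python
# Illustrative break-even sketch: effective cost per *utilised* GPU-hour
# for a self-hosted server vs. an on-demand cloud instance.
# All figures are hypothetical placeholders, not numbers from the post.

CLOUD_RATE = 4.35              # $/hour, assumed on-demand cloud price
HW_COST = 16_000.0             # $ upfront for a comparable self-hosted server
HW_LIFETIME_H = 3 * 365 * 24   # amortise the hardware over ~3 years
POWER_RATE = 0.25              # $/hour assumed for electricity and hosting

def self_hosted_cost_per_used_hour(utilisation: float) -> float:
    """Effective $ per utilised GPU-hour at a given utilisation in (0, 1].

    The upfront cost is amortised over the machine's whole lifetime, but
    only the utilised hours do useful work, so low utilisation inflates
    the effective hourly rate.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    hourly_capex = HW_COST / HW_LIFETIME_H
    return hourly_capex / utilisation + POWER_RATE

for u in (1.0, 0.5, 0.1):
    print(f"utilisation {u:>4.0%}: self-hosted "
          f"${self_hosted_cost_per_used_hour(u):.2f}/h vs cloud ${CLOUD_RATE:.2f}/h")

# Utilisation at which self-hosting stops being cheaper than the cloud
breakeven = (HW_COST / HW_LIFETIME_H) / (CLOUD_RATE - POWER_RATE)
print(f"break-even utilisation: {breakeven:.0%}")
```

Under these assumed figures, self-hosting wins comfortably at full utilisation but becomes more expensive than the cloud once the machine sits idle most of the time – which is exactly why the 100 per cent utilisation assumption carries so much weight.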
The rhetoric around AI infrastructure today closely resembles the early debates over cloud computing. Back then, champions of on-premise solutions sang the praises of control, security and potential cost savings; advocates of self-hosted AI infrastructure make the same case now. But, as was true then, evolving technological needs and capabilities could once again shift the balance in favour of the cloud.
The cloud's pay-as-you-go model is simply more efficient for most workloads. This is especially true when you look at the total cost of ownership: not just the hardware, but the lifecycle management, maintenance and staffing overhead that self-hosting involves.
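A back-of-the-envelope tally makes the total-cost-of-ownership point concrete. Every line item here is an illustrative assumption, not a figure from the article:

```python
# Hypothetical total-cost-of-ownership (TCO) tally for a self-hosted AI
# server over three years. Every line item is an illustrative assumption.

YEARS = 3
tco_items = {
    "hardware (amortised)": 16_000,          # upfront server cost
    "power and cooling": YEARS * 2_500,      # assumed per-year estimate
    "datacentre space / hosting": YEARS * 1_800,
    "maintenance and spares": YEARS * 1_200,
    "ops staffing share": YEARS * 12_000,    # fraction of an engineer's time
}

total = sum(tco_items.values())
non_hardware = total - tco_items["hardware (amortised)"]
print(f"{YEARS}-year self-hosted TCO: ${total:,}")
print(f"  of which non-hardware: ${non_hardware:,} ({non_hardware / total:.0%})")
```

Under these assumed figures, the hardware itself is a minority of the three-year bill; the recurring, easy-to-overlook items dominate, which is the overhead a cloud provider's pricing already bakes in.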
The need for a diverse set of specialist skills to build and maintain that infrastructure means it’s almost impossible for even a large organisation to do it on its own. Cloud providers offer an off-the-shelf solution to that problem; it’s a lot cheaper and less tricky to pay someone else to do it for you.
Given the fast-evolving nature of AI, cloud services deliver infrastructure provisioning and management flexibility at a speed that’s difficult to match by static, on-prem solutions, which can quickly become obsolete.
Cloud providers' security and operational reliability are difficult to match, and the control planes these providers maintain come with an arsenal of security features that most organisations could not replicate on their own. For mission-critical AI workloads, this is vital.
The costs of entry into AI infrastructure are enormous. You can build your own AMD- or Nvidia-GPU-based systems, but the price of hardware capable of handling AI workloads leaves that playing field to only the most financially endowed. As a result, AI's transformative power is predominantly being applied through cloud providers, whose elasticity and cost-spreading grant broader access to the technology.
The arguments for the cloud are compelling, but there is a niche yet important case for edge computing – local compute and storage beating a centralised cloud service where latency is the enemy. Even in this space, though, cloud providers are making incursions with hybrid offerings that combine the scalability of the cloud with the immediacy of the edge.
The battle for AI infrastructure supremacy between self-hosting and the cloud ends only one way: with the cloud winning on the breadth of its cost, capability and flexibility advantages. The illusion of upfront savings remains a big draw for self-hosting, but those savings pale next to the total cost of ownership once the necessary skills, infrastructure and technology investments are counted. Add concerns about technological obsolescence, and the argument for the cloud is compelling. As the frontier of AI evolves, there is simply no safer bet for using this disruptive technology to its fullest than aligning with a scalable, secure and innovative ecosystem – in other words, the cloud.
The case for the cloud in AI infrastructure decisions is decisive across all three dimensions of value: financial, operational and strategic. Together, these factors make the cloud the indispensable partner in any organisation's AI ambitions.
© 2024 UC Technology Inc. All Rights Reserved.