AI workloads running in a public cloud or a cloud-enabled datacenter require comprehensive planning and an in-depth analysis of cost structures, so that data and compute end up in a cost-optimized production environment.
A post recently written on LinkedIn stated, “I really don’t know a single customer of mine that doesn’t have a cloud strategy. Yet the details heavily differ. From on-premise to off-premise, from hybrid to public”. This, and several other discussions, show that the move to the Cloud is in many ways real. But what does that mean for rolling out artificial intelligence (AI) initiatives in any given company? And what does a move to “the” Cloud really mean?
A fellow CTO, and customer of mine, said in a broadcast, “At a certain time in their journey, most companies are using the public cloud, but depending on their needs, they end up with a hybrid cloud or datacenter strategy.”
For example, consider a new initiative in AI and machine learning (although the following holds true for most compute- and data-intensive workloads). To achieve the desired outcomes of such an initiative, different stakeholders have different goals in mind: the business owner wants to invest the money well, with the lowest TCO and the quickest outcome, while the data scientist wants to use familiar environments, quickly spin up new machines, and have access to all relevant data. Typically, the first steps in developing new methods are taken either on a local development machine (laptop, gaming PC) or on an easy-to-order cloud instance from one of the big cloud providers.
Once the minimum viable product (MVP) is successful, the need for a stable production environment comes into play. For the right placement of the new workload, some key criteria to consider are:
- Data rates: How much data do I need to feed to the workload, and how much data is transferred back?
- Data source: Where is the data located – in my own datacenters or somewhere else?
- Connectivity: What does my traffic flow between the Cloud and my datacenter look like?
- Security: Which security constraints do I need to fulfill?
- Agility: How quickly does my workload need to scale or change?
- Availability: How long, and how continuously, do I need to run this?
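To make the checklist above concrete, here is a minimal sketch of how such criteria could be weighed against each other. Every weight, threshold, and field name is an illustrative assumption for this post, not part of any vendor tool; a real placement decision would involve far more nuance.

```python
# Hypothetical placement scoring against the criteria above.
# Positive points favor on-premise; negative points favor public cloud.
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    monthly_data_gb: float   # data rates: volume moved in and out per month
    data_on_premise: bool    # data source: does the data already live on-premise?
    steady_traffic: bool     # connectivity: constant cloud-to-datacenter traffic?
    strict_compliance: bool  # security: regulatory or contractual constraints?
    bursty: bool             # agility: does demand spike and fall quickly?
    long_running: bool       # availability: will it run continuously for years?

def placement(w: WorkloadProfile) -> str:
    """Return a rough recommendation based on simple, assumed weights."""
    points = 0
    points += 1 if w.monthly_data_gb > 10_000 else -1  # heavy data movement favors on-premise
    points += 1 if w.data_on_premise else -1           # keep compute close to the data
    points += 1 if w.steady_traffic else 0
    points += 1 if w.strict_compliance else 0
    points -= 1 if w.bursty else 0                     # bursty demand favors public cloud
    points += 1 if w.long_running else -1              # 24/7 workloads amortize owned hardware
    return "on-premise" if points > 0 else "public cloud"

profile = WorkloadProfile(
    monthly_data_gb=50_000, data_on_premise=True, steady_traffic=True,
    strict_compliance=True, bursty=False, long_running=True,
)
print(placement(profile))  # a data-heavy, steady, regulated workload lands on-premise
```

The point of the sketch is not the numbers but the shape of the decision: several of the criteria reinforce each other, which is why data-heavy, long-running AI workloads so often end up on-premise.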
Taking all this into account, the optimal placement of such a workload is often an on-premise environment, but it needs to meet users’ expectations of a modern cloud environment, which are:
- DevOps methodology and/or automation to scale up and down and deploy new workloads.
- State-of-the-art environments, both on hardware and software.
- A balanced cost-to-performance ratio that makes the benefits of running on-premise visible.
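The first expectation, automation to scale up and down, can be illustrated with a minimal threshold-based scaling rule. The thresholds and the function itself are assumptions made up for this sketch; in a real on-premise cloud, the decision would drive API calls into the platform's automation layer rather than a return value.

```python
# Hypothetical sketch of a scale-up/scale-down rule for an on-premise cluster.
def desired_workers(gpu_utilization: float, current: int,
                    min_workers: int = 1, max_workers: int = 8) -> int:
    """Decide the next worker count from current GPU utilization (0.0-1.0)."""
    if gpu_utilization > 0.80 and current < max_workers:
        return current + 1  # scale up under sustained load
    if gpu_utilization < 0.20 and current > min_workers:
        return current - 1  # scale down when idle, to control cost
    return current          # otherwise hold steady

print(desired_workers(0.90, current=2))  # 3: busy, so add a worker
print(desired_workers(0.10, current=2))  # 1: idle, so release a worker
```

Even this toy rule shows why DevOps-style automation matters on-premise: without it, hardware either sits idle or becomes a bottleneck, and the cost argument against the public cloud falls apart.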
A single solution won’t fit all customer needs, as requirements are never quite the same. Depending on existing environments, I/O throughput, and the balance of CPU vs. GPU demand, initial approaches to an on-premise solution could include:
- Dell Technologies GPUaaS
- Dell Technologies virtual HPC
- Dell Technologies Ready Architecture for Machine Learning
- VMware Cloud Foundation on VxRail
In a nutshell, the solutions that fit best into existing processes and the know-how of the team are the ideal candidates for delivering a cloud-like operational model on-premise.
To learn more, information about Dell Technologies offerings for running effectively in an on-premise cloud can be found here: