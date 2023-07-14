Whether it’s text, images, video or, more likely, a combination of multiple models and services, taking advantage of generative AI is a ‘when, not if’ question for organizations.

Since the release of ChatGPT in November 2022, interest in generative AI has skyrocketed. It’s already showing up in the top 20 shadow IT SaaS apps tracked by Productiv for business users and developers alike. But many organizations are limiting use of public tools while they set policies to source and use generative AI models. CIOs want to take advantage of this, but on their own terms and with their own data.

As so often happens with new technologies, the question is whether to build or buy. For generative AI, that’s complicated by the many options for refining and customizing the services you can buy, and the work required to make a bought or built system into a useful, reliable, and responsible part of your organization’s workflow. Organizations don’t want to fall behind the competition, but they also want to avoid embarrassments like going to court, only to discover the legal precedent cited was made up by a large language model (LLM) prone to generating plausible rather than factual answers.

From taking to making

Rather than a rigid distinction between building and buying such complex technology, Eric Lamarre, the senior partner leading McKinsey Digital in North America, suggests thinking in terms of taking, shaping and, in very few cases, making generative AI models.

“As a ‘taker,’ you consume generative AI through either an API, like ChatGPT, or through another application, like GitHub Copilot, for software acceleration when you do coding,” he says. Finished apps that include generative AI may not offer much competitive differentiation, and answers they produce won’t always be perfect.
But you want to adopt them to avoid competitive disadvantage, especially as they often arrive as new features in applications that staff already know how to use. “Every company will be doing that,” he adds. “In the shaper model, you’re leveraging existing foundational models, off the shelf, but retraining them with your own data.”

That reduces the ‘hallucination’ problem and gets you more accurate and relevant results.

“Contact center applications are very specific to the kind of products that the company makes, the kind of services it offers, and the kind of problems that have been surfacing,” he says. A general LLM won’t be calibrated for that, but you can recalibrate it to your own data, a process known as fine-tuning. Fine-tuning applies to both hosted cloud LLMs and open source LLMs you run yourself, so this level of ‘shaping’ doesn’t commit you to one approach.

McKinsey tried to speed up writing evaluations by feeding transcripts of evaluation interviews to an LLM. But without fine-tuning or grounding it in the organization’s data, it was a complete failure, according to Lamarre. “The LLM didn’t have any context about the different roles, what kind of work we do, or how we evaluate people,” he says.

Generative AI models with a plugin model, like ChatGPT and GPT-4, let you augment the LLM by connecting it to APIs that retrieve real-time information or business data from other systems, add other types of computation, or even take actions like opening a ticket or making a booking.
That includes curated data, like a legal database, in the same way you might add a commercial weather prediction service to a more traditional machine learning (ML) model for generating routes or predicting shipping times rather than building your own weather model from scratch.

Shaping will involve more than simply building an LLM into your own applications and processes, and organizations will need more sophisticated capabilities, Lamarre warns. “To get good output, you need to create a data environment that can be consumed by the model,” he says. “You need to have data engineering skills, and be able to recalibrate these models, so you probably need machine learning capabilities on your staff, and you need to be good at prompt engineering. So how do I coach my people to ask the right questions to get the best output?”

He cautions CIOs against ‘shiny object syndrome’ with generative AI, especially if they haven’t already built up expertise in ML. “The reality that’s going to hit home in the next six to 12 months is generative AI is just as difficult as ‘traditional’ AI,” he says.

But with those skills, shaping generative AI systems created from existing models and services will deliver the applications most likely to offer competitive differentiation.
However, making will be even more challenging and, most likely, rare, Lamarre predicts.

Buy in or lose out

For a smaller organization like The Contingent, a non-profit supporting vulnerable children, families, and young professionals, building its own generative AI seems daunting to consider, even with 10 of its 60 staff working in technology and data research, according to CIO Peter Kim.

There’s a crisis in child welfare with support needs outpacing capacity, and he’s interested in how generative AI can help profile audiences, evaluate messaging around the continuum of opportunities for volunteering, match applicants with internships, and even reduce the time it takes to recruit new staff.

That will start with using the Copilot features Microsoft is introducing in many products, including in Cloud for Nonprofit. “It would seem almost foolish to pass this up, because it’s just going to become part of the norm,” he says. “If you’re not using it, you’re going to get left behind.”

But Kim also plans to customize some of the generative AI services available. He expects generative AI to be particularly helpful for coding the many connectors the non-profit has to build for the disparate, often antiquated, systems government and private agencies use, and for writing data queries. In addition, he hopes to understand nuances of geographical and demographic data, extract insights from historical data, and compare historical data with live data to identify patterns and opportunities to move quickly.

Rather than devoting resources to replicating generative AI capabilities that are already available, the non-profit will put that time and effort into automating existing manual processes and exploring new possibilities. “We’re not imagining utilizing AI to do the same things just because that’s the way we’ve always done it,” he says.
“With this new superpower, how should we develop or refine refactoring these business processes?”

Buying rather than building will make it easier to take advantage of new capabilities as they arrive, he suggests. “I think one of the success of organizations in being able to utilize the tools that are becoming more readily available will lie in the ability to adapt and review.”

In a larger organization, using commercially available LLMs that come with development tools and integrations will allow multiple departments to experiment with different approaches, discover where generative AI can be useful, and get experience with how to use it effectively. Even organizations with significant technology expertise, like Airbnb and Deutsche Telekom, are choosing to fine-tune LLMs like ChatGPT rather than build their own.

“You take the large language model, and then you can bring it within your four walls and build that domain piece you need for your particular company and industry,” National Grid group CIDO Adriana Karaboutis says. “You really have to take what’s already there. You’re going to be five years out here doing a moonshot while your competitors layer on top of everything that’s already available.”

Panasonic’s B2B Connect unit used the Azure OpenAI Service to build its ConnectAI assistant for internal use by its legal and accounting teams, as well as HR and IT, and the reasoning was similar, says Hiroki Mukaino, senior manager for IT & digital strategy. “We thought it would be technically difficult and costly for ordinary companies like us that haven’t made a huge investment in generative AI to build such services on our own,” he says.

Increasing employee productivity is a high priority, and rather than spend time creating an LLM, Mukaino wanted to start building one into tools designed for the company’s business workflow.
“By using Azure OpenAI Service, we were able to create an AI assistant much faster than build an AI in-house, so we were able to spend our time on improving usability.”

He also views the ability to further shape the generative AI options with plugins as a good way to customize them to Panasonic’s needs, calling plugins important functions to compensate for the shortcomings of the current ChatGPT.

Grounding cloud LLMs in your own data by using vector embeddings is already in private preview in Azure Cognitive Search for the Azure OpenAI Service.

“While you can power your own copilot using any internal data, which immediately improves the accuracy and decreases the hallucination, when you add vector support, it’s more efficient retrieving accurate information quickly,” Microsoft AI platform corporate VP John Montgomery says. Adding vector support creates a vector index for the data source (whether that’s documents in an on-premises file share or a SQL cloud database) and an API endpoint to consume in your application.

Panasonic is using this with both structured and unstructured data to power the ConnectAI assistant. Similarly, professional services provider EY is chaining multiple data sources together to build chat agents, which Montgomery calls a constellation of models, some of which might be open source models. “Information about how many pairs of eyeglasses the company health plan covers would be in an unstructured document, and checking the pairs claimed for and how much money is left in that benefit would be a structured query,” he says.

Use and protect data

Companies taking the shaper approach, Lamarre says, want the data environment to be completely contained within their four walls, and the model to be brought to their data, not the reverse.
While whatever you type into the consumer versions of generative AI tools is used to train the models that drive them (the usual trade-off for free services), Google, Microsoft and OpenAI all say commercial customer data isn’t used to train the models.

For example, you can run Azure OpenAI over your own data without fine-tuning, and even if you choose to fine-tune on your organization’s data, that customization, like the data, stays inside your Microsoft tenant and isn’t applied back to the core foundation model. “The data usage policy and content filtering capabilities were major factors in our decision to proceed,” Mukaino says.

Although the copyright and intellectual property aspects of generative AI remain largely untested by the courts, users of commercial models own the inputs and outputs of their models. Customers with particularly sensitive information, like government users, may even be able to turn off logging to avoid the slightest risk of data leakage through a log that captures something about a query.

Whether they buy or build the LLM, organizations will need to think more about document privacy, authorization and governance, as well as data protection. Legal and compliance teams already need to be involved in uses of ML, but generative AI is pushing the legal and compliance areas of a company even further, says Lamarre.

Unlike supervised learning on batches of data, an LLM will be used daily on new documents and data, so you need to be sure data is available only to users who are supposed to have access. If different regulations and compliance models apply to different areas of your business, you won’t want them to get the same results.

Source and verify

Adding internal data to a generative AI tool Lamarre describes as ‘a copilot for consultants,’ which can be calibrated to use public or McKinsey data, produced good answers, but the company was still concerned they might be fabricated.
“We can’t be in the business of being wrong,” he says. To avoid that, the tool cites the internal reference an answer is based on, and the consultant using it is responsible for checking its accuracy.

But employees already have that responsibility when doing research online, Karaboutis points out. “You need intellectual curiosity and a healthy level of skepticism as these language models continue to learn and build up,” she says. As a learning exercise for the senior leadership group, her team created a deepfake video of her with a generated voice reading AI-generated text.

Apparently credible internal data can be wrong or just out of date, too, she cautions. “How often do you have policy documents that haven’t been removed from the intranet or the version control isn’t there, and then an LLM finds them and starts saying ‘our maternity policy is this in the UK, and it’s this in the US.’ We need to look at the attribution but also make sure we clean up our data,” she says.

Responsibly adopting generative AI mirrors lessons learned with low code, like knowing what data and applications are connecting into these services: it’s about enhancing workflow, accelerating things people already do, and unlocking new capabilities, with the scale of automation, but still having human experts in the loop.

Shapers can differentiate

“We believe generative AI is beneficial because it has a much wider range of use and flexibility in response than conventional tools and service, so it’s more about how you utilize the tool to create competitive advantage rather than just the fact of using it,” Mukaino says.

Reinventing customer support, retail, manufacturing, logistics, or industry-specific workloads like wealth management with generative AI will take a lot of work, as will setting usage policies and monitoring the impact of the technology on workflows and outcomes.
Budgeting for those resources and timescales is essential, too. It comes down to whether you can build and rebuild faster than competitors that are buying in models and tools that let them create applications straight away, and let more people in their organization experiment with what generative AI can do.

General LLMs from OpenAI, and the more specialized LLMs built on top of their work like GitHub Copilot, improve as large numbers of people use them: code generated by GitHub Copilot has become significantly more accurate since it was introduced last year. You could spend half a million dollars and get a model that only matches the previous generation of commercial models, and while benchmarking isn’t always a reliable guide, commercial models continue to show better results on benchmarks than open source models.

Be prepared to revisit decisions about building or buying as the technology evolves, Lamarre warns. “The question comes down to, ‘How much can I competitively differentiate if I build versus if I buy,’ and I think that boundary is going to change over time,” he says.

If you’ve invested a lot of time and resources in building your own generative models, it’s important to benchmark not just how they contribute to your organization but how they compare to the commercially available models your competition could adopt today, paying 10 to 15 cents for around a page of generated text, not what they had access to when you started your project.

Major investments

“The build conversation is going to be reserved for people who probably already have a lot of expertise in building and designing large language models,” Montgomery says, noting that Meta builds its LLMs on Azure, while Anthropic, Cohere, and Midjourney use Google Cloud infrastructure to train their models.

Some organizations do have the resources and competencies for this, and those that need a more specialized LLM for a domain may make the
significant investments required to exceed the already reasonable performance of general models like GPT-4.

Training your own version of an open source LLM requires extremely large data sets: while you can acquire these from somewhere like Hugging Face, you’re still relying on someone else having curated them. You’ll also need data pipelines to clean, deduplicate, preprocess, and tokenize the data, significant infrastructure for training, supervised fine-tuning, evaluation, and deployment, and the deep expertise to make the right choices at every step.

There are multiple collections with hundreds of pre-trained LLMs and other foundation models you can start with. Some are general, others more targeted. Generative AI startup Docugami, for instance, began training its own LLM five years ago, specifically to generate the XML semantic model for business documents, marking up elements like tables, lists and paragraphs rather than the phrases and sentences most LLMs work with. Based on that experience, Docugami CEO Jean Paoli suggests that specialized LLMs are going to outperform bigger or more expensive LLMs created for another purpose.

“In the last two months, people have started to understand that LLMs, open source or not, could have different characteristics, that you can even have smaller ones that work better for specific scenarios,” he says. But he adds that most organizations won’t create their own LLM, and maybe not even their own version of an LLM.

Only a few companies will own large language models calibrated on the scale of the knowledge and purpose of the internet, adds Lamarre. “I think the ones that you calibrate within your four walls will be much smaller in size,” he says.

If they do decide to go down that route, CIOs will need to think about what kind of LLM best suits their scenarios, and with so many to choose from, a tool like Aviary can help.
Consider the provenance of the model and the data it was trained on. These are similar to the questions that organizations have learned to ask about open source projects and components, Montgomery points out. “All the learnings that came from the open source revolution are happening in AI, and they’re happening much quicker.”

IDC’s AI Infrastructure View benchmark shows that getting the AI stack right is one of the most important decisions organizations should take, with inadequate systems the most common reason AI projects fail. It took more than 4,000 NVIDIA A100 GPUs to train Microsoft’s Megatron-Turing NLG 530B model. While there are tools to make training more efficient, they still require significant expertise, and the costs of even fine-tuning are high enough that you need strong AI engineering skills to keep costs down.

Docugami’s Paoli expects most organizations will buy a generative AI model rather than build one, whether that means adopting an open source model or paying for a commercial service. “The building is going to be more about putting together things that already exist.” That includes using emerging stacks to significantly simplify assembling a solution from a mix of open source and commercial options.

So whether you buy or build the underlying AI, the tools adopted or created with generative AI should be treated as products, with all the usual user training and acceptance testing to make sure they can be used effectively. And be realistic about what they can deliver, Paoli warns.

“CIOs need to understand they’re not going to buy one LLM that’s going to change everything or do a digital transformation for them,” he says.