Getting natural language processing (NLP) models into production is a lot like buying a car. In both cases, you set the parameters for your desired outcome, test several options, likely retest them, and the minute you drive off the lot, the value starts to plummet. Like owning a car, having NLP- or AI-enabled products has many benefits, but the maintenance never stops; at least it shouldn't, if the product is to keep functioning properly over time.

While productionizing AI is hard enough, ensuring the accuracy of models down the line, in a real-world environment, can present even bigger governance challenges. A model's accuracy starts degrading the moment it hits the market, because the predictable research environment it was trained in behaves differently from real life, just as the highway is a different scenario than the dealership lot.

This is called concept drift: when the underlying variables change, the learned concept may no longer be accurate. While drift is nothing new in the field of AI and machine learning (ML), it continues to challenge practitioners. It's also a contributing factor as to why, despite huge investments in AI and NLP in recent years, only around 13% of data science projects actually make it into production (VentureBeat).

So what does it take to move products safely from research to production? And, arguably just as important, what does it take to keep them performing accurately in production as conditions change? There are a few considerations enterprises should keep in mind to make sure their AI investments actually see the light of day.

Getting AI models into production

Model governance is a key component in productionizing NLP initiatives, and its absence is a common reason so many products remain projects. Model governance covers how a company tracks the activity, access, and behavior of models in a given production environment. Monitoring this is important for mitigating risk, troubleshooting, and maintaining compliance.
This concept is well understood among the global AI community, but it's also a thorn in their side.

Data from the 2021 NLP Industry Survey showed that high-accuracy tools that are easy to tune and customize were a top priority among respondents. Tech leaders echoed this, noting that accuracy, followed by production readiness and scalability, was vital when evaluating NLP solutions. Constant tuning is key to models performing accurately over time, but it's also the biggest challenge practitioners face.

NLP projects involve pipelines, in which the results of a previous task and a pre-trained model are used downstream. Often, models need to be tuned and customized for their specific domains and applications. For example, a healthcare model trained on academic papers or medical journals will not perform the same when used by a media company to identify fake news.

Better searchability and collaboration across the AI community will play a key role in standardizing model governance practices. This includes storing modeling assets in a searchable catalog: notebooks, datasets, resulting measurements, hyperparameters, and other metadata. Enabling reproducibility and the sharing of experiments across data science team members will also be advantageous to those trying to get their projects to production grade.

More tactically, rigorous testing and retesting is the best way to ensure models behave the same in production as they do in research, two very different environments. Versioning models that have advanced beyond an experiment to a release candidate, testing those candidates for accuracy, bias, and stability, and validating models before launching in new geographies or populations are practices all practitioners should be exercising.

With any software launch, security and compliance should be baked into the strategy from the start, and AI projects are no different. 
Role-based access control, an approval workflow for model releases, and storing and providing all the metadata needed for a full audit trail are some of the security measures necessary for a model to be considered production-ready.

These practices can significantly improve the chances of AI projects moving from ideation to production. More importantly, they help set the foundation for the practices that should be applied once a product is customer-ready.

Keeping AI models in production

Back to the car analogy: there's no definitive "check engine" light for AI in production, so data teams need to be constantly monitoring their models. Unlike traditional software projects, it's important to keep data scientists and engineers on the project even after the model is deployed.

From an operational standpoint, this requires more resources, both in human capital and in cost, which may be why so many organizations fail to do it. The pressure to keep up with the pace of business and move on to the 'next thing' also factors in, but perhaps the biggest oversight is that even IT leaders don't expect model degradation to be a problem.

In healthcare, for example, a model can analyze electronic medical records (EMRs) to predict a patient's likelihood of needing an emergency C-section based on risk factors such as obesity, smoking or drug use, and other determinants of health. If the patient is deemed high-risk, their practitioner may ask them to come in earlier or more frequently to reduce pregnancy complications.

The expectation is that these risk factors remain constant over time, and while many of them do, the patient is less predictable. Did they quit smoking? Were they diagnosed with gestational diabetes? 
There are also nuances in the way a clinician asks a question and records the answer in the hospital record that could result in different outcomes.

This becomes even trickier when you consider the NLP tools most practitioners are using. A majority (83%) of respondents from the aforementioned survey stated that they used at least one of the following NLP cloud services: AWS Comprehend, Azure Text Analytics, Google Cloud Natural Language AI, or IBM Watson NLU. While the popularity and accessibility of cloud services are obvious, tech leaders cited difficulty tuning models and cost as major challenges. Essentially, even experts are grappling with maintaining the accuracy of models in production.

Another problem is that it simply takes time to see when something's amiss, and how long that takes can vary significantly. Amazon may update an algorithm for fraud detection and mistakenly block customers in the process; within hours, maybe even minutes, customer service emails will point to an issue. In healthcare, it can take months to gather enough data on a certain condition to see that a model has degraded.

Essentially, to keep models accurate, you need to apply the same rigor of testing, automated retraining pipelines, and measurement that was applied before the model was deployed. When dealing with AI and ML models in production, it's more prudent to expect problems than to expect optimal performance several months out.

When you consider all the work it takes to get models into production and keep them there safely, it's understandable why 87% of data projects never make it to market. Despite this, 93% of tech leaders indicated that their NLP budgets grew by 10-30% compared to the prior year (Gradient Flow). It's encouraging to see growing investment in NLP technology, but it's all for naught if businesses don't take stock of the expertise, time, and continual updating required to deploy successful NLP projects.
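As a concrete footnote to the monitoring discussion above: since there is no "check engine" light, teams typically build one. A minimal sketch of such a check, under assumptions of my own (the data, function names, and alert threshold are illustrative, not from the article), is to compare the model's score distribution at training time against a recent production window using the Population Stability Index (PSI), a common drift heuristic where values above roughly 0.25 are often read as significant drift worth a retraining review.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two score distributions.

    Common rule of thumb (an assumption, tune per use case):
    PSI < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant drift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def bucket_fracs(values):
        counts = [0] * bins
        for v in values:
            # Clamp the top edge into the last bucket.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Smooth empty buckets so the log term below is always defined.
        return [(c + 1e-6) / (len(values) + bins * 1e-6) for c in counts]

    e, a = bucket_fracs(expected), bucket_fracs(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Illustrative data: training-time scores vs. a drifted production window.
train_scores = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
prod_scores  = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]

drift = psi(train_scores, prod_scores)
if drift > 0.25:
    print(f"ALERT: significant drift (PSI={drift:.2f}), consider retraining")
```

A job like this, run on a schedule against each production model, is one lightweight way to apply the same measurement rigor after deployment that the article argues for before it.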