I read an interesting market statistic recently from VentureBeat stating that "87% of data science projects never make it into production." This was later echoed by Gartner, which added that "80% of AI projects will remain alchemy, run by wizards." I appreciate that, as an industry, we are still very early in our maturity with respect to data science/AI/ML lifecycle management, that AI/ML engineering practices are nascent, and that the tooling ecosystem is still emerging and expanding. But I have to wonder how much longer employers will tolerate investing in technology, tools, and people with little to no return. I also wonder how much longer data scientists will tolerate building models that never get used. There is clearly an issue with the way we develop, publish, support, and maintain AI/ML solutions, and that issue is impeding ROI. To accelerate ROI in AI/ML initiatives, I suggest we look at Flow.

Flow is the experience that we have while following a process. For this article, let's look specifically at the documented and undocumented processes that govern and manage how concepts are translated into AI/ML models, and then how those models are ultimately transformed into application code, where the concepts create value for the enterprise.

Personas and the Data Analytics Value Cycle

To address flow, we first need to identify and understand the key personas (or stakeholders), as well as their roles and functions in the development process. In most organizations, there are at least four key personas:

Business Product Owner: Identifies the use cases and develops the initial hypothesis and acceptance criteria.

Data Engineer: Surfaces trusted data consistently and reliably so that data scientists and application developers can build the solution to test the hypothesis in the market.

Data Scientist: Designs, builds, and packages the inferences and models that enable hypothesis market
testing.

Application Developer: Embeds and deploys the analytics solution via an application so that end users and customers can interact with the model, the system can generate and collect data for future training, and the initial hypothesis can be validated.

As evidenced by the varied role descriptions, each of these personas is critical to completing one successful evolution of the analytics value cycle. As described by Rob Small in his blog "Accelerating the Analytics Value Cycle to Drive Tangible Business Outcomes," the analytics value cycle consists of three major technical components: data services, model development, and integrated applications.

Expanding on the scope of the analytics value cycle, I would add a fourth, non-technical component: backlog development and prioritization. Backlog development and prioritization is the process of identifying market opportunities, defining hypotheses, and then prioritizing the hypotheses and the work to be done.

Throughout the data analytics value cycle, the Data Scientist is instrumental, since their role helps to:

determine the scope and risk of the hypothesis during backlog development

define the data requirements for engineers building data services

lead the model development step

provide knowledge transfer, support, and enablement to the development teams consuming their model

To unlock ROI in the analytics value cycle, we need to evaluate and analyze the data scientist's experience as they traverse the process. We need to identify opportunities to optimize or automate the process, to remove or reduce redundancies and wait queues, and to invest in tools and technologies that will accelerate cycle time. In short, to deliver ROI in AI/ML initiatives, we must actively and intentionally discover and fill the seams across process and automation that add delays, effort, and waste to the process. In other words, we must fix Flow.

An Example of Flow

The
following example may be useful to further explain the concept of Flow and the need for it:

Day 1

Jane has been tasked with building an inference model to identify arrhythmias using EKG data from heart patients participating in a national study. She needs patient records from multiple affiliated hospitals, and to get this data, Jane has completed the requisite forms to have a snapshot of the data created and published to her lab. This will allow her to experiment without impacting production systems.

Day 2

Waiting on the request to be fulfilled.

Day 3

Jane's ticket is returned because it was missing the L4 approval in the system, and the ticket is closed by the service desk. Jane needs to reopen the ticket. It isn't clear where to add approvals, so Jane includes a comment asking how to link the approval to the request.

Day 4

Bill, the data engineer, calls Jane and walks her through the automated approval process and the requisite fields needed to send her manager and project sponsor the request. A few hours later, Jane has her approvals, and Bill starts processing the ticket.

Day 5

Bill puts the ticket on hold until the compliance and security approvals and procedures are added to the ticket.
Jane receives an email regarding the hold, along with a new ticket in the compliance tracking tool.

Day 6

Waiting on the request to be fulfilled.

Day 7

Still waiting on the request to be fulfilled.

Day 8

Jane gets the approval code from compliance, along with a link to the policy and procedural requirements. She adds that information to the ticket and resubmits it to Data Engineering.

Day 9

Waiting on the request to be fulfilled.

Day 10

Jane gets multiple emails as her ticket is passed around: to the Backup team to add the snapshot request to the schedule, to the Data Engineering team to write the needed queries and apply the required encryption to the data, and to the Infrastructure team to provision a target cluster for her data.

Day 11

Jane is excited, expecting to have her tickets completed. Instead, the Infrastructure team puts the ticket on hold because Jane hasn't submitted a ticket for her sandbox environment. There isn't a target landing zone, and the Infrastructure team doesn't have the specifications needed to build the cluster. Jane submits her ticket, copying the request from her last project. It is likely over-spec'd, but it is faster than starting from scratch.

Day 12

Jane gets her environments. She updates the Data Engineering ticket with the environment details.

Day 13

Her snapshots are loaded! Excited to get working, Jane opens the environment, only to find that none of her tools are there and the connectors to the data haven't been configured yet.
She logs another ticket to get license keys and install binaries for the tools. Within the hour, Jane is feverishly installing and configuring tools and database connections. At the end of the day, she is finally ready to start building a model.

Epilogue

To date, Jane has spent well over 100 hours completing tickets, getting approvals, waiting for data, installing software, and configuring the environment. She is no closer to defining the needed model; she is starting to feel pressure from her business sponsor and product owner; and she is frustrated with the level of service and the speed of getting the tools and data she needs to do her job. Jane is not feeling valued. She does not feel like she is contributing. In short, Jane is not experiencing good Flow. She is not able to focus her efforts on value-added activities (i.e., model development), and she is certainly not experiencing any feeling of accomplishment at work.

Know anyone who has had a similar experience?

Can you imagine what happens to Jane if she accidentally requests the wrong data or submits incorrect configurations for the exploratory lab clusters?

In the example above, it's clear that the process is broken. It's obvious why ROI is trapped in the system. And it's understandable why Jane is experiencing dissatisfaction at work.

Too often, enterprises look to solve these problems by buying new technology, subscribing to a new cloud service, or automating specific steps in a process, such as cluster provisioning. While many of these actions are, or can be, components of the solution, none can create Flow.
And as such, none will unlock ROI.

Creating Flow

To address Flow, we first need to understand the end-to-end process, and then overlay the experience of the personas as they move through that process. By studying that experience, we can best reveal the undocumented steps and the waste in the process. For example, while it may take only 15 minutes of hands-on work to provision a compute cluster, the data scientist experiences a three-day request fulfillment turnaround (one day to submit, one day to process, and one day to deliver). Even though the provisioning process has been optimized and automated, the end-user experience is still slow and painful.

Central to unlocking Flow in the data analytics value cycle is the use of value stream mapping techniques. Value stream mapping is a tool borrowed from Lean for assessing and evaluating the process efficiency of a repeatable cycle. Using a value stream map, organizations can gain transparency into their AI/ML value cycles and collect data that reveals how and where to improve Flow. Using Jane's experience, we can assemble a crude value stream map like the one below.

In this value stream map, we can understand process efficiency by calculating value-added time (the hands-on-keyboard or eyes-on-requirements time) and non-value-added time (waiting in queues, loop-backs, handoffs, etc.). Using these two data points, we can then calculate process efficiency:

Process Efficiency = Value-Added Time / (Value-Added Time + Non-Value-Added Time)

In the example below, value-added time is calculated at 4.0 days, whereas non-value-added time is calculated at 6.1 days. Using the formula above, we divide value-added time by total elapsed time (value-added plus non-value-added, or 10.1 days). The result is a process efficiency of roughly 40%. When we convert the process efficiency ratio into real business impact, we see that it takes Jane and company roughly two weeks of elapsed time to create less than one week of value.

[Figure: Sample value stream map of Jane's data and environment standup process. Source: Dell Technologies]

Value Stream Mapping and Flow

Value stream maps are great tools for improving process transparency because they span organizational barriers and departmental silos and create enterprise-level (macro) context for team members. Departmental processes, or task-specific sub-processes that are typically invisible black boxes, quickly become visible to all personas and roles along the pathway. This shared transparency (data) enables teams to make informed decisions about where and how to invest in people and process. For example, what if an exploratory lab came in a standard size, with built-in and predefined scaling capabilities -- would you still need the infrastructure workflow outlined above? How would that automation impact process efficiency?

The value stream map helps to highlight the inefficiencies and bottlenecks that impede Flow and trap ROI in the system; it helps us define our problem correctly. According to Steve Jobs, "if you define the problem correctly, you almost have the solution."

To learn how our organization employs Value Stream Mapping, click here. You'll also find a video, infographic, and a service brief for download.
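As a closing illustration, the process-efficiency arithmetic described above can be sketched in a few lines of code. The step names and individual durations below are hypothetical -- only the 4.0-day and 6.1-day totals come from the sample value stream map -- but the calculation itself follows the formula exactly: value-added time divided by total elapsed time.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One step in a value stream map."""
    name: str
    days: float
    value_added: bool  # hands-on work (True) vs. waiting/handoffs (False)

# Illustrative steps loosely based on Jane's story; durations are assumed.
steps = [
    Step("Submit data-snapshot and approval tickets", 0.5, True),
    Step("Wait for service desk and approvals", 3.0, False),
    Step("Complete compliance procedures and resubmit", 1.0, True),
    Step("Wait for compliance sign-off", 2.0, False),
    Step("Data engineering queries and encryption", 1.5, True),
    Step("Wait for infrastructure provisioning", 1.1, False),
    Step("Install and configure tools and connectors", 1.0, True),
]

value_added = sum(s.days for s in steps if s.value_added)
non_value_added = sum(s.days for s in steps if not s.value_added)

# Process Efficiency = Value-Added / (Value-Added + Non-Value-Added)
efficiency = value_added / (value_added + non_value_added)

print(f"Value-added time:     {value_added:.1f} days")   # 4.0 days
print(f"Non-value-added time: {non_value_added:.1f} days")  # 6.1 days
print(f"Process efficiency:   {efficiency:.0%}")         # 40%
```

Tagging every step with a value-added flag is the essence of the value-stream exercise: once the timeline is captured in this form, the wait states that dominate Jane's experience become visible and measurable rather than anecdotal.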