Lexmark retools, reskills for data transformation

by Thor Olavsrud

Feature
Jul 21, 2021
Analytics | CIO 100 | Data Management

To eliminate data silos and achieve more business value from the terabytes generated every week, Lexmark established a single, end-to-end source of data truth. Here’s what it took to get there.

Global printer and imaging products manufacturer Lexmark International generates a lot of data. Its Lexmark Managed Print Services fleet alone comprises more than a million printers and multifunction devices that generate more than a terabyte of data per week. Lexmark has more than six million additional devices in the wild, and customers of those devices can opt in to share their data with Lexmark as well.

The volume and velocity of that data made managing and gaining insight from it a daunting task until the company created the Lexmark Product Digital Thread (PDT), a live, integrated repository and audit trail of the real-time data generated across the end-to-end lifecycle of its products, from design to recycling. Lexmark has won a CIO 100 Award in IT Excellence for the PDT.

“It’s really creating this correlated, business-consumable data across the lifecycle of our products, both our hardware and our supplies, as well as our key components,” says Andy Kopp, director of transformation products at Lexmark. “This really manifests itself in three distinct tiers within our Azure data lake, what we call our raw, refined, and consumption layers. These tiers of data support users across the spectrum of the types of business users and analytics that we need to do.”
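
Kopp's three tiers follow what data engineers often call a multi-zone lake design: land data as-is, cleanse and conform it, then aggregate it for consumption. The sketch below illustrates that flow in PySpark, assuming Spark-based processing over the Azure data lake; the storage paths, column names, and cleansing rules are hypothetical stand-ins, not Lexmark's actual pipeline.

```python
# Minimal sketch of a three-tier (raw -> refined -> consumption) lake flow.
# Paths, columns, and rules are illustrative assumptions, not Lexmark's schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pdt-tiers-sketch").getOrCreate()

# Raw layer: ingest device telemetry as-is, preserving the original payload.
raw = spark.read.json("abfss://raw@example.dfs.core.windows.net/device-telemetry/")

# Refined layer: type, deduplicate, and cleanse into business-consumable records.
refined = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .dropDuplicates(["device_id", "event_ts"])
       .filter(F.col("device_id").isNotNull())
)
refined.write.mode("append").parquet(
    "abfss://refined@example.dfs.core.windows.net/device-telemetry/")

# Consumption layer: aggregate for reporting, e.g. weekly page counts per device.
consumption = (
    refined.groupBy("device_id", F.weekofyear("event_ts").alias("week"))
           .agg(F.sum("pages_printed").alias("pages_printed"))
)
consumption.write.mode("overwrite").parquet(
    "abfss://consumption@example.dfs.core.windows.net/weekly-page-counts/")
```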

Lexmark’s data challenge

It wasn’t just data volume that was giving Lexmark trouble. Its data was also spread out across an array of business applications typical of global manufacturers: CADD, PLM, ERP, CRM, and more. Additionally, the company’s supply chain is highly distributed. Partnering arrangements for the design, manufacturing, warehousing, and distribution of its products mean its data is disseminated beyond its walls.

Andy Kopp, director of transformation products, Lexmark

“Like every large, global, high-tech manufacturer, Lexmark has a very data-intensive business, and the pandemic was certainly a multiplier of that,” Kopp says. “Our value chain is quite extended, from our suppliers’ suppliers to our customers’ customers, and how products are designed and manufactured and fulfilled through this extended value chain really makes us, largely, a data company to be able to manage and operate our business.”

As a result of these issues, Lexmark developed numerous batch and transactional integrations to move data from source systems into repositories, to exchange data with its partners and customers, to integrate with its business applications, and to synchronize master data across systems of entry, record, and consumption. In turn, many business users carved out their own data domains, with customized extracts and repositories that inevitably created silos of data within business functions and applications.

Kopp says the situation made it challenging to analyze data across the product lifecycle. When such analysis did happen, it was usually a one-off exercise by cross-functional teams that took days or even weeks, and it depended on advanced business users, data wranglers who devised their own methods for highly specialized analysis. That also meant the knowledge of how to identify, extract, transform, and analyze the data was siloed with those advanced users.

“There’s a lot of data heroics going on in Lexmark as people try to find answers to the business’s questions and try to harvest this data to gain insights from it,” Kopp says.

Reducing the need for data heroics

With the PDT, Kopp and his team were seeking to reduce the need to perform such data heroics and establish a single, end-to-end source of data truth that was broadly accessible and consumable by business users.

Tackling the problem required more than technology; it necessitated restructuring and reskilling as well. Lexmark merged its IT and software R&D groups under a chief information and technology officer (Tom Eade, who has since retired). Inside that group, Lexmark established a formal data science and analytics (DS&A) team as the core of an extended community of practice (CoP). The team committed to identifying and developing Lexmark's data heroes, and to replacing their Microsoft Excel and VBA tools with tools like R and Python.
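
To make the nature of that retooling concrete, consider a hedged illustration of the shift: an analysis that once lived in an Excel pivot table maintained by a single data hero can be expressed in a few lines of pandas, where it is reviewable, versionable, and shareable. The file name and columns below are hypothetical, not a real Lexmark dataset.

```python
# Hypothetical example: a supplies-usage pivot that might once have lived in an
# Excel workbook, reproduced with pandas so it can be versioned and shared.
import pandas as pd

# Assumed input columns: region, model, month, units
usage = pd.read_csv("supplies_usage.csv")

pivot = usage.pivot_table(
    index="model",
    columns="region",
    values="units",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```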

“There is absolutely no aspect of our data management technology or capabilities that went untouched as part of this mission,” Kopp says. “It was really an almost complete retooling of our technical skills.”

While the company strived to provide training to help its people make that transition, a lot of it required employees to learn on the job.

“There was an enormous gap between formal instruction and being able to apply it at enterprise scale for a mission like digital threads,” Kopp says. “There was a tremendous amount of on-the-job learning. Because of that, you have to really live that mantra of ‘fail fast, make mistakes, learn from it, move on.’”

Concurrently, Lexmark also overhauled its data governance capabilities and created the enterprise data governance and ethics (EDGE) community, led by a cross-functional executive team that makes up the EDGE Council. The council is responsible for final review and approval of data management policies and sets priorities for Lexmark's portfolio of data-centric initiatives, including continuous improvement of the PDT and the establishment of new digital threads, such as a consumer digital thread. The council also oversees the definition of data roles and responsibilities, including identifying and assigning data stewards and data custodians within business functions.

Capabilities of the PDT

The PDT was built on several reusable capabilities, chosen to address Lexmark’s three transformational needs: architecting and designing for privacy and security, architecting and designing for operations, and enabling business user data wranglers.

The capabilities included:

  • Security classifications to manage access and control at a field level to the business user’s point of consumption
  • Privacy designations to apply the appropriate treatment of data (e.g., encryption) at a field level or for a combination of fields (a rough sketch of this idea follows the list)
  • Low-code/no-code pipeline development to allow users to create graphical representations of actions to be taken on ingested data, and to provide the operations team with a single pane of glass to debug, restart, or rewind pipeline instances
  • A data dictionary to establish a definition of the source of ingested data
  • Data genealogy to capture the actions taken on data from source to target
  • A business glossary, to establish a definition, in business terms, of refined, transformed, consumable business data
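
For illustration, here is a rough sketch of how field-level privacy designations might drive per-field treatment before data reaches the consumption layer. The policy map, field names, and treatments (hashing and redaction standing in for encryption) are assumptions made for the example, not Lexmark's implementation.

```python
# Sketch: apply a privacy treatment to each field based on its designation.
# The policy map, fields, and treatments are illustrative assumptions.
import hashlib

def pseudonymize(value: str) -> str:
    """Replace a value with a stable, non-reversible token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

# Hypothetical designations: which treatment each field receives.
PRIVACY_POLICY = {
    "serial_number": "pseudonymize",  # identifies a customer device
    "customer_name": "redact",        # direct personal data
    "page_count": "pass",             # operational metric, no treatment
}

TREATMENTS = {
    "pseudonymize": pseudonymize,
    "redact": lambda _: "***",
    "pass": lambda v: v,
}

def apply_privacy(record: dict) -> dict:
    """Return a copy of the record with each field's designated treatment applied."""
    return {
        field: TREATMENTS[PRIVACY_POLICY.get(field, "pass")](value)
        for field, value in record.items()
    }

print(apply_privacy(
    {"serial_number": "LXK123456", "customer_name": "Acme Corp", "page_count": 4210}
))
```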

The biggest challenge was cultural change

While there were some difficult technical challenges to establishing the PDT, Kopp says cultural change was the biggest challenge.

“It’s a very different way for us to work,” he says. “This whole notion of really democratizing data is great, but it’s a very challenging thing to pull off from a security perspective, from a privacy perspective, from a governance perspective, and everything that goes into that.”

Kopp notes that many of Lexmark’s data heroes built their reputations inside the company on those skills. In many cases they spent years perfecting a particular spreadsheet, with formulas and macros and pivot tables to analyze a certain part of the business. Now they are being asked to give that up for the greater good.

“We don’t want one person to figure out the journey of transformation that data has to go on to solve a particular business need,” he says. “We want to make that more systematic. We want to put that into the solution for everybody to take advantage of, and that’s a big change. It’s something that doesn’t come overnight.”