In “3 keys to keep your data lake from becoming a data swamp,” Thor Olavsrud provides CDOs, CIOs, and other business intelligence leaders with relevant guidance on preventing the so-called ‘data swamp’. He recommends that you:

Collect less data to start with
Adopt a machine learning strategy
Determine the business issue you’re trying to address

These are great suggestions, in particular having a business mission for your data lake.
Why does this matter?
Delivering a data lake rather than a data swamp is important because insights-driven companies make more money and develop more sustainable barriers to entry. Forrester Research estimates that “Insights-driven businesses will steal $1.2 trillion a year by 2020” (“Insights-Driven Business”, Forrester Research, July 27, 2016).
Insights-driven firms accomplish this by experimenting and continuously learning. They are often adding data lakes at the same time as they put CDOs in place. They understand in particular that advanced data capabilities are needed to implement a successful digital transformation (“Data Centric Businesses need a Data Centric Leader”, Forrester Research, April 26, 2016). Successful CDOs clearly need to lean on their CIOs to succeed at creating a valued data lake.
To further understand how to avoid a data swamp, I interviewed 12 leading-edge CIOs; specifically, I asked them for recommendations regarding the dos and don’ts of data lakes.
CIOs see real opportunity to create new business value from data lakes and self-service business intelligence. They believe these trends matter because they increase the availability and transparency of data and enable business users to get answers without involving their IT colleagues.
Most CIOs connect self-service business intelligence to the data lake, which means effective data lakes are not just about data storage, but also about citizen and professional data scientists’ self-exploration of the information contained within them.
Some CIOs even see self-service business intelligence options broadening the community served by IT:
“Self Service BI allows our broader community to be engaged in relevant and up-to-date analytics-based decision making”, CIO at Binghamton University
While some see the valuable impact of taking this step when data lakes are used for everyday decisions, Joanna Young, former CIO of Michigan State University, suggests that CIOs shouldn’t “use new tools to pave old reporting cow paths.” Several CIOs, however, suggest self-service business intelligence and data lakes offer better business and IT alignment all by themselves.
Taking these comments on board, I would like to suggest five things that will keep you out of the murk of a data swamp:
1. Make it purposeful
CIOs suggest that they have learned from the first wave of business intelligence that value is generated only when the right questions are asked. They are candid that even though the tools make data highly available, asking the right questions is still a challenging process for most organizations. Some worry that if data lakes don’t move from the experimentation phase to generating business value, CEOs and CFOs will start complaining and heads may roll. For this reason, it came as no surprise when one CIO said that a data lake with no business goals or purposes is just taking up space.
CIOs say that, even though it is not easy, IT shouldn’t always say yes to a data lake project. This sentiment also appears in The Big Data Payoff, Capgemini IDG,
2016, where interviews with 210 business executives showed that those who excel with big data use it to achieve strategic business objectives.
This echoes Tom Davenport, who said, “Even the most analytically oriented company needs to target its analytical efforts where they will do the most good, because resources, especially talent, are always constrained.”
CIOs need to help make sure their business customers start with an end in mind, and be clear that a “data first, questions later” approach won’t work. Fixing things can simply start with CIOs asking their business counterparts and internal IT proponents what problems they are trying to solve. Countering several industry slogans, CIOs say that while it is about the data, it’s also really about the intended purposes, and that translating data into answers can be even more challenging than many vendors suggest. CIOs know from firsthand experience that understanding what data is telling you is crucial, and they stress that this won’t happen by magic.
2. Start simple
One CIO said that the notion of a data lake can feel daunting if you have difficulty identifying data definitions in huge systems. CIOs feel that a big bang approach is a losing proposition. David Chou, CIO and Chief Digital Officer at Children’s Mercy Hospital, feels there is a need to stop trying to solve “world data hunger” with data lakes, because doing something about it involves people, processes, governance, and prioritization.
Chou wisely says that one organization’s pilot could be another’s phase one production rollout. CIOs suggest that projects should be scaled to your organization’s size. They say CIOs should find a problem and focus only on the source data that could relate to solving it. This should be about IT and the business learning together, piloting and starting small to get big.
Or, put differently: go slow to get fast.
CIOs say that new tools need to be used to answer new questions or to enable better answers to existing questions. Interestingly, some CIOs questioned whether their IT organization should deliver these new approaches or whether it is better to deliver all of this through a public cloud vendor.
3. Govern the data going in
CIOs feel that it is critical that there is transparency about how data is used and combined. This includes proper design and planning, and identifying system ‘sources of truth’ that allow citizen and professional data scientists to access extracted data. Without this, collected data is just a bunch of bits from other systems taking up storage space. Chou puts it this way: “It all comes down to the governance model.”
4. Fix data problems
CIOs stress the need for data hygiene. They suggest that there are all sorts of quality, governance, and accuracy issues and that, in reality, a data swamp is just a data lake filled with dirty data as a result of a poor data curation process. With appropriate data curation, which requires both IT maturity and data governance to be in place, data swamps shouldn’t happen.
CIOs claim that data swamps can also be avoided with proper analysis and that to get value out of the data lake and big data, you need to continue to do “data management 101.” Several CIOs suggest master data management and stewardship are required, with both IT and the business understanding that real-world data is ‘dirty’ at the start and that openness about this is the beginning of the ‘cleansing’ process. CIOs assert that everyone should be aware that it takes a lot of work to get proper results.
5. Manage data access and security
CIOs worry about the ‘putting all your eggs in one basket’ effect. They stress the importance of establishing data security and privacy from the start of a data lake project.
This is an important point because most CIOs see most big data and data lake projects as still largely experimental. This should come as no surprise considering that only 27 percent of business executives say their big data projects have achieved profitability (The Big Data Payoff, Capgemini IDG, 2016).
Regardless, data lakes have already become targets for hackers and improper internal access. This means data security governance needs to be put in place sooner rather than later. CIOs suggest a big challenge with data lakes is finding the right tools to provide the necessary protection of sensitive data while maintaining the access needed to become insights-driven.
Take these 5 steps to avoid the data swamp
More and more organizations want to become insights-driven to stay relevant, and a data lake can be an element of this, but it takes real discipline to succeed. I strongly agree with Olavsrud’s suggestions, but I believe more steps are needed to avoid a data swamp because, clearly, creating a data lake should not add business risk.