6 Steps To Clean Up Your Data Swamp

In Big Data by Daniel Newman

If you’re one of the millions of companies worldwide gathering data like a chipmunk gathers nuts, you’re not alone. Big data holds big promise for nearly every industry. And yet, many companies are currently finding themselves with huge pools of data that provide very little actual value. While the term “data lake” conjures up images of beautiful, pristine, life-affirming waters, many lakes have come to resemble swamps: Muddy, murky, and filled with frightening surprises.

If you’re living the swamp life, please know: All is not lost. New technology—and the right mindset—can help you and your company get back on track. The following are a few tips for cleaning up your data swamp to ensure smooth sailing moving forward.

  1. Define your data goal. You know what they say: if you don’t know where you’re going, you’ll probably end up somewhere else. One of the most important steps in preventing a data swamp is to establish clear boundaries for the types of information you are trying to gather, and what you want it to do for you. Just because it’s possible to gather 1,000 fields of information about your customer, that doesn’t mean you should do it. It is far too easy to get stuck in a quagmire of data overload. Don’t be a data hoarder. Instead, take time to clearly define what you want to gain from the data you’re gathering. It will go a long way in helping you keep your data manageable and clean.
  2. Define ownership. Just as it’s easy to get overwhelmed with the vast amounts of data your company is collecting, it’s equally easy to fall behind on managing it. Because we’re still in the early phases of Big Data and analytics, many companies still haven’t figured out who is charged with manning their data pools. Many rely on their marketing departments to sort through mounds of reporting, while others hand it off to their IT teams (and some haven’t established any manager at all). Whatever avenue your company takes, be sure to be consistent. An unmanned lake will become a swamp quite quickly.
  3. Make it searchable. The vast amounts of data being generated on a daily basis have far outpaced a human’s ability to sort and manage them. You’ll need to be diligent about assigning metadata to your information to make it useful and readable to your human staff. Yes, it seems redundant to create data describing your data, but metadata is what actually brings your data to life.
  4. Automate when possible. Technology is here. Use it! Without advancements like cognitive analytics, language processing, artificial intelligence, and machine learning, it will be difficult to properly make your data work for you. These technologies will sort through mounds of data to create useful and accurate hypotheses where humans, quite simply, can’t. These technologies are designed to work together. Do not be afraid to use them. You’ll need a tech-friendly mindset if you want to keep your data lake flowing.
  5. Keep it clean. Clean data is essential to your team’s confidence in the data process. Once your team loses faith in your data, it will be hard to get them back on track. Whether it’s messy, inaccurate, or poorly designed, your dirty data is an Achilles’ heel you can’t afford to keep. Once you’ve cleaned your data swamp, you must be committed to keeping it that way. Establish clear guidelines on where and how data is collected to prevent “data wildness,” and make sure those standards are consistently honored. Take time to vet sources as “trustworthy,” and take preemptive steps to ensure they stay that way. A little work on the front end will prove hugely valuable when it comes to putting your data to use.
  6. Get help. Lastly, don’t be afraid to reach for a life ring if you’re drowning in a sea of data. There are plenty of specialty firms and technology available to help you streamline your process. They can create the right algorithms, workarounds, and back-fills to help your business flourish, no matter how swampy your lake may have gotten in the past.
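
For readers who like to see ideas in code, here is one rough way tips 2, 3, and 5 could fit together. It is a minimal sketch, not a recommendation of any particular product: the `DatasetRecord` and `Catalog` names, the fields, and the example sources are all hypothetical. The idea is simply that every dataset gets a named owner, a few searchable tags, and a source that has already been vetted before it is allowed into the lake.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Metadata describing one dataset (all field names are hypothetical)."""
    name: str
    owner: str    # tip 2: the team responsible for this data
    source: str   # where the data was collected
    tags: set = field(default_factory=set)  # tip 3: searchable labels


class Catalog:
    """A tiny in-memory catalog; a real one would live in a database."""

    def __init__(self, trusted_sources):
        self.trusted_sources = set(trusted_sources)
        self.records = {}

    def register(self, record):
        # Tip 5: enforce the collection guidelines at the door.
        if not record.owner:
            raise ValueError(f"{record.name}: every dataset needs an owner")
        if record.source not in self.trusted_sources:
            raise ValueError(f"{record.name}: source {record.source!r} is not vetted")
        self.records[record.name] = record

    def search(self, tag):
        # Tip 3: find datasets by the metadata attached to them.
        return sorted(n for n, r in self.records.items() if tag in r.tags)


catalog = Catalog(trusted_sources={"crm", "web_analytics"})
catalog.register(DatasetRecord("customer_profiles", "marketing", "crm", {"customer"}))
catalog.register(DatasetRecord("site_visits", "it", "web_analytics", {"customer", "traffic"}))
print(catalog.search("customer"))  # ['customer_profiles', 'site_visits']
```

Rejecting a dataset at registration time is a deliberate choice here: it is the "little work on the front end" that keeps unowned or unvetted data from ever reaching the lake.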

At the end of the day, know this: You don’t have to be an expert to benefit from Big Data. You just need to know when you’re out of your depth. Big Data’s promise lies as much in its ease of use and efficiency as it does in its predictive outcomes. The more you tend to it, the more it will help you.

This post was first published on Forbes. 

Daniel Newman is the Principal Analyst of Futurum Research and the CEO of Broadsuite Media Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise. From Big Data to IoT to Cloud Computing, Newman makes the connections between business, people and tech that are required for companies to benefit most from their technology projects, which leads to his ideas regularly being cited in CIO.Com, CIO Review and hundreds of other sites across the world. A 5x Best Selling Author including his most recent “Building Dragons: Digital Transformation in the Experience Economy,” Daniel is also a Forbes, Entrepreneur and Huffington Post Contributor. MBA and Graduate Adjunct Professor, Daniel Newman is a Chicago Native and his speaking takes him around the world each year as he shares his vision of the role technology will play in our future.
