When a trend takes hold, in the business world or elsewhere, it inevitably brings a vernacular of its own: language best understood by those closest to it. Big data is no exception. As data analytics has spread across the business world, so too has the language spoken by its practitioners. So for those somewhat new to the game, or simply looking to bone up on the latest terms and slang, let us identify and attempt to explain the latest and greatest in big data buzzwords:

Big data
First, let’s analyze the popular (or notorious) term itself, “big data.” What does it really mean? There isn’t a clear definition, but the term usually refers to the large amount of data that organizations have access to and that has the potential to be put to “good” use. It is often used interchangeably with data analytics or predictive modeling approaches. If you really want to be specific, try this definition built around three “V’s” that I like (usually credited to analyst Doug Laney of META Group, a firm Gartner later acquired):

Volume: Large datasets (think terabytes or more)
Velocity: Data that needs to be ingested quickly and, in some cases, acted upon quickly as well (think of data sent by a telematics device on a car)
Variety: Data that differs in format and source: audio files, database records, social media comments, etc.

Note that there are multiple other versions, with four, five, even seven “V’s.” It looks like the number of “V’s” needed to describe big data is rapidly overtaking the number of blades that guys need to get that perfect shave.

Business Intelligence (BI)
Broadly speaking, BI refers to the approaches, tools, and mechanisms that organizations can use to keep a finger on the pulse of their businesses.
It also goes by unsexier names: “dashboarding,” “MIS,” or “reporting.” A recent development in this space is that today’s BI tools also let a non-technical person segment reports by different groups, perform basic root-cause analysis, and build customized charts on the fly.

Data visualization
Refers to the approaches and tools used to visually understand the insights in data, as well as its interconnections. There is usually a gold mine of insights hidden in vast volumes of data; the art and science of data visualization lies in converting this data into a visual from which the insight leaps out. While there are plenty of tools available, there is a certain art to selecting the most appropriate visual to convey an insight or to prove or disprove a hypothesis.

Predictive analytics
Refers to using statistical modeling techniques to predict outcomes. Think of developing an algorithm that predicts whether a borrower will pay back a loan, or how likely a newsletter subscriber is to open an email. Techniques include regression models, decision trees, neural networks, and other methods. Done right, predictive analytics can lead to very smart strategic choices for a business. But be cautious here: you are implying that you can predict the future.

Descriptive analytics
Refers to using data to explain what has already happened (compare with predictive analytics). How were sales this month, split by category, region, store segment, etc.? How does loan performance compare by cohort, over time, by product, etc.? Most descriptive analytics involves relatively simple analysis, but some sophisticated approaches also belong here (e.g.
clustering).

Real-time analytics
As the name suggests, this phrase refers to using data and analytics in “real time.” That might mean an organization’s ability to pre-process data as it arrives and use only the processed data going forward, the ability to continuously update predictions as the input data and context change (e.g. predicting weather), or the ability of users to perform BI and other analytical tasks on data as it comes in.

Hadoop
Whether deserved or not, this is the word that gets talked about the most when conversations about big data come up. Hadoop refers to an open-source framework for storing (and processing) large volumes of data across clusters of off-the-shelf hardware. Key features include: scalability (keep adding new hardware and the system can keep taking more data), redundancy (the storage model can cope with hardware breakdowns), open-source code that is free to license, and the ability to ingest data in different formats (video, traditional data tables, Facebook comments, etc.). Given how commonly “Hadoop” and “big data” are mentioned together, it is worth noting that Hadoop is not an essential ingredient of a big data strategy. In fact, in many big data use cases a Hadoop-based architecture may not make sense, just as it may be the right choice in others. So do not equate analytics or big data with Hadoop.

Unstructured data
This is an imprecise term, but it broadly refers to data that can’t be fit into the usual data model of rows and columns. Think of a collection of video files, text documents, weblog data, or email content. Most of this data won’t fit neatly into a conventional data warehouse, although there may be very relevant insights and patterns hiding in it.
Most unstructured data needs to be converted into some structure to unlock such insights.

Machine Learning
This is a broad topic covering approaches that use a machine to help discover insights and linkages, make predictions, and recommend decisions. Predictive modeling is a subset of machine learning, and so are clustering and segmentation. Whenever I hear someone mention “machine learning,” I always ask them to be more specific: are they referring to predictive or descriptive analytics, to supervised or unsupervised learning, etc.? By itself, machine learning is too broad a term to be helpful in discussing capabilities or an approach to solving a problem.

[Other notable mentions that nearly made it onto the above list: R, Python, NoSQL, and neural networks]

So there you have it. It seems there’s no shortage of chatter related to big data these days. Hopefully, these basic definitions will help any big data newcomer better understand what everyone is talking about.