A look at why and how enterprises should invest in F.A.S.T analytics.

Not all data is created equal. One of the first questions enterprises face when deciding their big data strategy is how and where they will store this data. The rush to this decision often obscures the more important questions: what should be stored or thrown away, and how will the data be processed to generate value?

There is nothing wrong with casting a wide net to instrument as much as you can and capture as much as you can. However, the danger of collecting too much data is that it becomes harder for the relevant data to be discovered, used and processed. More data means more storage, more processing and more noise for the user who wants to analyze it. Big data that remains big is a problem, and a growing one for most enterprises.

A guaranteed requirement for a successful, value-oriented big data strategy is the ability to quickly reduce big data down to small, more meaningful data. This is easier said than done, and it is often overlooked or ignored because it requires a blend of strategic and analytical thinking to be applied upfront. However, certain key strategies can help enterprises get to this point. These strategies are what I term F.A.S.T, for Filter, Aggregate, Sample and Transform.

F is for Filter

Filtering enables a divide-and-conquer approach to data organization and management. By taking raw or enriched data and dividing it up based on logical groupings such as entities or events (entities can be the things, people, places and organizations that participate in events), by where or when it was generated, or by the problem it will be used to solve, large data sets can be reduced to more manageable ones.

Similarly, use cases that require multiple data sets to be merged and enriched can benefit from separate storage of their relevant data. Use-case-based storage enables the selection of the optimal technology for storage and processing given the unique needs of each use case (for example, interactive and ad hoc analysis vs. batch reporting).

Filtering also enables segmentation of data into sets that a particular group of data consumers cares about. For example, a sales rep looking into sales of a particular product through iPhones in Europe should not need to analyze US sales data unless the analysis requires comparisons or stack ranking. Segment-based organization of data enables quick discovery and analysis of the data relevant to the consumer's need.

A is for Aggregation

Aggregations compact a lot of data into smaller sets by reducing its fidelity. A strong aggregation strategy starts with a top-down analysis of the decisions the analytics will enable. For example, if actions or decisions are made every hour or every day, events that arrive every second can be aggregated to per-hour or per-day levels. This ensures that a consumer interacting with this data does not have to process all events every time and instead has the option of looking at pre-aggregated data at the required fidelity.

Another mechanism is to aggregate the metrics or KPIs of interest over the dimensions of analysis. For example, if events arriving into the system represent user demand, and the analysis compares demand signals between users belonging to different age groups, these events can be aggregated by age group, such as 0-18 years, 18-40 years, 40-60 years and 60 and above. Aggregations based on event attributes like these can span combinations of several attributes, such as user age and gender, or user age, gender and location, producing ready-to-use data at the point of analysis and decision making.
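Here is a minimal sketch of both aggregation mechanisms using pandas; the event schema (timestamp, age, demand) and the age brackets are hypothetical placeholders for whatever your real events carry.

```python
import pandas as pd

# Hypothetical per-second demand events; in practice these would be read
# from an event store rather than generated inline.
events = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=10_000, freq="s"),
    "age": [18 + i % 50 for i in range(10_000)],
    "demand": 1,
})

# Mechanism 1: reduce fidelity in time -- roll per-second events up to
# the hourly level at which decisions are actually made.
hourly = events.set_index("timestamp").resample("1h")["demand"].sum()

# Mechanism 2: aggregate a KPI over a dimension of analysis -- here,
# total demand per (assumed) age group.
events["age_group"] = pd.cut(
    events["age"],
    bins=[0, 18, 40, 60, 200],
    labels=["0-18", "18-40", "40-60", "60+"],
)
by_age = events.groupby("age_group", observed=True)["demand"].sum()
```

Ten thousand per-second rows collapse into a handful of hourly totals and four group totals; this is exactly the reduction that spares every downstream consumer from scanning the raw stream.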
S is for Sampling

Sampling enables iterative analysis over large data sets, ensuring that users can quickly identify relevant data and progressively analyze larger versions of it. Irrelevant data or analyses can be discarded early, so no time is spent processing the entire data set during the experimentation phase of analytics design. Working with incrementally larger samples while designing a technique or algorithm for data processing can save valuable time and lets the data scientist fail fast on less promising techniques.

T is for Transform

Transformation is the process through which new attributes and new records are added to or removed from a data set. New attributes can be generated by applying a mapping function to one or more existing attributes, or gathered by merging the data with another data set. New records can be generated by combining two data sets that share the same schema. Transformed data is usually the data that powers dashboards and reports of the insights represented by analytics, metrics and KPIs. The faster data can be transformed into its final state, the sooner updated insights can be delivered to their consumers.

Needle in the haystack

Converging big data into small, manageable data sets that directly lead to the generation of relevant insights is akin to finding the needle in a haystack. This divide-and-conquer approach makes data governance an easier problem, and when carefully controlled and conducted, it can dramatically increase the speed to insights and value. The sketch below ties the four steps together.
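As a closing illustration, here is a minimal end-to-end sketch chaining all four F.A.S.T steps in pandas. Every detail is an assumption made for illustration, not a prescribed schema: the column names (timestamp, region, age, demand), the EU segment, the age brackets and the sampling fraction.

```python
import pandas as pd

def fast_reduce(events: pd.DataFrame, frac: float = 0.1) -> pd.DataFrame:
    """Reduce raw events to a small, analysis-ready data set.

    Assumes hypothetical columns: timestamp, region, age, demand.
    """
    # Filter: keep only the segment this consumer cares about.
    segment = events[events["region"] == "EU"]

    # Sample: experiment on a fraction before committing to the full set.
    sample = segment.sample(frac=frac, random_state=42)

    # Transform: derive a new attribute via a mapping function.
    sample = sample.assign(
        age_group=pd.cut(
            sample["age"],
            bins=[0, 18, 40, 60, 200],
            labels=["0-18", "18-40", "40-60", "60+"],
        )
    )

    # Aggregate: roll the KPI up to the daily fidelity at which
    # decisions are made, per age group.
    return (
        sample.set_index("timestamp")
        .groupby("age_group", observed=True)
        .resample("1D")["demand"]
        .sum()
        .reset_index()
    )
```

A data scientist might run this with frac=0.01 while prototyping a metric and frac=1.0 once the approach proves out, which is precisely the fail-fast loop the sampling step is meant to enable.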