With the summer thunderstorm season now upon us, many Americans are checking the weather forecasts with a fresh sense of urgency, hoping for good news and spot-on accuracy that helps them avoid rained-out barbecues or unexpectedly soaked summer outings.

Of course, forecasts sometimes get it wrong. A storm that was supposed to "just miss us" instead settles in and drops two days of drenching rain — or a much-ballyhooed front turns out to be little more than cloud cover that burns off by midmorning, revealing blue skies.

Summer storms can leave us stranded inside, staring out at dimpled puddles lit by flashes of lightning and counting the seconds until the thunderclap (then dividing by five) to gauge how many miles away the bolt struck. Dogs whimper and hide. Camping trips are canceled; picnic baskets are unpacked. Kids fidget by the window, asking over and over why it's raining and what they're supposed to do now — questions to which there seems to be no satisfactory answer, and the very premise of The Cat in the Hat.

Near-term predictions, shifts in the data, and the challenge of forecasting complex outcomes in physical environments

It was on one such stormy day in Chicago that I began thinking about the challenge of forecasting weather. I'd been reading Nate Silver's The Signal and the Noise and reflecting on his exploration of forecasting data and the accuracy of weather forecasting models in general (nicely summed up in this post from Dr. Randy Olsen).

That train of thought led me to consider the parallels between predicting precipitation (or other weather) and healthcare predictive analytics: the effort to forecast occurrences such as a missing charge, an inaccurately assigned billing code, or more clinical data points, such as a patient-risk score or the likelihood of an unplanned hospital readmission.

Both challenges come down to extracting the signal from the noise: pinpointing the most relevant data points, or combinations of data points, within a rich, ever-expanding stream of data tied to an inherently physical environment. The goal is to find the story the data is telling — and to use that insight to build predictive models that reveal when and how the story is meaningfully diverging from an already understood narrative or pattern, as opposed to when it is just an iterative but fundamentally similar version of the same old thing.

The parallels between atmospheric science and healthcare predictive analytics grow ever stronger as healthcare data expands to include electronic health records, claims, data collected by wearables, lab results, medical images, prescriptions and other pharmacy information, as well as more purely financial and lifestyle data points gathered and documented as the patient/consumer moves through the healthcare continuum. 
This is analogous to the massive (and still growing) volume and variety of weather data collected and made available every day — everything from satellite imagery to surface weather observations — that paints an increasingly precise picture of the atmosphere and Earth's weather patterns.

Yet even with this wealth of data, the difficulty of longer-term predictions tells us something about the nature of forecasting outcomes in complex physical (in the sense of tangible, real-world) systems.

Why is it harder for meteorologists to create an accurate 10-day forecast? Because the changes and patterns that emerge over Days 1-9 may well differ from what was anticipated on Day 0, when the forecast was originally created. As Day 10 draws nearer, the adjusted forecast for Day 10 likely, though not certainly, becomes more accurate — or rather, more likely to be accurate. The signals in the noise — the meaningful data points or correlations across all the available data — are more apparent because a significant percentage of the potential outcomes are now much more obviously unlikely. Or significantly more likely. Or, in some cases, equally likely, which would mean the original 10-day forecast was pretty spot-on.

This is also why, while we can know generally what to expect when phenomena such as El Niño occur, that knowledge is general in the truest sense. From climate.gov:

El Niño: A warming of the ocean surface, or above-average sea surface temperatures (SST), in the central and eastern tropical Pacific Ocean. 
Over Indonesia, rainfall tends to become reduced while rainfall increases over the tropical Pacific Ocean. The low-level surface winds, which normally blow from east to west along the equator ("easterly winds"), instead weaken or, in some cases, start blowing the other direction (from west to east, or "westerly winds").

Similarly, in healthcare predictive analytics, we can observe general patterns and trends — the signals in the noise — and leverage machine learning to map and understand how these signals should be weighted and interpreted (or, in some cases, disregarded). These can lead us down the path to more interesting and accurate predictions.

Let's imagine that a medical diagnosis code may or may not have been improperly assigned on a healthcare claim. We can start by knowing that, generally, the family of codes to which this particular diagnosis code belongs has a slightly elevated likelihood of error, simply because of the complexity of the coding and the medical condition. From there, data-mining algorithms can begin to parse the additional available data. Each data source has its own "kind" of noise — historical claims data will have different nonrelevant data points than EHR records and labs, and so forth — so the critical information may at first be difficult to extract or even recognize. But by grouping similar or otherwise correlated data points, we can amplify the signal in the noise. We can say, for example — given the information contained in similar claims, related to similar diagnoses, treated in similar clinical settings, for patients with similar medical histories and demographic attributes — that the likelihood of the code being improperly assigned is in fact far higher than it appears when the claim is assessed purely on its own. 
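As a toy sketch of this grouping idea (every field name, code family, and rate below is hypothetical, and a real system would use far richer features and a trained model), conditioning the error estimate on "similar" claims can surface a rate quite different from the overall base rate:

```python
# Hypothetical historical claims:
# (diagnosis_family, care_setting, patient_cohort, was_miscoded)
claims = [
    ("K50", "inpatient",  "65+",   True),
    ("K50", "inpatient",  "65+",   True),
    ("K50", "inpatient",  "65+",   False),
    ("K50", "outpatient", "65+",   False),
    ("K50", "outpatient", "18-64", False),
    ("J45", "inpatient",  "65+",   False),
    ("J45", "outpatient", "18-64", False),
    ("J45", "outpatient", "18-64", True),
]

FIELDS = ("diagnosis_family", "care_setting", "patient_cohort")

def miscoding_rate(claims, **filters):
    """Empirical miscoding rate among claims matching the given attributes."""
    matches = [c for c in claims
               if all(c[FIELDS.index(k)] == v for k, v in filters.items())]
    if not matches:
        return None  # no similar claims observed
    return sum(1 for c in matches if c[3]) / len(matches)

# Base rate across all claims vs. the rate within a "similar claims" group
base = miscoding_rate(claims)
grouped = miscoding_rate(claims, diagnosis_family="K50",
                         care_setting="inpatient", patient_cohort="65+")
print(f"base rate: {base:.2f}")            # → base rate: 0.38
print(f"similar-claims rate: {grouped:.2f}")  # → similar-claims rate: 0.67
```

The same claim that looks unremarkable against the overall population stands out once it is compared only to claims sharing its diagnosis family, care setting, and patient cohort — a crude, fully transparent version of what a trained model does across many more dimensions.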
We can follow a similar line of thinking to uncover missing charges, improperly calculated patient-risk scores, or skewed cost and utilization metrics that don't fully incorporate true long-term costs and probable care regimens. We can use these techniques to predict claims denials with a high degree of accuracy. We can amplify the signals — and filter out the noise.

At its core, the challenge is one of data optimization and data-usage optimization, in addition to data aggregation and normalization. By pairing high-quality data aggregation and normalization with data-optimization strategies, the signals in the noise can be made more apparent. They can be measured and modeled against one another to gauge how likely they are to have a meaningful impact on the outcome. We can see how true a predictor they actually are — how much they matter. We can tune down the noise — and tune in instead to a clear-eyed, accurate forecast.
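To make the weighting-and-modeling idea concrete, here is a minimal sketch of denial-risk scoring. The risk flags and weights are hypothetical, hand-set stand-ins for coefficients that a model such as logistic regression would learn from historical denial data:

```python
import math

# Hypothetical risk flags and weights (log-odds contributions). In practice
# these would be learned from historical claims and their denial outcomes.
WEIGHTS = {
    "missing_prior_auth": 2.1,
    "code_family_error_prone": 0.8,
    "out_of_network": 1.4,
    "filed_past_deadline": 2.7,
}
BIAS = -3.0  # baseline log-odds of denial for an unremarkable claim

def denial_probability(claim_flags):
    """Score a claim's denial risk with a logistic (sigmoid) model."""
    score = BIAS + sum(w for name, w in WEIGHTS.items()
                       if claim_flags.get(name))
    return 1.0 / (1.0 + math.exp(-score))

clean = denial_probability({})
risky = denial_probability({"missing_prior_auth": True,
                            "out_of_network": True})
print(f"clean claim: {clean:.2f}, risky claim: {risky:.2f}")
# → clean claim: 0.05, risky claim: 0.62
```

The point of the sketch is the shape of the calculation, not the numbers: each signal contributes a measured weight, weak signals contribute little, and the combined score separates routine claims from the ones worth a second look before submission.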