Carbon Signal’s analysis has two key steps: baseline generation and intervention analysis.
Baseline generation
In the baseline generation step, Carbon Signal identifies combinations of building characteristics that explain observed patterns in monthly energy use. When no energy use data is available, it instead identifies characteristics that are likely to represent the building, given other known data points such as building type and location.
Intervention analysis
Once the baseline is established, interventions are analyzed by adjusting specific model characteristics and comparing the resulting energy use to the baseline.
Ensemble modeling
During both steps, we use ensemble modeling, a statistical technique that combines multiple models to produce more robust predictions by accounting for uncertainty. When we refer to “the model,” we mean the entire ensemble, which is composed of many individual models. Understanding this technique is key to interpreting information on Carbon Signal.
A simple example can help illustrate how ensemble modeling is used and how to interpret results: Imagine a building in a cold climate with high wintertime energy use. Using thermodynamic equations, you could calculate the exact wall R-value if you knew the values for all other variables that affect heat transfer – everything from infiltration rates to heating system efficiency. However, since we don’t know any of those other values, the best we can do is predict a likely range for the wall R-value. To capture this uncertainty, we use a collection of energy models – an ensemble – where each ensemble member carries a slightly different set of assumptions but results in the same pattern of energy use. Within the ensemble, there might be a member with R-5 walls and a member with R-15 walls. When we evaluate an intervention, such as adding insulation, we compute the effect on each member of the ensemble. Since there is a range of baseline values, the impact of the intervention will vary across the ensemble, producing a range of savings estimates.
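The insulation example above can be sketched in a few lines of Python. This is only a toy illustration, not Carbon Signal's actual energy model: the heat-loss formula, the constants WALL_AREA and BASELINE_LOSS, and the names Member, make_member, and savings_fraction are all assumptions made up for this sketch. Each member has a different wall R-value, and its remaining losses are chosen so that every member reproduces the same baseline.

```python
from dataclasses import dataclass

# Hypothetical toy model: heating energy is taken to be proportional to a total
# heat-loss coefficient, split into a wall term (area / R-value) and everything else.
WALL_AREA = 1000.0      # assumed wall area, arbitrary units
BASELINE_LOSS = 400.0   # assumed total heat loss that every member must reproduce


@dataclass
class Member:
    wall_r: float       # wall R-value assumed by this ensemble member
    other_loss: float   # infiltration, windows, etc.

    def total_loss(self) -> float:
        return WALL_AREA / self.wall_r + self.other_loss


def make_member(wall_r: float) -> Member:
    # Choose the remaining losses so this member matches the shared baseline.
    return Member(wall_r, BASELINE_LOSS - WALL_AREA / wall_r)


def savings_fraction(member: Member, added_r: float) -> float:
    # Apply the intervention (extra wall insulation) to this member only.
    upgraded = Member(member.wall_r + added_r, member.other_loss)
    return 1.0 - upgraded.total_loss() / member.total_loss()


ensemble = [make_member(r) for r in (5.0, 10.0, 15.0)]
savings = [savings_fraction(m, added_r=10.0) for m in ensemble]

for m, s in zip(ensemble, savings):
    print(f"R-{m.wall_r:g} walls: {s:.1%} heating savings")
```

In this sketch the member with R-5 walls saves the most from the added insulation and the member with R-15 walls saves the least, even though all three match the same baseline. That spread is the range of savings estimates described above.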
Statistical interpretation
When we show statistics on Carbon Signal, we typically show the median value along with the 5th and 95th percentiles to represent the range of values in the ensemble.
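As a rough illustration, the three summary values could be computed from a list of per-member results as shown below. The savings numbers are invented, the function name summarize is hypothetical, and the "inclusive" interpolation method is an assumption; the exact percentile method used by Carbon Signal is not specified here.

```python
import statistics


def summarize(values):
    """Return (5th percentile, median, 95th percentile) for one metric."""
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return cuts[0], statistics.median(values), cuts[-1]


# Hypothetical savings estimates (%), one value per ensemble member.
savings_pct = [4.0, 6.5, 7.0, 8.0, 9.5, 12.0, 33.0]
p5, median, p95 = summarize(savings_pct)
print(f"median {median:.1f}% (5th to 95th percentile: {p5:.1f}% to {p95:.1f}%)")
```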
Means vs medians
The mean value represents the average, calculated by adding all the numbers and dividing by the total count. The median is the middle value in an ordered dataset. It divides the data into two halves, with 50% of the values falling below and 50% above it. Medians are less sensitive to outliers and can give a better sense of the “typical” value in skewed distributions.
It’s important to understand that these statistics summarize the distribution of individual metrics across the ensemble. As a result, some numbers may not add up in an intuitive way. For instance, the median percentage for heating and the median percentage for cooling might come from different ensemble members, so their sum may not equal the median of the combined heating and cooling percentage. Similarly, for an intervention that reduces both electricity and natural gas use, the sum of the median savings for each utility may differ from the median of the total savings.
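A small numeric sketch shows why this happens. The three-member ensemble and its savings values below are invented purely for illustration.

```python
import statistics

# Hypothetical savings per ensemble member; each position is the same member.
electricity = [10.0, 20.0, 60.0]
natural_gas = [50.0, 5.0, 15.0]

# Total savings are computed member by member: [60.0, 25.0, 75.0]
total = [e + g for e, g in zip(electricity, natural_gas)]

sum_of_medians = statistics.median(electricity) + statistics.median(natural_gas)
median_of_total = statistics.median(total)

print(sum_of_medians)   # 35.0 (20.0 from member 2 plus 15.0 from member 3)
print(median_of_total)  # 60.0 (the total for member 1)
```

The two medians that are summed come from different members, while the median of the total comes from a third, so the two figures differ.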