One of the major issues we encounter with many facilities looking to reduce their electricity or natural gas consumption is the lack of available data to dig deeper. Luckily, there are some ways around that, but they’re going to take some effort on your part. Let’s explain how it works and get you started, then we can help you out if you get stuck.
Imagine you have over one million square feet of floor space within a building with no data except the total KWH used each month. What would you do in this situation? Here’s what we did…
One approach we have used with a lot of success is called regression analysis. If you don’t know what this is, or only vaguely recall it from a statistics class, you can watch an hour-long presentation that goes through the basics, but you don’t need to watch it right now in order to read the rest of the article below.
Basically, we want to create a model of how much electricity we have been using in the past, so we can use that information to predict how much we will use in the future.
Let’s go grab the utility bills and get them into a spreadsheet so we can analyze them. All you need is the billing month and the total KWH consumed within each month.
Let’s create a chart in Excel to see the pattern of usage over time. Try to gather as much history as possible, but start with the most recent years and work backwards, in case you run out of time or lose confidence in the data. The more data you collect, the longer it takes, but the more accurate our model will be. Graphing this data will help us identify major changes in the usage, so we can decide how much of the past data we want to include (only include it if it is representative of what we expect in the future).
As you can see, there is a fairly consistent pattern to the usage, so we will use all of this data in our model. The next step is to identify variables or factors that cause the usage to change each month. Since the highest points occur in the summer, and the lowest points occur in the winter, the most obvious factor is the weather outside. So let’s gather the outside temperature during the billing periods to see how much of a predictor of electricity usage it is. You can find historical weather data from the NOAA website.
The regression output below was calculated with Minitab statistical software, but you can use other programs and get similar results:
Predictor        Coef      SE Coef   T       P
Constant         680862    33686     20.21   0.000
Avg High Temp    7187.0    548.8     13.10   0.000
S = 85040.6   R-Sq = 74.7%   R-Sq(adj) = 74.3%
If you’re a little rusty on your statistics, let me explain what these mean…
The only variable we have included is Avg High Temp (you can probably use Max or Min temp as well, and get similar results). The last column is the p-value, which is the probability of seeing a relationship at least this strong in the data if the variable actually had no influence on the electricity consumption (our output). If that number is less than 0.05 (which it is), the variable is statistically significant. Therefore, it is a good predictor of electricity usage (not a surprise, based on the graph). The other important number to look at is R-Sq(adj), short for R-squared adjusted, which tells us how much of the variation in KWH usage is explained by the variation in temperature. We would like that number to be above 70%, and it is, at 74.3%. This is pretty good, since we are looking at only one variable so far. Therefore, we can take the recommended equation, and use it to predict the electricity usage in the future.
KWH = 680862 + 7187*Avg High Temp
So we have a base consumption of about 680,000 KWH per month, and for every one-degree increase in the month’s average high temperature, we will consume 7,187 additional KWH. So all we need to know is what the temperature will be (and assume the facility doesn’t significantly change its function or behavior in the future), and we can predict the electricity consumption fairly accurately. Pretty cool! The good thing is that we CAN estimate the future monthly temperature based on past history (not factoring in global warming!).
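As a minimal sketch, the single-variable equation can be written as a small Python function. The coefficients come from the regression output; the forecast temperatures are made-up values for illustration only, and the function and variable names are our own.

```python
# Coefficients copied from the single-variable regression output.
BASE_KWH = 680862          # Constant: predicted monthly KWH when Avg High Temp is 0
KWH_PER_DEGREE = 7187.0    # Additional KWH per one-degree rise in Avg High Temp


def predict_kwh(avg_high_temp):
    """Predicted monthly KWH for a given average high temperature."""
    return BASE_KWH + KWH_PER_DEGREE * avg_high_temp


# Hypothetical forecast temperatures (not from the facility's data).
forecast_temps = {"Jan": 35.0, "Jul": 85.0}
forecast_kwh = {month: predict_kwh(temp) for month, temp in forecast_temps.items()}
```

The temperatures passed in must use the same units and the same measure (average high for the billing period) as the data the model was built on.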
Let’s see what our model looks like graphically:
The red line is our equation, and the blue line is our actual usage, so look how closely they match! The last red points are the prediction for next year, based on the model and estimates of future temperature data. We can get into more details about regression later on, when looking at your specific data (analyzing residuals, verifying assumptions, etc.), but this should give you a good start.
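If you don’t have Minitab, the same kind of single-variable fit can be sketched in plain Python. This is only an illustration of ordinary least squares, not the exact routine Minitab runs, and the five data points are made up so the example is self-contained.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalized for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Made-up monthly data: average high temperature and KWH consumed.
temps = [30.0, 45.0, 60.0, 75.0, 90.0]
kwh = [900000.0, 1000000.0, 1120000.0, 1210000.0, 1330000.0]

intercept, slope = fit_line(temps, kwh)
r2 = r_squared(temps, kwh, intercept, slope)
r2_adj = adjusted_r_squared(r2, len(temps), 1)
```

The adjusted value is what lets us compare models with different numbers of variables. The article does not state how many monthly bills were used, but if we assume 60, this formula turns the reported R-Sq of 74.7% into the reported R-Sq(adj) of 74.3%.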
We can also go further, and make our model even more accurate (higher Rsq adjusted value) by adding more variables to our dataset.
Let’s rerun the analysis, to see if any of these variables improve our equation.
Predictor                          Coef     SE Coef   T       P
Constant                           47240    456842    0.10    0.918
Avg High Temp                      6281.3   547.5     11.47   0.000
Number of Production Days          22563    5759      3.92    0.000
Total Units Produced               14.61    33.39     0.44    0.663
Avg Number of Employees Working    124.7    165.0     0.76    0.453
S = 76423.5   R-Sq = 80.6%   R-Sq(adj) = 79.2%
Our new analysis shows a higher R-squared adjusted value of 79.2%, and one of the new factors, “Number of Production Days”, was also significant (p-value less than 0.05). This is a manufacturing facility, so it makes sense that electricity consumption would be lower in billing periods with more weekends and holidays. The other variables are not significant, so we can remove them from the equation, to keep things simple. Let’s chart again with our new equation. We also need to predict the number of production days in future months, which should be easy to do by looking at a calendar and counting up the likely production days.
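Counting production days from a calendar can also be scripted. The sketch below assumes a Monday to Friday production schedule, which may not match your facility; the billing period and holiday list are hypothetical.

```python
from datetime import date, timedelta


def count_production_days(start, end, holidays=()):
    """Count Monday-Friday dates from start to end (inclusive), skipping holidays."""
    skipped = set(holidays)
    days = 0
    current = start
    while current <= end:
        # weekday() returns 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in skipped:
            days += 1
        current += timedelta(days=1)
    return days


# Hypothetical billing period with one holiday removed.
january_days = count_production_days(
    date(2024, 1, 1), date(2024, 1, 31), holidays=[date(2024, 1, 1)]
)
```

If your facility runs weekend shifts or shuts down for planned maintenance, adjust the weekday test and the holiday list to match.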
Rerunning the regression with only the two significant factors gives our new equation: KWH = 260,892 + 6,247*Avg High Temp + 23,015*Number of Production Days
The difference isn’t very obvious, because we’ve only gone up a few percentage points, but hopefully you get the idea. Some facilities or buildings we have analyzed only need the outside temperature data, others get close with three or four other factors, while others are so complicated that we were not able to find or gather the right data to accurately predict their usage in the future.
So what do we do with this equation above, which had “Avg High Temp” and “Number of Production Days” as significant factors? Let’s first look at the equation in more detail, and figure out the priority of each factor.
If we create new columns in our spreadsheet, one for each significant factor in the equation, we can determine how much each factor contributed to the overall electricity consumption. Each column is the factor’s coefficient multiplied by its value for the month. For example, we create a column called “Temp” which is calculated as 6,247 times the Average High Temp for each month. In the table below, the grey area is the original data, and the yellow area is the prediction of usage for each variable in our equation.
Baseload is the Constant from our model (260,892), and it stays the same each month. We do the same thing for “Working Days” (the Number of Production Days factor) as we did for “Temp”, except we use the multiplier of 23,015 times the number of working days per month.
Next, we sum up the KWH in each yellow column, so we can determine how many KWH we predict were consumed by each factor over the entire data set (time period).
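The spreadsheet columns described above can be sketched in Python as follows. The coefficients come from the two-factor equation; the two monthly rows are hypothetical values, not the facility’s data.

```python
# Coefficients from the two-factor equation.
BASELOAD_KWH = 260892
KWH_PER_DEGREE = 6247
KWH_PER_WORKING_DAY = 23015


def contribution_columns(avg_high_temp, working_days):
    """Split one month's predicted KWH into the three 'yellow' columns."""
    temp_kwh = KWH_PER_DEGREE * avg_high_temp
    working_kwh = KWH_PER_WORKING_DAY * working_days
    return {
        "Baseload": BASELOAD_KWH,
        "Temp": temp_kwh,
        "Working Days": working_kwh,
        "Predicted KWH": BASELOAD_KWH + temp_kwh + working_kwh,
    }


# Hypothetical rows: (month, average high temperature, working days).
rows = [("Jan", 35.0, 21), ("Jul", 85.0, 22)]
table = {month: contribution_columns(temp, days) for month, temp, days in rows}
```

The three factor columns always add up to the month’s predicted KWH, which is a quick way to check the spreadsheet formulas.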
Baseload = 15,653,520
Temp = 21,750,805
Working Days = 28,469,555
We create a pie chart, to see which one is the largest, to help us drive priority in where we focus our improvements.
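The pie chart slices are simply each total divided by the sum of all three. A minimal sketch using the totals reported above (the variable names are our own):

```python
# Totals reported above, in KWH over the whole data set.
totals = {
    "Baseload": 15653520,
    "Temp": 21750805,
    "Working Days": 28469555,
}

grand_total = sum(totals.values())

# Percentage share of predicted consumption attributed to each factor.
shares = {name: 100 * kwh / grand_total for name, kwh in totals.items()}

# The factor with the largest share is the first place to look for savings.
largest = max(totals, key=totals.get)
```

With these totals, Working Days accounts for roughly 43% of the predicted consumption, Temp for about 33%, and Baseload for about 24%.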
Working Days came out with the largest influence. That is not a surprise. We would expect the most electricity to be consumed on the days the facility is running and people are working. We obviously want to be busy and have productive days each month, but again the question we need to ask is: what happens to the building during the working days? People are inside, and they produce body heat that has to be cooled, but they also use equipment and lights. Therefore, we could look at how the equipment is used during a work shift, how things are shut down, whether lights are on when not needed, and other forms of wasted electricity. Sounds like a perfect opportunity for an Energy Go and See during working hours!
Let’s assume we don’t find much opportunity during the working days. Outside temperature had the next biggest influence on our usage, so that should be the next focus area we investigate. We obviously cannot change the temperature outside. However, what happens to the facility or building when the temperature changes? For this building, the cooling was powered by electricity (chillers), so those would kick in during the summer. So our results tell us to focus our efforts on our chiller system and the equipment that supports the chillers (fans and vents). Therefore, we should make sure it is optimized, make sure we are cooling the correct areas at the right time of day, ensure we are maintaining the correct temperatures across the building, verify the equipment is working properly and being maintained properly, and check that we aren’t wasting chilled air in other parts of the building (door and window leaks).
We’ve given you a lot of information and charts to digest, but if you are lost or confused, don’t worry, we will help you out.
Once you have gathered some data, and would like help analyzing it, please contact us and we’ll run a regression analysis on your data for free, or show you how to use Minitab to do it yourself.
Before you contact us, your Excel spreadsheet should look similar to the following:
Here are some common data points you should consider gathering by month (geared toward electricity; you will need other factors for natural gas, water, or landfill):

 Month/Year of billing period
 KWH consumption during billing period
 Number of days in billing period
 Number of employees working during billing period (number or total hours worked)
 Outside temperature (Avg, High or Low) for the billing period
 Number of working days per billing period (remove weekends and holidays)
 Measure of facility productivity (products, items processed, shipped items, overtime, etc) during billing period
 Number of extra shifts during billing period
 Any other factor that could cause electricity usage to fluctuate
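If you keep the data as a CSV file instead of an Excel sheet, one possible layout is sketched below. The column names and sample rows are hypothetical; use whatever names make sense for your facility, as long as each row is one billing period.

```python
import csv
import io

# Hypothetical column names, one per data point in the list above.
COLUMNS = [
    "billing_month", "kwh", "billing_days", "employees",
    "avg_high_temp", "working_days", "units_produced", "extra_shifts",
]

# Made-up sample rows, for illustration only.
sample_csv = """billing_month,kwh,billing_days,employees,avg_high_temp,working_days,units_produced,extra_shifts
2024-01,950000,31,120,35.0,22,41000,0
2024-07,1300000,31,125,85.0,22,43000,2
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns to float."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [name for name in COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for record in reader:
        row = {"billing_month": record["billing_month"]}
        for name in COLUMNS[1:]:
            row[name] = float(record[name])
        rows.append(row)
    return rows


rows = load_rows(sample_csv)
```

Checking for missing columns up front catches the most common problem, which is a month where one of the factors was never recorded.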
So after the analysis, if nothing stands out as significant in predicting usage, we may have to consider other options, such as Energy Go and See, energy audits, installing automated monitoring equipment, or manual data collection. We will cover these topics in the future.
So what did this facility do, based on the analysis? They focused on the temperature impact on usage, looked into the chiller system, and implemented an air handler unit (AHU) setback program across the facility. This automatically shut off the AHU systems in areas that were not occupied after hours. The improvement reduced electricity costs by $300,000 annually. The data analysis made it easy to gain support for the effort from leadership, and to provide potential savings calculations. This was helpful when potential problems came up, as they continued to work through them, knowing that there were big savings to be had.
The downside is that we now need to create a new equation from data collected after the AHU setback was implemented, because our past performance is no longer a valid predictor of our usage going forward. But the best part is that we can calculate ACTUAL savings by comparing our actual electricity usage with what the old equation predicts we would have used.
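A minimal sketch of that savings calculation, using the two-factor equation as the “before” baseline. The post-improvement months below are hypothetical numbers, not the facility’s actual results.

```python
def predicted_kwh(avg_high_temp, production_days):
    """Usage the pre-improvement equation predicts for a month."""
    return 260892 + 6247 * avg_high_temp + 23015 * production_days


def monthly_savings(avg_high_temp, production_days, actual_kwh):
    """Predicted usage minus billed usage; positive means KWH saved."""
    return predicted_kwh(avg_high_temp, production_days) - actual_kwh


# Hypothetical months after the improvement:
# (average high temperature, production days, actual KWH billed).
months_after = [
    (85.0, 22, 1200000),
    (40.0, 20, 900000),
]

total_savings_kwh = sum(
    monthly_savings(temp, days, actual) for temp, days, actual in months_after
)
```

Because the baseline uses each month’s real temperature and production days, a mild summer or a short month is not mistaken for savings.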