Helios - Luke Plewa - September 2014

The Helios app is an energy consumption predictor. It takes weather data (historical or forecasted) from a building's location and maps it to the energy consumption of a specific device or the whole building. The predicted values can support scenario planning for the end user's HVAC, letting them estimate their energy savings based on their HVAC usage. One example of what this could look like is mocked up below, where the temperature set points of an A/C unit help a user predict daily energy consumption against the forecasted weather:

Mock Up

The weather features used are temperature, dew point, humidity, wind speed, and pressure, collected on an hourly scale. Pressure and humidity may be dropped in the future, since they have been observed to have less effect than the other three features.
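As a rough illustration, each hour becomes a five-element feature vector (the field names and units below are placeholders, not the actual Weather model):

def hourly_features(obs):
    # Illustrative only: one hourly sample as a five-element feature vector.
    return [obs["temperature"],  # deg F
            obs["dew_point"],    # deg F
            obs["humidity"],     # percent
            obs["wind_speed"],   # mph
            obs["pressure"]]     # inches Hg

print(hourly_features({"temperature": 61.0, "dew_point": 52.0, "humidity": 72.0,
                       "wind_speed": 10.0, "pressure": 30.01}))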

The energy data was collected in the winter and spring of 2012 from a major building in San Francisco, California. The major devices are the chiller (CH), hot water pump (HWP), chiller pump (CP), and MCC/CT. The examples below focus on the chiller, as it has the highest energy consumption.

The Solution

Helios uses a Support Vector Machine for Regression (SVR). If you want to read up on SVRs, here are two useful resources:

SVR Basics

SVR Tutorial

The parameters C and gamma are optimized using a basic grid search. Values of C=1000 and gamma=0.0001 yielded the lowest training and testing errors.
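The search itself isn't shown in this notebook; a minimal sketch with scikit-learn (assuming an RBF-kernel SVR, which the Forecaster's hidden implementation may or may not match) could look like this:

import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Stand-in data: 200 hourly samples with 5 weather features each.
rng = np.random.RandomState(0)
X_train = rng.rand(200, 5)
y_train = rng.rand(200)

# Exhaustive search over a coarse log-spaced grid, scored by MSE.
param_grid = {"C": [1, 10, 100, 1000], "gamma": [1e-5, 1e-4, 1e-3, 1e-2]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X_train, y_train)
print(search.best_params_)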

The energy data and weather features are normalized to the range 0 to 1. In the following graphs, the predicted energy data is converted back to its original scale (real kW values).
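The actual normalize_array helper isn't shown in this notebook; a plausible shape for it, given how it is called below, is:

def normalize_array(values, low, high):
    # Linearly rescale values into [0, 1] given their observed min and max.
    span = float(high - low)
    return [(value - low) / span for value in values]

print(normalize_array([2.0, 4.0, 6.0], 2.0, 6.0))  # [0.0, 0.5, 1.0]

Note that the demo code below maps predictions back to kW by multiplying by max_energy alone, which inverts this scaling exactly only when the minimum reading is zero.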

Each hour is treated as a separate data point, so there is no differentiation between in-office hours and out-of-office hours. Making that differentiation would likely improve the prediction results considerably.
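One cheap way to make that differentiation (a sketch only; nothing like this is in the current feature set) is to append an occupancy indicator to each hourly sample:

from datetime import datetime

def with_occupancy_flag(features, timestamp, open_hour=8, close_hour=18):
    # Append 1.0 during assumed weekday office hours, else 0.0. The 8am-6pm
    # window is a placeholder, not the building's real schedule.
    in_office = timestamp.weekday() < 5 and open_hour <= timestamp.hour < close_hour
    return features + [1.0 if in_office else 0.0]

print(with_occupancy_flag([61.0, 52.0, 72.0], datetime(2012, 2, 2, 10)))  # flag = 1.0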

Training Error

The training error is computed on the same set that was used to train the model. Because of this, it is expected to show the highest accuracy.

This example displays both the Mean Square Error (the average of all (actual - predicted)^2 terms) and the Root Mean Square Error (the square root of the MSE).
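Concretely, on the normalized 0-to-1 scale used here:

import math

def mse(actual, predicted):
    # Average of the squared differences between actual and predicted values.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / float(len(actual))

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

print(mse([0.20, 0.35, 0.50], [0.25, 0.30, 0.55]))   # ~0.0025
print(rmse([0.20, 0.35, 0.50], [0.25, 0.30, 0.55]))  # ~0.05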

In [5]:
training_errors("CH")
training_errors("HWP")
training_errors("CP")
training_errors("MCC/CT")
Device: CH
Number of samples: 4080.0 (Days: 170.0)
Training MSE: 0.0264665646589 (Accuracy: 0.973533435341)
Training RMSE: 0.162685477714 (Accuracy: 0.837314522286)

Device: HWP
Number of samples: 4080.0 (Days: 170.0)
Training MSE: 0.0137543065512 (Accuracy: 0.986245693449)
Training RMSE: 0.117278755754 (Accuracy: 0.882721244246)

Device: CP
Number of samples: 4080.0 (Days: 170.0)
Training MSE: 0.0567546019869 (Accuracy: 0.943245398013)
Training RMSE: 0.238232243802 (Accuracy: 0.761767756198)

Device: MCC/CT
Number of samples: 4080.0 (Days: 170.0)
Training MSE: 0.0255032435954 (Accuracy: 0.974496756405)
Training RMSE: 0.159697349995 (Accuracy: 0.840302650005)

Testing Error

Testing error measures the accuracy of a learner on data that was not used in the learning process. This example always holds out the latest data as the testing sample, as this best reflects what occurs in the production environment. Test sets are taken in weekly chunks. The testing RMSE fluctuates between roughly 16% and 19%, which is, as expected, higher than the training error.
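In miniature, the split is purely chronological (the variable names here mirror the testing_error code shown later):

# Hold out the most recent test_size days (24 hourly samples per day);
# everything earlier is used for training.
energy_data = list(range(170 * 24))  # stand-in for 170 days of hourly readings
test_size = 7                        # days, stepped in multiples of 7 below
training_energy_data = energy_data[:-test_size * 24]
testing_energy_data = energy_data[-test_size * 24:]
print(len(training_energy_data))  # 3912 hourly samples for training
print(len(testing_energy_data))   # 168 hourly samples (7 days) held out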

One obvious flaw in measuring testing error this way is that the training data all comes from colder months (February to May), while the testing data is from June and July. Ideally, the testing error would be computed on the same months as the training data, or would cover multiple seasons (e.g. the summer-to-fall transition).

In [5]:
testing_error(7, "CH")
testing_error(14, "CH")
testing_error(21, "CH")
testing_error(28, "CH")
testing_error(35, "CH")
testing_error(49, "CH")
Device: CH
Number of samples: 168.0 (Days: 7.0)
Testing MSE: 0.0265107308732 (Accuracy: 0.973489269127)
Testing RMSE: 0.16282116224 (Accuracy: 0.83717883776)

Device: CH
Number of samples: 336.0 (Days: 14.0)
Testing MSE: 0.0352307329427 (Accuracy: 0.964769267057)
Testing RMSE: 0.187698516091 (Accuracy: 0.812301483909)

Device: CH
Number of samples: 504.0 (Days: 21.0)
Testing MSE: 0.0277024741731 (Accuracy: 0.972297525827)
Testing RMSE: 0.166440602538 (Accuracy: 0.833559397462)

Device: CH
Number of samples: 672.0 (Days: 28.0)
Testing MSE: 0.0295021754711 (Accuracy: 0.970497824529)
Testing RMSE: 0.171761973298 (Accuracy: 0.828238026702)

Device: CH
Number of samples: 840.0 (Days: 35.0)
Testing MSE: 0.0328648939104 (Accuracy: 0.96713510609)
Testing RMSE: 0.181286772574 (Accuracy: 0.818713227426)

Device: CH
Number of samples: 1176.0 (Days: 49.0)
Testing MSE: 0.0302800468816 (Accuracy: 0.969719953118)
Testing RMSE: 0.174011628582 (Accuracy: 0.825988371418)

In [6]:
testing_error(28, "HWP")
testing_error(28, "CP")
testing_error(28, "MCC/CT")
Device: HWP
Number of samples: 672.0 (Days: 28.0)
Testing MSE: 0.00649274057851 (Accuracy: 0.993507259421)
Testing RMSE: 0.0805775438848 (Accuracy: 0.919422456115)

Device: CP
Number of samples: 672.0 (Days: 28.0)
Testing MSE: 0.0802685627252 (Accuracy: 0.919731437275)
Testing RMSE: 0.283317071009 (Accuracy: 0.716682928991)

Device: MCC/CT
Number of samples: 672.0 (Days: 28.0)
Testing MSE: 0.041280778547 (Accuracy: 0.958719221453)
Testing RMSE: 0.203176717532 (Accuracy: 0.796823282468)

Future Work

Moving ahead, there are several next steps.

  1. One avenue is to include indoor temperature in the feature set, since it is needed to model temperature set points, which facilities managers require for scenario planning on their buildings.

  2. Another option is to display the predicted values as-is and let facilities managers use the graphs as they see fit.

  3. Gamification could be integrated into the app, similar to how Nest awards Leafs to its users at the end of every month. If a user frequently uses less energy than predicted, they would earn some sort of reward.

  4. Currently, in-office and out-of-office data (work hours vs. weekends) is treated the same. Adding a feature that distinguishes the two, like the occupancy flag sketched earlier, could greatly increase accuracy.

  5. Notifications or alerts for grossly unexpected usage. If an A/C system is overdrawing or underdrawing, a facilities manager probably wants to know! A simple residual check is sketched below.
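For that alerting idea, a minimal sketch (the threshold and function name are hypothetical) would flag hours whose residual is large relative to the model's historical RMSE:

def flag_anomalies(actual, predicted, rmse, k=3.0):
    # Flag hours where |actual - predicted| exceeds k times the historical
    # RMSE. k=3.0 is an arbitrary starting point, not a tuned value.
    alerts = []
    for hour, (a, p) in enumerate(zip(actual, predicted)):
        if abs(a - p) > k * rmse:
            alerts.append((hour, a, p))
    return alerts

# Hour 2 overdraws badly relative to an RMSE of 0.05.
print(flag_anomalies([0.20, 0.22, 0.90], [0.21, 0.20, 0.30], rmse=0.05))  # [(2, 0.9, 0.3)]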


High Level Implementation

Provided below is a loose overview of the implementation. I apologize for the sloppiness; a lot of it is specific to this demonstration and is not very descriptive.

In [2]:
%matplotlib inline
from helios.models import Weather, Forecaster, grab_moscone_data, collect_weathers, normalize_array
import matplotlib.pyplot as plt
import helios.tests.weather_fixtures  # test fixtures used by this demo
from core.models import Building, Organization
from datetime import datetime, timedelta
import pytz
import numpy
import warnings
warnings.filterwarnings('ignore')  # suppress library warnings for a cleaner demo
import math
In [3]:
def training_errors(key):
    # Energy data runs from 2/2 through 7/20: 170 days of hourly readings.
    summed_error = 0.0
    num_samples = 0.0
    day_samples = 170

    # Collect data, normalize it to [0, 1], and train the SVR on all of it
    forecaster = helios.models.Forecaster()
    forecaster.c_param = 1000.0
    forecaster.gamma_param = 0.0001
    energy_data = grab_moscone_data()[key]
    max_energy = max(energy_data)
    energy_data = normalize_array(energy_data, min(energy_data), max_energy)
    svr = forecaster.train(collect_weathers('CA/San_Francisco'), energy_data)

    # Predict each training day and accumulate squared error on the normalized scale
    for date in range(day_samples):
        day = datetime(2012, 2, 2) + timedelta(days=date)
        cast = forecaster.predict(svr, Weather.objects.get(day=day, location='CA/San_Francisco'))
        actual = energy_data[(24 * date):(24 * (date + 1))]

        # Rescale the day's predictions back to kW for plotting
        display = []
        for value in cast:
            display.append(value * max_energy)
        plt.plot(display)

        for index in range(24):
            summed_error += (actual[index] - cast[index]) ** 2
            num_samples += 1

    plt.show()

    print "Device: " + key
    print "Number of samples: " + str(num_samples) + " (Days: " + str(num_samples / 24) + ")"
    print "Training MSE: " + str(summed_error / num_samples) + " (Accuracy: " + str(1 - summed_error / num_samples) + ")"
    print "Training RMSE: " + str(math.sqrt(summed_error / num_samples)) + \
        " (Accuracy: " + str(1 - math.sqrt(summed_error / num_samples)) + ")"
In [4]:
def testing_error(test_size, key):
    # Energy data runs from 2/2 through 7/20; the final day_offset days
    # are excluded from this experiment entirely.
    summed_error = 0.0
    num_samples = 0.0
    day_offset = 28
    
    # Build forecaster object
    forecaster = helios.models.Forecaster()
    forecaster.c_param = 1000.0
    forecaster.gamma_param = 0.0001
    
    # Separate training and testing data
    energy_data = grab_moscone_data()[key][:-day_offset * 24]
    max_energy = max(energy_data)
    energy_data = normalize_array(energy_data, min(energy_data), max_energy)
    training_energy_data = energy_data[:-test_size * 24]
    testing_energy_data = energy_data[-test_size * 24:]
    
    # Grab training weathers
    weathers = collect_weathers('CA/San_Francisco')[:-day_offset]
    
    # Train
    svr = forecaster.train(weathers[:-test_size], training_energy_data)
    day = datetime(2012,7,20) - timedelta(days=test_size + 1) - timedelta(days=day_offset)

    # Predict and compare against actual energy values
    for date in range(test_size):
        day += timedelta(days=1)
        cast = forecaster.predict(svr, Weather.objects.get(day=day, location='CA/San_Francisco'))
        actual = testing_energy_data[(24*date):(24*(date+1))]

        display = []
        for value in cast:
            display.append(value * max_energy)
        plt.plot(display)

        for index in range(24):
            error = (actual[index] - cast[index]) ** 2
            summed_error += error
            num_samples += 1

    plt.show()

    print "Device " + key
    print "Number of samples: " + str(num_samples) + " (Days: " + str(num_samples / 24) + ")"
    print "Training MSE: " + str(summed_error / num_samples) + " (Accuracy: " + str(1 - summed_error / num_samples) + ")"
    print "Training RMSE: " + str(math.sqrt(summed_error / num_samples)) + \
        " (Accuracy: " + str(1 - math.sqrt(summed_error / num_samples)) + ")"