Anomaly detection with IoT data
One of the goals of smart buildings is to use real feedback data in order to improve the well-being of the occupants, to optimize building costs, and to protect the environment.
Challenge
How can IoT data from electricity sensors be used to build an anomaly detection model for electricity consumption? This AI model should tell building managers whether electricity consumption shows any anomalies, for example peaks in consumption that should not have happened given the building's normal behavior.
This is a classic time-series problem: forecast the electricity consumption, compare the predicted values with the actual values, and identify the points where they diverge as anomalies.
General steps for tackling the problem:
- Exploratory Data Analysis
- Pre-processing data
- Run experiments by testing models and hyperparameters
- Choose model based on performance metrics
- Identify anomaly points and visualize results
Exploratory Data Analysis
Available data: a time series of a building's electricity consumption, recorded by a single sensor. Understanding the data is one of the most important steps when developing models. This is the purpose of Exploratory Data Analysis: we visualize the data, compute standard summary statistics (mean, standard deviation), and look for additional features that could explain the target variable, in this case electricity consumption.
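As a minimal sketch of these summaries, the following pandas snippet works on a synthetic hourly series. The generator, its office-hours pattern and the name `consumption_kwh` are assumptions standing in for the real sensor export. Besides the mean and standard deviation, it computes the average consumption per hour of day and per weekday, a quick way to see the daily and weekly rhythm a forecasting model has to learn.

```python
import numpy as np
import pandas as pd


def make_synthetic_consumption(weeks: int = 4, seed: int = 0) -> pd.Series:
    """Hypothetical stand-in for the sensor export: hourly kWh readings that are
    higher during office hours (08:00-18:00, Monday-Friday) plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    idx = pd.date_range("2021-01-04", periods=24 * 7 * weeks, freq="h")  # 2021-01-04 is a Monday
    office_hours = (idx.dayofweek < 5) & (idx.hour >= 8) & (idx.hour < 18)
    values = np.where(office_hours, 20.0, 5.0) + rng.normal(0.0, 1.0, len(idx))
    return pd.Series(values, index=idx, name="consumption_kwh")


def summarize(series: pd.Series) -> dict:
    """Basic EDA summaries: overall statistics plus daily and weekly profiles."""
    return {
        "mean": float(series.mean()),
        "std": float(series.std()),
        "missing": int(series.isna().sum()),
        # average consumption for each hour of the day (0-23)
        "hourly_profile": series.groupby(series.index.hour).mean(),
        # average consumption for each weekday (0 = Monday, 6 = Sunday)
        "weekday_profile": series.groupby(series.index.dayofweek).mean(),
    }


consumption = make_synthetic_consumption()
summary = summarize(consumption)
print(f"mean={summary['mean']:.2f} kWh, std={summary['std']:.2f} kWh, missing={summary['missing']}")
print(summary["weekday_profile"].round(2))
```

Plotting these profiles, or the raw series, is usually the next step; a clear weekday/weekend difference already suggests that calendar information is worth feeding to the model.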
One example of an extra feature is weather information. For weather data in the Netherlands, one option is the Python package KNMI, from which one can extract features such as temperature, dew point temperature, and precipitation for use in the model.
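Once weather observations have been retrieved, they can be aligned with the consumption series. This sketch does not show the retrieval call, and the column names `temperature`, `dew_point` and `precipitation` are hypothetical. It assumes both datasets are pandas objects indexed by timestamp, carries the latest weather observation forward onto each consumption timestamp, and adds simple calendar features (hour, weekday, weekend flag) as an illustrative extra that the original does not mention.

```python
import numpy as np
import pandas as pd


def add_features(consumption: pd.Series, weather: pd.DataFrame) -> pd.DataFrame:
    """Join weather columns onto the consumption series and add calendar features.

    `weather` is assumed to be indexed by timestamp; its column names are
    hypothetical and depend on how the data was retrieved.
    """
    df = consumption.to_frame("consumption_kwh")
    # Weather may be sampled less often than consumption: carry the last
    # observation forward onto every consumption timestamp.
    aligned = weather.sort_index().reindex(df.index, method="ffill")
    df = df.join(aligned)
    # Illustrative calendar features (an assumption, not from the original).
    df["hour"] = df.index.hour
    df["dayofweek"] = df.index.dayofweek
    df["is_weekend"] = (df["dayofweek"] >= 5).astype(int)
    return df


# Synthetic example: hourly consumption, weather every 3 hours.
idx = pd.date_range("2021-01-04", periods=48, freq="h")
consumption = pd.Series(np.linspace(5.0, 20.0, 48), index=idx)
weather = pd.DataFrame(
    {
        "temperature": np.linspace(2.0, 8.0, 16),
        "dew_point": np.linspace(-1.0, 3.0, 16),
        "precipitation": np.zeros(16),
    },
    index=pd.date_range("2021-01-04", periods=16, freq="3h"),
)
features = add_features(consumption, weather)
print(features.head())
```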
Pre-processing data
- Cleaning the data: remove NaN values, resample, and remove outliers.
- Splitting the data into training (70%) and test (30%) subsets: the training subset is used to train the model, and the test subset is used to make predictions on data the model has not seen. This way, we can compute metrics that evaluate the performance of the model.
- Normalizing the data: this improves the numerical stability of the model and speeds up training. We used MinMaxScaler from the scikit-learn (sklearn) package. The scaler is fitted on the training set; after the model is built, the test data is scaled with the same fitted scaler before it is fed to the model for new predictions.
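The three pre-processing steps can be sketched with pandas and scikit-learn as follows. The z-score threshold for outliers, the hourly resampling frequency and the interpolation of the resulting gaps are assumptions made for this example, as is the synthetic input. Because the data is a time series, the split keeps chronological order instead of shuffling, and `MinMaxScaler` is fitted on the training subset only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def clean(raw: pd.Series, freq: str = "h", z_max: float = 3.0) -> pd.Series:
    """Drop missing readings, resample to a regular grid and replace outliers.

    The z-score rule and the interpolation are assumptions for this sketch.
    """
    s = raw.dropna()
    s = s.resample(freq).mean()      # regular grid; empty buckets become NaN
    z = (s - s.mean()) / s.std()
    s = s.mask(z.abs() > z_max)      # treat extreme readings as missing
    # Interpolate over gaps so the time steps stay evenly spaced.
    return s.interpolate(method="time").dropna()


def chronological_split(s: pd.Series, train_frac: float = 0.7):
    """Keep time order: the first 70% trains the model, the last 30% tests it."""
    n_train = int(len(s) * train_frac)
    return s.iloc[:n_train], s.iloc[n_train:]


# Synthetic two-week hourly series with one glitch and one missing reading.
rng = np.random.default_rng(1)
idx = pd.date_range("2021-01-04", periods=24 * 14, freq="h")
raw = pd.Series(10.0 + rng.normal(0.0, 1.0, len(idx)), index=idx)
raw.iloc[50] = 500.0   # simulated sensor glitch
raw.iloc[60] = np.nan  # simulated missing reading

series = clean(raw)
train, test = chronological_split(series)

# Fit the scaler on the training subset only, then reuse it for the test
# subset, so no information about the test period leaks into training.
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train.to_numpy().reshape(-1, 1))
test_scaled = scaler.transform(test.to_numpy().reshape(-1, 1))
print(len(train), len(test), train_scaled.min(), train_scaled.max())
```

Interpolating the removed points, rather than deleting them, keeps the spacing between time steps constant, which matters once the series is cut into fixed-length windows for a sequence model. Test values may fall slightly outside [0, 1] when they exceed the training range; that is expected with a scaler fitted on training data only.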
Experimenting and choosing the forecasting model
One model commonly used for forecasting time-series data is the Long Short-Term Memory (LSTM) network. We built an LSTM architecture and ran experiments that tuned hyperparameters such as the number of epochs, the number of timesteps, and the dropout rate, and compared the runs in Azure Machine Learning Studio. Based on the root-mean-square error (RMSE), we chose the best architecture and hyperparameters.
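An LSTM forecaster consumes fixed-length windows of past values (the `timesteps` hyperparameter), and the experiments are ranked by RMSE. The following NumPy sketch shows only these two pieces: cutting a scaled series into `(samples, timesteps, 1)` arrays, the input shape Keras LSTM layers expect, and picking the run with the lowest RMSE. The two naive predictors are stand-ins so the example runs without training a network; they are not the LSTM from the original experiments, and in practice the RMSE would be computed on the test subset.

```python
import numpy as np


def make_windows(values, timesteps: int):
    """Slice a 1-D series into LSTM inputs of shape (samples, timesteps, 1)
    and next-step targets of shape (samples,)."""
    values = np.asarray(values, dtype=float).ravel()
    n_samples = len(values) - timesteps
    X = np.stack([values[i:i + timesteps] for i in range(n_samples)])
    return X[..., np.newaxis], values[timesteps:]


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error, the metric used to compare experiments."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Stand-in "models" (NOT an LSTM) to show the selection loop: in practice each
# entry would be the RMSE of a trained LSTM run with given hyperparameters.
series = np.sin(np.linspace(0.0, 20.0, 500))
results = {}
for timesteps in (6, 24):
    X, y = make_windows(series, timesteps)
    results[f"last_value_t{timesteps}"] = rmse(y, X[:, -1, 0])
    results[f"window_mean_t{timesteps}"] = rmse(y, X[:, :, 0].mean(axis=1))

best = min(results, key=results.get)
print(best, round(results[best], 4))
```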
Identify anomaly points and visualize results
The final step is to identify anomaly points by comparing the predicted and actual values of electricity consumption. Using business rules and formulas, such as a rolling average and standard deviation, we can then decide which points are anomalies.
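One way to turn a rolling average and standard deviation into a rule is sketched here: compute the residual (actual minus predicted) and flag a point when it deviates from the mean of the preceding residuals by more than `k` rolling standard deviations. The 24-point window, `k = 3` and the synthetic data with one injected peak are assumptions for illustration; the project's actual business rules are not specified.

```python
import numpy as np
import pandas as pd


def flag_anomalies(actual: pd.Series, predicted: pd.Series,
                   window: int = 24, k: float = 3.0) -> pd.DataFrame:
    """Flag points whose forecast error deviates from the recent error level
    by more than k rolling standard deviations."""
    df = pd.DataFrame({"actual": actual, "predicted": predicted})
    df["residual"] = df["actual"] - df["predicted"]
    # shift(1): the statistics use only the previous `window` residuals, so a
    # spike does not inflate the threshold it is compared against.
    past = df["residual"].shift(1).rolling(window, min_periods=window)
    df["res_mean"] = past.mean()
    df["res_std"] = past.std()
    # Rows without a full window have NaN statistics; NaN comparisons are False.
    df["anomaly"] = (df["residual"] - df["res_mean"]).abs() > k * df["res_std"]
    return df


# Synthetic week of hourly data with one injected consumption peak.
rng = np.random.default_rng(7)
idx = pd.date_range("2021-01-04", periods=24 * 7, freq="h")
predicted = pd.Series(10.0 + 5.0 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24), index=idx)
actual = predicted + rng.normal(0.0, 0.5, len(idx))
actual.iloc[100] += 15.0

result = flag_anomalies(actual, predicted)
print(result.index[result["anomaly"]])
```

Working on residuals rather than raw consumption means the normal daily and weekly pattern, already captured by the forecast, does not trigger alerts; only deviations from the expected behavior do.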
To use the model, it has to be deployed as a web service (API): new data is sent to the API as input, and the forecasted values are returned as output. For our business case, we called the web service from Microsoft Power BI to make predictions on the real-time data available there. This makes it possible to build dashboards that show the electricity consumption forecasts and anomaly points alongside data from other IoT sensors, such as water consumption and comfort information (temperature, humidity, people count), so that building managers get an overview of the building's condition.
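Any client, whether a script or a reporting tool, talks to such a web service over HTTP. This sketch uses only the Python standard library to POST recent readings as JSON and decode the returned forecasts. The endpoint URL and the `{"data": [...]}` payload shape are placeholders: the real schema is defined by the scoring script deployed with the model, and the `Authorization` header is only needed if key authentication is enabled on the service.

```python
import json
import urllib.request

# Placeholder: use the scoring URI reported for the deployed web service.
SCORING_URI = "http://example-aci-endpoint/score"


def build_request(values, scoring_uri=SCORING_URI, api_key=None):
    """Build a POST request carrying recent readings as JSON.

    The {"data": [...]} payload shape is hypothetical and must match whatever
    the deployed scoring script expects.
    """
    body = json.dumps({"data": [float(v) for v in values]}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed if key authentication is enabled
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def get_forecast(values, scoring_uri=SCORING_URI, api_key=None, timeout=30):
    """Send the readings and return the decoded JSON response (the forecasts)."""
    request = build_request(values, scoring_uri, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a live endpoint):
# forecasts = get_forecast([12.3, 11.8, 12.9], scoring_uri=SCORING_URI)
```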
We used Azure Machine Learning Studio for the experiments and for registering the model in the cloud. The model was deployed with Azure Container Instances (ACI).