House Digital Twin

Intro

There is the stereotype of the person who hawkishly watches for changes in the thermostat. I have to admit that I am very conscious of what the thermostat is set to during winter, in no small part because of how drafty the windows are. One question I’ve wondered about is how much difference it really makes to keep the house just a bit warmer during the winter months. I recently installed a Google Nest thermostat and analyzed the data to estimate how much changing the thermostat by a few degrees really matters.

Motivation

A little while ago I saw that if you have a Google Nest, you can export temperature data (both indoor and outdoor temperature) averaged over 15-minute windows. Shown below is a sample of the first 5 of over 2,200 rows of data for the month of November.

I often adjust the temperature down at night and increase it during the day. I am pretty energy conscious and can feel a difference between 68 and 70 degrees Fahrenheit. One thing I have wondered about is how much energy (or money) is saved by keeping the house at 68 degrees during the day versus 70. This is a difficult thing to assess, as it depends not just on outdoor temperature but also on things like wind, rain, and snow. That said, the largest factor that determines heating cost is likely just the temperature difference between indoors and outdoors. Since there is some other data included in the Nest output, I also included it in the heating model developed below.

Target: Time spent heating
Predictors: Indoor temp, Outdoor temp, Indoor humidity, Outdoor humidity

Mathematical Model

I modelled heat exchange based on a few parameters I had easy access to. The simplest heat equation would be to assume that, with no source of exogenous heat, the change in temperature of the house is proportional to the difference between the indoor and outdoor temperature.

\[\frac{dT}{dt} = k (T_{indoor} - T_{outdoor})\]

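Here $k$ plays the role of the thermal conductivity of the apartment: it describes how well the apartment “resists” changes from the outside. With the difference written as indoor minus outdoor, $k$ is negative for a home that loses heat to colder air outside. Of course, this model does not account for the furnace, which is essentially an energy source.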

\[\frac{dT}{dt} = k (T_{indoor} - T_{outdoor}) + k_{furnace}(t)\]

Here $k_{furnace}$ is the heating rate of the furnace, or how many degrees Fahrenheit the home warms up per second ($t$) of active heating. If this is a low number, it might indicate an underpowered furnace.
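To get a feel for this two-term model, here is a minimal forward-Euler sketch. The values of `k` and `k_furnace` are made up for illustration (they are not the fitted values), time advances in 15-minute steps to match the Nest export, and `k` is assumed to be negative so that a warm house loses heat to colder outdoor air.

```python
def simulate(t_indoor, t_outdoor, furnace_on, k=-0.01, k_furnace=0.5):
    """Step dT/dt = k*(T_indoor - T_outdoor) + k_furnace*furnace_on forward.

    t_outdoor and furnace_on are equal-length sequences with one entry per
    15-minute step. All constants are illustrative, not fitted values.
    """
    temps = [t_indoor]
    for t_out, on in zip(t_outdoor, furnace_on):
        # Change in indoor temperature over one step.
        d_temp = k * (temps[-1] - t_out) + k_furnace * on
        temps.append(temps[-1] + d_temp)
    return temps


# Furnace off for 8 hours (32 steps) with 30 F outside: the house drifts down.
cooling = simulate(68.0, [30.0] * 32, [0] * 32)
# Same night with the furnace running the whole time: the house warms up.
heating = simulate(68.0, [30.0] * 32, [1] * 32)
```

With the furnace off, the indoor temperature decays toward the outdoor temperature; with it on, the furnace term outweighs the loss term for these particular made-up constants.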

This is a simple model, and it does okay, especially at night when the furnace is off. However, there is an exogenous source of heat (most days of the week): the sun! We can model this source of heat by assuming that some heat is added to the system during daylight hours, in proportion to the height of the sun in the sky! Of course, including cloud coverage could help the model, but that is perhaps a future project. Our new equation is:

\[\frac{dT}{dt} = k (T_{indoor} - T_{outdoor}) + k_{furnace}(t) + k_{sun}(\text{solar elevation})\]
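As a rough sketch of how a solar elevation could be computed from a timestamp, the function below uses the standard declination and hour-angle approximation. It treats the timestamp as local solar time (no longitude or equation-of-time correction), and both the latitude argument and the `solar_gain` helper are assumptions made for illustration, not necessarily how the feature in this post was built.

```python
import math
from datetime import datetime


def solar_elevation_deg(when, latitude_deg):
    """Approximate solar elevation in degrees.

    `when` is treated as local solar time, which is a simplifying assumption.
    """
    day_of_year = when.timetuple().tm_yday
    hour = when.hour + when.minute / 60
    # Approximate solar declination for this day of the year.
    declination = math.radians(-23.44) * math.cos(
        2 * math.pi * (day_of_year + 10) / 365
    )
    # Hour angle: 0 at solar noon, 15 degrees per hour.
    hour_angle = math.radians(15 * (hour - 12))
    lat = math.radians(latitude_deg)
    sin_elev = (math.sin(lat) * math.sin(declination)
                + math.cos(lat) * math.cos(declination) * math.cos(hour_angle))
    return math.degrees(math.asin(sin_elev))


def solar_gain(when, latitude_deg):
    """Hypothetical solar feature: sine of the elevation, zero at night."""
    elevation = math.radians(solar_elevation_deg(when, latitude_deg))
    return max(0.0, math.sin(elevation))


# Example: mid-November at an assumed latitude of 40 degrees north.
noon = solar_elevation_deg(datetime(2025, 11, 15, 12, 0), 40.0)
midnight_gain = solar_gain(datetime(2025, 11, 15, 0, 0), 40.0)
```

Clamping at zero keeps the term from contributing anything once the sun is below the horizon.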

This is pretty good. Another factor worth adding is that the house is heated by radiators, which have considerable thermal inertia. That is, if the heat was recently on, the radiators are likely still emitting some heat. We can add an additional term for this.

Similarly, it is not worth delving into here, but I also added a term for humidity, mainly because it is included in the Nest data. The high-level reason to include it is that the water content of the air can affect the thermal inertia of the system (humid air can hold more heat than dry air).

Our final equation is:

\[\frac{dT}{dt} = k (T_{indoor} - T_{outdoor}) + k_{furnace}(t) + k_{sun}(\text{solar elevation}) + k_{\text{furnace-lag}} + k_{\text{humidity}}\]
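Since the Nest export is sampled in 15-minute windows, one way to read the equation above is in discrete form, with the change in indoor temperature over one window on the left. The symbols here are shorthand introduced for this sketch rather than names from the Nest export: $n$ indexes the windows, $H_n$ is the furnace activity in window $n$, $S_n$ is the solar term, $h_n$ is the indoor humidity, and each $\beta$ absorbs the window length.

\[T_{indoor,n+1} - T_{indoor,n} \approx \beta_0 + \beta_1 (T_{indoor,n} - T_{outdoor,n}) + \beta_2 H_n + \beta_3 H_{n-1} + \beta_4 S_n + \beta_5 h_n\]

Every term is linear in an unknown $\beta$, and the furnace-lag term is simply the furnace activity from the previous window.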

Time to model the data!

Fitting the data

Our first step is to simply estimate the physical constants (the $k$’s) in the above equation. This can be done with linear regression! We create a lagged variable that contains whether the furnace was on in the past 15 minutes.
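A minimal sketch of how such a lagged variable and the regression target could be built with pandas. The raw column names (`indoor_temp`, `outdoor_temp`, `heating_seconds`) and the toy values are assumptions for illustration; the real export may be laid out differently.

```python
import pandas as pd

# Hypothetical 15-minute readings; column names and values are made up.
raw = pd.DataFrame({
    "indoor_temp": [68.0, 67.8, 67.7, 68.1, 68.6],
    "outdoor_temp": [40.0, 39.5, 39.0, 39.0, 38.5],
    "heating_seconds": [0, 0, 600, 900, 0],
})

model_df = pd.DataFrame({
    "temp_difference": raw["indoor_temp"] - raw["outdoor_temp"],
    "heating_on": raw["heating_seconds"],
    # Furnace activity in the previous 15-minute window.
    "heating_lag": raw["heating_seconds"].shift(1),
    # Change in indoor temperature over the next window (the target).
    "delta_indoor_temp": raw["indoor_temp"].shift(-1) - raw["indoor_temp"],
})

# The first row has no lag and the last row has no next reading.
model_df = model_df.dropna().reset_index(drop=True)
```

`shift(1)` looks one window back and `shift(-1)` looks one window ahead, so the rows at either end of the month are dropped.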

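from sklearn.linear_model import LinearRegression

# Predictors: one column per term in the heat equation above
features = ['temp_difference', 'heating_on',
    'heating_lag', 'solar_gain', 'indoor_humidity']
X = model_df[features]
# Target: change in indoor temperature between readings
y = model_df['delta_indoor_temp']

model = LinearRegression()
model.fit(X, y)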
                            OLS Regression Results                            
==============================================================================
Dep. Variable:      delta_indoor_temp   R-squared:                       0.730
Model:                            OLS   Adj. R-squared:                  0.730
Method:                 Least Squares   F-statistic:                     1554.
Date:                Tue, 23 Dec 2025   Prob (F-statistic):               0.00
Time:                        16:20:36   Log-Likelihood:                 2316.3
No. Observations:                2878   AIC:                            -4621.
Df Residuals:                    2872   BIC:                            -4585.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          -0.0277      0.033     -0.845      0.398      -0.092       0.037
temp_difference     0.0026      0.000      9.466      0.000       0.002       0.003
heating_on          0.0003   1.14e-05     24.718      0.000       0.000       0.000
heating_lag         0.0004   1.14e-05     34.494      0.000       0.000       0.000
solar_gain          0.0025      0.000     12.767      0.000       0.002       0.003
indoor_humidity     0.0001      0.000      0.277      0.782      -0.001       0.001
==============================================================================
Omnibus:                      345.239   Durbin-Watson:                   1.525
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1608.370
Skew:                           0.489   Prob(JB):                         0.00
Kurtosis:                       6.529   Cond. No.                     6.60e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.6e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
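The table above has the layout of a statsmodels OLS summary, which scikit-learn's `LinearRegression` does not print. Here is a minimal sketch of how a summary in that format can be produced, assuming statsmodels is installed. The data is synthetic, only two of the predictors are included, and the "true" coefficients are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for model_df; values are invented for this sketch.
demo_df = pd.DataFrame({
    "temp_difference": rng.uniform(10, 40, size=n),
    "heating_on": rng.choice([0, 300, 900], size=n),
})
demo_df["delta_indoor_temp"] = (
    -0.003 * demo_df["temp_difference"]
    + 0.0003 * demo_df["heating_on"]
    + rng.normal(0, 0.05, size=n)
)

# The formula interface adds the intercept automatically.
ols_fit = smf.ols(
    "delta_indoor_temp ~ temp_difference + heating_on", data=demo_df
).fit()
print(ols_fit.summary())
```

The fitted coefficients are available as `ols_fit.params` and the fit quality as `ols_fit.rsquared`.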

The regression yields $R^2 = 0.73$. Every predictor except indoor humidity ($p = 0.78$) is statistically significant.

After the physical constants are determined, we can create a “virtual thermostat”. This thermostat takes in the current indoor temperature, a target temperature, and a value for hysteresis (the threshold before turning on) and determines whether to turn on the furnace or not. With this thermostat we have a full model of our system! We can see how well it estimates the temperature of the house throughout the month!
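A minimal sketch of what such a virtual thermostat and simulation loop could look like. For brevity it uses only the temperature-difference and furnace terms, and the constants `k`, `k_furnace`, and `hysteresis` are illustrative assumptions rather than the fitted values.

```python
def thermostat(indoor_temp, target_temp, furnace_on, hysteresis=0.5):
    """Return True if the furnace should run during the next step.

    Turns on once the house falls `hysteresis` degrees below the target
    and keeps running until the target is reached.
    """
    if indoor_temp <= target_temp - hysteresis:
        return True
    if indoor_temp >= target_temp:
        return False
    return furnace_on  # inside the dead band: keep the current state


def simulate_month(start_temp, outdoor_temps, target_temp,
                   k=-0.003, k_furnace=0.3, step_seconds=900):
    """Roll the thermostat and a two-term heat model forward in time.

    Returns the indoor temperature trace and the total furnace-on seconds.
    """
    temps, furnace_on, on_seconds = [start_temp], False, 0
    for t_out in outdoor_temps:
        furnace_on = thermostat(temps[-1], target_temp, furnace_on)
        on_seconds += step_seconds if furnace_on else 0
        # Per-step temperature change; constants are per 15-minute step.
        d_temp = k * (temps[-1] - t_out) + k_furnace * furnace_on
        temps.append(temps[-1] + d_temp)
    return temps, on_seconds


# One day (96 steps) at a constant 30 F outside, for two target temperatures.
temps_68, on_68 = simulate_month(68.0, [30.0] * 96, 68.0)
temps_72, on_72 = simulate_month(68.0, [30.0] * 96, 72.0)
```

The dead band is what keeps the simulated furnace from flipping on and off at every step once the house is near the target.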

Overall, $R^2 = 0.75$ for the simulated indoor temperature. While this is a good fit, the quantity we are actually trying to estimate is the time the furnace is on, as this is what drives the monthly utility bill. To be clear, what the model is doing is:

Using only the initial indoor temperature on Nov 1, outdoor temperature, and humidity level throughout the month, estimate how long the furnace would be on for each day of the month for any given target indoor temperature.

So how well does it work? Let’s start by simulating how long the furnace would be expected to be on each day given the target-heat settings actually used for November.

While there is some variance on individual days (overall $R^2 = 0.70$), the total for the month is very close to the true value. It is actually within 0.2% of the true value!
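To see how daily misses can coexist with an accurate monthly total, here is a small sketch of the two metrics. The daily furnace hours are made-up numbers chosen for illustration, and the $R^2$ definition used is the standard coefficient of determination, which may differ from how the figure above was computed.

```python
def r_squared(actual, predicted):
    """Coefficient of determination of predictions against actual values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def total_percent_error(actual, predicted):
    """Percent difference between the summed prediction and summed actual."""
    return 100 * (sum(predicted) - sum(actual)) / sum(actual)


# Made-up daily furnace hours, for illustration only.
actual_hours = [5.0, 6.0, 4.0, 7.0]
simulated_hours = [5.5, 5.5, 4.5, 6.5]

daily_fit = r_squared(actual_hours, simulated_hours)
monthly_error = total_percent_error(actual_hours, simulated_hours)
```

In this toy example every day is off by half an hour, yet the overestimates and underestimates cancel in the total.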

Great! This means we can now look at a counter-factual case.

Hypothetically Speaking

I am frequently adjusting the thermostat throughout the day (especially since I gained the ability to control it remotely through my phone). That said, I tend to average 68 during the day and 66 at night. What I am interested in are a few hypothetical scenarios. For the month of November, I decided to consider the 3 plans shown below, alongside the actual settings.

| Scenario | Day, 7AM - 9PM (°F) | Night (°F) |
| --- | --- | --- |
| Actual | roughly 68 | roughly 66 |
| Warm | 72 | 68 |
| Moderate | 70 | 65 |
| Cool | 68 | 63 |
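These schedules can be expressed as a small lookup that returns the target temperature for any hour of the day. This is a sketch: the names `SCENARIOS` and `target_temp` are hypothetical, and treating 9PM itself as the start of the night is an assumption.

```python
# Day and night targets in degrees Fahrenheit, taken from the table above.
SCENARIOS = {
    "warm": {"day": 72, "night": 68},
    "moderate": {"day": 70, "night": 65},
    "cool": {"day": 68, "night": 63},
}


def target_temp(hour, scenario):
    """Target temperature (F) for an hour of the day (0-23)."""
    setting = SCENARIOS[scenario]
    # Day runs from 7AM up to, but not including, 9PM.
    return setting["day"] if 7 <= hour < 21 else setting["night"]
```

For example, `target_temp(12, "warm")` gives the daytime setting for the warm plan, and `target_temp(23, "cool")` gives the night setting for the cool plan.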

The results of these simulated heating schedules are below:

Of course, the results are imperfect, but they suggest that keeping the house 4 degrees warmer during the day would likely have led to an approximately 11% increase in heating time. Not a small amount, but also not terribly large either!

Thanks for reading,

Stephen