Calculating Intraclass Correlations

Python code to calculate ICC values

Background

A while go I had a project where I had to calculate intraclass correlation coefficents. As it turns out, theere isn’t much python code to do so! The main library you will find online implements it, but it is quite slow. Here I will use it the reference implementation.

The context where I use it is to calculate the correlation between multiple RDMs. The following python code demonstrates

While this code is not for motivating the use of RSA, we can tackle multiple birds with one stone!

import time
import numpy as np
import pandas as pd
import pingouin as pg

# Make data
numSubjs  = 40
numObs    = 10000   
true_signal  = np.random.rand(numObs)
observations = np.array([true_signal + np.random.randn(numObs) for _ in range(numSubjs)])

# Calculate ICC using Pingoin
df         = pd.DataFrame(observations)
df['subj'] = df.index
tallDF     = pd.melt(df,id_vars='subj')
start = time.time() 
res = pg.intraclass_corr(data=tallDF,
                         targets = 'variable',
                         raters  = 'subj',
                         ratings = 'value')
print("%.2f seconds" % (time.time() - start))

Execution takes 96.60 seconds!!

This amount of time makes it prohibitive to use this in a searchlight procedure! Let’s write our own function. My main reference for writing the function is to take it directly from here.

Just writing out the code with no optimizations gives the following code

def icc3(sample):
    """
    Each row is a target, and each column is a rater/subject
    """
    x  = sample  
    n  = x.shape[0]
    k  = x.shape[1]
    
    S  = np.sum(x,axis=1) / k # Mean across rows
    M  = np.sum(x,axis=0) / n # Mean down columns
    xb = np.mean(x)
    
    # Start big loop
    SST=0; SSBS=0; SSBM=0; SSWS=0; SSWM=0
    for i in range(n):
        for j in range(k):
            SST  += (x[i,j]-xb)**2
            SSBS += (S[i] - xb)**2
            SSBM += (M[j] - xb)**2
            SSWS += (x[i,j] - S[i])**2
            SSWM += (x[i,j] - M[j])**2
    SSE  = SST - SSBS - SSBM
    MSE  = SSE / ((n-1)*(k-1))
    MSBS = SSBS / (n-1)
    # This is ICC(c,1)
    p = (MSBS - MSE) / (MSBS + (k-1)*MSE)  
    
    return p

start = time.time() 
result2 = icc3(observations.T)
print("%.2f seconds" % (time.time() - start))

print(res[res.Type=='ICC3'].ICC.item(), '   |    ',result2)
0.68 seconds
0.07847971556491169    |     0.07847971556491638

One benefit (and indeed intention) of writing the funtion without only basic numpy calls without any arguments is so that it can be fed into numba!

Simply adding the @njit decorator we get the following improvement!

Original:

%timeit icc3(observations.T)
589 ms ± 9.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With numba:

%timeit icc3(observations.T)
1.37 ms ± 5.84 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

That’s a pretty good improvement! We went from 96 seconds, to 1.3 mili-seconds

Further modification

In some environments, numba might not be available. We can remove the loops and resort to just pure numpy functions.

def icc3(sample):
    """
    Each row is a target, and each column is a rater/subject
    """
    x  = sample  
    n  = x.shape[0]
    k  = x.shape[1]
    
    S  = np.sum(x,axis=1) / k # Mean across rows
    M  = np.sum(x,axis=0) / n # Mean down columns
    xb = np.mean(x)
    
    # Loop-free approach
    SST   = np.sum(x**2)
    MSE  = x.var(1,ddof=1).mean() 
    MSBS = np.sum(k*((S-xb)**2)) / (n-1)

    # This is ICC(c,1)
    p = (MSBS - MSE) / (MSBS + (k-1)*MSE) 

    return p

The timing for this function

%timeit icc3(observations.T)
1.69 ms ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

We find it is almost the same speed as the numba!