Python code to calculate ICC values
A while go I had a project where I had to calculate intraclass correlation coefficents. As it turns out, theere isn’t much python code to do so! The main library you will find online implements it, but it is quite slow. Here I will use it the reference implementation.
The context where I use it is to calculate the correlation between multiple RDMs. The following python code demonstrates
While this code is not for motivating the use of RSA, we can tackle multiple birds with one stone!
import time
import numpy as np
import pandas as pd
import pingouin as pg
# Make data
numSubjs = 40
numObs = 10000
true_signal = np.random.rand(numObs)
observations = np.array([true_signal + np.random.randn(numObs) for _ in range(numSubjs)])
# Calculate ICC using Pingoin
df = pd.DataFrame(observations)
df['subj'] = df.index
tallDF = pd.melt(df,id_vars='subj')
start = time.time()
res = pg.intraclass_corr(data=tallDF,
targets = 'variable',
raters = 'subj',
ratings = 'value')
print("%.2f seconds" % (time.time() - start))
Execution takes 96.60 seconds!!
This amount of time makes it prohibitive to use this in a searchlight procedure! Let’s write our own function. My main reference for writing the function is to take it directly from here.
Just writing out the code with no optimizations gives the following code
def icc3(sample):
"""
Each row is a target, and each column is a rater/subject
"""
x = sample
n = x.shape[0]
k = x.shape[1]
S = np.sum(x,axis=1) / k # Mean across rows
M = np.sum(x,axis=0) / n # Mean down columns
xb = np.mean(x)
# Start big loop
SST=0; SSBS=0; SSBM=0; SSWS=0; SSWM=0
for i in range(n):
for j in range(k):
SST += (x[i,j]-xb)**2
SSBS += (S[i] - xb)**2
SSBM += (M[j] - xb)**2
SSWS += (x[i,j] - S[i])**2
SSWM += (x[i,j] - M[j])**2
SSE = SST - SSBS - SSBM
MSE = SSE / ((n-1)*(k-1))
MSBS = SSBS / (n-1)
# This is ICC(c,1)
p = (MSBS - MSE) / (MSBS + (k-1)*MSE)
return p
start = time.time()
result2 = icc3(observations.T)
print("%.2f seconds" % (time.time() - start))
print(res[res.Type=='ICC3'].ICC.item(), ' | ',result2)
0.68 seconds
0.07847971556491169 | 0.07847971556491638
One benefit (and indeed intention) of writing the funtion without only basic numpy
calls without any arguments is so that it can be fed into numba
!
Simply adding the @njit
decorator we get the following improvement!
Original:
%timeit icc3(observations.T)
589 ms ± 9.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With numba:
%timeit icc3(observations.T)
1.37 ms ± 5.84 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
That’s a pretty good improvement! We went from 96 seconds, to 1.3 mili-seconds
In some environments, numba
might not be available. We can remove the loops and resort to just pure numpy functions.
def icc3(sample):
"""
Each row is a target, and each column is a rater/subject
"""
x = sample
n = x.shape[0]
k = x.shape[1]
S = np.sum(x,axis=1) / k # Mean across rows
M = np.sum(x,axis=0) / n # Mean down columns
xb = np.mean(x)
# Loop-free approach
SST = np.sum(x**2)
MSE = x.var(1,ddof=1).mean()
MSBS = np.sum(k*((S-xb)**2)) / (n-1)
# This is ICC(c,1)
p = (MSBS - MSE) / (MSBS + (k-1)*MSE)
return p
The timing for this function
%timeit icc3(observations.T)
1.69 ms ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
We find it is almost the same speed as the numba!