I'm fitting compartmental models to biological data and encountering a counterintuitive result: a model that dramatically improves fit (18% NRMSE reduction) has worse AICc than a model with minimal improvement (0.63% NRMSE reduction), despite both having the same number of parameters. I believe this is related to different variance scales across compartments, but I'm seeking clarification on the correct statistical approach. I've read Burnham & Anderson (2002), but I couldn't find any help specific to my case.
Background
I have three datasets from the same system:
Dataset 1 (let's call it A): Single compartment (C1), n=19 (19 experimental data points)
Dataset 2 (B): Single compartment (C1), n=19
Dataset 3 (C): Two compartments (C1 + P1), n=38 (19 + 19)
I'm using the standard small-sample correction:
AICc = -2*log(L) + 2k + 2k(k+1)/(n - k - 1)
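For reference, a minimal helper implementing this formula (the function name `aicc` is mine, not from any library):

```python
def aicc(logL: float, k: int, n: int) -> float:
    """Small-sample corrected AIC: AICc = -2*logL + 2k + 2k(k+1)/(n - k - 1)."""
    return -2.0 * logL + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)
```

Note the correction term grows as k approaches n - 1, which is why it matters here with n = 19.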
The scale ratio between C1 and P1 is approximately 130:1. I want to compare the same model across all three datasets to determine which dataset it fits best, so to make comparisons meaningful I calculate AICc per observation (AICc/n) as a normalized metric. I'm comparing two models, each with one free parameter (k=1):
| Model | NRMSE reduction | AICc ranking | AICc (homoscedastic) |
|---|---|---|---|
| Model A | -0.63% | Lower AICc | 411.90 |
| Model B | -18.06% | Higher AICc | 412.87 |
Model B fits much better but has worse AICc. Why?
I initially used a homoscedastic (pooled variance) approach:
```python
import numpy as np

all_residuals = np.concatenate([compartment1_residuals, compartment2_residuals])
n_total = 38

# Single pooled variance (MLE)
rss_total = np.sum(all_residuals**2)
sigma2_pooled = rss_total / n_total

# Gaussian log-likelihood with the variance at its MLE
logL = -n_total/2 * (np.log(2*np.pi*sigma2_pooled) + 1)

# AICc
k = 1 + 1  # model parameter + variance
AICc = 2*k - 2*logL + 2*k*(k+1)/(n_total - k - 1)
```

Looking at the actual RSS contributions:
Model A:
- C1 RSS: 76,750 (99.98% of total)
- P1 RSS: 14 (0.02% of total)
- Total RSS: 76,764

Model B:
- C1 RSS: 79,153 (+2,403 worse; 99.98% of total)
- P1 RSS: 10 (-4 better; 0.01% of total)
- Total RSS: 79,163 (net: +2,399 worse)

So I suspected that the P1 improvement is statistically invisible under pooling, because P1 contributes only 0.02% of the total RSS. Since the compartments have vastly different scales, I concluded that each needs its own variance, and I switched to a heteroscedastic approach.
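To check this intuition, here is a self-contained sketch (assuming Gaussian errors with variances at their MLEs, as in the code above; the function names are mine) that reproduces the ranking flip directly from the quoted RSS values:

```python
import numpy as np

def pooled_logL(rss_big, rss_small, n):
    """Log-likelihood with one pooled variance across both compartments."""
    sigma2 = (rss_big + rss_small) / n
    return -n/2 * (np.log(2*np.pi*sigma2) + 1)

def separate_logL(rss_big, n_big, rss_small, n_small):
    """Sum of per-compartment log-likelihoods with separate variances."""
    logL_big = -n_big/2 * (np.log(2*np.pi*rss_big/n_big) + 1)
    logL_small = -n_small/2 * (np.log(2*np.pi*rss_small/n_small) + 1)
    return logL_big + logL_small

# Model A: rss_big=76750, rss_small=14; Model B: rss_big=79153, rss_small=10
# Pooled: Model A wins (the comparison is driven entirely by C1's RSS).
# Separate: Model B wins (the P1 improvement now carries weight).
```

With these RSS values the pooled log-likelihood ranks Model A higher, while the separate-variance log-likelihood ranks Model B higher, matching the two tables.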
```python
# Separate variances for each compartment (MLEs)
sigma2_csf = rss_csf / n_csf
sigma2_plasma = rss_plasma / n_plasma

# Separate Gaussian log-likelihoods
logL_csf = -n_csf/2 * (np.log(2*np.pi*sigma2_csf) + 1)
logL_plasma = -n_plasma/2 * (np.log(2*np.pi*sigma2_plasma) + 1)

# Sum log-likelihoods
logL_total = logL_csf + logL_plasma

# AICc (but what is k here?)
k = 1
AICc = 2*k - 2*logL_total + 2*k*(k+1)/(n_total - k - 1)
```

With the heteroscedastic approach, Model B now shows the lower AICc:
| Model | NRMSE reduction | AICc ranking | AICc (heteroscedastic) |
|---|---|---|---|
| Model A | -0.63% | Higher AICc | 255.12 |
| Model B | -18.06% | Lower AICc | 239.64 |
The heteroscedastic approach correctly ranks Model B as better (consistent with NRMSE). But now the cross-dataset comparison breaks!
I need to compare the same model across datasets to see where it fits best. Here are AICc/n values for Model A:
| Dataset | Compartments | Variance model | AICc | n | AICc/n | k |
|---|---|---|---|---|---|---|
| Dataset 1 | 1 compartment | Standard | 212.11 | 19 | 11.16 | 1 |
| Dataset 2 | 1 compartment | Standard | 209.21 | 19 | 11.01 | 1 |
| Dataset 3 | 2 compartments | Homoscedastic | 411.90 | 38 | 10.84 | 1 |
| Dataset 3 | 2 compartments | Heteroscedastic | 255.02 | 38 | 6.71 | 1 |
Problem: With heteroscedastic approach, Dataset 3's AICc/n drops to the 6-7 range, while Datasets 1 & 2 are in the 10-11 range. With homoscedastic approach, all three datasets give comparable AICc/n values (10-11), but this gives wrong model ranking for Dataset 3.
The standard reference (Burnham & Anderson 2002) doesn't explicitly address this case, so I'm seeking guidance from the statistical community on best practices. Am I approaching this the right way? And if the heteroscedastic approach is correct, how should I deal with its AICc/n values falling in a lower range than the single-compartment datasets?
