I'm fitting compartmental models to biological data and encountering a counterintuitive result: a model that dramatically improves fit (18% NRMSE reduction) has worse AICc than a model with minimal improvement (0.63% NRMSE reduction), despite both having the same number of parameters. I believe this is related to different variance scales across compartments, but I'm seeking clarification on the correct statistical approach. I've read Burnham & Anderson (2002), but I couldn't find any help specific to my case.
Background
I have three datasets from the same system:
Dataset 1 (let's call it A): Single compartment (C1), n=19 (19 experimental data points)
Dataset 2 (B): Single compartment (C1), n=19
Dataset 3 (C): Two compartments (C1 + P1), n=38 (19 + 19)
I'm using the standard small-sample correction:
AICc = -2*log(L) + 2k + 2k(k+1)/(n - k - 1)
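For reference, a minimal helper implementing this formula (the function name `aicc` is mine, not from any library):

```python
def aicc(logL: float, k: int, n: int) -> float:
    """Small-sample corrected AIC: AICc = -2*logL + 2k + 2k(k+1)/(n - k - 1)."""
    return -2.0 * logL + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)
```

Note the correction term grows as k approaches n - 1, which is why it matters here with n = 19.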
The scale ratio between C1 and P1 is approximately 130:1. I want to compare the same model across all three datasets to determine which dataset it fits best, so to make comparisons meaningful I calculate AICc per observation (AICc/n) as a normalized metric. I'm comparing two models, each with one free parameter (k=1):
| Model | NRMSE reduction | AICc ranking | AICc (homoscedastic) |
|---|---|---|---|
| Model A | -0.63% | Lower AICc | 411.90 |
| Model B | -18.06% | Higher AICc | 412.87 |
Model B fits much better but has worse AICc. Why?
I initially used a homoscedastic (pooled variance) approach:
```python
import numpy as np

all_residuals = np.concatenate([compartment1_residuals, compartment2_residuals])
n_total = 38

# Single pooled variance (MLE)
rss_total = np.sum(all_residuals**2)
sigma2_pooled = rss_total / n_total

# Gaussian log-likelihood with the variance at its MLE
logL = -n_total/2 * (np.log(2*np.pi*sigma2_pooled) + 1)

# AICc
k = 1 + 1  # model parameter + variance
AICc = 2*k - 2*logL + 2*k*(k+1)/(n_total - k - 1)
```

Looking at the actual RSS contributions:
Model A:
- C1 RSS: 76,750 (99.98% of total)
- P1 RSS: 14 (0.02% of total)
- Total RSS: 76,764

Model B:
- C1 RSS: 79,153 (+2,403 worse; 99.98% of total)
- P1 RSS: 10 (-4 better; 0.01% of total)
- Total RSS: 79,163 (net: +2,399 worse)

So I suspected that the P1 improvement is statistically invisible under pooling, because P1 contributes only 0.02% of the total RSS. Since the compartments have vastly different scales, I concluded that each needs its own variance, and I switched to a heteroscedastic approach.
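To check this intuition, here is a self-contained sketch (assuming Gaussian errors with variances at their MLEs, as in the code above; the function names are mine) that reproduces the ranking flip directly from the quoted RSS values:

```python
import numpy as np

def pooled_logL(rss_big, rss_small, n):
    """Log-likelihood with one pooled variance across both compartments."""
    sigma2 = (rss_big + rss_small) / n
    return -n/2 * (np.log(2*np.pi*sigma2) + 1)

def separate_logL(rss_big, n_big, rss_small, n_small):
    """Sum of per-compartment log-likelihoods with separate variances."""
    logL_big = -n_big/2 * (np.log(2*np.pi*rss_big/n_big) + 1)
    logL_small = -n_small/2 * (np.log(2*np.pi*rss_small/n_small) + 1)
    return logL_big + logL_small

# Model A: rss_big=76750, rss_small=14; Model B: rss_big=79153, rss_small=10
# Pooled: Model A wins (the comparison is driven entirely by C1's RSS).
# Separate: Model B wins (the P1 improvement now carries weight).
```

With these RSS values the pooled log-likelihood ranks Model A higher, while the separate-variance log-likelihood ranks Model B higher, matching the two tables.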
```python
# Separate variances for each compartment (MLEs)
sigma2_csf = rss_csf / n_csf
sigma2_plasma = rss_plasma / n_plasma

# Separate Gaussian log-likelihoods
logL_csf = -n_csf/2 * (np.log(2*np.pi*sigma2_csf) + 1)
logL_plasma = -n_plasma/2 * (np.log(2*np.pi*sigma2_plasma) + 1)

# Sum log-likelihoods
logL_total = logL_csf + logL_plasma

# AICc (but what is k here?)
k = 1
AICc = 2*k - 2*logL_total + 2*k*(k+1)/(n_total - k - 1)
```

With the heteroscedastic approach, Model B now shows the lower AICc:
| Model | NRMSE reduction | AICc ranking | AICc (heteroscedastic) |
|---|---|---|---|
| Model A | -0.63% | Higher AICc | 255.12 |
| Model B | -18.06% | Lower AICc | 239.64 |
The heteroscedastic approach correctly ranks Model B as better (consistent with NRMSE). But now the cross-dataset comparison breaks!
I need to compare the same model across datasets to see where it fits best. Here are AICc/n values for Model A:
| Dataset | Compartments | Variance model | AICc | n | AICc/n | k |
|---|---|---|---|---|---|---|
| Dataset 1 | 1 compartment | Standard | 212.11 | 19 | 11.16 | 1 |
| Dataset 2 | 1 compartment | Standard | 209.21 | 19 | 11.01 | 1 |
| Dataset 3 | 2 compartments | Homoscedastic | 411.90 | 38 | 10.84 | 1 |
| Dataset 3 | 2 compartments | Heteroscedastic | 255.02 | 38 | 6.71 | 1 |
Problem: With heteroscedastic approach, Dataset 3's AICc/n drops to the 6-7 range, while Datasets 1 & 2 are in the 10-11 range. With homoscedastic approach, all three datasets give comparable AICc/n values (10-11), but this gives wrong model ranking for Dataset 3.
The standard reference (Burnham & Anderson 2002) doesn't explicitly address this case, so I'm seeking guidance from the statistical community on best practices. Am I approaching this the right way? And if the heteroscedastic approach is correct, how should I deal with its AICc/n values falling in a lower range than the single-compartment datasets?
