Applying Kolmogorov-Smirnov (KS) test to evaluate multivariate synthetic tabular data (TVAE/TabDDPM vs. Empirical baseline)

I am benchmarking several tabular Generative AI models (including TVAE, TabDDPM, and WGAN-GP) to synthesize sensor data. I need to rigorously evaluate the statistical similarity between my generated synthetic datasets and the empirical baseline dataset.

I want to use the two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp). However, my dataset is multivariate, with features such as Efficiency, Load, and Object Type, whereas the standard KS test is defined only for one-dimensional distributions.

My question: What is the standard programmatic approach in Python to compute an aggregated KS score for a multidimensional tabular dataset?

Should I iterate through each continuous column independently, run ks_2samp, and average the p-values/statistics?
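For concreteness, the per-column idea could be sketched as below. This is a minimal sketch, not an established scoring method: the helper name per_column_ks, the toy data, and the choice to summarize with the mean and maximum KS statistic are all assumptions made for illustration. It aggregates the statistics rather than the p-values, since an average of p-values is not itself a valid p-value, and it only compares marginal distributions, so dependencies between columns go unchecked.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def per_column_ks(real: pd.DataFrame, synth: pd.DataFrame, columns) -> pd.DataFrame:
    """Run a two-sample KS test on each listed continuous column (hypothetical helper)."""
    rows = []
    for col in columns:
        # Drop missing values so each test sees only observed data.
        result = ks_2samp(real[col].dropna(), synth[col].dropna())
        rows.append({"column": col, "ks_stat": result.statistic, "p_value": result.pvalue})
    return pd.DataFrame(rows).set_index("column")


# Toy data standing in for the empirical and synthetic tables (assumed, not from the question).
rng = np.random.default_rng(0)
real = pd.DataFrame({
    "Efficiency": rng.normal(0.8, 0.05, 500),
    "Load": rng.gamma(2.0, 10.0, 500),
})
synth = pd.DataFrame({
    "Efficiency": rng.normal(0.8, 0.05, 500),
    "Load": rng.gamma(2.0, 12.0, 500),
})

report = per_column_ks(real, synth, ["Efficiency", "Load"])
mean_ks = report["ks_stat"].mean()   # average marginal discrepancy
worst_ks = report["ks_stat"].max()   # worst-matching column
print(report)
print("mean KS:", mean_ks, "max KS:", worst_ks)
```

A categorical column such as Object Type would not be passed to this helper, because the KS test assumes a continuous variable.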

Or is there a specific library/method better suited for multivariate empirical cumulative distribution functions (eCDFs) in this context?
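To make the multivariate eCDF idea concrete, here is a naive sketch of a KS-like distance between two joint eCDFs. It is an assumption-laden illustration rather than a standard test: the function names are hypothetical, the supremum is only evaluated at the pooled sample points, only the "less than or equal in every coordinate" ordering is used, and no p-value is produced because the statistic is not distribution-free in more than one dimension (a permutation test would be needed for that).

```python
import numpy as np


def ecdf_at(sample: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Joint eCDF of `sample` evaluated at each row of `points`."""
    # below[i, j] is True when sample row j is <= points row i in every coordinate.
    below = np.all(sample[None, :, :] <= points[:, None, :], axis=2)
    return below.mean(axis=1)


def joint_ecdf_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Largest absolute gap between the two joint eCDFs over the pooled sample points."""
    pooled = np.vstack([x, y])
    return float(np.max(np.abs(ecdf_at(x, pooled) - ecdf_at(y, pooled))))


# Toy data (assumed): same marginals, but the second sample has correlated columns.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
y = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200)

print("joint eCDF distance:", joint_ecdf_distance(x, y))
```

The broadcasted comparison uses memory proportional to the number of evaluation points times the sample size times the number of columns, so large tables would need chunking or subsampling.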

Thank you for your insights!
