Magic Numbers for LZMA File Type-Based Compression Estimation


I am writing a custom file transfer app in Python (yes, I know, it already exists a million times over, but it's a learning process for me and helps me teach myself to code). As part of the exchange that happens before a file is shared, I want the client to estimate the compressed size of a given file or directory before packing it into a .tar archive compressed with LZMA. The problem is that I don't really know how to estimate that size. I don't want to run the compressor once without writing any output just to get the size, since that would effectively compress the file or folder twice. Besides, the whole point of the estimate is to check whether the server (and client) have enough space to store the file before starting the time-consuming, resource-intensive compression. The only other option I'm aware of is a set of predefined magic numbers used to guesstimate the size, which is very fast and takes next to no resources.
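For context, here is a minimal sketch of the space check that the estimate would feed into. The function name `has_room_for` and the `safety_margin` padding factor are hypothetical and only there for illustration; `shutil.disk_usage` is the standard-library call that reports free space for the filesystem containing a path.

```python
import shutil
from pathlib import Path


def has_room_for(estimated_bytes: int, target_dir: Path, safety_margin: float = 1.25) -> bool:
    """Return True if the filesystem holding target_dir can fit the estimate.

    safety_margin is a hypothetical padding factor: the estimate is only a
    guess, so the check asks for somewhat more room than the guess itself.
    """
    free_bytes = shutil.disk_usage(target_dir).free
    return estimated_bytes * safety_margin <= free_bytes
```

The same check would run on both sides: the client before compressing, and the server before accepting the upload.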

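from pathlib import Path

def estimateCompressionSize(filePath: Path) -> int:
    # Directories: sum the estimates of the direct children only. The recursion
    # covers deeper levels, so rglob("*") here would count nested files twice.
    if filePath.is_dir():
        totalSize = 0
        for x in filePath.iterdir():
            totalSize += estimateCompressionSize(x)
        return totalSize
    # Path.suffix keeps the leading dot (".txt"), so strip it before comparing.
    ext = filePath.suffix.lower().lstrip(".")
    if ext in ("a", "b", "c"):
        compEstimate = 1.0   # placeholder: types that barely compress
    elif ext in ("x", "y", "z"):
        compEstimate = 0.20  # placeholder: types that compress well
    else:
        compEstimate = 0.65  # placeholder: everything else
    return int(filePath.stat().st_size * compEstimate)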

So I already made an estimator that walks the files, checks each file's type, takes its current size, and multiplies it by a magic number for that type, as you can see above. I just don't know which magic numbers to use. The only source I could find online is tukaani.org, but it covers only a few file types, and I don't know enough to infer ratios for similar file types from the ones listed there. I've tried many Google searches, searched here, and even asked an AI (it produced nonsense that I couldn't verify or back up), but came up empty-handed.
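To make it easier to plug in real numbers once I have them, the same idea can be written as a lookup table. This is only a sketch: the three ratios are the placeholder values from the function above, not measured figures, and the extension grouping and the names `RATIO_BY_EXTENSION`, `estimate_file`, and `estimate_tree` are made up for illustration.

```python
from pathlib import Path

# Placeholder ratios (estimated size / original size), reusing the three
# values from the function above. They are not measured figures.
ALREADY_COMPRESSED = 1.0
HIGHLY_COMPRESSIBLE = 0.20
DEFAULT_RATIO = 0.65

# Hypothetical grouping of extensions, for illustration only.
RATIO_BY_EXTENSION = {
    ".zip": ALREADY_COMPRESSED,
    ".jpg": ALREADY_COMPRESSED,
    ".mp4": ALREADY_COMPRESSED,
    ".txt": HIGHLY_COMPRESSIBLE,
    ".csv": HIGHLY_COMPRESSIBLE,
    ".log": HIGHLY_COMPRESSIBLE,
}


def estimate_file(path: Path) -> int:
    """Estimate one file: current size times the ratio for its extension."""
    # Path.suffix includes the leading dot, so the keys above include it too.
    ratio = RATIO_BY_EXTENSION.get(path.suffix.lower(), DEFAULT_RATIO)
    return int(path.stat().st_size * ratio)


def estimate_tree(root: Path) -> int:
    """Estimate a single file, or every regular file under a directory."""
    if root.is_file():
        return estimate_file(root)
    # rglob("*") already visits every descendant, so no recursion is needed.
    return sum(estimate_file(p) for p in root.rglob("*") if p.is_file())
```

With this layout, a new file type is one more dictionary entry rather than another branch in the if/elif chain.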

So I was wondering if any smart individual here would have any answers for me?
