Optimizing large file processing with 250k+ rows and 350+ validation rules [closed]


I have a WPF application that loads large Excel/CSV files (~250,000 rows) and then runs ~357 validation methods on the data.

Current flow:

```csharp
string filePath = openFileDialog.FileName;
string fileExtension = Path.GetExtension(filePath);

DataTable dataTable = ReadExcel(filePath, fileExtension);
var sourceData = ReadExcelDirect(filePath);

processedData = (from item in sourceData
                 select new ProcessedRow
                 {
                     // many mapped properties
                 }).ToList();
```

Then I run many validation methods like:

```csharp
ValidateRuleA(processedData);
ValidateRuleB(processedData);
ValidateRuleC(processedData);
ValidateRuleD(processedData);
ValidateRuleE(processedData);
```
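Since each rule only reads the processed data, one option would be to run them concurrently. A minimal sketch (assuming the rules have no shared mutable state and each collects its own results):

```csharp
// Sketch: run independent, read-only validation rules in parallel.
// Assumes each ValidateRuleX only reads its input and does not
// write to shared state without synchronization.
var rules = new List<Action<IEnumerable<ProcessedRow>>>
{
    ValidateRuleA, ValidateRuleB, ValidateRuleC, ValidateRuleD, ValidateRuleE
};

Parallel.ForEach(rules, rule => rule(processedData));
```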

Each rule contains multiple regex checks against a large concatenated string field:

```csharp
private void ValidateRuleA(IEnumerable<ProcessedRow> inputData)
{
    // Note: queryResult is currently built but not returned or stored here.
    var queryResult = (from row in inputData
                       where Regex.IsMatch(row.ALL_CODES, @"\bA12[01346]\b")
                          && (Regex.IsMatch(row.ALL_CODES, @"\bB45[0-9]\b")
                           || Regex.IsMatch(row.ALL_CODES, @"\bC78[0-9]\b"))
                       select new ValidationResult
                       {
                           ID = row.ID,
                           GROUP_ID = row.GROUP_ID,
                           ERROR = "Validation Error",
                           ERROR_REF = "RULE_A"
                       }).ToList();
}
```
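For reference, one common first optimization is hoisting the patterns into precompiled, reusable `Regex` instances so the pattern is parsed once rather than on every `Regex.IsMatch` call. A sketch of the same rule in that style (the field names here are illustrative):

```csharp
// Sketch: precompiled Regex instances, created once per rule.
// RegexOptions.Compiled trades startup cost for faster matching,
// which pays off when a pattern runs against 250k+ rows.
private static readonly Regex RuleA_Main =
    new Regex(@"\bA12[01346]\b", RegexOptions.Compiled);
private static readonly Regex RuleA_AltB =
    new Regex(@"\bB45[0-9]\b", RegexOptions.Compiled);
private static readonly Regex RuleA_AltC =
    new Regex(@"\bC78[0-9]\b", RegexOptions.Compiled);

private void ValidateRuleA(IEnumerable<ProcessedRow> inputData)
{
    var queryResult = (from row in inputData
                       where RuleA_Main.IsMatch(row.ALL_CODES)
                          && (RuleA_AltB.IsMatch(row.ALL_CODES)
                           || RuleA_AltC.IsMatch(row.ALL_CODES))
                       select new ValidationResult
                       {
                           ID = row.ID,
                           GROUP_ID = row.GROUP_ID,
                           ERROR = "Validation Error",
                           ERROR_REF = "RULE_A"
                       }).ToList();
}
```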

What is the best architecture for handling 357 validation rules efficiently?
