How to group connected materials and aggregate stocks in PySpark/SQL on Databricks


How can I solve the following problem? I've included some sample raw data and the expected output below. The solution can use SQL or Python; I'm currently working in a Databricks notebook.

As you can see in the data, not every pair of materials is directly linked; some are connected only through other materials. The group ID can be A1, B3, or any other material, as long as that material belongs to the group. However, once a group has been assigned an ID such as A1, it must stay A1 even if another material is later added to that group. To retain the IDs between runs, my plan is to save the group mapping to ADLS. I also want to aggregate the STOCKS column within each group.
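Grouping indirectly linked materials is a connected-components problem. Below is a minimal plain-Python sketch of that idea. It assumes the raw data can be reduced to pairs of linked materials plus one STOCKS value per material. The screenshot's column layout isn't visible here, so the input shapes and the names `assign_groups`, `aggregate_stocks`, and `previous_groups` are hypothetical. The sketch uses union-find to build the groups. If any member already has a group ID from an earlier run, that ID is reused. It then sums the stocks per group.

```python
from collections import defaultdict


def find(parent, x):
    # Follow parent pointers to the root, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def assign_groups(links, previous_groups=None):
    """Map every material to a group ID via connected components (union-find).

    links: iterable of (material, linked_material) pairs; pass (m, m) for a
        material that has no link (hypothetical input shape).
    previous_groups: dict material -> group ID saved by an earlier run,
        e.g. read back from ADLS (hypothetical).
    """
    previous_groups = previous_groups or {}
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    components = defaultdict(list)
    for material in parent:
        components[find(parent, material)].append(material)

    groups = {}
    for members in components.values():
        # Retain an existing group ID if any member already had one; if several
        # old groups merged, keep the smallest (an assumption). New groups are
        # named after their smallest material.
        existing = sorted({previous_groups[m] for m in members if m in previous_groups})
        group_id = existing[0] if existing else min(members)
        for m in members:
            groups[m] = group_id
    return groups


def aggregate_stocks(stocks, groups):
    """Sum STOCKS per group; stocks is a dict material -> quantity."""
    totals = defaultdict(int)
    for material, qty in stocks.items():
        # Materials missing from `groups` are keyed by their own name.
        totals[groups.get(material, material)] += qty
    return dict(totals)


# Usage: A0 joins the existing A1 group, which keeps its ID.
links = [("A1", "B3"), ("B3", "A0"), ("D4", "E5")]
previous = {"A1": "A1", "B3": "A1"}
groups = assign_groups(links, previous)
print(groups)
# {'A1': 'A1', 'B3': 'A1', 'A0': 'A1', 'D4': 'D4', 'E5': 'D4'}
print(aggregate_stocks({"A1": 10, "B3": 5, "A0": 2, "D4": 7, "E5": 3}, groups))
# {'A1': 17, 'D4': 10}
```

Without `previous`, the first group would be named A0, its smallest member. The retention step is what keeps it as A1. If the data fits on the driver, the pairs could be gathered with `DataFrame.collect()`. The result could then be turned back into a Spark DataFrame with `spark.createDataFrame` and written to ADLS. For larger graphs, the GraphFrames library's `connectedComponents()` is a distributed alternative. Its component IDs would still need the same retention step.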

Sample:

https://i.sstatic.net/b2Q05lUr.png

Any help would be greatly appreciated. Thanks!
