Group DataFrame unique line counts by first entry


I'm doing some data analysis on a Ranked Choice Voting election, and I'm running into trouble with hierarchical sorting.

I have a jagged DataFrame like this:

    Ranked 1 | Ranked 2 | Ranked 3 | [...] | Ranked N
    A        | B
    A
    B
    A        | B        | C
    abstain
    A        | B        | D
    A        | B        | D
    B        | C
    A        | C
    C        | D

What I'm looking for is subcounts like this:

    Ranked 1 | Ranked 2 | Ranked 3 | [...] | Ranked N | Count | Level Count
    A        |          |          |       |          |       | 6
             | B        |          |       |          |       | 4
             |          | D        | -     | -        | 2     | -
             |          | -        | -     | -        | 1     | -
             |          | C        | -     | -        | 1     | -
             | -        | -        | -     | -        | 1     | -
             | C        | -        | -     | -        | 1     | -
    B        |          |          |       |          |       | 2
             | -        | -        | -     | -        | 1     | -
             | C        | -        | -     | -        | 1     | -
    C        | D        | -        | -     | -        | 1     | -
    abstain  | -        | -        | -     | -        | 1     | -

Here the dashes represent NA values, and the extra spacing is only there to make the groupings clear.

What I want to do is a bit complex.

More or less the idea is "sort the table by level count, hierarchically, of each group". The important bit is eliminating redundant information by NOT creating hierarchies with 1-size subgroups. But otherwise I'm not tied to this EXACT presentation, just as long as it's readable. For instance, if the "leaf nodes" of the tree have a level count that's fine.

By which I mean that, for C above, a fully expanded hierarchy would look like this:

    Ranked 1 | Ranked 2 | Ranked 3 | [...] | Ranked N | Count | Level Count
    C        |          |          |       |          |       | 1
             | D        |          |       |          |       | 1
             |          | -        |       |          |       | 1
             |          |          | -     |          |       | 1
             |          |          |       | -        | 1     | -

Which would be way too much visual noise to parse.

So far, this is the best I've gotten:

    ranks = ["Ranked {}".format(i+1) for i in range(num_candidates)]
    df = pd.DataFrame(ballots, columns=ranks).fillna("-")
    # Turns out to be effectively equivalent to value_counts
    return df.groupby(ranks).size().sort_values(ascending=False).to_frame()

[Image: table with broken hierarchy]

This is extremely close, but a bit convoluted. Each count is the count of that specific ballot, even though the layout makes it LOOK like a hierarchical count. In addition, sorting by size means not everything is grouped properly: you can see A appear as Rank 1 twice, instead of all the A ballots sitting in one clump. Most of my attempts to fix that just end up sorting by key. It took me a while to figure out what this table is even saying.
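One possible way to keep each clump together while still ordering by size is to sort on the total of every prefix group rather than on the row count alone. The following is only a sketch under assumptions: `clustered_counts`, `sample_ballots`, and the `_level_N` helper columns are names made up for illustration, the sample ballots are transcribed from the table in the question, and short ballots are padded with "-" by hand instead of with `fillna`.

```python
import pandas as pd

# Sample ballots transcribed from the question's table (illustrative data).
sample_ballots = [
    ["A", "B"], ["A"], ["B"], ["A", "B", "C"], ["abstain"],
    ["A", "B", "D"], ["A", "B", "D"], ["B", "C"], ["A", "C"], ["C", "D"],
]


def clustered_counts(ballots, num_candidates):
    """Count identical ballots, ordered so that each prefix group stays together."""
    ranks = ["Ranked {}".format(i + 1) for i in range(num_candidates)]
    # Pad short ballots with "-" so every row has the same width.
    rows = [list(b) + ["-"] * (num_candidates - len(b)) for b in ballots]
    df = pd.DataFrame(rows, columns=ranks)

    # One row per distinct ballot, with how often it occurs.
    counts = df.groupby(ranks).size().rename("Count").reset_index()

    # For every prefix length, attach the total of the group sharing that prefix.
    sort_keys = []
    for depth in range(1, num_candidates + 1):
        helper = "_level_{}".format(depth)
        counts[helper] = counts.groupby(ranks[:depth])["Count"].transform("sum")
        # Sort by group total first, then by label so equal-sized groups do not interleave.
        sort_keys += [helper, ranks[depth - 1]]

    ordered = counts.sort_values(
        by=sort_keys, ascending=[False, True] * num_candidates
    )
    return ordered[ranks + ["Count"]].reset_index(drop=True)


result = clustered_counts(sample_ballots, 4)
print(result)
```

The alternating sort keys (group total descending, then label ascending) are what stop A from appearing as Rank 1 in two separate places. This still prints one row per distinct ballot, so it does not by itself produce the separate "Level Count" column.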

My guess is that I'll have to iterate and construct this DataFrame by hand, but I'm having trouble getting this sort of complex grouping to work at all, and the resources online mostly cover much simpler cases.
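For what it's worth, the "iterate by hand" route could be sketched as a small recursion over the raw ballots, outside pandas entirely. This is an illustration under assumptions, not a definitive answer: `hierarchy_lines` and `sample_ballots` are hypothetical names, the ballots are transcribed from the table in the question, and the rule used to avoid 1-size subgroups is "if every ballot in a group is identical, print it as a single leaf line instead of descending further".

```python
# Sample ballots transcribed from the question's table (illustrative data).
sample_ballots = [
    ["A", "B"], ["A"], ["B"], ["A", "B", "C"], ["abstain"],
    ["A", "B", "D"], ["A", "B", "D"], ["B", "C"], ["A", "C"], ["C", "D"],
]


def hierarchy_lines(ballots, depth=0):
    """Return indented text lines; `ballots` all share their first `depth` entries."""
    ballots = [tuple(b) for b in ballots]

    # Group by the entry at this depth; "-" marks a ballot that has already ended.
    groups = {}
    for b in ballots:
        key = b[depth] if depth < len(b) else "-"
        groups.setdefault(key, []).append(b)

    lines = []
    indent = "  " * depth
    # Largest groups first; ties are broken by label to keep the output stable.
    for key, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        if len(set(members)) == 1:
            # Only one distinct ballot left: collapse it into a single leaf line.
            rest = " | ".join(members[0][depth:]) or "-"
            lines.append("{}{}  (count {})".format(indent, rest, len(members)))
        else:
            # Several distinct ballots: print a header with the level count, then recurse.
            lines.append("{}{}  (level count {})".format(indent, key, len(members)))
            lines.extend(hierarchy_lines(members, depth + 1))
    return lines


print("\n".join(hierarchy_lines(sample_ballots)))
```

Because a header line is only emitted when a group still contains at least two different ballots, a lone ballot such as C | D stays on one line rather than expanding into a chain of 1-size levels.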
