Group DataFrame unique line counts by first entry

23 hours ago 2


I'm doing some data analysis on a Ranked Choice Voting election, and I'm bumping up against some hierarchical sorting.

I have a jagged dataframe like

Ranked 1 | Ranked 2 | Ranked 3 | [...] | Ranked N
A        | B
A
B
A        | B        | C
abstain
A        | B        | D
A        | B        | D
B        | C
A        | C
C        | D

What I'm looking for is subcounts like this:

Ranked 1 | Ranked 2 | Ranked 3 | [...] | Ranked N | Count | Level Count
A        |          |          |       |          |       | 6
         | B        |          |       |          |       | 4
         |          | D        | -     | -        | 2     | -
         |          | -        | -     | -        | 1     | -
         |          | C        | -     | -        | 1     | -
         | -        | -        | -     | -        | 1     | -
         | C        | -        | -     | -        | 1     | -
B        |          |          |       |          |       | 2
         | -        | -        | -     | -        | 1     | -
         | C        | -        | -     | -        | 1     | -
C        | D        | -        | -     | -        | 1     | -
abstain  | -        | -        | -     | -        | 1     | -

The dashes represent NA values. Both the dashes and the extra spacing are only there to make the groupings easier to see.

What I want to do is a bit complex.

More or less the idea is "sort the table by level count, hierarchically, of each group". The important bit is eliminating redundant information by NOT creating hierarchies with 1-size subgroups. But otherwise I'm not tied to this EXACT presentation, just as long as it's readable. For instance, if the "leaf nodes" of the tree have a level count that's fine.

By that I mean that, for C above, you could instead have:

Ranked 1 | Ranked 2 | Ranked 3 | [...] | Ranked N | Count | Level Count
C        |          |          |       |          |       | 1
         | D        |          |       |          |       | 1
         |          | -        |       |          |       | 1
         |          |          | -     |          |       | 1
         |          |          |       | -        | 1     | -

Which would be way too much visual noise to parse.

So far, this is the best I've gotten

    ranks = ["Ranked {}".format(i+1) for i in range(num_candidates)]
    df = pd.DataFrame(ballots, columns=ranks).fillna("-")
    # Turns out to be effectively equivalent to value_counts
    return df.groupby(ranks).size().sort_values(ascending=False).to_frame()

[Image: Table with broken hierarchy]

This is extremely close, but a bit convoluted. Each count is the count of that exact ballot, even though the layout makes it LOOK like a hierarchical count. In addition, sorting by size means not everything is grouped properly: A appears as Rank 1 twice instead of all the A rows sitting in one clump. Most of my attempts to fix that just end up sorting by key instead. It took me a while to figure out what this table was even saying.
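One way the clumping problem might be approached while staying in pandas is to sort by the size of each prefix group rather than by the leaf count alone. The following is only a sketch under assumptions: the `ballots` list is a retyping of the sample table from this question, and the `Count` and `_level_k` column names are made up for illustration.

```python
import pandas as pd

# Sample ballots retyped from the question's example (an assumption).
ballots = [
    ["A", "B"], ["A"], ["B"], ["A", "B", "C"], ["abstain"],
    ["A", "B", "D"], ["A", "B", "D"], ["B", "C"], ["A", "C"], ["C", "D"],
]

# Pad every ballot to the same width with "-" so the frame is rectangular.
width = max(len(b) for b in ballots)
ranks = [f"Ranked {i + 1}" for i in range(width)]
df = pd.DataFrame([b + ["-"] * (width - len(b)) for b in ballots], columns=ranks)

# One row per distinct ballot with its count (same idea as value_counts).
counts = df.groupby(ranks).size().rename("Count").reset_index()

# For each prefix length k, attach the total number of ballots sharing that
# prefix. "_level_k" is a hypothetical helper column name.
sort_cols = []
for k in range(1, width + 1):
    col = f"_level_{k}"
    counts[col] = counts.groupby(ranks[:k])["Count"].transform("sum")
    # Sort by prefix size first, then by the key itself so ties stay together.
    sort_cols += [col, ranks[k - 1]]

ordered = counts.sort_values(sort_cols, ascending=[False, True] * width)

# Using the rank columns as the index lets pandas hide repeated outer labels.
print(ordered.set_index(ranks)[["Count"]])
```

Because every row in a prefix group shares the same `_level_k` value, sorting on the size and then on the key keeps each group contiguous, so all the A rows should land in one clump with the larger subgroups first. It still shows only leaf counts, not the per-level totals.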

If I had to guess, I'm probably going to have to iterate and construct this DataFrame by hand, but I'm having trouble getting this sort of complex grouping to work at all, and the resources online mostly cover much simpler cases.
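For reference, the "construct it by hand" route could look roughly like the sketch below. It is plain Python rather than pandas, and the names `level_counts`, `_walk`, and `END` are hypothetical. It assumes a group should be collapsed into a single leaf row whenever it contains only one distinct ballot, which is my reading of the "no 1-size subgroups" rule above.

```python
from collections import Counter, defaultdict

END = "-"  # marker for "no further ranking", mirroring the dashes above


def level_counts(ballots):
    """Return rows of (depth, labels, count, level_count).

    Leaf rows carry `count`; grouping rows carry `level_count`.
    """
    return _walk(Counter(tuple(b) for b in ballots), 0)


def _walk(tally, depth):
    # Split the ballots by their entry at this depth (END if they stop here).
    groups = defaultdict(Counter)
    for ballot, n in tally.items():
        key = ballot[depth] if depth < len(ballot) else END
        groups[key][ballot] = n

    rows = []
    # Largest group first; ties are broken by label to keep output stable.
    ordered = sorted(groups.items(), key=lambda kv: (-sum(kv[1].values()), kv[0]))
    for key, sub in ordered:
        total = sum(sub.values())
        if len(sub) == 1:
            # Only one distinct ballot left: emit one leaf row with the rest
            # of the ballot instead of a chain of 1-wide levels.
            (ballot,) = sub
            rows.append((depth, ballot[depth:] or (END,), total, None))
        else:
            rows.append((depth, (key,), None, total))
            rows.extend(_walk(sub, depth + 1))
    return rows


# Sample ballots retyped from the question's example (an assumption).
ballots = [
    ["A", "B"], ["A"], ["B"], ["A", "B", "C"], ["abstain"],
    ["A", "B", "D"], ["A", "B", "D"], ["B", "C"], ["A", "C"], ["C", "D"],
]

for depth, labels, count, level_count in level_counts(ballots):
    shown = count if count is not None else f"(level total {level_count})"
    print("    " * depth + " | ".join(labels), shown)
```

The rows come back as plain tuples, so they could be fed into `pd.DataFrame` afterwards if a frame is needed. Whether ties should be ordered by label or by first appearance is a presentation choice, not something the data dictates.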
