Python re.compile and regex patterns to clean up name variations

I am looking for help with a regex pattern that cleans up names so that only the first name, last name, and suffix remain. It needs to handle the following variations:

Drops any middle names

Keeps the last name, if there is one

Drops any nicknames

Drops any maiden or dead names, which always appear in the form (name-)

Keeps suffixes, but without punctuation. A suffix always follows ,\s after the last name.

Drops titles of nobility. A title always follows ,\s after the last name.

Sarah Michelle Gellar -> Sarah Gellar

Leslie Marie DeGuzman Santorini -> Leslie Santorini

Beyonce -> Beyonce

Stefani "Lady Gaga" Germanotta -> Stefani Germanotta

Lebron James, Jr. -> Lebron James Jr

Hailey Bieber (Baldwin-) -> Hailey Bieber

Shania (Irene-) Twain -> Shania Twain

Charles "Chaz" Fluffy Wong, Duke of CatTown -> Charles Wong

I have tried a few different regex patterns, but I can't figure out how a single pattern can match all of these cases. Below is my most recent attempt.

import re

re_names = re.compile(r'''
    ^ \s*
    (?P<first>[a-zA-Z.'\-]+)
    (?P<middle>(\s[a-zA-Z.'"\-\(\)]+)*)?
    (?P<last>\s[a-zA-Z.'\-]+)?
    (?P<maiden>(\s\([a-zA-Z,.'"\-]+\)*))?
    (,\s)?
    (?P<suffix>([a-zA-Z'\-]+)*)?
    .* $
''', re.VERBOSE)

Unfortunately, this pattern captures every name after the first name as a middle name. It also captures "Duke" as a suffix in the final test case, when it should return no suffix.
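The middle-name behaviour appears to come from a greedy repeated group sitting in front of an optional group. A reduced sketch reproduces it; the patterns below are simplified stand-ins (assumptions for illustration, using \w+ instead of the full character classes), not the pattern above. GREEDY and LAZY are hypothetical names.

```python
import re

# Simplified stand-in: greedy middle group followed by an optional last group.
GREEDY = re.compile(r"^(?P<first>\w+)(?P<middle>(?:\s\w+)*)(?P<last>\s\w+)?$")
# Same shape, but the middle group is lazy (*?), so it gives a word back to "last".
LAZY = re.compile(r"^(?P<first>\w+)(?P<middle>(?:\s\w+)*?)(?P<last>\s\w+)?$")

name = "Sarah Michelle Gellar"
greedy = GREEDY.match(name)
lazy = LAZY.match(name)

# The greedy middle group swallows everything, so "last" never participates.
print(repr(greedy.group("middle")), repr(greedy.group("last")))
# prints: ' Michelle Gellar' None

# The lazy middle group leaves the final word for "last".
print(repr(lazy.group("middle")), repr(lazy.group("last")))
# prints: ' Michelle' ' Gellar'
```

Because the last group is optional, the regex engine has no reason to backtrack out of the greedy middle group once the end of the string is reached.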

I have also considered using several separate re.compile calls, each with its own regex pattern, but I still cannot find a way to capture all cases successfully.
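For reference, the separate-pattern idea might look roughly like the sketch below. It is only one possible shape, under stated assumptions: nicknames are always in straight double quotes, maiden or dead names are always in parentheses, and a suffix is recognised by a small whitelist (the SUFFIXES set is a guess, not an exhaustive list). The names NICKNAME, FORMER_NAME, SUFFIXES, and clean_name are all hypothetical.

```python
import re

# Quoted nicknames such as "Chaz" or "Lady Gaga" (assumes straight double quotes).
NICKNAME = re.compile(r'"[^"]*"')
# Parenthesised maiden or dead names such as (Baldwin-).
FORMER_NAME = re.compile(r'\([^)]*\)')
# Assumed whitelist; anything else after the comma is treated as a title and dropped.
SUFFIXES = {"Jr", "Sr", "II", "III", "IV", "V"}


def clean_name(raw):
    """Reduce a raw name to 'First Last Suffix', dropping everything else."""
    # Everything after the first comma is either a suffix or a title.
    head, _, tail = raw.partition(",")
    head = NICKNAME.sub(" ", head)
    head = FORMER_NAME.sub(" ", head)
    parts = head.split()
    if not parts:
        return ""
    kept = [parts[0]]
    if len(parts) > 1:
        kept.append(parts[-1])  # the final remaining word is taken as the last name
    suffix = tail.strip().replace(".", "")
    if suffix in SUFFIXES:
        kept.append(suffix)
    return " ".join(kept)


print(clean_name('Charles "Chaz" Fluffy Wong, Duke of CatTown'))  # Charles Wong
print(clean_name("Lebron James, Jr."))                            # Lebron James Jr
print(clean_name("Shania (Irene-) Twain"))                        # Shania Twain
```

Splitting on the comma first keeps the suffix-versus-title decision out of the regex entirely, which is why "Duke" is never considered as a suffix here. Multi-word last names would not survive this approach, since only the final word is kept.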
