I am looking for help with a regex pattern that cleans up names to just the first name, last name, and suffix. It needs to handle the following variations:
Drops any middle name
Keeps last name if any
Drops any nicknames
Drops any maiden or dead names, which always appear in the standard format (name-)
Keeps suffixes, but without punctuation. A suffix always follows ,\s after the last name.
Does not keep titles of nobility. A title always follows ,\s after the last name.
Sarah Michelle Gellar -> Sarah Gellar
Leslie Marie DeGuzman Santorini -> Leslie Santorini
Beyonce -> Beyonce
Stefani "Lady Gaga" Germanotta -> Stefani Germanotta
Lebron James, Jr. -> Lebron James Jr
Hailey Bieber (Baldwin-) -> Hailey Bieber
Shania (Irene-) Twain -> Shania Twain
Charles "Chaz" Fluffy Wong, Duke of CatTown -> Charles Wong
I have tried a few different regex patterns, but I can't figure out how a single pattern can match all of these cases. Below is my most recent attempt.
import re

re_names = re.compile(r'''^ \s*
    (?P<first>[a-zA-Z.'\-]+)
    (?P<middle>(\s[a-zA-Z.'"\-\(\)]+)*)?
    (?P<last>\s[a-zA-Z.'\-]+)?
    (?P<maiden>(\s\([a-zA-Z,.'"\-]+\)*))?
    (,\s)?
    (?P<suffix>([a-zA-Z'\-]+)*)?
    .* $
''', re.VERBOSE)

Unfortunately, this pattern captures every name after the first name as a middle name. It also captures "Duke" as a suffix in the final test case, when it should return no suffix.
I have also considered compiling several separate regex patterns with re.compile, but I still cannot find a way to handle all cases successfully.
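For reference, here is a minimal sketch of what a multi-step approach could look like for the eight examples above. It is not a single pattern: it uses two small regexes plus ordinary string splitting. The function name clean_name and the SUFFIXES allow-list are assumptions made for illustration. The allow-list is one possible way to tell a suffix like "Jr." apart from a title like "Duke of CatTown", since both follow ", " after the last name.

```python
import re

# Hypothetical allow-list of suffixes (an assumption, not from the rules above);
# extend it to cover whatever suffixes appear in the real data.
SUFFIXES = {"Jr", "Sr", "II", "III", "IV"}

NICKNAME = re.compile(r'"[^"]*"')        # quoted nicknames, e.g. "Lady Gaga"
FORMER_NAME = re.compile(r'\([^)]*-\)')  # maiden/dead names, e.g. (Baldwin-)


def clean_name(raw):
    """Reduce a full name to 'First Last Suffix' under the stated rules."""
    # Everything after the first comma is either a suffix or a title.
    name, _, tail = raw.partition(",")

    # Strip nicknames and former names before splitting into words.
    name = NICKNAME.sub(" ", name)
    name = FORMER_NAME.sub(" ", name)
    parts = name.split()
    if not parts:
        return ""

    # Keep the first word, and the last word if there is more than one.
    kept = [parts[0]]
    if len(parts) > 1:
        kept.append(parts[-1])

    # Keep the tail only if it is a recognised suffix, minus trailing periods.
    suffix = tail.strip().rstrip(".")
    if suffix in SUFFIXES:
        kept.append(suffix)

    return " ".join(kept)
```

With this sketch, clean_name('Charles "Chaz" Fluffy Wong, Duke of CatTown') gives 'Charles Wong' and clean_name('Lebron James, Jr.') gives 'Lebron James Jr'. It assumes nicknames never contain a comma and that last names are a single word, which holds for the examples listed but may not hold for real data.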
