I tried scraping a part of wiki and it's taking the wrong section



You are searching for the ul in the wrong way.

You find the h2 with the text Dialogue and then look for a sibling ul, but the ul with the dialogue lines is not a sibling of that h2, because it is nested inside a div.

There is a sibling ul, but it comes after the Location heading, so you get the text from the Location section instead.

You have to search for the ul inside <div id="needolin-dialogues"> (without "sibling").

You can even search for the li elements directly in this div, e.g. with a CSS selector:

soup.select("div#needolin-dialogues li")
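To see the difference on a small scale, here is a minimal sketch. The HTML is a simplified stand-in that I wrote by hand to mimic the structure described above (it is not copied from the wiki, and "Placeholder location" is made up), and the sibling lookup with find_next_sibling() is only my guess at what the original code did.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page structure (an assumption, not the real wiki HTML):
# the dialogue <ul> is nested in a <div>, the Location <ul> is a direct sibling of the headings.
HTML = """
<h2>Dialogue</h2>
<div id="needolin-dialogues">
  <ul>
    <li>Protect us, mother!</li>
    <li>Call for danger, hide away...</li>
  </ul>
</div>
<h2>Location</h2>
<ul>
  <li>Placeholder location</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Sibling search: skips the <div> (it is not a <ul>) and lands on the Location list.
heading = soup.find("h2", string="Dialogue")
wrong_ul = heading.find_next_sibling("ul")
wrong = [li.get_text(strip=True) for li in wrong_ul.find_all("li")]

# CSS selector: looks for <li> elements anywhere inside the div with the given id.
right = [li.get_text(strip=True) for li in soup.select("div#needolin-dialogues li")]

print("sibling search:", wrong)
print("CSS selector:  ", right)
```

The sibling search returns the Location item because find_next_sibling("ul") only considers elements at the same nesting level as the h2, while the selector descends into the div.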

Working function:

def scrape_enemy_dialogue(self, url):
    #print("url:", url)
    soup = self.fetch_page(url)
    dialogue = [
        li.get_text(strip=True)
        for li in soup.select("div#needolin-dialogues li")
    ]
    #print("dialogue:", dialogue)
    return dialogue

Part of output:

Scraping Mossgrub...
url: https://hollowknight.wiki/w/Mossgrub
dialogue: ['Protect us, mother!', 'Call for danger, hide away...', 'Young must eat, grow or die...', 'Sleep and change, have no fear...']
Scraping Massive Mossgrub...
url: https://hollowknight.wiki/w/Massive_Mossgrub
dialogue: ["Mother's voice... distant...", 'Little sisters... hide away...', 'Eat and grow... larger...', 'Change... hidden change...']
Scraping Mossmir...
url: https://hollowknight.wiki/w/Mossmir
dialogue: ['Protect us, mother!', 'Call for danger, hide away...', 'Young must eat, grow or die...', 'Sleep and change, have no fear...']

By the way, the same can be done with find() and find_all():

def scrape_enemy_dialogue(self, url):
    soup = self.fetch_page(url)
    dialogue = []
    div = soup.find("div", id="needolin-dialogues")
    # div = soup.find("div", {"id": "needolin-dialogues"})
    if div:
        dialogue = [li.get_text(strip=True) for li in div.find_all("li")]
    return dialogue
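The `if div:` check matters for pages that have no dialogue section: find() returns None in that case, while select() simply returns an empty list. A small sketch of both behaviours follows. The two helper functions and the HTML snippets are hypothetical (written only for this illustration), and they take an already parsed soup so they run without fetch_page().

```python
from bs4 import BeautifulSoup

def dialogue_with_select(soup):
    # select() returns an empty list when nothing matches, so no guard is needed.
    return [li.get_text(strip=True) for li in soup.select("div#needolin-dialogues li")]

def dialogue_with_find(soup):
    # find() returns None when the div is missing, so it has to be checked first.
    div = soup.find("div", id="needolin-dialogues")
    if div is None:
        return []
    return [li.get_text(strip=True) for li in div.find_all("li")]

# Hand-written test pages (assumptions, not real wiki HTML).
with_div = BeautifulSoup(
    '<div id="needolin-dialogues"><ul><li>Protect us, mother!</li></ul></div>',
    "html.parser",
)
without_div = BeautifulSoup(
    "<h2>Location</h2><ul><li>Placeholder location</li></ul>",
    "html.parser",
)

print(dialogue_with_select(with_div), dialogue_with_find(with_div))
print(dialogue_with_select(without_div), dialogue_with_find(without_div))
```

Both versions give the same list when the div exists and an empty list when it does not, so which one to use is mostly a matter of taste.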
