I'm building a web crawler and need to parse anchor tags to extract URLs. However, I'm having trouble telling whether an href attribute contains a full URL or a relative/internal path. For example, when crawling Wikipedia, I encounter hrefs like:
- https://en.wikipedia.org/wiki/Page (full URL)
- /wiki/Saint_Lucia_Labour_Party (absolute path, relative to the domain)
- wiki/Saint_Lucia_Labour_Party (relative to the current page's directory)
- #section (anchor link within the current page)
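For reference, the four cases above can be told apart by splitting the href into its components. Below is a minimal sketch using Python's standard urllib.parse.urlsplit. The function name classify_href and the category labels are hypothetical, chosen only for illustration; the extra "scheme-relative" case (//host/path) is another form you may run into in real pages.

```python
from urllib.parse import urlsplit


def classify_href(href: str) -> str:
    """Hypothetical helper: label an href by the kind of reference it is."""
    href = href.strip()
    parts = urlsplit(href)
    if parts.scheme:
        # Has a scheme: https://..., but also mailto:, javascript:, etc.
        return "absolute"
    if parts.netloc:
        # //en.wikipedia.org/wiki/Page reuses the current page's scheme
        return "scheme-relative"
    if href.startswith("#"):
        # Fragment only: points into the current page
        return "fragment"
    if href.startswith("/"):
        # Absolute path: resolved against the current host
        return "root-relative"
    # Anything else resolves against the current page's directory
    return "path-relative"


for h in ["https://en.wikipedia.org/wiki/Page",
          "/wiki/Saint_Lucia_Labour_Party",
          "wiki/Saint_Lucia_Labour_Party",
          "#section"]:
    print(h, "->", classify_href(h))
```

In practice a crawler rarely needs this classification on its own, because resolving every href against the page URL handles all cases uniformly; it is mainly useful for logging or filtering.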
What's the most reliable way to tell whether an href is absolute or relative, and to convert relative hrefs into absolute URLs so I can crawl them?
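One common approach, sketched here under the assumption that the crawler is written in Python, is to skip the manual check entirely and resolve every href against the URL of the page it was found on with urllib.parse.urljoin, which applies the standard relative-reference rules to all of the cases above. The helper name resolve_href and the choice to drop fragments and non-HTTP schemes are illustrative assumptions, not requirements.

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin, urlsplit


def resolve_href(page_url: str, href: str) -> Optional[str]:
    """Hypothetical helper: turn an href into a crawlable absolute URL, or None."""
    # urljoin leaves absolute URLs unchanged and resolves relative ones
    # against page_url (root-relative, path-relative, and fragment-only alike).
    absolute = urljoin(page_url, href.strip())
    # Drop "#section" so the same page is not queued once per anchor.
    absolute, _fragment = urldefrag(absolute)
    # Assumption: only http(s) links are worth crawling (skip mailto:, javascript:).
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None
    return absolute


page = "https://en.wikipedia.org/wiki/Saint_Lucia"
for h in ["https://en.wikipedia.org/wiki/Page",
          "/wiki/Saint_Lucia_Labour_Party",
          "wiki/Saint_Lucia_Labour_Party",
          "#section",
          "mailto:someone@example.org"]:
    print(repr(h), "->", resolve_href(page, h))
```

Note that the path-relative href resolves to https://en.wikipedia.org/wiki/wiki/Saint_Lucia_Labour_Party, because it is relative to the /wiki/ directory of the current page; that is the correct result, even though it may not be the page you expected. The base should be the page's final URL after any redirects, and if the page contains a <base href="..."> element, that value should be used as the base instead.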
