Today I came across an unexpected error in my userscript. It takes <a> elements from a page and does something based on the linked URL. The trouble was that one link produced a strangely encoded URL.
The script uses the global URL object to parse link URLs. Debugging revealed that although the source URL appears well formed in the HTML, retrieving its name query parameter returns a garbled value in which many characters are replaced by the character with code 65533 (U+FFFD, the Unicode replacement character), so my script cannot make sense of the result.
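The symptom can be reproduced with a minimal sketch. The URL below is hypothetical, not the one from the real page; it assumes the name parameter holds percent-encoded windows-1251 bytes rather than UTF-8.

```javascript
// Hypothetical URL: the bytes CF F0 E8 E2 E5 F2 spell a Cyrillic word in windows-1251.
const url = new URL('https://example.com/page?name=%CF%F0%E8%E2%E5%F2');

// searchParams percent-decodes the value and then decodes the bytes as UTF-8.
// These particular bytes are not valid UTF-8, so each one becomes U+FFFD (code 65533).
const name = url.searchParams.get('name');
const codes = Array.from(name, c => c.charCodeAt(0));
console.log(codes); // six entries, all 65533

// The raw, still percent-encoded query string is left untouched.
console.log(url.search); // "?name=%CF%F0%E8%E2%E5%F2"
```

The original bytes are therefore still recoverable from url.search, even though searchParams has already lost them.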
Research into the problem gave two leads: calling decodeURIComponent() on the URL's search property fails, while unescape() produces something that is not readable but at least looks remotely like a valid string. This led me to try decoding that string myself, and I came up with this code:
    let search = unescape(url.search);
    if (search !== url.search) {
      // The encoding was just a guess that worked.
      const d = new TextDecoder('cp1251');
      search = d.decode(Uint8Array.from(search, c => c.charCodeAt(0)));
      url.search = search;
    }

This code works and solves the particular issue with the garbled name URL parameter, but is there a better approach? In particular, I'm not sure whether the code that turns the string into an array of character codes is robust enough to handle arbitrary input. Since the input is a URL, what can appear in it should be limited, but I'd like to make the code as error-proof as possible. TextDecoder works with a Uint8Array, but what if a character code is greater than 255?
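One case where a code above 255 can appear: unescape() also decodes %uXXXX sequences, and Uint8Array.from() would then silently wrap such values modulo 256. A possible way around this is to skip unescape() and percent-decode straight to bytes. The following is only a sketch under assumptions: percentDecodeToBytes, decodeQueryValue and getParam are hypothetical helper names, the example URL is made up, and windows-1251 (the encoding that the cp1251 label refers to) is kept as a guessed fallback for values that are not valid UTF-8.

```javascript
// Hypothetical helper: turn %XX escapes into raw bytes, never into characters.
function percentDecodeToBytes(input) {
  const encoder = new TextEncoder();
  const bytes = [];
  // Match a %XX escape, a run of ordinary characters, or a stray '%'.
  for (const m of input.matchAll(/%([0-9a-fA-F]{2})|[^%]+|%/g)) {
    if (m[1] !== undefined) {
      bytes.push(parseInt(m[1], 16)); // always in the range 0..255
    } else {
      // Literal text is encoded as UTF-8, so nothing is truncated to 8 bits.
      bytes.push(...encoder.encode(m[0]));
    }
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: try strict UTF-8 first, then the guessed legacy encoding.
function decodeQueryValue(raw, fallbackEncoding = 'windows-1251') {
  // In a query string '+' stands for a space; a real plus sign arrives as %2B.
  const bytes = percentDecodeToBytes(raw.replace(/\+/g, ' '));
  try {
    // fatal: true makes decode() throw instead of emitting U+FFFD.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return new TextDecoder(fallbackEncoding).decode(bytes);
  }
}

// Hypothetical helper: read one parameter from the raw query string.
// Splitting happens before decoding, so a decoded '&' or '=' inside a
// value cannot change the structure of the query.
function getParam(url, name) {
  for (const pair of url.search.slice(1).split('&')) {
    const eq = pair.indexOf('=');
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? '' : pair.slice(eq + 1);
    if (decodeQueryValue(rawKey) === name) return decodeQueryValue(rawValue);
  }
  return null;
}

// Usage with a made-up URL: one windows-1251 value and one UTF-8 value.
const example = new URL('https://example.com/?name=%CF%F0%E8%E2%E5%F2&q=%D0%9F');
console.log(getParam(example, 'name'));
console.log(getParam(example, 'q'));
```

Unlike writing the decoded text back into url.search, this leaves the URL untouched and decodes each value separately. The fallback is still a guess: a windows-1251 value whose bytes happen to form valid UTF-8 would be decoded as UTF-8.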
