Getting the Redirected URL from the Original URL

前端 未结 9 1493
终归单人心
终归单人心 2020-12-05 14:57

I have a table in my database which contains the URLs of some websites. I have to open those URLs and verify some links on those pages. The problem is that some URLs get red

9条回答
  •  情歌与酒
    2020-12-05 15:24

    This function will return the final destination of a link -- even if there are multiple redirects. It doesn't account for JavaScript-based redirects or META redirects. Notice that the previous solution didn't deal with Absolute & Relative URLs, since the LOCATION header could return something like "/newhome" you need to combine with the URL that served that response to identify the full URL destination.

        public static string GetFinalRedirect(string url)
        {
            if(string.IsNullOrWhiteSpace(url))
                return url;
    
            int maxRedirCount = 8;  // prevent infinite loops
            string newUrl = url;
            do
            {
                HttpWebRequest req = null;
                HttpWebResponse resp = null;
                try
                {
                    req = (HttpWebRequest) HttpWebRequest.Create(url);
                    req.Method = "HEAD";
                    req.AllowAutoRedirect = false;
                    resp = (HttpWebResponse)req.GetResponse();
                    switch (resp.StatusCode)
                    {
                        case HttpStatusCode.OK:
                            return newUrl;
                        case HttpStatusCode.Redirect:
                        case HttpStatusCode.MovedPermanently:
                        case HttpStatusCode.RedirectKeepVerb:
                        case HttpStatusCode.RedirectMethod:
                            newUrl = resp.Headers["Location"];
                            if (newUrl == null)
                                return url;
    
                            if (newUrl.IndexOf("://", System.StringComparison.Ordinal) == -1)
                            {
                                // Doesn't have a URL Schema, meaning it's a relative or absolute URL
                                Uri u = new Uri(new Uri(url), newUrl);
                                newUrl = u.ToString();
                            }
                            break;
                        default:
                            return newUrl;
                    }
                    url = newUrl;
                }
                catch (WebException)
                {
                    // Return the last known good URL
                    return newUrl;
                }
                catch (Exception ex)
                {
                    return null;
                }
                finally
                {
                    if (resp != null)
                        resp.Close();
                }
            } while (maxRedirCount-- > 0);
    
            return newUrl;
        }
    

提交回复
热议问题