Introduction:
This tutorial shows how to create an HttpWebRequest in C# that goes through a proxy connection.
Why Proxies?
Proxies are sometimes required in web scraping applications because of the sheer amount of data that needs to be scraped. Some sites temporarily, or even permanently, block IP addresses that request large amounts of data in a short time. This stops a single IP from using up the website's full bandwidth and taking the site offline for all other users. Some users also abuse HTTP connections via DoS (Denial of Service) and DDoS (Distributed Denial of Service) attacks to knock a site offline, which is why IP blocking and anti-DDoS protection are a good idea for site owners.
Getting a Proxy:
Before we begin, you will want a working, fast, and reliable proxy for your needs. I would recommend getting one from the following URL: http://proxylist.hidemyass.com/.
Basic Function:
Now for our application. First, create a new Visual Studio C# Windows Forms or Console Application solution. Then go to the entry point of the application (the Form1 load handler for forms, the 'Main' function for console applications) and add a call to a function we have yet to create, named 'getSource'...
namespace ProxyShower
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            getSource();
        }
    }
}
Next, create the function 'getSource'...
public static void getSource() {
}
Get Source Function:
Our function needs some additional namespaces: 'System.Net' for the request, response, and proxy classes, and 'System.IO' for 'StreamReader'. So, go to the top of your file and add the following lines to include them in your program's class/file...
using System.Net;
using System.IO;
Now we are ready to create our basic function. First we create an HttpWebRequest to the website we want to connect to. 'WebRequest.Create' returns a plain WebRequest, so we cast it to HttpWebRequest...
HttpWebRequest r = (HttpWebRequest) WebRequest.Create("http://www.google.com");
Next we create an HttpWebResponse and get the request's response. 'GetResponse' returns a plain WebResponse, so it needs a cast as well...
HttpWebResponse re = (HttpWebResponse) r.GetResponse();
Finally, if we want the page source itself, we read the response stream of our HttpWebResponse object ('re' in this case) into a string...
string src = new StreamReader(re.GetResponseStream()).ReadToEnd();
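The one-liner above never closes the reader or the stream. As a minimal sketch of a tidier variant, the helper below wraps the read in a 'using' block. The class name 'StreamText' and method name 'ReadAll' are made up for illustration, and UTF-8 is an assumed encoding, not something the tutorial specifies.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamText
{
    // Hypothetical helper: reads every character from a stream into a string.
    public static string ReadAll(Stream stream)
    {
        // 'using' disposes the reader, and with it the underlying stream,
        // even if ReadToEnd throws. UTF-8 is an assumption for this sketch.
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this in place, the last step could be written as: string src = StreamText.ReadAll(re.GetResponseStream());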
Proxy:
The proxy is now ready to be added to our request. A proxy is attached through the 'Proxy' property of our HttpWebRequest ('r') object. The 'Proxy' property takes a 'WebProxy' object, so let's create a new one now...
WebProxy proxy = new WebProxy();
Then we want to set the proxy address to our proxy's IP and port. The 'Address' property of WebProxy is a 'Uri' (from the 'System' namespace), not a plain string, and the value is built from the following...
{ip}:{port}
(Remove the braces, and add your proxy IP followed by a colon, followed by the port).
This next part is very important: you must add the protocol to the beginning of your proxy address. My proxy is simply 'HTTP', so I add the protocol as 'http://', like so...
proxy.Address = new Uri("http://{ip}:{port}");
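If the IP and port arrive as separate values (for example, copied from a proxy list), the address can be assembled instead of typed by hand. This is a minimal sketch; the class name 'ProxyFactory' and its 'Create' method are made up for illustration and assume a plain HTTP proxy.

```csharp
using System;
using System.Net;

public static class ProxyFactory
{
    // Hypothetical helper: builds a WebProxy for an HTTP proxy at host:port.
    public static WebProxy Create(string host, int port)
    {
        // UriBuilder combines the scheme, host, and port into a single Uri,
        // so the "http://" prefix cannot be forgotten.
        Uri address = new UriBuilder("http", host, port).Uri;
        return new WebProxy(address);
    }
}
```

A call such as ProxyFactory.Create("127.0.0.1", 8080) returns a WebProxy whose Address is http://127.0.0.1:8080/, which can then be used in place of the 'proxy' object above.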
Finally we set the HttpWebRequest's 'Proxy' property to our newly created WebProxy 'proxy'. This must be done before the call to 'GetResponse', like so...
r.Proxy = proxy;
Finished!
Here is the full source if required...
using System;
using System.IO;
using System.Net;
using System.Windows.Forms;

namespace ProxyShower
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            getSource();
        }

        public static void getSource()
        {
            // Describe the proxy; replace {ip} and {port} with real values.
            WebProxy proxy = new WebProxy();
            proxy.Address = new Uri("http://{ip}:{port}");

            // Build the request and route it through the proxy.
            HttpWebRequest r = (HttpWebRequest) WebRequest.Create("http://www.google.com");
            r.Proxy = proxy;

            // Send the request and read the whole response body into a string.
            HttpWebResponse re = (HttpWebResponse) r.GetResponse();
            string src = new StreamReader(re.GetResponseStream()).ReadToEnd();
        }
    }
}
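The full source assumes the proxy answers. Public proxies are often slow or offline, in which case 'GetResponse' throws a WebException. The following is only a sketch of one way to cope with that; the class 'ProxyFetch', the method 'TryGetSource', and the idea of returning null on failure are assumptions for illustration, not part of the tutorial's code.

```csharp
using System;
using System.IO;
using System.Net;

public static class ProxyFetch
{
    // Hypothetical helper: returns the page source, or null if the request fails.
    public static string TryGetSource(string url, WebProxy proxy, int timeoutMs)
    {
        try
        {
            HttpWebRequest r = (HttpWebRequest) WebRequest.Create(url);
            r.Proxy = proxy;
            // Give up after timeoutMs milliseconds instead of the default wait.
            r.Timeout = timeoutMs;

            // 'using' releases the connection and the reader when we are done.
            using (HttpWebResponse re = (HttpWebResponse) r.GetResponse())
            using (StreamReader reader = new StreamReader(re.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Dead proxy, timeout, or an HTTP error status: report failure.
            return null;
        }
    }
}
```

A caller could then try the next proxy in its list whenever the result is null, rather than letting the exception crash the form.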