Creating a Web Page Scraper in C#
Submitted by Yorkiebar on Tuesday, July 8, 2014 - 05:16.
Introduction:
This tutorial will teach you how to make a web page scraper in C# using the .NET Framework.
Theory:
Here are the steps we will follow:
Get the webpage source
Dissect the source
Output the results
Getting the Source:
So first we need to get the web page source. Our target URL is going to be the home page of sourcecodester.com. We create a basic HttpWebRequest to the site, receive the response, and read it into a string, which we return to wherever the function was called from. The getSource function that does this is shown further down...
Dissecting the Source:
Now that we have the source, we want to dissect it. As a side note, here is what the main function, where we call everything from, looks like (see the Main code further down)...
So first we want to look for patterns in the source. You can either save the web page from your browser and open the saved document in a text editor on your PC, or you can use a file stream to save the HTTP response from our program.
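For example, here is a minimal sketch of the second option, assuming the getSource function shown further down and an arbitrary output file name of source.html:

// Requires: using System.IO;
// Write the fetched source to a file so it can be opened in a text editor.
// "source.html" is just an example path.
static void saveSource()
{
    string src = getSource(); // getSource is shown further down
    using (StreamWriter writer = new StreamWriter("source.html"))
    {
        writer.Write(src);
    }
}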
Looking at the source, we can see that all the articles are surrounded by divs with a particular class; a sketch of dissecting the source and outputting the article titles follows the code below.
// Requires: using System.IO; and using System.Net;
static string getSource()
{
    // Build a basic GET request to the target page
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.sourcecodester.com/");
    req.UserAgent = "curl"; // this simulates the curl Linux command
    req.Method = "GET";
    // Read the whole response body into a string and return it
    using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
    using (StreamReader reader = new StreamReader(res.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
static void Main(string[] args)
{
    string src = getSource();
}