How to Create an Any-Page Web Scraper in Visual Basic
Submitted by GeePee on Tuesday, April 21, 2015 - 23:43.
Introduction:
Welcome to a tutorial on how to make a Visual Basic program that scrapes every piece of text found between two given points on a given page and saves the results as a list.
Pre-Creation:
My form will have:
TextBox1: Extract From (the start point)
TextBox2: Extract To (the end point)
TextBox3: Page to extract from (the URL)
Button1: Begin extraction
Steps of Creation:
Step 1:
First we need some imports and a function. The function, GetBetweenAll, returns every piece of text in the page source that lies between the two given points.
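The function described above can be tried outside the form with a small console sketch. This is only an illustration under assumptions: `ExtractDemo` and `GetBetweenAllLiteral` are hypothetical names, and the two points are treated as plain text (escaped with `Regex.Escape`) rather than as regular expression patterns.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ExtractDemo
    ' Hypothetical helper: returns every piece of text found between Str1 and Str2.
    Function GetBetweenAllLiteral(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
        Dim Results As New List(Of String)
        ' Split on the literal start marker; piece 0 is the text before the first match.
        Dim Pieces As String() = Regex.Split(Source, Regex.Escape(Str1))
        For Index As Integer = 1 To Pieces.Length - 1
            ' Keep only the text before the first end marker in each piece.
            Results.Add(Regex.Split(Pieces(Index), Regex.Escape(Str2))(0))
        Next
        Return Results.ToArray()
    End Function

    Sub Main()
        Dim Html As String = "<li>Apple</li><li>Pear</li>"
        For Each Item As String In GetBetweenAllLiteral(Html, "<li>", "</li>")
            Console.WriteLine(Item)
        Next
    End Sub
End Module
```

With the sample string above, the loop prints Apple and then Pear, one per line. If the start point never occurs in the source, the result is an empty array.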
Step 2:
Next we write the code that begins the process. First we check that all three text boxes are filled in; if they are, we show a SaveFileDialog so the user can choose a .txt save path.
Step 3:
Once the text boxes and the save path have been checked, we download the source code of the page URL, extract the data and write it to the save path. (Below is the full button code.)
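The download and save parts of this step can also be kept in two small helpers, so that the saving half can be checked without a network connection. This is a sketch under assumptions: `FetchAndSaveDemo`, `FetchSource` and `SaveLines` are hypothetical names that are not part of the tutorial's project, and `FetchSource` assumes an http or https URL.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module FetchAndSaveDemo
    ' Hypothetical helper: downloads the page source as a string.
    ' WebRequest matches the tutorial's approach; newer .NET versions mark it obsolete.
    Function FetchSource(ByVal Url As String) As String
        Dim Request As HttpWebRequest = DirectCast(WebRequest.Create(Url), HttpWebRequest)
        ' Using closes the response and the reader even if reading fails.
        Using Response As WebResponse = Request.GetResponse()
            Using Reader As New StreamReader(Response.GetResponseStream())
                Return Reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Hypothetical helper: writes one extracted item per line.
    Sub SaveLines(ByVal FilePath As String, ByVal Lines As String())
        Using Writer As New StreamWriter(FilePath)
            For Each Line As String In Lines
                Writer.WriteLine(Line)
            Next
        End Using
    End Sub

    Sub Main()
        ' Offline check of the saving half; no network request is made here.
        Dim TempFile As String = Path.GetTempFileName()
        SaveLines(TempFile, New String() {"Apple", "Pear"})
        Console.WriteLine(File.ReadAllText(TempFile))
    End Sub
End Module
```

In the form, the button handler would pass TextBox3.Text to the download helper and the dialog's FileName to the save helper.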
Project Complete!
The code for each step follows in order (Step 1, Step 2, Step 3), and the full source code comes last.
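Step 1 code (imports and extraction function):
- Imports System.IO
- Imports System.Text.RegularExpressions
- Imports System.Net
- ' Returns every piece of text in Source that lies between Str1 and Str2.
- Private Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
-     Dim Results, T As New List(Of String)
-     ' Escape both points so characters such as ( or ? are matched literally.
-     T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
-     ' The first piece is the text before the first start point, so discard it.
-     T.RemoveAt(0)
-     For Each I As String In T
-         ' Keep only the text before the first end point in each piece.
-         Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
-     Next
-     Return Results.ToArray
- End Function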
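Step 2 code (input check and save dialog):
- ' Continue only when all three text boxes have been filled in.
- If TextBox1.Text <> "" AndAlso TextBox2.Text <> "" AndAlso TextBox3.Text <> "" Then
-     Dim fo As New SaveFileDialog
-     fo.Filter = "Text Files|*.txt"
-     fo.FilterIndex = 1
-     fo.Title = "Save Path"
-     ' Continue only when the user confirms the dialog with a file name.
-     If fo.ShowDialog() = Windows.Forms.DialogResult.OK AndAlso fo.FileName <> "" Then
-         ' The download, extraction and saving code from Step 3 goes here.
-     End If
- End If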
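Step 3 code (full button code):
- Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
-     ' Continue only when all three text boxes have been filled in.
-     If TextBox1.Text <> "" AndAlso TextBox2.Text <> "" AndAlso TextBox3.Text <> "" Then
-         Dim fo As New SaveFileDialog
-         fo.Filter = "Text Files|*.txt"
-         fo.FilterIndex = 1
-         fo.Title = "Save Path"
-         ' Continue only when the user confirms the dialog with a file name.
-         If fo.ShowDialog() = Windows.Forms.DialogResult.OK AndAlso fo.FileName <> "" Then
-             ' Download the page source; Using closes the response and reader afterwards.
-             Dim r As HttpWebRequest = DirectCast(WebRequest.Create(TextBox3.Text), HttpWebRequest)
-             Dim src As String
-             Using re As HttpWebResponse = DirectCast(r.GetResponse(), HttpWebResponse)
-                 Using reader As New StreamReader(re.GetResponseStream())
-                     src = reader.ReadToEnd()
-                 End Using
-             End Using
-             ' Extract every match and write one per line to the chosen file.
-             Dim srcs As String() = GetBetweenAll(src, TextBox1.Text, TextBox2.Text)
-             Using sw As New StreamWriter(fo.FileName)
-                 For Each s As String In srcs
-                     sw.WriteLine(s)
-                 Next
-             End Using
-         End If
-     End If
- End Sub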
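Full source code:
- Imports System.IO
- Imports System.Text.RegularExpressions
- Imports System.Net
- Public Class Form1
-     ' Returns every piece of text in Source that lies between Str1 and Str2.
-     Private Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
-         Dim Results, T As New List(Of String)
-         ' Escape both points so characters such as ( or ? are matched literally.
-         T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
-         ' The first piece is the text before the first start point, so discard it.
-         T.RemoveAt(0)
-         For Each I As String In T
-             Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
-         Next
-         Return Results.ToArray
-     End Function
-     Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
-         ' Continue only when all three text boxes have been filled in.
-         If TextBox1.Text <> "" AndAlso TextBox2.Text <> "" AndAlso TextBox3.Text <> "" Then
-             Dim fo As New SaveFileDialog
-             fo.Filter = "Text Files|*.txt"
-             fo.FilterIndex = 1
-             fo.Title = "Save Path"
-             ' Continue only when the user confirms the dialog with a file name.
-             If fo.ShowDialog() = Windows.Forms.DialogResult.OK AndAlso fo.FileName <> "" Then
-                 ' Download the page source; Using closes the response and reader afterwards.
-                 Dim r As HttpWebRequest = DirectCast(WebRequest.Create(TextBox3.Text), HttpWebRequest)
-                 Dim src As String
-                 Using re As HttpWebResponse = DirectCast(r.GetResponse(), HttpWebResponse)
-                     Using reader As New StreamReader(re.GetResponseStream())
-                         src = reader.ReadToEnd()
-                     End Using
-                 End Using
-                 ' Extract every match and write one per line to the chosen file.
-                 Dim srcs As String() = GetBetweenAll(src, TextBox1.Text, TextBox2.Text)
-                 Using sw As New StreamWriter(fo.FileName)
-                     For Each s As String In srcs
-                         sw.WriteLine(s)
-                     Next
-                 End Using
-             End If
-         End If
-     End Sub
- End Class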