Use HTML Parser To Extract Links From Web Page

Started by dhilipkumar, Nov 04, 2008, 09:21 PM

dhilipkumar

This article describes how to use our HTML parser library, HTMLParser.Net, to parse and analyze a web page and extract all of its outgoing links: images, page links, FTP links, mail links, and so on. The library does the hard work of building a hierarchical view of all the tags. The only thing you need to specify is which information you are interested in extracting.

Extracting the links takes about three lines of code. Create a Parser object, passing the page's URL as an argument, and then call its GetAllOutLinks method. In the example below, the call returns a PageData object whose OutLinks collection holds one LinkData entry, with a Depth and a Url, for each link found.


Sub ExtractOutlinksFromPage(ByVal strUrl As String)
    Dim obParser As Parser
    Dim obPageData As PageData
    ' Create a parser for the page at the given URL.
    obParser = New Parser(New System.Uri(strUrl))
    ' Collect the outgoing links of the page.
    obPageData = obParser.GetAllOutLinks(1, True)
    ' Print the number of links found, then the depth and URL of each one.
    Console.WriteLine(obPageData.OutLinks.Count)
    For Each obLinkData As LinkData In obPageData.OutLinks
        Console.WriteLine("Depth[{0}] : Link Url={1}", obLinkData.Depth, obLinkData.Url)
    Next
End Sub
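
Once the URLs have been collected, they can be sorted into the link types mentioned earlier (image, page, FTP, mail). The following is only a rough sketch of one way that might be done, using System.Uri from the .NET base class library. It does not use HTMLParser.Net, and the ClassifyLink helper, the list of image extensions, and the example.com URLs are all hypothetical and chosen purely for illustration.

```vb
Imports System

Module LinkClassifier

    ' Hypothetical helper, not part of HTMLParser.Net:
    ' buckets an absolute URL by its scheme and, for web links, its file extension.
    Function ClassifyLink(ByVal strUrl As String) As String
        Dim obUri As Uri = Nothing
        If Not Uri.TryCreate(strUrl, UriKind.Absolute, obUri) Then
            Return "Unknown"
        End If

        If obUri.Scheme = Uri.UriSchemeMailto Then Return "Mail"
        If obUri.Scheme = Uri.UriSchemeFtp Then Return "FTP"

        If obUri.Scheme = Uri.UriSchemeHttp OrElse obUri.Scheme = Uri.UriSchemeHttps Then
            ' Assumed list of image extensions; extend as needed.
            Dim strPath As String = obUri.AbsolutePath.ToLowerInvariant()
            For Each strExt As String In New String() {".png", ".jpg", ".jpeg", ".gif", ".bmp"}
                If strPath.EndsWith(strExt, StringComparison.Ordinal) Then Return "Image"
            Next
            Return "Page"
        End If

        Return "Other"
    End Function

    Sub Main()
        ' Placeholder URLs for demonstration only.
        Dim arrUrls As String() = {"http://example.com/index.html", _
                                   "http://example.com/logo.png", _
                                   "ftp://example.com/files/archive.zip", _
                                   "mailto:someone@example.com"}
        For Each strUrl As String In arrUrls
            Console.WriteLine("{0} : {1}", ClassifyLink(strUrl), strUrl)
        Next
    End Sub

End Module
```

To combine this with the routine above, ClassifyLink could be called on each link's URL inside the For Each loop, assuming the Url value is, or can be converted to, an absolute URL string.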