Jsoup - Working with URLs

John Doe ·

The following example shows how to obtain both the relative and the absolute form of the URLs present in an HTML page.

Syntax

String url = "http://www.tutorialspoint.com/";
Document document = Jsoup.connect(url).get();
Element link = document.select("a").first();         

System.out.println("Relative Link: " + link.attr("href"));
System.out.println("Absolute Link: " + link.attr("abs:href"));
System.out.println("Absolute Link: " + link.absUrl("href"));

Where

  • document − the Document object, which represents the HTML DOM.

  • Jsoup − the main class, used to connect to a URL and fetch the HTML content.

  • link − an Element object representing an anchor tag. Here it is the first anchor in the page; first() returns null if the page contains no anchor.

  • link.attr("href") − returns the value of href exactly as written in the anchor tag. It may be relative or absolute.

  • link.attr("abs:href") − returns the absolute URL, obtained by resolving the href value against the document's base URI.

  • link.absUrl("href") − equivalent to link.attr("abs:href"). Both return an empty string if the value cannot be resolved to an absolute URL.
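To make "resolving against the document's base URI" concrete, here is a minimal sketch that uses only the JDK's java.net.URI. It is not jsoup's own implementation; the class name UrlResolveSketch and the method resolve are hypothetical names chosen for illustration, and returning an empty string on failure is an assumption made to mirror the behaviour of absUrl described above.

```java
import java.net.URI;

// Hypothetical helper illustrating base-URI resolution with the JDK only.
final class UrlResolveSketch {

    // Resolves href against baseUri; returns "" if no absolute URL can be built.
    static String resolve(String baseUri, String href) {
        try {
            URI resolved = URI.create(baseUri).resolve(href);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (IllegalArgumentException e) {
            // Thrown by URI.create / resolve when either string is not a valid URI.
            return "";
        }
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:8080/";

        // A root-relative path takes the scheme, host and port of the base.
        // prints http://127.0.0.1:8080/resource/js/jquery-1.7.1.min.js
        System.out.println(resolve(base, "/resource/js/jquery-1.7.1.min.js"));

        // An already absolute URL is returned unchanged.
        // prints https://www.phenomena.com/media/banner.png
        System.out.println(resolve(base, "https://www.phenomena.com/media/banner.png"));
    }
}
```

A path without a leading slash is resolved relative to the directory of the base, so "css/general.css" against "http://127.0.0.1:8080/admin/index.html" becomes "http://127.0.0.1:8080/admin/css/general.css".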

Description

An Element object represents a DOM element and provides methods to get the relative as well as the absolute form of the URLs in its attributes.

Example

1. Sample HTML

<html>
<head>
    <title></title>
    <script type="text/javascript" src="/resource/js/jquery-1.7.1.min.js"></script>
    <link type="text/css" href="/resource/css/admin/general.css" rel="stylesheet" />
</head>
<body>
<span id="navi">
    <img src="https://www.phenomena.com/media/xstorage/banner/2022-03-17_PM1109_L46DRH2Y8P.png" alt="" />
</span>
</body>
</html>

2. Code

package com.bad.blood.test;

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Test {
    public static void main(final String[] args) throws IOException {
        final String baseUri = "http://127.0.0.1:8080/";

        // Close the stream once jsoup has read and parsed the page.
        try (InputStream in = new URL(baseUri + "index.html").openConnection().getInputStream()) {
            // The third argument is the base URI used to resolve relative URLs.
            Document doc = Jsoup.parse(in, "UTF-8", baseUri);

            // Rewrite every relative src and href value to its absolute form.
            makeAbsolute(doc, "src");
            makeAbsolute(doc, "href");

            System.out.println(doc.toString());
        }
    }

    private static void makeAbsolute(final Document doc, final String attrName) {
        for (Element elem : doc.select("[" + attrName + "]")) {
            String abs = elem.absUrl(attrName);
            // absUrl returns "" when the value cannot be resolved;
            // in that case keep the original value instead of blanking it.
            if (!abs.isEmpty() && !abs.equals(elem.attr(attrName))) {
                elem.attr(attrName, abs);
            }
        }
    }
}

3. Result HTML

<html>
<head>
    <title></title>
    <script type="text/javascript" src="http://127.0.0.1:8080/resource/js/jquery-1.7.1.min.js"></script>
    <link type="text/css" href="http://127.0.0.1:8080/resource/css/admin/general.css" rel="stylesheet" />
</head>
<body>
    <span id="navi"> <img src="https://www.phenomena.com/media/xstorage/banner/2022-03-17_PM1109_L46DRH2Y8P.png" alt="" /></span>
</body>
</html>

 
