The following example shows how to retrieve both relative and absolute URLs from an HTML page.
Syntax
String url = "http://www.tutorialspoint.com/";
Document document = Jsoup.connect(url).get();
Element link = document.select("a").first();
System.out.println("Relative Link: " + link.attr("href"));
System.out.println("Absolute Link: " + link.attr("abs:href"));
System.out.println("Absolute Link: " + link.absUrl("href"));
Where
- document − the Document object, which represents the HTML DOM.
- Jsoup − the main class, used to connect to a URL and fetch the HTML content.
- link − an Element object representing the anchor tag (here, the first a element in the document).
- link.attr("href") − returns the value of the href attribute as written in the anchor tag. It may be relative or absolute.
- link.attr("abs:href") − returns the absolute URL after resolving the href against the document's base URI.
- link.absUrl("href") − returns the absolute URL after resolving the href against the document's base URI; it is equivalent to link.attr("abs:href").
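To make "resolving against the document's base URI" concrete, the sketch below performs the same kind of resolution with the standard java.net.URI class instead of jsoup. The class name ResolveSketch and the helper resolve are hypothetical names used only for illustration, and jsoup's own resolver may handle some edge cases differently.

```java
import java.net.URI;

public class ResolveSketch {

    // Resolve a possibly relative reference against a base URI.
    // An already absolute reference is returned unchanged.
    static String resolve(String baseUri, String reference) {
        return URI.create(baseUri).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:8080/";

        // Root-relative path: scheme, host and port are taken from the base URI.
        System.out.println(resolve(base, "/resource/js/jquery-1.7.1.min.js"));
        // prints http://127.0.0.1:8080/resource/js/jquery-1.7.1.min.js

        // Absolute URL: the base URI is ignored.
        System.out.println(resolve(base,
                "https://www.phenomena.com/media/xstorage/banner/2022-03-17_PM1109_L46DRH2Y8P.png"));
        // prints https://www.phenomena.com/media/xstorage/banner/2022-03-17_PM1109_L46DRH2Y8P.png
    }
}
```

This is why the base URI matters: a relative URL has no meaning on its own, so without a base there is nothing to resolve it against.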
Description
An Element object represents a DOM element and provides methods to get both the relative and the absolute URLs present in the HTML page.
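The difference between the two kinds of lookup can be tried without a network connection by parsing an HTML string and passing the base URI explicitly. The following is a minimal sketch that assumes jsoup is on the classpath. The class name AbsUrlSketch, the helper hrefs and the path /java/index.htm are hypothetical and chosen only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlSketch {

    // Returns { href as written, href resolved against baseUri }
    // for the first anchor found in the given HTML.
    static String[] hrefs(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a").first();
        return new String[] { link.attr("href"), link.absUrl("href") };
    }

    public static void main(String[] args) {
        String html = "<a href=\"/java/index.htm\">Java</a>"; // hypothetical link

        String[] urls = hrefs(html, "http://www.tutorialspoint.com/");
        System.out.println("Relative Link: " + urls[0]);
        // prints Relative Link: /java/index.htm
        System.out.println("Absolute Link: " + urls[1]);
        // prints Absolute Link: http://www.tutorialspoint.com/java/index.htm
    }
}
```

If the document has no base URI (for example, an empty string is passed as the second argument), absUrl returns an empty string for a relative href, because there is nothing to resolve it against.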
Example
1. Sample HTML
<html>
<head>
<title></title>
<script type="text/javascript" src="/resource/js/jquery-1.7.1.min.js"></script>
<link type="text/css" href="/resource/css/admin/general.css" rel="stylesheet" />
</head>
<body>
<span id="navi">
<img src="https://www.phenomena.com/media/xstorage/banner/2022-03-17_PM1109_L46DRH2Y8P.png" alt="" />
</span>
</body>
</html>
2. Code
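package com.bad.blood.test;

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {

    public static void main(final String[] args) throws IOException {
        final String baseUri = "http://127.0.0.1:8080/";
        final Document doc;

        // Fetch the page and parse it. The third argument is the base URI
        // against which relative URLs are resolved.
        try (InputStream in = new URL(baseUri + "index.html").openConnection().getInputStream()) {
            doc = Jsoup.parse(in, "UTF-8", baseUri);
        }

        // Rewrite every src and href attribute to its absolute form.
        makeAbsolute(doc, "src");
        makeAbsolute(doc, "href");

        System.out.println(doc.toString());
    }

    // Replaces each value of the given attribute with its absolute URL.
    private static void makeAbsolute(final Document doc, final String attrName) {
        final Elements elems = doc.select("[" + attrName + "]");
        for (final Element elem : elems) {
            final String absolute = elem.attr("abs:" + attrName);
            // An empty result means the URL could not be resolved;
            // keep the original value in that case.
            if (!absolute.isEmpty() && !absolute.equals(elem.attr(attrName))) {
                elem.attr(attrName, absolute);
            }
        }
    }
}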
3. Result HTML
<html>
<head>
<title></title>
<script type="text/javascript" src="http://127.0.0.1:8080/resource/js/jquery-1.7.1.min.js"></script>
<link type="text/css" href="http://127.0.0.1:8080/resource/css/admin/general.css" rel="stylesheet" />
</head>
<body>
<span id="navi"> <img src="https://www.phenomena.com/media/xstorage/banner/2022-03-17_PM1109_L46DRH2Y8P.png" alt="" /></span>
</body>
</html>