Removing Html tags except few specific ones from String in java
public String clean(String unsafe){
Whitelist whitelist = Whitelist.none();
whitelist.addTags(new String[]{"p","br","ul"});
String safe = Jsoup.clean(unsafe, whitelist);
return StringEscapeUtils.unescapeXml(safe);
}
For input string
String unsafe = "<p class='p1'>paragraph</p>< this is not html > <a link='#'>Link</a> <![CDATA[<sender>John Smith</sender>]]>";
I get following output which is pretty much I require.
<p>paragraph</p>< this is not html > Link <sender>John Smith</sender>