jsoup

Copying images from Word into a rich-text editor (external helper tool)

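自作多情 submitted on 2020-07-28 09:32:23
1 Problem
  Web-based rich-text editors are generally weak in features, while Word is widely regarded as the most usable document editor, so many people write in Word first and then paste the content into a web rich-text editor.
  This workflow has one problem: images do not come across and cannot be displayed, as shown below.
  I found a way to solve this.
2 Test environment
  summernote 0.8.18, office 2013, java 8, jsoup 1.7.2
3 Principle
  When we press ctrl+c to copy text and images in Word, an entry of type HTML is created on the system clipboard. Its content looks something like:
  As the figure above shows, Word extracts the images to a temporary directory at runtime, and each src uses the file protocol.
  Because web editors understand the data protocol, we can change each img src from file: to data:image/png;base64, followed by the image's base64-encoded bytes, and then copy the modified content back to the clipboard. That solves the problem.
  This approach is much like an external cheat tool in a game.
4 Key code
/**
 *
 */
private void handle() {
    try {
        // Read the content from the clipboard
        Clipboard clipboard = Clipboard.getSystemClipboard();
        String html = clipboard.getHtml();
        textArea1.setText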
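The conversion step described under "Principle" can be sketched with the JDK alone. This is a minimal illustration, not the author's code: the class name FileSrcToDataUri and the method toDataUri are hypothetical, and it assumes the src value is a well-formed file: URL (real clipboard content may need escaping first).

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper for illustration; not taken from the original post.
class FileSrcToDataUri {

    /**
     * Reads the file behind a file: URL and returns it as a base64 data: URI.
     * Assumes fileUrl is a well-formed URL such as file:///C:/Temp/image001.png.
     */
    static String toDataUri(String fileUrl, String mimeType) throws IOException {
        Path path = Paths.get(URI.create(fileUrl));
        byte[] bytes = Files.readAllBytes(path);
        String base64 = Base64.getEncoder().encodeToString(bytes);
        return "data:" + mimeType + ";base64," + base64;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a throwaway file standing in for Word's temporary image.
        Path tmp = Files.createTempFile("clip_image", ".png");
        Files.write(tmp, new byte[] {1, 2, 3});
        System.out.println(toDataUri(tmp.toUri().toString(), "image/png"));
        Files.delete(tmp);
    }
}
```

With an HTML parser such as jsoup, the returned string would be written back into each img element's src attribute before the HTML is put on the clipboard again.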

Bypassing HTTPS certificate validation when using jsoup in Java

送分小仙女□ submitted on 2020-07-27 12:00:00
Java code: add a utility class and call its method before fetching with jsoup.
//your code
SSLHelper.init();
Connection connect = Jsoup.connect(url).userAgent(USER_AGENT);
connect.header("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
connect.header("Accept-Encoding", "gzip, deflate, sdch");
connect.header("Accept-Language", "zh-CN,zh;q=0.8");
connect.timeout(3000);
connect.ignoreHttpErrors(true);
Document doc = connect.get();
package com.bookmark.analysis.common.util;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.X509TrustManager;
import java.security
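The SSLHelper class is cut off in the excerpt, so the following is only a sketch of what a helper of this kind commonly looks like, using standard JDK classes. The name TrustAllSsl and its install method are hypothetical, and the sketch assumes the fetch goes through HttpsURLConnection so that the JVM-wide defaults apply. It disables certificate and hostname checks for the whole JVM, which is only reasonable in test environments.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical stand-in; the original SSLHelper is truncated in the excerpt.
class TrustAllSsl {

    /** Trust manager that accepts every certificate chain. For testing only. */
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    /** Installs the trust-all socket factory and hostname verifier as JVM defaults. */
    static SSLContext install() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] {TRUST_ALL}, new SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory());
        HttpsURLConnection.setDefaultHostnameVerifier((hostname, session) -> true);
        return context;
    }
}
```

Calling TrustAllSsl.install() once before the first request would play the role that SSLHelper.init() plays in the excerpt.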

Parsing HTML text content with Jsoup

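泪湿孤枕 submitted on 2020-07-25 23:29:55
jsoup: Java HTML parser. It is a Java library for working with real-world HTML. Using the best of HTML5 DOM methods and CSS selectors, it provides a very convenient API for fetching URLs and for extracting and manipulating data. In web development it can be used to parse rich-text content, or to parse page content when a crawler scrapes web pages, among other uses, so I am noting it down here. jsoup implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.
Scrape and parse HTML from a URL, file, or string
Find and extract data using DOM traversal or CSS selectors
Manipulate HTML elements, attributes, and text
Clean user-submitted content against a safe whitelist, to prevent XSS attacks
Output tidy HTML
Official site: https://jsoup.org/
1. Add the Jsoup dependency
<!-- Parse HTML -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.12.1</version>
</dependency>
2. Simple usage example
public static void main(String[] args) { String text = Jsoup.parse("<p style='text-align: center;'><strong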

Android Jsoup, Why I cannot get correct src of img

社会主义新天地 submitted on 2020-07-22 14:12:28
Question: I cannot get the correct img src. This is the HTML I want to get. The image is a data-scheme URI. <img class="rg_i Q4LuWd tx8vtf" src="data:image/jpeg;base64,9j/4AAQSkZJR ~~~ TOO LONG ~~~/Z" data-deferred="1" jsname="Q4LuWd" alt="大阪の保護猫カフェ - SAVE CAT CAFE" data-iml="610.9050000086427" data-atf="true"> And this is my code. val url = "https://www.google.com/search?q=cat&sxsrf=ALeKk01jWgnZ1Jwok_XfrhRYTdkwZecETg:1587538774281&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiy3dTluvvoAhUPyosBHQtMAP8Q_AUoAXoECA8QAw

Problem in logging into website using Jsoup

余生长醉 submitted on 2020-06-17 13:16:10
Question: When I try to scrape my college website using Jsoup, it does not fill in the form on the website. But when I log in through a browser with the same credentials, it logs in to the page. I have supplied cookies and a user agent too. It's the first time I am using Jsoup, and I have tried every way I could with it. Please help me log in to the page. In onCreate(Bundle), the following is executed: new C4977c().execute(); //enter captcha and onclick //onclick button btn.setOnClickListener(new View.OnClickListener() { @Override

How to convert Jsoup Document in a String without put spaces

|▌冷眼眸甩不掉的悲伤 submitted on 2020-06-01 09:24:33
Question: I have converted an XML document into a Jsoup Document object. It turns out that when I need to output it as a String, it generates the result below: <?xml version="1.0" standalone="yes"?> <NewDataSet xmlns="http://www.portalfiscal.inf.br/nfe"> <nfeProc versao="2.00"> <NFe> <infNFe versao="2.00" id="NFe31140545453214002014550120002685744002685742"> <cUF> 31 </cUF> <cNF> 00268574 </cNF> ... The result generated this way causes me a lot of problems, since it puts whitespace inside elements, and this

what makes Jsoup faster than HttpURLConnection & HttpClient in most cases

拜拜、爱过 submitted on 2020-05-16 05:58:25
Question: I want to compare the performance of the three implementations mentioned in the title, so I wrote a small Java program to help me do this. The main method contains three blocks of testing; each block looks like this:
nb=0; time=0;
for (int i = 0; i < 7; i++) {
    double v = methodX(url);
    if(v>0){ nb++; time+=v; }
}
if(nb==0) nb=1;
System.out.println("HttpClient : "+(time/ ((double) nb))+". Tries "+nb+"/7");
The variable nb counts only successful requests, so failed requests are left out of the average. Now the method methodX is one of: private

How to bypass cloudflare ddos or redirect after 5 seconds using JSOUP?

拥有回忆 submitted on 2020-05-12 08:59:03
Question: I'm trying to get the anime list from this site, https://ww1.gogoanime.io. This is the code: org.jsoup.Connection.Response usage = Jsoup.connect("https://ww1.gogoanime.io/anime-list-A") .header("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8") .header("accept-encoding", "gzip, deflate, sdch, br") .header("accept-language", "en-US,en;q=0.8") .header("cache-control", "max-age=0") .header("user-agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36

How to bypass cloudflare ddos or redirect after 5 seconds using JSOUP?

柔情痞子 submitted on 2020-05-12 08:55:29
Question: I'm trying to get the anime list from this site, https://ww1.gogoanime.io. This is the code: org.jsoup.Connection.Response usage = Jsoup.connect("https://ww1.gogoanime.io/anime-list-A") .header("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8") .header("accept-encoding", "gzip, deflate, sdch, br") .header("accept-language", "en-US,en;q=0.8") .header("cache-control", "max-age=0") .header("user-agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36

Use Jsoup to get all href values from a specific class

删除回忆录丶 submitted on 2020-05-12 07:00:27
Question: I was trying to parse my university website to get a list of news items (title + link) from the main page. However, since I'm parsing a full website, the links I am looking for are nested deep inside other classes, tables, etc. Here's the code I tried to use: String url = "http://www.portal.pwr.wroc.pl/index,241.dhtml"; Document doc = Jsoup.connect(url).get(); Elements links = doc.select("table.cwrapper .tbody .tr td.ccol2 div.cwrapper_padd div#box_main_page_news.cbox.grey div#dyn_main_news.cbox