jsoup

JSoup: Difficulty extracting a single element

蓝咒 posted on 2020-03-05 04:12:08
Question: For my college coding project, I am tasked with grabbing the live value of bitcoin from the internet and incorporating it into a mini "bitcoin program." The issue is that I am having difficulty extracting the value of bitcoin from certain websites. Any and all help would be greatly appreciated. I have tried using different websites, with mixed results. Example 1: final String url = "https://www.coindesk.com/price/bitcoin"; try { Document doc = Jsoup.connect(url).get(); Element ele = doc.select ...

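Using the zt-exec library to periodically clean up sleeping Linux processes

夙愿已清 posted on 2020-03-01 14:52:36
A crawling project that went live a few months ago is built on Java + Selenium + chromedriver + Chrome. The reason for not crawling directly with jsoup or HttpClient is that many of the target pages render their content with JavaScript, so WebDriver + Chrome is needed to simulate a real browser visit. Every so often crawling fails, with no regular pattern: it may happen once in two days, once a week, or once a month. Checking the server with the Linux top command shows many chromedriver and chrome processes. Checking with ps: ps -aux | grep chrome shows sleeping processes in state Sl and zombie processes in state Z, none of them started on the current day; given the system's own business logic, no process should run that long. The Java processes all shut down normally, but the chromedriver and chrome processes they start do not always exit with them, and the cause has not been found yet. The first idea was to find the stuck processes with a command and kill them in bulk: ps -A -o stat,ppid,pid,cmd | grep -e '^[Zz]' | awk '{print $2}' | xargs kill -9 //kill zombie processes. It turned out that this only catches zombie processes in state Z; of the processes in state Sl, some are normal and some need to be killed (start time Nov07 ...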
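The age-based cleanup idea in the excerpt can be sketched in Java. This is a minimal sketch, not the author's zt-exec solution (the excerpt is cut off before that code appears): it uses the JDK's own ProcessHandle API (Java 9 or later) instead. The class name, the 12-hour threshold, the "chrome" substring match and the --kill flag are all assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class StaleChromeCleaner {

    // Assumed threshold: no crawl job is expected to run longer than this.
    static final Duration MAX_AGE = Duration.ofHours(12);

    /** True when the command looks like chrome/chromedriver and started more than maxAge before now. */
    static boolean isStale(Optional<String> command, Optional<Instant> start,
                           Instant now, Duration maxAge) {
        boolean isChrome = command.map(c -> c.toLowerCase().contains("chrome")).orElse(false);
        boolean isOld = start.map(s -> Duration.between(s, now).compareTo(maxAge) > 0).orElse(false);
        return isChrome && isOld;
    }

    /** Lists the processes visible to this JVM that match isStale. */
    static List<ProcessHandle> findStale(Instant now, Duration maxAge) {
        return ProcessHandle.allProcesses()
                .filter(p -> isStale(p.info().command(), p.info().startInstant(), now, maxAge))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Dry run by default; pass the (hypothetical) --kill flag to actually terminate.
        boolean kill = args.length > 0 && args[0].equals("--kill");
        for (ProcessHandle p : findStale(Instant.now(), MAX_AGE)) {
            System.out.println((kill ? "Killing " : "Would kill ") + p.pid()
                    + " " + p.info().command().orElse("?"));
            if (kill) {
                p.destroyForcibly(); // forcible termination, comparable to kill -9 on Linux
            }
        }
    }
}
```

Filtering on start time rather than process state is what separates healthy Sl processes from stuck ones here. A zombie has already exited and cannot be killed directly, so this sketch only targets long-lived live processes.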

Preventing XSS with JSOUP

最后都变了- posted on 2020-02-28 13:46:32
jsoup is an XSS prevention tool. It can detect XSS scripts in HTML and also in URLs. Here is an example with a URL. jsoup can validate the input with the "isValid()" method, which returns a boolean. A return value of false means the input contains content outside the allowed list, such as an XSS script, so the input needs to be cleaned with the "clean()" method, which returns the cleaned input as a string. jsoup can handle all the cheat sheet scenarios; the URL of the cheat sheet is "https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet". jsoup has a clear API: "http://jsoup.org/apidocs/". AntiSamy is also one of ...
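The validate-then-clean flow described above might look like the following. This is a minimal sketch, assuming jsoup 1.14.2 or later on the classpath, where the allow-list class is named Safelist (older releases call it Whitelist). The class name, the choice of Safelist.basic() and the sample input are illustrative assumptions, not taken from the post.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class XssCleanSketch {

    /** Returns the input unchanged when it already passes the safelist, otherwise a cleaned copy. */
    static String sanitize(String untrusted) {
        Safelist safelist = Safelist.basic(); // assumed policy: basic text formatting tags only
        if (Jsoup.isValid(untrusted, safelist)) {
            return untrusted; // true means nothing outside the safelist was found
        }
        return Jsoup.clean(untrusted, safelist); // drops disallowed tags and attributes
    }

    public static void main(String[] args) {
        String input = "<p>Hello</p><script>alert('xss')</script>";
        System.out.println(Jsoup.isValid(input, Safelist.basic())); // false: script is not allowed
        System.out.println(sanitize(input));
    }
}
```

Note the direction of the check: isValid() returning true means the input is acceptable, so cleaning is only needed on false.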

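A data scraping example

≡放荡痞女 posted on 2020-02-27 04:17:28
I have been idle at home lately and wanted something to do, so I thought about scraping some short videos, which led to the program below. It is not difficult; I just want to share the approach. Using https:// www。avtb6677 。 com as an example: 1. Find a tool to download the relevant page source. I used Teleport to download only the pages and nothing else; this took about an hour. The tool is quite good and faster than writing code to crawl the whole site. 2. During that hour I wrote the following code, which took only about 20 minutes. package aa; import java.io.File; import java.io.IOException; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import org.jsoup.Jsoup; import org.jsoup.nodes.Document; import org.jsoup.nodes.Element; import org.jsoup.select.Elements; import com.alibaba.fastjson.JSON; public class AvTaoBao6677 { public static String list = "c:/aaa.list"; //directory of the downloaded pages static String ...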

【Java Web_07】XML

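喜你入骨 posted on 2020-02-26 17:02:07
I. XML overview
1. What is XML
* Extensible Markup Language
2. Basic XML syntax
① Basic syntax:
* XML documents use the .xml file extension
* The document declaration, when present, must be the first line of the document
* An XML document has exactly one root tag
* Attribute values must be quoted (single or double quotes both work)
* Tags must be closed properly
* XML tag names are case-sensitive
# XML syntax is strict and tags are user-defined; XML is mainly used to store data
# Data inside a CDATA section 【 <![CDATA[ data ]]> 】 is shown as-is
② Example
<?xml version='1.0' encoding="UTF-8" ?>
<users>
<user id='001'> <name>tom</name> <age>18</age> <gender>male</gender> <br/> </user>
<user id='002'> <name>jack</name> <age>18</age> <gender>female</gender> </user>
</users>
3. XML constraints
① Types
* DTD (simple, has loopholes)
* Schema
② Using a DTD
* Local: <!DOCTYPE root-tag-name SYSTEM "location of the DTD file">
* Network: <!DOCTYPE root-tag-name PUBLIC "name of the DTD file" "URL of the DTD file">
③ ...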
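The syntax rules above can be checked against a parser. The following is a minimal sketch that reads the users example with the JDK's built-in DOM parser (javax.xml.parsers). The excerpt itself does not show any parsing code, so the class name, method name and the "id:name" output format are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class UsersXmlSketch {

    // The example document from the post, written as a Java string.
    static final String SAMPLE =
            "<?xml version='1.0' encoding=\"UTF-8\" ?>"
            + "<users>"
            + "<user id='001'><name>tom</name><age>18</age><gender>male</gender><br/></user>"
            + "<user id='002'><name>jack</name><age>18</age><gender>female</gender></user>"
            + "</users>";

    /** Parses the document and returns one "id:name" string per user element. */
    static List<String> userSummaries(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList users = doc.getDocumentElement().getElementsByTagName("user");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < users.getLength(); i++) {
                Element user = (Element) users.item(i);
                String name = user.getElementsByTagName("name").item(0).getTextContent();
                result.add(user.getAttribute("id") + ":" + name);
            }
            return result;
        } catch (Exception e) {
            // Covers parser configuration, I/O and well-formedness errors (e.g. an unclosed tag).
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(userSummaries(SAMPLE)); // prints [001:tom, 002:jack]
    }
}
```

Passing a document that breaks one of the rules, for example "<users><user></users>" with an unclosed tag, makes the parser reject it, which is the strictness the list describes.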

How to get mobile response of webpage using java or jsoup

时光毁灭记忆、已成空白 posted on 2020-02-24 11:41:48
Question: I am trying to get the response of youtube.com using Java with jsoup. I can get the response as follows, but it returns the desktop website's response: String str = "https://www.youtube.com/"; doc = Jsoup.connect(str) .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36") .get(); In the same way, I am trying to get the response for the mobile version of the same site as follows: doc = Jsoup.connect("https://"+url2 ...

how to parse a table from HTML using jsoup

落花浮王杯 posted on 2020-02-10 05:20:47
Question: <td width="10"></td> <td width="65"><img src="/images/sparks/NIFTY.png" /></td> <td width="65">5,390.85</td> <td width="65">5,428.15</td> <td width="65">5,376.15</td> <td width="65">5,413.85</td> This is the HTML source from which I have to extract the values 5390.85, 5428.15, 5376.15 and 5413.85. I want to do this using jsoup, but I am relatively new to it (I started using it today). How should I do this? URL url = new URL("http://www.nseindia.com/content/equities/niftysparks.htm"); ...

Parsing a tag with JSOUP force closing for nullPointerException

你离开我真会死。 posted on 2020-02-08 02:35:44
Question: When trying to parse the link http://pc.gamespy.com/pc/bastion/ using Element overview = doc.select("div#object-overview").last(); Element paragraph = overview.select("p").last(); I get a NullPointerException. The same happens with http://wii.gamespy.com/wii/jerry-rice-nitus-dog-football/, where the null pointer occurs here: Element featureList = doc.select("div.callout-box").last(); featuresText.setText("FEATURE: " + featureList.text()); Why is this? I am trying to retrieve the overview section ...

I can't get value from script to jsoup

浪子不回头ぞ posted on 2020-02-06 07:41:46
Question: I want to get the stream value in code, but it fails. How can I get the stream value http://123.30.215.65/hls/4545780bfa790819/5/3/d836ad614748cdab11c9df291254cf836f21144da20bf08142455a8735b328ca/dnR2MQ==_m.m3u8 using jsoup? <html> <head> <style>html,body{margin:0;padding:0;background:#000;;}</style> <meta charset="utf-8"> <script src="https://code.jquery.com/jquery-2.1.4.js"></script> <script type="text/javascript" src="https://cdn.jsdelivr.net/clappr/latest/clappr.min.js"></script> <meta name="referrer" ...
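One possible approach, sketched under assumptions: the post is cut off before the player script, so this assumes the .m3u8 URL appears literally as a quoted string somewhere in the page source. The sketch uses only java.util.regex on the raw text rather than jsoup itself; the class name, the pattern and the sample script (with an example.com URL) are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StreamUrlSketch {

    // Matches an http(s) URL ending in .m3u8; stops at whitespace, quotes and angle brackets.
    private static final Pattern M3U8_URL =
            Pattern.compile("https?://[^\\s\"'<>]+\\.m3u8");

    /** Returns the first .m3u8 URL found in the given page source, if any. */
    static Optional<String> findStreamUrl(String pageSource) {
        Matcher m = M3U8_URL.matcher(pageSource);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical inline script; the real page's script is not shown in the post.
        String html = "<script>var player = new Clappr.Player("
                + "{source: \"http://example.com/hls/abc/dnR2MQ==_m.m3u8\"});</script>";
        System.out.println(findStreamUrl(html).orElse("not found"));
    }
}
```

With jsoup, the same pattern could be applied to the data() of each script element selected from the parsed document, which avoids matching URLs that only appear in visible text. If the page builds the URL at runtime in JavaScript, neither approach will find it.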