data-extraction

How can I extract data from an .h5 file and save it in .txt or .csv properly?

扶醉桌前 submitted on 2021-01-29 19:35:46
Question: After searching a lot, I couldn't find a simple way to extract data from an .h5 file and pass it to a DataFrame via NumPy or pandas in order to save it to a .txt or .csv file.

    import h5py
    import numpy as np
    import pandas as pd

    filename = r'D:\data.h5'
    f = h5py.File(filename, 'r')

    # List all groups
    print("Keys: %s" % f.keys())
    a_group_key = list(f.keys())[0]

    # Get the data
    data = list(f[a_group_key])
    pd.DataFrame(data).to_csv("hi.csv")

Output:

    Keys: <KeysViewHDF5 ['dd48']>

When I print data, I see the following results:
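A minimal sketch of one way to do the export described in this question. It assumes the key refers to a numeric 1-D or 2-D dataset rather than a group, which the excerpt does not confirm. The helper name `dataset_to_csv` and the file names in the usage line are hypothetical.

```python
import h5py
import numpy as np


def dataset_to_csv(h5_path, key, csv_path):
    """Write one numeric HDF5 dataset to a CSV file (hypothetical helper)."""
    with h5py.File(h5_path, "r") as f:
        obj = f[key]
        # A key can name a group (a container) instead of a dataset;
        # only datasets hold array data that can be written directly.
        if not isinstance(obj, h5py.Dataset):
            raise TypeError(f"{key!r} is a group with members {list(obj.keys())}")
        data = obj[()]  # read the whole dataset into a NumPy array
    # np.savetxt handles 1-D and 2-D numeric arrays (assumed here).
    np.savetxt(csv_path, data, delimiter=",", fmt="%g")
    return data.shape
```

With the names from the question, a call might look like `dataset_to_csv("data.h5", "dd48", "hi.csv")`. If `dd48` turns out to be a group, the error message lists its members, and one of those can be passed as a path such as `"dd48/<member>"`.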

Extract date from multiple webpages with Python

﹥>﹥吖頭↗ submitted on 2021-01-28 08:35:08
Question: I want to extract the date on which a news article was published on a website. For some websites I know the exact HTML element that holds the date/time (div, p, time), but for others I do not. These are the links for some of the websites (German websites): (3 Nov 2020) http://www.linden.ch/de/aktuelles/aktuellesinformationen/?action=showinfo&info_id=1074226 (Dec. 1, 2020) http://www.reutigen.ch/de/aktuelles/aktuellesinformationen/welcome.php?action=showinfo&info_id=1066837&ls=0&sq=&kategorie_id=&date_from=
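When the HTML element holding the date is not known, one fallback is to search the page's visible text for date-like patterns. The sketch below uses only the standard library and recognises three formats: the two shown in the question ("3 Nov 2020", "Dec. 1, 2020") plus the numeric "03.11.2020" form, which is an assumption about what German sites may use. The function name `find_dates` is hypothetical, and only English month abbreviations are mapped.

```python
import re
from datetime import date

# English month abbreviations only; German names (e.g. "Dez") would need adding.
MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

_PATTERNS = [
    # "3 Nov 2020" or "3 November 2020"
    re.compile(r"\b(?P<d>\d{1,2})\s+(?P<m>[A-Za-z]{3})[a-z]*\.?\s+(?P<y>\d{4})\b"),
    # "Dec. 1, 2020" or "December 1, 2020"
    re.compile(r"\b(?P<m>[A-Za-z]{3})[a-z]*\.?\s+(?P<d>\d{1,2}),\s*(?P<y>\d{4})\b"),
    # "03.11.2020" (day.month.year)
    re.compile(r"\b(?P<d>\d{1,2})\.(?P<m>\d{1,2})\.(?P<y>\d{4})\b"),
]


def find_dates(text):
    """Return all recognised dates in text, in order of appearance."""
    found = []
    for pattern in _PATTERNS:
        for match in pattern.finditer(text):
            raw_month = match.group("m").lower()
            month = int(raw_month) if raw_month.isdigit() else MONTHS.get(raw_month)
            if month is None:
                continue  # three letters that are not a known month
            try:
                parsed = date(int(match.group("y")), month, int(match.group("d")))
            except ValueError:
                continue  # e.g. day 31 in a 30-day month
            found.append((match.start(), parsed))
    return [parsed for _, parsed in sorted(found)]
```

A page usually contains several dates (footer, sidebar, related articles), so choosing which match is the publication date still needs a site-specific rule, such as taking the first match inside the article body.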

Extract text from a PDF converted from a webpage using PyPDF2

狂风中的少年 submitted on 2020-06-29 04:34:38
Question: I used Chrome's "Save as PDF" option to convert a webpage into a PDF. The problem is that when I extract the data from it using PyPDF2, it shows Null, whereas the same code works on other PDF files without trouble. I know that I can extract the data directly from the website, but I want to understand why this is not working. It reports the correct number of pages, but when I call extractText(), it shows nothing. Does anyone know what the problem is? The link to the page is https://en.wikipedia.org/wiki/Rapping. I

How to digitize (extract data from) a heat map image using Python?

浪子不回头ぞ submitted on 2020-06-24 10:52:06
Question: There are several packages available to digitize line graphs, e.g. GetData Graph Digitizer. However, I could not find any packages or programs for digitizing heat maps. I want to digitize a heat map (an image in PNG or JPG format) using Python. How can I do it? Do I need to write the entire code from scratch, or are there any packages available?
Answer 1: There are multiple ways to do it, with many machine learning libraries offering custom visualization functions, some easier and some harder to use. You need to
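For the heat map question above, one common approach (not necessarily the one the truncated answer goes on to describe) is to sample the image's colour bar and map every pixel to the value of its nearest colour-bar colour. The sketch below shows only that lookup step in plain Python. It assumes the colour bar is linear between `vmin` and `vmax` and that the pixels have already been read into RGB tuples, for example with an imaging library, which is not shown. All function names are hypothetical.

```python
def build_lookup(colorbar_pixels, vmin, vmax):
    """Pair each colour-bar pixel (ordered low to high) with a linearly spaced value."""
    n = len(colorbar_pixels)
    if n < 2:
        raise ValueError("need at least two colour-bar samples")
    step = (vmax - vmin) / (n - 1)
    return [(rgb, vmin + i * step) for i, rgb in enumerate(colorbar_pixels)]


def pixel_to_value(rgb, lookup):
    """Return the value whose colour-bar colour is closest to rgb (squared RGB distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(lookup, key=lambda entry: dist2(entry[0], rgb))[1]


def digitize(image_rows, lookup):
    """Convert a 2-D grid of RGB tuples into a 2-D grid of values."""
    return [[pixel_to_value(px, lookup) for px in row] for row in image_rows]
```

The resolution of the result is limited by the number of colour-bar samples, and JPG compression artefacts can shift colours, so a PNG source is usually the safer input.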

How to create a new list item with FLWOR and XQuery?

我的梦境 submitted on 2020-02-25 06:30:49
Question: I'm looking to select non-numerical data from an XML file in order to shred it into database columns, or at least into an xmltable-like structure. This FLWOR gives a somewhat useful result:

    xquery version "3.0";
    declare namespace office="urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    <ul>
    {
      for $foo in db:open("foo")
      return <li>{$foo//text()[not(matches(., '[0-9]'))]}</li>
    }
    </ul>

However, it outputs all results into a single li tag, like: a b c d Preferred output would be of the form: a b Most

How can I get the first string from a div that has another element embedded, using beautifulsoup4?

巧了我就是萌 submitted on 2020-02-02 13:02:31
Question: I'm trying to extract prices from a website. The code I've written can do that, but when the website shows a price together with the old price, it returns "none" instead of a string containing the price. This is an example of the markup without the old price (for which my code returns a string): <div class="xl-price rangePrice"> 535.000 € </div> This is an example of the markup WITH the old price (for which my code returns "none"): <div class="xl-price rangePrice"> 487.000 € <span class="old-price">
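The asker's code is not shown in the excerpt, but this symptom matches Beautiful Soup's documented behaviour for `.string`, which is `None` when a tag has more than one child. Assuming that is the cause, the sketch below reads only the first text node that is a direct child of the div. The helper name `first_direct_text` is hypothetical, and the old price value in the sample markup is a placeholder because the original value is cut off.

```python
from bs4 import BeautifulSoup


def first_direct_text(tag):
    """Return the first text node that is a direct child of tag, stripped."""
    node = tag.find(string=True, recursive=False)
    return node.strip() if node is not None else None


# Sample markup: the old price (500.000 €) is a made-up placeholder.
html = """
<div class="xl-price rangePrice"> 535.000 € </div>
<div class="xl-price rangePrice"> 487.000 € <span class="old-price"> 500.000 € </span></div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", class_="xl-price")
prices = [first_direct_text(div) for div in divs]
print(prices)
```

Here `divs[1].string` is `None` because that div contains both a text node and a span, while `first_direct_text(divs[1])` still returns the current price.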

Issue with recognizing people by their clothes color under mild illumination changes

倾然丶 夕夏残阳落幕 submitted on 2020-01-24 20:23:04
Question: I am interested in human following using a real robot. I'd like to use the color of the person's clothes as the key feature for identifying the target person in front of the robot, so that the robot can follow him or her. However, I am struggling because color is a weak feature: even a very small change in illumination breaks it. So I need either to switch to another algorithm or to update the (RGB) values online in real time, but I don't have enough experience with image processing. This is my full code for color detection: import cv2 import numpy as np from
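One widely used way to make colour matching less sensitive to brightness is to compare hue in HSV space instead of raw RGB values, since a change in brightness mostly affects the V channel. The sketch below illustrates the idea with the standard library's `colorsys` on single pixels. It is not the asker's code, which is truncated above, and the function names and the tolerance values are assumptions that would need tuning.

```python
import colorsys


def hue_sat(rgb):
    """Convert an (R, G, B) tuple in 0..255 to (hue, saturation), each in 0..1."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h, s


def matches_target(rgb, target_rgb, hue_tol=0.05, min_sat=0.2):
    """True if rgb has roughly the same hue as target_rgb, ignoring brightness."""
    h, s = hue_sat(rgb)
    target_h, _ = hue_sat(target_rgb)
    if s < min_sat:
        return False  # near-grey pixels have no reliable hue
    diff = abs(h - target_h)
    diff = min(diff, 1.0 - diff)  # hue is circular: 0.98 and 0.02 are close
    return diff <= hue_tol
```

Hue alone is still unreliable for white, black, or grey clothing and under strongly coloured light, so it is often combined with other cues when tracking a person.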