extraction

Extract all bounding boxes using OpenCV Python

眉间皱痕 submitted on 2019-11-28 06:04:21
I have an image that contains more than one bounding box, and I need to extract every region that has a bounding box around it. So far, from this site I've gotten this answer:
    y = img[by:by+bh, bx:bx+bw]
    cv2.imwrite(string + '.png', y)
It works, but it only produces one image. How should I modify the code? I tried putting it in the loop over the contours, but it still writes out one image instead of several. Thank you so much in advance.
Zaw Lin: There you go:
    import cv2
    im = cv2.imread('c:/data/ph.jpg')
    gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    contours, hierarchy = cv2.findContours(gray,cv2.RETR_LIST,cv2

How to extract data from a PDF file while keeping track of its structure?

和自甴很熟 submitted on 2019-11-28 05:02:30
My objective is to extract the text and images from a PDF file while parsing its structure. The scope for parsing the structure is not exhaustive; I only need to be able to identify headings and paragraphs. I have tried a few different things, but I did not get very far with any of them:
Convert PDF to text. This does not work for me, as I lose the images and the structure of the document.
Convert PDF to HTML. I found a few tools that helped me with this, and the best one so far is pdftohtml. The tool is really good presentation-wise, but I haven't been able to successfully parse the HTML.
Convert

Extract part of a git repository?

孤人 submitted on 2019-11-28 04:31:12
Assume my git repository has the following structure:
    /.git
    /Project
    /Project/SubProject-0
    /Project/SubProject-1
    /Project/SubProject-2
and the repository has quite a few commits. Now one of the subprojects (SubProject-0) has grown pretty big, and I want to take SubProject-0 out and set it up as a standalone project. Is it possible to extract all the commit history involving SubProject-0 from the parent git repository and move it to a new one?
See http://git-scm.com/docs/git-filter-branch. I think you need something like:
    git filter-branch --subdirectory-filter Project/SubProject-0 --prune-empty -- -

How do you extract a column from a multi-dimensional array?

血红的双手。 submitted on 2019-11-28 02:39:11
Does anybody know how to extract a column from a multi-dimensional array in Python?
    >>> import numpy as np
    >>> A = np.array([[1,2,3,4],[5,6,7,8]])
    >>> A
    array([[1, 2, 3, 4],
           [5, 6, 7, 8]])
    >>> A[:,2] # returns the third column
    array([3, 7])
See also: "numpy.arange" and "reshape" to allocate memory. Example (allocating an array with the shape of a 3x4 matrix):
    import numpy
    nrows = 3
    ncols = 4
    my_array = numpy.arange(nrows*ncols, dtype='double')
    my_array = my_array.reshape(nrows, ncols)
Martin Geisler: Could it be that you're using a NumPy array? Python has the array module, but that does not support multi
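If the data is a plain list of lists rather than a NumPy array, the A[:,2] slice is not available. Here is a small sketch of two standard-library alternatives, reusing the values from the question; the helper name column is made up for illustration.

```python
def column(rows, index):
    """Return the values at position `index` from every row of a list of lists."""
    return [row[index] for row in rows]


A = [[1, 2, 3, 4], [5, 6, 7, 8]]

# Pick one column with a list comprehension.
third = column(A, 2)

# Or transpose the whole table once with zip and index into the result.
columns = list(zip(*A))

print(third)       # [3, 7]
print(columns[2])  # (3, 7)
```

The comprehension is enough for a single column. The zip transpose is convenient when several columns are needed, but it returns tuples rather than lists.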

How to randomly extract FASTA sequences using Python?

 ̄綄美尐妖づ submitted on 2019-11-28 02:11:16
I have the following sequences in FASTA format, each with a sequence header and its nucleotides. How can I randomly extract sequences? For example, I would like to randomly select 2 sequences out of the total. The tools I have found extract by percentage, not by number of sequences. Can anyone help me?
A.fasta
    >chr1:1310706-1310726
    GACGGTTTCCGGTTAGTGGAA
    >chr1:901959-901979
    GAGGGCTTTCTGGAGAAGGAG
    >chr1:983001-983021
    GTCCGCTTGCGGGACCTGGGG
    >chr1:984333-984353
    CTGGAATTCCGGGCGCTGGAG
    >chr1:1154147-1154167
    GAGATCGTCCGGGACCTGGGT
Expected Output
    >chr1
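One way to pick a fixed number of records rather than a percentage is to parse the file into (header, sequence) pairs and hand them to random.sample. The sketch below uses only the standard library; the function names read_fasta and sample_fasta are hypothetical, and it assumes the whole file fits in memory.

```python
import random


def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sample_fasta(records, count, seed=None):
    """Return `count` distinct records chosen uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(records, count)


EXAMPLE = """>chr1:1310706-1310726
GACGGTTTCCGGTTAGTGGAA
>chr1:901959-901979
GAGGGCTTTCTGGAGAAGGAG
>chr1:983001-983021
GTCCGCTTGCGGGACCTGGGG
>chr1:984333-984353
CTGGAATTCCGGGCGCTGGAG
>chr1:1154147-1154167
GAGATCGTCCGGGACCTGGGT
"""

records = read_fasta(EXAMPLE.splitlines())
picked = sample_fasta(records, 2, seed=0)
for header, sequence in picked:
    print(">" + header)
    print(sequence)
```

For a real file, read_fasta(open("A.fasta")) would work the same way. Passing a seed makes the selection repeatable, and leaving it as None gives a different pick on each run.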

How to extract full url with HtmlAgilityPack - C#

偶尔善良 submitted on 2019-11-27 22:26:10
With the approach below, only the relative URL from the href attribute is extracted. The extraction code:
    foreach (HtmlNode link in hdDoc.DocumentNode.SelectNodes("//a[@href]"))
    {
        lsLinks.Add(link.Attributes["href"].Value.ToString());
    }
The URL markup:
    <a href="Login.aspx">Login</a>
The extracted URL:
    Login.aspx
But I want to get the real link as the browser resolves it, like http://www.monstermmorpg.com/Login.aspx. I could check whether the URL contains http and, if not, prepend the domain, but that may cause problems in some cases and I don't think it is a very wise solution.
C# 4.0, HtmlAgilityPack.1.4.0
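Checking for "http" and prepending the domain by hand is fragile; the usual approach is to resolve each href against the URL of the page it came from, which is what a browser does. Below is a sketch of that resolution step in Python using urllib.parse.urljoin. It illustrates the idea and is not HtmlAgilityPack code; in .NET, the System.Uri constructor that takes a base Uri and a relative string should do the same job. The helper name absolutize and the second and third sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against the page URL, as a browser would."""
    return [urljoin(base_url, href) for href in hrefs]


# The first href is from the question; the other two are hypothetical.
page_url = "http://www.monstermmorpg.com/"
hrefs = ["Login.aspx", "/Register.aspx", "http://example.com/a"]

links = absolutize(page_url, hrefs)
for link in links:
    print(link)
```

Relative paths are joined onto the base, root-relative paths replace the base path, and hrefs that are already absolute pass through unchanged.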

Is it possible to decompile a .dll/.pyd file to extract Python Source Code?

随声附和 submitted on 2019-11-27 14:11:59
Question: Are there any ways to decompile a .dll and/or a .pyd file in order to extract source code written in Python? Thanks in advance.
Answer 1: I assume the .pyd/.dll files were created in Cython, not Python? Anyway, generally it's not possible, unless there's a decompiler designed specifically for the language the file was originally compiled from. And while I know about C, C++, Delphi, .NET and some other decompilers, I've yet to hear about a Cython decompiler. Of course, what Cython does is convert your

How to extract data from a PDF file while keeping track of its structure?

狂风中的少年 submitted on 2019-11-27 10:49:49
Question: My objective is to extract the text and images from a PDF file while parsing its structure. The scope for parsing the structure is not exhaustive; I only need to be able to identify headings and paragraphs. I have tried a few different things, but I did not get very far with any of them:
Convert PDF to text. This does not work for me, as I lose the images and the structure of the document.
Convert PDF to HTML. I found a few tools that helped me with this, and the best one so far is pdftohtml. The

How to get audio data from a MP3?

一笑奈何 submitted on 2019-11-27 10:01:58
Question: I'm working on an application that has to process audio files. When using mp3 files I'm not sure how to handle the data (the data I'm interested in are the audio bytes, the ones that represent what we hear). If I'm using a wav file, I know I have a 44-byte header and then the data. When it comes to an mp3, I've read that they are composed of frames, each frame containing a header and audio data. Is it possible to get all the audio data from an mp3 file? I'm using Java (I've added MP3SPI,
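For the WAV half of the comparison, here is a rough sketch of reading the raw sample bytes without hard-coding the 44-byte offset. It uses Python's standard wave module, which parses the header itself, and that matters because WAV files can carry extra chunks that make the header longer than 44 bytes. This does not decode MP3: MP3 frames hold compressed data, so getting PCM samples from them needs a decoder. The helper name read_pcm and the tiny in-memory test file are assumptions for illustration.

```python
import io
import wave


def read_pcm(wav_file):
    """Return (params, raw PCM bytes) from a WAV path or file-like object."""
    with wave.open(wav_file, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    return params, frames


# Build a tiny mono, 16-bit, 8 kHz WAV in memory (hypothetical sample data).
samples = b"\x00\x00\x01\x00" * 4
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(samples)

buffer.seek(0)
params, pcm = read_pcm(buffer)
print(params.nchannels, params.sampwidth, params.framerate, len(pcm))
```

The returned bytes are the uncompressed samples only, which is the part of the file the question calls the audio data.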

how can we extract text from pdf using itextsharp with spaces?

▼魔方 西西 submitted on 2019-11-27 03:39:56
Question: I am using the method below to extract PDF text line by line. The problem is that it does not read the spaces between words and figures. What could be the solution for this? I just want to create a list of strings in which each string holds one line of text from the PDF exactly as it appears there, including spaces.
    public void readtextlinebyline(string filename)
    {
        List<string> strlist = new List<string>();
        PdfReader reader = new PdfReader(filename);
        string text = string.Empty;
        for (int page = 1; page <= 1; page++)
        {