extract | 易学教程

How to extract short sequence using window with specific step size?

Question: The code below extracts short subsequences from every sequence using a window of size 4. How can I shift the window by a step size of 2 and extract 4 base pairs each time?

Example code:

    from Bio import SeqIO

    with open("testA_out.fasta", "w") as f:
        for seq_record in SeqIO.parse("testA.fasta", "fasta"):
            i = 0
            while (i + 4) < len(seq_record.seq):
                f.write(">" + str(seq_record.id) + "\n")
                f.write(str(seq_record.seq[i:i+4]) + "\n")
                i += 2

Example input (testA.fasta):

    >human1
    ACCCGATTT

Example output (testA_out):

    >human1
    ACCC
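A minimal sketch of the windowing logic on its own, without Biopython, so it can be checked in isolation. The function name `sliding_windows` and its parameters are assumptions for illustration, not part of the question. Note that the strict `<` comparison in the question's loop would skip a final window that ends exactly at the end of the sequence; the sketch below uses an inclusive bound instead.

```python
def sliding_windows(sequence, size=4, step=2):
    """Yield every full-length window of `size` characters, moving `step` at a time."""
    # The last valid start index is len(sequence) - size, hence the + 1.
    for start in range(0, len(sequence) - size + 1, step):
        yield sequence[start:start + size]


if __name__ == "__main__":
    # Sequence taken from the example input in the question.
    for window in sliding_windows("ACCCGATTT", size=4, step=2):
        print(window)
```

With Biopython, the same function could be applied to `str(seq_record.seq)` inside the `SeqIO.parse` loop, writing one FASTA header line per window.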

extract specific text using multiple regex in python?

Question: I have a problem using regular expressions in Python 3, so I would be grateful if someone could help me. I have a text file like the one below: Header A text text text text Header B text text text text Header C text text here is the end. What I would like is a list of the text between the headers, including the headers themselves. I am using this regular expression: re.findall(r'(?=(Header.*?Header|Header.*?end))', data, re.DOTALL). The result is: ['Header A\ntext text\n text
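One possible approach, sketched under the assumption that each section should run from one `Header` up to (but not including) the next one, or to the end of the text. The sample data and the name `split_sections` are hypothetical; the question's exact file layout is not shown in full.

```python
import re

# Lazily match from "Header" until the next "Header" or the end of the string.
# The lookahead does not consume the next header, so it starts the next match.
SECTION = re.compile(r"Header.*?(?=Header|\Z)", re.DOTALL)


def split_sections(data):
    """Return each header together with the text that follows it."""
    return [section.strip() for section in SECTION.findall(data)]


if __name__ == "__main__":
    # Hypothetical sample modelled on the question's description.
    sample = (
        "Header A\ntext text\ntext text\n"
        "Header B\ntext text\ntext text\n"
        "Header C\ntext text\nhere is the end\n"
    )
    for section in split_sections(sample):
        print(repr(section))
```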

How to extract JSON data from a response containing a header and body?

Question: This is my first question posted to Stack Overflow; typically I can find solutions to my problems here, but not for this particular situation. I am writing a Python plugin for my compiler that outputs REST calls in various languages for interaction with an API. I am authenticating with the socket and ssl modules by sending a username and password in the request body in JSON form. Upon successful authentication, the API returns a response in the following format with important
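A minimal sketch of separating the header from the body of a raw response read from a socket. It assumes the response is plain HTTP with a blank line (`\r\n\r\n`) between header and body, a UTF-8 JSON body, and no chunked transfer encoding; the question's actual response format is cut off above, so the sample bytes and the name `extract_json_body` are hypothetical.

```python
import json


def extract_json_body(raw_response: bytes):
    """Split a raw response at the first blank line and decode the body as JSON."""
    head, separator, body = raw_response.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line separating header and body")
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical response, for illustration only.
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: application/json\r\n"
        b"\r\n"
        b'{"token": "abc"}'
    )
    print(extract_json_body(raw))
```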

How to crawl links on all pages of a web site with Scrapy

Question: I'm learning Scrapy and I'm trying to extract all links that contain "http://lattes.cnpq.br/andasequenceofnumbers", for example: http://lattes.cnpq.br/0281123427918302. But I don't know which page of the web site contains this information. For example, on this web site: http://www.ppgcc.ufv.br/ the links that I want are on this page: http://www.ppgcc.ufv.br/?page_id=697. What could I do? I'm trying to use rules, but I don't know how to use regular expressions correctly. Thank you
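The regular expression itself can be tried out without Scrapy. The sketch below is plain Python, not a spider: it assumes the wanted links always consist of the lattes.cnpq.br host followed by digits only, and the names `LATTES_LINK` and `find_lattes_links` are made up for illustration. In Scrapy, a pattern of this kind could be passed to the `allow` argument of `LinkExtractor`, with a second rule that follows the site's internal pages.

```python
import re

# Escape the dots so they match literally; \d+ matches the numeric identifier.
LATTES_LINK = re.compile(r"https?://lattes\.cnpq\.br/\d+")


def find_lattes_links(html):
    """Return the distinct matching URLs in the order they first appear."""
    seen = []
    for url in LATTES_LINK.findall(html):
        if url not in seen:
            seen.append(url)
    return seen


if __name__ == "__main__":
    # Hypothetical page fragment using the URLs quoted in the question.
    page = (
        '<a href="http://lattes.cnpq.br/0281123427918302">CV</a>'
        '<a href="http://www.ppgcc.ufv.br/?page_id=697">People</a>'
    )
    print(find_lattes_links(page))
```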

Extract only body text from arXiv articles formatted as .tex

Question: My dataset is composed of arXiv astrophysics articles as .tex files, and I need to extract only the text of the article body, not any other part of the article (e.g. tables, figures, abstract, title, footnotes, acknowledgements, citations, etc.). I've been trying with Python 3 and tex2py, but I'm struggling to get a clean corpus, because the files differ in labeling and the text is broken up between labels. I have attached an SSCCE, a couple of sample LaTeX files and their PDFs, and the
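A rough first pass can be sketched with regular expressions instead of tex2py (a different technique from the one the question uses). It assumes the body sits between `\begin{document}` and `\end{document}` and that unwanted parts are wrapped in named environments; it does not handle footnotes, citations, nested environments of the same name, or custom macros. The names `SKIPPED_ENVIRONMENTS` and `extract_body` are hypothetical.

```python
import re

# Environments dropped wholesale; an assumed starting list, not exhaustive.
SKIPPED_ENVIRONMENTS = (
    "abstract", "table", "table*", "figure", "figure*", "thebibliography",
)


def extract_body(tex: str) -> str:
    """Keep the document body and remove the environments listed above."""
    match = re.search(r"\\begin\{document\}(.*?)\\end\{document\}", tex, re.DOTALL)
    body = match.group(1) if match else tex
    for env in SKIPPED_ENVIRONMENTS:
        name = re.escape(env)  # escapes the "*" in starred environments
        pattern = r"\\begin\{" + name + r"\}.*?\\end\{" + name + r"\}"
        body = re.sub(pattern, "", body, flags=re.DOTALL)
    return body.strip()


if __name__ == "__main__":
    # Tiny hypothetical document.
    sample = (
        r"\title{T}\begin{document}\begin{abstract}A\end{abstract}"
        r"Body text.\begin{figure}F\end{figure} More.\end{document}"
    )
    print(extract_body(sample))
```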

Extract all .gz file in folder using VBA Shell command

Question: I have the following VBA code to extract all the files within a given directory.

    Sub extractAllFiles()
        Dim MyObj As Object, MySource As Object, file As Variant
        Dim shellStr As String
        file = Dir("C:\Downloads\")
        While (file <> "")
            If InStr(file, ".gz") > 0 Then
                shellStr = "winzip32 -e C:\Downloads\" & file & " C:\Downloads\"
                Call Shell(shellStr, vbHide)
            End If
            file = Dir
        Wend
    End Sub

When I execute this subroutine I get run-time error 53, "File Not Found". When I copy the shellStr...
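For comparison only, the same task can be sketched in Python with the standard `gzip` module. This is a different technique from the VBA `Shell` call above and does not diagnose the error 53; it assumes each `.gz` file holds a single compressed file. The function name `extract_all_gz` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def extract_all_gz(folder):
    """Decompress every *.gz file in `folder` next to the original archive."""
    extracted = []
    for archive in sorted(Path(folder).glob("*.gz")):
        target = archive.with_suffix("")  # "data.csv.gz" -> "data.csv"
        with gzip.open(archive, "rb") as source, open(target, "wb") as destination:
            shutil.copyfileobj(source, destination)
        extracted.append(target)
    return extracted
```

A call such as `extract_all_gz(r"C:\Downloads")` would then mirror the directory used in the question.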

Extract text from PDF in code

Question: I'm making an app for my school in which people can check whether they've got a schedule change. All schedule changes are listed here: http://www.augustinianum.eu/roosterwijzigingen/14062012.pdf. I want to search that page for a keyword (the user's group, which is entered in an EditText). I've found out how to make the app check whether the EditText matches a certain string, so now I only need to download all of the text on that page into a string. But the problem is that it's not a simple webpage, but

Extract Meta Keywords From Webpage?

Question: I need to extract the meta keywords from a web page using Python. I was thinking that this could be done using urllib or urllib2, but I'm not sure. Does anyone have any ideas? I am using Python 2.6 on Windows XP.

Answer 1: lxml is faster than BeautifulSoup (I think) and has much better functionality, while remaining relatively easy to use. Example:

    from urllib import urlopen
    from lxml import etree
    f = urlopen("http://www.google.com").read()
    tree = etree.HTML(f)
    m = tree.xpath( "/
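As an alternative sketch for Python 3 (where `urlopen` lives in `urllib.request` rather than `urllib`), the keywords can also be read with the standard library's `html.parser`, without lxml. This is not the approach from the answer above; the class and function names are hypothetical, and the sketch assumes the keywords are comma-separated in the `content` attribute of a `<meta name="keywords">` tag.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None, so fall back to an empty string.
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def extract_meta_keywords(html):
    parser = MetaKeywordsParser()
    parser.feed(html)
    return parser.keywords


if __name__ == "__main__":
    # Hypothetical page; a real one could be fetched with urllib.request.urlopen.
    page = '<html><head><meta name="Keywords" content="python, html , parsing"></head></html>'
    print(extract_meta_keywords(page))
```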