text-extraction | 易学教程

Excel VBA to return Page Count from protected PDF file

Question: I need to retrieve the number of pages in PDF files that have security enabled, using Excel VBA. The following code works when there is no security enabled in the PDF file:
    Sub PDFandNumPages()
        Dim Folder As Object
        Dim file As Object
        Dim fso As Object
        Dim iExtLen As Integer, iRow As Integer
        Dim sFolder As String, sExt As String
        Dim sPDFName As String
        sExt = "pdf"
        iExtLen = Len(sExt)
        iRow = 1
        ' Must have a '\' at the end of path
        sFolder = "C:\test\"
        Set fso = CreateObject("Scripting.FileSystemObject")

Extract street address from a string

Question: Is there any way to extract a street address from a string (say, an email) using Python? The address does not come in a set format. It may lack the state, zip code, or city, but I can guess and supply these parameters if they are missing. The address may also be given as the corner of two streets. Once I extract the address, I want to send it to Google Maps or a similar service to get back the real, formatted address. It doesn't need to be 100% accurate, but is there any library to do
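Since the question accepts less than 100% accuracy, one rough starting point is a pair of regular expressions: one for "number + street name + street type" and one for a corner of two streets. The sketch below is only a heuristic under stated assumptions: the list of street types is a small, made-up sample, street names are assumed to be capitalised English words, and the function name `find_address_candidates` is hypothetical. It does not handle abbreviations with a trailing period (such as "St.") and is no substitute for a dedicated address parser or a geocoding service.

```python
import re

# Assumption: a deliberately short sample of street types; a real list is far longer.
STREET_TYPES = r"(?:Street|St|Avenue|Ave|Boulevard|Blvd|Road|Rd|Drive|Dr|Lane|Ln|Way|Court|Ct)"

# House number, one to four capitalised words, then a street type,
# e.g. "123 Main Street".
NUMBERED = re.compile(rf"\b\d{{1,6}}\s+(?:[A-Z][a-z]+\s+){{1,4}}{STREET_TYPES}\b")

# One street: one to three capitalised words followed by a street type.
STREET_NAME = rf"(?:[A-Z][a-z]+\s+){{1,3}}{STREET_TYPES}\b"

# Corner of two streets, e.g. "Main St and Oak Ave".
CORNER = re.compile(rf"{STREET_NAME}\s+(?:and|&|at)\s+{STREET_NAME}")


def find_address_candidates(text):
    """Return substrings of text that look like street addresses or intersections."""
    candidates = [m.group(0) for m in NUMBERED.finditer(text)]
    candidates += [m.group(0) for m in CORNER.finditer(text)]
    return candidates


if __name__ == "__main__":
    print(find_address_candidates("Please deliver to 123 Main Street tomorrow."))
    print(find_address_candidates("Meet me at the corner of Main St and Oak Ave."))
```

Each candidate could then be combined with a guessed city or state and passed to whichever geocoding service is used, which would return the normalised address or reject the candidate.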

How can I get the first string from a div that has a div embedded beautifulsoup4

Question: I'm trying to extract prices from a website. The code I've written can do that, but when the listing also shows the old price, it returns "none" instead of the price as a string. This is an example of the markup without the old price (which my code returns as a string):
    <div class="xl-price rangePrice"> 535.000 € </div>
This is an example of the markup WITH the old price (which my code returns as "none"):
    <div class="xl-price rangePrice"> 487.000 € <span class="old-price">
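The asker's code is not shown, so the cause is a guess: if it reads the tag's `.string` attribute, BeautifulSoup returns `None` whenever a tag has more than one child, which matches the symptom once the nested `old-price` span appears. The underlying idea is to take only the first text node that sits directly inside the price div and to ignore text in nested tags. The sketch below shows that idea with the standard library's `html.parser` instead of BeautifulSoup, so it runs without extra dependencies. The class name `FirstTextInDiv`, the helper `first_texts`, and the old-price value in the sample are made up for illustration; the parser also assumes nested tags are properly closed.

```python
from html.parser import HTMLParser


class FirstTextInDiv(HTMLParser):
    """Collect the first non-empty direct text node of each div with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # 0 = outside a target div, 1 = directly inside it
        self.current = None   # first text seen in the div being parsed
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # A nested tag such as <span class="old-price">.
            self.depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1
                self.current = None

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # The target div just closed; record what was found (may be None).
            self.results.append(self.current)

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 1 and text and self.current is None:
            self.current = text


def first_texts(html, target_class):
    """Return the first direct text of every div carrying target_class."""
    parser = FirstTextInDiv(target_class)
    parser.feed(html)
    return parser.results


if __name__ == "__main__":
    # The old price below is an invented sample value.
    sample = ('<div class="xl-price rangePrice"> 487.000 € '
              '<span class="old-price"> 500.000 € </span> </div>')
    print(first_texts(sample, "xl-price"))
```

With BeautifulSoup itself, the same effect is usually reached by reading the first entry of the tag's `.contents` or the first item of `.stripped_strings` rather than `.string`.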

What would be the best way to extract square meters from a string that also mentions the amount of bedrooms?

Question: I'm trying to extract the value from this element on the linked page:
    <div class="xl-surface-ch"> 84 m² 2 bed. </div>
The problem is that I only need the "84" in this string (the number sometimes runs to more than 2 or 3 digits as well). An added difficulty is that sometimes the square meters are not mentioned, which looks like this:
    <div class="xl-surface-ch"> 2 bed. </div>
In that case I need to return 0. My best attempt is:
    sqm = []
    for item in soup.findAll('div', attrs={'class': 'xl-surface-ch'}):
        item = item.contents[0].strip()[0:4]
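Slicing a fixed number of characters breaks as soon as the number of digits changes, and it cannot tell "84 m²" from "2 bed.". A regular expression that only accepts digits followed by "m²" avoids both problems. The following is a minimal sketch working on the div's text; the function name `square_meters` is hypothetical, and it assumes the area is written as plain digits without thousands separators.

```python
import re

# Digits followed, optionally after whitespace, by "m²" (or "m2").
SQM_PATTERN = re.compile(r"(\d+)\s*m[²2]")


def square_meters(text):
    """Return the square meters mentioned in text as an int, or 0 if absent."""
    match = SQM_PATTERN.search(text)
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    for sample in [" 84 m² 2 bed. ", " 2 bed. "]:
        print(repr(sample), square_meters(sample))
```

In the loop from the question, the result of `square_meters(item.get_text())` could be appended to `sqm` for each div, assuming `item` is the BeautifulSoup tag.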

How to extract text from table in image?

Question: I have data in a structured table image. The data looks like this: [image]. I tried to extract the text from this image using this code:
    import pytesseract
    from PIL import Image
    value = Image.open("data/pic_table3.png")
    text = pytesseract.image_to_string(value, lang="eng")
    print(text)
Here is the output:
    EA Domains Traditional role Future role Technology e Closed platforms ¢ Open platforms e Physical e Virtualized Applicationsand |e Proprietary e Inter-organizational Integration e Siloed

C# Extract Text by using PdfSharp return unreadable content

Question: I managed to extract text from a PDF (version 1.2) using PdfSharp, following this link. My code to extract text:
    private string ExtractText(CObject cObject, ref string pdfcontentstr)
    {
        if (cObject is COperator)
        {
            var cOperator = cObject as COperator;
            if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var cOperand in cOperator.Operands)
                {
                    ExtractText(cOperand, ref pdfcontentstr);
                }
            }
        }
        else if (cObject is CSequence)
        {
            var