text-extraction

Excel VBA to return Page Count from protected PDF file

人走茶凉 submitted on 2020-07-24 05:47:49
Question: I need to retrieve the number of pages in PDF files (with security) using Excel VBA. The following code works when there is no security enabled in the PDF file:

    Sub PDFandNumPages()
        Dim Folder As Object
        Dim file As Object
        Dim fso As Object
        Dim iExtLen As Integer, iRow As Integer
        Dim sFolder As String, sExt As String
        Dim sPDFName As String
        sExt = "pdf"
        iExtLen = Len(sExt)
        iRow = 1
        ' Must have a '\' at the end of path
        sFolder = "C:\test\"
        Set fso = CreateObject("Scripting.FileSystemObject")
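One possible workaround, sketched in Python rather than VBA and assuming the page objects are stored uncompressed: standard PDF encryption applies to strings and streams but not to dictionary names, so the `/Type /Page` entries are often still readable in a security-restricted file and can simply be counted. The function names are hypothetical. This heuristic can miss pages packed into compressed object streams (PDF 1.5+) and can overcount files whose incremental updates leave old page objects behind.

```python
import re

# Matches page dictionaries ("/Type /Page" or "/Type/Page") but not the
# page-tree node "/Type /Pages" or longer names that start with "Page".
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_pages_in_bytes(data: bytes) -> int:
    """Heuristic page count: number of /Type /Page dictionaries in raw PDF bytes."""
    return len(PAGE_RE.findall(data))


def count_pdf_pages(path: str) -> int:
    """Read a PDF as raw bytes and count its page dictionaries.

    Assumption: page objects are not inside compressed object streams.
    Encryption does not cover dictionary names, so this can still work
    on permission-protected files.
    """
    with open(path, "rb") as f:
        return count_pages_in_bytes(f.read())
```

For example, `count_pdf_pages(r"C:\test\example.pdf")` (a hypothetical file) returns the heuristic count; the same regex idea could be ported back to VBA if the file is read as binary.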

Extract street address from a string

喜你入骨 submitted on 2020-03-28 07:04:44
Question: Is there any way to extract a street address from a string (say, an email) using Python? The address does not come in a set format. It can come without a state, zip code, or city, but I can guess and supply these parameters if they are missing. Also, the address may be given as the corner of two streets. Once I extract the address, I want to send it to Google Maps or a similar service to get back the real, formatted address. It doesn't need to be 100% accurate, but is there any library to do
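A rough, library-free starting point is a regular expression that flags candidate addresses, which can then be passed to a geocoding service for normalization. The sketch assumes US-style addresses; the suffix list, the patterns, and the function name are hypothetical and will miss many real formats (numbered streets such as "5th Ave", unit numbers, PO boxes).

```python
import re

# Hypothetical, deliberately small list of US street suffixes; extend as needed.
SUFFIX = r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)\b\.?"
# One to three capitalized words, e.g. "Main " or "Martin Luther King ".
NAME = r"(?:[A-Z][a-z]+\s+){1,3}"

# House number + street name + suffix: "123 Main Street", "42 Elm St."
NUMBERED_RE = re.compile(r"\b\d{1,6}\s+" + NAME + SUFFIX)
# Corner of two streets: "Main St and Oak Ave", "Main Street & Elm Road"
CORNER_RE = re.compile(r"\b" + NAME + SUFFIX + r"\s*(?:and|&)\s*" + NAME + SUFFIX)


def find_address_candidates(text: str) -> list[str]:
    """Return substrings that look like street addresses or intersections."""
    numbered = [m.group(0) for m in NUMBERED_RE.finditer(text)]
    corners = [m.group(0) for m in CORNER_RE.finditer(text)]
    return numbered + corners
```

Each candidate (plus any city or state you supply) can then be sent to the geocoder, which tolerates the imprecision of the regex.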

How can I get the first string from a div that has another element embedded in it, using beautifulsoup4?

巧了我就是萌 submitted on 2020-02-02 13:02:31
Question: I'm trying to extract prices from a website. The code I've written can do that, but when the website shows the old price alongside the current one, it returns None instead of the price as a string. This is an example of the HTML without the old price (which my code returns as a string):

    <div class="xl-price rangePrice"> 535.000 € </div>

This is an example of the HTML WITH the old price (which my code returns as None):

    <div class="xl-price rangePrice"> 487.000 € <span class="old-price">
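In BeautifulSoup, `Tag.string` is `None` whenever a tag has more than one child, which is a likely cause of the `None` result here (the question's code is not shown). A sketch that instead reads only the text nodes sitting directly inside the `div`, so the nested `<span class="old-price">` is skipped; the helper name `first_own_text` is hypothetical.

```python
from bs4 import BeautifulSoup


def first_own_text(tag):
    """Return the first non-blank text node that is a direct child of `tag`.

    recursive=False restricts the search to direct children, so text inside
    nested elements such as <span class="old-price"> is ignored.
    """
    for node in tag.find_all(string=True, recursive=False):
        if node.strip():
            return node.strip()
    return None


def extract_prices(html):
    """Collect the current price from every xl-price div in the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [first_own_text(div) for div in soup.find_all("div", class_="xl-price")]
```

This works the same for prices with and without an old price, because the current price is always the first direct text node in both layouts shown above.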

What would be the best way to extract square meters from a string that also mentions the amount of bedrooms?

跟風遠走 submitted on 2020-01-24 00:23:10
Question: I'm trying to extract the square meters from

    <div class="xl-surface-ch">  84 m²    2 bed. </div>

on the linked page. The problem is that I only need the "84" from this string (the number is not always 2 digits; it can be 3 or more). An added difficulty is that sometimes the square meters are not mentioned, which looks like this:

    <div class="xl-surface-ch">    2 bed. </div>

In that case I'd need to return 0. My best attempt is:

    sqm = []
    for item in soup.findAll('div', attrs={'class': 'xl-surface-ch'}):
        item = item.contents[0].strip()[0:4]
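Rather than slicing a fixed number of characters, a regular expression can capture however many digits precede "m²" and fall back to 0 when there is no match. A minimal sketch; the function name is hypothetical, and it assumes the unit is always written as "m²".

```python
import re

# One or more digits, optional whitespace (in Python 3 \s also matches
# non-breaking spaces), then the unit "m²".
SQM_RE = re.compile(r"(\d+)\s*m²")


def extract_sqm(text: str) -> int:
    """Return the square meters in text like '84 m² 2 bed.', or 0 if absent."""
    match = SQM_RE.search(text)
    return int(match.group(1)) if match else 0
```

Applied to the scraped divs: `sqm = [extract_sqm(item.get_text()) for item in soup.find_all('div', attrs={'class': 'xl-surface-ch'})]`.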

How to extract text from table in image?

半世苍凉 submitted on 2020-01-15 04:53:07
Question: I have data in a structured table image. The data looks like this: I tried to extract the text from this image using this code:

    import pytesseract
    from PIL import Image

    value = Image.open("data/pic_table3.png")
    text = pytesseract.image_to_string(value, lang="eng")
    print(text)

And here is the output: EA Domains Traditional role Future role Technology e Closed platforms ¢ Open platforms e Physical e Virtualized Applicationsand |e Proprietary e Inter-organizational Integration e Siloed
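`image_to_string` returns plain text with the table layout flattened. A common alternative is `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, which returns parallel lists of words and their bounding boxes (keys such as `text`, `left`, `top`, `width`), followed by grouping the words by position. The helper below is a hypothetical sketch: the pixel tolerances `row_tol` and `col_gap` are assumptions that need tuning per image, and cells that wrap onto several text lines will come out as separate rows.

```python
def words_to_rows(data, row_tol=10, col_gap=40):
    """Group OCR words into rows of cells.

    `data` is assumed to have the shape returned by
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT).
    Words whose `top` is within row_tol pixels share a row; within a row,
    a horizontal gap wider than col_gap pixels starts a new cell.
    """
    words = [
        (data["top"][i], data["left"][i], data["width"][i], data["text"][i].strip())
        for i in range(len(data["text"]))
        if data["text"][i].strip()  # skip empty entries for blocks/lines
    ]
    words.sort()  # top to bottom, then left to right

    rows = []
    for top, left, width, text in words:
        if rows and abs(top - rows[-1]["top"]) <= row_tol:
            rows[-1]["words"].append((left, width, text))
        else:
            rows.append({"top": top, "words": [(left, width, text)]})

    table = []
    for row in rows:
        cells, prev_right = [], None
        for left, width, text in sorted(row["words"]):
            if prev_right is not None and left - prev_right <= col_gap:
                cells[-1] += " " + text  # same cell: join with a space
            else:
                cells.append(text)  # large gap: new column
            prev_right = left + width
        table.append(cells)
    return table

# Usage (requires Tesseract to be installed):
#   data = pytesseract.image_to_data(value, lang="eng",
#                                    output_type=pytesseract.Output.DICT)
#   for row in words_to_rows(data):
#       print(row)
```

The bullet glyphs misread as "e" and "¢" in the output above will still appear as words; they can be filtered out per cell afterwards.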

C#: Extracting text with PdfSharp returns unreadable content

感情迁移 submitted on 2020-01-04 09:40:05
Question: I managed to extract text from a PDF version 1.2 file using PdfSharp, following this link. My code to extract text:

    private string ExtractText(CObject cObject, ref string pdfcontentstr)
    {
        if (cObject is COperator)
        {
            var cOperator = cObject as COperator;
            if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var cOperand in cOperator.Operands)
                {
                    ExtractText(cOperand, ref pdfcontentstr);
                }
            }
        }
        else if (cObject is CSequence)
        {
            var
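A common reason for unreadable output is that the operands of `Tj`/`TJ` are glyph codes in the font's own encoding (for example 2-byte codes with `Identity-H`), not characters; the font's `/ToUnicode` CMap maps those codes back to Unicode. The sketch below shows only that mapping step, in Python rather than C#. It rests on stated assumptions: it handles `beginbfchar` entries but not `bfrange`, assumes fixed 2-byte codes, and the function names are hypothetical. Locating and decompressing the CMap stream for each font is left to the PDF library.

```python
import re


def parse_bfchar(cmap_text):
    """Build a {glyph code: Unicode string} map from bfchar sections of a ToUnicode CMap."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        # Each entry is "<src> <dst>": src is the code, dst is UTF-16BE hex.
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_show_string(raw, mapping, code_bytes=2):
    """Decode the raw bytes of a Tj/TJ string operand using the CMap mapping.

    Unmapped codes become U+FFFD so gaps stay visible.
    """
    chars = []
    for i in range(0, len(raw), code_bytes):
        code = int.from_bytes(raw[i:i + code_bytes], "big")
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)
```

If the output is still garbled after this step, the font may rely on `bfrange` entries or a simple 1-byte encoding instead, which this sketch does not cover.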