extract

Scraping text from file within HTML tags

泄露秘密 submitted on 2019-11-29 08:51:36
I have an HTML source file that I want to extract dates from, so it's full of code and phrases I don't need. I need to extract every instance of a date that's wrapped in a specific HTML tag:

abbr title="((this is the text I need))" data-utime="

What's the easiest way to achieve this?

Dick Kusleika

If you're using Excel VBA, set a reference (Tools - References) to the MSHTML library (listed as Microsoft HTML Object Library in the References menu).

Sub ScrapeDateAbbr()
    Dim hDoc As MSHTML.HTMLDocument
    Dim hElem As MSHTML.HTMLGenericElement
    Dim sFile As String, lFile As Long
    Dim sHtml As
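For readers not using VBA, the same extraction can be sketched with Python's standard-library html.parser. This is only an illustration, not the answer's method: the class name AbbrTitleParser, the function extract_abbr_titles, and the sample markup are all hypothetical, and the sketch assumes the wanted tags are abbr elements carrying both a title and a data-utime attribute.

```python
from html.parser import HTMLParser


class AbbrTitleParser(HTMLParser):
    """Collects the title attribute of every <abbr> that also has data-utime."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "abbr":
            return
        attr_map = dict(attrs)
        if "title" in attr_map and "data-utime" in attr_map:
            self.titles.append(attr_map["title"])


def extract_abbr_titles(html_text):
    """Return the title values of matching <abbr> tags, in document order."""
    parser = AbbrTitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.titles


# Hypothetical sample markup, made up for this sketch.
sample = (
    '<div><abbr title="Friday, 29 November 2019" data-utime="1575017496">Fri</abbr>'
    '<abbr title="no utime">x</abbr></div>'
)
dates = extract_abbr_titles(sample)
```

Using a parser rather than a regular expression keeps the match tied to real tag boundaries, so attribute order and extra whitespace inside the tag do not matter.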

How can you extract Hardware ID using Python?

末鹿安然 submitted on 2019-11-29 07:16:08
How do you extract an HD and BIOS unique ID using a Python script?

Get Microsoft's Scriptomatic. Run it and select the appropriate class from the dropdown (WIN32_BIOS). It will produce the necessary Python/WMI code for you. (It will also generate VBScript, Perl, and JScript.)

Solutions that come to my mind: use the Win32 Python Extensions and call the Windows APIs directly, or use a WMI wrapper for Python (some WMI interface code for reference).

Edit: I assumed your OS was MS Windows :) On Linux, look in the /proc directory. You'll have to parse the files to find what you are looking for. This

Extract embedded PDF fonts to an external ttf file using some utility or script

时光怂恿深爱的人放手 submitted on 2019-11-29 05:22:38
Is it possible to extract fonts that are embedded in a PDF file to an external ttf file using some utility or script? If the fonts that are embedded (or not embedded) to a PDF file are present in system. Using pdf2swf and swfextract tools from swftools I am able to determine names of the fonts used in a PDF file. Then I can compile respective system font(s) at run-time and then load to my AIR application. BUT if the fonts used in the PDF are absent in the system there are two possibilities: 2.1. If they are absent in the PDF files as well (not embedded), we can only use similar system font

Save tiff CCITTFaxDecode (from PDF page) using iText and Java

北战南征 submitted on 2019-11-29 04:44:15
I'm using iText to extract embedded images and save them as separate files. The .jpg and .png files come out ok, but I cannot extract tiff images that have the CCITTFaxDecode encoding. Does anyone have a way of saving the tiff files? I found some sample C# code that uses iTextSharp at Extracting image from PDF with /CCITTFaxDecode filter It indicates a separate tiff library is needed to write out the results. According to that article, the "CCITTFaxDecode" compression is Compression.CCITTFAX4 for the tiff library. To use that article's method, I need: 1. get a tiff library. The Java Image I/O

Extract first word from a column and insert into new column [duplicate]

我怕爱的太早我们不能终老 submitted on 2019-11-29 03:20:41
This question already has an answer here: Remove everything after space in string (4 answers)

I have the dataframe below and want to extract the first word and insert it into a new column.

Dataframe1:

COL1
Nick K Jones
Dave G Barros
Matt H Smith

Convert it to this:

Dataframe2:

COL1           COL2
Nick K Jones   Nick
Dave G Barros  Dave
Matt H Smith   Matt

You can use a regex ( "([A-Za-z]+)" or "([[:alpha:]]+)" or "(\\w+)" ) to grab the first word:

Dataframe1$COL2 <- gsub("([A-Za-z]+).*", "\\1", Dataframe1$COL1)

Colibri

We can use the function stringr::word:

library(stringr)
Dataframe1$COL2 <- word(Dataframe1$COL1, 1)
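The same first-word idea can be sketched in Python for comparison. This is an assumption-laden illustration rather than a translation of either R answer: the helper first_word and the plain lists standing in for the dataframe columns are hypothetical, and unlike gsub, which returns the input unchanged when nothing matches, this sketch returns an empty string.

```python
import re


def first_word(text):
    """Return the leading run of ASCII letters in text, or "" if there is none."""
    match = re.match(r"\s*([A-Za-z]+)", text)
    return match.group(1) if match else ""


# Plain lists stand in for the COL1 and COL2 columns of the dataframe.
col1 = ["Nick K Jones", "Dave G Barros", "Matt H Smith"]
col2 = [first_word(name) for name in col1]
```

Here col2 holds one first word per row of col1, in the same order.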

How to extract metadata from an image using Python?

拈花ヽ惹草 submitted on 2019-11-29 02:55:21
Question: Hi, I'm working on a program that will open an image and then extract the metadata from it. How do I extract metadata using Python? Thanks

Answer 1: Use Pillow; it's a fork of PIL that is still in active development and supports Python 3. Here I use a dict comprehension to map the EXIF data to a dict:

from PIL import Image, ExifTags
img = Image.open("/path/to/file.jpg")
exif = { ExifTags.TAGS[k]: v for k, v in img._getexif().items() if k in ExifTags.TAGS }

Answer 2: There are a couple of ways by which you can

How to extract only the year from the date in SQL Server 2008?

半城伤御伤魂 submitted on 2019-11-29 02:47:23
In SQL Server 2008, how do I extract only the year from a date? In the DB I have a date column, and I need to extract the year from it. Is there a function for that?

Dumitrescu Bogdan

year(@date)
year(getdate())
year('20120101')

update table set column = year(date_column) where ....

or if you need it in another table

update t set column = year(t1.date_column) from table_source t1 join table_target t on (join condition) where ....

select year(current_timestamp)

SQLFiddle demo

You can use the year() function in SQL to get the year from the specified date. Syntax: YEAR ( date ) For more information

Save Icon File To Hard Drive

百般思念 submitted on 2019-11-29 02:06:52
I know that this must be incredibly easy. It's unbelievable how long I have searched for an answer to this question, given how simple it is in VB6. I simply want to extract an icon from an EXE file using Icon.ExtractAssociatedIcon, and then save this icon file to my hard drive. So, here is what I have, and I will also show you what I have tried so you don't think I'm being lazy.

OpenFileDialog ofd = new OpenFileDialog();
ofd.ShowDialog();
string s = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\IconData.ico";
Icon ico = Icon.ExtractAssociatedIcon(ofd.FileName);
Bitmap

Extract separate non-zero blocks from array

筅森魡賤 submitted on 2019-11-29 00:29:19
Given an array like this, for example:

[1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]

What's the fastest way in Python to get the non-zero elements organized in a list where each element contains the indexes of a block of contiguous non-zero values? Here the result would be a list containing several arrays:

([0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21])

>>> L = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
>>> import itertools
>>> import operator
>>> [[i for i,value in it] for key,it in itertools.groupby(enumerate(L), key=operator.itemgetter(1)) if key !=
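Since the one-liner above is cut off, here is a separate, self-contained sketch of the same groupby idea. It is not a reconstruction of the truncated answer: the function name nonzero_blocks is hypothetical, and the sketch groups on "is the value non-zero" rather than on the value itself, so a run such as 1, 2, 3 stays in one block.

```python
from itertools import groupby


def nonzero_blocks(values):
    """Return lists of indexes, one list per run of consecutive non-zero values."""
    blocks = []
    # Group consecutive (index, value) pairs by whether the value is non-zero.
    for is_nonzero, group in groupby(enumerate(values), key=lambda pair: pair[1] != 0):
        if is_nonzero:
            blocks.append([index for index, _ in group])
    return blocks


L = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
blocks = nonzero_blocks(L)
# blocks == [[0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21]]
```

This is a single pass over the list. Whether it is the fastest option for large inputs is not claimed here.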

C/C++ Code to treat a character array as a bitstream

浪尽此生 submitted on 2019-11-29 00:08:52
Question: I have a big lump of binary data in a char[] array which I need to interpret as an array of packed 6-bit values. I could sit down and write some code to do this, but I'm thinking there has to be a good extant class or function somebody has written already. What I need is something like:

int get_bits(char* data, unsigned bitOffset, unsigned numBits);

so I could get the 7th 6-bit character in the data by calling:

const unsigned BITSIZE = 6;
char ch = static_cast<char>(get_bits(data, 7 * BITSIZE,
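To make the requested behaviour concrete, here is a minimal Python sketch of a get_bits-style reader. It is an illustration under stated assumptions, not an existing library function: it assumes bits are packed most-significant-bit first within each byte, which the question does not specify, and the sample bytes are made up so that they hold the 6-bit values 1, 2, 3, 4.

```python
def get_bits(data, bit_offset, num_bits):
    """Read num_bits starting at bit_offset from a bytes-like object.

    Assumes MSB-first packing: bit 0 is the top bit of data[0].
    Raises IndexError if the requested range runs past the end of data.
    """
    value = 0
    for position in range(bit_offset, bit_offset + num_bits):
        byte = data[position // 8]
        bit = (byte >> (7 - position % 8)) & 1
        value = (value << 1) | bit
    return value


BITSIZE = 6
# Hypothetical sample: the 6-bit values 1, 2, 3, 4 packed into three bytes.
packed = bytes([0b00000100, 0b00100000, 0b11000100])
values = [get_bits(packed, i * BITSIZE, BITSIZE) for i in range(4)]
```

Reading one bit at a time is slow but easy to verify. A production version would more likely shift and mask whole bytes, and would need to match whatever bit order the data actually uses.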