data-extraction

How can I extract/parse tabular data from a text file in Perl?

百般思念 submitted on 2019-12-04 05:32:37
I am looking for something like HTML::TableExtract, only not for HTML input but for plain-text input that contains "tables" formatted with indentation and spacing. Data could look like this:
Here is some header text. Column One Column Two Column Three a b a b c Some more text Another Table Another Column abdbdbdb aaaa
Not aware of any packaged solution, but something not very flexible is fairly simple to do, assuming you can do two passes over the file (the following is a partially Perlish pseudocode example):
Assumption: data may contain spaces and is NOT quoted à la CSV if there's a space - if
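The two-pass idea can be sketched in Python as follows. This is only one possible reading of it, not the answer's own code: pass 1 looks for character positions that are blank in every row, and pass 2 cuts each row at those positions. The names `find_column_spans`, `split_rows`, and the `min_gap` parameter are hypothetical, and the sketch assumes that cells may contain single spaces while columns are separated by at least `min_gap` blank positions.

```python
def find_column_spans(rows, min_gap=2):
    """Pass 1: return (start, end) character spans of the columns.

    A run of at least `min_gap` positions that are blank in every row is
    treated as a column separator (assumption: cells only ever contain
    shorter runs of spaces).
    """
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    blank = [all(r[i] == " " for r in padded) for i in range(width)]

    spans = []
    start = None  # start index of the column currently being scanned
    gap = 0       # length of the current run of all-blank positions
    for i, is_blank in enumerate(blank):
        if is_blank:
            gap += 1
            if start is not None and gap == min_gap:
                spans.append((start, i - min_gap + 1))
                start = None
        else:
            if start is None:
                start = i
            gap = 0
    if start is not None:
        spans.append((start, width))
    return spans


def split_rows(rows, spans):
    """Pass 2: cut every row at the spans found in pass 1."""
    return [[r[s:e].strip() for s, e in spans] for r in rows]


# Hypothetical sample table, laid out with 13-character columns.
cells = [
    ["Column One", "Column Two", "Column Three"],
    ["a", "b", ""],
    ["a", "b", "c"],
]
lines = ["".join(c.ljust(13) for c in row).rstrip() for row in cells]

spans = find_column_spans(lines)
table = split_rows(lines, spans)
```

Separating tables from the surrounding free text (the header and "Some more text" lines in the sample) is not handled here and would need its own rule, for example treating blank lines as block boundaries.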

PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

早过忘川 submitted on 2019-12-02 23:09:37
Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi-real-time basis and feed it into a data warehouse. (Someone is bound to ask what semi-real-time means: as frequently as I reasonably can, but I will be pragmatic; as a benchmark, let's say we are hoping for every 15 minutes.) How much data? At peak times we are talking about approximately 80-100k rows per minute hitting the OLTP side; off-peak this drops significantly, to 15-20k. The most frequently updated rows are ~64 bytes each, but there are various tables etc. so the data

How to extract a subset from a CSV file using NiFi

為{幸葍}努か submitted on 2019-12-02 09:02:33
I have a CSV file with 100+ columns, and I want to extract only 60 specific columns as a subset (both the column names and their values). I know we can use the ExtractText processor. Can anyone tell me what regular expression to write? For example, let's say that from the given snapshot I only want NiFi to extract the 'BMS_sw_micro', 'BMU_Dbc_Dbg_Micro', and 'BMU_Dbc_Fia_Micro' columns, i.e. only columns F, L, and O. Any help is much appreciated!
As I said in the comment, you can count the number of commas before the text you want to match and use that count in the regex, like this: /(?<=^([^,]+?,){5})[^,]+/ What the RegEx do
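The comma-counting idea can be tried outside NiFi with a few lines of Python. This is a sketch under assumptions, not the answer's exact pattern: Python's `re` module rejects variable-width lookbehind, so a capture group after the skipped fields is used instead, and `[^,]*` (rather than `[^,]+?`) is used so that empty fields still count. The helpers `nth_field` and `pick_columns` are hypothetical names, and quoted fields containing commas are not handled.

```python
import re


def nth_field(line, n):
    """Return the n-th (0-based) comma-separated field of `line`, or None.

    The pattern skips exactly n "field," prefixes and captures the next field.
    """
    m = re.match(r"(?:[^,]*,){%d}([^,]*)" % n, line)
    return m.group(1) if m else None


def pick_columns(line, indexes):
    """Extract several fields from one CSV line by 0-based index."""
    return [nth_field(line, i) for i in indexes]


# Spreadsheet columns F, L and O are the 6th, 12th and 15th fields,
# i.e. 5, 11 and 14 commas precede them.
row = "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P"
picked = pick_columns(row, [5, 11, 14])  # ['F', 'L', 'O']
```

With 60 columns to keep, one regex per column becomes unwieldy, so this is mainly useful for checking the comma counts before configuring the processor.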

Extracting x value given y threshold from polyfit plot (Matlab)

懵懂的女人 submitted on 2019-12-02 08:31:52
As shown by the solid and dashed lines, I'd like to create a function in which I set a threshold for y (intensity) and it returns the corresponding x value (dashed line). Very simple, but my while statement is off. Any help would be much appreciated!
    %% Curve fit plotting %%
    x1 = timeStamps(1:60); % taking timestamps from 1 - 120 given smoothed y1 values
    y1 = smooth(tic_lin(1:60),'sgolay',1);
    % Find coefficients for polynomial (order = 4 and 6, respectively)
    fitResults1 = polyfit(x1',y1, 7);
    % evaluate the fitted y-values
    yplot1 = polyval(fitResults1,x1');
    % interpolates to find yi,
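One way to get x for a given y threshold, sketched here in Python rather than Matlab, is to scan the sampled fitted curve for the first pair of neighbouring points that straddles the threshold and interpolate linearly between them. This is not the asker's loop (which the excerpt does not show) and it does not solve the polynomial exactly; `x_at_threshold` is a hypothetical name, and the inputs stand in for the question's `x1` and `yplot1`.

```python
def x_at_threshold(xs, ys, threshold):
    """Return the first x at which the sampled curve reaches `threshold`.

    Neighbouring samples on opposite sides of the threshold are joined by a
    straight line; None is returned if the curve never reaches it.
    """
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == threshold:
            return x0
        if (y0 - threshold) * (y1 - threshold) < 0:
            # Linear interpolation between (x0, y0) and (x1, y1).
            return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0)
    if points and points[-1][1] == threshold:
        return points[-1][0]
    return None


# Hypothetical samples of a fitted curve.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]
crossing = x_at_threshold(xs, ys, 3.0)  # 1.5
```

The accuracy depends on how densely the fitted curve is sampled, so evaluating the polynomial on a finer x grid before the scan gives a tighter estimate.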

Weather data scraping and extraction in R [closed]

ⅰ亾dé卋堺 submitted on 2019-12-02 08:28:30
Question: [Closed as needing more focus.] I'm working on a research project and am assigned to do a bit of data scraping and to write R code that can help extract the current temperature for a particular zip code from a site such as wunderground.com. Now this may be a bit of an abstract question, but does anyone know how to

How to extract a subset from a CSV file using NiFi

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-02 06:54:19
Question: I have a CSV file with 100+ columns, and I want to extract only 60 specific columns as a subset (both the column names and their values). I know we can use the ExtractText processor. Can anyone tell me what regular expression to write? For example, let's say that from the given snapshot I only want NiFi to extract the 'BMS_sw_micro', 'BMU_Dbc_Dbg_Micro', and 'BMU_Dbc_Fia_Micro' columns, i.e. only columns F, L, and O. Any help is much appreciated!
Answer 1: As I said in the comment, you can count the number of commas before the

PostgreSQL Query to Excel Sheet

孤街浪徒 submitted on 2019-11-30 06:41:40
I need to export some data from PostgreSQL to Excel (a quick customer wish). The last time I tried, Excel had serious problems opening or importing my COPY'd CSV files (line endings, UTF-8 encoding, etc.), and it took me an hour at best. Does someone know a quick, elegant solution that generates a real Excel file, such as a small shell script? I want this to be done either on my Linux box (Debian 5.0 Lenny) or on Windows (XP or higher).
You could install the PostgreSQL ODBC driver on the Windows machine, and then connect Excel to the database as explained in this blog post (except using ODBC
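If the CSV route is kept, the two problems the question names (line endings and UTF-8 encoding) can often be avoided by writing the file with a byte-order mark and CRLF row endings. The sketch below is an alternative to the ODBC suggestion, not part of it, and it produces CSV rather than a real Excel workbook. `rows_to_excel_csv` is a hypothetical helper, and the rows are assumed to have been fetched from PostgreSQL already.

```python
import csv
import io


def rows_to_excel_csv(header, rows):
    """Serialize rows as CSV bytes: UTF-8 with a BOM, CRLF row endings."""
    raw = io.BytesIO()
    # "utf-8-sig" prepends the BOM; newline="" leaves line endings to csv.
    text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
    writer = csv.writer(text)  # the default "excel" dialect ends rows with \r\n
    writer.writerow(header)
    writer.writerows(rows)
    text.flush()
    return raw.getvalue()


# Hypothetical query result.
data = rows_to_excel_csv(["id", "name"], [[1, "Zoë"], [2, "Łukasz"]])
```

The returned bytes can be written to a `.csv` file in binary mode; whether Excel then opens it cleanly still depends on the Excel version and locale (for example, the list separator it expects).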

How to extract a floating number from a string [duplicate]

时光毁灭记忆、已成空白 submitted on 2019-11-25 23:12:01
Question: This question already has an answer here: Extract float/double value (4 answers). I have a number of strings similar to "Current Level: 13.4 db." and I would like to extract just the floating-point number. I say floating and not decimal as it's sometimes whole. Can a regex do this, or is there a better way?
Answer 1: If your float is always expressed in decimal notation, something like
    >>> import re
    >>> re.findall(r"\d+\.\d+", "Current Level: 13.4 db.")
    ['13.4']
may suffice. A more robust version would be:
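The answer's more robust pattern is cut off in this excerpt. Independently of it, here is a minimal sketch that also accepts whole numbers, which the question says occur. The pattern and the helper name `extract_numbers` are assumptions for illustration; exponents such as `1e-3` are deliberately not covered, and a hyphen directly before a digit is read as a minus sign.

```python
import re

# Optional sign, then either digits with an optional fraction ("13", "13.4")
# or a bare fraction (".5").
NUMBER = re.compile(r"[-+]?(?:\d+(?:\.\d+)?|\.\d+)")


def extract_numbers(text):
    """Return every number found in `text` as a float."""
    return [float(token) for token in NUMBER.findall(text)]


levels = extract_numbers("Current Level: 13.4 db.")  # [13.4]
whole = extract_numbers("Current Level: 13 db.")     # [13.0]
```

The trailing period in "db." is not picked up because the pattern requires digits after a decimal point.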