Perl: How to copy/mirror remote MySQL table(s) to another database, possibly with a different structure?

Submitted by 不问归期 on 2019-12-24 05:32:19

Question


I am very new to this, and a good friend is in a bind. I am at my wits' end. I have used GUIs like Navicat and SQLyog to do this, but only manually.

His band's data (schedules and whatnot) is in a MySQL database on one server (the admin server).

I am putting together a basic site for him, written in Perl, that reads data from a database on my server (the public server) and displays schedule info, newsletters from previous gigs, and some fan interaction.

He uses an administrative interface, which he likes and desires to keep, to manage the data on the admin server.

The admin server's database has a bunch of tables, and even table data, that the public database does not need.

So, I created tables on the public side that only contain relevant data.

Basically, whenever he updated the admin database, I used a GUI to export the data and then inserted it on the public side (copy and paste).

(FYI, I am using the DBI module to access the data from my Perl script on the public server.)

I could query the admin server directly for only the data I need, but the whole point is to "mirror" the data, not hit the admin server on every query. Also, some tables have THOUSANDS of rows, and processing every row in a loop seemed too "bulky" to me. There is, however, a "time" column that could be used for the comparison.

I cannot simply "sync" because the structures are different; I only need the relevant data from three tables.

SO...... I desire to automate!

I read that "copy" was a fast way, but what I found about implementing it was too advanced for my level.

I do not have the luxury of placing a script on the admin server to send a notification when there is an update.

1- I would like to set up a script that checks a table on the admin server's database to see whether a row was updated or added, and then updates or inserts the new or changed data into the public server's database.

This "check" could be run from a cron job, I guess, or triggered when a specific page loads on the public side (presumably by the same subroutine the cron job calls).

This data does not need to be "real time", but if he updates something, it would be nice for it to appear as quickly as possible.

I have done a lot of reading, module research, and experimenting, but here I am again at Stack Overflow, where I always get great advice and examples.

Much of the terminology is still quite over my head so verbose examples with explanations really help me learn quicker.

Thanks in advance.


Answer 1:


The two terms you are looking for are "replication" and "ETL".

First, the replication approach.

Let's assume your admin server has tables T1, T2, T3 and your public server has tables TP1, TP2.

So, what you want to do (since you have different table structures, as you said) is:

  1. Take the tables from the public server and create exact copies of them (TP1 and TP2) on the admin server.

  2. Create triggers on the admin server's original tables that populate the admin server's copies of TP1/TP2 from T1/T2/T3.

  3. You will also need to do the initial population of the admin server's copies of TP1/TP2 from T1/T2/T3. Duh.

  4. Set up "replication" from the admin server's TP1/TP2 to the public server's TP1/TP2.
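Step 2 could look roughly like the following Perl sketch, which generates the trigger statements to run once against the admin server through DBI. It is an illustration under assumptions: the column names (id, gig_date, venue) are hypothetical, TP1 must have a primary or unique key on id, and your MySQL account must be allowed to create triggers, which the question suggests may not be possible on the admin server. Deletions are not handled; those would need an AFTER DELETE trigger as well. The DELIMITER lines seen in many trigger tutorials are a feature of the mysql command-line client and are not needed when a single-statement trigger is sent through DBI.

```perl
use strict;
use warnings;

# Build the CREATE TRIGGER statements for step 2: copy each inserted or updated
# row of $src into $dst on the same (admin) server. $key must be a PRIMARY or
# UNIQUE key of $dst so that ON DUPLICATE KEY UPDATE can fire.
# All table/column names passed in are hypothetical examples.
sub trigger_ddl {
    my ($src, $dst, $key, @cols) = @_;
    my @all      = ($key, @cols);
    my $col_list = join ', ', @all;
    my $new_list = join ', ', map { 'NEW.' . $_ } @all;
    my $updates  = join ', ', map { $_ . ' = NEW.' . $_ } @cols;
    my @ddl;
    for my $event (qw(INSERT UPDATE)) {
        push @ddl, sprintf(
            'CREATE TRIGGER %s AFTER %s ON %s FOR EACH ROW '
          . 'INSERT INTO %s (%s) VALUES (%s) ON DUPLICATE KEY UPDATE %s',
            lc("${src}_after_$event"), $event, $src, $dst,
            $col_list, $new_list, $updates);
    }
    return @ddl;
}

# One-off use against the admin server (connection details are placeholders):
#   use DBI;
#   my $dbh = DBI->connect('DBI:mysql:database=band;host=admin.example',
#                          $user, $pass, { RaiseError => 1 });
#   $dbh->do($_) for trigger_ddl('T1', 'TP1', 'id', 'gig_date', 'venue');
```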

A different approach is to write a program (such programs are called ETL: Extract-Transform-Load) that extracts the data from T1/T2/T3 on the admin server (the "E" of "ETL"), massages it into a format suitable for loading into the TP1/TP2 tables (the "T"), and transfers the resulting files (via ftp/scp/whatnot) to the public server, where the second half of the program (the "L") loads the files into TP1/TP2. Both halves of the program would be launched by cron or your scheduler of choice.
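Here is a minimal Perl sketch of the three stages as separate subroutines. It simplifies the description above: instead of writing files and copying them with ftp/scp, it reads from an admin-server DBI handle and writes to a public-server handle directly, which is usually fine for a few thousand rows. All table and column names (gigs, public_gigs, venue_name, is_private, and so on) are hypothetical.

```perl
use strict;
use warnings;

# E: pull only the columns we need from the admin server, in one round trip.
sub extract {
    my ($admin_dbh) = @_;
    return $admin_dbh->selectall_arrayref(
        'SELECT id, gig_date, venue_name, is_private FROM gigs',
        { Slice => {} },    # each row becomes a hashref keyed by column name
    );
}

# T: reshape admin rows into the public table's layout and drop rows the public
# site must not show (the is_private flag is a made-up example of such a rule).
sub transform {
    my ($rows) = @_;
    return [
        map  { +{ gig_id => $_->{id}, gig_date => $_->{gig_date}, venue => $_->{venue_name} } }
        grep { !$_->{is_private} } @$rows
    ];
}

# L: insert new rows or refresh existing ones on the public server
# (assumes gig_id is the PRIMARY or UNIQUE key of public_gigs).
sub load {
    my ($public_dbh, $rows) = @_;
    my $sth = $public_dbh->prepare(
        'INSERT INTO public_gigs (gig_id, gig_date, venue) VALUES (?, ?, ?) '
      . 'ON DUPLICATE KEY UPDATE gig_date = VALUES(gig_date), venue = VALUES(venue)');
    $sth->execute(@{$_}{qw(gig_id gig_date venue)}) for @$rows;
    return scalar @$rows;
}

# Typical run (both handles from DBI->connect with RaiseError => 1):
#   load($public_dbh, transform(extract($admin_dbh)));
```

Keeping the transform step free of database calls makes it easy to test on its own with hand-written rows.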

There's an article with a very good example of how to start building a Perl/MySQL ETL: http://oreilly.com/pub/a/databases/2007/04/12/building-a-data-warehouse-with-mysql-and-perl.html?page=2

If you prefer not to build your own, here's a list of open-source ETL systems (I've never used any of them, so I have no opinion on their usability or quality): http://www.manageability.org/blog/stuff/open-source-etl




Answer 2:


I think you're conflating ETL as a problem domain, which is complicated, with ETL as a one-off solution, which is often not much harder than writing a report. Unless I've totally misunderstood your problem, you don't need a general ETL solution; you need a one-off solution that works on a handful of tables and a few thousand rows. ETL and schema mapping sound scarier than they are for a single job. (The generalization, scaling, change management, and OLTP-to-OLAP support of ETL are where it gets especially difficult.) If you can use Perl to write a report from a SQL database, you probably know enough to handle the ETL involved here.

"1- I would like to set up a script to check a table to see if a row was updated or added on the admin server's db. I would then like to update or insert the new or changed data into the public server's db."

If every table you need to pull from has an update-timestamp column, your cron job can run SELECT statements whose WHERE clauses compare that column with the time of the previous run, so only the updates are fetched. Tables without an update timestamp will probably need a full dump.
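A minimal Perl sketch of that incremental SELECT, assuming the source table has a DATETIME or TIMESTAMP column (called updated_at here; the question only mentions a "time" column) that changes on every insert and update. It uses >= rather than > so rows stamped in the same second as the previous high-water mark are fetched again rather than silently skipped; with an upsert on the public side, re-fetching them is harmless. The next watermark is taken from the fetched data itself, so the two servers' clocks do not need to agree.

```perl
use strict;
use warnings;

# Build the "changed since last run" query for one table.
# $ts_col is assumed to be a DATETIME/TIMESTAMP that changes on every insert and update.
sub changed_rows_sql {
    my ($table, $ts_col, @cols) = @_;
    return sprintf 'SELECT %s FROM %s WHERE %s >= ? ORDER BY %s',
        join(', ', @cols, $ts_col), $table, $ts_col, $ts_col;
}

# Next watermark = newest source timestamp actually seen, so clock differences
# between the servers do not matter. 'YYYY-MM-DD HH:MM:SS' strings sort correctly as text.
sub next_watermark {
    my ($old, $ts_col, $rows) = @_;
    my $max = $old;
    for my $r (@$rows) {
        $max = $r->{$ts_col} if $r->{$ts_col} gt $max;
    }
    return $max;
}

# Usage sketch ($admin_dbh and $last_run are assumptions; persist $last_run between runs):
#   my $rows = $admin_dbh->selectall_arrayref(
#       changed_rows_sql('gigs', 'updated_at', qw(id gig_date venue)),
#       { Slice => {} }, $last_run);
#   ... upsert @$rows into the public database ...
#   $last_run = next_watermark($last_run, 'updated_at', $rows);
```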

I'd use a one-to-one table mapping unless normalization is required; it's just simpler, in my opinion. Why complicate things with "big" schema changes if you don't have to?

"Some tables are THOUSANDS of rows, and parsing every row in a loop seemed too 'bulky' to me."

Limit your queries to only the columns you need. As long as those columns contain no BLOBs or exceptionally large values, a few thousand rows should not be a problem with DBI and one of its fetch-all methods (fetchall_arrayref, selectall_arrayref). Loop all you want locally; just make as few trips to the remote database as possible.

"If a row has a newer date, update it. I will also have to check for new rows for insertion."

Each table needs one SELECT ... WHERE updated_timestamp_columnname > last_cron_run_timestamp. That result set will contain all rows with newer timestamps, including newly inserted rows (if the timestamp column behaves as I'd expect, i.e. it is set on insert as well as on update). For updating your local database, check out MySQL's INSERT ... ON DUPLICATE KEY UPDATE syntax, which lets you do the insert-or-update in one step.
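A sketch of that one-step insert-or-update from Perl, assuming the public table has a PRIMARY or UNIQUE key on the column passed as $key (without such a key, ON DUPLICATE KEY UPDATE never triggers and every run inserts duplicates). Names are placeholders, and the transaction assumes InnoDB tables and a handle opened with RaiseError => 1. VALUES(col) refers to the value the INSERT tried to write; MySQL 8.0.20 and later deprecate that form in favour of a row alias, but it still works.

```perl
use strict;
use warnings;

# Build an INSERT ... ON DUPLICATE KEY UPDATE statement with placeholders.
# $key is assumed to be the public table's PRIMARY/UNIQUE key; it is not updated.
sub upsert_sql {
    my ($table, $key, @cols) = @_;
    my @all          = ($key, @cols);
    my $placeholders = join ', ', ('?') x @all;
    my $updates      = join ', ', map { $_ . " = VALUES($_)" } @cols;
    return sprintf 'INSERT INTO %s (%s) VALUES (%s) ON DUPLICATE KEY UPDATE %s',
        $table, join(', ', @all), $placeholders, $updates;
}

# Apply every fetched row (hashrefs) in one transaction on the public server.
sub upsert_rows {
    my ($dbh, $table, $key, $cols, $rows) = @_;
    my $sth = $dbh->prepare(upsert_sql($table, $key, @$cols));
    $dbh->begin_work;
    eval {
        $sth->execute(@{$_}{ $key, @$cols }) for @$rows;
        $dbh->commit;
        1;
    } or do {
        my $err = $@ || 'unknown error';
        $dbh->rollback;
        die "Upsert into $table failed: $err";
    };
    return scalar @$rows;
}
```

Preparing the statement once and executing it per row keeps the SQL parsing out of the loop; the transaction means a failed run leaves the public table as it was.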

"... how to implement were too advanced for my level ... Yes, I have actually done this already, but I have to manually update..."

Some questions to help us understand your level: Are you querying the database from the mysql command-line client or from a GUI? Have you gotten to the point of wrapping your SQL queries in Perl and DBI yet?




Answer 3:


If the two databases have different schemas, you'll need an ETL solution to map from one schema to the other.

If the schemas are the same, all you have to do is replicate the data from one to the other.




Answer 4:


Why not just create a structure on the 'slave' server identical to the master's? Then create a small table that keeps track of the last timestamp or id for each of the tables being updated.

Then select from the master all rows changed since the last timestamp (or with an id greater than the last one) and insert them into the matching table on the slave server.
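A sketch of that bookkeeping table, kept on the slave (public) server, plus the id-based query against the master. It assumes each source table has an auto-increment id column; a last-timestamp column would work the same way. The table name sync_state and its columns are hypothetical. An id-based check only sees newly added rows, not rows updated in place.

```perl
use strict;
use warnings;

# One-time DDL on the slave (public) server; names are hypothetical.
my $sync_state_ddl = <<'SQL';
CREATE TABLE IF NOT EXISTS sync_state (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    last_id    BIGINT      NOT NULL DEFAULT 0
)
SQL

# Highest source id already copied for $table (0 if never synced).
sub get_last_id {
    my ($dbh, $table) = @_;
    my ($id) = $dbh->selectrow_array(
        'SELECT last_id FROM sync_state WHERE table_name = ?', undef, $table);
    return $id // 0;
}

# Record progress; the upsert creates the row on first use.
sub set_last_id {
    my ($dbh, $table, $id) = @_;
    $dbh->do('INSERT INTO sync_state (table_name, last_id) VALUES (?, ?) '
           . 'ON DUPLICATE KEY UPDATE last_id = VALUES(last_id)',
             undef, $table, $id);
}

# Rows appended on the master since the last run (misses in-place updates).
sub new_rows_sql {
    my ($table, @cols) = @_;
    return sprintf 'SELECT %s FROM %s WHERE id > ? ORDER BY id',
        join(', ', @cols), $table;
}
```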

You will need to be careful with updated rows. If a row on the master is updated but its timestamp doesn't change, how will you tell which rows to fetch? If that's not an issue, the process is quite simple.

If it is an issue, then you need something more sophisticated, but without knowing the data structure and update mechanism, giving pointers would be a wild-goose chase.
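One possible "more sophisticated" approach, offered as an illustration rather than what this answer prescribes: for a table small enough to read in full, fetch the needed columns and compare a digest of each row with the digests saved from the previous run. It relies only on core modules (Digest::MD5, Encode) and assumes every row has a unique id; the column names are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Digest of the columns we care about. "\x1F" separates fields so that
# ('ab', 'c') and ('a', 'bc') differ; NULLs get their own marker.
sub row_digest {
    my ($row, @cols) = @_;
    my @vals = map { defined $row->{$_} ? $row->{$_} : "\x00NULL" } @cols;
    return md5_hex(encode_utf8(join "\x1F", @vals));
}

# Compare current rows (hashrefs with an 'id') against last run's digests.
# Returns: ids new or changed, ids that disappeared, and the digest map to save.
sub diff_rows {
    my ($rows, $old, @cols) = @_;
    my (%new, @changed);
    for my $r (@$rows) {
        my $d    = row_digest($r, @cols);
        my $prev = $old->{ $r->{id} };
        $new{ $r->{id} } = $d;
        push @changed, $r->{id} if !defined($prev) || $prev ne $d;
    }
    my @deleted = grep { !exists $new{$_} } sort keys %$old;
    return (\@changed, \@deleted, \%new);
}

# Between runs, the digest map could be saved with Storable's store()/retrieve().
```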

The script could be called by cron every so often to update the changes.
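A sketch of a wrapper that the cron job (or the page-load trigger mentioned in the question) could call. An exclusive, non-blocking flock keeps two runs from overlapping, and the lock file's modification time records when the last successful run finished, so frequent page loads do not hammer the admin server. The lock path, the interval, and the sync callback are hypothetical; the callback stands for whatever fetch-and-upsert code you write.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Run $sync at most once per $min_interval seconds and never concurrently.
# Returns 1 if it ran, 0 if skipped (another run holds the lock, or too soon).
sub sync_if_due {
    my ($lock_path, $min_interval, $sync) = @_;
    my $first_run = !-e $lock_path;
    open my $lock, '>>', $lock_path or die "Cannot open $lock_path: $!";
    unless (flock $lock, LOCK_EX | LOCK_NB) {
        close $lock;
        return 0;                              # another run is in progress
    }
    # The lock file's mtime marks the end of the last successful run.
    my $last = $first_run ? 0 : (stat $lock)[9];
    if (time - $last < $min_interval) {
        close $lock;                           # closing releases the lock
        return 0;
    }
    $sync->();    # if this dies, the handle closes on scope exit and the
                  # old mtime is kept, so the next call retries
    utime undef, undef, $lock_path;            # record this successful run
    close $lock;
    return 1;
}

# Hypothetical crontab entry, every 10 minutes:
#   */10 * * * * /usr/bin/perl /path/to/sync.pl
# where sync.pl calls, for example:
#   sync_if_due('/tmp/band_sync.lock', 300, \&run_sync);   # run_sync is your code
```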

If the database structures must differ between the two servers, a simple translation step may need to be added, but most of the time that can be done within the SQL SELECT statement, maybe with a join or two.
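For example, if the admin schema kept venues in their own table, the translation could live entirely in the SELECT run against the master. The schema below is hypothetical; the column aliases are chosen to match the public table's column names so the fetched rows can be inserted without renaming. Note that a change made only to a venues row will not bump g.updated_at, which is exactly the kind of update the previous paragraphs warn about.

```perl
use strict;
use warnings;

# Hypothetical admin schema: gigs(id, venue_id, starts_at, updated_at, ...) and
# venues(id, name, city). Aliases match a hypothetical public table:
# public_gigs(gig_id, gig_date, venue, city).
my $translate_sql = <<'SQL';
SELECT g.id              AS gig_id,
       DATE(g.starts_at) AS gig_date,
       v.name            AS venue,
       v.city            AS city
FROM   gigs g
JOIN   venues v ON v.id = g.venue_id
WHERE  g.updated_at >= ?
SQL

# Fetch already-translated rows in one round trip (hashrefs keyed by the aliases).
sub fetch_translated {
    my ($admin_dbh, $since) = @_;
    return $admin_dbh->selectall_arrayref($translate_sql, { Slice => {} }, $since);
}
```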



Source: https://stackoverflow.com/questions/4570877/perl-how-to-copy-mirror-remote-mysql-tables-to-another-database-possibly-dif
