etl

What are the pros and cons of RDB2RDF tools? [closed]

偶尔善良 submitted on 2019-12-01 14:52:28
I need to know the difference between RDB2RDF tools. Could anybody tell me the pros and cons of RDB2RDF tools? Especially for the following ones: Virtuoso, Ultrawrap, Ontop, Morph, Xsparql, D2RQ, ... There are two W3C-standardized ways to convert relational data to RDF: Direct Mapping — a non-customizable default mapping, suitable when the relational data is well normalized and there are primary keys, foreign keys, etc.; and R2RML — a customizable mapping. In the survey below, I consider R2RML implementations only. Many R2RML implementations are listed here. I do not consider tools

Move SQL Server Database data to SAP BW

风格不统一 submitted on 2019-12-01 12:11:19
I have read a few articles about moving data out of SAP BW and into SQL Server. I can't find any articles on moving data from SQL Server to SAP BW. Is it even possible, and if so, what would be the best way to handle it? After searching on this topic, I found many links addressing this issue; in this answer I will try to summarize them and provide the links that can help you achieve your goal. There are many ways to import data from SQL Server into SAP BW: (1) SAP BW DB Connect — with DB Connect, you can load data from a database system that is supported by SAP, by linking a database

Is there a better way to parse [Integer].[Integer] style dates in SSIS?

元气小坏坏 submitted on 2019-12-01 10:46:16
I'm working on an SSIS ELT script that needs to parse dates from a TSV file that are stored in the format [INTEGER].[INTEGER] (Excel integer dates followed by seconds since midnight, e.g., 42825.94097; or microseconds since midnight, e.g., 42831.1229166667). I've come up with the following approach: a Derived Column function to split the input into a date part and a time part, then a Derived Column function to put the parsed parts back together, e.g., DATEADD("day",StartTime_Date,DATEADD("second",StartTime_Time,(DT_DATE)"1/1/1900")). Is there a more elegant way to do this without resorting to a Script
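
For reference, a minimal Python sketch of the same logic the DATEADD expression implements: split the value on the decimal point, treat the integer part as days offset from a 1900-01-01 epoch and the digits after the dot as seconds since midnight. The function name is made up for illustration, and the epoch handling (Excel serial dates carry a well-known off-by-two quirk around the fictitious 1900-02-29, which the asker's expression also ignores) is an assumption:

```python
from datetime import datetime, timedelta

def parse_serial(value: str) -> datetime:
    # "42825.94097" -> 42825 days after the 1900-01-01 epoch,
    # plus 94097 seconds after midnight (assumed interpretation).
    day_part, time_part = value.split(".")
    return (datetime(1900, 1, 1)
            + timedelta(days=int(day_part), seconds=int(time_part)))

print(parse_serial("42825.94097"))
```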

Fill SQL database from a CSV File

人走茶凉 submitted on 2019-12-01 03:06:59
I need to create a database from a CSV file using SSIS. The CSV file includes four columns. I need to use the information in that file to populate the three tables I created in SQL below. I have realized that I need to use one column of the Employee table, EmployeeNumber, and of the Group table, GroupID, to populate the EmployeeGroup table. For that, I thought a Merge Join was what I needed, but when I created the Data Flow Task in SSIS the result was the same: no data displayed. The middle table is the one used to relate the other tables. I created the package in SSIS and the
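
Stripped of the SSIS plumbing, the core of the task is splitting one source row into two lookup tables plus a junction table. A minimal sketch of that logic in plain Python, where the four-column layout of the hypothetical input.csv (employee number, employee name, group id, group name) is an assumption:

```python
import csv

employees = {}          # EmployeeNumber -> employee name
groups = {}             # GroupID -> group name
employee_group = set()  # (EmployeeNumber, GroupID) junction rows

with open("input.csv", newline="") as f:
    for emp_no, emp_name, group_id, group_name in csv.reader(f):
        employees[emp_no] = emp_name
        groups[group_id] = group_name
        employee_group.add((emp_no, group_id))

# employees and groups feed the Employee and Group tables;
# employee_group feeds the EmployeeGroup junction table.
```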

Docker: stopping and deploying services

微笑、不失礼 submitted on 2019-12-01 02:56:04
// List all containers. Each container manages exactly one service, so even if one container dies it does not affect the services in the other containers; the workloads stay isolated from each other.

```
root@river-NUC8i7HNK:/# docker ps
CONTAINER ID   IMAGE            COMMAND                  CREATED             STATUS             PORTS                      NAMES
e8cfcbe6a280   etl-online:1.0   "java -Djava.securit…"   About an hour ago   Up About an hour   0.0.0.0:8070->8070/tcp     etl-online_etl-online_1
377db9b29f0f   web:3.0          "java -Djava.securit…"   11 days ago         Up 11 days         0.0.0.0:30003->30003/tcp   web_power_1
cf44fc608372   power:2.0        "java -Djava.securit…"   11 days ago         Up 11 days         0.0.0.0:8000->30001/tcp    power_power_1
4f951b13e170   etl:2.0          "java -Djava.securit…"   2 weeks ago         Up 2 weeks         0.0.0.0:8090->8090/tcp     etl_power_1
524d7d7ae738
```

Python - CSV: Large file with rows of different lengths

核能气质少年 submitted on 2019-12-01 01:10:40
In short, I have a 20,000,000-line CSV file whose rows have different lengths. This is due to archaic data loggers and proprietary formats; we get the end result as a CSV file in the following format. My goal is to insert this file into a Postgres database. How can I do the following: keep the first 8 columns and my last 2 columns, to have a consistent CSV file, and add a new column to the CSV file at either the first or last position? 1, 2, 3, 4, 5, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, img_id.jpg, -50 1, 2, 3, 4, 5, 0,0,0,0,0,0,0,0,0, img_id.jpg, -50 1, 2, 3, 4, 5, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, img_id
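
One way to do the trimming is a streaming pass in plain Python, a sketch under the question's own assumptions (first 8 plus last 2 columns are the keepers); the file names and the choice of a row number as the new first column are assumptions:

```python
import csv

# Normalize rows to first-8 + last-2 columns and prepend a new column,
# streaming so the 20M-line file never has to fit in memory.
with open("input.csv", newline="") as src, \
     open("normalized.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row_num, row in enumerate(reader, start=1):
        fixed = row[:8] + row[-2:]          # keep first 8 and last 2 fields
        writer.writerow([row_num] + fixed)  # new column in the first position
```

The normalized file then has a fixed column count and can be bulk-loaded into Postgres (e.g., with COPY).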

A Flink ETL Demo Based on Broadcast State

若如初见. submitted on 2019-11-30 16:49:28
Several people in the community have asked about this scenario: in a Flink job, the data coming in from the source needs to be joined against fields in a database before the downstream processing. Assume an ETL scenario where the input records contain two fields, "type, userid....", and we need to join a MySQL configuration table on type to pull in the content associated with that type. Relative to the volume of input data there are very few distinct type values (assume only 10 here), so the configuration table has only 10 rows. The configuration is modified periodically (for example, by batch jobs backfilling data), and the changes must take effect within a bounded time. So this is real-time ETL that uses one field of each record to look up the database and enrich the record. The join key in the incoming data takes very few values (just 10) and the matching database table is tiny, so using async I/O here feels clumsy (it wastes resources and performs worse anyway); at the same time, the database rows are modified at unpredictable times, so loading them once at startup is not an option. Flink can handle this scenario with broadcast state, as in: A Flink ETL Demo Based on Broadcast State. What this post describes is another, simpler approach: use a timer to periodically load the data from the database (just a plain Java timer). The code flow: 1. A custom source that emits records of two comma-separated fields. 2. A RichMapFunction that transforms the data; a timer defined in open() periodically runs the MySQL query and puts the result into a map. 3
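
A framework-agnostic sketch of the timer pattern in Python (the post's actual demo is Java): a background timer periodically swaps a fresh snapshot of the tiny config table into an in-memory dict, and the per-record enrichment reads from that dict. The load_config stub standing in for the MySQL query, and all names, are assumptions:

```python
import threading

config = {}  # type -> config payload, shared with the enrichment path

def load_config():
    # Stand-in for the periodic MySQL query, e.g. roughly
    # "SELECT type, payload FROM config_table" (assumption).
    return {str(t): f"payload-{t}" for t in range(10)}

def refresh(interval_sec=60.0):
    global config
    config = load_config()  # swap in the fresh snapshot atomically
    t = threading.Timer(interval_sec, refresh, args=(interval_sec,))
    t.daemon = True         # don't keep the process alive on shutdown
    t.start()

def enrich(record):
    # record is "type,userid..."; join the config payload onto it.
    type_, rest = record.split(",", 1)
    return (type_, rest, config.get(type_))

refresh(60.0)  # in the Flink version this setup lives in RichMapFunction.open()
print(enrich("3,user42"))
```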

ETL SSIS: Redirecting error rows to a separate table

假如想象 submitted on 2019-11-30 15:24:31
I am working on a package that contains a source, about 80 lookups, and 1 destination. The data in the source table is not consistent enough, and hence my package fails very often. Is there a way to capture all the rows that raise errors at the time of inserting them into the destination table? For example, I have 5 rows in the source, of which the 1st and 4th will give an error. The result should be that the 2nd, 3rd and 5th go into the destination, but the 1st and 4th are stored in some flat file or a DB table. Thanks in advance. You can create a second OLE DB Destination and direct the red