teradata

Connecting Python with Teradata using Teradata module

Submitted by 偶尔善良 on 2019-11-27 10:11:50
Question: I have installed Python 2.7.0 and the Teradata module on Windows 7 (pip install teradata), but I am not able to connect to and query Teradata from Python. I want to import the teradata module in my source code and perform operations such as firing queries against Teradata, fetching the result set, and checking whether the connection was made. Please help me write code for this, as I am new to Python and have no reference material for connecting to Teradata. Answer 1: There are a number of ways to connect to Teradata…
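One minimal sketch of such a connection, using the teradata (PyTd) module over ODBC; the system name, credentials, and the queried objects below are placeholders to replace with your own:

    import teradata

    # UdaExec manages configuration and logging for the PyTd driver.
    udaExec = teradata.UdaExec(appName="HelloTeradata", version="1.0",
                               logConsole=False)

    # "tdprod", "my_user" and "my_password" are placeholders for your system
    # name (or ODBC DSN) and credentials.
    with udaExec.connect(method="odbc", system="tdprod",
                         username="my_user", password="my_password") as session:
        # Reaching this point without an exception means the connection is up.
        for row in session.execute("SELECT CURRENT_TIMESTAMP"):
            print(row)

        # Fire a query and walk the result set.
        for row in session.execute("SELECT TOP 10 TableName FROM dbc.tables"):
            print(row[0])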

Find which rows have different values for a given column in Teradata SQL

Submitted by 霸气de小男生 on 2019-11-27 01:18:31
Question: I am trying to compare two addresses for the same ID to see whether they match. For example:

    Id  Address Code  Address
    1   1             123 Main
    1   2             123 Main
    2   1             456 Wall
    2   2             456 Wall
    3   1             789 Right
    3   2             100 Left

I'm just trying to figure out whether the addresses for each ID match. So in this case I want to return just ID 3, which has a different address for Address Codes 1 and 2. Answer 1: Join the table with itself and give it two different aliases (A and B in the following example). This allows you to compare…
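A sketch of that self-join, with the table name (addresses) and column names assumed from the sample data above; it could be run through a teradata-module session like the one shown in the first entry:

    # Self-join on Id and keep the Ids whose two address codes disagree.
    # Table and column names (addresses, Id, Address_Code, Address) are
    # assumptions based on the sample data.
    SQL = """
    SELECT a.Id
    FROM   addresses a
    JOIN   addresses b
      ON   a.Id = b.Id
     AND   a.Address_Code = 1
     AND   b.Address_Code = 2
    WHERE  a.Address <> b.Address
    """

    # Run it with a PyTd session such as the one opened in the first entry:
    # for row in session.execute(SQL):
    #     print(row[0])   # expected: 3 for the sample rows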

What is ROWS UNBOUNDED PRECEDING used for in Teradata?

Submitted by 流过昼夜 on 2019-11-27 00:10:47
Question: I am just starting with Teradata and have come across an ordered analytical function clause called "ROWS UNBOUNDED PRECEDING". I have tried several sites to learn about it, but all of them use complicated examples. Could you please provide a simple example so that I can get the basics clear? Answer 1: It's the "frame" or "range" clause of window functions, which are part of the SQL standard and implemented in many databases, including Teradata. A simple…
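A simple illustration, sketched as a running total over an assumed transactions table: with ROWS UNBOUNDED PRECEDING the frame covers every row from the start of the partition up to the current row, so SUM() becomes a cumulative sum.

    # "transactions" and its columns are placeholders. For each row, the frame
    # spans from the first row of the account's partition to the current row,
    # so SUM(amount) is a running total in txn_date order.
    SQL = """
    SELECT account_id,
           txn_date,
           amount,
           SUM(amount) OVER (PARTITION BY account_id
                             ORDER BY txn_date
                             ROWS UNBOUNDED PRECEDING) AS running_total
    FROM   transactions
    """
    # Execute through any Teradata client, or a PyTd session as shown above.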

Teradata MERGE yielding no results when executed through SQLAlchemy

Submitted by 谁都会走 on 2019-11-26 23:33:37
Question: I'm attempting to use Python with SQLAlchemy to download some data, create a temporary staging table on a Teradata server, and then MERGE that table into another table which I've created to store this data permanently. I'm using sql = sqlalchemy.text(merge) and td_engine.execute(sql), where merge is a string similar to the one below:

    MERGE INTO perm_table as p
    USING temp_table as t
    ON p.Id = t.Id
    WHEN MATCHED THEN UPDATE SET
        col1 = t.col1,
        col2 = t.col2,
        ...
        col50 = t.col50
    WHEN NOT MATCHED THEN…
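One thing worth checking, sketched below with a placeholder engine URL and a shortened MERGE: DML sent through a bare engine.execute() may never be committed (older SQLAlchemy autocommit heuristics do not recognise MERGE as data-modifying), so running the statement inside an explicit transaction block makes sure it is persisted.

    import sqlalchemy

    # Placeholder URL for the sqlalchemy-teradata dialect; adjust to your site.
    td_engine = sqlalchemy.create_engine("teradata://my_user:my_password@tdprod")

    # Shortened stand-in for the MERGE from the question.
    merge = """
    MERGE INTO perm_table AS p
    USING temp_table AS t
      ON p.Id = t.Id
    WHEN MATCHED THEN UPDATE
      SET col1 = t.col1
    WHEN NOT MATCHED THEN INSERT (Id, col1)
      VALUES (t.Id, t.col1)
    """

    sql = sqlalchemy.text(merge)

    # engine.begin() opens a transaction and commits it on success, so the
    # MERGE is persisted rather than rolled back when the connection returns
    # to the pool.
    with td_engine.begin() as conn:
        conn.execute(sql)

If temp_table is a volatile table, it also has to be created and merged on the same connection, since volatile tables in Teradata are session-scoped.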

Compare 3 Consecutive rows in a table

Submitted by 送分小仙女□ on 2019-11-26 23:17:55
Question: Hi, I have an interesting problem. I have an employee table as follows:

    CREATE TABLE EMPLOYEE(
        EMPLOYEE_ID INTEGER,
        SALARY DECIMAL(18,2),
        PAY_PERIOD DATE)

The table holds employees, some of whom get paid monthly, some weekly, some biweekly, and some daily. What we want is an indicator saying 'Y' if the salary of three consecutive pay periods is equal. Take the following example:

    Employee  Pay_Period  Salary
    1         01/01/2012  $500
    1         08/01/2012  $200
    1         15/01/2012  $200
    1         22/01/2012  $200
    1         29…
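One possible way to build such an indicator, sketched here against the DDL above (not necessarily the thread's accepted answer): compare the minimum and maximum salary over the current pay period and the two preceding ones.

    # When three rows exist in the frame and MIN(SALARY) = MAX(SALARY), the
    # three consecutive salaries are equal, so the row is flagged 'Y'.
    SQL = """
    SELECT EMPLOYEE_ID,
           PAY_PERIOD,
           SALARY,
           CASE
             WHEN COUNT(SALARY) OVER (PARTITION BY EMPLOYEE_ID
                                      ORDER BY PAY_PERIOD
                                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) = 3
              AND MIN(SALARY) OVER (PARTITION BY EMPLOYEE_ID
                                    ORDER BY PAY_PERIOD
                                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
                = MAX(SALARY) OVER (PARTITION BY EMPLOYEE_ID
                                    ORDER BY PAY_PERIOD
                                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
             THEN 'Y'
             ELSE 'N'
           END AS THREE_EQUAL_IND
    FROM   EMPLOYEE
    """

The 'Y' lands on the third pay period of each equal run; wrap the result in MAX(...) grouped by EMPLOYEE_ID if a single per-employee flag is wanted.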

Teradata equivalent for lead and lag function of oracle

Submitted by 本秂侑毒 on 2019-11-26 23:10:34
Question: I have been working to find the Teradata equivalent of Oracle's LEAD and LAG functions. The Oracle version would look like:

    LEAD(col1.DATE, 1, ADD_MONTHS(col1.DATE, 12))
        OVER (PARTITION BY tab.a, tab.b, tab.c ORDER BY tab.a) - 1  END_DATE
    LAG(col1.DATE + 7, 1, col1.DATE - 1)
        OVER (PARTITION BY tab.a, tab.b ORDER BY tab.b)             LAG_DATE

Any better idea? Answer 1: I believe you can take the following SQL as a basis and modify it to meet your needs: SELECT CALENDAR_DATE, MAX(CALENDAR_DATE) OVER(PARTITION BY 1 ORDER BY CALENDAR…
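A sketch of the usual frame-based substitute on Teradata releases without LEAD/LAG, mirroring the two Oracle expressions above; table and column names are placeholders taken from the question, and COALESCE supplies the default value that Oracle's third argument provides:

    # A single-row frame (1 FOLLOWING to 1 FOLLOWING, or 1 PRECEDING to
    # 1 PRECEDING) contains exactly the next or previous row, so MIN/MAX over
    # it simply returns that row's value.
    SQL = """
    SELECT tab.a,
           tab.b,
           tab.c,
           col1_date,
           COALESCE(MIN(col1_date)
                      OVER (PARTITION BY tab.a, tab.b, tab.c
                            ORDER BY tab.a
                            ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING),
                    ADD_MONTHS(col1_date, 12)) - 1  AS END_DATE,
           COALESCE(MAX(col1_date + 7)
                      OVER (PARTITION BY tab.a, tab.b
                            ORDER BY tab.b
                            ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),
                    col1_date - 1)                  AS LAG_DATE
    FROM   tab
    """

Newer Teradata releases also ship native LEAD and LAG, in which case the Oracle syntax carries over largely unchanged.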

How to improve performance for slow Spark jobs using DataFrame and JDBC connection?

Submitted by 荒凉一梦 on 2019-11-26 15:29:52
I am trying to access a mid-size Teradata table (~100 million rows) via JDBC in standalone mode on a single node (local[*]). I am using Spark 1.4.1, set up on a very powerful machine (2 CPUs, 24 cores, 126 GB RAM). I have tried several memory and tuning options to make it run faster, but none of them made a big impact. I am sure there is something I am missing; below is my final attempt, which took about 11 minutes to get a simple count, versus only 40 seconds to get the same count through a JDBC connection in R.

    bin/pyspark --driver-memory 40g --executor-memory 40g

    df = …
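One commonly helpful change, sketched below with placeholder host, database, table, and partitioning column, is to parallelise the JDBC read so Spark issues several concurrent range queries instead of pulling all ~100 million rows through a single connection:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="TeradataCount")
    sqlContext = SQLContext(sc)

    # Connection details and the partitioning column are placeholders; the
    # bounds should roughly match the real MIN/MAX of that numeric column.
    df = (sqlContext.read.format("jdbc")
          .option("url", "jdbc:teradata://td-host/"
                         "DATABASE=mydb,USER=my_user,PASSWORD=my_password")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "mydb.big_table")
          # Spark opens numPartitions JDBC connections, each scanning one
          # slice of the id range, instead of one connection for every row.
          .option("partitionColumn", "id")
          .option("lowerBound", "1")
          .option("upperBound", "100000000")
          .option("numPartitions", "24")
          .load())

    print(df.count())

The Teradata JDBC driver jars (terajdbc4.jar and, on older driver versions, tdgssconfig.jar) also need to be on the Spark classpath, for example via --jars.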