teradata

Connecting Python with Teradata using Teradata module

Submitted by 偶尔善良 on 2019-11-27 10:11:50
Question: I have installed Python 2.7.0 and the Teradata module on Windows 7 (pip install teradata), but I am not able to connect to and query Teradata from Python. I want to import the teradata module in my source code and perform operations such as firing queries against Teradata, fetching the result set, and checking whether the connection was made. Please help me write code for this, as I am new to Python and have no reference material for connecting to Teradata. Answer 1: There are a number of ways to connect to Teradata…
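One minimal sketch of such a connection, using the teradata (PyTd) module over ODBC; the system name, credentials, and the queried objects below are placeholders to replace with your own:

    import teradata

    # UdaExec manages configuration and logging for the PyTd driver.
    udaExec = teradata.UdaExec(appName="HelloTeradata", version="1.0",
                               logConsole=False)

    # "tdprod", "my_user" and "my_password" are placeholders for your system
    # name (or ODBC DSN) and credentials.
    with udaExec.connect(method="odbc", system="tdprod",
                         username="my_user", password="my_password") as session:
        # Reaching this point without an exception means the connection is up.
        for row in session.execute("SELECT CURRENT_TIMESTAMP"):
            print(row)

        # Fire a query and walk the result set.
        for row in session.execute("SELECT TOP 10 TableName FROM dbc.tables"):
            print(row[0])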

Find which rows have different values for a given column in Teradata SQL

Submitted by 霸气de小男生 on 2019-11-27 01:18:31
Question: I am trying to compare two addresses for the same ID to see whether they match. For example:

    Id  Address Code  Address
    1   1             123 Main
    1   2             123 Main
    2   1             456 Wall
    2   2             456 Wall
    3   1             789 Right
    3   2             100 Left

I'm just trying to figure out whether the addresses for each ID match. So in this case I want to return just ID 3, which has a different address for Address Codes 1 and 2. Answer 1: Join the table with itself and give it two different aliases (A and B in the following example). This allows you to compare…
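A sketch of that self-join, with the table name (addresses) and column names assumed from the sample data above; it could be run through a teradata-module session like the one shown in the first entry:

    # Self-join on Id and keep the Ids whose two address codes disagree.
    # Table and column names (addresses, Id, Address_Code, Address) are
    # assumptions based on the sample data.
    SQL = """
    SELECT a.Id
    FROM   addresses a
    JOIN   addresses b
      ON   a.Id = b.Id
     AND   a.Address_Code = 1
     AND   b.Address_Code = 2
    WHERE  a.Address <> b.Address
    """

    # Run it with a PyTd session such as the one opened in the first entry:
    # for row in session.execute(SQL):
    #     print(row[0])   # expected: 3 for the sample rows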

What is ROWS UNBOUNDED PRECEDING used for in Teradata?

Submitted by 流过昼夜 on 2019-11-27 00:10:47
Question: I am just starting with Teradata and have come across an ordered analytical function clause called "ROWS UNBOUNDED PRECEDING". I have tried several sites to learn about it, but all of them use complicated examples. Could you please provide a simple example so that I can get the basics clear? Answer 1: It's the "frame" or "range" clause of window functions, which are part of the SQL standard and implemented in many databases, including Teradata. A simple…
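A simple illustration, sketched as a running total over an assumed transactions table: with ROWS UNBOUNDED PRECEDING the frame covers every row from the start of the partition up to the current row, so SUM() becomes a cumulative sum.

    # "transactions" and its columns are placeholders. For each row, the frame
    # spans from the first row of the account's partition to the current row,
    # so SUM(amount) is a running total in txn_date order.
    SQL = """
    SELECT account_id,
           txn_date,
           amount,
           SUM(amount) OVER (PARTITION BY account_id
                             ORDER BY txn_date
                             ROWS UNBOUNDED PRECEDING) AS running_total
    FROM   transactions
    """
    # Execute through any Teradata client, or a PyTd session as shown above.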

Teradata MERGE yielding no results when executed through SQLAlchemy

Submitted by 谁都会走 on 2019-11-26 23:33:37
Question: I'm attempting to use Python with SQLAlchemy to download some data, create a temporary staging table on a Teradata server, and then MERGE that table into another table which I've created to store this data permanently. I'm using sql = sqlalchemy.text(merge) and td_engine.execute(sql), where merge is a string similar to the one below:

    MERGE INTO perm_table as p
    USING temp_table as t
    ON p.Id = t.Id
    WHEN MATCHED THEN UPDATE SET
        col1 = t.col1,
        col2 = t.col2,
        ...
        col50 = t.col50
    WHEN NOT MATCHED THEN…
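One thing worth checking, sketched below with a placeholder engine URL and a shortened MERGE: DML sent through a bare engine.execute() may never be committed (older SQLAlchemy autocommit heuristics do not recognise MERGE as data-modifying), so running the statement inside an explicit transaction block makes sure it is persisted.

    import sqlalchemy

    # Placeholder URL for the sqlalchemy-teradata dialect; adjust to your site.
    td_engine = sqlalchemy.create_engine("teradata://my_user:my_password@tdprod")

    # Shortened stand-in for the MERGE from the question.
    merge = """
    MERGE INTO perm_table AS p
    USING temp_table AS t
      ON p.Id = t.Id
    WHEN MATCHED THEN UPDATE
      SET col1 = t.col1
    WHEN NOT MATCHED THEN INSERT (Id, col1)
      VALUES (t.Id, t.col1)
    """

    sql = sqlalchemy.text(merge)

    # engine.begin() opens a transaction and commits it on success, so the
    # MERGE is persisted rather than rolled back when the connection returns
    # to the pool.
    with td_engine.begin() as conn:
        conn.execute(sql)

If temp_table is a volatile table, it also has to be created and merged on the same connection, since volatile tables in Teradata are session-scoped.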

Compare 3 Consecutive rows in a table

Submitted by 送分小仙女□ on 2019-11-26 23:17:55
Question: Hi, I have an interesting problem. I have an employee table as follows:

    CREATE TABLE EMPLOYEE(
        EMPLOYEE_ID INTEGER,
        SALARY DECIMAL(18,2),
        PAY_PERIOD DATE)

The table holds employees, some of whom get paid monthly, some weekly, some biweekly, and some daily. What we want is an indicator saying 'Y' if the salary of three consecutive pay periods is equal. Take the following example:

    Employee  Pay_Period  Salary
    1         01/01/2012  $500
    1         08/01/2012  $200
    1         15/01/2012  $200
    1         22/01/2012  $200
    1         29…
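One possible way to build such an indicator, sketched here against the DDL above (not necessarily the thread's accepted answer): compare the minimum and maximum salary over the current pay period and the two preceding ones.

    # When three rows exist in the frame and MIN(SALARY) = MAX(SALARY), the
    # three consecutive salaries are equal, so the row is flagged 'Y'.
    SQL = """
    SELECT EMPLOYEE_ID,
           PAY_PERIOD,
           SALARY,
           CASE
             WHEN COUNT(SALARY) OVER (PARTITION BY EMPLOYEE_ID
                                      ORDER BY PAY_PERIOD
                                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) = 3
              AND MIN(SALARY) OVER (PARTITION BY EMPLOYEE_ID
                                    ORDER BY PAY_PERIOD
                                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
                = MAX(SALARY) OVER (PARTITION BY EMPLOYEE_ID
                                    ORDER BY PAY_PERIOD
                                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
             THEN 'Y'
             ELSE 'N'
           END AS THREE_EQUAL_IND
    FROM   EMPLOYEE
    """

The 'Y' lands on the third pay period of each equal run; wrap the result in MAX(...) grouped by EMPLOYEE_ID if a single per-employee flag is wanted.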

Teradata equivalent for lead and lag function of oracle

Submitted by 本秂侑毒 on 2019-11-26 23:10:34
Question: I have been working to find the Teradata equivalent of Oracle's LEAD and LAG functions. The Oracle version would look like:

    LEAD(col1.DATE, 1, ADD_MONTHS(col1.DATE, 12))
        OVER (PARTITION BY tab.a, tab.b, tab.c ORDER BY tab.a) - 1  END_DATE
    LAG(col1.DATE + 7, 1, col1.DATE - 1)
        OVER (PARTITION BY tab.a, tab.b ORDER BY tab.b)             LAG_DATE

Any better idea? Answer 1: I believe you can take the following SQL as a basis and modify it to meet your needs: SELECT CALENDAR_DATE, MAX(CALENDAR_DATE) OVER(PARTITION BY 1 ORDER BY CALENDAR…
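A sketch of the usual frame-based substitute on Teradata releases without LEAD/LAG, mirroring the two Oracle expressions above; table and column names are placeholders taken from the question, and COALESCE supplies the default value that Oracle's third argument provides:

    # A single-row frame (1 FOLLOWING to 1 FOLLOWING, or 1 PRECEDING to
    # 1 PRECEDING) contains exactly the next or previous row, so MIN/MAX over
    # it simply returns that row's value.
    SQL = """
    SELECT tab.a,
           tab.b,
           tab.c,
           col1_date,
           COALESCE(MIN(col1_date)
                      OVER (PARTITION BY tab.a, tab.b, tab.c
                            ORDER BY tab.a
                            ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING),
                    ADD_MONTHS(col1_date, 12)) - 1  AS END_DATE,
           COALESCE(MAX(col1_date + 7)
                      OVER (PARTITION BY tab.a, tab.b
                            ORDER BY tab.b
                            ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),
                    col1_date - 1)                  AS LAG_DATE
    FROM   tab
    """

Newer Teradata releases also ship native LEAD and LAG, in which case the Oracle syntax carries over largely unchanged.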

How to improve performance for slow Spark jobs using DataFrame and JDBC connection?

Submitted by 荒凉一梦 on 2019-11-26 15:29:52
I am trying to access a mid-size Teradata table (~100 million rows) via JDBC in standalone mode on a single node (local[*]). I am using Spark 1.4.1, set up on a very powerful machine (2 CPUs, 24 cores, 126 GB RAM). I have tried several memory and tuning options to make it run faster, but none of them made a big impact. I am sure there is something I am missing; below is my final attempt, which took about 11 minutes to get a simple count, versus only 40 seconds to get the same count through a JDBC connection in R.

    bin/pyspark --driver-memory 40g --executor-memory 40g

    df = …
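One commonly helpful change, sketched below with placeholder host, database, table, and partitioning column, is to parallelise the JDBC read so Spark issues several concurrent range queries instead of pulling all ~100 million rows through a single connection:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="TeradataCount")
    sqlContext = SQLContext(sc)

    # Connection details and the partitioning column are placeholders; the
    # bounds should roughly match the real MIN/MAX of that numeric column.
    df = (sqlContext.read.format("jdbc")
          .option("url", "jdbc:teradata://td-host/"
                         "DATABASE=mydb,USER=my_user,PASSWORD=my_password")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "mydb.big_table")
          # Spark opens numPartitions JDBC connections, each scanning one
          # slice of the id range, instead of one connection for every row.
          .option("partitionColumn", "id")
          .option("lowerBound", "1")
          .option("upperBound", "100000000")
          .option("numPartitions", "24")
          .load())

    print(df.count())

The Teradata JDBC driver jars (terajdbc4.jar and, on older driver versions, tdgssconfig.jar) also need to be on the Spark classpath, for example via --jars.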