Question
After a join A by id, B by id, the resulting alias has fields named A::f... and B::f... Is there a way to project it onto only the A fields?
C = join A by id, B by id;
D = filter C by B::n < 1000;
E = foreach D generate A::*;
I get
Unexpected character '*'
What I want is E with a schema identical to A (i.e., describe E and describe A should print exactly the same thing).
How do I do that?
Answer 1:
You can use a project-range expression to get part of the way there.
Unfortunately, there is no way to systematically strip the A:: prefix. If you know the name of the last field of A (suppose it's last), you can do this:
E = foreach D generate .. A::last;
If you wanted just the fields from B, you would do:
E = foreach D generate B::first ..;
If you really need a specific schema, you could define a macro that applies it wherever you need it, overriding any name changes introduced by grouping, joining, etc.
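The macro idea could look roughly like the sketch below. It assumes, purely for illustration, that A was loaded with the fields id, name, and addr; the macro name project_a and its parameter names are hypothetical. Each field still has to be listed explicitly, since there is no wildcard that strips the prefix.

```pig
-- Hypothetical macro: keeps the A-side fields of a joined relation and
-- renames them back to the bare names A was loaded with.
-- Assumes A has exactly the fields id, name, addr.
DEFINE project_a(joined) RETURNS projected {
    $projected = FOREACH $joined GENERATE
        A::id   AS id,
        A::name AS name,
        A::addr AS addr;
};

-- Usage: D is the filtered join result from the question.
E = project_a(D);
```

The macro keeps the renaming in one place, so the same projection can be reapplied after any later join or filter that reintroduces the prefixed names.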
Answer 2:
There is no way to keep a common alias name after joining, but you can generate specific columns from the join result. For example:
A = load 'data1' as (id,name,addr);
B = load 'data2' as (id,name2,addr2);
C = join A by id, B by id; -- C now has A::id, A::name, A::addr, B::id, B::name2, B::addr2
D = foreach C generate $0, $1, $2; -- no parentheses: ($0,$1,$2) would wrap the fields in a single tuple
The relation D now contains only the columns that came from A (id, name, addr), although describe D may still show them with the A:: prefix.
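If the field names should also match those of A, without the A:: prefix, each field can be renamed with AS in the same projection. A minimal sketch, reusing the data1 and data2 schemas from the example above:

```pig
A = LOAD 'data1' AS (id, name, addr);
B = LOAD 'data2' AS (id, name2, addr2);
C = JOIN A BY id, B BY id;

-- Select the A-side fields by their disambiguated names and rename
-- each one back to its bare name with AS.
E = FOREACH C GENERATE A::id AS id, A::name AS name, A::addr AS addr;

-- Compare the two schemas.
DESCRIBE A;
DESCRIBE E;
```

This requires naming every field of A by hand, which is the cost of there being no A::* wildcard.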
Source: https://stackoverflow.com/questions/24812457/how-to-project-an-alias-using-a-wildcard