The table is:
create table test (
  id string,
  name string,
  age string,
  modified string);
Data like this:
id name age modified
Hive SQL has a relatively recent feature (added in Hive 0.11): analytic functions and the over clause. This should do the job without joins:
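select id, name, age, last_modified
from (
  -- keep every row, and attach the newest modified value for its id
  select id, name, age, modified,
         max(modified) over (partition by id) as last_modified
  from test
) as sub
-- keep only the rows that carry that newest value
where modified = last_modified;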
What's going on here is that the subquery returns every row of the original table with an extra column, last_modified, which holds the latest modified timestamp for that row's id. (It is the same value a group by id would compute, but the rows are not collapsed.) The key is that the subquery still gives you one row per row of the original table, and the outer query then keeps only the rows whose modified equals last_modified.
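To see the effect on concrete data, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python, used only as a stand-in for Hive. The sample rows, the latest_rows helper and the text column types are my own assumptions for illustration, and it needs an SQLite build with window functions (3.25 or later). Since modified is a string, max() compares it lexicographically, so this assumes timestamps in a sortable format such as yyyy-MM-dd HH:mm:ss.

```python
import sqlite3

# Hypothetical sample rows (not from the question): two versions of id 1, one of id 2.
ROWS = [
    ("1", "alice", "30", "2014-01-01 10:00:00"),
    ("1", "alice", "31", "2014-06-01 10:00:00"),
    ("2", "bob", "25", "2014-03-15 08:30:00"),
]

# The query from the answer, plus an order by so the output order is deterministic.
QUERY = """
select id, name, age, last_modified
from ( select id, name, age, modified,
              max(modified) over (partition by id) as last_modified
       from test ) as sub
where modified = last_modified
order by id
"""


def latest_rows(rows):
    """Load rows into a throwaway table and return the newest row per id."""
    conn = sqlite3.connect(":memory:")
    try:
        # SQLite has no 'string' type name in common use; text plays the same role here.
        conn.execute("create table test (id text, name text, age text, modified text)")
        conn.executemany("insert into test values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_rows(ROWS):
        print(row)
```

Note that if two rows of the same id share the same latest modified value, both of them pass the filter, so the result is not strictly one row per id in that case.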
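A simpler single-level version is tempting, but it most likely will not work: the where clause is evaluated before the window function, so it cannot see the alias last_modified, and Hive should reject the query:
select id, name, age,
       max(modified) over (partition by id) as last_modified
from test
where modified = last_modified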
By the way, the subquery version should work in Impala, too.