join in a dataframe spark java

和自甴很熟 submitted on 2019-12-03 16:05:32

You can use the join method with a column name to join two dataframes, e.g.:

Dataset<Row> dfairport = Load.Csv(sqlContext, data_airport);
Dataset<Row> dfairport_city_state = Load.Csv(sqlContext, data_airport_city_state);

Dataset<Row> joined = dfairport.join(dfairport_city_state, dfairport_city_state("City"));

There is also an overloaded version that lets you specify the join type as the third argument, e.g.:

Dataset<Row> joined = dfairport.join(dfairport_city_state, dfairport_city_state("City"), "left_outer");

Here's more on joins.
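
To make the effect of the join type concrete, here is a small plain-Java sketch (no Spark involved) that mimics what the default inner join and a "left_outer" join on a city key would keep. The class name, the method name, and the sample data are hypothetical, and the sketch assumes each city appears at most once on the right-hand side, which a real Spark join does not require.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinTypeSketch {

    // Joins airport -> city pairs with a city -> state lookup.
    // keepUnmatched = false mimics an inner join: rows without a matching city are dropped.
    // keepUnmatched = true mimics "left_outer": every left row is kept, and a missing state stays null.
    static List<String> join(Map<String, String> airportToCity,
                             Map<String, String> cityToState,
                             boolean keepUnmatched) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : airportToCity.entrySet()) {
            String state = cityToState.get(e.getValue());
            if (state != null || keepUnmatched) {
                rows.add(e.getKey() + "," + e.getValue() + "," + state);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: "XXX" has no matching city on the right side.
        Map<String, String> airports = new LinkedHashMap<>();
        airports.put("JFK", "New York");
        airports.put("XXX", "Nowhere");

        Map<String, String> states = new LinkedHashMap<>();
        states.put("New York", "NY");

        System.out.println(join(airports, states, false)); // inner-style result
        System.out.println(join(airports, states, true));  // left-outer-style result
    }
}
```

With an inner join the unmatched airport disappears from the result. With "left_outer" it is kept, and the columns coming from the right-hand table are null.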

First, thank you very much for your response.

I have tried both of your solutions, but neither of them works. I get the following error: The method dfairport_city_state(String) is undefined for the type ETL_Airport

I cannot access a specific column of the dataframe for the join.

EDIT: I got the join working. I am posting the solution here in case it helps someone else ;)

Thanks for everything and best regards

// Join the tables on their shared city
Dataset<Row> joined = dfairport.join(dfairport_city_state, dfairport.col("leg_city").equalTo(dfairport_city_state.col("city")));
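
For anyone who wants to try this end to end, the following is a minimal self-contained sketch of the same join. It rests on several assumptions: Spark SQL 2.x or later is on the classpath, a local master is used, and the class name, the "airport" and "state" columns, and the sample rows are made up for illustration (only "leg_city" and "city" come from the post above). The error quoted earlier most likely comes from writing df("City"), which is Scala shorthand. In Java the column has to be fetched with col(...).

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class AirportJoinSketch {

    // Builds an all-string schema from the given column names.
    static StructType stringSchema(String... names) {
        StructField[] fields = new StructField[names.length];
        for (int i = 0; i < names.length; i++) {
            fields[i] = DataTypes.createStructField(names[i], DataTypes.StringType, true);
        }
        return DataTypes.createStructType(fields);
    }

    // Returns {row count of the inner join, row count of the left outer join}.
    static long[] joinCounts() {
        SparkSession spark = SparkSession.builder()
                .appName("AirportJoinSketch")
                .master("local[*]") // assumption: run locally, no cluster
                .getOrCreate();

        // Hypothetical sample data standing in for the CSV files.
        List<Row> airportRows = Arrays.asList(
                RowFactory.create("JFK", "New York"),
                RowFactory.create("SEA", "Seattle"),
                RowFactory.create("XXX", "Nowhere"));
        List<Row> cityRows = Arrays.asList(
                RowFactory.create("New York", "NY"),
                RowFactory.create("Seattle", "WA"));

        Dataset<Row> dfairport =
                spark.createDataFrame(airportRows, stringSchema("airport", "leg_city"));
        Dataset<Row> dfairport_city_state =
                spark.createDataFrame(cityRows, stringSchema("city", "state"));

        // Java form of the join condition: col(...) plus equalTo(...).
        Column sameCity = dfairport.col("leg_city").equalTo(dfairport_city_state.col("city"));

        // Two-argument join is an inner join; the third argument selects the join type.
        Dataset<Row> inner = dfairport.join(dfairport_city_state, sameCity);
        Dataset<Row> leftOuter = dfairport.join(dfairport_city_state, sameCity, "left_outer");

        inner.show();
        leftOuter.show();

        long[] counts = {inner.count(), leftOuter.count()};
        spark.stop();
        return counts;
    }

    public static void main(String[] args) {
        long[] counts = joinCounts();
        System.out.println("inner rows: " + counts[0] + ", left_outer rows: " + counts[1]);
    }
}
```

If both tables used the same column name, the shorter dfairport.join(dfairport_city_state, "city") overload would also work and would keep a single copy of the join column.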