How to assign unique contiguous numbers to elements in a Spark RDD

無奈伤痛 2020-12-04 14:00

I have a dataset of (user, product, review) and want to feed it into MLlib's ALS algorithm.

The algorithm needs users and products to be numbers, whil

5 answers

执笔经年 (OP)

2020-12-04 15:01

Others have already recommended monotonically_increasing_id() and pointed out its drawback: it produces Longs, not Ints.

However, in my experience (caveat: Spark 1.6), if you call it on a single partition (repartition to 1 first), no partition prefix is added to the IDs, so they can be safely cast to Int. Obviously, you need to have fewer than Integer.MAX_VALUE rows.
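
To illustrate why a single partition gives small, contiguous IDs, here is a rough sketch in plain Python (not Spark). It assumes the layout described in the Spark documentation for monotonically_increasing_id(): the partition index in the upper 31 bits and the record number within the partition in the lower 33 bits. The helper names (simulated_id, ids_for, fits_in_int) are made up for this illustration and are not part of any Spark API.

```python
# Plain-Python simulation of the documented ID layout; this does not call Spark.
# Assumption: upper 31 bits = partition index, lower 33 bits = row number
# within that partition. All function names here are hypothetical.

INT_MAX = 2**31 - 1   # same value as Integer.MAX_VALUE on the JVM
RECORD_BITS = 33      # bits reserved for the per-partition row number


def simulated_id(partition_index, row_in_partition):
    """Combine a partition index and a row number into one 64-bit style ID."""
    return (partition_index << RECORD_BITS) | row_in_partition


def ids_for(partition_sizes):
    """Return the simulated IDs for partitions with the given row counts."""
    return [
        simulated_id(p, r)
        for p, size in enumerate(partition_sizes)
        for r in range(size)
    ]


def fits_in_int(ids):
    """True if every ID can be cast to a 32-bit signed Int without overflow."""
    return all(0 <= i <= INT_MAX for i in ids)


single = ids_for([5])     # one partition holding 5 rows
multi = ids_for([2, 3])   # two partitions holding 2 and 3 rows

print(single)               # [0, 1, 2, 3, 4]
print(multi)                # [0, 1, 8589934592, 8589934593, 8589934594]
print(fits_in_int(single))  # True
print(fits_in_int(multi))   # False
```

With one partition the partition index is always 0, so the IDs are just 0, 1, 2, ... and stay within Int range as long as the row count does. With more than one partition, every partition after the first starts at a multiple of 2^33, which is why the cast to Int is only safe after repartitioning to 1.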
