I'm using the pyspark.ml library and its models for a regression problem, and my data has some categorical features with a large number of unique values (more than 1000). What is the ri