What is a good heuristic to detect if a column in a pandas.DataFrame is categorical?


旧时难觅i 2021-02-01 18:12

I've been developing a tool that automatically preprocesses data in pandas.DataFrame format. During this preprocessing step, I want to treat continuous and categorical data dif

7 Answers

忘掉有多难 (OP)

2021-02-01 18:39

IMO the opposite strategy, identifying the categoricals, is better, because it depends on what the data is about. Technically, address data can be thought of as unordered categorical data, but usually I wouldn't use it that way.

For survey data, one idea would be to look for Likert scales, e.g. 5-8 distinct values, either strings (which would probably need hardcoded (and translated) levels to look for, such as "good", "bad", ".agree.", "very .*", ...) or int values in the 0-8 range plus NA.
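A rough sketch of the Likert idea might look like the following. The function name `looks_like_likert`, the keyword list `LIKERT_PATTERNS`, and the default thresholds are my own assumptions for illustration; a real tool would need a curated (and translated) list of levels.

```python
import re

import pandas as pd

# Hypothetical keyword patterns taken loosely from the examples above;
# this list is an assumption and would need to be extended and translated.
LIKERT_PATTERNS = [r"good", r"bad", r"agree", r"very .*"]


def looks_like_likert(series: pd.Series, min_levels: int = 5, max_levels: int = 8) -> bool:
    """Guess whether a column holds Likert-style answers (heuristic only)."""
    values = series.dropna()  # NA is allowed and ignored
    if values.empty:
        return False

    # A Likert scale should have only a handful of distinct levels.
    n_levels = values.nunique()
    if not (min_levels <= n_levels <= max_levels):
        return False

    if pd.api.types.is_numeric_dtype(values):
        # Numeric case: every value must be a whole number in the 0-8 range.
        is_whole = values % 1 == 0
        in_range = values.between(0, 8)
        return bool((is_whole & in_range).all())

    # String case: at least one level must match a known keyword.
    levels = values.astype(str).str.lower().unique()
    return any(re.search(pattern, level) for pattern in LIKERT_PATTERNS for level in levels)
```

Calling `looks_like_likert(df[col])` for each column gives a first-pass flag; it will produce false positives (e.g. small integer counts), so I would treat the result as a suggestion rather than a decision.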

Countries and such things might also be identifiable...

Age groups (".-.") might also work.
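For the age-group idea, a minimal sketch could test what share of a column's values look like a numeric range. I am reading the ".-." pattern above as something like "18-24"; the regular expression, the open-ended "65+" form, the function name `looks_like_age_groups`, and the 0.9 threshold are all assumptions of mine.

```python
import pandas as pd

# Assumed formats: "18-24" style ranges, plus an open-ended "65+" bucket.
AGE_GROUP_RE = r"^\s*\d{1,3}\s*-\s*\d{1,3}\s*$|^\s*\d{1,3}\s*\+\s*$"


def looks_like_age_groups(series: pd.Series, min_share: float = 0.9) -> bool:
    """Guess whether a column holds age-group labels (heuristic only)."""
    values = series.dropna().astype(str)
    if values.empty:
        return False
    # Share of non-missing values that match the assumed range format.
    share = values.str.match(AGE_GROUP_RE).mean()
    return bool(share >= min_share)
```

Limiting each number to three digits keeps ISO dates such as "2021-02-01" from matching, but other range-like codes could still slip through.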
