I\'ve been developing a tool that automatically preprocesses data in pandas.DataFrame format. During this preprocessing step, I want to treat continuous and categorical data dif
IMO the opposite strategy, identifying categoricals is better because it depends on what the data is about. Technically address data can be thought of as unordered categorical data, but usually I wouldn't use it that way.
For survey data, an idea would be to look for Likert scales, e.g. 5-8 values, either strings (which might probably need hardcoded (and translated) levels to look for "good", "bad", ".agree.", "very .*",...) or int values in the 0-8 range + NA.
Countries and such things might also be identifiable...
Age groups (".-.") might also work.