Regression algorithms seem to work only on features represented as numbers. For example:
This data set doesn't contain categorical features/variables. It
You can use "dummy coding" in this case. There are Python libraries that do dummy coding, and you have a few options:
scikit-learn: see the library's documentation on encoding categorical features.
pandas: it has a built-in function to create dummy variables.
An example with pandas is below:
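import pandas as pd
# Two numeric columns and one categorical column
sample_data = [[1,2,'a'],[3,4,'b'],[5,6,'c'],[7,8,'b']]
df = pd.DataFrame(sample_data, columns=['numeric1','numeric2','categorical'])
# One indicator column per category value; dtype=int yields 0/1 integers
dummies = pd.get_dummies(df.categorical, dtype=int)
# Attach the indicator columns to the original frame
encoded = df.join(dummies)
print(encoded)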
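
For the scikit-learn option, here is a minimal sketch on the same sample data as the pandas example. It assumes the `OneHotEncoder` class from `sklearn.preprocessing` is the tool you would reach for; the answer itself does not name a specific class. The variable names (`encoder`, `dummy_columns`, `result`) are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

sample_data = [[1, 2, 'a'], [3, 4, 'b'], [5, 6, 'c'], [7, 8, 'b']]
df = pd.DataFrame(sample_data, columns=['numeric1', 'numeric2', 'categorical'])

# OneHotEncoder expects 2-D input, hence the double brackets.
# Assumption: default settings, which return a sparse matrix.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[['categorical']]).toarray()

# categories_ holds the sorted category values found for each input column.
dummy_columns = list(encoder.categories_[0])
dummies = pd.DataFrame(encoded, columns=dummy_columns, index=df.index)

# Replace the original categorical column with its indicator columns.
result = df.drop(columns='categorical').join(dummies)
print(result)
```

One reason you might prefer this over `pd.get_dummies` is that the fitted encoder remembers the categories it saw, so the same column layout can be reapplied to new data with `encoder.transform`.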