Linear regression analysis with string/categorical features (variables)?

面向向阳花 2020-11-30 18:43

Regression algorithms seem to work only on features represented as numbers. For example:

This data set doesn't contain categorical features/variables. It

4 answers
  •  执笔经年
    2020-11-30 19:17

    You can use "dummy coding" in this case: replace the categorical column with one 0/1 indicator column per category. Several Python libraries can do dummy coding, so you have a few options:

    • You can use the scikit-learn library; take a look here.
    • Or, if you are working with pandas, it has a built-in function, pd.get_dummies(), that creates dummy variables.
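A minimal sketch of the scikit-learn option, assuming a reasonably recent scikit-learn (OneHotEncoder with the drop parameter, ColumnTransformer, Pipeline) and LinearRegression as the model. The data frame and target values are hypothetical; only the column names follow this answer. drop="first" keeps k-1 indicator columns per feature so they are not perfectly collinear with the intercept.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two numeric features, one string feature, and a made-up target y.
df = pd.DataFrame({
    "numeric1": [1, 3, 5, 7, 2, 4],
    "numeric2": [2, 4, 6, 8, 1, 3],
    "categorical": ["a", "b", "c", "b", "a", "c"],
    "y": [10.0, 14.0, 21.0, 22.0, 6.0, 17.0],
})
X = df[["numeric1", "numeric2", "categorical"]]
y = df["y"]

# One-hot encode the string column (dropping the first level, 'a', as the baseline)
# and pass the numeric columns through unchanged.
preprocess = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(drop="first"), ["categorical"])],
    remainder="passthrough",
)

# The pipeline applies the same encoding at fit time and at predict time.
model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])
model.fit(X, y)

new_row = pd.DataFrame({"numeric1": [6], "numeric2": [5], "categorical": ["b"]})
print(model.predict(new_row))
```

Wrapping the encoder and the regressor in one Pipeline means new rows never have to be dummy-coded by hand, which avoids mismatched columns between training and prediction.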

    An example with pandas is below:

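    import pandas as pd
    
    sample_data = [[1, 2, 'a'], [3, 4, 'b'], [5, 6, 'c'], [7, 8, 'b']]
    df = pd.DataFrame(sample_data, columns=['numeric1', 'numeric2', 'categorical'])
    # One indicator column per distinct value of 'categorical' (columns 'a', 'b', 'c')
    dummies = pd.get_dummies(df['categorical'])
    # join() returns a new DataFrame, so assign it; drop the original string column
    df = df.join(dummies).drop(columns='categorical')
    print(df)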
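To carry the pandas approach through to an actual regression, here is a sketch assuming scikit-learn's LinearRegression as the model; the two extra rows and the target y are hypothetical. drop_first=True keeps k-1 dummy columns so they are not perfectly collinear with the intercept (the "dummy variable trap"). New data is encoded without drop_first and then aligned to the training columns, because calling drop_first on a single new row would drop its only category.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data: the answer's four rows plus two more, and a made-up target.
sample_data = [[1, 2, 'a'], [3, 4, 'b'], [5, 6, 'c'], [7, 8, 'b'], [2, 1, 'a'], [4, 3, 'c']]
df = pd.DataFrame(sample_data, columns=['numeric1', 'numeric2', 'categorical'])
y = [4.0, 9.0, 15.0, 17.0, 3.0, 12.0]

# Encode 'categorical' as float indicator columns, dropping the first level ('a') as the baseline.
X = pd.get_dummies(df, columns=['categorical'], drop_first=True, dtype=float)
print(list(X.columns))  # ['numeric1', 'numeric2', 'categorical_b', 'categorical_c']

model = LinearRegression().fit(X, y)

# Encode new rows WITHOUT drop_first, then align to the training columns:
# columns not seen in training (e.g. the baseline 'categorical_a') are dropped,
# and training columns missing from the new rows are filled with 0.
new_df = pd.DataFrame([[6, 5, 'b']], columns=['numeric1', 'numeric2', 'categorical'])
X_new = pd.get_dummies(new_df, columns=['categorical'], dtype=float)
X_new = X_new.reindex(columns=X.columns, fill_value=0.0)
print(model.predict(X_new))
```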
    
