pandas - Running get_dummies on several DataFrame columns?


How can one idiomatically run a function like get_dummies, which expects a single column and returns several, on multiple DataFrame columns?

Since pandas version 0.15.0, pd.get_dummies can handle a DataFrame directly (before that, it could only handle a single Series; see below for the workaround):

In [1]: df = pd.DataFrame({'a': ['a', 'b', 'a'], 'b': ['c', 'c', 'b'],
   ...:                    'c': [1, 2, 3]})

In [2]: df
Out[2]:
   a  b  c
0  a  c  1
1  b  c  2
2  a  b  3

In [3]: pd.get_dummies(df)
Out[3]:
   c  a_a  a_b  b_b  b_c
0  1    1    0    0    1
1  2    0    1    0    1
2  3    1    0    1    0
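As a standalone sketch of the same call, the snippet below rebuilds the example frame and also shows the columns argument, which restricts the encoding to selected columns. The variable names all_dummies and only_a are just illustrative, and dtype=int is an assumption about the desired output: it asks for 0/1 integers, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Same example frame as in the session above.
df = pd.DataFrame({"a": ["a", "b", "a"], "b": ["c", "c", "b"], "c": [1, 2, 3]})

# Encode every string column; the numeric column "c" passes through unchanged.
# dtype=int requests 0/1 integers instead of booleans.
all_dummies = pd.get_dummies(df, dtype=int)

# Encode only column "a"; columns "b" and "c" are left as they are.
only_a = pd.get_dummies(df, columns=["a"], dtype=int)

print(all_dummies)
print(only_a)
```

The dummy columns are named prefix_value, where the prefix defaults to the original column name, so the origin of each indicator stays visible.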

Workaround for pandas < 0.15.0

You can do this for each column separately and then concat the results:

In [111]: df
Out[111]:
   a  b
0  a  x
1  a  y
2  b  z
3  b  x
4  c  x
5  a  y
6  b  y
7  c  z

In [112]: pd.concat([pd.get_dummies(df[col]) for col in df], axis=1, keys=df.columns)
Out[112]:
   a        b
   a  b  c  x  y  z
0  1  0  0  1  0  0
1  1  0  0  0  1  0
2  0  1  0  0  0  1
3  0  1  0  1  0  0
4  0  0  1  1  0  0
5  1  0  0  0  1  0
6  0  1  0  0  1  0
7  0  0  1  0  0  1

If you don't want the multi-index columns, then remove the keys=.. from the concat function call.
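One thing to watch when dropping keys: the resulting columns are named only after the values, so two source columns that share a value would produce duplicate column names. A possible variation on the workaround, sketched here with the illustrative names multi and flat, passes prefix=col to each get_dummies call to keep flat but unambiguous names. The frame is rebuilt from the values shown in the session above.

```python
import pandas as pd

# Example frame matching the workaround session above.
df = pd.DataFrame({"a": list("aabbcabc"), "b": list("xyzxxyyz")})

# Per-column dummies joined under a two-level column index (keys=...).
multi = pd.concat(
    [pd.get_dummies(df[col]) for col in df], axis=1, keys=df.columns
)

# Flat alternative: no keys, but each column name is used as a prefix
# so that the dummy names stay unique across source columns.
flat = pd.concat(
    [pd.get_dummies(df[col], prefix=col) for col in df], axis=1
)

print(multi.columns.tolist())
print(flat.columns.tolist())
```

With the multi-index version, multi["a"] selects all dummies that came from column a; with the flat version the same information is carried in the name prefix.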

