pandas - Running get_dummies on several DataFrame columns?
How can one idiomatically run a function like get_dummies, which expects a single column and returns several, on multiple DataFrame columns?
Since pandas version 0.15.0, pd.get_dummies can handle a DataFrame directly (before that, it could only handle a single Series; see below for a workaround):
    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
       ...:                    'C': [1, 2, 3]})

    In [3]: df
    Out[3]:
       A  B  C
    0  a  c  1
    1  b  c  2
    2  a  b  3

    In [4]: pd.get_dummies(df)
    Out[4]:
       C  A_a  A_b  B_b  B_c
    0  1    1    0    0    1
    1  2    0    1    0    1
    2  3    1    0    1    0
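If you only want some of the columns encoded, pd.get_dummies also takes a columns= argument listing which ones to expand; the rest are passed through unchanged. A minimal sketch on the same toy frame, assuming a pandas version recent enough to accept dtype= (it was added after 0.15.0). dtype=int is there only to get 0/1 integers, since pandas 2.x returns bool indicator columns by default:

```python
import pandas as pd

# Same toy frame as in the session above
df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['c', 'c', 'b'],
                   'C': [1, 2, 3]})

# Expand only column 'A'; 'B' and 'C' are kept as they are and placed first.
# dtype=int asks for 0/1 integers instead of the bool default of newer pandas.
encoded = pd.get_dummies(df, columns=['A'], dtype=int)
print(encoded.columns.tolist())  # ['B', 'C', 'A_a', 'A_b']
```

Leaving columns= out (as in the session) encodes every object/category column, which is why C was untouched there.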
Workaround for pandas < 0.15.0
You can do each column separately and then concat the results:
    In [110]: df = pd.DataFrame({'A': list('aabbcabc'), 'B': list('xyzxxyyz')})

    In [111]: df
    Out[111]:
       A  B
    0  a  x
    1  a  y
    2  b  z
    3  b  x
    4  c  x
    5  a  y
    6  b  y
    7  c  z

    In [112]: pd.concat([pd.get_dummies(df[col]) for col in df], axis=1, keys=df.columns)
    Out[112]:
       A        B
       a  b  c  x  y  z
    0  1  0  0  1  0  0
    1  1  0  0  0  1  0
    2  0  1  0  0  0  1
    3  0  1  0  1  0  0
    4  0  0  1  1  0  0
    5  1  0  0  0  1  0
    6  0  1  0  0  1  0
    7  0  0  1  0  0  1
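With keys=df.columns, each block of indicator columns is labelled by the column it came from, so the result has two-level (MultiIndex) columns and a single source column's block can be pulled out by its outer label. A small sketch of that, using the same data as the session; it avoids dtype= so it should also run on old pandas, though newer versions will show True/False instead of 1/0:

```python
import pandas as pd

df = pd.DataFrame({'A': list('aabbcabc'), 'B': list('xyzxxyyz')})

# One get_dummies call per column, placed side by side.
# keys= adds an outer column level naming the source column of each block.
result = pd.concat([pd.get_dummies(df[col]) for col in df],
                   axis=1, keys=df.columns)

# Select the indicator block that was built from column 'A'
a_block = result['A']
print(a_block.columns.tolist())  # ['a', 'b', 'c']

# Individual indicators are addressed with (source column, value) tuples
print(result.columns.tolist())
```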
If you don't want MultiIndex columns, remove the keys=... argument from the concat call.
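One thing to watch when dropping keys=: the category values themselves become the column names, so if two source columns share a value you get duplicate column names. Passing each column's name as the prefix= argument of get_dummies keeps them distinct. A sketch with made-up data chosen only to show the collision:

```python
import pandas as pd

# Made-up data: both columns contain the value 'x'
df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['x', 'z', 'z']})

# Flat concat without keys=: the label 'x' appears twice
flat = pd.concat([pd.get_dummies(df[col]) for col in df], axis=1)
print(flat.columns.tolist())  # ['x', 'y', 'x', 'z']

# prefix=col yields names like 'A_x' and 'B_x', so they no longer clash
prefixed = pd.concat([pd.get_dummies(df[col], prefix=col) for col in df],
                     axis=1)
print(prefixed.columns.tolist())  # ['A_x', 'A_y', 'B_x', 'B_z']
```

This gives the same naming scheme that pd.get_dummies(df) produces on its own in pandas >= 0.15.0.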