r - Unexpected ddply() output: not grouping
When I calculate the mean of a numeric column using ddply, the output is not what I expect:
ddply(df, .(df[,1]), summarize, sales = mean(df[,5]))
The output is:
  df1[, 4]    sales
1 x01.01.2012 49761.36
2 x01.02.2012 49761.36
3 x01.03.2012 49761.36
4 x01.04.2012 49761.36
5 x01.05.2012 49761.36
6 x01.06.2012 49761.36
I don't understand why the mean is the same in every row, even though the data is sorted by date. This is not the expected output, given that sales are different for each date. It seems to calculate the mean of the whole column.
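The second argument should be .(variable name). df[,1] refers to the values in the column, not the name of the variable. The same thing applies when you use mean(): df[,5] is the entire column of the original data frame, not just the rows of the current group, so every group gets the overall mean.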
Here's a short example with fake data, since you did not supply any.
> df <- data.frame(val1 = 1:5, val2 = 6:10)
> library(plyr)

## correct mean
> ddply(df, .(val1, val2), summarize, mean = mean(c(val1, val2)))
  val1 val2 mean
1    1    6  3.5
2    2    7  4.5
3    3    8  5.5
4    4    9  6.5
5    5   10  7.5

## incorrect mean
> ddply(df, .(df[,1], df[,2]), summarize, mean = mean(c(df[,1], df[,2])))
  df[, 1] df[, 2] mean
1       1       6  5.5
2       2       7  5.5
3       3       8  5.5
4       4       9  5.5
5       5      10  5.5
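Applied to data shaped like the question's, the same fix might look like the sketch below. The data frame sales_df, its column names date and sales, and the values are all made up for illustration, since the real column names were not posted.

```r
library(plyr)

# Hypothetical stand-in for the asker's data; names and values are assumptions
sales_df <- data.frame(
  date  = c("x01.01.2012", "x01.01.2012", "x01.02.2012", "x01.02.2012"),
  sales = c(100, 200, 300, 500)
)

# Group by the column *name* and refer to the column by name inside summarize,
# so mean() only sees the rows belonging to the current group
by_date <- ddply(sales_df, .(date), summarize, sales = mean(sales))
by_date

# If only the column position is known, look up its name first;
# ddply also accepts a character vector of grouping variables
group_col <- names(sales_df)[1]
by_position <- ddply(sales_df, group_col, summarize, sales = mean(sales))
by_position
```

Both calls should give one row per date with that date's own mean, rather than the overall mean repeated on every row.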
If this doesn't resolve the issue, please provide a sample of your data so that we can reproduce the problem.