r - Unexpected ddply() output: not grouping


When I calculate the mean of a numeric column using ddply, the output is not what I expect:

ddply(df, .(df[,1]), summarize, sales = mean(df[,5]))

The output is:

     df1[, 4]    sales
1 x01.01.2012 49761.36
2 x01.02.2012 49761.36
3 x01.03.2012 49761.36
4 x01.04.2012 49761.36
5 x01.05.2012 49761.36
6 x01.06.2012 49761.36

I do not understand why the mean is the same in every row, even though the data is sorted by date. This is not the expected output, given that each date has different sales. It seems to calculate the mean of the whole column.

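The second argument should be .(variable name). df[,1] refers to the values in the column, not the name of the variable. The same thing applies when you use mean(): inside summarize, df[,5] is looked up in the full data frame rather than in the rows of the current group, so every group gets the mean of the whole column.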

Here's a short example with fake data, since you did not supply any.

> df <- data.frame(val1 = 1:5, val2 = 6:10)
> library(plyr)
## correct mean
> ddply(df, .(val1, val2), summarize, mean = mean(c(val1, val2)))
  val1 val2 mean
1    1    6  3.5
2    2    7  4.5
3    3    8  5.5
4    4    9  6.5
5    5   10  7.5
## incorrect mean
> ddply(df, .(df[,1], df[,2]), summarize, mean = mean(c(df[,1], df[,2])))
  df[, 1] df[, 2] mean
1       1       6  5.5
2       2       7  5.5
3       3       8  5.5
4       4       9  5.5
5       5      10  5.5
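Applied to data shaped like the table in the question, the same fix might look like the sketch below. The data frame sales_df, its column names date and sales, and all of the values are invented for illustration, since the real data was not posted.

```r
library(plyr)

# Hypothetical data: the names `sales_df`, `date` and `sales` and all
# values are assumptions, not taken from the question.
sales_df <- data.frame(
  date  = c("x01.01.2012", "x01.01.2012", "x01.02.2012", "x01.02.2012"),
  sales = c(100, 200, 300, 500)
)

# Group by the bare column name. Inside summarize, `sales` refers only
# to the rows of the current group, so each date gets its own mean.
by_date <- ddply(sales_df, .(date), summarize, sales = mean(sales))
print(by_date)
```

If you only know the column by position, ddply() also accepts a character vector of names as its second argument, so something like names(sales_df)[1] should work in place of .(date).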

If this doesn't resolve your issue, please provide a sample of your data so that we can reproduce the problem.

