dataframe - Subsetting a large txt file before reading it into a variable in R
I have a large txt file (approx. 2 million rows). The first column is a date in the format 01/01/2006. The values are separated by ;
data <- read.table("largefile.txt", sep=";")
datatouse <- data[data$date >= 01/02/2007 && data$date <= 02/02/2007,]
Example row:
16/12/2006;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000
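The above code doesn't work. Is there a way to subset first and then load the data into the data variable, since the file is large and takes a long time to load?
For the subset to work, you need quotes around the dates and one less ampersand (the vectorized & rather than &&).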
datatouse <- data[data$date >= "01/02/2007" & data$date <= "02/02/2007", ]
You can also use the subset() function.
subset(data, date >= "01/02/2007" & date <= "02/02/2007")
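One caveat with the quoted dates: the comparison is then done on character strings, so dd/mm/yyyy values are ordered character by character rather than by calendar date. A minimal sketch of converting the column to Date first is shown here. The sample rows (apart from the example row) and the column names date, time and v1 to v7 are made up for illustration.

```r
# Made-up rows in the same layout as the example row; column names are assumed.
lines <- c(
  "16/12/2006;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "01/02/2007;00:00:00;1.000;0.100;240.000;4.000;0.000;0.000;0.000",
  "02/02/2007;00:01:00;1.200;0.120;241.000;5.000;0.000;1.000;0.000",
  "01/03/2007;00:02:00;1.400;0.140;242.000;6.000;0.000;0.000;17.000"
)
data <- read.table(text = lines, sep = ";", stringsAsFactors = FALSE,
                   col.names = c("date", "time", paste0("v", 1:7)))

# Convert dd/mm/yyyy strings to Date so that >= and <= compare calendar dates.
data$date <- as.Date(data$date, format = "%d/%m/%Y")

# Keep rows from 1 February 2007 to 2 February 2007, inclusive.
datatouse <- subset(data,
                    date >= as.Date("2007-02-01") & date <= as.Date("2007-02-02"))
nrow(datatouse)
```

With these made-up rows, a plain string comparison would also let the 01/03/2007 row through, because "01/03/2007" sorts between "01/02/2007" and "02/02/2007" as text.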
Next, if the date column should be a Date class variable, you can set its class with the colClasses argument in read.table(). You can set all of the column classes this way if you choose, or just the one you'll use. Make sure the dates are in the proper format before using colClasses for Date class variables.
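As a rough sketch of the colClasses idea: colClasses = "Date" relies on the default formats of as.Date(), which are year-first, so dd/mm/yyyy values would not be parsed as intended. One option is to read the date column as character and convert it explicitly afterwards. The second sample row and the column names are assumptions made for illustration.

```r
# Made-up rows in the layout of the example row; column names are assumed.
lines <- c(
  "16/12/2006;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "01/02/2007;00:00:00;1.000;0.100;240.000;4.000;0.000;0.000;0.000"
)

# Declaring the classes up front spares read.table() from guessing each column's type.
data <- read.table(text = lines, sep = ";",
                   col.names = c("date", "time", paste0("v", 1:7)),
                   colClasses = c("character", "character", rep("numeric", 7)))

# The dates are dd/mm/yyyy, so convert them explicitly after reading.
data$date <- as.Date(data$date, format = "%d/%m/%Y")
str(data$date)
```

For the real file, text = lines would be replaced by the file name, e.g. "largefile.txt".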
Finally, to subset the data before you read it into R, I recommend using shell/Unix commands in a terminal or shell. Tools such as grep, awk, sed, etc. are an easy and quick way to reduce the data before sending it to R. On Windows I'd recommend downloading Cygwin (it's free and fast), and on Linux-based machines you can use the terminal.
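The shell filter can also be run from inside R through pipe(), so that read.table() only ever parses the matching lines. The following is a sketch under a few assumptions: grep is available on the PATH in a Unix-like shell, the wanted range is just the two days 01/02/2007 and 02/02/2007, and a small made-up temporary file stands in for largefile.txt.

```r
# Made-up sample file standing in for largefile.txt (same layout as the example row).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "16/12/2006;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "01/02/2007;00:00:00;1.000;0.100;240.000;4.000;0.000;0.000;0.000",
  "02/02/2007;00:01:00;1.200;0.120;241.000;5.000;0.000;1.000;0.000",
  "03/02/2007;00:02:00;1.400;0.140;242.000;6.000;0.000;0.000;17.000"
), path)

# grep keeps only the lines that start with 01/02/2007 or 02/02/2007;
# R then parses just those lines instead of the whole file.
cmd <- paste("grep -E", shQuote("^(01|02)/02/2007;"), shQuote(path))
datatouse <- read.table(pipe(cmd), sep = ";", stringsAsFactors = FALSE)
nrow(datatouse)
```

A wider date range needs a different pattern, or an awk script that compares the date fields; the grep pattern here only works because the range covers two specific days.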