dataframe - Subsetting a large txt file before reading it into the variable in R


I have a large txt file (approx. 2 million rows). The first column is a date in the format 01/01/2006. Values are separated by ;

data <- read.table("largefile.txt", sep=";")
datatouse <- data[data$date >= 01/02/2007 && data$date <= 02/02/2007,]

Example row:

16/12/2006;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000 

The above code doesn't work. Is there a way to subset first and then load the data into the data variable, since the file is large and takes time to load?

For the subset to work, you need quotes and one less ampersand.

datatouse <- data[data$date >= "01/02/2007" & data$date <= "02/02/2007", ] 

You could also use the subset() function.

subset(data, date >= "01/02/2007" & date <= "02/02/2007") 
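To see why the single ampersand matters, here is a minimal sketch on a made-up vector (x is hypothetical, not taken from the question's data). & compares element by element and returns one TRUE/FALSE per row, which is what row indexing needs, while && is meant for a single condition, as in if().

```r
# Hypothetical values standing in for one column of the data
x <- c(1, 5, 10, 20)

# & is vectorised: it yields one logical value per element
keep <- x >= 5 & x <= 10

# Use the logical vector to pick the matching elements
x[keep]
```

With && the same expression would not give one value per row: older versions of R looked only at the first element, and recent versions stop with an error.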

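Next, if the date column should be a Date class variable, you can set its class with the colClasses argument of read.table(). You can set all of the column classes that way if you choose, or just the one you will use. Make sure the dates are in the proper format before using colClasses for Date class variables: read.table() converts a "Date" column with as.Date(), which by default expects year-first dates such as 2007-02-01, not 01/02/2007. Until the column is a real Date, >= and <= compare the values as text rather than as calendar dates, so a dd/mm/yyyy range can return rows outside the intended days.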
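As a rough sketch of the Date approach, assuming the dates are dd/mm/yyyy as in the example row: read the first column as character, convert it with as.Date() and an explicit format, then subset. The sample rows are hypothetical (they reuse the values of the example row with different dates), and the column names date and time are assumptions, since the question does not say whether the file has a header.

```r
# Stand-in for largefile.txt: the question's example row plus
# hypothetical rows that reuse its values with other dates
rows <- c(
  "16/12/2006;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "01/02/2007;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "02/02/2007;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "03/02/2007;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "01/03/2008;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000"
)

# Read date and time as character and the rest as numeric;
# setting colClasses also spares read.table() from guessing types
data <- read.table(text = rows, sep = ";", header = FALSE,
                   colClasses = c("character", "character", rep("numeric", 7)))
names(data)[1:2] <- c("date", "time")  # assumed column names

# Convert dd/mm/yyyy text to Date so comparisons are chronological
data$date <- as.Date(data$date, format = "%d/%m/%Y")

# Keep 1 and 2 February 2007
datatouse <- data[data$date >= as.Date("2007-02-01") &
                  data$date <= as.Date("2007-02-02"), ]
```

In this sample the 01/03/2008 row is dropped, whereas the quoted-string comparison would keep it, because text is compared character by character.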

Finally, to subset the data before you read it into R, I recommend using shell/Unix commands in a terminal or shell. Tools such as grep, awk, sed, etc. are an easy and quick way to reduce the data before sending it to R. On Windows I'd recommend downloading Cygwin (it's free and fast); on Linux-based machines, use the terminal.
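One possible way to combine the two, sketched under the assumption that grep is on the PATH and R runs under a Unix-like shell: read.table() can read from a pipe() connection, so only the lines grep lets through ever reach R. The temporary file and its rows are hypothetical stand-ins for largefile.txt, and the pattern assumes the date is the first field in dd/mm/yyyy form.

```r
# Hypothetical stand-in for largefile.txt, written to a temp file
path <- tempfile(fileext = ".txt")
writeLines(c(
  "16/12/2006;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "01/02/2007;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "02/02/2007;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000",
  "03/02/2007;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000"
), path)

# Keep only lines that start with 01/02/2007; or 02/02/2007;
cmd <- sprintf("grep -E '^0[12]/02/2007;' %s", shQuote(path))

# read.table() opens the pipe, reads grep's output, and closes it
datatouse <- read.table(pipe(cmd), sep = ";", header = FALSE,
                        colClasses = c("character", "character", rep("numeric", 7)))
```

If the real file has a header line, grep would drop it along with the non-matching rows, so the column names would need to be set by hand afterwards.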

