r – count number of sequence matches using two rows in dataframe

The following code filters for the rows that meet the requirement. lead() shifts a vector forward by one position, so each row can be compared with the row that follows it. For this dataset the answer is three.

library(dplyr)

dat2 <- dat %>%
  # keep rows where this row is "stilstaan" and the next row is "ruiken" / "aan object"
  filter(code %in% "stilstaan" & lead(code) %in% "ruiken" & lead(Modifier) %in% "aan object")

nrow(dat2)
# [1] 3
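
To see why the filter works, it can help to look at what lead() does to a small vector. The sketch below uses a made-up vector x (not part of the original data): lead(x) returns the next element for each position and pads the end with NA, so combining x with lead(x) element-wise compares each row with the row after it.

```r
library(dplyr)

# Toy vector, purely illustrative
x <- c("stilstaan", "ruiken", "stilstaan", "lopen")

# lead() returns the next element for each position; the last one has no successor
shifted <- lead(x)
shifted
# [1] "ruiken"    "stilstaan" "lopen"     NA

# TRUE where an element is "stilstaan" and the following one is "ruiken"
is_match <- x %in% "stilstaan" & shifted %in% "ruiken"
sum(is_match)
# [1] 1
```

Using %in% rather than == means the trailing NA simply yields FALSE instead of NA.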

DATA

dat <- read.table(text = "       Tijd 'nummer schaap'                     code   Modifier comment status
                  1     2.971             1                stilstaan         NA      NA  START
                  2     5.457             1                   ruiken 'aan object'    NA  POINT
                  3    10.703             1                stilstaan         NA      NA   STOP
                  4    10.704             1                    lopen         NA      NA  START
                  5    12.959             1                    lopen         NA      NA   STOP
                  6    12.960             1                stilstaan         NA      NA  START
                  7    22.732             1                   ruiken 'aan object'    NA  POINT
                  8    29.383             1                stilstaan         NA      NA   STOP
                  9    29.384             1                    lopen         NA      NA  START
                  10   42.568             1                    lopen         NA      NA   STOP
                  11   42.569             1                   ruiken 'aan object'    NA  POINT
                  12   49.206             1                    lopen         NA      NA  START
                  13   66.533             1                    lopen         NA      NA   STOP
                  14   66.534             1                stilstaan         NA      NA  START
                  15   67.134             1                   ruiken 'aan object'    NA  POINT
                  16   72.999             1                stilstaan         NA      NA   STOP
                  17   73.000             1                    lopen         NA      NA  START
                  18   77.480             1                    lopen         NA      NA   STOP
                  19   77.481             1                stilstaan         NA      NA  START
                  20   81.773             1               rondkijken         NA      NA  START",
                  header = TRUE, stringsAsFactors = FALSE)

Using base R and the dat data frame from www's answer:

# shift code and Modifier up by one row (padding with NA) and compare with the current row
sum(ifelse((dat$code == "stilstaan") & 
             (c(dat$code[2:length(dat$code)], NA) == "ruiken") &
             (c(dat$Modifier[2:length(dat$Modifier)], NA) == "aan object"),
           1, 0))
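
The same shift can be written with tail(), and summing the logical vector directly avoids the ifelse step. This is only a sketch on a small made-up data frame (toy, reusing the column names code and Modifier), not the answerer's code; na.rm = TRUE is an added safeguard for the case where one of the == comparisons yields NA.

```r
# Toy data with the same column names as dat; values are illustrative only
toy <- data.frame(
  code     = c("stilstaan", "ruiken", "stilstaan", "lopen", "stilstaan", "ruiken"),
  Modifier = c(NA, "aan object", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Shift each column up by one row and pad with NA so lengths stay equal
next_code     <- c(tail(toy$code, -1), NA)
next_modifier <- c(tail(toy$Modifier, -1), NA)

hits <- toy$code == "stilstaan" &
  next_code == "ruiken" &
  next_modifier == "aan object"

hits
# [1]  TRUE FALSE FALSE FALSE    NA FALSE

# Logical TRUE counts as 1; na.rm drops the row whose Modifier comparison is NA
sum(hits, na.rm = TRUE)
# [1] 1
```

Without na.rm = TRUE the sum on this toy data would be NA, because the fifth row is followed by a "ruiken" row with a missing Modifier.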

You can collapse the relevant columns into a single string:

collapse <- paste(paste(dat$code, dat$Modifier), collapse = " ")
# [1] "stilstaan NA ruiken aan object stilstaan NA lopen ..."

Then define the pattern you want to search for:

pattern <- "stilstaan NA ruiken aan object"

Use stringr::str_count to count the matches (the string comes first, the pattern second):

stringr::str_count(collapse, pattern)
# [1] 3
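
One thing to keep in mind is that str_count interprets its pattern as a regular expression by default. The pattern used here contains no special characters, but if a code or modifier ever contained one (a dot or a parenthesis, say), wrapping the pattern in stringr::fixed() would match it literally. A minimal sketch on a made-up string s:

```r
library(stringr)

# Made-up collapsed string; "a.b" contains a regex metacharacter
s <- "a.b x a.b y axb"

# As a regex, "." matches any character, so "axb" is counted too
str_count(s, "a.b")
# [1] 3

# fixed() compares literally, so only the two real "a.b" occurrences count
str_count(s, fixed("a.b"))
# [1] 2
```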
