* using log directory ‘/data/gannet/ripley/R/packages/tests-clang/seqtrie.Rcheck’
* using R Under development (unstable) (2024-05-02 r86508)
* using platform: x86_64-pc-linux-gnu
* R was compiled by
    clang version 18.1.5
    flang-new version 18.1.5
* running under: Fedora Linux 36 (Workstation Edition)
* using session charset: UTF-8
* using option ‘--no-stop-on-test-error’
* checking for file ‘seqtrie/DESCRIPTION’ ... OK
* this is package ‘seqtrie’ version ‘0.2.6’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘seqtrie’ can be installed ... [121s/177s] OK
See 'https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/seqtrie-00install.html' for details.
* used C++ compiler: ‘clang version 18.1.5’
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking code files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking whether startup messages can be suppressed ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... [9s/11s] OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking LazyData ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for GNU extensions in Makefiles ... NOTE
GNU make is a SystemRequirements.
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking use of PKG_*FLAGS in Makefiles ... OK
* checking use of SHLIB_OPENMP_*FLAGS in Makefiles ... OK
* checking pragmas in C/C++ headers and code ... OK
* checking compilation flags used ... OK
* checking compiled code ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ... [16m/11m] ERROR
  Running ‘test_RadixForest.R’ [32s/45s]
  Running ‘test_RadixTree.R’ [753s/552s]
  Running ‘test_pairwise.R’ [158s/89s]
Running the tests in ‘tests/test_pairwise.R’ failed.
Complete output:
  > # This test file tests the `dist_matrix` and `dist_pairwise` functions
  > # These two functions are simple dynamic programming algorithms for computing pairwise distances and are themselves used to validate
  > # the RadixTree imeplementation (see test_radix_tree.R)
  > 
  > library(seqtrie)
  > library(stringi)
  > library(stringdist)
  > library(Biostrings)
  Loading required package: BiocGenerics
  
  Attaching package: 'BiocGenerics'
  
  The following objects are masked from 'package:stats':
  
      IQR, mad, sd, var, xtabs
  
  The following objects are masked from 'package:base':
  
      Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
      as.data.frame, basename, cbind, colnames, dirname, do.call,
      duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
      lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
      pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
      union, unique, unsplit, which.max, which.min
  
  Loading required package: S4Vectors
  Loading required package: stats4
  
  Attaching package: 'S4Vectors'
  
  The following object is masked from 'package:utils':
  
      findMatches
  
  The following objects are masked from 'package:base':
  
      I, expand.grid, unname
  
  Loading required package: IRanges
  Loading required package: XVector
  Loading required package: GenomeInfoDb
  
  Attaching package: 'Biostrings'
  
  The following object is masked from 'package:base':
  
      strsplit
  
  > library(dplyr)
  
  Attaching package: 'dplyr'
  
  The following objects are masked from 'package:Biostrings':
  
      collapse, intersect, setdiff, setequal, union
  
  The following object is masked from 'package:GenomeInfoDb':
  
      intersect
  
  The following object is masked from 'package:XVector':
  
      slice
  
  The following objects are masked from 'package:IRanges':
  
      collapse, desc, intersect, setdiff, slice, union
  
  The following objects are masked from 'package:S4Vectors':
  
      first, intersect, rename, setdiff, setequal, union
  
  The following objects are masked from 'package:BiocGenerics':
  
      combine, intersect, setdiff, union
  
  The following objects are masked from 'package:stats':
  
      filter, lag
  
  The following objects are masked from 'package:base':
  
      intersect, setdiff, setequal, union
  
  > 
  > # Use 2 threads on github actions and CRAN, 4 threads locally
  > IS_LOCAL  <- Sys.getenv("IS_LOCAL") != ""
  > NTHREADS  <- ifelse(IS_LOCAL, 4, 2)
  > NITER     <- ifelse(IS_LOCAL, 4, 1)
  > NSEQS     <- 10000
  > MAXSEQLEN <- 200
  > CHARSET   <- "ACGT"
  > 
  > random_strings <- function(N, charset = "abcdefghijklmnopqrstuvwxyz") {
  +   charset_stri <- paste0("[", charset, "]")
  +   len <- sample(0:MAXSEQLEN, N, replace=TRUE)
  +   result <- lapply(0:MAXSEQLEN, function(x) {
  +     nx <- sum(len == x)
  +     if(nx == 0) return(character())
  +     stringi::stri_rand_strings(nx, x, pattern = charset_stri)
  +   })
  +   sample(unlist(result))
  + }
  > 
  > mutate_strings <- function(x, prob = 0.025, indel_prob = 0.025, charset = "abcdefghijklmnopqrstuvwxyz") {
  +   charset <- unlist(strsplit(charset, ""))
  +   xsplit <- strsplit(x, "")
  +   sapply(xsplit, function(a) {
  +     r <- runif(length(a)) < prob
  +     a[r] <- sample(charset, sum(r), replace=TRUE)
  +     ins <- runif(length(a)) < indel_prob
  +     a[ins] <- paste0(sample(charset, sum(ins), replace=TRUE), sample(charset, sum(ins), replace=TRUE))
  +     del <- runif(length(a)) < indel_prob
  +     a[del] <- ""
  +     paste0(a, collapse = "")
  +   })
  + }
  > 
  > # Biostrings notes:
  > # subject (target) must be of length 1 or equal to pattern (query)
  > # To get a distance matrix, iterate over target and perform a column bind
  > # special_zero_case -- if both query and target are empty, Biostrings fails with an error
  > pairwiseAlignmentFix <- function(pattern, subject, ...) {
  +     results <- rep(0, length(subject))
  +     special_zero_case <- nchar(pattern) == 0 & nchar(subject) == 0
  +     if(all(special_zero_case)) {
  +       results
  +     } else {
  +       results[!special_zero_case] <- Biostrings::pairwiseAlignment(pattern=pattern[!special_zero_case], subject=subject[!special_zero_case], ...)
  +       results
  +     }
  + }
  > 
  > biostrings_matrix_global <- function(query, target, cost_matrix, gap_cost, gap_open_cost = 0) {
  +   substitutionMatrix <- -cost_matrix
  +   lapply(query, function(x) {
  +     query2 <- rep(x, length(target))
  +     -pairwiseAlignmentFix(pattern=query2, subject=target, substitutionMatrix = substitutionMatrix, gapOpening=gap_open_cost, gapExtension=gap_cost, scoreOnly=TRUE, type="global")
  +   }) %>% do.call(rbind, .)
  + }
  > 
  > biostrings_pairwise_global <- function(query, target, cost_matrix, gap_cost, gap_open_cost = 0) {
  +   substitutionMatrix <- -cost_matrix
  +   -pairwiseAlignment(pattern=query, subject=target, substitutionMatrix = substitutionMatrix,gapOpening=gap_open_cost, gapExtension=gap_cost, scoreOnly=TRUE, type="global")
  + }
  > 
  > biostrings_matrix_anchored <- function(query, target, query_size, target_size, cost_matrix, gap_cost, gap_open_cost = 0) {
  +   substitutionMatrix <- -cost_matrix
  +   lapply(seq_along(query), function(i) {
  +     query2 <- substring(query[i], 1, query_size[i,,drop=TRUE])
  +     target2 <- substring(target, 1, target_size[i,,drop=TRUE])
  +     -pairwiseAlignmentFix(pattern=query2, subject=target2, substitutionMatrix = substitutionMatrix, gapOpening=gap_open_cost, gapExtension=gap_cost, scoreOnly=TRUE, type="global")
  +   }) %>% do.call(rbind, .)
  + }
  > 
  > biostrings_pairwise_anchored <- function(query, target, query_size, target_size, cost_matrix, gap_cost, gap_open_cost = 0) {
  +   substitutionMatrix <- -cost_matrix
  +   query2 <- substring(query, 1, query_size)
  +   target2 <- substring(target, 1, target_size)
  +   -pairwiseAlignmentFix(pattern=query2, subject=target2, substitutionMatrix = substitutionMatrix, gapOpening=gap_open_cost, gapExtension=gap_cost, scoreOnly=TRUE, type="global")
  + }
  > 
  > for(. in 1:NITER) {
  + 
  +     print("Checking hamming search correctness")
  +     local({
  +       # Note: seqtrie returns `NA_integer_` for hamming distance when the lengths are different
  +       # whereas stringdist returns `Inf`
  +       # This is why we need to replace `NA_integer_` with `Inf` when comparing results
  + 
  +       target <- c(random_strings(NSEQS, CHARSET),"") %>% unique
  +       query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET)))
  +       query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique
  + 
  +       # Check matrix results
  +       results_seqtrie <- dist_matrix(query, target, mode = "hamming", nthreads=NTHREADS)
  +       results_seqtrie[is.na(results_seqtrie)] <- Inf
  +       results_stringdist <- stringdist::stringdistmatrix(query, target, method = "hamming", nthread=NTHREADS)
  +       stopifnot(all(results_seqtrie == results_stringdist))
  + 
  +       # Check pairwise results
  +       query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET)
  +       results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "hamming", nthreads=NTHREADS)
  +       results_seqtrie[is.na(results_seqtrie)] <- Inf
  +       results_stringdist <- stringdist::stringdist(query_pairwise, target, method = "hamming", nthread=NTHREADS)
  +       stopifnot(all(results_seqtrie == results_stringdist))
  +     })
  + 
  +     print("Checking levenshtein search correctness")
  +     local({
  +       target <- c(random_strings(NSEQS, CHARSET),"") %>% unique
  +       query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET)))
  +       query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique
  + 
  +       # Check matrix results
  +       results_seqtrie <- dist_matrix(query, target, mode = "levenshtein", nthreads=NTHREADS)
  +       results_stringdist <- stringdist::stringdistmatrix(query, target, method = "lv", nthread=NTHREADS)
  +       stopifnot(all(results_seqtrie == results_stringdist))
  + 
  +       # Check pairwise results
  +       query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET)
  +       results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "levenshtein", nthreads=NTHREADS)
  +       results_stringdist <- stringdist::stringdist(query_pairwise, target, method = "lv", nthread=NTHREADS)
  +       stopifnot(all(results_seqtrie == results_stringdist))
  +     })
  + 
  +     print("Checking anchored search correctness")
  +     local({
  +       # There is no anchored search in stringdist (or elsewhere). To get the same results, we substring the query and target sequences
  +       # By the results of the seqtrie anchored search and then compare the results
  + 
  +       target <- c(random_strings(NSEQS, CHARSET),"") %>% unique
  +       query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET)))
  +       query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique
  + 
  +       # Check matrix results
  +       results_seqtrie <- dist_matrix(query, target, mode = "anchored", nthreads=NTHREADS)
  +       query_size <- attr(results_seqtrie, "query_size")
  +       target_size <- attr(results_seqtrie, "target_size")
  +       results_stringdist <- lapply(seq_along(query), function(i) {
  +         query_size2 <- query_size[i,,drop=TRUE]
  +         target_size2 <- target_size[i,,drop=TRUE]
  +         query2 <- substring(query[i], 1, query_size2) # query[i] is recycled
  +         target2 <- substring(target, 1, target_size2)
  +         stringdist::stringdist(query2, target2, method = "lv", nthread=NTHREADS)
  +       }) %>% do.call(rbind, .)
  +       stopifnot(all(results_seqtrie == results_stringdist))
  + 
  +       # Check pairwise results
  +       query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET)
  +       results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "anchored", nthreads=NTHREADS)
  +       query_size <- attr(results_seqtrie, "query_size")
  +       target_size <- attr(results_seqtrie, "target_size")
  +       query2 <- substring(query_pairwise, 1, query_size)
  +       target2 <- substring(target, 1, target_size)
  +       results_stringdist <- stringdist::stringdist(query2, target2, method = "lv", nthread=NTHREADS)
  +       stopifnot(all(results_seqtrie == results_stringdist))
  +     })
  + 
  +     print("Checking global search with linear gap for correctness")
  +     local({
  +       target <- c(random_strings(NSEQS, CHARSET),"") %>% unique
  +       query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET)))
  +       query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique
  + 
  +       # Check matrix results
  +       cost_matrix <- matrix(sample(1:3, size = nchar(CHARSET)^2, replace=TRUE), nrow=nchar(CHARSET))
  +       diag(cost_matrix) <- 0
  +       colnames(cost_matrix) <- rownames(cost_matrix) <- strsplit(CHARSET, "")[[1]]
  +       gap_cost <- sample(1:3, size = 1)
  +       results_seqtrie <- dist_matrix(query, target, mode = "levenshtein", cost_matrix = cost_matrix, gap_cost = gap_cost, nthreads=NTHREADS)
  +       results_biostrings <- biostrings_matrix_global(query, target, cost_matrix = cost_matrix, gap_cost = gap_cost)
  +       stopifnot(all(results_seqtrie == results_biostrings))
  + 
  +       # Check pairwise results
  +       query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET)
  +       results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "levenshtein", cost_matrix = cost_matrix, gap_cost = gap_cost, nthreads=NTHREADS)
  +       results_biostrings <- biostrings_pairwise_global(query_pairwise, target, cost_matrix = cost_matrix, gap_cost = gap_cost)
  +       stopifnot(all(results_seqtrie == results_biostrings))
  +     })
  + 
  +     print("Checking anchored search with linear gap for correctness")
  +     local({
  +       target <- c(random_strings(NSEQS, CHARSET),"") %>% unique
  +       query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET)))
  +       query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique
  + 
  +       # Check matrix results
  +       cost_matrix <- matrix(sample(1:3, size = nchar(CHARSET)^2, replace=TRUE), nrow=nchar(CHARSET))
  +       diag(cost_matrix) <- 0
  +       colnames(cost_matrix) <- rownames(cost_matrix) <- strsplit(CHARSET, "")[[1]]
  +       gap_cost <- sample(1:3, size = 1)
  +       results_seqtrie <- dist_matrix(query, target, mode = "anchored", cost_matrix = cost_matrix, gap_cost = gap_cost, nthreads=NTHREADS)
  +       query_size <- attr(results_seqtrie, "query_size")
  +       target_size <- attr(results_seqtrie, "target_size")
  +       results_biostrings <- biostrings_matrix_anchored(query, target, query_size, target_size, cost_matrix = cost_matrix, gap_cost = gap_cost)
  +       stopifnot(all(results_seqtrie == results_biostrings))
  + 
  +       # Check pairwise results
  +       query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET)
  +       results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "anchored", cost_matrix = cost_matrix, gap_cost = gap_cost, nthreads=NTHREADS)
  +       query_size <- attr(results_seqtrie, "query_size")
  +       target_size <- attr(results_seqtrie, "target_size")
  +       results_biostrings <- biostrings_pairwise_anchored(query_pairwise, target, query_size, target_size, cost_matrix = cost_matrix, gap_cost = gap_cost)
  +       stopifnot(all(results_seqtrie == results_biostrings))
  +     })
  + 
  + 
  +     print("Checking global search with affine gap for correctness")
  +     local({
  +       target <- c(random_strings(NSEQS, CHARSET),"") %>% unique
  +       query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET)))
  +       query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique
  + 
  +       # Check matrix results
  +       cost_matrix <- matrix(sample(1:3, size = nchar(CHARSET)^2, replace=TRUE), nrow=nchar(CHARSET))
  +       diag(cost_matrix) <- 0
  +       colnames(cost_matrix) <- rownames(cost_matrix) <- strsplit(CHARSET, "")[[1]]
  +       gap_cost <- sample(1:3, size = 1)
  +       gap_open_cost <- sample(1:3, size = 1)
  +       results_seqtrie <- dist_matrix(query, target, mode = "levenshtein", cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cos=gap_open_cost, nthreads=NTHREADS)
  +       results_biostrings <- biostrings_matrix_global(query, target, cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost)
  +       stopifnot(all(results_seqtrie == results_biostrings))
  + 
  +       # Check pairwise results
  +       query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET)
  +       results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "levenshtein", cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost, nthreads=NTHREADS)
  +       results_biostrings <- biostrings_pairwise_global(query_pairwise, target, cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost)
  +       stopifnot(all(results_seqtrie == results_biostrings))
  +     })
  + 
  +     print("Checking anchored search with affine gap for correctness")
  +     local({
  +       target <- c(random_strings(NSEQS, CHARSET),"") %>% unique
  +       query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET)))
  +       query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique
  + 
  +       # Check matrix results
  +       cost_matrix <- matrix(sample(1:3, size = nchar(CHARSET)^2, replace=TRUE), nrow=nchar(CHARSET))
  +       diag(cost_matrix) <- 0
  +       colnames(cost_matrix) <- rownames(cost_matrix) <- strsplit(CHARSET, "")[[1]]
  +       gap_cost <- sample(1:3, size = 1)
  +       gap_open_cost <- sample(1:3, size = 1)
  +       results_seqtrie <- dist_matrix(query, target, mode = "anchored", cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost, nthreads=NTHREADS)
  +       query_size <- attr(results_seqtrie, "query_size")
  +       target_size <- attr(results_seqtrie, "target_size")
  +       results_biostrings <- biostrings_matrix_anchored(query, target, query_size, target_size, cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost)
  +       stopifnot(all(results_seqtrie == results_biostrings))
  + 
  +       # Check pairwise results
  +       query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET)
  +       results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "anchored", cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost, nthreads=NTHREADS)
  +       query_size <- attr(results_seqtrie, "query_size")
  +       target_size <- attr(results_seqtrie, "target_size")
  +       results_biostrings <- biostrings_pairwise_anchored(query_pairwise, target, query_size, target_size, cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost)
  +       stopifnot(all(results_seqtrie == results_biostrings))
  +     })
  + }
  [1] "Checking hamming search correctness"
  [1] "Checking levenshtein search correctness"
  [1] "Checking anchored search correctness"
  [1] "Checking global search with linear gap for correctness"
  Error in h(simpleError(msg, call)) : 
    error in evaluating the argument 'args' in selecting a method for function 'do.call': Could not load package pwalign. Is it installed?
  
    Note that starting with BioC 3.19, calling pairwiseAlignment() requires
    the pwalign package. Please install it with:
  
      BiocManager::install("pwalign")
  Calls: <Anonymous> ... <Anonymous> -> .call_fun_in_pwalign -> .load_package_gracefully
  Execution halted
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes ... OK
* checking re-building of vignette outputs ... [15s/12s] OK
* checking PDF version of manual ... [10s/10s] OK
* checking HTML version of manual ... OK
* checking for non-standard things in the check directory ... OK
* checking for detritus in the temp directory ... OK
* DONE
Status: 1 ERROR, 1 NOTE