* using log directory ‘/data/gannet/ripley/R/packages/tests-clang/seqtrie.Rcheck’ * using R Under development (unstable) (2024-05-02 r86508) * using platform: x86_64-pc-linux-gnu * R was compiled by clang version 18.1.5 flang-new version 18.1.5 * running under: Fedora Linux 36 (Workstation Edition) * using session charset: UTF-8 * using option ‘--no-stop-on-test-error’ * checking for file ‘seqtrie/DESCRIPTION’ ... OK * this is package ‘seqtrie’ version ‘0.2.6’ * package encoding: UTF-8 * checking package namespace information ... OK * checking package dependencies ... OK * checking if this is a source package ... OK * checking if there is a namespace ... OK * checking for executable files ... OK * checking for hidden files and directories ... OK * checking for portable file names ... OK * checking for sufficient/correct file permissions ... OK * checking whether package ‘seqtrie’ can be installed ... [121s/177s] OK See 'https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/seqtrie-00install.html' for details. * used C++ compiler: ‘clang version 18.1.5’ * checking installed package size ... OK * checking package directory ... OK * checking ‘build’ directory ... OK * checking DESCRIPTION meta-information ... OK * checking top-level files ... OK * checking for left-over files ... OK * checking index information ... OK * checking package subdirectories ... OK * checking code files for non-ASCII characters ... OK * checking R files for syntax errors ... OK * checking whether the package can be loaded ... OK * checking whether the package can be loaded with stated dependencies ... OK * checking whether the package can be unloaded cleanly ... OK * checking whether the namespace can be loaded with stated dependencies ... OK * checking whether the namespace can be unloaded cleanly ... OK * checking loading without being on the library search path ... OK * checking whether startup messages can be suppressed ... OK * checking use of S3 registration ... OK * checking dependencies in R code ... OK * checking S3 generic/method consistency ... OK * checking replacement functions ... OK * checking foreign function calls ... OK * checking R code for possible problems ... [9s/11s] OK * checking Rd files ... OK * checking Rd metadata ... OK * checking Rd line widths ... OK * checking Rd cross-references ... OK * checking for missing documentation entries ... OK * checking for code/documentation mismatches ... OK * checking Rd \usage sections ... OK * checking Rd contents ... OK * checking for unstated dependencies in examples ... OK * checking contents of ‘data’ directory ... OK * checking data for non-ASCII characters ... OK * checking LazyData ... OK * checking data for ASCII and uncompressed saves ... OK * checking line endings in C/C++/Fortran sources/headers ... OK * checking line endings in Makefiles ... OK * checking compilation flags in Makevars ... OK * checking for GNU extensions in Makefiles ... NOTE GNU make is a SystemRequirements. * checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK * checking use of PKG_*FLAGS in Makefiles ... OK * checking use of SHLIB_OPENMP_*FLAGS in Makefiles ... OK * checking pragmas in C/C++ headers and code ... OK * checking compilation flags used ... OK * checking compiled code ... OK * checking installed files from ‘inst/doc’ ... OK * checking files in ‘vignettes’ ... OK * checking examples ... OK * checking for unstated dependencies in ‘tests’ ... OK * checking tests ... [16m/11m] ERROR Running ‘test_RadixForest.R’ [32s/45s] Running ‘test_RadixTree.R’ [753s/552s] Running ‘test_pairwise.R’ [158s/89s] Running the tests in ‘tests/test_pairwise.R’ failed. Complete output: > # This test file tests the `dist_matrix` and `dist_pairwise` functions > # These two functions are simple dynamic programming algorithms for computing pairwise distances and are themselves used to validate > # the RadixTree imeplementation (see test_radix_tree.R) > > library(seqtrie) > library(stringi) > library(stringdist) > library(Biostrings) Loading required package: BiocGenerics Attaching package: 'BiocGenerics' The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply, union, unique, unsplit, which.max, which.min Loading required package: S4Vectors Loading required package: stats4 Attaching package: 'S4Vectors' The following object is masked from 'package:utils': findMatches The following objects are masked from 'package:base': I, expand.grid, unname Loading required package: IRanges Loading required package: XVector Loading required package: GenomeInfoDb Attaching package: 'Biostrings' The following object is masked from 'package:base': strsplit > library(dplyr) Attaching package: 'dplyr' The following objects are masked from 'package:Biostrings': collapse, intersect, setdiff, setequal, union The following object is masked from 'package:GenomeInfoDb': intersect The following object is masked from 'package:XVector': slice The following objects are masked from 'package:IRanges': collapse, desc, intersect, setdiff, slice, union The following objects are masked from 'package:S4Vectors': first, intersect, rename, setdiff, setequal, union The following objects are masked from 'package:BiocGenerics': combine, intersect, setdiff, union The following objects are masked from 'package:stats': filter, lag The following objects are masked from 'package:base': intersect, setdiff, setequal, union > > # Use 2 threads on github actions and CRAN, 4 threads locally > IS_LOCAL <- Sys.getenv("IS_LOCAL") != "" > NTHREADS <- ifelse(IS_LOCAL, 4, 2) > NITER <- ifelse(IS_LOCAL, 4, 1) > NSEQS <- 10000 > MAXSEQLEN <- 200 > CHARSET <- "ACGT" > > random_strings <- function(N, charset = "abcdefghijklmnopqrstuvwxyz") { + charset_stri <- paste0("[", charset, "]") + len <- sample(0:MAXSEQLEN, N, replace=TRUE) + result <- lapply(0:MAXSEQLEN, function(x) { + nx <- sum(len == x) + if(nx == 0) return(character()) + stringi::stri_rand_strings(nx, x, pattern = charset_stri) + }) + sample(unlist(result)) + } > > mutate_strings <- function(x, prob = 0.025, indel_prob = 0.025, charset = "abcdefghijklmnopqrstuvwxyz") { + charset <- unlist(strsplit(charset, "")) + xsplit <- strsplit(x, "") + sapply(xsplit, function(a) { + r <- runif(length(a)) < prob + a[r] <- sample(charset, sum(r), replace=TRUE) + ins <- runif(length(a)) < indel_prob + a[ins] <- paste0(sample(charset, sum(ins), replace=TRUE), sample(charset, sum(ins), replace=TRUE)) + del <- runif(length(a)) < indel_prob + a[del] <- "" + paste0(a, collapse = "") + }) + } > > # Biostrings notes: > # subject (target) must be of length 1 or equal to pattern (query) > # To get a distance matrix, iterate over target and perform a column bind > # special_zero_case -- if both query and target are empty, Biostrings fails with an error > pairwiseAlignmentFix <- function(pattern, subject, ...) { + results <- rep(0, length(subject)) + special_zero_case <- nchar(pattern) == 0 & nchar(subject) == 0 + if(all(special_zero_case)) { + results + } else { + results[!special_zero_case] <- Biostrings::pairwiseAlignment(pattern=pattern[!special_zero_case], subject=subject[!special_zero_case], ...) + results + } + } > > biostrings_matrix_global <- function(query, target, cost_matrix, gap_cost, gap_open_cost = 0) { + substitutionMatrix <- -cost_matrix + lapply(query, function(x) { + query2 <- rep(x, length(target)) + -pairwiseAlignmentFix(pattern=query2, subject=target, substitutionMatrix = substitutionMatrix, gapOpening=gap_open_cost, gapExtension=gap_cost, scoreOnly=TRUE, type="global") + }) %>% do.call(rbind, .) + } > > biostrings_pairwise_global <- function(query, target, cost_matrix, gap_cost, gap_open_cost = 0) { + substitutionMatrix <- -cost_matrix + -pairwiseAlignment(pattern=query, subject=target, substitutionMatrix = substitutionMatrix,gapOpening=gap_open_cost, gapExtension=gap_cost, scoreOnly=TRUE, type="global") + } > > biostrings_matrix_anchored <- function(query, target, query_size, target_size, cost_matrix, gap_cost, gap_open_cost = 0) { + substitutionMatrix <- -cost_matrix + lapply(seq_along(query), function(i) { + query2 <- substring(query[i], 1, query_size[i,,drop=TRUE]) + target2 <- substring(target, 1, target_size[i,,drop=TRUE]) + -pairwiseAlignmentFix(pattern=query2, subject=target2, substitutionMatrix = substitutionMatrix, gapOpening=gap_open_cost, gapExtension=gap_cost, scoreOnly=TRUE, type="global") + }) %>% do.call(rbind, .) + } > > biostrings_pairwise_anchored <- function(query, target, query_size, target_size, cost_matrix, gap_cost, gap_open_cost = 0) { + substitutionMatrix <- -cost_matrix + query2 <- substring(query, 1, query_size) + target2 <- substring(target, 1, target_size) + -pairwiseAlignmentFix(pattern=query2, subject=target2, substitutionMatrix = substitutionMatrix, gapOpening=gap_open_cost, gapExtension=gap_cost, scoreOnly=TRUE, type="global") + } > > for(. in 1:NITER) { + + print("Checking hamming search correctness") + local({ + # Note: seqtrie returns `NA_integer_` for hamming distance when the lengths are different + # whereas stringdist returns `Inf` + # This is why we need to replace `NA_integer_` with `Inf` when comparing results + + target <- c(random_strings(NSEQS, CHARSET),"") %>% unique + query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET))) + query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique + + # Check matrix results + results_seqtrie <- dist_matrix(query, target, mode = "hamming", nthreads=NTHREADS) + results_seqtrie[is.na(results_seqtrie)] <- Inf + results_stringdist <- stringdist::stringdistmatrix(query, target, method = "hamming", nthread=NTHREADS) + stopifnot(all(results_seqtrie == results_stringdist)) + + # Check pairwise results + query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET) + results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "hamming", nthreads=NTHREADS) + results_seqtrie[is.na(results_seqtrie)] <- Inf + results_stringdist <- stringdist::stringdist(query_pairwise, target, method = "hamming", nthread=NTHREADS) + stopifnot(all(results_seqtrie == results_stringdist)) + }) + + print("Checking levenshtein search correctness") + local({ + target <- c(random_strings(NSEQS, CHARSET),"") %>% unique + query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET))) + query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique + + # Check matrix results + results_seqtrie <- dist_matrix(query, target, mode = "levenshtein", nthreads=NTHREADS) + results_stringdist <- stringdist::stringdistmatrix(query, target, method = "lv", nthread=NTHREADS) + stopifnot(all(results_seqtrie == results_stringdist)) + + # Check pairwise results + query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET) + results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "levenshtein", nthreads=NTHREADS) + results_stringdist <- stringdist::stringdist(query_pairwise, target, method = "lv", nthread=NTHREADS) + stopifnot(all(results_seqtrie == results_stringdist)) + }) + + print("Checking anchored search correctness") + local({ + # There is no anchored search in stringdist (or elsewhere). To get the same results, we substring the query and target sequences + # By the results of the seqtrie anchored search and then compare the results + + target <- c(random_strings(NSEQS, CHARSET),"") %>% unique + query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET))) + query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique + + # Check matrix results + results_seqtrie <- dist_matrix(query, target, mode = "anchored", nthreads=NTHREADS) + query_size <- attr(results_seqtrie, "query_size") + target_size <- attr(results_seqtrie, "target_size") + results_stringdist <- lapply(seq_along(query), function(i) { + query_size2 <- query_size[i,,drop=TRUE] + target_size2 <- target_size[i,,drop=TRUE] + query2 <- substring(query[i], 1, query_size2) # query[i] is recycled + target2 <- substring(target, 1, target_size2) + stringdist::stringdist(query2, target2, method = "lv", nthread=NTHREADS) + }) %>% do.call(rbind, .) + stopifnot(all(results_seqtrie == results_stringdist)) + + # Check pairwise results + query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET) + results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "anchored", nthreads=NTHREADS) + query_size <- attr(results_seqtrie, "query_size") + target_size <- attr(results_seqtrie, "target_size") + query2 <- substring(query_pairwise, 1, query_size) + target2 <- substring(target, 1, target_size) + results_stringdist <- stringdist::stringdist(query2, target2, method = "lv", nthread=NTHREADS) + stopifnot(all(results_seqtrie == results_stringdist)) + }) + + print("Checking global search with linear gap for correctness") + local({ + target <- c(random_strings(NSEQS, CHARSET),"") %>% unique + query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET))) + query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique + + # Check matrix results + cost_matrix <- matrix(sample(1:3, size = nchar(CHARSET)^2, replace=TRUE), nrow=nchar(CHARSET)) + diag(cost_matrix) <- 0 + colnames(cost_matrix) <- rownames(cost_matrix) <- strsplit(CHARSET, "")[[1]] + gap_cost <- sample(1:3, size = 1) + results_seqtrie <- dist_matrix(query, target, mode = "levenshtein", cost_matrix = cost_matrix, gap_cost = gap_cost, nthreads=NTHREADS) + results_biostrings <- biostrings_matrix_global(query, target, cost_matrix = cost_matrix, gap_cost = gap_cost) + stopifnot(all(results_seqtrie == results_biostrings)) + + # Check pairwise results + query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET) + results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "levenshtein", cost_matrix = cost_matrix, gap_cost = gap_cost, nthreads=NTHREADS) + results_biostrings <- biostrings_pairwise_global(query_pairwise, target, cost_matrix = cost_matrix, gap_cost = gap_cost) + stopifnot(all(results_seqtrie == results_biostrings)) + }) + + print("Checking anchored search with linear gap for correctness") + local({ + target <- c(random_strings(NSEQS, CHARSET),"") %>% unique + query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET))) + query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique + + # Check matrix results + cost_matrix <- matrix(sample(1:3, size = nchar(CHARSET)^2, replace=TRUE), nrow=nchar(CHARSET)) + diag(cost_matrix) <- 0 + colnames(cost_matrix) <- rownames(cost_matrix) <- strsplit(CHARSET, "")[[1]] + gap_cost <- sample(1:3, size = 1) + results_seqtrie <- dist_matrix(query, target, mode = "anchored", cost_matrix = cost_matrix, gap_cost = gap_cost, nthreads=NTHREADS) + query_size <- attr(results_seqtrie, "query_size") + target_size <- attr(results_seqtrie, "target_size") + results_biostrings <- biostrings_matrix_anchored(query, target, query_size, target_size, cost_matrix = cost_matrix, gap_cost = gap_cost) + stopifnot(all(results_seqtrie == results_biostrings)) + + # Check pairwise results + query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET) + results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "anchored", cost_matrix = cost_matrix, gap_cost = gap_cost, nthreads=NTHREADS) + query_size <- attr(results_seqtrie, "query_size") + target_size <- attr(results_seqtrie, "target_size") + results_biostrings <- biostrings_pairwise_anchored(query_pairwise, target, query_size, target_size, cost_matrix = cost_matrix, gap_cost = gap_cost) + stopifnot(all(results_seqtrie == results_biostrings)) + }) + + + print("Checking global search with affine gap for correctness") + local({ + target <- c(random_strings(NSEQS, CHARSET),"") %>% unique + query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET))) + query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique + + # Check matrix results + cost_matrix <- matrix(sample(1:3, size = nchar(CHARSET)^2, replace=TRUE), nrow=nchar(CHARSET)) + diag(cost_matrix) <- 0 + colnames(cost_matrix) <- rownames(cost_matrix) <- strsplit(CHARSET, "")[[1]] + gap_cost <- sample(1:3, size = 1) + gap_open_cost <- sample(1:3, size = 1) + results_seqtrie <- dist_matrix(query, target, mode = "levenshtein", cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cos=gap_open_cost, nthreads=NTHREADS) + results_biostrings <- biostrings_matrix_global(query, target, cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost) + stopifnot(all(results_seqtrie == results_biostrings)) + + # Check pairwise results + query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET) + results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "levenshtein", cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost, nthreads=NTHREADS) + results_biostrings <- biostrings_pairwise_global(query_pairwise, target, cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost) + stopifnot(all(results_seqtrie == results_biostrings)) + }) + + print("Checking anchored search with affine gap for correctness") + local({ + target <- c(random_strings(NSEQS, CHARSET),"") %>% unique + query <- sample(c(sample(target, NSEQS/1000), random_strings(NSEQS/1000, CHARSET))) + query <- c(mutate_strings(query, indel_prob=0, charset = CHARSET), "") %>% unique + + # Check matrix results + cost_matrix <- matrix(sample(1:3, size = nchar(CHARSET)^2, replace=TRUE), nrow=nchar(CHARSET)) + diag(cost_matrix) <- 0 + colnames(cost_matrix) <- rownames(cost_matrix) <- strsplit(CHARSET, "")[[1]] + gap_cost <- sample(1:3, size = 1) + gap_open_cost <- sample(1:3, size = 1) + results_seqtrie <- dist_matrix(query, target, mode = "anchored", cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost, nthreads=NTHREADS) + query_size <- attr(results_seqtrie, "query_size") + target_size <- attr(results_seqtrie, "target_size") + results_biostrings <- biostrings_matrix_anchored(query, target, query_size, target_size, cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost) + stopifnot(all(results_seqtrie == results_biostrings)) + + # Check pairwise results + query_pairwise <- mutate_strings(target, prob=0.025, indel_prob=0.05, charset = CHARSET) + results_seqtrie <- dist_pairwise(query_pairwise, target, mode = "anchored", cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost, nthreads=NTHREADS) + query_size <- attr(results_seqtrie, "query_size") + target_size <- attr(results_seqtrie, "target_size") + results_biostrings <- biostrings_pairwise_anchored(query_pairwise, target, query_size, target_size, cost_matrix = cost_matrix, gap_cost = gap_cost, gap_open_cost=gap_open_cost) + stopifnot(all(results_seqtrie == results_biostrings)) + }) + } [1] "Checking hamming search correctness" [1] "Checking levenshtein search correctness" [1] "Checking anchored search correctness" [1] "Checking global search with linear gap for correctness" Error in h(simpleError(msg, call)) : error in evaluating the argument 'args' in selecting a method for function 'do.call': Could not load package pwalign. Is it installed? Note that starting with BioC 3.19, calling pairwiseAlignment() requires the pwalign package. Please install it with: BiocManager::install("pwalign") Calls: ... -> .call_fun_in_pwalign -> .load_package_gracefully Execution halted * checking for unstated dependencies in vignettes ... OK * checking package vignettes ... OK * checking re-building of vignette outputs ... [15s/12s] OK * checking PDF version of manual ... [10s/10s] OK * checking HTML version of manual ... OK * checking for non-standard things in the check directory ... OK * checking for detritus in the temp directory ... OK * DONE Status: 1 ERROR, 1 NOTE