Rfast reference manual

Μιχαήλ Τσαγρής; Ioannis  Tsamardinos

A collection of fast and very fast R functions written in R or C++.

Package ‘Rfast’

March 10, 2018

Type: Package
Title: A Collection of Efficient and Extremely Fast R Functions
Version: 1.8.8
Date: 2018-03-10
Author: Manos Papadakis, Michail Tsagris, Marios Dimitriadis, Stefanos Fafalios, Ioannis Tsamardinos, Matteo Fasiolo, Giorgos Borboudakis, John Burkardt, Changliang Zou and Kleanthi Lakiotaki
Maintainer: Manos Papadakis <papadakm95@gmail.com>
Depends: R (>= 3.2.2), Rcpp (>= 0.12.3), RcppZiggurat
LinkingTo: Rcpp (>= 0.12.3), RcppArmadillo
SystemRequirements: C++11
URL: https://rfast.eu
Description: A collection of fast (utility) functions for data analysis. Column- and row-wise means, medians, variances, minimums, maximums, many t, F and G-square tests, and many regressions (normal, logistic, Poisson) are some of the many fast functions.
License: GPL (>= 2.0)
LazyData: TRUE
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2018-03-10 22:49:03 UTC

R topics documented:

    Rfast-package
    All k possible combinations from n elements
    Analysis of covariance
    Analysis of variance with a count variable
    Angular central Gaussian random values simulation
    ANOVA for two quasi Poisson regression models
    Backward selection regression
    BIC (using partial correlation) forward regression
    BIC forward regression with generalised linear models
    Binary search algorithm
    Binomial coefficient and its logarithm
    Bootstrap t-test for 2 independent samples
    Check if any column or row is filled with zeros
    Check if values are integers and convert to integer
    Check Namespace and Rd files
    Check whether a square matrix is symmetric
    Cholesky decomposition of a square matrix
    Circular or angular regression
    Circular-linear correlation
    Column and row wise coefficients of variation
    Column and row-wise Any/All
    Column and row-wise means of a matrix
    Column and row-wise medians
    Column and row-wise nth smallest value of a matrix/vector
    Column and row-wise Order - Sort Indices
    Column and row-wise products
    Column and row-wise range of values of a matrix
    Column and row-wise ranks
    Column and row-wise Shuffle
    Column and row-wise sums of a matrix
    Column and row-wise tabulate
    Column and row-wise variances and standard deviations
    Column and rows-wise mean absolute deviations
    Column-row wise minima and maxima of two matrices
    Column-wise differences
    Column-wise kurtosis and skewness coefficients
    Column-wise matching coefficients
    Column-wise minimum and maximum
    Column-wise MLE of some univariate distributions
    Column-wise true/false value
    Column-wise uniformity Watson test for circular data
    Column-wise Yule’s Y (coefficient of colligation)
    Correlation based forward regression
    Correlation between pairs of variables
    Correlations
    Covariance and correlation matrix
    Cox confidence interval for the ratio of two Poisson variables
    Cross-Validation for the k-NN algorithm
    data.frame.to_matrix
    Density of the multivariate normal and t distributions
    Design Matrix
    Diagonal Matrix
    Distance between vectors and a matrix
    Distance correlation
    Distance matrix
    Distance variance and covariance
    Eigenvalues and eigenvectors in high dimensional principal component analysis
    Energy distance between matrices
    Equality of objects
    Estimation of an AR(1) model
    Estimation of the Box-Cox transformation
    Exponential empirical likelihood for a one sample mean vector hypothesis testing
    Exponential empirical likelihood hypothesis testing for two mean vectors
    FBED variable selection method using the correlation
    Find element
    Find the given value in a hash table
    Fitted probabilities of the Terry-Bradley model
    Fitting a Dirichlet distribution via Newton-Raphson
    Floyd-Warshall algorithm
    Forward selection with generalised linear regression models
    G-square test of conditional independence
    Gaussian regression with a log-link
    Generates random values from a normal and puts them in a matrix
    Get specific columns/rows of a matrix
    Hash - Pair function
    Hash object to a list object
    High dimensional MCD based detection of outliers
    Hypothesis test for the distance correlation
    Hypothesis test for two means of percentages
    Hypothesis test for von Mises-Fisher distribution over Kent distribution
    Hypothesis testing between two skewness or kurtosis coefficients
    Index of the columns of a data.frame which are factor variables
    Insert new function names in the NAMESPACE file
    Inverse of a symmetric positive definite matrix
    James multivariate version of the t-test
    k nearest neighbours algorithm (k-NN)
    k-NN algorithm using the arc cosinus distance
    Linear models for large scale data
    Logistic and Poisson regression models
    Logistic or Poisson regression with a single categorical predictor
    Lower and Upper triangular of a matrix
    Mahalanobis distance
    Many (and one) area under the curve values
    Many 2 sample proportions tests
    Many 2 sample tests
    Many analysis of variance tests with a discrete variable
    Many ANCOVAs
    Many exponential regressions
    Many F-tests with really huge matrices
    Many G-square tests of independence
    Many Gini coefficients
    Many hypothesis tests for two means of percentages
    Many moment and maximum likelihood estimations of variance components
    Many multi-sample tests
    Many multivariate simple linear regressions coefficients
    Many non parametric multi-sample tests
    Many odds ratio tests
    Many one sample goodness of fit tests for categorical data
    Many one sample tests
    Many random intercepts LMMs for balanced data with a single identical covariate
    Many regression based tests for single sample repeated measures
    Many score based GLM regressions
    Many score based regression models
    Many Shapiro-Francia normality tests
    Many simple circular or angular regressions
    Many simple Gaussian regressions with a log-link
    Many simple geometric regressions
    Many simple linear mixed model regressions
    Many simple linear regressions coefficients
    Many simple multinomial regressions
    Many tests for the dispersion parameter in Poisson distribution
    Many two-way ANOVAs
    Many univariate generalised linear models
    Many univariate simple binary logistic regressions
    Many univariate simple linear regressions
    Many univariate simple poisson regressions
    Many univariate simple quasi poisson regressions
    Many Welch’s F-tests
    Match
    Matrix with all pairs of t-tests
    Matrix with G-square tests of independence
    Mean - Median absolute deviation of a vector
    Median of a vector
    Minima and maxima of two vectors/matrices
    minimum and maximum
    minimum and maximum frequencies
    MLE for multivariate discrete data
    MLE of (hyper-)spherical distributions
    MLE of continuous univariate distributions defined on the positive line
    MLE of continuous univariate distributions defined on the real line
    MLE of count data (univariate discrete distributions)
    MLE of distributions defined in the (0, 1) interval
    MLE of some circular distributions
    MLE of the inverted Dirichlet distribution
    MLE of the multivariate normal distribution
    MLE of the ordinal model without covariates
    MLE of the tobit model
    Moment and maximum likelihood estimation of variance components
    Multi-sample tests for vectors
    Multinomial regression
    Multivariate kurtosis
    Multivariate Laplace random values simulation
    Multivariate normal and t random values simulation
    Naive Bayes classifiers
    Natural Logarithm each element of a matrix
    Natural logarithm of the beta function
    Natural logarithm of the gamma function and its derivatives
    Norm of a matrix
    Number of equal columns between two matrices
    Odds ratio and relative risk
    One sample empirical and exponential empirical likelihood test
    One sample t-test for a vector
    Operations between two matrices or matrix and vector
    Orthogonal matching pursuit regression
    Permutation
    Permutation based p-value for the Pearson correlation coefficient
    Prediction with some naive Bayes classifiers
    Quasi binomial regression for proportions
    Quasi Poisson regression for count data
    Random intercepts linear mixed models
    Random values simulation from a von Mises distribution
    Ranks of the values of a vector
    Reading the files of a directory
    Repeated measures anova
    Replicate columns/rows
    Round each element of a matrix/vector
    Row - Wise matrix/vector count the frequency of a value
    Row-wise minimum and maximum
    Row-wise true value
    Search for variables with zero range in a matrix
    Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression
    Simulation of random values from a Bingham distribution
    Simulation of random values from a Bingham distribution with any symmetric matrix
    Simulation of random values from a normal distribution
    Simulation of random values from a von Mises-Fisher distribution
    Skeleton of the PC algorithm
    Skewness and kurtosis coefficients
    Some summary statistics of a vector for each level of a grouping variable
    Sort - Sort a vector corresponding to another
    Sort and unique numbers
    Sorting of the columns-rows of a matrix
    Source many R files
    Spatial median for Euclidean data
    Spherical and hyperspherical median
    Standardisation
    Sub-matrix
    Sum of a matrix
    Sum of all pairwise distances in a distance matrix
    Sums of a vector for each level of a grouping variable
    Table Creation - Frequency of each value
    Tests for the dispersion parameter in Poisson distribution
    Topological sort of a DAG
    Transpose of a matrix
    Two sample exponential empirical likelihood test
    Uniformity test for circular data
    Variance of a vector
    Vector allocation in a symmetric matrix
    Weibull regression model
    Yule’s Y (coefficient of colligation)
    Index


Rfast-package    Really fast R functions

Description

    A collection of Rfast functions for data analysis.

    Note 1: The vast majority of the functions accept matrices only, not data.frames.

    Note 2: Do not supply matrices or vectors with missing data (i.e. NAs). We do not check for them, and C++ internally transforms them into zeros (0), so you may get wrong results.

    Note 3: In general, make sure you give the correct input in order to get the correct output. We do no checks, and this is one of the many reasons we are fast.
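
Since the notes above say that most functions expect matrices and do not check for NAs, it may help to validate input before passing it on. The following is only a sketch in base R; the helper name `as_clean_matrix` is hypothetical and is not part of Rfast.

```r
# Hypothetical helper (not part of Rfast): coerce input to a numeric matrix
# and refuse missing values, following Notes 1 and 2 above.
as_clean_matrix <- function(x) {
  # data.matrix() turns factor columns into their integer codes
  m <- if (is.data.frame(x)) data.matrix(x) else as.matrix(x)
  if (!is.numeric(m)) stop("input must be numeric")
  if (anyNA(m)) stop("input contains NAs; remove or impute them first")
  m
}

m <- as_clean_matrix(iris[, 1:4])   # data.frame converted to a numeric matrix
dim(m)

bad <- iris[, 1:4]
bad[1, 1] <- NA
# The NA is caught here instead of being silently treated as zero later
res <- tryCatch(as_clean_matrix(bad), error = function(e) conditionMessage(e))
res
```

Failing early like this keeps the speed of the unchecked functions while avoiding the silent wrong results that Note 2 warns about.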

Details

    Package:  Rfast
    Type:     Package
    Version:  1.8.8
    Date:     2018-03-10
    License:  GPL-2

Maintainers

    Manos Papadakis <papadakm95@gmail.com>

Note

    Acknowledgments: We would like to acknowledge Professor Kurt Hornik, Doctor Uwe Ligges (and the rest of the R core team) for their invaluable help with this R package. Erich Studerus for his invaluable comments and Neal Fultz for his suggestions. Vassilis Vasdekis for his invaluable help with the random effects models. Marios Dimitriadis' work was funded by the Special Account for Research Funds of the University of Crete, Department of Computer Science. Phillip Si is greatly acknowledged for his help with the Floyd-Warshall algorithm. Keefe Murphy for his invaluable help with the NEWS file and for his suggestions. Zacharias Papadovassilakis gave us the inspiration for the memory efficient version of the k-NN algorithm. Yannis Pantazis explained to us how orthogonal matching pursuit works. Achim Zeileis for his help with the quasi Poisson regression models. Pedro J. Aphalo for finding a bug. Dimitris Yannikis for finding a bug. Acknowledgements to Christina Chatzipantsiou for her idea with the "permcor" and the "boot.ttest2" functions.

Author(s)

    Manos Papadakis <papadakm95@gmail.com>, Michail Tsagris <mtsagris@yahoo.gr>, Marios Dimitriadis <kmdimitriadis@gmail.com>, Stefanos Fafalios <stefanosfafalios@gmail.com>, Ioannis Tsamardinos <tsamard@csd.uoc.gr>, Matteo Fasiolo <matteo.fasiolo@gmail.com>, Giorgos Borboudakis <borbudak@gmail.com>, John Burkardt <jburkardt@fsu.edu> and Kleanthi Lakiotaki <kliolak@gmail.com>


All k possible combinations from n elements

Description

    All k possible combinations from n elements.

Usage

    comb_n(n, k)

Arguments

    n    A positive or negative integer number or a vector with numbers.
    k    A positive integer number at most equal to n or at most equal to the length of n, if n is a vector.

Value

    A matrix with k columns and rows equal to the number of possible unique combinations of n with k elements.

Author(s)

    Manos Papadakis and Marios Dimitriadis

    R implementation and documentation: Manos Papadakis <papadakm95@gmail.com> and Marios Dimitriadis <kmdimitriadis@gmail.com>.

References

    Nijenhuis A. and Wilf H.S. (1978). Combinatorial Algorithms for Computers and Calculators. Academic Press, NY.

See Also

    nth, colMaxs, colMins, colrange

Examples

    system.time( comb_n(20, 4) )
    system.time( combn(20, 4) )
    x <- rnorm(5)
    comb_n(x, 3)


Analysis of covariance

Description

    Analysis of covariance.

Usage

    ancova1(y, ina, x, logged = FALSE)

Arguments

    y       A numerical vector with the data, the response variable.
    ina     A numerical vector with 1s, 2s, 3s and so on, indicating the groups. Be careful, the function is designed to accept numbers greater than zero.
    x       A numerical vector whose length is equal to the length of y. This is the covariate.
    logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details

    Analysis of covariance is performed. No interaction between the factor and the covariate is tested, only the main effects. The design need not be balanced, that is, the values of ina need not have the same frequency. The sums of squares have been adjusted to accept balanced and unbalanced designs.

Value

    A matrix with the test statistic and the p-value for the factor variable and the covariate.

Author(s)

    Michail Tsagris

    R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

    D.C. Montgomery (2001). Design and analysis of experiments (5th Edition).
    New York: John Wiley & Sons.

See Also

    ancovas, ftests, ttests, anova1

Examples

    y <- rnorm(90)
    ina <- rbinom(90, 2, 0.5) + 1
    x <- rnorm(90)
    system.time( a <- ancova1(y, ina, x) )
    m1 <- lm(y ~ factor(ina) + x)
    m2 <- lm(y ~ x + factor(ina))
    anova(m1)
    anova(m2)


Analysis of variance with a count variable

Description

    Analysis of variance with a count variable.

Usage

    poisson.anova(y, ina, logged = FALSE)
    geom.anova(y, ina, type = 1, logged = FALSE)
    quasipoisson.anova(y, ina, logged = FALSE)

Arguments

    y       A numerical vector with discrete valued data, i.e. counts.
    ina     A numerical vector with discrete numbers starting from 1, i.e. 1, 2, 3, 4,... or a factor variable. This is supposed to be a categorical predictor. If you supply a continuous valued vector, the function will provide wrong results.
    type    This argument is for the geometric distribution. Type 1 refers to the case where the minimum is zero and type 2 to the case where the minimum is 1.
    logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details

    This is the analysis of variance with Poisson or geometric distributed data. What we do is a log-likelihood ratio test. This is exactly the same as Poisson regression with a single predictor variable which happens to be categorical, but the function is faster than the glm command in R. For the same purpose with a Bernoulli variable use g2Test. The quasipoisson.anova corresponds to specifying family = quasipoisson in the glm function. This is suitable for the case of over- or under-dispersed data.

Value

    A vector with two values, the difference in the deviances (or the scaled difference in the case of quasi Poisson) and the relevant p-value. The quasipoisson.anova also returns the estimate of the φ parameter.
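
The log-likelihood ratio test described in the Details can be reproduced by hand in base R. The sketch below is not the Rfast code; it only assumes the standard Poisson likelihood, under which the difference in deviances reduces to 2 * sum(y * log(group mean / overall mean)), and cross-checks the result against glm with a categorical predictor.

```r
set.seed(1)
y   <- rpois(300, 10)
ina <- rbinom(300, 3, 0.5) + 1            # grouping variable with values 1..4

mu0 <- mean(y)                             # MLE under the null: one common mean
mug <- ave(as.numeric(y), ina)             # MLE under the alternative: group means
stat <- 2 * sum(y * log(mug / mu0))        # difference in the deviances
dof  <- length(unique(ina)) - 1
pval <- pchisq(stat, dof, lower.tail = FALSE)

# Cross-check: Poisson regression with ina treated as a factor
fit <- glm(y ~ factor(ina), family = poisson)
c(by_hand = stat, glm = fit$null.deviance - fit$deviance, pvalue = pval)
```

The closed form avoids the iterative fitting that glm performs, which is one plausible reason a dedicated function can be faster.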

Author(s)

    Michail Tsagris

    R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

    logistic.cat1, g2Test, poisson.anovas, anova, poisson_only, poisson.mle

Examples

    y <- rpois(300, 10)
    ina <- rbinom(300, 3, 0.5) + 1
    a1 <- poisson.anova(y, ina)
    ## ina must enter glm as a factor so that both approaches test the same model
    a2 <- glm(y ~ factor(ina), poisson)
    a1
    anova(a2, test = "Chisq")
    y <- rgeom(300, 0.7)
    geom.anova(y, ina)


Angular central Gaussian random values simulation

Description

    Angular central Gaussian random values simulation.

Usage

    racg(n, sigma)

Arguments

    n      The sample size, a numerical value.
    sigma  The covariance matrix in R^d.

Details

    The algorithm uses univariate normal random values and transforms them to multivariate via a spectral decomposition. The vectors are then scaled to have unit length.

Value

    A matrix with the simulated data.

Author(s)

    Michail Tsagris

    R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References

    Tyler D. E. (1987). Statistical analysis for the angular central Gaussian distribution on the sphere. Biometrika 74(3): 579-589.

See Also

    acg.mle, rmvnorm, rmvlaplace, rmvt

Examples

    s <- cov( iris[, 1:4] )
    x <- racg(100, s)
    acg.mle(x)
    vmf.mle(x)  ## the concentration parameter, kappa, is very low, close to zero, as expected.


ANOVA for two quasi Poisson regression models

Description

    ANOVA for two quasi Poisson regression models.

Usage

    anova_quasipois.reg(mod0, mod1, n)

Arguments

    mod0  An object as returned by the "qpois.reg" function. This is the null model.
    mod1  An object as returned by the "qpois.reg" function. This is the alternative model.
    n     The sample size. This is necessary to calculate the degrees of freedom.

Details

    This is an ANOVA-type significance test for two nested quasi Poisson models.
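
As an illustration of an ANOVA-type test for two nested quasi Poisson models, the sketch below computes an F statistic by hand from base R glm fits, using the dispersion estimate of the larger model. This mirrors what base R's anova(..., test = "F") does; whether anova_quasipois.reg computes exactly the same quantity is an assumption.

```r
set.seed(1)
y <- rnbinom(200, 10, 0.5)
x <- matrix(rnorm(200 * 3), ncol = 3)

b0 <- glm(y ~ x[, 1], family = quasipoisson)   # null model
b1 <- glm(y ~ x,      family = quasipoisson)   # alternative model

phi  <- summary(b1)$dispersion                 # dispersion of the larger model
dfn  <- b0$df.residual - b1$df.residual        # numerator degrees of freedom
dfd  <- b1$df.residual                         # denominator degrees of freedom
stat <- (b0$deviance - b1$deviance) / dfn / phi
pval <- pf(stat, dfn, dfd, lower.tail = FALSE)

c(stat = stat, pvalue = pval, dfn = dfn, dfd = dfd)
tab <- anova(b0, b1, test = "F")               # base R reference for comparison
tab
```

The denominator degrees of freedom depend on the sample size, which is consistent with n being a required argument of the function documented here.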

Value

    A vector with 4 elements: the test statistic value, its associated p-value and the relevant degrees of freedom of the numerator and the denominator.

Author(s)

    Michail Tsagris

    R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

    Papke L. E. & Wooldridge J. (1996). Econometric methods for fractional response variables with an application to 401(K) plan participation rates. Journal of Applied Econometrics, 11(6): 619–632.

    McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also

    anova_qpois.reg, qpois.reg, univglms, quasipoisson.anova

Examples

    y <- rnbinom(200, 10, 0.5)
    x <- matrix(rnorm(200 * 3), ncol = 3)
    a1 <- qpois.reg(x, y)
    a0 <- qpois.reg(x[, 1], y)
    anova_quasipois.reg(a0, a1, 200)
    b1 <- glm(y ~ x, family = quasipoisson)
    b0 <- glm(y ~ x[, 1], family = quasipoisson)
    anova(b0, b1, test = "F")
    c1 <- glm(y ~ x, family = poisson)
    c0 <- glm(y ~ x[, 1], family = poisson)
    anova(c0, c1, test = "Chisq")


Backward selection regression

Description

    Backward selection regression.

Usage

    bs.reg(y, x, alpha = 0.05, type = "logistic")

Arguments

    y      A numerical vector with the response variable values. It can either be of 0 and 1 values (logistic regression) or of integer values 0, 1, 2,... (Poisson regression).
    x      A numerical matrix with the candidate variables.
    alpha  Threshold (suitable values are in [0,1]) for assessing the significance of p-values. The default value is 0.05.
    type   For logistic regression put "logistic" (default value) and for Poisson regression put "poisson".

Details

    This function currently implements only the binary logistic and Poisson regressions. If the sample size is less than the number of variables, a notification message will appear and no backward regression will be performed.
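
For readers unfamiliar with backward selection, a minimal sketch in base R follows. It is not the Rfast implementation: the helper name `backward_select` is hypothetical, and the use of likelihood-ratio tests from drop1() as the removal criterion is an assumption. It also does not reproduce the sample-size check mentioned above.

```r
# Hypothetical helper (not the Rfast code): backward elimination that
# repeatedly drops the least significant variable until all p-values
# are at most alpha.
backward_select <- function(y, x, alpha = 0.05, family = binomial()) {
  dat  <- data.frame(y = y, x)             # unnamed matrix columns become X1, X2, ...
  vars <- setdiff(names(dat), "y")
  while (length(vars) > 0) {
    fit <- glm(reformulate(vars, response = "y"), data = dat, family = family)
    tab <- drop1(fit, test = "Chisq")      # first row is "<none>"
    pv  <- tab[["Pr(>Chi)"]][-1]
    if (max(pv) <= alpha) break            # every remaining variable is significant
    vars <- setdiff(vars, rownames(tab)[-1][which.max(pv)])
  }
  vars
}

set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)
y <- rbinom(200, 1, plogis(2 * x[, 1]))    # only the first column matters
sel <- backward_select(y, x)
sel
```

Passing family = poisson() would give the Poisson counterpart of the same loop.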

Value

    The output of the algorithm is an S3 object including:

    info  A matrix with the non selected variables and their latest test statistics and p-values.
    Vars  A vector with the selected variables.

Author(s)

    Marios Dimitriadis

    R implementation and documentation: Marios Dimitriadis <mtsagris@csd.uoc.gr>

See Also

    fs.reg, univglms, cor.fsreg

Examples

    y <- rbinom(50, 1, 0.5)
    x <- matrnorm(50, 10)
    bs.reg(y, x)


BIC (using partial correlation) forward regression

Description

    BIC (using partial correlation) forward regression.

Usage

    bic.corfsreg(y, x, tol = 2)

Arguments

    y    A numerical vector.
    x    A matrix with data, the predictor variables.
    tol  If the BIC difference between two successive models is less than the tolerance value, the variable will not enter the model.

Details

    The forward regression tries the variables one by one using the F-test, basically a partial F-test every time for the latest variable. This is the same as testing the significance of the coefficient of this latest entered variable. Alternatively, the correlation can be used, in this case the partial correlation coefficient. There is a direct relationship between the t-test statistic and the partial correlation coefficient. So, instead of having to calculate the test statistic, we calculate the partial correlation coefficient. The largest partial correlation indicates the candidate variable to enter the model. If including that variable reduces the BIC by at least "tol" relative to the previous model without it, the variable enters.

Value

    A matrix with two columns, the index of the selected variable(s) and the BIC of each model. The first line is always 0 and the BIC of the model with no predictor variables.

Author(s)

    Michail Tsagris

    R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.
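
The direct relationship between the t-test statistic and the partial correlation coefficient mentioned in the Details is the standard identity t = r * sqrt(df / (1 - r^2)), where df is the residual degrees of freedom of the larger model. A small base R check with simulated data (all variable names below are illustrative):

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

# Partial correlation of y and x2 given x1, computed from residuals
r <- cor(resid(lm(y ~ x1)), resid(lm(x2 ~ x1)))

# t statistic of x2 in the model that already contains x1
tstat <- summary(lm(y ~ x1 + x2))$coefficients["x2", "t value"]

dof <- n - 3                        # residual degrees of freedom of the larger model
from_r <- r * sqrt(dof / (1 - r^2))
c(from_r = from_r, from_lm = tstat)
```

Because the statistic is a monotone function of |r|, ranking candidates by partial correlation selects the same variable as ranking them by the t-test.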
References

Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.

See Also

cor.fsreg, score.glms, univglms, logistic_only, poisson_only, regression

Examples

## 200 variables, hence 200 univariate regressions are to be fitted
x <- matrix( rnorm(200 * 200), ncol = 200 )
y <- rnorm(200)
system.time( a1 <- bic.corfsreg(y, x) )
system.time( a2 <- cor.fsreg(y, x) )
x <- NULL

BIC forward regression with generalised linear models

Description

BIC forward regression with generalised linear models.

Usage

bic.fs.reg(y, x, tol = 2, type = "logistic")

Arguments

y    A numerical vector.
x    A matrix with data, the predictor variables.
tol    If the BIC difference between two successive models is less than the tolerance value, the variable will not enter the model.
type    If you have a binary dependent variable, put "logistic". If you have count data, put "poisson".

Details

The forward regression tries the variables one by one, using the BIC at each step for the latest variable. If including that variable reduces the BIC of the regression model by more than "tol" relative to the previous model without this variable, the variable enters.

Value

A matrix with two columns, the index of the selected variable(s) and the BIC of each model.

Author(s)

Marios Dimitriadis

R implementation and documentation: Marios Dimitriadis <kmdimitriadis@gmail.com>.

References

Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.

See Also

fs.reg, bic.corfsreg, cor.fsreg, score.glms, univglms, logistic_only, poisson_only, regression

Examples

## 200 variables, hence 200 univariate regressions are to be fitted
x <- matrix(rnorm(200 * 200), ncol = 200)
y <- rbinom(200, 1, 0.5)
a <- bic.fs.reg(y, x)
x <- NULL

Binary search algorithm

Description

Search a value in an ordered vector.
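For readers unfamiliar with the technique, a binary search over a sorted vector can be sketched in base R as follows. The helper name binary_search_sketch is hypothetical, and returning 0 for a missing value when index = TRUE is a choice made for this sketch only; the Rfast function is written in C++ and may behave differently in such edge cases.

```r
# Hypothetical base-R sketch; assumes x is sorted in ascending order.
binary_search_sketch <- function(x, v, index = FALSE) {
  lo <- 1L
  hi <- length(x)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L            # middle of the current search interval
    if (x[mid] == v) {
      return(if (index) mid else TRUE)
    } else if (x[mid] < v) {
      lo <- mid + 1L                   # v can only be in the upper half
    } else {
      hi <- mid - 1L                   # v can only be in the lower half
    }
  }
  # Not found (the 0 index is an assumption of this sketch)
  if (index) 0L else FALSE
}

set.seed(1)
x <- sort(rnorm(1000))
found <- binary_search_sketch(x, x[50])
pos <- binary_search_sketch(x, x[50], index = TRUE)
absent <- binary_search_sketch(x, max(x) + 1)
```

Each iteration halves the interval, so at most about log2(length(x)) comparisons are needed.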
Usage

binary_search(x, v, index=FALSE)

Arguments

x    A vector with the data.
v    A value to check whether it exists in the vector x.
index    A boolean value indicating whether to return the position inside the vector.

Details

The function is written in C++ in order to be as fast as possible.

Value

Searches whether v exists in x and returns TRUE if the value is found, FALSE otherwise.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

is_element

Examples

x <- sort(rnorm(1000))
v <- x[50]
b <- binary_search(x,v)
b1 <- binary_search(x,v,TRUE)

Binomial coefficient and its logarithm

Description

Binomial coefficient and its logarithm.

Usage

Lchoose(x, k)
Choose(x, k)

Arguments

x    A vector with integer values.
k    A positive, non-zero number, at most equal to x.

Details

The binomial coefficient or its logarithm is evaluated.

Value

A vector with the answers.

Author(s)

Manos Papadakis

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

See Also

comb_n, Lbeta, Lgamma

Examples

x <- sample(20:30, 100, replace = TRUE)
Choose(x, 4)
Lchoose(x, 4)
x<-NULL

Bootstrap t-test for 2 independent samples

Description

Bootstrap t-test for 2 independent samples.

Usage

boot.ttest2(x, y, B = 999)

Arguments

x    A numerical vector with the data.
y    A numerical vector with the data.
B    The number of bootstrap samples to use.

Details

Instead of sampling B times from each sample, we sample sqrt(B) times from each of them and then take all pairs. The bootstrap samples are independent of each other, hence there is no violation of the theory.

Value

A vector with the test statistic and the bootstrap p-value.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>.

References

B.L. Welch (1951).
On the comparison of several mean values: an alternative approach. Biometrika, 38(3/4), 330-336.

Efron Bradley and Robert J. Tibshirani (1993). An introduction to the bootstrap. New York: Chapman & Hall/CRC.

See Also

ttest2, ftest

Examples

tic <- proc.time()
x <- rexp(40, 4)
y <- rbeta(50, 2.5, 7.5)
system.time( a <- boot.ttest2(x, y, 9999) )
a

Check if any column or row is filled with zeros

Description

Check if any column or row is filled with zeros.

Usage

colrow.zero(x)

Arguments

x    A matrix with data.

Details

All the columns are checked for one whose elements are all zero. If one is found, "TRUE" is returned. Otherwise the check continues with the rows. If no column or row is a zero vector, "FALSE" is returned. A "FALSE" result does not mean that the determinant cannot be zero; it still might be. But if a zero vector is found, the determinant is certainly zero.

Value

A boolean value, "TRUE" if any column OR row is zero. "FALSE" otherwise.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

rowMins, rowFalse, nth, colrange, colMedians, colVars, sort_mat, rowTrue

Examples

x <- matrix(runif(10*10),10,10)
colrow.zero(x)
x<-NULL

Check if values are integers and convert to integer

Description

Check if values are integers and convert to integer.

Usage

is_integer(x)
as_integer(x,result.sort = TRUE,init = 1)

Arguments

x    is_integer: A vector with numeric data. as_integer: A vector with data.
result.sort    A logical value for sorting the result.
init    An integer value to start.

Details

The behavior of these functions is different from that of R's built-in ones.

is_integer: checks whether all the values are integers in memory. If typeof is double and the values are integers in the range -2^31 : 2^31, then it is better to convert to an integer vector in order to use less memory.
Doing so can also decrease the time complexity.

as_integer: converts the discrete values to integers.

Value

is_integer: A logical value, TRUE if all values are integers and in the range -2^31 : 2^31. Otherwise FALSE.

as_integer: By default the function returns the same result as "as.numeric", but the user can change the "init" value so that it does not start from 1 as in R. The result can also be left unsorted using "result.sort".

Author(s)

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

as_integer, colVars, colmeans, read.directory

Examples

x<-runif(10)
y1<-is_integer(x) # y1 is FALSE
x<-as.numeric(rpois(10,10)) # integers but typeof is double
y1<-is_integer(x) # y1 is TRUE so you can convert to integer vector.
as_integer(letters) ## as.numeric(letters) produce errors
x<-y1<-NULL

Check Namespace and Rd files

Description

Check Namespace/Rd and examples files.

Usage

checkNamespace(path.namespace,path.rfolder)
checkAliases(path.man,path.rfolder,dont.read = "")
checkTF(path.man,dont.read = "")
checkExamples(path.man,each = 1,dont.read = "",print.errors = stderr(), print.names = FALSE)

Arguments

path.namespace    A full path to the "NAMESPACE" file.
path.rfolder    A full path to the directory that contains the "R" files.
path.man    A full path to the directory that contains the "Rd" files.
each    An integer value for running each example.
dont.read    A character vector with the names of the files that you wish not to read. By default it is empty "".
print.errors    Print the errors to a file. By default it is "stderr()".
print.names    A boolean value (TRUE/FALSE) for printing the names of the files before running the examples.

Details

For function "checkNamespace": reads all the exported R functions from the NAMESPACE file, reads all the R functions from the R folder and checks whether all the functions are exported.
For function "checkAliases": reads all the Rd files from the man directory, then reads the aliases from each file and checks whether: 1) all the R files have a man file or an alias, 2) all aliases belong to functions, 3) there are duplicated aliases.

For function "checkExamples": reads all the Rd files from the man directory, then reads the examples from each file and runs each of them. If you want to print the errors to a file, set "print.errors=file_name"; to print them to the standard error, set "print.errors=stderr()". You will then see all the errors for every file. For your code to run successfully you should first run "library(PACKAGE_NAME)". The argument "print.names" is very helpful, because if any of your functions crashes R while running, you would otherwise never know which one it was. Setting it to "TRUE" prints the name of each file before running its example. R might still crash, but you will know which file was running. Remember that there is always an error timeout, so the crash might not come from the current file but from a previous one.

For function "checkTF": reads all the Rd files from the man directory, then reads the examples from each file and checks whether any example uses the values "T" and "F" instead of "TRUE" and "FALSE". Using "T" and "F" is wrong.

For function "checkUsage": reads all the Rd files from the man directory and, for each man file, checks whether the usage section has the right signature for the functions from the R directory. This function has 2 limitations: 1) each function should be in one R file with the name function_name.R, and only this function can be inside the file. 2) If you want multiple functions inside this file, then the main function should be the first one, on top.

Value

For function "checkNamespace": a vector with the names of the missing R files.

For function "checkAliases": a list with 3 fields.

Missing Man files    A vector with the names of the missing Rd files.
Missing R files    A vector with the names of the missing R files.
Duplicate alias    A vector with the names of the duplicate aliases.
For function "checkExamples": a list with 2 fields.

Errors    A character vector with the names of the Rd files that produced an error.
Big Examples    A character vector with the names of the Rd files that have big examples per line.

For function "checkTF": a list with 2 fields.

TRUE    A character vector with the names of the Rd files that have "T".
FALSE    A character vector with the names of the Rd files that have "F".

For function "checkUsage": a list with 2 fields.

missing functions    A character vector with the name of the file that is missing and the Rd file that is found.
missmatch functions    A character vector with the name of the file that has a mismatched function and the Rd file that is found.

Author(s)

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

read.directory, AddToNamespace, sourceR, sourceRd, read.examples

Examples

## Not run:
# for example: path.namespace="C:\some_file\NAMESPACE"
# for example: path.rfolder="C:\some_file\R\"
# for example: path.man="C:\some_file\man\"
# system.time( a<-checkNamespace(path.namespace,path.rfolder) )
# system.time( b<-checkAliases(path.man,path.rfolder) )
# system.time( b<-checkExamples(path.man) )
# system.time( b<-checkExamples(path.man,2) )
# system.time( b<-checkTF(path.man) )
## End(Not run)

Check whether a square matrix is symmetric

Description

Check whether a square matrix is symmetric.

Usage

is.symmetric(x)

Arguments

x    A square matrix with data.

Details

Instead of going through the whole matrix, the function stops as soon as the first disagreement is met.

Value

A boolean value, TRUE or FALSE.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
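The early-exit idea in the Details of is.symmetric can be illustrated with a plain R loop. This is a hypothetical sketch (the name is_symmetric_sketch is not part of Rfast) and it will be far slower than the compiled function; it only shows where the search stops.

```r
# Hypothetical sketch: compare x[i, j] with x[j, i] and stop at the first mismatch.
is_symmetric_sketch <- function(x) {
  n <- nrow(x)
  if (n != ncol(x)) return(FALSE)      # a non-square matrix cannot be symmetric
  if (n < 2L) return(TRUE)
  for (j in 2:n) {
    for (i in 1:(j - 1L)) {
      if (x[i, j] != x[j, i]) return(FALSE)  # first disagreement: stop here
    }
  }
  TRUE
}

set.seed(1)
a <- matrix(rnorm(25), ncol = 5)
s <- a + t(a)                          # symmetric by construction
sym <- is_symmetric_sketch(s)
not_sym <- is_symmetric_sketch(matrix(1:9, ncol = 3))
```

For a matrix that is not symmetric the loop usually ends after a handful of comparisons, which is the saving the Details refer to.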
See Also

cholesky, cora, cova

Examples

x <-matrix( rnorm( 100 * 400), ncol = 400 )
s1 <- cor(x)
is.symmetric(s1)
x <- x[1:100, ]
is.symmetric(x)
x<-s1<-NULL

Cholesky decomposition of a square matrix

Description

Cholesky decomposition of a square matrix.

Usage

cholesky(x,parallel = FALSE)

Arguments

x    A square positive definite matrix.
parallel    A boolean value for the parallel version.

Details

The Cholesky decomposition of a square positive definite matrix is computed. The use of parallel is suggested for matrices with dimensions of 1000 or more.

Value

An upper triangular matrix.

Author(s)

Manos Papadakis

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

See Also

is.symmetric

Examples

x = matrix(rnorm(1000 * 50), ncol = 50)
s = cov(x)
system.time(a1 <- cholesky(s))
system.time(a2 <- chol(s))
all.equal(a1[upper.tri(a1)], a2[upper.tri(a2)])
x <- NULL
s <- NULL
a1 <- NULL
a2 <- NULL

Circular or angular regression

Description

Regression with circular dependent variable and Euclidean or categorical independent variables.

Usage

spml.reg(y, x, tol = 1e-07, seb = FALSE)

Arguments

y    The dependent variable. It can be a numerical vector with data expressed in radians, or a matrix with two columns, the cosine and the sine of the circular data. The benefit of the matrix is that if the function is to be called multiple times with the same response, there is no need to transform the vector into a matrix every time.
x    The independent variable(s). Can be Euclidean or categorical (factor variables).
tol    The tolerance value to terminate the Newton-Raphson algorithm.
seb    Do you want the standard error of the estimates to be returned? TRUE or FALSE.

Details

The Newton-Raphson algorithm is used to fit this regression, as described in Presnell et al. (1998).
Value

A list including:

iters    The number of iterations required until convergence of the Newton-Raphson algorithm.
be    The regression coefficients.
seb    The standard errors of the coefficients.
loglik    The value of the maximised log-likelihood.
seb    The covariance matrix of the beta values.

Author(s)

Michail Tsagris and Manos Papadakis

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

References

Presnell Brett, Morrison Scott P. and Littell Ramon C. (1998). Projected multivariate linear models for directional data. Journal of the American Statistical Association, 93(443): 1068-1077.

See Also

spml.mle, iag.mle, acg.mle

Examples

x <- rnorm(100)
z <- cbind(3 + 2 * x, 1 -3 * x)
y <- cbind( rnorm(100,z[ ,1], 1), rnorm(100, z[ ,2], 1) )
y <- y / sqrt( rowsums(y^2) )
a1 <- spml.reg(y, x)
y <- atan( y[, 2] / y[, 1] ) + pi * I(y[, 1] < 0)
a2 <- spml.reg(y, x)

Circular-linear correlation

Description

It calculates the squared correlation between a circular and one or more linear variables.

Usage

circlin.cor(theta, x)

Arguments

theta    A circular variable expressed in radians.
x    The linear variable or a matrix containing many linear variables.

Details

The squared correlation between a circular and one or more linear variables is calculated.

Value

A matrix with as many rows as linear variables including:

R-squared    The value of the squared correlation.
p-value    The p-value of the zero correlation hypothesis testing.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References

Mardia, K. V. and Jupp, P. E. (2000). Directional statistics. Chichester: John Wiley & Sons.
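As a rough illustration of the squared circular-linear correlation, the sketch below assumes the usual definition from Mardia and Jupp (2000), built from the correlations of x with cos(theta) and sin(theta). The helper name circlin_r2_sketch is hypothetical, the p-value is not computed, and the data are simulated with runif() rather than rvonmises() so that only base R is needed.

```r
# Hypothetical sketch of the squared circular-linear correlation for one linear variable.
circlin_r2_sketch <- function(theta, x) {
  ct <- cos(theta)
  st <- sin(theta)
  rxc <- cor(x, ct)                    # correlation of x with cos(theta)
  rxs <- cor(x, st)                    # correlation of x with sin(theta)
  rcs <- cor(ct, st)                   # correlation of cos(theta) with sin(theta)
  (rxc^2 + rxs^2 - 2 * rxc * rxs * rcs) / (1 - rcs^2)
}

set.seed(1)
theta <- runif(50, 0, 2 * pi)
x <- 2 * theta + rnorm(50)
r2 <- circlin_r2_sketch(theta, x)

# The same quantity is the R-squared of a linear regression of x on cos and sin
r2_lm <- summary(lm(x ~ cos(theta) + sin(theta)))$r.squared
all.equal(r2, r2_lm)
```

The agreement with the regression R-squared is a convenient sanity check when comparing against the output of circlin.cor.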
See Also

spml.reg

Examples

phi <- rvonmises(50, 2, 20, rads = TRUE)
x <- 2 * phi + rnorm(50)
y <- matrix(rnorm(50 * 5), ncol = 5)
circlin.cor(phi, x)
circlin.cor(phi, y)
y <- NULL

Column and row wise coefficients of variation

Description

Column and row wise coefficients of variation.

Usage

colcvs(x, ln = FALSE, unbiased = FALSE)
rowcvs(x, ln = FALSE, unbiased = FALSE)

Arguments

x    A numerical matrix with the data.
ln    If you have log-normally distributed data (or assume you do), then set this to TRUE.
unbiased    A boolean variable indicating whether the unbiased form should be returned. This is applicable in the case of small samples.

Details

The column-wise coefficients of variation are calculated.

Value

A vector with the coefficient of variation for each column or row.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

colsums, colVars

Examples

m <- rnorm(100, 10)
x <- matrix(rnorm(100 * 100, m, 1), ncol = 100)
a1 <- colcvs(x)
a2 <- colcvs(x[1:25, ], unbiased = TRUE)
a3 <- colcvs( exp(x), ln = TRUE)
x <- NULL

Column and row-wise Any/All

Description

Column and row-wise Any/All of a matrix.

Usage

colAny(x)
rowAny(x)
colAll(x, parallel = FALSE)
rowAll(x, parallel = FALSE)

Arguments

x    A logical matrix with the data.
parallel    Do you want the computations to take place in parallel? The default value is FALSE.

Details

The functions are written in C++ in order to be as fast as possible.

Value

A vector where item "i" is true if Any/All true values are found in column/row "i". Otherwise false.

Author(s)

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
See Also

med, colMedians, colMeans (built-in R function)

Examples

x <- matrix(as.logical(rbinom(100*100,1,0.5)),100,100)
system.time( a<-colAny(x) )
system.time( b<-apply(x,2,any) )
all.equal(a,b)
system.time( a<-rowAny(x) )
system.time( b<-apply(x,1,any) )
all.equal(a,b)
system.time( a<-colAll(x) )
system.time( b<-apply(x,2,all) )
all.equal(a,b)
a<-b<-x<-NULL

Column and row-wise means of a matrix

Description

Column and row-wise means of a matrix.

Usage

colmeans(x, parallel = FALSE)
rowmeans(x)
colhameans(x, parallel = FALSE)
rowhameans(x)

Arguments

x    A numerical matrix with data.
parallel    Do you want to do it in parallel in C++? TRUE or FALSE.

Value

A vector with the column or row arithmetic or harmonic means.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

colsums, rowsums, colMins, colMedians, colMads

Examples

x <- matrix(rpois(100 * 100, 10),ncol = 100)
x1 <- colmeans(x)
x2 <- colMeans(x)
all.equal(x1,x2)
x1 <- rowmeans(x)
x2 <- rowMeans(x)
all.equal(x1,x2)
system.time( colhameans(x) )
system.time( rowhameans(x) )
x<-x1<-x2<-NULL

Column and row-wise medians

Description

Column and row-wise medians of a matrix.

Usage

colMedians(x,na.rm = FALSE, parallel = FALSE)
rowMedians(x, parallel = FALSE)

Arguments

x    A matrix with the data.
parallel    Do you want to do it in parallel in C++? TRUE or FALSE.
na.rm    TRUE or FALSE for removing NAs if they exist.

Details

The functions are written in C++ in order to be as fast as possible.

Value

A vector with the column medians.

Author(s)

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
See Also

med, colVars, colMeans (built-in R function)

Examples

x <- matrix( rnorm(100 * 100), ncol = 100 )
a <- apply(x, 2, median)
b1 <- colMedians(x)
all.equal(as.vector(a), b1)
x<-a<-b1<-NULL

Column and row-wise nth smallest value of a matrix/vector

Description

Column and row-wise nth smallest value of a matrix/vector.

Usage

colnth(x,elems, parallel = FALSE)
rownth(x,elems, parallel = FALSE)
nth(x, k,descending = FALSE,index.return = FALSE,na.rm = FALSE)

Arguments

x    A matrix with the data.
elems    An integer vector with the kth smallest number to be returned for each column/row.
k    The kth smallest/biggest number to be returned.
descending    A boolean value (TRUE/FALSE) for descending order (biggest number). By default it is ascending (smallest number).
index.return    Return the index of the kth smallest/biggest number.
parallel    Do you want to do it in parallel in C++? TRUE or FALSE. Only for the column and row-wise functions.
na.rm    TRUE or FALSE for removing NAs if they exist. Only for function "nth".

Details

The functions are written in C++ in order to be as fast as possible.

Value

For "colnth", "rownth": A vector with the column/row nth value.

For "nth": The nth value.

Author(s)

Manos Papadakis <papadakm95@gmail.com>

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

med, colMedians, colMeans (built-in R function)

Examples

x <- matrix( rnorm(100 * 100), ncol = 100 )
elems <- sample(1:100,100,TRUE)
system.time( colnth(x,elems) )
system.time( rownth(x,elems) )
x <- rnorm(1000)
nth(x, 500)
sort(x)[500]
x<-elems<-NULL

Column and row-wise Order - Sort Indices

Description

Column and row-wise Order - Sort Indices.
Usage

colOrder(x,stable=FALSE,descending=FALSE, parallel = FALSE)
rowOrder(x,stable=FALSE,descending=FALSE, parallel = FALSE)
Order(x,stable=FALSE,descending=FALSE,partial = NULL)

Arguments

x    A matrix with numbers or a numeric/character vector.
stable    A boolean value for using a stable sorting algorithm.
descending    A boolean value (TRUE/FALSE) for sorting the vector in descending order. By default the vector is sorted in ascending order.
parallel    A boolean value for the parallel version.
partial    A boolean value for partial sorting.

Details

The functions apply "order" in a column or row-wise fashion, or order a vector. If you want the same results as R's, set "stable=TRUE", because "stable=FALSE" uses a sorting algorithm that, unlike R's sort, is not stable. The default is faster, though. This version is faster for large data (more than 300).

Value

For "colOrder" and "rowOrder": a matrix with integer numbers. The result is the same as apply(x, 2, order) or t(apply(x, 1, order)).

For "Order": sorts the vector and returns the index that each element had before the sorting. The result is the same as order(x), but for exactly the same results set the argument "stable" to "TRUE".

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

colsums, coldiffs, colMedians, colprods

Examples

x <- matrix( runif(10 * 10), ncol = 10 )
colOrder(x)
apply(x, 2, order)
rowOrder(x)
t(apply(x, 1, order))
y <- rnorm(100)
b <- Order(y)
a <- order(y)
all.equal(a,b) ## false because it is not stable
b <- Order(y,stable=TRUE)
all.equal(a,b) ## true because it is stable
x<-y<-b<-a<-NULL

Column and row-wise products

Description

Column and row-wise products.

Usage

colprods(x)
rowprods(x)

Arguments

x    A matrix with numbers.

Details

The product of the numbers in a matrix is returned either column-wise or row-wise.

Value

A vector with the column or the row products.
Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

colsums, coldiffs, colMedians

Examples

x <- matrix( runif(100 * 10), ncol = 10 )
colprods(x)
rowprods(x)
x<-NULL

Column and row-wise range of values of a matrix

Description

Column and row-wise range of values of a matrix.

Usage

colrange(x, cont = TRUE)
rowrange(x, cont = TRUE)

Arguments

x    A numerical matrix with data.
cont    If the data are continuous, leave this TRUE and it will return the range of values for each variable (column). If the data are integers, categorical, or if you want to find out the number of unique numbers in each column, set this to FALSE.

Value

A vector with the relevant values.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

colMins, colMaxs, rowMins, rowMaxs, nth, colMedians, colVars, sort_mat

Examples

x <- matrix( rnorm(100 * 100), ncol = 100 )
a1 <- colrange(x)
a2 <- apply(x, 2, function(x) diff( range(x)) )
all.equal(a1, a2)
a1 <- rowrange(x)
a2 <- apply(x, 1, function(x) diff( range(x)) )
all.equal(a1, a2)
x<-a1<-a2<-NULL

Column and row-wise ranks

Description

Column and row-wise ranks.

Usage

colRanks(x,method = "average",descending = FALSE,stable = FALSE, parallel = FALSE)
rowRanks(x,method = "average",descending = FALSE,stable = FALSE, parallel = FALSE)

Arguments

x    A numerical matrix with the data.
parallel    A boolean value for the parallel version.
method    A character string for choosing the method. Must be one of "average", "min", "max", "first".
descending    A boolean value (TRUE/FALSE) for sorting the vector in descending order. By default the vector is sorted in ascending order.
stable    A boolean value (TRUE/FALSE) for choosing a stable sort algorithm. Stable means that equal elements keep their original relative order. Only for the method "first".
Details

For each column or row of a matrix the ranks are calculated and they are returned. The initial matrix is gone.

Value

A matrix with the column or row-wise ranks.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

Rank, correls

Examples

x <- matrnorm(100, 10)
a1 <- colRanks(x)
a2 <- apply(x, 2, rank)
b1 <- rowRanks(x)
b2 <- apply(x, 1, rank)
x<-a1<-a2<-b1<-b2<-NULL

Column and row-wise Shuffle

Description

Column and row-wise shuffle of a matrix.

Usage

colShuffle(x)
rowShuffle(x)

Arguments

x    A matrix with the data.

Details

The functions are written in C++ in order to be as fast as possible.

Value

A vector with the column/row Shuffle.

Author(s)

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

med, colVars, colMeans (built-in R function)

Examples

x <- matrix( rnorm(100 * 100), ncol = 100 )
system.time( colShuffle(x) )
system.time( rowShuffle(x) )
x<-NULL

Column and row-wise sums of a matrix

Description

Column and row-wise sums of a matrix.

Usage

colsums(x,indices = NULL, parallel = FALSE)
rowsums(x,indices = NULL, parallel = FALSE)

Arguments

x    A numerical matrix with data.
indices    An integer vector with the indices to sum the columns/rows.
parallel    Do you want to do it in parallel in C++? TRUE or FALSE. Doesn't work with the argument "indices".

Value

A vector with sums.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

colMedians, colmeans, colVars

Examples

x <- matrix(rpois(500 * 100, 10),ncol = 100)
x1 <- colsums(x)
x2 <- colSums(x)
all.equal(x1,x2)
x1 <- rowsums(x)
x2 <- rowSums(x)
all.equal(x1,x2)
x<-x1<-x2<-NULL

Column and row-wise tabulate

Description

Column and row-wise tabulate of a matrix.
Usage

colTabulate(x, max_number = max(x))
rowTabulate(x, max_number = max(x))

Arguments

x    An integer matrix with the data. The numbers must start from 1, i.e. 1, 2, 3, 4,... No zeros are allowed. Anything else may cause a crash.
max_number    The maximum value of x. If you know the maximum number, supply it through this argument for faster results; by default it is max(x).

Details

The functions are written in C++ in order to be as fast as possible.

Value

A matrix where in each column the command "tabulate" has been performed. The number of rows of the returned matrix will be equal to max_number if given. Otherwise, the functions will find this number.

Author(s)

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

colShuffle, colVars, colmeans

Examples

x <- matrix( rbinom(100 * 100, 4, 0.5), ncol = 100 )
system.time( colTabulate(x) )
x <- t(x)
system.time( rowTabulate(x) )
x<-NULL

Column and row-wise variances and standard deviations

Description

Column and row-wise variances and standard deviations of a matrix.

Usage

colVars(x, suma = NULL, std = FALSE, parallel = FALSE)
rowVars(x, suma = NULL, std = FALSE)
groupcolVars(x, ina, std = FALSE)

Arguments

x    A matrix with the data.
suma    If you already have the column sums vector supply it, otherwise leave it NULL.
ina    A numerical vector specifying the groups. If you have numerical values, do not put zeros, but 1, 2, 3 and so on.
std    A boolean variable specifying whether you want the variances (FALSE) or the standard deviations (TRUE) of each column.
parallel    Should parallel implementations take place in C++? The default value is FALSE.

Details

We found this on Stack Overflow, where it was created by David Arenburg. We then modified the function to match the sums type formula of the variance, which is faster.

Value

A vector with the column variances or standard deviations.
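The "sums type formula" of the variance mentioned in the Details can be written with two calls to colSums() in base R. This is a sketch of the formula only, with a hypothetical helper name; it is not the code used inside colVars.

```r
# Hypothetical sketch: variance from column sums,
# var = ( sum(x^2) - sum(x)^2 / n ) / (n - 1)
col_vars_sketch <- function(x, std = FALSE) {
  n <- nrow(x)
  s1 <- colSums(x)                     # column sums
  s2 <- colSums(x^2)                   # column sums of squares
  v <- (s2 - s1^2 / n) / (n - 1)
  if (std) sqrt(v) else v
}

set.seed(1)
x <- matrix(rnorm(100 * 10), ncol = 10)
v1 <- col_vars_sketch(x)
v2 <- apply(x, 2, var)
all.equal(v1, v2)
```

The formula needs only one pass over the data per sum, which is why it is fast, but it can lose precision when the column means are very large relative to the standard deviations.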
Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

colmeans, colMedians, colrange

Examples

x <- matrix( rnorm(100 * 100), ncol = 100 )
a2 <- colVars(x)
groupcolVars( as.matrix(iris[, 1:4]), as.numeric(iris[, 5]) )
x<-a2<-NULL

Column and row-wise mean absolute deviations

Description

Column and row-wise mean absolute deviations.

Usage

colMads(x,parallel = FALSE)
rowMads(x,parallel = FALSE)

Arguments

x    A matrix with the data.
parallel    A boolean value for the parallel version.

Details

The functions are written in C++ in order to be as fast as possible.

Value

A vector with the column-wise mean absolute deviations.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

colMedians, rowMedians, colVars, colmeans, colMeans (built-in R function)

Examples

x <- matrix( rnorm(100 * 100), ncol = 100 )
system.time( a <- colMads(x) )
x<-NULL

Column-row wise minima and maxima of two matrices

Description

Column-row wise minima and maxima of two matrices.

Usage

colPmax(x, y)
colPmin(x, y)

Arguments

x    A numerical vector with numbers.
y    A numerical vector with numbers.

Details

The parallel minima or maxima are returned. These are the same as the base functions pmax and pmin.

Value

A numerical vector/matrix with numbers, whose length is equal to the length of the initial matrices, containing the maximum or minimum between each pair.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
See Also

sort_mat, Sort, colMins, colMaxs, colMedians

Examples

x <- matrix(rnorm(100),10,10)
y <- matrix(rnorm(100),10,10)
colPmax(x, y)
colPmin(x, y)
x<-y<-NULL

Column-wise differences

Description

Column-wise differences.

Usage

coldiffs(x)

Arguments

x    A matrix with numbers.

Details

This function simply computes x[, -1] - x[, -k], where k is the last column of the matrix x, but does it a lot faster. That is, 2nd column - 1st column, 3rd column - 2nd column, and so on.

Value

A matrix with one column less, containing the differences between the successive columns.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

Dist, dista, colmeans

Examples

x <- matrix( rnorm(50 * 10), ncol = 10 )
coldiffs(x)
x<-NULL

Column-wise kurtosis and skewness coefficients

Description

Column-wise kurtosis and skewness coefficients.

Usage

colkurtosis(x, pvalue = FALSE)
colskewness(x, pvalue = FALSE)

Arguments

x    A matrix with the data, where the rows denote the samples and the columns are the variables.
pvalue    If you want a hypothesis test that the skewness or kurtosis is significant, set this to TRUE. This checks whether the skewness is significantly different from 0 and whether the kurtosis is significantly different from 3.

Details

The skewness and kurtosis coefficients are calculated. For the skewness coefficient we use the sample unbiased version of the standard deviation. For the kurtosis, we do not subtract 3.

Value

If "pvalue" is FALSE, a vector with the relevant coefficient. Otherwise a matrix with two columns: the kurtosis or skewness coefficient and the p-value from the hypothesis test that they are significantly different from 3 or 0 respectively.
Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
skew, skew.test2, colMedians, colmeans, colVars, sftests

Examples
## 50 variables, each with 200 observations
x = matrix( rnorm(200 * 50), ncol = 50 )
system.time( colkurtosis(x) )
system.time( colskewness(x) )
x <- NULL


Column-wise matching coefficients

Description
Column-wise matching coefficients.

Usage
match.coefs(x, y = NULL, ina, type = "jacc")

Arguments
x    A matrix with the data, where the rows denote the samples and the columns are the variables.
y    A second matrix with the data of the second group. If this is NULL (default value) then the argument ina must be supplied. Notice that when you supply the two matrices the procedure is two times faster.
ina    A numerical vector with 1s and 2s indicating the two groups. Be careful, the function is designed to accept only these two numbers. In addition, if your "y" is NULL, you must specify "ina".
type    This denotes the type of matching coefficient to calculate. For the Jaccard index put "jacc". For the simple matching coefficient put "smc". With any other value (e.g. "both"), both of them will be calculated.

Details
Two matrices are given as input and for each column matching coefficients are calculated, either the Jaccard or the simple matching coefficient or both.

Value
A matrix with one or two columns, depending on the type you have specified. If you specify "both", there will be two columns; if you specify "jacc" or "smc", then just one column.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.
See Also
odds, colTabulate

Examples
x <- matrix(rbinom(400 * 10, 1, 0.5), ncol = 10)
y <- matrix(rbinom(400 * 10, 1, 0.5), ncol = 10)
a <- match.coefs(x, y, type = "both")
x <- NULL
y <- NULL


Column-wise minimum and maximum    Column-wise minimum and maximum of a matrix

Description
Column-wise minimum and maximum of a matrix.

Usage
colMins(x, value = FALSE, parallel = FALSE)
colMaxs(x, value = FALSE, parallel = FALSE)
colMinsMaxs(x)

Arguments
x    A numerical matrix with data.
value    If the value is FALSE it returns the indices of the minimum/maximum, otherwise it returns the minimum and maximum values.
parallel    Do you want to do it in parallel in C++? TRUE or FALSE. The parallel version will return the minimum/maximum value only. It will never return the indices.

Value
A vector with the relevant values.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
rowMins, rowMaxs, nth, colrange, colMedians, colVars, sort_mat

Examples
x <- matrix( rnorm(100 * 200), ncol = 200 )
s1 <- colMins(x)
s2 <- apply(x, 2, min)
s1 <- colMaxs(x)
s2 <- apply(x, 2, max)
s1 <- colMinsMaxs(x)
s2 <- c(apply(x, 2, min), apply(x, 2, max))
x<-s1<-s2<-NULL


Column-wise MLE of some univariate distributions

Description
Column-wise MLE of some univariate distributions.

Usage
colexpmle(x)
colexp2.mle(x)
colgammamle(x, tol = 1e-07)
colinvgauss.mle(x)
collaplace.mle(x)
collindley.mle(x)
colmaxboltz.mle(x)
colnormal.mle(x)
colpareto.mle(x)
colpois.mle(x)
colrayleigh.mle(x)
colvm.mle(x, tol = 1e-07)
colweibull.mle(x, tol = 1e-09, maxiters = 100, parallel = FALSE)
colnormlog.mle(x)

Arguments
x    A numerical matrix with data. Each column refers to a different vector of observations of the same distribution.
For exponential, 2 parameter exponential, Weibull, gamma, inverse Gaussian, Maxwell-Boltzmann, Lindley, Rayleigh and Pareto distributions, the numbers must be greater than zero. For the Poisson and geometric distributions, the numbers must be integers, 0, 1, 2,... For the Normal and Laplace distribution the numbers can take any value. The von Mises distribution takes values between 0 and 2 * pi (radians).
tol    The tolerance value to terminate the Newton-Fisher algorithm.
maxiters    The maximum number of iterations to implement.
parallel    Do you want the calculations to take place in parallel? The default value is FALSE.

Details
For each column, the same distribution is fitted and its parameter and log-likelihood are computed.

Value
A matrix with two, three or five (for the colnormlog.mle) columns. The first one or the first two contain the parameter(s) of the distribution and the second or third column the relevant log-likelihood.

Author(s)
Michail Tsagris and Stefanos Fafalios
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Stefanos Fafalios <stefanosfafalios@gmail.com>

References
Kalimuthu Krishnamoorthy, Meesook Lee and Wang Xiao (2015). Likelihood ratio tests for comparing several gamma distributions. Environmetrics, 26(8):571-583.
N.L. Johnson, S. Kotz & N. Balakrishnan (1994). Continuous Univariate Distributions, Volume 1 (2nd Edition).
N.L. Johnson, S. Kotz & N. Balakrishnan (1970). Distributions in statistics: continuous univariate distributions, Volume 2.
Sharma V. K., Singh S. K., Singh U. & Agiwal V. (2015). The inverse Lindley distribution: a stress-strength reliability model with application to head and neck cancer data. Journal of Industrial and Production Engineering, 32(3): 162-173.
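As an illustration of fitting the same distribution to every column, here is a base R sketch for the normal case, where the MLE has a closed form. The helper name colnormal_mle_sketch and its column layout (mean, variance, log-likelihood) are assumptions made for this example and need not match the output of colnormal.mle.

```r
# Hypothetical helper, base R only; not the Rfast implementation.
colnormal_mle_sketch <- function(x) {
  n <- nrow(x)
  m <- colMeans(x)                       # MLE of the mean
  v <- colMeans(sweep(x, 2, m)^2)        # MLE of the variance (divisor n)
  # log-likelihood evaluated at the MLE: -n/2 * (log(2 * pi * v) + 1)
  loglik <- -0.5 * n * (log(2 * pi * v) + 1)
  cbind(mean = m, variance = v, loglik = loglik)
}

set.seed(1)
x <- matrix(rnorm(1000 * 4), ncol = 4)
fit <- colnormal_mle_sketch(x)
```

Each row of fit refers to one column of x. The log-likelihood can be checked against sum(dnorm(x[, j], mean, sd, log = TRUE)) evaluated at the fitted values.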
See Also
vm.mle, poisson.mle, normal.mle, gammamle

Examples
x <- matrix(rnorm(1000 * 50), ncol = 50)
a <- colnormal.mle(x)
b <- collaplace.mle(x)
x <- NULL


Column-wise true/false value    Column-wise true/false value of a matrix

Description
Column-wise true/false value of a matrix.

Usage
colTrue(x)
colFalse(x)
colTrueFalse(x)

Arguments
x    A logical matrix with data.

Value
An integer vector where item "i" is the number of the true/false values of the "i"-th column.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
rowMins, rowFalse, nth, colrange, colMedians, colVars, sort_mat, rowTrue

Examples
x <- matrix(as.logical(rbinom(100*100,1,0.5)),100,100)
s1 <- colTrue(x)
s1 <- colFalse(x)
s1 <- colTrueFalse(x)
x<-s1<-NULL


Column-wise uniformity Watson test for circular data    Column-wise uniformity tests for circular data

Description
Column-wise uniformity tests for circular data.

Usage
colwatsons(u)

Arguments
u    A numeric matrix containing the circular data, which are expressed in radians. Each column is a different sample.

Details
These tests are used to test the hypothesis that the data come from a circular uniform distribution. The Kuiper test is much more time consuming and this is why it is not implemented yet. Once we figure out a way to make it fast, we will include it.

Value
A matrix with two columns, the value of the test statistic and its associated p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Jammalamadaka, S. Rao and SenGupta, A. (2001). Topics in Circular Statistics, pg. 153-155 (Kuiper’s test) & 156-157 (Watson’s test).
See Also
watson, vmf.mle, rvonmises

Examples
x <- matrix( rvonmises(n = 50 * 10, m = 2, k = 0), ncol = 10 )
colwatsons(x)
x <- NULL


Column-wise Yule’s Y (coefficient of colligation)

Description
Column-wise Yule’s Y (coefficient of colligation).

Usage
col.yule(x, y = NULL, ina)

Arguments
x    A matrix with 0 and 1. Every column refers to a different sample or variable.
y    A second matrix, of the same dimensions as x, with 0 and 1. Every column refers to a different sample or variable.
ina    If y is NULL, ina must be specified. This is a numeric vector with 1s and 2s, indicating the group of each row.

Details
Yule’s coefficient of colligation is calculated for every column.

Value
A vector with Yule’s Y, one for every column of x, is returned.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Yule G. Udny (1912). On the Methods of Measuring Association Between Two Attributes. Journal of the Royal Statistical Society, 75(6):579-652.

See Also
yule, odds

Examples
x <- matrix(rbinom(300 * 10, 1, 0.5), ncol = 10)
ina <- rep(1:2, each = 150)
col.yule( x, ina = ina )


Correlation based forward regression

Description
Correlation based forward regression.

Usage
cor.fsreg(y, x, threshold = 0.05, tolb = 2, tolr = 0.02, stopping = "BIC")

Arguments
y    A numerical vector.
x    A matrix with data, the predictor variables.
threshold    The significance level, set to 0.05 by default. Bear in mind that the logarithm of it is used, as the logarithm of the p-values is calculated at every point. This avoids numerical underflow, where small p-values, less than the machine epsilon, would be returned as zero.
tolb    If we look only at the significance of the variables, many may enter the linear regression model.
For this reason, we also use the BIC as a way to validate the inclusion of a candidate variable. If the BIC difference between two successive models is less than the tolerance value, the variable will not enter the model, even if it is statistically significant. Set it to 0 if you do not want this extra check.
tolr    This is an alternative to the BIC change and it uses the adjusted coefficient of determination. If the increase in the adjusted R2 is more than tolr, the procedure continues.
stopping    This refers to the type of extra checking to do. If you want the BIC check, set it to "BIC". If you want the adjusted R2 check, set this to "ar2". Or, if you want both criteria to be satisfied, make this "BICR2".

Details
The forward regression tries the variables one by one using the F-test, basically a partial F-test every time for the latest variable. This is the same as testing the significance of the coefficient of this latest entered variable. Alternatively, the correlation can be used, in this case the partial correlation coefficient. There is a direct relationship between the t-test statistic and the partial correlation coefficient. Now, instead of having to calculate the test statistic, we calculate the partial correlation coefficient. Using Fisher’s z-transform we get the variance immediately. The partial correlation coefficient, using Fisher’s z-transform, and the partial F-test (or the coefficient’s t-test statistic) are not identical. They will be identical for large sample sizes though.

Value
A matrix with the index of the selected variables, the logged p-value, the test statistic value and the BIC or adjusted R2 of each model. In the case of stopping="BICR2" both of these criteria will be returned.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Draper, N.R. and Smith H.
(1988). Applied regression analysis. New York, Wiley, 3rd edition.

See Also
score.glms, univglms, logistic_only, poisson_only, regression

Examples
## 200 observations and 100 candidate predictor variables
x <- matrnorm(200, 100)
y <- rnorm(200)
system.time( cor.fsreg(y, x) )
x <- NULL


Correlation between pairs of variables

Description
Correlations between pairs of variables.

Usage
corpairs(x, y, rho = NULL, logged = FALSE, parallel = FALSE)

Arguments
x    A matrix with real valued data.
y    A matrix with real valued data whose dimensions match those of x.
rho    This can be a vector of assumed correlations (of length equal to the number of columns of x or y) to be tested. If no test is needed, leave it NULL and only the correlations will be returned.
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)? This is taken into account only if "rho" is a vector.
parallel    Should parallel implementations take place in C++? The default value is FALSE.

Details
The paired correlations are calculated. For each column of the matrices x and y the correlation between them is calculated.

Value
A vector of correlations in the case of "rho" being NULL, or a matrix with two extra columns, the test statistic and the (logged) p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Lambert Diane (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics, 34(1):1-14.
Johnson Norman L., Kotz Samuel and Kemp Adrienne W. (1992). Univariate Discrete Distributions (2nd ed.). Wiley.
Cohen, A. Clifford (1960). Estimating parameters in a conditional Poisson distribution. Biometrics, 16:203-211.
Johnson, Norman L., Kemp, Adrienne W. and Kotz, Samuel (2005). Univariate Discrete Distributions (third edition). Hoboken, NJ: Wiley-Interscience.
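The paired correlations described for corpairs (column j of x against column j of y) can be sketched in base R without computing the full cross-correlation matrix. The helper name pair_cor_sketch is hypothetical and this is not the package's C++ code.

```r
# Hypothetical helper, base R only; not the Rfast implementation.
pair_cor_sketch <- function(x, y) {
  n <- nrow(x)
  # standardise every column, multiply matching columns, divide by n - 1
  colSums(scale(x) * scale(y)) / (n - 1)
}

set.seed(1)
x <- matrix(rnorm(100 * 5), ncol = 5)
y <- matrix(rnorm(100 * 5), ncol = 5)
r <- pair_cor_sketch(x, y)
```

The result agrees with diag(cor(x, y)), but only the p matching pairs are computed rather than all p * p combinations.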
See Also
correls, allbetas, mvbetas

Examples
x <- matrnorm(100, 100)
y <- matrnorm(100, 100)
system.time( corpairs(x, y) )
a <- corpairs(x, y)
x <- NULL
y <- NULL


Correlations    Correlation between a vector and a set of variables

Description
Correlation between a vector and a set of variables.

Usage
correls(y, x, type = "pearson", a = 0.05, rho = 0)
groupcorrels(y, x, type = "pearson", ina)

Arguments
y    A numerical vector.
x    A matrix with the data.
type    The type of correlation you want. "pearson" and "spearman" are the two supported types for "correls" because their standard error is easily calculated. For "groupcorrels" you can also put "kendall" because no hypothesis test is performed in that function.
a    The significance level used for the confidence intervals.
rho    The value of the hypothesised correlation to be used in the hypothesis testing.
ina    A factor variable or a numeric variable indicating the group of each observation.

Details
The functions use the built-in function "cor", which is very fast, and then include confidence intervals and produce a p-value for the hypothesis test.

Value
For "correls", a matrix with 5 columns: the correlation, the p-value for the hypothesis test that each of them is equal to "rho", the test statistic and the a/2% lower and upper confidence limits. For "groupcorrels", a matrix with rows equal to the number of groups and columns equal to the number of columns of x. The matrix contains the correlations only; no statistical hypothesis test is performed.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.
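One common way to attach a test and a confidence interval to a Pearson correlation is Fisher's z-transform. The manual does not state which method correls uses, so the base R sketch below is an assumption for illustration only; the helper name fisher_ci_sketch and its column names are hypothetical.

```r
# Hypothetical helper, base R only; Fisher's z-transform is an assumed method.
fisher_ci_sketch <- function(y, x, a = 0.05, rho = 0) {
  n <- length(y)
  r <- as.vector(cor(y, x))          # correlation of y with each column of x
  z <- atanh(r)                      # Fisher's z-transform
  se <- 1 / sqrt(n - 3)              # approximate standard error of z
  stat <- (z - atanh(rho)) / se      # test statistic for H0: correlation = rho
  pval <- 2 * pnorm(abs(stat), lower.tail = FALSE)
  zq <- qnorm(1 - a / 2)
  cbind(correlation = r, p.value = pval, statistic = stat,
        lower = tanh(z - zq * se), upper = tanh(z + zq * se))
}

set.seed(1)
x <- matrix(rnorm(60 * 4), ncol = 4)
y <- rnorm(60)
res <- fisher_ci_sketch(y, x)
```

Each row of res refers to one column of x, and the interval is back-transformed with tanh so that it always lies inside (-1, 1).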
See Also
allbetas, univglms

Examples
x <- matrnorm(60, 100 )
y <- rnorm(60)
r <- cor(y, x) ## correlation of y with each of the xs
a <- allbetas(y, x) ## the coefficients of each simple linear regression of y with x
b <- correls(y, x)
ina <- rep(1:2, each = 30)
b2 <- groupcorrels(y, x, ina = ina)
x <- NULL


Covariance and correlation matrix    Fast covariance and correlation matrix calculation

Description
Fast covariance and correlation matrix calculation.

Usage
cova(x)
cora(x)

Arguments
x    A matrix with data. It has to be a matrix; if it is a data.frame, for example, the function does not turn it into a matrix.

Details
The correlation calculations become faster than the built-in function cor as the number of variables increases. This holds when the number of variables is high, say 500 and above, and not for only a few tens of variables. The "cova", on the other hand, is always faster. For the correlation matrix we took the code from here: https://stackoverflow.com/questions/18964837/fast-correlation-in-r-using-c-and-parallelization/18965892#18965892

Value
The covariance or correlation matrix.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
colVars, cor, cov

Examples
x <- matrnorm( 100, 100 )
s1 <- cov(x)
s2 <- cova(x)
all.equal(s1, s2)
x <- NULL


Cox confidence interval for the ratio of two Poisson variables

Description
Cox confidence interval for the ratio of two Poisson variables.

Usage
cox.poisrat(x, y, alpha = 0.05)
col.coxpoisrat(x, y, alpha = 0.05)

Arguments
x    A numeric vector or a matrix with count data.
y    A numeric vector or a matrix with count data.
alpha    The 1 - confidence level. The default value is 0.05.

Details
Cox confidence interval for the ratio of two Poisson means is calculated.
Value
For the cox.poisrat a vector with three elements, the ratio and the lower and upper confidence interval limits. For the col.coxpoisrat a matrix with three columns, the ratio and the lower and upper confidence interval limits.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Krishnamoorthy K., Peng J. and Zhang D. (2016). Modified large sample confidence intervals for Poisson distributions: Ratio, weighted average, and product of means. Communications in Statistics-Theory and Methods, 45(1): 83-97.

See Also
correls, Table

Examples
x <- rpois(100, 10)
y <- rpois(100, 10)
cox.poisrat(x, y)


Cross-Validation for the k-NN algorithm

Description
Cross-Validation for the k-NN algorithm.

Usage
knn.cv(folds = NULL, nfolds = 10, stratified = FALSE, seed = FALSE, y, x, k,
dist.type = "euclidean", type = "C", method = "average", freq.option = 0,
pred.ret = FALSE, mem.eff = FALSE)

Arguments
folds    A list with the indices of the folds.
nfolds    The number of folds to be used. This is taken into consideration only if "folds" is NULL.
stratified    Do you want the folds to be selected using stratified random sampling? This preserves the proportion of the samples of each group. Make this TRUE if you wish, but only for classification. If you have regression (type = "R"), do not set this to TRUE as it will cause problems or return wrong results.
seed    If you set this to TRUE, the same folds will be created every time.
y    A vector of data. The response variable, which can be either continuous or categorical (factor is acceptable).
x    A matrix with the available data, the predictor variables.
k    A vector with the possible numbers of nearest neighbours to be considered.
dist.type    The type of distance to be used, "euclidean" or "manhattan".
type    Do you want to do classification ("C") or regression ("R")?
method    If you do regression (type = "R"), then how should the predicted values be calculated? Choose among the average ("average"), median ("median") or the harmonic mean ("harmonic") of the closest neighbours.
freq.option    If classification (type = "C") is performed and ties occur in the prediction, that is, more than one class has the same number of k nearest neighbours, the following strategies are available. Option 0 selects the first most frequent class encountered. Option 1 randomly selects among the most frequent values, in the case that there are duplicates.
pred.ret    If you want the predicted values returned set this to TRUE.
mem.eff    Boolean value indicating whether memory should be used conservatively. Turning this option on lowers memory usage at the cost of a slight decrease in execution speed; it should ideally be on when the amount of memory in demand might be a concern.

Details
The concept behind k-NN is simple. Suppose we have a matrix with predictor variables and a vector with the response variable (numerical or categorical). When a new vector with observations (predictor variables) is available, its corresponding response value, numerical or categorical, is to be predicted. Instead of using a model, parametric or not, one can use this ad hoc algorithm.
The k smallest distances between the new predictor variables and the existing ones are calculated. In the case of regression, the average, median, or harmonic mean of the corresponding response values of these closest predictor values is calculated. In the case of classification, i.e. categorical response value, a voting rule is applied. The most frequent group (response value) is where the new observation is to be allocated.
This function performs the cross-validation procedure to select the optimal k, the optimal number of nearest neighbours, in terms of some accuracy metric. For classification it is the percentage of correct classification and for regression the mean squared error.
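The procedure described in the Details above can be sketched in base R for the classification case. This is a simplified illustration and not the package's implementation: the helper names knn_one and knn_cv_sketch are hypothetical, only the Euclidean distance is used, folds are plain random (not stratified), and ties are resolved by taking the first most frequent class.

```r
# Hypothetical helpers, base R only; not the Rfast implementation.
knn_one <- function(xnew, x, y, k = 3) {
  d <- sqrt(colSums((t(x) - xnew)^2))   # Euclidean distance to every row of x
  nearest <- order(d)[1:k]              # indices of the k closest observations
  votes <- table(y[nearest])            # voting rule
  names(votes)[which.max(votes)]        # first most frequent class
}

knn_cv_sketch <- function(x, y, ks, nfolds = 5) {
  folds <- sample(rep(seq_len(nfolds), length.out = nrow(x)))
  sapply(ks, function(k) {
    hits <- 0
    for (f in seq_len(nfolds)) {
      test <- which(folds == f)
      for (i in test) {
        pred <- knn_one(x[i, ], x[-test, , drop = FALSE], y[-test], k)
        hits <- hits + (pred == as.character(y[i]))
      }
    }
    hits / nrow(x)                      # proportion of correct classification
  })
}

set.seed(1)
x <- as.matrix(iris[, 1:4])
y <- iris[, 5]
acc <- knn_cv_sketch(x, y, ks = c(3, 5))
```

The element of acc with the largest value indicates the k that would be selected among the candidates.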
Value
A list including:
preds    If pred.ret is TRUE the predicted values for each fold are returned as elements in a list.
crit    A vector whose length is equal to the number of k values; it contains the accuracy metric for each k.

Author(s)
Marios Dimitriadis
R implementation and documentation: Marios Dimitriadis <kmdimitriadis@gmail.com>

References
Friedman J., Hastie T. and Tibshirani R. (2017). The elements of statistical learning. New York: Springer.
Cover TM and Hart PE (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21-27.
Tsagris Michail, Simon Preston and Andrew T.A. Wood (2016). Improved classification for compositional data using the α-transformation. Journal of Classification, 33(2): 243-261.

See Also
knn, Dist, dista

Examples
x <- as.matrix(iris[, 1:4])
y <- iris[, 5]
mod <- knn.cv(folds = NULL, nfolds = 10, stratified = TRUE, seed = FALSE, y = y, x = x,
k = c(3, 4), dist.type = "euclidean", type = "C", method = "average",
freq.option = 0, pred.ret = FALSE, mem.eff = FALSE)


data.frame.to_matrix    Convert a dataframe to matrix

Description
Convert a dataframe to matrix.

Usage
data.frame.to_matrix(x,col.names = NULL,row.names = NULL)

Arguments
x    A data frame with the data.
col.names    A boolean value for keeping the colnames of the argument x, or a character vector with the new colnames.
row.names    A boolean value for keeping the rownames of the argument x, or a character vector with the new rownames.

Details
This function converts a dataframe to a matrix. Even if there are factors, the function converts them into numerical values. Attributes are not allowed for now.

Value
A matrix which has the numerical values from the dataframe.
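For comparison, base R offers data.matrix, which performs a similar conversion (factors are replaced by their integer codes). The short sketch below uses only that base function; whether its result matches data.frame.to_matrix in every detail is not claimed here.

```r
# Base R analogue for illustration; this does not call the Rfast function.
m <- data.matrix(iris)     # the factor column Species becomes the codes 1, 2, 3
dim(m)                     # same number of rows and columns as the data frame
unique(m[, "Species"])     # integer codes of the factor levels
```

The result is a numeric matrix that can be passed to functions that require a matrix rather than a data frame.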
Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>

See Also
Match, is.symmetric, permutation

Examples
data.frame.to_matrix(iris)


Density of the multivariate normal and t distributions

Description
Density of the multivariate normal and t distributions.

Usage
dmvnorm(x, mu, sigma, logged = FALSE)
dmvt(x, mu, sigma, nu, logged = FALSE)

Arguments
x    A numerical matrix with the data. The rows correspond to observations and the columns to variables.
mu    The mean vector.
sigma    The covariance matrix.
nu    The degrees of freedom for the multivariate t distribution.
logged    Should the logarithm of the density be returned (TRUE) or not (FALSE)?

Details
The (log) density of the multivariate normal (or t) distribution is calculated for a given mean vector and covariance matrix.

Value
A numerical vector with the density values calculated at each vector (row of the matrix x).

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
rmvnorm, rmvt, mvnorm.mle, iag.mle

Examples
x <- matrnorm(100, 20)
mu <- colmeans(x)
s <- cova(x)
a1 <- dmvnorm(x, mu, s)
a2 <- dmvt(x, mu, s, 1)
x <- NULL


Design Matrix

Description
Design Matrix.

Usage
design_matrix(x, ones = TRUE)

Arguments
x    A character vector or a factor type vector or a dataframe. Do not supply a numerical vector.
ones    A boolean variable specifying whether to include the ones in the design matrix or not. The default value is TRUE.

Details
This function implements R’s "model.matrix" function and is used only when x is a factor, a character vector or a dataframe.

Value
Returns the same matrix as model.matrix.
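What a design matrix for a single factor looks like can be sketched in base R: a column of ones followed by one 0/1 indicator per non-reference level. The construction below is an illustration with default treatment contrasts, not the package's code, and the variable names are hypothetical.

```r
# Base R sketch: build the design matrix of a factor by hand.
f <- iris[, 5]
lev <- levels(f)
# one indicator column for every level except the first (reference) level
dummies <- sapply(lev[-1], function(l) as.numeric(f == l))
dm <- cbind(1, dummies)            # prepend the column of ones
mm <- model.matrix(~ f)            # R's built-in function
same <- all.equal(as.vector(dm), as.vector(mm))
```

Dropping the first column of dm corresponds to the case where the ones are not wanted.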
Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>

See Also
model.matrix

Examples
a <- design_matrix( iris[, 5] )
b <- model.matrix( ~ iris[,5] ) ## R's built-in function
all.equal(as.vector(a),as.vector(b)) ## true
a<-b<-NULL


Diagonal Matrix

Description
Fill the diagonal of a matrix or create a diagonal matrix and initialize it with a specific value.

Usage
Diag.fill(x,v=0)
Diag.matrix(len,v=0)

Arguments
x    A matrix with data.
len    Number of columns or rows.
v    Value or vector to initialize the diagonal of a matrix. By default "v=0".

Value
Diag.fill returns the matrix x with all the elements of its diagonal set to "v". Diag.matrix returns a diagonal matrix of dimension "len,len" where all the elements in the diagonal are equal to "v". It is fast for huge matrices with dimensions more than [row,col] = [500,500].

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
rowMins, colFalse, nth, rowrange, rowMedians, rowVars, sort_mat, colTrue

Examples
x <- matrix(rbinom(100*100,1,0.5),100,100)
f <- Diag.fill(x,1)
f <- Diag.fill(x,1:100) ##equals to diag(x)<-1:100
f <- Diag.matrix(100,1) ##equals to diag(1,100,100)
f <- Diag.matrix(100,1:100) ##equals to diag(1:100,100,100)
f<-x<-NULL


Distance between vectors and a matrix

Description
Distance between vectors and a matrix.

Usage
dista(xnew,x,type = "euclidean",k=0,index=FALSE,trans = TRUE,square = FALSE)

Arguments
xnew    A matrix with some data or a vector.
x    A matrix with the data, where rows denote observations (vectors) and the columns contain the variables.
type    This can be either "euclidean" or "manhattan".
k    Should the k smallest distances or their indices be returned? If k > 0 this will happen.
index    In case k is greater than 0, you have the option to get the indices of the k smallest distances.
trans    Do you want the returned matrix to be transposed? TRUE or FALSE.
square    If you choose "euclidean" as the method, then you have the option to return the squared Euclidean distances by setting this argument to TRUE.

Details
The target of this function is to calculate the distances between xnew and x without having to calculate the whole distance matrix of xnew and x. The latter does extra calculations, which can be avoided.

Value
A matrix with the distances of each xnew from each vector of x. The number of rows of xnew and the number of rows of x are the dimensions of this matrix.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
mahala, Dist, total.dist, total.dista

Examples
xnew <- as.matrix( iris[1:10, 1:4] )
x <- as.matrix( iris[-c(1:10), 1:4] )
a <- dista(xnew, x)
b <- as.matrix( dist( rbind(xnew, x) ) )
b <- b[ 1:10, -c(1:10) ]
sum( abs(a - b) )
## see the time
x <- matrix( rnorm(1000 * 4), ncol = 4 )
system.time( dista(xnew, x) )
system.time( as.matrix( dist( rbind(xnew, x) ) ) )
x<-b<-a<-xnew<-NULL


Distance correlation

Description
Distance correlation.

Usage
dcor(x, y)
bcdcor(x, y)

Arguments
x    A numerical matrix.
y    A numerical matrix.

Details
The distance correlation or the bias corrected distance correlation of two matrices is calculated. The latter one is used for the hypothesis test that the distance correlation is zero (see dcor.ttest).

Value
The value of the distance correlation or of the bias corrected distance correlation.

Author(s)
Manos Papadakis
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
G.J. Szekely, M.L. Rizzo and N. K. Bakirov (2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6):2769-2794.
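The sample distance correlation of Szekely, Rizzo and Bakirov (2007) can be sketched in base R through double-centred distance matrices. The helper name dcor_sketch is hypothetical, the bias correction of bcdcor is not covered, and no claim is made that the numbers agree digit for digit with dcor.

```r
# Hypothetical helper, base R only; not the Rfast implementation.
dcor_sketch <- function(x, y) {
  double_centre <- function(m) {
    d <- as.matrix(dist(m))       # pairwise Euclidean distances between rows
    # subtract row and column means, add back the grand mean
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- double_centre(x)
  B <- double_centre(y)
  dcov2 <- mean(A * B)            # squared sample distance covariance
  dvarx2 <- mean(A * A)           # squared sample distance variance of x
  dvary2 <- mean(B * B)           # squared sample distance variance of y
  sqrt(dcov2 / sqrt(dvarx2 * dvary2))
}

x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
d_xy <- dcor_sketch(x, y)
d_xx <- dcor_sketch(x, x)
```

The value lies between 0 and 1, and the distance correlation of a matrix with itself is 1.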
See Also
dcov, dcor.ttest, edist

Examples
x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
dcor(x, y)
bcdcor(x, y)
x<-y<-NULL


Distance matrix

Description
Distance matrix.

Usage
Dist(x, method = "euclidean", square = FALSE, p = 0,vector = FALSE)
vecdist(x)

Arguments
x    A matrix with data. The distances will be calculated between pairs of rows. In the case of vecdist this is a vector.
method    This is either "euclidean", "manhattan", "canberra1", "canberra2", "minimum", "maximum", "minkowski", "bhattacharyya", "hellinger", "total_variation" or "kullback_leibler/jensen_shannon". The last two options are basically the same.
square    If you choose "euclidean" or "hellinger" as the method, then you have the option to return the squared distances by setting this argument to TRUE.
p    This is for the Minkowski distance, the power of the metric.
vector    Set this to TRUE to return a vector instead of a matrix.

Details
The distance matrix is computed, with an extra argument for the Euclidean distances.

Value
A square matrix with the pairwise distances.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>

References
Mardia K. V., Kent J. T. and Bibby J. M. (1979). Multivariate Analysis. Academic Press.

See Also
dista, colMedians

Examples
x <- matrix(rnorm(50 * 10), ncol = 10)
a1 <- Dist(x)
a2 <- as.matrix( dist(x) )
x<-a1<-a2<-NULL


Distance variance and covariance

Description
Distance variance and covariances.

Usage
dvar(x)
dcov(x, y)

Arguments
x    A numerical matrix.
y    A numerical matrix.

Details
The distance variance of a matrix or the distance covariance of two matrices is calculated.

Value
The distance covariance or distance variance.
Author(s)
Manos Papadakis
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
G.J. Szekely, M.L. Rizzo and N. K. Bakirov (2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6):2769-2794.

See Also
dcor, edist

Examples
x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
dcov(x, y)
dvar(x)


Eigenvalues and eigenvectors in high dimensional principal component analysis    Eigenvalues in high dimensional principal component analysis

Description
Eigenvalues in high dimensional (n << p) principal component analysis.

Usage
hd.eigen(x, center = TRUE, scale = FALSE, k = NULL, vectors = FALSE)

Arguments
x    A numerical n × p matrix with data where the rows are the observations and the columns are the variables.
center    Do you want your data centered? TRUE or FALSE.
scale    Do you want each of your variables scaled, i.e. to have unit variance? TRUE or FALSE.
k    If you want a specific number of eigenvalues and eigenvectors set it here, otherwise all eigenvalues (and eigenvectors if requested) will be returned.
vectors    Do you want the eigenvectors to be returned? By default this is FALSE.

Details
When n << p, at most the first n eigenvalues are non zero. Hence, there is no need to calculate the other p-n zero eigenvalues. When center is TRUE, the eigenvalues of the covariance matrix are calculated. When both center and scale are TRUE, the eigenvalues of the correlation matrix are calculated. One or more eigenvectors (towards the end) will be 0. In general the signs might be the opposite of R’s, but this makes no difference.

Value
A list including:
values    A vector with the n (or first k) eigenvalues. The divisor in the crossproduct matrix is n-1 and not n.
vectors    A matrix of p × n or p × k eigenvectors.
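The idea in the Details above, avoiding the p × p matrix when n << p, can be sketched in base R by working with the n × n cross-product matrix instead. This is an illustration of the general technique and not necessarily how hd.eigen is implemented; all variable names are hypothetical.

```r
# Base R sketch of the n << p shortcut for the eigenvalues of the covariance.
set.seed(1)
n <- 20
p <- 100
x <- matrix(rnorm(n * p), n, p)
xc <- scale(x, center = TRUE, scale = FALSE)         # centred data

# n x n problem: same non-zero eigenvalues as the p x p covariance matrix
small <- eigen(tcrossprod(xc) / (n - 1), symmetric = TRUE)
lambda <- small$values[1:(n - 1)]                    # centring leaves n - 1 non-zero values

# map the n-dimensional eigenvectors back to p dimensions and normalise
vectors <- crossprod(xc, small$vectors[, 1:(n - 1)]) # p x (n - 1)
vectors <- sweep(vectors, 2, sqrt((n - 1) * lambda), "/")

# direct p x p computation, for comparison only
big <- eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values
```

Here lambda matches the leading n - 1 entries of big, and each column of vectors is a unit-length eigenvector of cov(x), possibly with the opposite sign to what eigen returns.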
Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>.

See Also
rmdp

Examples
x <- matrnorm(40, 100)
a <- hd.eigen(x, FALSE, FALSE)
b <- prcomp(x, center = FALSE, scale = FALSE)
a
b$sdev^2
x <- NULL


Energy distance between matrices

Description
Energy distance between matrices.

Usage
edist(x, y = NULL)

Arguments
x  A matrix with numbers or a list with matrices.
y  A second matrix with data. The number of columns of this matrix must be the same as that of the matrix x. The number of rows can be different.

Details
This calculates the energy distance between two matrices. It will work even for tens of thousands of rows, it will just take some time. See the references for more information. If you have many matrices and want to calculate the distance matrix, then put them in a list and use edist.

Value
If "x" is a matrix, a numerical value, the energy distance. If "x" is a list, a matrix with all pairwise distances of the matrices.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

References
Szekely G. J. and Rizzo M. L. (2004). Testing for Equal Distributions in High Dimension. InterStat, November (5).
Szekely G. J. (2000). Technical Report 03-05, E-statistics: Energy of Statistical Samples. Department of Mathematics and Statistics, Bowling Green State University.
Sejdinovic D., Sriperumbudur B., Gretton A. and Fukumizu K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5): 2263-2291.

See Also
dvar, total.dist, total.dista, Dist, dista

Examples
x <- as.matrix( iris[1:50, 1:4] )
y <- as.matrix( iris[51:100, 1:4] )
edist(x, y)
z <- as.matrix(iris[101:150, 1:4])
a <- list()
a[[ 1 ]] <- x
a[[ 2 ]] <- y
a[[ 3 ]] <- z
edist(a)
x <- y <- z <- a <- NULL


Equality of objects

Description
Equality of objects.
Usage
all_equals(x, y, round_digits = FALSE, without_attr = FALSE, fast_result = FALSE)

Arguments
x             A Matrix, List, Dataframe or Vector.
y             A Matrix, List, Dataframe or Vector.
round_digits  The digit for rounding numbers.
without_attr  A boolean value (TRUE/FALSE) for deleting attributes. Be careful though, because some attributes are very important for your item.
fast_result   A boolean value (TRUE/FALSE) for using just identical. It can be combined only with the round_digits argument.

Value
A boolean (TRUE/FALSE) value which represents whether the items x and y are equal.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
Match, mvbetas, correls, univglms, colsums, colVars

Examples
x <- matrix( rnorm(100 * 100), ncol = 100 )
y <- matrix( rnorm(100 * 100), ncol = 100 )
all_equals(x, y)
all_equals(x, x)


Estimation of an AR(1) model

Description
Estimation of an AR(1) model.

Usage
ar1(y, method = "cmle")
colar1(y, method = "cmle")

Arguments
y       For the case of ar1 this is a vector of time series. For the case of colar1 this is a matrix where each column represents a time series.
method  This can be either "cmle" for conditional maximum likelihood or "yw" for the Yule-Walker equations.

Details
Instead of the classical MLE for the AR(1) model, which requires numerical optimisation (Newton-Raphson for example), we estimate the parameters of the AR(1) model using conditional maximum likelihood. This procedure is described in Chapter 17 in Lee (2006). In sum, it assumes that the first observation is deterministic and hence, conditioning on that observation, there is a closed form solution for the parameters. The second alternative is to use the method of moments and hence the Yule-Walker equations.

Value
param  For the case of ar1 this is a vector with three elements, the constant term, the φ term (lag coefficient) and the variance.
For the case of colar1 this is a matrix with three columns, each of which carries the same aforementioned elements.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
http://econ.nsysu.edu.tw/ezfiles/124/1124/img/Chapter17_MaximumLikelihoodEstimation.pdf

See Also
rm.lines, varcomps.mle, rm.anovas

Examples
y <- as.vector(lh)
ar1(y)
ar(y, FALSE, 1, "ols")
ar1(y, method = "yw")
ar(y, FALSE, 1, "yw")
a1 <- colar1(cbind(y, y) )
b1 <- colar1(cbind(y, y), method = "yw")


Estimation of the Box-Cox transformation

Description
Estimation of the Box-Cox transformation.

Usage
bc(x, low = -1, up = 1)

Arguments
x    A numerical vector with strictly positive values.
low  The lowest value to search for the best λ parameter.
up   The highest value to search for the best λ parameter.

Details
The function estimates the best λ in the Box-Cox power transformation.

Value
The optimal value of λ.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Box George E. P. and Cox D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26(2): 211-252.

See Also
correls, auc

Examples
x <- exp(rnorm(1000))
bc(x)


Exponential empirical likelihood for a one sample mean vector hypothesis testing

Description
Exponential empirical likelihood for a one sample mean vector hypothesis testing.

Usage
mv.eeltest1(x, mu, tol = 1e-06, R = 1)

Arguments
x    A matrix containing Euclidean data.
mu   The hypothesized mean vector.
tol  The tolerance value used to stop the Newton-Raphson algorithm.
R    The number of bootstrap samples used to calculate the p-value.
If R = 1 (default value), no bootstrap calibration is performed.

Details
Multivariate hypothesis test for a one sample mean vector. This is a non parametric test and it works for univariate and multivariate data. The p-value is currently computed only asymptotically (no bootstrap calibration at the moment).

Value
A list including:
p       The estimated probabilities.
lambda  The value of the Lagrangian parameter λ.
iters   The number of iterations required by the Newton-Raphson algorithm.
info    The value of the log-likelihood ratio test statistic along with its corresponding p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Giorgos Athineou <athineou@csd.uoc.gr>

References
Jing Bing-Yi and Andrew TA Wood (1996). Exponential empirical likelihood is not Bartlett correctable. Annals of Statistics, 24(1): 365-369.
Owen A. B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.

See Also
james, mv.eeltest2

Examples
x <- Rfast::rmvnorm(100, numeric(10), diag( rexp(10, 0.5) ) )
mv.eeltest1(x, numeric(10) )


Exponential empirical likelihood hypothesis testing for two mean vectors

Description
Exponential empirical likelihood hypothesis testing for two mean vectors.

Usage
mv.eeltest2(y1, y2, tol = 1e-07, R = 0)

Arguments
y1   A matrix containing the Euclidean data of the first group.
y2   A matrix containing the Euclidean data of the second group.
tol  The tolerance level used to terminate the Newton-Raphson algorithm.
R    If R is 0, the classical chi-square distribution is used, if R = 1, the corrected chi-square distribution (James, 1954) is used and if R = 2, the modified F distribution (Krishnamoorthy and Yanping, 2006) is used. If R is greater than 3 bootstrap calibration is performed.
Details
Exponential empirical likelihood is a non parametric hypothesis testing procedure for one sample. The generalisation to two (or more) samples is via searching for the mean vector that minimises the sum of the two test statistics.

Value
A list including:
test        The empirical likelihood test statistic value.
modif.test  The modified test statistic, either via the chi-square or the F distribution.
pvalue      The p-value.
iters       The number of iterations required by the Newton-Raphson algorithm.
mu          The estimated common mean vector.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Giorgos Athineou <athineou@csd.uoc.gr>

References
Jing Bing-Yi and Andrew TA Wood (1996). Exponential empirical likelihood is not Bartlett correctable. Annals of Statistics, 24(1): 365-369.
G.S. James (1954). Tests of Linear Hypotheses in Univariate and Multivariate Analysis when the Ratios of the Population Variances are Unknown. Biometrika, 41(1/2): 19-43.
Krishnamoorthy K. and Yanping Xia (2006). On Selecting Tests for Equality of Two Normal Mean Vectors. Multivariate Behavioral Research, 41(4): 533-548.
Owen A. B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.
Amaral G.J.A., Dryden I.L. and Wood A.T.A. (2007). Pivotal bootstrap methods for k-sample problems in directional statistics and shape analysis. Journal of the American Statistical Association, 102(478): 695-707.
Preston S.P. and Wood A.T.A. (2010). Two-Sample Bootstrap Hypothesis Tests for Three-Dimensional Labelled Landmark Data. Scandinavian Journal of Statistics, 37(4): 568-587.
Tsagris M., Preston S. and Wood A.T.A. (2017). Nonparametric hypothesis testing for equality of means on the simplex. Journal of Statistical Computation and Simulation, 87(2): 406-422.
See Also
james, mv.eeltest1

Examples
mv.eeltest2( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]), R = 0 )
mv.eeltest2( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]), R = 1 )


FBED variable selection method using the correlation

Description
FBED variable selection method using the correlation.

Usage
cor.fbed(y, x, alpha = 0.05, K = 0)

Arguments
y      The response variable, a numeric vector.
x      A matrix with the data, where the rows denote the samples and the columns are the variables.
alpha  The significance level, set to 0.05 by default.
K      The number of times to repeat the process. The default value is 0.

Details
FBED stands for Forward Backward with Early Dropping. It is a variation of the classical forward selection, where at each step only the statistically significant variables carry on. The rest are dropped. The process stops when no other variables can be selected. If K = 1, the process is repeated, testing sequentially again all those variables that have not been selected. If K > 1, then this is repeated. In the end, the backward selection is performed to remove any falsely included variables. This backward phase has not been implemented yet.

Value
A list including:
runtime  The duration of the process.
res      A matrix with the index of the selected variables, their test statistic value and the associated p-value.
info     A matrix with two columns. The cumulative number of variables selected and the number of tests for each value of K.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Giorgos Borboudakis and Ioannis Tsamardinos (2017). Forward-Backward Selection with Early Dropping. Arxiv preprint: https://arxiv.org/pdf/1705.10770.pdf

See Also
cor.fsreg, ompr, correls, fs.reg

Examples
x <- matrnorm(100, 100)
y <- rnorm(100)
a <- cor.fbed(y, x)
a
x <- NULL


Find element

Description
Search a value in an unordered vector.
Usage
is_element(x, key)

Arguments
x    A vector or matrix with the data.
key  A value to check if it exists in the vector x.

Details
Checks whether the key exists in the vector and returns TRUE/FALSE accordingly. If the vector is unordered this is fast, but if the vector is ordered then use binary_search. The function is written in C++ in order to be as fast as possible.

Value
TRUE/FALSE, depending on whether the value has been found.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
binary_search (built-in R function)

Examples
x <- rnorm(500)
key <- x[50]
b <- is_element(x, key)


Find the given value in a hash table

Description
Find the given value in a hash table or list.

Usage
hash.find(x, key)

Arguments
x    A hash table or list.
key  The key for searching the table.

Details
This function searches for the given key.

Value
If the given key exists its value is returned, otherwise 0 is returned.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>

See Also
hash.list

Examples
x <- hash.list(letters, c(1:26))
value <- hash.find(x, "a")
x[["a"]] == value


Fitted probabilities of the Terry-Bradley model

Description
Fitted probabilities of the Terry-Bradley model.

Usage
btmprobs(x, tol = 1e-09)

Arguments
x    A numerical square, usually not symmetric, matrix with discrete valued data. Each entry is a frequency, to give an example, the number of wins. x[i, j] is the number of wins of home team i against guest team j. x[j, i] is the number of wins of home team j against guest team i.
tol  The tolerance level to terminate the iterative algorithm.

Details
It fits a Bradley-Terry model to the given matrix and returns the fitted probabilities only.

Value
A list including:
iters  The number of iterations required.
probs  A vector with probabilities which sum to 1. This is the probability of win for each item (or team in our hypothetical example).

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Bradley R.A. and Terry M.E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika, 39(3/4): 324-345.
Huang Tzu-Kuo, Ruby C. Weng and Chih-Jen Lin (2006). Generalized Bradley-Terry models and multi-class probability estimates. Journal of Machine Learning Research, 7: 85-115.
Agresti A. (2002). Categorical Data Analysis (2nd ed). New York: Wiley.

See Also
g2tests, poisson.anova, anova, poisson_only, poisson.mle

Examples
x <- matrix( rpois(10 * 10, 10), ncol = 10) ## not the best example though
btmprobs(x)


Fitting a Dirichlet distribution via Newton-Raphson

Description
Fitting a Dirichlet distribution via Newton-Raphson.

Usage
diri.nr2(x, type = 1, tol = 1e-07)

Arguments
x     A matrix containing the compositional data. Zeros are not allowed.
type  Type 1 uses a vectorised version of the Newton-Raphson (Minka, 2012). In high dimensions this is to be preferred. If the data are too concentrated, regardless of the dimensions, this is also to be preferred. Type 2 uses the regular Newton-Raphson, with matrix multiplications. In small dimensions this can be considerably faster.
tol   The tolerance level indicating no further increase in the log-likelihood.

Details
Maximum likelihood estimation of the parameters of a Dirichlet distribution is performed via Newton-Raphson. Initial values suggested by Minka (2012) are used.

Value
A list including:
loglik  The value of the log-likelihood.
param   The estimated parameters.
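To make the estimation problem concrete, here is a rough base R sketch that maximises the Dirichlet log-likelihood. It deliberately swaps Newton-Raphson for the general-purpose BFGS optimiser in optim, starts from all parameters equal to 1 rather than from Minka's initial values, and uses hypothetical helper names (diri_loglik, diri_mle_sketch), so it is not the diri.nr2 algorithm and is only meant to show what is being maximised.

```r
# Dirichlet log-likelihood for compositional data x (each row sums to 1)
diri_loglik <- function(a, x) {
  n <- nrow(x)
  n * lgamma(sum(a)) - n * sum(lgamma(a)) + sum((a - 1) * colSums(log(x)))
}

# Maximise over log(a) so the parameters stay positive (hypothetical helper)
diri_mle_sketch <- function(x) {
  n <- nrow(x)
  slx <- colSums(log(x))
  negll <- function(la) -diri_loglik(exp(la), x)
  # gradient of the negative log-likelihood with respect to log(a)
  grad <- function(la) {
    a <- exp(la)
    -a * (n * (digamma(sum(a)) - digamma(a)) + slx)
  }
  fit <- optim(rep(0, ncol(x)), negll, grad, method = "BFGS")
  list(loglik = -fit$value, param = exp(fit$par))
}

set.seed(1)
n <- 200
true <- c(5, 6, 7, 8)
# independent gamma variates, one shape per column, normalised by row sums
x <- matrix(rgamma(n * 4, rep(true, each = n), 1), ncol = 4)
x <- x / rowSums(x)
fit <- diri_mle_sketch(x)
fit$param
```

A dedicated Newton-Raphson scheme, as used by diri.nr2, typically needs far fewer iterations than a generic optimiser.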
Author(s)
Michail Tsagris and Manos Papadakis
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

References
Minka Thomas (2012). Estimating a Dirichlet distribution. Technical report.
Ng Kai Wang, Guo-Liang Tian, and Man-Lai Tang (2011). Dirichlet and related distributions: Theory, methods and applications. John Wiley & Sons.

See Also
beta.mle

Examples
x <- matrix( rgamma(100 * 4, c(5, 6, 7, 8), 1), ncol = 4)
x <- x / rowsums(x)
diri.nr2(x)


Floyd-Warshall algorithm
Floyd-Warshall algorithm for shortest paths in a directed graph

Description
Floyd-Warshall algorithm for shortest paths in a directed graph.

Usage
floyd(x)

Arguments
x  The adjacency matrix of a directed graph. A positive number (including) in x[i, j] indicates that there is an arrow from i to j and it also shows the cost of going from i to j. Hence, the algorithm will find not only the shortest path but also the one with the smallest cost. A value of NA means that there is no path. Use positive numbers only, as negative numbers will cause problems.

Details
The Floyd-Warshall algorithm is designed to find the shortest path (if it exists) between two nodes in a graph.

Value
A matrix, say z, with 0 and positive numbers. The elements denote the length of the shortest path between each pair of points. If z[i, j] is zero it means that there is no cost from i to j. If z[i, j] has a positive value it means that the length of going from i to j is equal to that value.

Author(s)
John Burkardt (C++ code)
Ported into R and documentation: Manos Papadakis <papadakm95@gmail.com>.

References
Floyd, Robert W. (1962). Algorithm 97: Shortest Path. Communications of the ACM, 5(6): 345.
Warshall, Stephen (1962). A theorem on Boolean matrices. Journal of the ACM, 9(1): 11-12.
https://en.wikipedia.org/wiki/Floyd

See Also
sort_mat

Examples
x <- matrix(NA, 10, 10)
x[sample(1:100, 10)] <- rpois(10, 3)
floyd(x)


Forward selection with generalised linear regression models
Variable selection in generalised linear regression models with forward selection

Description
Variable selection in generalised linear regression models with forward selection.

Usage
fs.reg(y, ds, sig = 0.05, tol = 2, type = "logistic")

Arguments
y     The dependent variable. This can either be a binary numeric (0, 1) or a vector with integers (numeric or integer class), count data. The first case is for the binary logistic regression and the second for the Poisson regression.
ds    The dataset; provide a matrix where columns denote the variables and the rows the observations. The variables must be continuous, no categorical variables are accepted.
sig   Significance level for assessing the significance of the p-values. Default value is 0.05.
tol   The difference between two successive values of the stopping rule. By default this is set to 2. If, for example, the BIC difference between two successive models is less than 2, the process stops and the last variable, even though significant, does not enter the model.
type  If you have a binary dependent variable, put "logistic" or "quasibinomial". If you have percentages, values between 0 and 1, including 0 and/or 1, use "quasibinomial" as well. If you have count data put "poisson".

Details
The classical forward regression is implemented. The difference is that we have an extra checking step. Even if a variable is significant, the BIC of the model (with that variable) is calculated. If the decrease from the previous BIC (of the model without this variable) is less than a value prespecified by the user (default is 2), the variable will not enter. This way, we guard somehow against over-fitting.
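The extra BIC check can be sketched in base R for the Poisson case as follows. This is an illustrative sketch only: forward_bic_sketch is a hypothetical helper built on glm and a likelihood ratio test, which is an assumption about the test used, and it makes no attempt to reproduce the speed or the exact output of fs.reg.

```r
# Forward selection for Poisson regression with an extra BIC check.
# Hypothetical helper for illustration, not the fs.reg implementation.
forward_bic_sketch <- function(y, x, sig = 0.05, tol = 2) {
  selected <- integer(0)
  current <- glm(y ~ 1, family = poisson())
  repeat {
    candidates <- setdiff(seq_len(ncol(x)), selected)
    if (length(candidates) == 0) break
    # one model per candidate, each adding a single variable
    fits <- lapply(candidates, function(j)
      glm(y ~ x[, c(selected, j), drop = FALSE], family = poisson()))
    # likelihood ratio test of each candidate against the current model
    stat <- deviance(current) - vapply(fits, deviance, numeric(1))
    pval <- pchisq(stat, df = 1, lower.tail = FALSE)
    best <- which.min(pval)
    if (pval[best] > sig) break                         # not significant: stop
    if (BIC(current) - BIC(fits[[best]]) < tol) break   # BIC gain too small: stop
    selected <- c(selected, candidates[best])
    current <- fits[[best]]
  }
  selected
}

set.seed(123)
x <- matrix(rnorm(200 * 10), ncol = 10)
y <- rpois(200, exp(0.5 + 0.6 * x[, 1]))   # only the first column matters
sel <- forward_bic_sketch(y, x)
sel
```

The two break conditions mirror the arguments sig and tol: a candidate must be significant and must also lower the BIC by at least tol.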
Value
A matrix with four columns: the selected variables, the logarithm of their p-value, their test statistic and the BIC of the model with these variables included. If no variable is selected, the matrix is empty.

Author(s)
Marios Dimitriadis
Documentation: Marios Dimitriadis <kmdimitriadis@gmail.com>.

See Also
cor.fsreg, logistic_only, poisson_only, glm_logistic, glm_poisson

Examples
set.seed(123)
#simulate a dataset with continuous data
x <- matrnorm(100, 50)
y <- rpois(100, 10)
a <- fs.reg(y, x, sig = 0.05, tol = 2, type = "poisson")
y <- rbinom(100, 1, 0.5)
b <- fs.reg(y, x, sig = 0.05, tol = 2, type = "logistic")
x <- NULL


G-square test of conditional independence

Description
G-square test of conditional independence with and without permutations.

Usage
g2Test(data, x, y, cs, dc)
g2Test_perm(data, x, y, cs, dc, nperm)

Arguments
data   A numerical matrix with the data. The minimum must be 0, otherwise the function can crash or will produce wrong results.
x      A number between 1 and the number of columns of data. This indicates which variable to take.
y      A number between 1 and the number of columns of data (other than x). This indicates the other variable whose independence with x is to be tested.
cs     A vector with the indices of the variables to condition upon. It must be non zero and between 1 and the number of variables. If you want an unconditional independence test see g2Test_univariate and g2Test_univariate_perm. If there is an overlap between x, y and cs you will get 0 as the value of the test statistic.
dc     A numerical vector with length equal to the number of variables (or columns of the data matrix) indicating the number of distinct, unique values (or levels) of each variable. Make sure you give the correct numbers here, otherwise the degrees of freedom will be wrong.
nperm  The number of permutations.
The permutations test is slower than without permutations and should be used with small sample sizes or when the contingency tables have zeros. When there are few variables, R's "chisq.test" function is faster, but as the number of variables increases the time difference with R's procedure becomes larger and larger.

Details
The function calculates the test statistic of the G2 test of conditional independence between x and y conditional on a set of variable(s) cs.

Value
A list including:
statistic  The G2 test statistic.
df         The degrees of freedom of the test statistic.
x          The row or variable of the data.
y          The column or variable of the data.

Author(s)
Giorgos Borboudakis. The permutation version uses C++ code by John Burkardt.
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

References
Tsamardinos I. and Borboudakis G. (2010). Permutation testing improves Bayesian network learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 322-337). Springer Berlin Heidelberg.

See Also
g2Test_univariate, g2Test_univariate_perm, correls, univglms

Examples
nvalues <- 3
nvars <- 10
nsamples <- 5000
data <- matrix( sample( 0:(nvalues - 1), nvars * nsamples, replace = TRUE ), nsamples, nvars )
dc <- rep(nvalues, nvars)
g2Test( data, 1, 2, 3, c(3, 3, 3) )
g2Test_perm( data, 1, 2, 3, c(3, 3, 3), 1000 )
dc <- data <- NULL


Gaussian regression with a log-link

Description
Gaussian regression with a log-link.

Usage
normlog.reg(y, x, tol = 1e-07, maxiters = 100)

Arguments
y         The dependent variable, a numerical variable with non negative numbers.
x         A matrix or data.frame with the independent variables.
tol       The tolerance value to terminate the Newton-Raphson algorithm.
maxiters  The maximum number of iterations that can take place in the regression.

Details
A Gaussian regression with a log-link is fitted.
Value
A list including:
i         The number of iterations required by the Newton-Raphson algorithm.
loglik    The log-likelihood value.
deviance  The deviance value.
be        The regression coefficients.

Author(s)
Stefanos Fafalios
R implementation and documentation: Stefanos Fafalios <stefanosfafalios@gmail.com>

See Also
normlog.regs, score.glms, prop.regs, allbetas

Examples
y <- abs( rnorm(100) )
x <- matrix( rnorm(100 * 2), ncol = 2)
a <- normlog.reg(y, x)
b <- glm(y ~ x, family = gaussian(log) )
summary(b)
a


Generates random values from a normal and puts them in a matrix

Description
Generates random values from a normal and puts them in a matrix.

Usage
matrnorm(n, p)

Arguments
n  The sample size, the number of rows the matrix will have.
p  The dimensionality of the data, the number of columns of the matrix.

Details
How many times did you have to simulate data from a (standard) normal distribution in order to test something? For example, in order to see the speed of logistic_only, one needs to generate a matrix with predictor variables. The same is true for other similar functions. In sftests, one would like to examine the type I error of this test under the null hypothesis. By using the Ziggurat method of generating standard normal variates, this function is really fast when you want to generate big matrices.

Value
An n x p matrix with data simulated from a standard normal distribution.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

See Also
rvmf, Rnorm, rmvnorm, rvonmises

Examples
x <- matrnorm(100, 100)


Get specific columns/rows of a matrix

Description
Get specific columns/rows of a matrix.

Usage
columns(x, indices)
rows(x, indices)

Arguments
x        A matrix with data.
indices  An integer vector with the indices.
Value
A matrix with the columns/rows specified in the argument indices.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
rowMins, rowFalse, nth, colrange, colMedians, colVars, sort_mat, rowTrue

Examples
x <- matrix(runif(100*100), 100, 100)
indices = sample(1:100, 50)
all.equal(x[, indices], columns(x, indices))
all.equal(x[indices, ], rows(x, indices))
x <- indices <- NULL


Hash - Pair function

Description
Hash - Pair function.

Usage
hash.list(key, x)

Arguments
key  The keys of the given values.
x    The values.

Details
This function pairs each item of key with the corresponding value to make a unique hash table.

Value
Returns the hash-list table.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>

See Also
hash.find

Examples
x <- hash.list(letters, c(1:26))
x[["a"]] == 1


Hash object to a list object

Description
Hash object to a list object.

Usage
hash2list(x, sorting = FALSE)

Arguments
x        A hash table with two parts, the keys (number(s) as string) and the key values (a single number).
sorting  Set this to TRUE if you want the numbers in the keys sorted. The default value is FALSE.

Details
For every key, there is a key value. This function creates a list and puts every pair of keys and value in a component of the list.

Value
A list whose length is equal to the size of the hash table.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
hash.list, hash.find

Examples
x = list("1 2 4 3" = 2.56, "2.34 1.05" = 2)
hash2list(x)
hash2list(x, TRUE)


High dimensional MCD based detection of outliers

Description
High dimensional MCD based detection of outliers.
Usage
rmdp(y, alpha = 0.05, itertime = 100)

Arguments
y         A matrix with numerical data with more columns (p) than rows (n), i.e. n < p.
alpha     The significance level used to decide whether an observation is considered a possible outlier. The default value is 0.05.
itertime  The number of iterations the algorithm will be run. The higher the sample size, the larger this number must be. With 50 observations in R^1000 this may have to be 1000 in order to produce stable results.

Details
High dimensional outliers (n << p) are detected using a properly constructed MCD. The variances of the variables are used and the determinant is simply their product.

Value
A list including:
runtime  The duration of the process.
dis      The final estimated Mahalanobis type normalised distances.
wei      A boolean vector specifying whether an observation is "clean" (TRUE) or a possible outlier (FALSE).
cova     The estimated covariance matrix.

Author(s)
Initial R code: Changliang Zou <nk.chlzou@gmail.com>
R code modifications: Michail Tsagris <mtsagris@yahoo.gr>
C++ implementation: Manos Papadakis <papadakm95@gmail.com>
Documentation: Michail Tsagris <mtsagris@yahoo.gr> and Changliang Zou <nk.chlzou@gmail.com>

References
Ro K., Zou C., Wang Z. and Yin G. (2015). Outlier detection for high-dimensional data. Biometrika, 102(3): 589-599.

See Also
colmeans, colVars, colMedians

Examples
x <- matrix(rnorm(50 * 400), ncol = 400)
a <- rmdp(x, itertime = 500)
x <- a <- NULL


Hypothesis test for the distance correlation

Description
Hypothesis test for the distance correlation.

Usage
dcor.ttest(x, y, logged = FALSE)

Arguments
x       A numerical matrix.
y       A numerical matrix.
logged  Do you want the logarithm of the p-value to be returned? If yes, set this to TRUE.

Details
The bias corrected distance correlation is used.
The hypothesis test is whether the two matrices are independent or not. Note that this test has the correct size as both the sample size and the dimensionality go to infinity. It will not have the correct type I error for univariate data or for matrices with just a couple of variables.

Value
A vector with 4 elements, the bias corrected distance correlation, the degrees of freedom, the test statistic and its associated p-value.

Author(s)
Manos Papadakis
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Szekely G. J., Rizzo M. L. and Bakirov N. K. (2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6): 2769-2794.

See Also
bcdcor, dcov, edist

Examples
x <- as.matrix(iris[1:50, 1:4])
y <- as.matrix(iris[51:100, 1:4])
dcor.ttest(x, y)


Hypothesis test for two means of percentages

Description
Hypothesis test for two means of percentages.

Usage
percent.ttest(x, y, logged = FALSE)

Arguments
x       A numerical vector with the percentages of the first sample. Any value between 0 and 1 (inclusive) is allowed.
y       A numerical vector with the percentages of the second sample. Any value between 0 and 1 (inclusive) is allowed.
logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
This is the prop.reg but with a single categorical predictor which has two levels only. It is like a t-test for the means of two samples having percentages.

Value
A vector with three elements, the phi parameter, the test statistic and its associated p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Papke L. E. & Wooldridge J. (1996).
Econometric methods for fractional response variables with an application to 401(K) plan participation rates. Journal of Applied Econometrics, 11(6): 619-632.
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also
percent.ttests, prop.reg, ttest2, ftest

Examples
x <- rbeta(100, 3, 1)
y <- rbeta(100, 7.5, 2.5)
percent.ttest(x, y)


Hypothesis test for von Mises-Fisher distribution over Kent distribution

Description
The null hypothesis is that a von Mises-Fisher distribution fits the data well, and the alternative is that the Kent distribution is more suitable.

Usage
fish.kent(x, logged = FALSE)

Arguments
x       A numeric matrix containing the data as unit vectors in Euclidean coordinates.
logged  If you want the logarithm of the p-value to be returned set this to TRUE.

Details
Essentially it is a test of rotational symmetry, whether Kent's ovalness parameter (beta) is equal to zero. This works for spherical data only.

Value
A vector with two elements, the value of the test statistic and its associated p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Rivest L. P. (1986). Modified Kent's statistics for testing goodness of fit for the Fisher distribution in small concentrated samples. Statistics & Probability Letters, 4(1): 1-4.

See Also
vmf.mle, iag.mle

Examples
x <- rvmf(100, rnorm(3), 15)
fish.kent(x)
x <- NULL


Hypothesis testing between two skewness or kurtosis coefficients
Skewness and kurtosis coefficients

Description
Skewness and kurtosis coefficients.

Usage
skew.test2(x, y)
kurt.test2(x, y)

Arguments
x  A numerical vector with data.
y  A numerical vector with data, not necessarily of the same size.

Details
The skewness or kurtosis coefficients of two samples are compared.
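One common way to compare two skewness coefficients is a z-test on their difference. The base R sketch below assumes the moment-based skewness estimator and the large-sample variance 6 / n, which holds under normality; the text does not state which variance formula skew.test2 uses, so the helper names (skew_sketch, skew_test2_sketch) and the formula are assumptions for illustration.

```r
# Moment-based sample skewness (hypothetical helper)
skew_sketch <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# z-test for the difference of two skewness coefficients, using the
# large-sample variance 6 / n of each coefficient (assumed, see above)
skew_test2_sketch <- function(x, y) {
  se <- sqrt(6 / length(x) + 6 / length(y))
  stat <- (skew_sketch(x) - skew_sketch(y)) / se
  pvalue <- 2 * pnorm(abs(stat), lower.tail = FALSE)
  c(stat = stat, pvalue = pvalue)
}

set.seed(1)
x <- rgamma(150, 1, 4)
y <- rgamma(100, 1, 4)
skew_test2_sketch(x, y)
```

The analogous kurtosis comparison would replace the third moment by the fourth and 6 / n by the corresponding large-sample variance.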
Value
A vector with the test statistic and its associated p-value.

Author(s)
Klio Lakiotaki
R implementation and documentation: Klio Lakiotaki <kliolak@gmail.com>.

References
https://en.wikipedia.org/wiki/Skewness
https://en.wikipedia.org/wiki/Kurtosis

See Also
skew, colskewness, colmeans, colVars, colMedians

Examples
x <- rgamma(150,1, 4)
y <- rgamma(100, 1, 4)
skew.test2(x, y)
kurt.test2(x, y)

Index of the columns of a data.frame which are factor variables

Description
Index of the columns of a data.frame which are factor variables.

Usage
which_isFactor(x)

Arguments
x: A data.frame where some columns are expected to be factor variables.

Details
The function is written in C++ and this is why it is very fast.

Value
A vector with the column indices which are factor variables. If there are no factor variables it will return an empty vector.

Author(s)
Manos Papadakis <papadakm95@gmail.com>
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
nth, Match

Examples
which_isFactor(iris)

Insert new function names in the NAMESPACE file

Description
Insert new function names in the NAMESPACE file.

Usage
AddToNamespace(path.namespace,path.rfolder,sort = FALSE)

Arguments
path.namespace: A full path to the NAMESPACE file.
path.rfolder: A full path to the directory where the new files to be added are stored.
sort: A boolean value for sorting the exported functions in the NAMESPACE file.

Details
Reads the files that are exported in NAMESPACE and the files inside rfolder (where the R files are) and inserts every file that is not exported. To work properly, each R file must have the same name as the exported function. Also, every file must contain only one function.
Value
Returns the files that were added to the export, or an empty character vector if all the files were already inserted.

Author(s)
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
colShuffle, colVars, colmeans, read.directory

Examples
## Not run:
# for example: path.namespace="C:\some_file\NAMESPACE" where is NAMESPACE file
# path.rfolder="C:\some_file\R\" where is R files are
# system.time( a<-AddToNamespace(path.namespace,path.rfolder) )
# if(length(a)==0){
# print("all the files are inserted")
# }else{
# print("The new files that inserted are: \n")
# a
# }
## End(Not run)

Inverse of a symmetric positive definite matrix

Description
Inverse of a symmetric positive definite matrix.

Usage
spdinv(A)

Arguments
A: A square positive definite matrix.

Details
After calculating the Cholesky decomposition of the matrix we use this upper triangular matrix to invert the original matrix.

Value
The inverse of the input matrix.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
http://econ.nsysu.edu.tw/ezfiles/124/1124/img/Chapter17_MaximumLikelihoodEstimation.pdf

See Also
cholesky, cova

Examples
s <- cova( as.matrix(iris[, 1:4]) )
spdinv(s)
solve(s)

James multivariate version of the t-test

Description
James test for testing the equality of two population mean vectors without assuming equality of the covariance matrices.

Usage
james(y1, y2, a = 0.05, R = 1)

Arguments
y1: A matrix containing the Euclidean data of the first group.
y2: A matrix containing the Euclidean data of the second group.
a: The significance level, set to 0.05 by default.
R: If R is 1 the classical James test is returned. If R is 2 the MNV modification is implemented.
Details
Multivariate analysis of variance without assuming equality of the covariance matrices. The p-value can be calculated either asymptotically or via bootstrap. The James test (1954) or a modification proposed by Krishnamoorthy and Yanping (2006) is implemented. The James test uses a corrected chi-square distribution, whereas the modified version uses an F distribution.

Value
A list including:
note: A message informing the user about the test used.
mesoi: The two mean vectors.
info: The test statistic, the p-value, the correction factor and the corrected critical value of the chi-square distribution if the James test has been used or, the test statistic, the p-value, the critical value and the degrees of freedom (numerator and denominator) of the F distribution if the modified James test has been used.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Giorgos Athineou <athineou@csd.uoc.gr>

References
G.S. James (1954). Tests of Linear Hypotheses in Univariate and Multivariate Analysis when the Ratios of the Population Variances are Unknown. Biometrika, 41(1/2): 19-43.
Krishnamoorthy K. and Yanping Xia (2006). On Selecting Tests for Equality of Two Normal Mean Vectors. Multivariate Behavioral Research 41(4): 533-548.

See Also
mv.eeltest2

Examples
james( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]), R = 1 )
james( as.matrix(iris[1:25, 1:4]), as.matrix(iris[26:50, 1:4]), R = 2 )

k nearest neighbours algorithm (k-NN)

Description
k nearest neighbours algorithm (k-NN).

Usage
knn(xnew, y, x, k, dist.type = "euclidean", type = "C", method = "average", freq.option = 0, mem.eff = FALSE)

Arguments
xnew: The new data, new predictor variable values. A matrix with numerical data.
y: A vector with the response variable, whose values for the new data we wish to predict. This can be numerical data, factor or discrete, 0, 1, ...
The latter two cases are for classification.
x: The dataset. A matrix with numerical data.
k: The number of nearest neighbours to use. The number can either be a single value or a vector with multiple values.
dist.type: The type of distance to be used. Either "euclidean" or "manhattan".
type: If your response variable "y" is numerical data, then this should be "R" (regression). If "y" is in general categorical, factor or discrete, set this argument to "C" (classification).
method: In case you have regression (type = "R") you want a way to summarise the prediction. If you want to take the average of the responses of the k closest observations, type "average". For the median, type "median" and for the harmonic mean, type "harmonic".
freq.option: In classification (type = "C") ties can occur in the prediction, i.e. more than one class has the same number of k nearest neighbours. In that case there are two strategies available: Option 0 selects the first most frequent value encountered. Option 1 randomly selects one of the most frequent values.
mem.eff: A boolean value indicating whether memory should be used conservatively. Turning this option on lowers memory usage at the cost of a slight decrease in execution speed; it should ideally be on when the amount of memory in demand might be a concern.

Details
The concept behind k-NN is simple. Suppose we have a matrix with predictor variables and a vector with the response variable (numerical or categorical). When a new vector with observations (predictor variables) is available, its corresponding response value, numerical or categorical, is to be predicted. Instead of using a model, parametric or not, one can use this ad hoc algorithm. The k smallest distances between the new predictor variables and the existing ones are calculated.
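As an illustration only, the distance and neighbour selection step can be sketched in base R. The function knn_sketch below is hypothetical (it is not the package's C++ code), it uses the Euclidean distance only, and it breaks ties by taking the first class in sorted label order, which is an assumption and may differ from the freq.option behaviour.

```r
# Hypothetical base R sketch of k-NN, not the Rfast implementation.
knn_sketch <- function(xnew, y, x, k, type = "C") {
  apply(xnew, 1, function(z) {
    d <- sqrt(colSums((t(x) - z)^2))   # Euclidean distance from z to every row of x
    nn <- order(d)[1:k]                # indices of the k smallest distances
    if (type == "R") {
      mean(y[nn])                      # regression: average of the k responses
    } else {
      # classification: most frequent label (first one in sorted order on ties)
      as.numeric(names(which.max(table(y[nn]))))
    }
  })
}

set.seed(1)
# two well separated groups, labelled 1 and 2
x <- rbind(matrix(rnorm(40, 0), ncol = 2), matrix(rnorm(40, 10), ncol = 2))
y <- rep(1:2, each = 20)
xnew <- rbind(c(0, 0), c(10, 10))
pred_class <- knn_sketch(xnew, y, x, k = 5)
pred_reg <- knn_sketch(xnew, y, x, k = 5, type = "R")
```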
In the case of regression, the average, median or harmonic mean of the corresponding response values of these closest predictor values is calculated. In the case of classification, i.e. categorical response value, a voting rule is applied. The most frequent group (response value) is where the new observation is to be allocated.

Value
A matrix whose number of columns is equal to the size of k. If in the input you provided there is just one value of k, then a matrix with one column is returned containing the predicted values. If more than one value was supplied, the matrix will contain the predicted values for every value of k.

Author(s)
Marios Dimitriadis
R implementation and documentation: Marios Dimitriadis <kmdimitriadis@gmail.com>

References
Cover TM and Hart PE (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory. 13(1):21-27.
Friedman J., Hastie T. and Tibshirani R. (2017). The elements of statistical learning. New York: Springer. http://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf
http://statlink.tripod.com/id3.html

See Also
logistic_only, fs.reg, cor.fsreg

Examples
# Simulate a dataset with continuous data
x <- as.matrix(iris[, 1:4])
y <- as.numeric(iris[, 5])
id <- sample(1:150, 120)
mod <- knn(x[-id, ], y[id], x[id, ], k = c(4, 5, 6), type = "C", mem.eff = FALSE)
mod # Predicted values of y for 3 values of k.
table(mod[, 1], y[-id] ) # Confusion matrix for k = 4
table(mod[, 2], y[-id] ) # Confusion matrix for k = 5
table(mod[, 3], y[-id] ) # Confusion matrix for k = 6

k-NN algorithm using the arc cosine distance

Description
It classifies new observations to some known groups via the k-NN algorithm.

Usage
dirknn(xnew, x, y, k, type = "C", parallel = FALSE)

Arguments
xnew: The new data whose membership is to be predicted, a numeric matrix with unit vectors.
x: The data, a numeric matrix with unit vectors.
k: The number of nearest neighbours. It can also be a vector with many values.
y: A numerical vector representing the class or label of each vector of x: 1, 2, 3, and so on. It can also be a numerical vector with data in order to perform regression.
type: If your response variable y is numerical data, then this should be "R" (regression). If y is in general categorical set this argument to "C" (classification).
parallel: Do you want the calculations to take place in parallel? The default value is FALSE.

Details
The standard algorithm is to keep the k nearest observations and see the groups of these observations. The new observation is allocated to the most frequently seen group. The non standard algorithm is to calculate the classical mean or the harmonic mean of the k nearest observations for each group. The new observation is allocated to the group with the smallest mean distance. If you want regression, the predicted value is calculated as the average of the responses of the k nearest observations.

Value
A matrix with the predicted group(s). It has as many columns as the values of k.

Author(s)
Stefanos Fafalios
R implementation and documentation: Stefanos Fafalios <stefanosfafalios@gmail.com>

See Also
knn, vmf.mle, spml.mle

Examples
x <- as.matrix(iris[, 1:4])
x <- x/sqrt( rowSums(x^2) )
y<- as.numeric( iris[, 5] )
a <- dirknn(x, x, y, k = 2:10)

Linear models for large scale data

Description
Linear models for large scale data.

Usage
lmfit(x, y, w = NULL)

Arguments
x: The design matrix with the data, where each column refers to a different sample of subjects. You must supply the design matrix, with the column of 1s. This function is the analogue of lm.fit and .lm.fit.
y: A numerical vector or a numerical matrix.
w: An optional numerical vector with weights. Note that if you supply this, the function does not make them sum to 1. So, you should do it.
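To make the role of the design matrix concrete, here is a minimal base R sketch of an unweighted least squares fit that returns only coefficients and residuals. The helper bare_lm is hypothetical and solves the normal equations directly; it is not claimed to be how lmfit is implemented.

```r
# Hypothetical helper, not part of Rfast: least squares via the normal equations.
bare_lm <- function(x, y) {
  # x must already contain the column of 1s, as required for the design matrix
  be <- solve(crossprod(x), crossprod(x, y))
  list(be = be, residuals = y - x %*% be)
}

set.seed(1)
X <- cbind(1, matrix(rnorm(200 * 4), ncol = 4))   # design matrix with intercept column
y <- rnorm(200)
fit <- bare_lm(X, y)
ref <- .lm.fit(X, y)                              # base R reference fit
all.equal(as.vector(fit$be), as.vector(ref$coefficients))
```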
Details
We have simply exploited R’s powerful function and managed to do better than .lm.fit, which is a really powerful function as well. This is a bare bones function as it returns only two things, the coefficients and the residuals. .lm.fit returns more, lm.fit even more, and finally lm returns too much. The motivation came from this site https://m-clark.github.io/docs/fastr.html . We changed the function a bit.

Value
A list including:
be: The beta coefficients.
residuals: The residuals of the linear model(s).

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.

See Also
regression, allbetas, correls, mvbetas, cor.fsreg

Examples
n <- 200 ; p <- 5
X <- matrnorm(n, p)
y <- rnorm(200)
a1 <- .lm.fit(X, y)
a2 <- lmfit(X, y)
x <- NULL

Logistic and Poisson regression models

Description
Logistic and Poisson regression models.

Usage
glm_logistic(x, y, full = FALSE,tol = 1e-09, maxiters = 100)
glm_poisson(x, y, full = FALSE,tol = 1e-09)

Arguments
x: A matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. This can be a matrix or a data.frame (with factors).
y: The dependent variable; a numerical vector with two values (0 and 1) for the logistic regression or integer values, 0, 1, 2,... for the Poisson regression.
full: If this is FALSE, only the coefficients and the deviance will be returned. If this is TRUE, more information is returned.
tol: The tolerance value to terminate the Newton-Raphson algorithm.
maxiters: The maximum number of iterations that can take place in each regression.

Details
The function is written in C++ and this is why it is very fast.
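To show what the tol and maxiters arguments control, the following base R sketch runs a Newton-Raphson iteration for logistic regression. The function nr_logistic and its stopping rule (sum of absolute changes in the coefficients) are assumptions made for illustration; the package's C++ code may use a different criterion.

```r
# Hypothetical illustration of Newton-Raphson for logistic regression (not Rfast code).
nr_logistic <- function(x, y, tol = 1e-09, maxiters = 100) {
  x <- cbind(1, x)                        # add the intercept column
  be <- numeric(ncol(x))                  # start from zero coefficients
  for (i in seq_len(maxiters)) {
    p <- 1 / (1 + exp(-x %*% be))         # fitted probabilities
    w <- as.vector(p * (1 - p))           # weights of the Hessian
    step <- solve(crossprod(x, w * x), crossprod(x, y - p))
    be <- be + as.vector(step)
    if (sum(abs(step)) < tol) break       # assumed stopping rule
  }
  p <- as.vector(1 / (1 + exp(-x %*% be)))
  devi <- -2 * sum(y * log(p) + (1 - y) * log(1 - p))   # deviance for 0/1 data
  list(be = be, devi = devi)
}

set.seed(1)
x <- matrix(rnorm(100 * 3), ncol = 3)
y <- rbinom(100, 1, 0.6)
a1 <- nr_logistic(x, y)
a2 <- glm(y ~ x, binomial)
all.equal(a1$be, as.vector(coef(a2)), tolerance = 1e-6)
```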
Value
When full is FALSE a list including:
be: The regression coefficients.
devi: The deviance of the model.
When full is TRUE a list including:
info: The regression coefficients, their standard error, their Wald test statistic and their p-value.
devi: The deviance.

Author(s)
Manos Papadakis <papadakm95@gmail.com>
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also
poisson_only, logistic_only, univglms, regression

Examples
x <- matrix(rnorm(100 * 3), ncol = 3)
y <- rbinom(100, 1, 0.6) ## binary logistic regression
a1 <- glm_logistic(x, y, full = TRUE)
a2 <- glm(y ~ x, binomial)
x <- matrix(rnorm(100 * 3), ncol = 3)
y <- rpois(100, 10) ## Poisson regression
b1 <- glm_poisson(x, y, full = TRUE)
b2 <- glm(y ~ x, poisson)
x<-y<-a1<-a2<-b1<-b2<-NULL

Logistic or Poisson regression with a single categorical predictor

Description
Logistic or Poisson regression with a single categorical predictor.

Usage
logistic.cat1(y, x, logged = FALSE)
poisson.cat1(y, x, logged = FALSE)

Arguments
y: A numerical vector with values 0 or 1.
x: A numerical vector with discrete numbers or a factor variable. This is supposed to be a categorical predictor. If you supply a continuous valued vector the function will obviously provide wrong results. Note: For the "binomial.anova" if this is a numerical vector it must contain strictly positive numbers, i.e. 1, 2, 3, 4, ..., no zeros are allowed.
logged: Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
There is a closed form solution for the logistic regression in the case of a single categorical predictor variable. See the references for more information.
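The idea of a closed form can be illustrated in base R: with one factor and treatment coding, the intercept is the logit of the success proportion in the first level, and every other coefficient is the difference of logits from that level. The helper logit_cat is hypothetical and only reproduces the coefficients; it is a sketch, not the package's code.

```r
# Hypothetical helper (not Rfast code): closed-form logistic fit for one factor.
logit_cat <- function(y, x) {
  p <- tapply(y, x, mean)          # success proportion within each level
  eta <- log(p / (1 - p))          # logit of each proportion
  c(eta[1], eta[-1] - eta[1])      # intercept and contrasts to the first level
}

set.seed(1)
y <- rbinom(2000, 1, 0.6)
x <- as.factor(rbinom(2000, 3, 0.5))
b1 <- logit_cat(y, x)
b2 <- coef(glm(y ~ x, binomial))   # iterative fit, for comparison
all.equal(as.vector(b1), as.vector(b2), tolerance = 1e-5)
```

No iterations are needed, which is where the speed gain over an iterative fit comes from.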
Value
info: A matrix similar to the one produced by the glm command. The estimates, their standard error, the Wald value and the relevant p-value.
devs: For the logistic regression case a vector with the null and the residual deviances, their difference and the significance of this difference.
res: For the Poisson regression case a vector with the log likelihood ratio test statistic value and its significance.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Stan Lipovetsky (2015). Analytical closed-form solution for binary logit regression by categorical predictors. Journal of Applied Statistics, 42(1): 37–49.

See Also
poisson.anova, poisson.anovas, anova, logistic_only, poisson_only

Examples
y <- rbinom(20000, 1, 0.6)
x <- as.factor( rbinom(20000, 3, 0.5) )
system.time( a1 <- logistic.cat1(y, x) )
system.time( a2 <- glm(y ~ x, binomial) )
a1 ; a2
y <- rpois(20000, 10)
x <- as.factor( rbinom(20000, 3, 0.5) )
system.time( a1 <- poisson.cat1(y, x) )
system.time( a2 <- glm(y ~ x, poisson) )
a1 ; a2
x<-y<-a1<-a2<-NULL

Lower and Upper triangular of a matrix

Description
Lower/upper triangular matrix.

Usage
lower_tri(x, suma = FALSE, diag = FALSE)
upper_tri(x, suma = FALSE, diag = FALSE)

Arguments
x: A matrix with data or a vector with 2 values which is the dimension of the logical matrix to be returned with the upper or lower triangular filled with "TRUE".
suma: A logical value for returning the sum of the upper or lower triangular. The default is "FALSE". Works only if the argument "x" is a matrix.
diag: A logical value; if "TRUE", the diagonal is included in the result.
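For orientation, the three kinds of result described by these arguments (logical mask, vector of values, sum) have direct base R counterparts. The sketch below uses only base R's lower.tri, so it shows the quantities involved rather than the package functions themselves.

```r
# Base R counterparts of the quantities described above (no Rfast calls).
x <- matrix(1:16, 4, 4)
mask <- lower.tri(x)                        # logical matrix, TRUE strictly below the diagonal
vals <- x[mask]                             # values of the lower triangular, column by column
total <- sum(vals)                          # the sum that suma = TRUE is described as returning
with_diag <- x[lower.tri(x, diag = TRUE)]   # same values, diagonal included
```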
Value
A lower/upper triangular logical matrix with values "TRUE"/"FALSE", or a vector with the values of the lower/upper triangular, or the sum of the upper/lower triangular if suma is set to "TRUE". The diagonal can also be included in any of these operations if the argument diag is set to "TRUE".

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
rowMins, colFalse, nth, rowrange, rowMedians, rowVars, sort_mat, colTrue

Examples
x <- matrix(runif(10*10),10,10)
all.equal(lower_tri(c(10,10)),lower.tri(x))
all.equal(lower_tri(x),x[lower.tri(x)])
#all.equal(upper_tri(c(10,10)),upper.tri(x))
#all.equal(upper_tri(x),x[upper.tri(x)])
#all.equal(lower_tri(c(10,10),diag = TRUE),lower.tri(x,diag = TRUE))
#all.equal(lower_tri(x,diag = TRUE),x[lower.tri(x,diag = TRUE)])
#all.equal(upper_tri(c(10,10),diag = TRUE),upper.tri(x,diag = TRUE))
#all.equal(upper_tri(x,diag = TRUE),x[upper.tri(x,diag = TRUE)])
x<-NULL

Mahalanobis distance

Description
Mahalanobis distance.

Usage
mahala(x, mu, sigma, ischol = FALSE)

Arguments
x: A matrix with the data, where the rows denote observations (vectors) and the columns contain the variables.
mu: The mean vector.
sigma: The covariance or any square symmetric matrix.
ischol: A boolean variable set to true if the Cholesky decomposition of the covariance matrix is supplied in the argument "sigma".

Value
A vector with the Mahalanobis distances.

Author(s)
Matteo Fasiolo <matteo.fasiolo@gmail.com>
C++ and R implementation and documentation: Matteo Fasiolo <matteo.fasiolo@gmail.com>.

See Also
dista, colmeans

Examples
x <- matrix( rnorm(100 * 50), ncol = 50 )
m <- colmeans(x)
s <- cov(x)
a1 <- mahala(x, m, s)

Many (and one) area under the curve values
Many area under the curve values

Description
Many area under the curve values.
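As background, an AUC for one vector of scores can be obtained from ranks through the Mann-Whitney relation. The base R sketch below assumes this rank formula and uses a hypothetical helper rank_auc; whether the package computes the AUC in exactly this way is not stated here.

```r
# Hypothetical helper (not Rfast code): AUC from ranks, for a 0/1 group vector.
rank_auc <- function(group, preds) {
  n1 <- sum(group == 1)                 # size of the group coded as 1
  n0 <- length(group) - n1              # size of the other group
  r <- rank(preds)                      # mid-ranks, so ties count as one half
  (sum(r[group == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
ina <- rbinom(100, 1, 0.6)
x <- matrix(rnorm(100 * 5), ncol = 5)
aucs <- apply(x, 2, rank_auc, group = ina)   # one AUC per column
```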
Usage
colaucs(group, preds)
auc(group, preds)

Arguments
group: A numerical vector with two values, one of which must be strictly 1.
preds: A numerical matrix with scores, probabilities or any other measure. In the case of auc this is a vector.

Details
The AUCs are calculated column-wise or just an AUC if the vector function is used.

Value
A vector with length equal to the number of columns of the "preds" argument, containing the AUC values for each column. If the "auc" function is used then a single number is returned.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
ttests, ttest, ftests

Examples
## 200 variables, hence 200 AUCs will be calculated
x <- matrix( rnorm(100 * 200), ncol = 200 )
ina <- rbinom(100, 1, 0.6)
system.time( colaucs(ina, x) )
a <- colaucs(ina, x)
b <- auc(ina, x[, 1])
x <- NULL

Many 2 sample proportions tests

Description
It performs very many 2 sample proportions tests.

Usage
proptests(x1, x2, n1, n2)

Arguments
x1: A vector with the successes of the one group.
x2: A vector with the successes of the other group.
n1: A vector with the number of trials of the one group.
n2: A vector with the number of trials of the other group.

Details
The 2-sample proportions test is performed for each pair of proportions of the two groups.

Value
A matrix with the proportions of each group (two columns), the test statistic and the p-value of each test.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
B. L. Welch (1951). On the comparison of several mean values: an alternative approach. Biometrika, 38(3/4), 330-336.
See Also
ttests, ftests, colVars

Examples
## 500 pairs of proportions, hence 500 tests will be performed
set.seed(12345)
x1 <- rpois(500, 5)
x2 <- rpois(500, 5)
n1 <- rpois(500, 40)
n2 <- rpois(500, 40)
a <- proptests(x1, x2, n1, n2)
mean(a[, 4]<0.05)
x1 <- rbinom(500, 500, 0.6)
x2 <- rbinom(500, 500, 0.6)
b <- proptests(x1, x2, 500, 500)
mean(b[, 4]<0.05)

Many 2 sample tests

Description
It performs very many 2 sample tests.

Usage
ttests(x, y = NULL, ina, paired = FALSE, logged = FALSE, parallel = FALSE)
mcnemars(x, y = NULL, ina, logged = FALSE)
var2tests(x, y = NULL, ina, alternative = "unequal", logged = FALSE)

Arguments
x: A matrix with the data, where the rows denote the samples and the columns are the variables.
y: A second matrix with the data of the second group. If this is NULL (default value) then the argument ina must be supplied. Notice that when you supply the two matrices the procedure is two times faster.
ina: A numerical vector with 1s and 2s indicating the two groups. Be careful, the function is designed to accept only these two numbers. In addition, if your "y" is NULL, you must specify "ina".
alternative: The type of hypothesis to be checked, "unequal", "greater", "less".
paired: If the groups are not independent, paired t-tests should be performed and this must be TRUE, otherwise, leave it FALSE. In this case, the two groups must have equal sample sizes, otherwise no test will be performed.
logged: Should the p-values be returned (FALSE) or their logarithm (TRUE)?
parallel: Should parallel implementations take place in C++? The default value is FALSE.

Details
For the ttests, if the groups are independent, the Welch’s t-test (without assuming equal variances) is performed. Otherwise many paired t-tests are performed.
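The independent samples case can be sketched in base R as a column-wise Welch t-test on two matrices. The helper col_welch is hypothetical and loops over columns with apply, so it only illustrates the statistic, the degrees of freedom and the p-value; it makes no claim about the package's vectorised code.

```r
# Hypothetical helper (not Rfast code): Welch t-test for every pair of columns.
col_welch <- function(x1, x2) {
  n1 <- nrow(x1); n2 <- nrow(x2)
  m1 <- colMeans(x1); m2 <- colMeans(x2)
  v1 <- apply(x1, 2, var) / n1            # squared standard error, group 1
  v2 <- apply(x2, 2, var) / n2            # squared standard error, group 2
  stat <- (m1 - m2) / sqrt(v1 + v2)
  dof <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))   # Welch-Satterthwaite
  pvalue <- 2 * pt(abs(stat), dof, lower.tail = FALSE)
  cbind(stat, dof, pvalue)
}

set.seed(1)
x1 <- matrix(rnorm(40 * 10), ncol = 10)
x2 <- matrix(rnorm(60 * 10), ncol = 10)
res <- col_welch(x1, x2)
ref <- t.test(x1[, 1], x2[, 1])           # base R Welch test on the first column
```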
The McNemar’s test requires a number of observations; at least 30 would be good in order for the test to have some power and be size correct.

Value
A matrix with the test statistic, the degrees of freedom (if the groups are independent) and the p-value (or their logarithm) of each test.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
B. L. Welch (1951). On the comparison of several mean values: an alternative approach. Biometrika, 38(3/4), 330-336.
McNemar Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 12(2):153-157.

See Also
ftests, anovas, ttest

Examples
## 100 variables, hence 100 t-tests will be performed
x = matrnorm(100, 100)
## 100 observations in total
ina = rbinom(100, 1, 0.6) + 1 ## independent samples t-test
system.time( ttests(x, ina = ina) )
x1 = x[ina == 1, ]
x2 = x[ina == 2, ]
system.time( ttests(x1, x2) )
x <- NULL

Many analysis of variance tests with a discrete variable

Description
Many analysis of variance tests with a discrete variable.

Usage
poisson.anovas(y, ina, logged = FALSE)
quasipoisson.anovas(y, ina, logged = FALSE)
geom.anovas(y, ina, type = 1, logged = FALSE)

Arguments
y: A numerical matrix with discrete valued data, i.e. counts for the case of the Poisson, or with 0s and 1s for the case of the Bernoulli distribution. Each column represents a variable.
ina: A numerical vector with discrete numbers starting from 1, i.e. 1, 2, 3, 4,... or a factor variable. This is supposed to be a categorical predictor. If you supply a continuous valued vector the function will obviously provide wrong results.
type: This argument is for the geometric distribution. Type 1 refers to the case where the minimum is zero and type 2 to the case of the minimum being 1.
logged: Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
This is the analysis of variance with count data. What we do is many log-likelihood ratio tests. For the quasi Poisson case we scale the difference in the deviances.

Value
A matrix with two columns, the difference in the deviances (test statistic) and the relevant p-value. For the case of quasi Poisson the estimated φ parameter is also returned.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
g2tests, poisson.anova, anova, poisson_only, poisson.mle

Examples
ina <- rbinom(500, 3, 0.5) + 1
## Poisson example
y <- matrix( rpois(500 * 100, 10), ncol= 100 )
system.time(a1 <- poisson.anovas(y, ina) )
y <- NULL

Many ANCOVAs

Description
Many ANCOVAs.

Usage
ancovas(y, ina, x, logged = FALSE)

Arguments
y: A matrix with the data, where the rows denote the observations and the columns are the variables.
ina: A numerical vector with 1s, 2s, 3s and so on indicating the groups. Be careful, the function is designed to accept numbers greater than zero.
x: A numerical vector whose length is equal to the number of rows of y. This is the covariate.
logged: Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
Many analysis of covariance tests are performed. No interaction between the factor and the covariate is tested, only the main effects. The design need not be balanced. The values of ina need not have the same frequency. The sums of squares have been adjusted to accept balanced and unbalanced designs.

Value
A matrix with the test statistic and the p-value for the factor variable and the covariate.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
D.C. Montgomery (2001). Design and analysis of experiments (5th Edition).
New York: John Wiley & Sons.

See Also
ftests, ttests, anovas

Examples
## 100 variables, hence 100 F-tests will be performed
y <- matrix( rnorm(90 * 100), ncol = 100 )
ina <- rbinom(90, 2, 0.5) + 1
x <- rnorm(90)
system.time( a <- ancovas(y, ina, x) )
m1 <- lm(y[, 15] ~ factor(ina) + x)
m2 <- lm(y[, 15] ~ x + factor(ina))
anova(m1)
anova(m2)
y <- NULL
a[15, ] ## the same with the m2 model, but not the m1

Many exponential regressions

Description
Many exponential regressions.

Usage
expregs(y, x, di, tol = 1e-09, logged = FALSE)

Arguments
y: A vector with positive data (including zeros).
x: A numerical matrix with the predictor variables.
di: A vector of size equal to that of y with 0s and 1s indicating censoring or not respectively.
tol: The tolerance value to stop the Newton-Raphson iterations. It is set to 1e-09 by default.
logged: A boolean variable; it will return the logarithm of the p-value if set to TRUE.

Details
We have implemented the Newton-Raphson in order to avoid unnecessary calculations.

Value
A matrix with three columns, the test statistic, its associated (logged) p-value and the BIC of each model.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also
univglms, score.glms, logistic_only, poisson_only, regression

Examples
## 100 variables, hence 100 univariate regressions are to be fitted
x <- matrnorm(100, 100)
y <- rexp(100, 4)
system.time( expregs(y, x, di = rep(1, length(y))) )
x <- NULL

Many F-tests with really huge matrices

Description
Many F-tests with really huge matrices.

Usage
list.ftests(x, logged = FALSE)

Arguments
x: A list with many big size matrices. Each element of the list contains a matrix.
This is the ftests function but with really huge matrices, which cannot be loaded into R as a single matrix.
logged: Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
The Welch’s F-test (without assuming equal variances) is performed just like in the "ftests" function. The difference is that you have a really huge matrix which you cannot load into R. In the "ftests" function, the argument "ina" denotes the different groups. Here, you "cut" the matrix into smaller ones, each of which denotes a different group, and put them in a list.

Value
A matrix with the test statistic and the p-value of each test.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>.

References
B.L. Welch (1951). On the comparison of several mean values: an alternative approach. Biometrika, 38(3/4), 330-336.

See Also
ftests, ttests

Examples
x <- matrnorm(300, 500)
ina <- rbinom(300, 2, 0.6) + 1
a <- list()
a[[ 1 ]] <- x[ina == 1, ]
a[[ 2 ]] <- x[ina == 2, ]
a[[ 3 ]] <- x[ina == 3, ]
mod <- list.ftests(a)
z <- NULL
a <- NULL

Many G-square tests of independence

Description
Many G-square tests of independence with and without permutations.

Usage
g2tests(data, x, y, dc)
g2tests_perm(data, x, y, dc, nperm)

Arguments
data: A numerical matrix with the data. The minimum must be 0, otherwise the function can crash or will produce wrong results.
x: An integer number or a vector of integer numbers showing the other variable(s) to be used for the G2 test of independence.
y: An integer number showing which column of data to be used.
dc: A numerical vector whose length equals the number of variables (or columns of the data matrix), indicating the number of distinct, unique values (or levels) of each variable. Make sure you give the correct numbers here, otherwise the degrees of freedom will be wrong.
nperm: The number of permutations.
The permutations test is slower than without permutations and should be used with small sample sizes or when the contingency tables have zeros. When there are few variables, R’s "chisq.test" function is faster, but as the number of variables increases the time difference with R’s procedure becomes larger and larger.

Details
The function does all the pairwise G2 tests of independence and gives the position inside the matrix. The user must build the associations matrix now, similarly to the correlation matrix. See the examples of how to do that. The p-value is not returned, we leave this to the user. See the examples of how to obtain it.

Value
A list including:
statistic: The G2 test statistic for each pair of variables.
pvalue: This is returned when you have selected the permutation based G2 test.
x: The row or variable of the data.
y: The column or variable of the data.
df: The degrees of freedom of each test.

Author(s)
Giorgos Borboudakis. The permutation version used a C++ code by John Burkardt.
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

References
Tsagris M. (2017). Conditional independence test for categorical data using Poisson log-linear model. Journal of Data Science, 15(2):347-356.
Tsamardinos, I., & Borboudakis, G. (2010). Permutation testing improves Bayesian network learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 322-337). Springer Berlin Heidelberg.

See Also
g2Test, g2Test_perm, correls, univglms

Examples
nvalues <- 3
nvars <- 10
nsamples <- 2000
data <- matrix( sample( 0:(nvalues - 1), nvars * nsamples, replace = TRUE ), nsamples, nvars )
dc <- rep(nvalues, nvars)
a <- g2tests(data = data, x = 2:9, y = 1, dc = dc)
pval <- pchisq(a$statistic, a$df, lower.tail = FALSE) ## p-value
b <- g2tests_perm(data = data, x = 2:9, y = 1, dc = dc, nperm = 1000)
a<-b<-data<-NULL

Many Gini coefficients

Description
Many Gini coefficients.
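As a reference point, a Gini coefficient can be computed per column from the sorted values, which avoids forming all pairwise differences. The helper col_gini below is a hypothetical base R sketch of that sorted-values formula without any small sample correction; whether the package applies such a correction is not assumed here.

```r
# Hypothetical helper (not Rfast code): Gini coefficient of every column.
col_gini <- function(x) {
  n <- nrow(x)
  apply(x, 2, function(v) {
    v <- sort(v)                                   # increasing order
    2 * sum(seq_len(n) * v) / (n * sum(v)) - (n + 1) / n
  })
}

set.seed(1)
x <- matrix(rpois(200 * 4, 10), ncol = 4)
g <- col_gini(x)

# Pairwise definition for the first column, for comparison
v <- x[, 1]
brute <- sum(abs(outer(v, v, "-"))) / (2 * length(v)^2 * mean(v))
```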
Usage
ginis(x)

Arguments
x    A matrix with non negative data. The rows are observations and the columns denote the variables.

Details
We have implemented the fast version of the Gini coefficient. See Wikipedia for more details.

Value
A vector with the Gini coefficient, one for each variable.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
colskewness, colmeans, corpairs

Examples
x <- matrix( rpois(500 * 1000, 1000), ncol = 1000 )
a <- ginis(x)

Many hypothesis tests for two means of percentages

Description
Many hypothesis tests for two means of percentages.

Usage
percent.ttests(x, y, logged = FALSE)

Arguments
x    A numerical matrix with the percentages of the first sample. Any value between 0 and 1 (inclusive) is allowed.
y    A numerical matrix with the percentages of the second sample. Any value between 0 and 1 (inclusive) is allowed.
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
This is the prop.reg but with a single categorical predictor which has two levels only. It is like a t-test for the means of two samples having percentages.

Value
A matrix with three columns: the phi parameter, the test statistic and its associated p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Papke L. E. & Wooldridge J. (1996). Econometric methods for fractional response variables with an application to 401(K) plan participation rates. Journal of Applied Econometrics, 11(6): 619-632.
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.
See Also
percent.ttest, prop.reg, ttest2, ftest

Examples
x <- matrix( rbeta(100 * 10, 3, 1), ncol = 10)
y <- matrix( rbeta(50 * 10, 7.5, 2.5), ncol = 10)
percent.ttests(x, y)

Many moment and maximum likelihood estimations of variance components

Description
Many moment and maximum likelihood estimations of variance components.

Usage
colvarcomps.mom(x, id, parallel = FALSE)
colvarcomps.mle(x, id, ranef = FALSE, tol = 1e-08, maxiters = 100, parallel = FALSE)

Arguments
x    A matrix with the data, where each column refers to a different sample of subjects.
id    A numerical vector indicating the subject. You must use consecutive numbers and no zero values. Alternatively this can be a factor variable.
ranef    Do you also want the random effects to be returned? TRUE or FALSE.
tol    The tolerance level to terminate the golden ratio search.
maxiters    The maximum number of iterations to perform.
parallel    Should the computations run in parallel? TRUE or FALSE.

Details
Note that "colvarcomps.mom" works for balanced designs only, i.e. the same number of measurements has been taken for each subject. The "colvarcomps.mle" works for unbalanced designs as well.
The variance components, namely the between-subject variance and the within-subject variance, are estimated using moment estimators. The "colvarcomps.mom" is the moment analogue of a random effects model which uses likelihood estimation ("colvarcomps.mle"). It is much faster, but can give a negative variance of the random effects, in which case the estimate is set to zero.
The maximum likelihood version is a bit slower (try it yourselves to see the difference), but statistically speaking it is to be preferred when small samples are available.
The reason it is only a little slower, and not a lot slower as one might expect, is that we use a closed formula to calculate the two variance components (Demidenko, 2013, pg. 67-69). Yes, there are closed formulas for linear mixed models.

Value
For the "colvarcomps.mom": A matrix with 5 columns: the MSE, the estimate of the between variance, the variance components ratio and a 95% confidence interval for the ratio.
For the "colvarcomps.mle": If ranef = FALSE, a list with a single component called "info". That is a matrix with 3 columns: the MSE, the estimate of the between variance and the log-likelihood value. If ranef = TRUE, a list including "info" and an extra component called "ranef" containing the random effects. It is a matrix with the same number of columns as the data. Each column contains the random effects of each variable.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
D.C. Montgomery (2001). Design and analysis of experiments (5th Edition). New York: John Wiley & Sons.
Charles S. Davis (2002). Statistical methods for the analysis of repeated measures. New York: Springer-Verlag.
Demidenko E. (2013). Mixed Models: Theory and Applications with R (2nd Edition). New Jersey: John Wiley & Sons (excellent book).

See Also
varcomps.mle, colrint.regbx

Examples
## example taken from Montgomery, page 514-517.
y <- c(98, 97, 99, 96, 91, 90, 93, 92, 96, 95, 97, 95, 95, 96, 99, 98)
y <- matrix(y)
id <- rep(1:4, each = 4)
x <- rmvnorm(100, numeric(100), diag(rexp(100)) )
id <- rep(1:25, each = 4)
n <- 25 ; d <- 4
a <- colvarcomps.mom(x, id)
mean(a[, 4]<0 & a[, 5]>0)
b <- colvarcomps.mle(x, id)
x <- NULL

Many multi-sample tests

Description
Many multi-sample tests.
Usage
ftests(x, ina, logged = FALSE)
anovas(x, ina, logged = FALSE)
vartests(x, ina, type = "levene", logged = FALSE)
block.anovas(x, treat, block, logged = FALSE)

Arguments
x    A matrix with the data, where the rows denote the observations (and the groups) and the columns are the variables.
ina    A numerical vector with 1s, 2s, 3s and so on indicating the groups. Be careful, the function is designed to accept numbers greater than zero. Alternatively it can be a factor variable.
type    This is for the variances test and can be either "levene" or "bf", corresponding to Levene’s or Brown-Forsythe’s testing procedure.
treat    In the case of the blocking ANOVA this argument plays the role of the "ina" argument.
block    In the blocking ANOVA this denotes the subjects which are the same. Similarly to "ina", a numeric vector with 1s, 2s, 3s and so on.
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
Welch’s F-test (which does not assume equal variances) is performed with the "ftests" function. The "anovas" function performs the classical (Fisher’s) one-way analysis of variance (ANOVA), which assumes equal variance across the groups.
The "vartests" function performs a hypothesis test for the equality of the variances in two ways, either via the Levene or via the Brown-Forsythe procedure. Levene’s test employs the means, whereas the Brown-Forsythe procedure employs the medians and is therefore more robust to outliers. The "var2tests" implement the classical F test.
The "block.anovas" is the ANOVA with blocking, the randomised complete block design (RCBD). In this case, for every combination of the block and treatment values, there is only one observation. The mathematics are the same as in the case of two-way ANOVA, but the assumptions are different and the testing procedure is also different. In addition, no interaction is present.

Value
A matrix with the test statistic and the p-value of each test.
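The difference between the Levene and the Brown-Forsythe procedures described in the Details can be sketched in base R. This is only an illustration of the textbook definitions (a one-way ANOVA on the absolute deviations from the group mean or the group median); the helper name levene_type is hypothetical, and the code is not claimed to reproduce the internals or the exact output of "vartests".

```r
# Hypothetical helper: Levene-type test for a single variable.
# center = mean gives Levene's test, center = median the Brown-Forsythe version.
levene_type <- function(v, ina, center = mean) {
  ina <- as.factor(ina)
  dev <- abs(v - ave(v, ina, FUN = center))  # absolute deviations from the group centre
  fit <- anova(lm(dev ~ ina))                # classical one-way ANOVA on the deviations
  c(stat = fit[1, "F value"], pvalue = fit[1, "Pr(>F)"])
}

set.seed(1)
x <- matrix(rnorm(300 * 5), ncol = 5)  # 5 variables, 300 observations
ina <- rep(1:3, each = 100)            # three groups
lev <- t(apply(x, 2, levene_type, ina = ina, center = mean))
bf  <- t(apply(x, 2, levene_type, ina = ina, center = median))
```

Each row of lev and bf holds the statistic and p-value for one column of x. Looping with apply is exactly the slow approach that the functions of this section are meant to replace.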
Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
B.L. Welch (1951). On the comparison of several mean values: an alternative approach. Biometrika, 38(3/4), 330-336.
D.C. Montgomery (2001). Design and analysis of experiments (5th Edition). New York: John Wiley & Sons.

See Also
ttests

Examples
x <- matrix( rnorm(300 * 50), ncol = 50 )
## 300 observations in total
ina <- rbinom(300, 3, 0.6) + 1
a1 <- ftests(x, ina)
a2 <- anovas(x, ina)
a3 <- vartests(x, ina)
x <- NULL

Many multivariate simple linear regressions coefficients

Description
Many multivariate simple linear regressions coefficients.

Usage
mvbetas(y, x, pvalue = FALSE)

Arguments
y    A matrix with the data, where the rows denote the observations and the columns contain the dependent variables.
x    A numerical vector with one continuous independent variable only.
pvalue    If you want a hypothesis test that each slope (beta coefficient) is equal to zero, set this equal to TRUE. It will also produce all the correlations between y and x.

Details
This function is in a sense the opposite of allbetas. Instead of having one y and many xs we have many ys and one x.

Value
A matrix with the constant (alpha) and the slope (beta) for each simple linear regression. If pvalue is set to TRUE, the correlation of each y with the x is calculated along with the relevant p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.
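The coefficients of many simple regressions that share one x can be obtained without any loop, because the slope is cov(y, x) / var(x) and the intercept is mean(y) minus the slope times mean(x). A minimal base R sketch of this idea follows; it illustrates the algebra and is not claimed to be the code used inside "mvbetas".

```r
set.seed(1)
y <- matrix(rnorm(100 * 20), ncol = 20)  # 20 dependent variables
x <- rnorm(100)                          # one common predictor

beta  <- as.vector(cov(x, y)) / var(x)   # slope of each column of y on x
alpha <- colMeans(y) - beta * mean(x)    # the matching intercepts
est <- cbind(alpha, beta)

# check the first regression against lm.fit
ref <- coef(lm.fit(cbind(1, x), y[, 1]))
all.equal(unname(est[1, ]), unname(ref))
```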
See Also
allbetas, correls, univglms

Examples
y <- matrnorm(100, 100)
x <- rnorm(100)
a <- mvbetas(y, x, pvalue = FALSE)
b <- matrix(nrow = 100, ncol = 2)
z <- cbind(1, x)
system.time( a <- mvbetas(y, x) )
system.time( for (i in 1:100) b[i, ] = coef( lm.fit( z, y[, i] ) ) )
y <- NULL

Many non parametric multi-sample tests

Description
Many non parametric multi-sample tests.

Usage
kruskaltests(x, ina, logged = FALSE)
cqtests(x, treat, block, logged = FALSE)

Arguments
x    A matrix with the data, where the rows denote the samples (and the groups) and the columns are the variables.
ina    A numerical vector with 1s, 2s, 3s and so on indicating the groups. Be careful, the function is designed to accept numbers greater than zero.
treat    In the case of Cochran’s Q test, this argument plays the role of the "ina" argument.
block    This denotes the subjects which are the same. Similarly to "ina", a numeric vector with 1s, 2s, 3s and so on.
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
The "kruskaltests" performs the Kruskal-Wallis test, the non parametric alternative to the analysis of variance. The "cqtests" performs Cochran’s Q test for the equality of more than two groups whose values are strictly binary (0 or 1). This is a generalisation of McNemar’s test to the multi-sample case.

Value
A matrix with the test statistic and the p-value of each test.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.
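To make the link between Cochran’s Q and McNemar’s test concrete, here is a small base R sketch of the textbook Q statistic for a single variable. The helper name cochran_q and the wide layout (blocks in rows, treatments in columns) are assumptions made for this illustration only; "cqtests" takes its data in the stacked form described in the Arguments.

```r
# Hypothetical helper: Cochran's Q for a binary matrix m with
# one row per block (subject) and one column per treatment.
cochran_q <- function(m) {
  k  <- ncol(m)       # number of treatments
  cj <- colSums(m)    # successes per treatment
  ri <- rowSums(m)    # successes per block
  n  <- sum(m)        # total number of successes
  q  <- k * (k - 1) * sum((cj - n / k)^2) / sum(ri * (k - ri))
  c(stat = q, pvalue = pchisq(q, k - 1, lower.tail = FALSE))
}

set.seed(1)
m <- matrix(rbinom(100 * 3, 1, 0.6), ncol = 3)
res <- cochran_q(m)

# with two treatments Q reduces to McNemar's statistic (no continuity correction)
q2  <- cochran_q(m[, 1:2])
tab <- table(factor(m[, 1], 0:1), factor(m[, 2], 0:1))
mc  <- mcnemar.test(tab, correct = FALSE)
all.equal(unname(q2["stat"]), unname(mc$statistic))
```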
See Also
block.anovas, ftests

Examples
x <- matrix( rexp(300 * 200), ncol = 200 )
ina <- rbinom(300, 3, 0.6) + 1
system.time( kruskaltests(x, ina) )
x <- matrix( rbinom(300 * 200, 1, 0.6), ncol = 200 )
treat <- rep(1:3, each = 100)
block <- rep(1:3, 100)
system.time( cqtests(x, treat, block) )
x <- NULL

Many odds ratio tests

Description
It performs very many odds ratio tests.

Usage
odds(x, y = NULL, ina, logged = FALSE)

Arguments
x    A matrix with the data, where the rows denote the samples and the columns are the variables. They must be 0s and 1s only.
y    A second matrix with the data of the second group. If this is NULL (default value) then the argument ina must be supplied. Notice that when you supply the two matrices the procedure is two times faster. They must be 0s and 1s only.
ina    A numerical vector with 1s and 2s indicating the two groups. Be careful, the function is designed to accept only these two numbers. In addition, if your "y" is NULL, you must specify "ina".
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
Many odds ratio tests are performed.

Value
A matrix with the test statistic and the p-value (or their logarithm) of each test.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Mosteller Frederick (1968). Association and Estimation in Contingency Tables. Journal of the American Statistical Association, 63(321): 1-28.
Edwards A.W.F. (1963). The measure of association in a 2x2 table. Journal of the Royal Statistical Society, Series A, 126(1): 109-114.
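The manual does not spell out the statistic used by "odds". As a hedged illustration of what an odds ratio test involves, the sketch below computes the usual Wald test of the log odds ratio from each 2x2 table in base R. The helper name log_or_test is hypothetical, and it is an assumption that this is comparable to what "odds" returns.

```r
# Hypothetical helper: Wald test of the log odds ratio for one binary
# variable v and a grouping vector ina taking the values 1 and 2.
log_or_test <- function(v, ina) {
  tab <- table(factor(ina, 1:2), factor(v, 0:1))   # rows: groups, columns: 0/1
  lor <- log(tab[1, 1] * tab[2, 2] / (tab[1, 2] * tab[2, 1]))
  se  <- sqrt(sum(1 / tab))                         # standard error of the log odds ratio
  z   <- lor / se
  c(stat = z, pvalue = 2 * pnorm(abs(z), lower.tail = FALSE))
}

set.seed(1)
x <- matrix(rbinom(100 * 5, 1, 0.5), ncol = 5)
ina <- rep(1:2, each = 50)
res <- t(apply(x, 2, log_or_test, ina = ina))

# the same z value comes out of a logistic regression on the group indicator
fit <- summary(glm(x[, 1] ~ factor(ina), family = binomial))
zglm <- fit$coefficients[2, "z value"]
```

The sketch breaks down when a table contains a zero cell (the log odds ratio is then infinite), a case the simulated data here do not produce.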
See Also
odds.ratio, g2Test_univariate

Examples
x <- matrix(rbinom(100 * 500, 1, 0.5), ncol = 500)
ina <- rep(1:2, each = 50)
a <- odds(x, ina = ina)

Many one sample goodness of fit tests for categorical data

Description
Many one sample goodness of fit tests for categorical data.

Usage
cat.goftests(x, props, type = "gsquare", logged = FALSE)

Arguments
x    A matrix with the data, where the rows denote the samples and the columns are the variables. The data must be integers and be of the form 1, 2, 3, and so on. The minimum must be 1, and not zero.
props    The assumed distribution of the data. A vector of proportions summing to 1.
type    Either Pearson’s χ2 test ("chisquare") is used or the G2 test ("gsquare", default value).
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
Given a matrix of integers, where each column refers to a sample and contains the values of a categorical variable, the function tests whether these values can be assumed to fit a specific distribution.

Value
A matrix with the test statistic and the p-value of each test.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
ttests, ttest, ftests

Examples
x <- matrix( rbinom(300 * 100, 4, 0.6), ncol = 100 ) + 1
props <- dbinom(0:4, 4, 0.6)
## can we assume that each column comes from a distribution whose mass is given by props?
system.time( cat.goftests(x, props) )
a1 <- cat.goftests(x, props) ## G-square test
a2 <- cat.goftests(x, props, type = "chisq") ## Chi-square test
cor(a1, a2)
mean( abs(a1 - a2) )
x <- NULL

Many one sample tests

Description
Many one sample tests.
Usage
proptest(x, n, p, alternative = "unequal", logged = FALSE)
ttest(x, m, alternative = "unequal", logged = FALSE, conf = NULL)
vartest(x, sigma, alternative = "unequal", logged = FALSE, conf = NULL)

Arguments
x    A matrix with numerical data. Each column of the matrix corresponds to a sample, or a group. In the case of the "proptest" this is a vector of integers ranging from 0 up to n. It is the number of "successes".
n    This is for the "proptest" only and is a vector with integer numbers specifying the number of tries for the proptest. Its size is equal to the size of x.
p    A vector with the assumed probabilities of success in the "proptest". Its size is equal to the number of columns of the matrix x.
m    A vector with the assumed means. Its size is equal to the number of columns of the matrix x.
sigma    A vector with the assumed variances. Its size is equal to the number of columns of the matrix x.
alternative    The type of alternative hypothesis to be checked: not equal to ("unequal"), greater than ("greater") or less than ("less") the assumed parameter.
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)?
conf    If you want confidence intervals to be returned specify the confidence level, otherwise leave it NULL.

Details
Despite the functions having been written in R, they are very fast.

Value
For all tests except for the "sftests", a matrix with two columns, the test statistic and the p-value respectively.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
ftests, ttests

Examples
R <- 100
## proptest
x <- rbinom(R, 50, 0.6)
n <- rep(50, R)
p <- rep(0.6, R)
a1 <- proptest(x, n, p, "unequal", logged = FALSE)
sum( a1[, 2] < 0.05 ) / R
## vartest
x <- matrnorm(100, 100)
a2 <- vartest(x, rep(1, R) )
sum( a2[, 2] < 0.05 )
## ttest
a4 <- ttest(x, numeric(R) )
sum(a4[, 2] < 0.05) / R
x <- NULL

Many random intercepts LMMs for balanced data with a single identical covariate

Description
Many random intercepts LMMs for balanced data with a single identical covariate.

Usage
colrint.regbx(y, x, id)

Arguments
y    A numerical matrix with the data. The subject values.
x    A numerical vector with the same length as the number of rows of y indicating the fixed predictor variable. Its values are the same for all levels of y. An example of this x is time, which is the same for all subjects.
id    A numerical variable with 1, 2, ... indicating the subject.

Details
This is a special case of a balanced random intercepts model with a compound symmetric covariance matrix and one single covariate which is constant for all replicates. An example is time, which is the same for all subjects. Maximum likelihood estimation is performed. In this case the mathematics exist in a closed formula (Demidenko, 2013, pg. 67-69).
This is the generalisation of rint.regbx to matrices. Assume you have many observations, gene expressions over time for example, and you want to calculate the random effects or something else for each expression. Instead of using a "for" loop with the rint.regbx function we have used matrix operations to make it even faster.

Value
A list including:
info    A matrix with the random intercepts variance (between), the variance of the errors (within), the log-likelihood, the deviance (twice the log-likelihood) and the BIC. In the case of "rint.reg" it also includes the number of iterations required by the generalised least squares.
be    The estimated regression coefficients, which in the case of "rint.regbx" are simply two: the constant and the slope (time effect).
ranef    A matrix with the random intercepts effects. Each row corresponds to a column in y. Instead of having a matrix with the same number of columns as y we return a transposed matrix.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Eugene Demidenko (2013). Mixed Models: Theory and Applications with R, 2nd Edition. New Jersey: Wiley & Sons (excellent book).

See Also
colvarcomps.mle, rint.regbx, rm.lines, varcomps.mom, rint.reg

Examples
y <- matrix( rnorm(100 * 50), ncol = 50)
id <- rep(1:20, each = 5)
x <- rep(1:10, 10)
system.time( a <- colrint.regbx(y, x, id) )
x <- NULL

Many regression based tests for single sample repeated measures

Description
Many regression based tests for single sample repeated measures.

Usage
rm.lines(y, x, logged = FALSE)
rm.anovas(y, x, logged = FALSE)

Arguments
y    A matrix with the data, where each column refers to a different sample of subjects. For example, the first column is the repeated measurements of a sample of subjects, the second column contains repeated measurements of a second sample of subjects and so on. Within each column, the measurements of each subject are stacked one upon the other. Say for example there are n subjects and each of them has been measured d times (in time or at different experimental conditions). We put these in a matrix with just one column. The first d rows are the measurements of subject 1, the next d rows are the measurements of subject 2 and so on.
x    A numerical vector with time (usually) or the predictor variable, for example the temperature or the pressure. See the details for more information. Its length equals the number of time points (measurements per subject), i.e. it must not have the same length as the number of rows of y. For the "rm.lines" this is a continuous variable.
For the "rm.anovas" this is treated as a categorical variable, indicating say the type of experimental condition, where the distance between the points is not important. Hence, for this function only, x can also be a factor variable.
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
In order to see whether the repeated measurements are associated with a single covariate, e.g. time, we perform many regressions and each time calculate the slope. For each subject, its regression slope with the covariate is calculated. In the end a t-test for the hypothesis that the average slope is zero is performed. The regression slopes ignore the fact that the measurements are not independent, but note that the slopes themselves are independent, because they come from different subjects. This is a simple, summary statistics based approach found in Davis (2002), yet it can provide satisfactory results.
The second approach ("rm.anovas"), found in Davis (2002), is the usual repeated measures ANOVA. In this case, suppose you have taken measurements on one or more variables from the same group of people. See the example below on how to arrange such data.

Value
A matrix with the test statistic (t-test) and its associated p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Charles S. Davis (2002). Statistical methods for the analysis of repeated measures. Springer-Verlag, New York.

See Also
rint.regbx, rint.reg, varcomps.mle

Examples
y <- c(74.5,81.5,83.6,68.6,73.1,79.4, 75.5,84.6,70.6,87.3,73.0,75.0, 68.9,71.6,55.9,61.9,60.5,61.8, 57.0,61.3,54.1,59.2,56.6,58.8, 78.3,84.9,64.0,62.2,60.1,78.7, 54.0,62.8,63.0,58.0,56.0,51.5, 72.5,68.3,67.8,71.5,65.0,67.7, 80.8,89.9,83.2,83.0,85.7,79.6)
y <- as.matrix(y)
### the first 6 measurements are from subject 1, measurements 7-12 are from subject 2,
## measurements 13-18 are from subject 3 and so on.
x <- c(-10, 25, 37, 50, 65, 80) ## all subjects were measured at the same time points
rm.lines(y, x) ## Is there a linear trend between the measurements and the temperature?
rm.anovas(y, x) ## Tests whether the means of the individuals are the same
## the temperature is treated as a categorical variable here.
## fake example
y <- matrnorm(10, 4) ## the y matrix contains 4 repeated measurements for each of the 10 persons.
x <- 1:4
## we stack the measurements of each subject, one under the other in a matrix form.
y1 <- matrix( t(y) )
rm.anovas(y1, x) ## perform the test
z <- matrix( rnorm(20 * 8), ncol = 2) ## same example, but with 2 sets of measurements.
rm.anovas(z, x)

Many score based GLM regressions

Description
Many score based GLM regressions.

Usage
score.glms(y, x, oiko = NULL, logged = FALSE )
score.multinomregs(y, x, logged = FALSE )
score.negbinregs(y, x, logged = FALSE )

Arguments
y    A vector with either discrete or binary data for the Poisson or negative binomial and binary logistic regression respectively. Otherwise it is a vector with discrete values or factor values for the multinomial regression. If the vector is binary and you choose multinomial regression, the function checks this and switches to the binary logistic regression.
x    A matrix with data, the predictor variables.
oiko    This can be either "poisson" or "binomial". If you are not sure leave it NULL and the function will check internally.
logged    A boolean variable; it will return the logarithm of the p-value if set to TRUE.

Details
Instead of maximising the log-likelihood via the Newton-Raphson algorithm in order to perform the hypothesis test that βi = 0, we use the score test. This is dramatically faster as no model needs to be fitted. The first derivative (score) of the log-likelihood is known in closed form and under the null hypothesis the fitted values are all equal to the mean of the response variable y.
The variance of the score is also known in closed form. The test is not the same as the likelihood ratio test. It is size correct nonetheless, but it is a bit less efficient and less powerful. For big sample sizes though (5000 or more) the results are the same. It is also much faster than the classical log-likelihood ratio test.

Value
A matrix with two columns, the test statistic and its associated p-value. For the Poisson and logistic regression the p-value is derived via the t distribution, whereas for the multinomial regressions via the χ2 distribution.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Campbell, M.J. (2001). Statistics at Square Two: Understand Modern Statistical Applications in Medicine, pg. 112. London, BMJ Books.
Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.
Agresti Alan (1996). An introduction to categorical data analysis. New York: Wiley.
Hilbe, Joseph M. (2011). Negative Binomial Regression. Cambridge University Press, 2nd edition.

See Also
univglms, logistic_only, poisson_only, regression

Examples
x <- matrnorm(500, 500)
y <- rbinom(500, 1, 0.6) ## binary logistic regression
a1 <- univglms(y, x)
system.time( score.glms(y, x) )
a2 <- score.glms(y, x)
x <- NULL
cor(a1, a2)
mean(a1 - a2)

Many score based regression models

Description
Many score based regression models.

Usage
score.weibregs(y, x, logged = FALSE)
score.betaregs(y, x, logged = FALSE)
score.gammaregs(y, x, logged = FALSE)
score.expregs(y, x, logged = FALSE)
score.invgaussregs(y, x, logged = FALSE)
score.ztpregs(y, x, logged = FALSE)
score.geomregs(y, x, logged = FALSE)

Arguments
y    A vector with data.
For the Weibull, gamma and exponential regressions they must be strictly positive data, lifetimes or durations for example. For the beta regression they must be numbers between 0 and 1. For the zero truncated Poisson regression (score.ztpregs) they must be integer valued data strictly greater than 0.
x    A matrix with data, the predictor variables.
logged    A boolean variable; it will return the logarithm of the p-value if set to TRUE.

Details
Instead of maximising the log-likelihood via the Newton-Raphson algorithm in order to perform the hypothesis test that βi = 0, we use the score test. This is dramatically faster as no model needs to be fitted. The first derivative of the log-likelihood is known in closed form and under the null hypothesis the fitted values are all equal to the mean of the response variable y. The test is not the same as the likelihood ratio test. It is size correct nonetheless, but it is a bit less efficient and less powerful. For big sample sizes though (5000 or more) the results are the same. You can try for yourselves and see that even with 500 the results are pretty close. The score test is also much faster than the classical likelihood ratio test.
What we have seen via simulation studies is that it is size correct for large sample sizes, at least a few thousands.

Value
A matrix with two columns, the test statistic and its associated p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.
Campbell, M.J. (2001). Statistics at Square Two: Understand Modern Statistical Applications in Medicine, pg. 112. London, BMJ Books.
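The idea in the Details (score and its variance in closed form, fitted values equal to the mean of y under the null) can be sketched in base R for the exponential regression. The sketch assumes a log link, refers the statistic to a χ2 distribution with one degree of freedom, and uses the hypothetical helper name score_exp. It illustrates the technique and is not claimed to reproduce the internals or the exact output of "score.expregs".

```r
# Hypothetical helper: score test of beta = 0 in y ~ Exp(mean = exp(a + b * x)),
# carried out separately for each column of x, without fitting any model.
score_exp <- function(y, x) {
  mu0 <- mean(y)                            # fitted mean under the null hypothesis
  u <- colSums((y / mu0 - 1) * x)           # score of each slope evaluated at the null
  v <- colSums(sweep(x, 2, colMeans(x))^2)  # variance of the score under the null
  stat <- u^2 / v
  cbind(stat = stat, pvalue = pchisq(stat, 1, lower.tail = FALSE))
}

set.seed(1)
x <- matrix(rnorm(300 * 100), ncol = 100)
y <- rexp(300, rate = 0.5)    # y is unrelated to every column of x
a <- score_exp(y, x)
mean(a[, "pvalue"] < 0.05)    # proportion of rejections, expected to be near 0.05
```

Only column sums are needed, which is why no iterative fitting takes place and the test scales well to many predictors.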
See Also
score.glms, univglms, logistic_only, poisson_only, regression

Examples
x <- matrnorm(300, 100)
y <- rweibull(300, 2, 3)
a <- score.weibregs(y, x)
sum(a[, 2] < 0.05) / 100
x <- NULL

Many Shapiro-Francia normality tests

Description
Many Shapiro-Francia normality tests.

Usage
sftests(x, logged = FALSE)
sftest(x, logged = FALSE)

Arguments
x    A matrix with the data, where the rows denote the observations and the columns are the variables. In the case of a single sample, this must be a vector and "sftest" is to be used.
logged    Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
The Shapiro-Francia univariate normality test is performed for each column (variable) of the matrix x.

Value
A matrix with the squared correlation between the ordered values and the standard normal order statistics, the test statistic and the p-value of each test. If the "sftest" has been used, the output is a vector with these three elements.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Royston J. P. (1983). A simple method for evaluating the Shapiro-Francia W’ test of non-normality. The Statistician, 32(3): 297-300.
Mbah A. K. & Paothong A. (2015). Shapiro-Francia test compared to other normality test using expected p-value. Journal of Statistical Computation and Simulation, 85(15): 3002-3016.

See Also
ttests, ttest, ftests

Examples
x <- matrnorm(200, 100)
system.time( sftests(x) )
a <- sftests(x)
mean(a[, 3]<0.05)
x <- rnorm(100)
sftest(x)

Many simple circular or angular regressions

Description
Many regressions with one circular dependent variable and one Euclidean independent variable.
Usage
spml.regs(y, x, tol = 1e-07, logged = FALSE, maxiters = 100, parallel = FALSE)

Arguments
y    The dependent variable. It can be a numerical vector with data expressed in radians or a matrix with two columns, the cosine and the sine of the circular data. The benefit of the matrix is that if the function is to be called multiple times with the same response, there is no need to transform the vector every time into a matrix.
x    A matrix with the independent variables.
tol    The tolerance value to terminate the Newton-Raphson algorithm.
logged    Do you want the logarithm of the p-value to be returned? TRUE or FALSE.
maxiters    The maximum number of iterations to implement.
parallel    Do you want the calculations to take place in parallel? The default value is FALSE.

Details
The Newton-Raphson algorithm is used to fit these regressions, as described in Presnell et al. (1998). For each column of x a circular regression model is fitted and the hypothesis test of no association between y and this variable is performed.

Value
A matrix with two columns, the test statistics and their associated (log) p-values.

Author(s)
Michail Tsagris and Stefanos Fafalios
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Stefanos Fafalios <stefanosfafalios@gmail.com>

References
Presnell Brett, Morrison Scott P. and Littell Ramon C. (1998). Projected multivariate linear models for directional data. Journal of the American Statistical Association, 93(443): 1068-1077.

See Also
spml.mle, iag.mle, acg.mle

Examples
x <- rnorm(100)
z <- cbind(3 + 2 * x, 1 - 3 * x)
y <- cbind( rnorm(100, z[ ,1], 1), rnorm(100, z[ ,2], 1) )
y <- y / sqrt( rowsums(y^2) )
x <- matrnorm(100, 100)
a <- spml.regs(y, x)
x <- NULL

Many simple Gaussian regressions with a log-link

Description
Many simple Gaussian regressions with a log-link.
Usage
normlog.regs(y, x, tol = 1e-08, logged = FALSE, parallel = FALSE, maxiters = 100)

Arguments
y    The dependent variable, a numerical variable with non negative numbers.
x    A matrix with the independent variables.
tol    The tolerance value to terminate the Newton-Raphson algorithm.
logged    A boolean variable; it will return the logarithm of the p-value if set to TRUE.
parallel    Do you want this to be executed in parallel or not. The parallelisation takes place in C++, therefore you do not have the option to set the number of cores.
maxiters    The maximum number of iterations that can take place in each regression.

Details
Many simple Gaussian regressions with a log-link are fitted.

Value
A matrix with the test statistic values, their relevant (logged) p-values and the BIC values.

Author(s)
Stefanos Fafalios
R implementation and documentation: Stefanos Fafalios <stefanosfafalios@gmail.com>

See Also
normlog.reg, score.glms, prop.regs, allbetas

Examples
y <- abs( rnorm(100) )
x <- matrnorm(100, 100)
a <- normlog.regs(y, x)
b <- glm(y ~ x[, 1], family = gaussian(log) )
anova(b, test = "Chisq")
a[1, ]
x <- NULL

Many simple geometric regressions

Description
Many simple geometric regressions.

Usage
geom.regs(y, x, tol = 1e-07, type = 1, logged = FALSE, parallel = FALSE, maxiters = 100)

Arguments
y    The dependent variable, count data.
x    A matrix with the independent variables.
tol    The tolerance value to terminate the Newton-Raphson algorithm.
type    Type 1 refers to the case where the minimum is zero and type 2 to the case where the minimum is 1.
logged    A boolean variable; it will return the logarithm of the p-value if set to TRUE.
parallel    Do you want this to be executed in parallel or not. The parallelisation takes place in C++, and the number of threads is defined by each system’s available cores.
maxiters    The maximum number of iterations that can take place in each regression.
Details

Many simple geometric regressions are fitted.

Value

A matrix with the test statistic values, their relevant (logged) p-values and the BIC values.

Author(s)

Stefanos Fafalios
R implementation and documentation: Stefanos Fafalios <stefanosfafalios@gmail.com>

See Also

poisson_only, prop.regs, score.geomregs

Examples

y <- rgeom(100, 0.6)
x <- matrix( rnorm(100 * 50), ncol = 50)
a <- geom.regs(y, x)
x <- NULL

Many simple linear mixed model regressions

Description

Many simple linear mixed model regressions with random intercepts only.

Usage

rint.regs(y, x, id, tol = 1e-08, logged = FALSE, parallel = FALSE, maxiters = 100)

Arguments

y  A numerical vector with the data. The subject values, the clustered data.
x  A numerical matrix with the data, the independent variables.
id  A numerical variable with 1, 2, ... indicating the subject. Unbalanced designs are supported.
tol  The tolerance value to terminate the Newton-Raphson algorithm. The default is the value shown in Usage (1e-08).
logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?
parallel  Should this be executed in parallel or not? The parallelisation takes place in C++, and the number of threads is defined by each system's available cores.
maxiters  The maximum number of iterations that can take place in each regression.

Details

Many linear mixed models with a single covariate are fitted. We use Newton-Raphson as described in Demidenko (2013). The test statistic is the usual F-test. This model allows for random intercepts only.

Value

A two-column matrix with the test statistics (Wald statistic) and the associated p-values (or their logarithm).

Author(s)

Stefanos Fafalios
R implementation and documentation: Stefanos Fafalios <stefanosfafalios@gmail.com>

References

Eugene Demidenko (2013). Mixed Models: Theory and Applications with R, 2nd Edition.
New Jersey: Wiley & Sons (excellent book).

See Also

rint.reg, allbetas, univglms, score.glms, logistic_only

Examples

## not a very good example
y <- rnorm(100)
id <- sample(1:10, 100, replace = TRUE)
x <- matrix( rnorm(100 * 100), ncol = 100)
a <- rint.regs(y, x, id)
x <- NULL

Many simple linear regressions coefficients

Description

Simple linear regressions coefficients.

Usage

allbetas(y, x, pvalue = FALSE, logged = FALSE)

Arguments

y  A numerical vector with the response variable. If y contains proportions or percentages, i.e. values between 0 and 1, the logit transformation is applied first and the transformed data are used.
x  A matrix with the data, where the rows denote the observations and the columns contain the independent variables.
pvalue  If you want a hypothesis test that each slope (beta coefficient) is equal to zero, set this to TRUE. It will also produce all the correlations between y and x.
logged  A boolean variable; the logarithm of the p-value is returned if set to TRUE.

Value

A matrix with the constant (alpha) and the slope (beta) for each simple linear regression. If pvalue is set to TRUE, the correlation of y with each x is calculated along with the relevant test statistic and its associated p-value.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

mvbetas, correls, univglms, colsums, colVars

Examples

x <- matrix( rnorm(100 * 50), ncol = 50 )
y <- rnorm(100)
r <- cor(y, x) ## correlation of y with each of the xs
a <- allbetas(y, x) ## the coefficients of each simple linear regression of y with x
x <- NULL

Many simple multinomial regressions

Description

Many simple multinomial regressions.
Usage

multinom.regs(y, x, tol = 1e-08, logged = FALSE, maxiters = 100)

Arguments

y  The dependent variable, either a numerical variable or a factor variable.
x  A matrix with the independent variables.
tol  The tolerance value to terminate the Newton-Raphson algorithm.
logged  A boolean variable; the logarithm of the p-value is returned if set to TRUE.
maxiters  The maximum number of iterations that can take place in each regression.

Details

Many simple multinomial regressions are fitted.

Value

A matrix with the test statistic values, their relevant (logged) p-values and the BIC values.

Author(s)

Stefanos Fafalios
R implementation and documentation: Stefanos Fafalios <stefanosfafalios@gmail.com>

See Also

poisson_only, prop.regs, score.geomregs

Examples

y <- rbinom(100, 2, 0.5)
x <- matrnorm(100, 100)
a <- multinom.regs(y, x)
x <- NULL

Many tests for the dispersion parameter in Poisson distribution

Description

Many tests for the dispersion parameter in Poisson distribution.

Usage

colpoisdisp.tests(y, alternative = "either", logged = FALSE)
colpois.tests(y, logged = FALSE)

Arguments

y  A numerical matrix with count data, 0, 1,...
alternative  Do you want to test for either over- or underdispersion ("either"), for overdispersion only ("over") or for underdispersion only ("under")?
logged  Set to TRUE if you want the logarithm of the p-value.

Value

A matrix with two columns, the test statistic and the (logged) p-value.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Yang Zhao, James W. Hardin, and Cheryl L. Addy. (2009). A score test for overdispersion in Poisson regression based on the generalized Poisson-2 model. Journal of statistical planning and inference 139(4):1514-1521.
Dimitris Karlis and Evdokia Xekalaki (2000).
A Simulation Comparison of Several Procedures for Testing the Poisson Assumption. Journal of the Royal Statistical Society. Series D (The Statistician), 49(3): 355-382.
Bohning, D., Dietz, E., Schaub, R., Schlattmann, P. and Lindsay, B. (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46(): 373-388.

See Also

poisson.mle, negbin.mle, poisson.anova, poisson.anovas, poisson_only

Examples

y <- matrix(rnbinom(100 * 50, 10, 0.6), ncol = 50)
a1 <- colpoisdisp.tests(y, "over")
b1 <- colpois.tests(y)
y <- matrix(rpois(100 * 50, 10), ncol = 50)
a2 <- colpoisdisp.tests(y, "either")
b2 <- colpois.tests(y)
y <- NULL

Many two-way ANOVAs

Description

Many two-way ANOVAs.

Usage

twoway.anovas(y, x1, x2, interact = FALSE, logged = FALSE)

Arguments

y  A matrix with the data, where the rows denote the observations (and the two groups) and the columns are the variables.
x1  A numerical vector with 1s, 2s, 3s and so on indicating the two groups. Alternatively it can be a factor variable. This is the one factor.
x2  A numerical vector with 1s, 2s, 3s and so on indicating the two groups. Alternatively it can be a factor variable. This is the other factor.
interact  A boolean variable specifying whether you want to test for interaction.
logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details

The classical two-way ANOVA design is performed. Note that the design must be balanced: for every combination of values of the two factors, x1 and x2, the same number of observations must exist. If that is not the case, regression models must be used.

Value

A matrix with the test statistic and the p-value of each test.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

D.C. Montgomery (2001).
Design and analysis of experiments (5th Edition). New York: John Wiley & Sons

See Also

ancovas, ftests, ttests

Examples

y <- as.matrix( rnorm(125) )
x1 <- rep(1:5, 25)
x2 <- rep(1:5, each = 25)
x1 <- factor(x1)
x2 <- factor(x2)
anova( lm(y ~ x1 + x2) )
twoway.anovas(y, x1, x2)
anova( lm(y ~ x1*x2) )
twoway.anovas(y, x1, x2, interact = TRUE)
y <- matrnorm(125, 100)
system.time( a1 <- twoway.anovas(y, x1, x2) )
system.time( a2 <- twoway.anovas(y, x1, x2, interact = TRUE) )
y <- NULL

Many univariate generalised linear models

Many univariate generalised linear regressions

Description

It performs very many univariate generalised linear regressions.

Usage

univglms(y, x, oiko = NULL, logged = FALSE)
univglms2(y, x, oiko = NULL, logged = FALSE)

Arguments

y  The dependent variable. It can be a factor or a numerical variable with two values only (binary logistic regression), a discrete valued vector (count data) corresponding to a Poisson regression, or a numerical vector with continuous values (normal regression).
x  A matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. For "univglms" only continuous variables are allowed. You are advised to standardise the data beforehand to avoid numerical overflow or similar issues. If you see NaN in the outcome, this might be the cause. For "univglms2" categorical variables are allowed and hence data.frames are accepted. In this case, the categorical variables must be given as factor variables, otherwise you might get wrong results.
oiko  This can be either "normal", "poisson", "quasipoisson" or "binomial". If you are not sure, leave it NULL and the function will check internally. However, you might have discrete data (e.g. years of age) and want to perform many simple linear regressions. In this case you should specify the family.
logged  A boolean variable; the logarithm of the p-value is returned if set to TRUE.
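To make concrete what this kind of univariate screening amounts to, here is a minimal base R sketch that loops over the columns with glm. It assumes, as the Examples of this entry suggest, that the test statistic is the difference between the null and the fitted deviance; the object names (null_dev, screen) are illustrative only, and the loop is of course far slower than the C++ implementation.

```r
# Base R sketch of univariate screening for a binary response.
set.seed(1)
x <- matrix(rnorm(100 * 5), ncol = 5)
y <- rbinom(100, 1, 0.6)

null_dev <- glm(y ~ 1, family = binomial)$deviance
screen <- t(sapply(seq_len(ncol(x)), function(j) {
  fit  <- glm(y ~ x[, j], family = binomial)
  stat <- null_dev - fit$deviance   # deviance difference, 1 degree of freedom
  c(stat = stat,
    pvalue = pchisq(stat, df = 1, lower.tail = FALSE),
    log_pvalue = pchisq(stat, df = 1, lower.tail = FALSE, log.p = TRUE))
}))
screen   # one row per predictor
```

The log_pvalue column corresponds to what a logged = TRUE call is described as returning; working on the log scale avoids underflow when p-values are extremely small.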
Details

If you specify no family of distributions, the function internally checks the type of your data and decides on the type of regression to perform. The function is written in C++ and this is why it is very fast. It can accept thousands of predictor variables. It is useful for univariate screening. We provide no p-value correction (such as fdr or q-values); this is up to the user.

Value

A matrix with the test statistic and the p-value for each predictor variable.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also

logistic_only, poisson_only, allbetas, correls, regression

Examples

x <- matrnorm(100, 100)
y <- rbinom(100, 1, 0.6) ## binary logistic regression
system.time( univglms(y, x) )
a1 <- univglms(y, x)
a2 <- numeric(100)
system.time( for (i in 1:100) a2[i] = glm(y ~ x[, i], binomial)$deviance )
a2 <- glm(y ~ 1, binomial)$null.dev - a2
x <- NULL

Many univariate simple binary logistic regressions

Description

It performs very many univariate simple binary logistic regressions.

Usage

logistic_only(x, y, tol = 1e-09, b_values = FALSE)

Arguments

x  A matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. Currently only continuous variables are allowed.
y  The dependent variable; a numerical vector with two values (0 and 1).
tol  The tolerance value to terminate the Newton-Raphson algorithm.
b_values  Do you want the values of the coefficients returned? If yes, set this to TRUE.

Details

The function is written in C++ and this is why it is very fast. It can accept thousands of predictor variables.
It is useful for univariate screening. We provide no p-value correction (such as fdr or q-values); this is up to the user.

Value

A vector with the deviance of each simple binary logistic regression model for each predictor variable.

Author(s)

Manos Papadakis <papadakm95@gmail.com>
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also

allbetas, correls, poisson_only, regression

Examples

## 300 variables, hence 300 univariate regressions are to be fitted
x = matrix( rnorm(100 * 300), ncol = 300 )
## 100 observations in total
y = rbinom(100, 1, 0.6) ## binary logistic regression
system.time( logistic_only(x, y) )
a1 = logistic_only(x, y)
a2 <- numeric(300)
system.time( for (i in 1:300) a2[i] = glm(y ~ x[, i], binomial)$deviance )
a2 = as.vector(a2)
all.equal(a1, a2)
a1<-a2<-y<-x<-NULL

Many univariate simple linear regressions

Description

It performs very many univariate simple linear regressions with or without categorical variables.

Usage

regression(x, y, logged = FALSE)

Arguments

x  A data.frame or a matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. A data frame is expected if you have categorical predictor variables. If you only have continuous predictor variables you should use the function allbetas instead, as it is faster.
y  The dependent variable; a numerical vector.
logged  Should the logarithm of the p-values be returned? The default value is FALSE.

Details

Some parts of the function will be transferred to C++. It can accept thousands of predictor variables. It is useful for univariate screening. We provide no p-value correction (such as fdr or q-values); this is up to the user.
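Since the correction for multiple testing is left to the user, one possible way of doing it is with p.adjust from the stats package. The sketch below uses simulated p-values as a stand-in for the p-value column of a screening result; the names pvalue, log_pvalue and selected are illustrative, and the 0.05 threshold is only an example.

```r
# Multiple-testing correction of screening p-values with base R (stats::p.adjust).
set.seed(1)
pvalue <- runif(200)         # stand-in for the p-value column of a screening result
log_pvalue <- log(pvalue)    # what would be available if logged = TRUE had been used

fdr  <- p.adjust(pvalue, method = "BH")            # Benjamini-Hochberg adjustment
fdr2 <- p.adjust(exp(log_pvalue), method = "BH")   # back-transform logged p-values first
selected <- which(fdr < 0.05)                      # predictors kept at a 5% FDR level
```

Note that p.adjust expects p-values on the original scale, so logged p-values must be exponentiated before they are passed to it.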
Value

A matrix with two columns, the test statistic value and its corresponding (logged) p-value.

Author(s)

Manos Papadakis <papadakm95@gmail.com>
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also

univglms, allbetas, correls, mvbetas

Examples

y <- rnorm(150)
a <- regression(iris, y)
a
summary(lm(y ~ iris[, 5]) ) ## check the F-test

Many univariate simple poisson regressions

Description

It performs very many univariate simple poisson regressions.

Usage

poisson_only(x, y, tol = 1e-09, b_values = FALSE)

Arguments

x  A matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. Currently only continuous variables are allowed.
y  The dependent variable; a numerical vector with many discrete values (count data).
tol  The tolerance value to terminate the Newton-Raphson algorithm.
b_values  Do you want the values of the coefficients returned? If yes, set this to TRUE.

Details

The function is written in C++ and this is why it is very fast. It can accept thousands of predictor variables. It is useful for univariate screening. We provide no p-value correction (such as fdr or q-values); this is up to the user.

Value

A vector with the deviance of each simple poisson regression model for each predictor variable.

Author(s)

Manos Papadakis <papadakm95@gmail.com>
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.
See Also

univglms, logistic_only, allbetas, regression

Examples

## 200 variables, hence 200 univariate regressions are to be fitted
x = matrix( rnorm(100 * 200), ncol = 200 )
y = rpois(100, 10)
system.time( poisson_only(x, y) )
b1 = poisson_only(x, y)
b2 = numeric(200)
system.time( for (i in 1:200) b2[i] = glm(y ~ x[, i], poisson)$deviance )
all.equal(b1, b2)
b1<-b2<-x<-y<-NULL

Many univariate simple quasi poisson regressions

Description

It performs very many univariate simple quasi poisson regressions.

Usage

quasi.poisson_only(x, y, tol = 1e-09, maxiters = 100)

Arguments

x  A matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. Currently only continuous variables are allowed.
y  The dependent variable; a numerical vector with many discrete values (count data).
maxiters  The maximum number of iterations after which the Newton-Raphson algorithm is terminated.
tol  The tolerance value to terminate the Newton-Raphson algorithm.

Details

The function is written in C++ and this is why it is very fast. It can accept thousands of predictor variables. It is useful for univariate screening. We provide no p-value correction (such as fdr or q-values); this is up to the user.

Value

A matrix with the deviance and the estimated phi parameter (dispersion parameter) of each simple poisson regression model for each predictor variable.

Author(s)

Manos Papadakis <papadakm95@gmail.com> and Stefanos Fafalios <stefanosfafalios@gmail.com>
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>, Manos Papadakis <papadakm95@gmail.com> and Stefanos Fafalios <stefanosfafalios@gmail.com>.

References

McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.
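For readers who want to see what a deviance and a dispersion parameter look like for a single predictor, the following base R sketch computes both with glm. It uses the Pearson-based dispersion estimate reported by summary.glm; whether quasi.poisson_only uses exactly this estimator is an assumption, not something stated in this entry.

```r
# Deviance and Pearson-based dispersion for one simple Poisson regression (base R).
set.seed(1)
x <- rnorm(100)
y <- rpois(100, 10)

fit <- glm(y ~ x, family = poisson)
pearson <- residuals(fit, type = "pearson")
phi <- sum(pearson^2) / fit$df.residual   # Pearson chi-square over residual df

# Reference value: the dispersion reported by a quasi-Poisson fit in base R
phi_ref <- summary(glm(y ~ x, family = quasipoisson))$dispersion
c(deviance = fit$deviance, phi = phi, phi_ref = phi_ref)
```

A value of phi close to 1 is what one expects for data that are genuinely Poisson, as in this simulation.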
See Also

poisson_only, univglms, logistic_only, allbetas, regression

Examples

## 200 variables, hence 200 univariate regressions are to be fitted
x <- matrix( rnorm(100 * 200), ncol = 200 )
y <- rpois(100, 10)
system.time( poisson_only(x, y) )
b1 <- poisson_only(x, y)
b2 <- quasi.poisson_only(x, y)
b1<-b2<-x<-y<-NULL

Many Welch's F-tests

Description

Many Welch's F-tests.

Usage

colanovas(y, x, logged = FALSE)

Arguments

y  A numerical vector with the dependent variable.
x  A matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. This must be a matrix with the categorical variables as numbers, starting from 1. Welch's F-test is performed for each variable.
logged  A boolean variable; the logarithm of the p-value is returned if set to TRUE.

Details

For each categorical variable in the x matrix Welch's F-test is performed. This is the opposite of ftests, where there are many dependent variables and one categorical variable.

Value

A matrix with the test statistic and the p-value for each predictor variable.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also

regression, ftests, allbetas, correls

Examples

y <- rnorm(100)
x <- matrix( rbinom(100 * 50, 2, 0.5) + 1 , ncol = 50)
a <- colanovas(y, x)
x <- NULL

Match

Description

Return the positions of its first argument that match in its second.

Usage

Match(x,key=NULL)

Arguments

x  A numeric vector.
key  The value/vector to search for in vector x. For now leave it NULL; do not use it.

Details

This function implements R's "match" function.
This version basically calculates match(x, sort(unique(x))) for now. Do not use the argument key!

Value

Returns the position/positions of the given key/keys in the x vector.

Author(s)

Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>

See Also

match

Examples

y <- rnorm(100)
a <- Match(y)
b <- match(y, sort(unique(y))) ## the base R computation described in Details
all.equal(as.vector(a),as.vector(b))

Matrix with all pairs of t-tests

Description

Matrix with all pairs of t-tests.

Usage

allttests(x, y = NULL, ina, logged = FALSE)
ttests.pairs(x, logged = FALSE)

Arguments

x  A numerical matrix with the data.
y  For the case of "allttests", if you have the second group or sample provide it here, otherwise leave it NULL. For the case of "ttests.pairs" this is not required.
ina  If you have the data in one matrix, then provide this indicator variable separating the samples. This numerical vector must contain 1s and 2s only as values. For the case of "ttests.pairs" this is not required.
logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details

The function does all the pairwise t-tests assuming unequal variances (Welch's t-test). The "allttests" does all the pairs formed by "cutting" the matrices x and y in two and everything between them. The "ttests.pairs" accepts a matrix x and does all the pairs of t-tests. This is similar to the correlation matrix style.

Value

A list including:
stat  A matrix with the t-test statistic for each pair of variables.
pvalue  A matrix with the corresponding p-values.
dof  A matrix with the relevant degrees of freedom.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.
See Also

ttests, ftests, ttest, g2Test_univariate

Examples

x <- as.matrix( iris[1:100, 1:4] )
ina <- as.numeric(iris[1:100, 5])
a <- allttests(x, ina = ina)
b <- ttests.pairs(x) ## fewer tests

Matrix with G-square tests of independence

Description

Matrix with G-square tests of independence with and without permutations.

Usage

g2Test_univariate(data, dc)
g2Test_univariate_perm(data, dc, nperm)

Arguments

data  A numerical matrix with the data. The minimum must be 0, otherwise the function can crash or will produce wrong results.
dc  A numerical vector, with length equal to the number of variables (or columns of the data matrix), indicating the number of distinct, unique values (or levels) of each variable. Make sure you give the correct numbers here, otherwise the degrees of freedom will be wrong.
nperm  The number of permutations. The permutation test is slower than the test without permutations and should be used with small sample sizes or when the contingency tables have zeros. When there are few variables, R's "chisq.test" function is faster, but as the number of variables increases the time difference with R's procedure becomes larger and larger.

Details

The function does all the pairwise G2 tests of independence and gives the position inside the matrix. The user must build the associations matrix, similarly to the correlation matrix. See the examples of how to do that. The p-value is not returned; we leave this to the user. See the examples of how to obtain it.

Value

A list including:
statistic  The G2 test statistic for each pair of variables.
pvalue  This is returned when you have selected the permutation based G2 test.
x  The row or variable of the data.
y  The column or variable of the data.
df  The degrees of freedom of each test.

Author(s)

Giorgos Borboudakis. The permutation version used C++ code by John Burkardt.
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
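For a single pair of variables, the G2 statistic, its degrees of freedom and the asymptotic p-value can be written out in a few lines of base R. This is only a sketch of the textbook definition for one pair, not the C++ code used by g2Test_univariate; the names obs, expd and g2 are illustrative.

```r
# G^2 test of independence for one pair of categorical variables (base R only).
set.seed(1)
a <- sample(0:2, 500, replace = TRUE)
b <- sample(0:2, 500, replace = TRUE)

obs  <- table(a, b)                                    # observed contingency table
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)   # expected counts under independence
pos  <- obs > 0                                        # empty cells contribute zero
g2   <- 2 * sum(obs[pos] * log(obs[pos] / expd[pos]))
df   <- (nrow(obs) - 1) * (ncol(obs) - 1)
pval <- pchisq(g2, df, lower.tail = FALSE)
c(G2 = g2, df = df, pvalue = pval)
```

The degrees of freedom depend on the number of levels of each variable, which is why a wrong dc vector leads to wrong p-values.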
References

Tsagris M. (2017). Conditional independence test for categorical data using Poisson log-linear model. Journal of Data Science, 15(2):347-356.
Tsamardinos, I., & Borboudakis, G. (2010). Permutation testing improves Bayesian network learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 322-337). Springer Berlin Heidelberg

See Also

g2Test, g2Test_perm, correls, univglms

Examples

nvalues <- 3
nvars <- 10
nsamples <- 2000
data <- matrix( sample( 0:(nvalues - 1), nvars * nsamples, replace = TRUE ), nsamples, nvars )
dc <- rep(nvalues, nvars)
system.time( g2Test_univariate(data = data, dc = dc) )
a <- g2Test_univariate(data = data, dc = dc)
pval <- pchisq(a$statistic, a$df, lower.tail = FALSE)
g <- matrix(0, nvars, nvars)
g[ cbind(a$x, a$y) ] <- a$statistic
g <- g + t(g)
diag(g) <- 0
## g ## matrix of G^2 test statistics
g<-a<-dc<-data<-NULL

Mean - Median absolute deviation of a vector

Description

Mean - Median absolute deviation of a vector.

Usage

mad2(x,method = "median",na.rm=FALSE)

Arguments

x  A numerical vector.
method  A character value, either "median" for the median absolute deviation or "mean" for the mean absolute deviation.
na.rm  A logical value, TRUE/FALSE, indicating whether NAs should be removed.

Value

The median or the mean absolute deviation, depending on the method.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

See Also

colMads, med, colMedians

Examples

x <- Rnorm(1000)
mad2(x)
mad(x)

Median of a vector

Description

Median of a vector.

Usage

med(x,na.rm=FALSE)

Arguments

x  A numerical vector.
na.rm  TRUE or FALSE; whether NAs, if any, should be removed.

Details

The function is written in C++ and this is why it is very fast.

Value

The median of the vector of numbers.
Author(s)

Manos Papadakis <papadakm95@gmail.com>
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

nth, colMedians

Examples

x <- rnorm(1000)
system.time( for (i in 1:100) med(x) )
system.time( for (i in 1:100) median(x) )

Minima and maxima of two vectors/matrices

Description

Minima and maxima of two vectors/matrices.

Usage

Pmax(x, y,na.rm = FALSE)
Pmin(x, y,na.rm = FALSE)
Pmin_Pmax(x, y,na.rm = FALSE)

Arguments

x  A numerical vector with numbers.
y  A numerical vector with numbers.
na.rm  TRUE or FALSE; whether NAs, if any, should be removed.

Details

The parallel minima or maxima are returned. These are the same as the base functions pmax and pmin.

Value

A numerical vector/matrix with numbers, whose length is equal to the length of the initial vectors/matrices, containing the maximum or minimum of each pair.

Author(s)

Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

sort_mat, Sort, colMins

Examples

x <- rnorm(10)
y <- rnorm(10)
Pmax(x, y)
a<-pmax(x, y)
Pmin(x, y)
b<-pmin(x, y)
Pmin_Pmax(x,y) == c(a,b)
a<-b<-x<-y<-NULL

Minimum and maximum of a vector

Description

Minimum and maximum of a vector.

Usage

min_max(x,index=FALSE, percent = FALSE)

Arguments

x  A numerical vector with data. NAs are handled naturally.
index  A boolean value for the indices of the minimum and the maximum value.
percent  A boolean value for the percent of the positive and negative numbers.

Value

A vector with the relevant values, min and max.

Author(s)

Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
See Also

rowMins, rowMaxs, nth, colrange, colMedians, sort_mat

Examples

x <- rnorm(100 * 500)
s1 <- min_max(x)
s2 <- c(min(x), max(x))

Minimum and maximum frequencies of a vector

Description

Minimum and maximum frequencies of a vector.

Usage

freq.min(x,na.rm = FALSE)
freq.max(x,na.rm = FALSE)

Arguments

x  A numerical/integer vector with data but without NAs.
na.rm  TRUE or FALSE; whether NAs, if any, should be removed.

Details

These functions are the same as max(table(x)) or min(table(x)), with one exception: freq.min and freq.max also return the value that has the minimum/maximum frequency. They are more efficient than max(table(x)) or min(table(x)).

Value

A vector with 2 values, the value with the minimum/maximum frequency and the frequency.

Author(s)

Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com> and Marios Dimitriadis <kmdimitriadis@gmail.com>.

See Also

rowMins, rowMaxs, nth, colrange, colMedians, sort_mat

Examples

x <- rnorm(100)
f1 <- freq.min(x)
f2 <- freq.max(x)
# f1r <- min(table(x))
# f2r <- max(table(x))
# f1[2]==f1r ## the frequencies are the same
# f2[2]==f2r ## the frequencies are the same

MLE for multivariate discrete data

Description

MLE for multivariate discrete data.

Usage

multinom.mle(x)
dirimultinom.mle(x, tol = 1e-07)
colpoisson.mle(x)
colgeom.mle(x, type = 1)

Arguments

x  A matrix with discrete valued non-negative data.
tol  The tolerance level to terminate the Newton-Raphson algorithm for the Dirichlet multinomial distribution.
type  This is for the geometric distribution only. Type 1 refers to the case where the minimum is zero and type 2 to the case where the minimum is 1.

Details

For the Poisson and geometric distributions we simply fit independent Poisson and geometric distributions respectively.

Value

A list including:
loglik  A vector with the value of the maximised log-likelihood.
param  A vector of the parameters.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Johnson Norman L., Kotz Samuel and Balakrishnan (1997). Discrete Multivariate Distributions. Wiley
Minka Thomas (2012). Estimating a Dirichlet distribution. Technical report.

See Also

poisson.mle, zip.mle, ztp.mle, negbin.mle, poisson.nb

Examples

x <- t( rmultinom(1000, 20, c(0.4, 0.5, 0.1) ) )
multinom.mle(x)
colpoisson.mle(x)
x <- NULL

MLE of (hyper-)spherical distributions

Description

MLE of (hyper-)spherical distributions.

Usage

vmf.mle(x, tol = 1e-07)
multivmf.mle(x, ina, tol = 1e-07, ell = FALSE)
acg.mle(x, tol = 1e-07)
iag.mle(x, tol = 1e-07)

Arguments

x  A matrix with directional data, i.e. unit vectors.
ina  A numerical vector with discrete numbers starting from 1, i.e. 1, 2, 3, 4,... or a factor variable. Each number denotes a sample or group. If you supply a continuous valued vector the function will provide wrong results.
ell  This is for multivmf.mle only. Do you want the log-likelihood returned? The default value is FALSE.
tol  The tolerance value at which to terminate the iterations.

Details

For the von Mises-Fisher, the normalised mean is the mean direction. For the concentration parameter, a Newton-Raphson is implemented. For the angular central Gaussian distribution there is a constraint on the estimated covariance matrix; its trace is equal to the number of variables. An iterative algorithm takes place and convergence is guaranteed. Newton-Raphson for the projected normal distribution, on the sphere, is implemented as well. Finally, the von Mises-Fisher distribution for groups of data is also implemented.

Value

For the von Mises-Fisher a list including:
loglik  The maximum log-likelihood value.
mu  The mean direction.
kappa  The concentration parameter.

For the multi von Mises-Fisher a list including:
loglik  A vector with the maximum log-likelihood values if ell is set to TRUE. Otherwise NULL is returned.
mi  A matrix with the group mean directions.
ki  A vector with the group concentration parameters.

For the angular central Gaussian a list including:
iter  The number of iterations required by the algorithm to converge to the solution.
cova  The estimated covariance matrix.

For the spherical projected normal a list including:
iters  The number of iterations required by the Newton-Raphson.
mesi  A matrix with two rows. The first row is the mean direction and the second is the mean vector. The first comes from the second by normalising to have unit length.
param  A vector with three elements: the norm of the mean vector, the log-likelihood and the log-likelihood of the spherical uniform distribution. The third value helps in case you want to do a log-likelihood ratio test for uniformity.

Author(s)

Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References

Mardia, K. V. and Jupp, P. E. (2000). Directional statistics. Chichester: John Wiley & Sons.
Sra, S. (2012). A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of Is(x). Computational Statistics, 27(1): 177-190.
Tyler D. E. (1987). Statistical analysis for the angular central Gaussian distribution on the sphere. Biometrika 74(3): 579-589.
Paine P.J., Preston S.P., Tsagris M and Wood A.T.A. (2017). An Elliptically Symmetric Angular Gaussian Distribution. Statistics and Computing (To appear).
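The statement in the Details that the normalised mean is the mean direction of the von Mises-Fisher can be sketched in base R as follows. The closed-form value kappa0 is a commonly used approximation to the concentration parameter of the kind discussed in Sra (2012); treating it as a starting point for the Newton-Raphson is an assumption made here for illustration, not a description of what vmf.mle does internally.

```r
# Mean direction and an approximate concentration for unit vectors (base R only).
set.seed(1)
x <- matrix(rnorm(200 * 3, mean = c(2, 0, 0)), ncol = 3, byrow = TRUE)
x <- x / sqrt(rowSums(x^2))   # project every row onto the unit sphere

xbar <- colMeans(x)
rbar <- sqrt(sum(xbar^2))     # mean resultant length, between 0 and 1
mu   <- xbar / rbar           # mean direction: the normalised mean
p    <- ncol(x)
kappa0 <- rbar * (p - rbar^2) / (1 - rbar^2)   # closed-form approximation to kappa
list(mu = mu, kappa0 = kappa0)
```

The more concentrated the data are around mu, the closer rbar is to 1 and the larger the approximate concentration becomes.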
See Also
racg, vm.mle, rvmf

Examples
m <- c(0, 0, 0, 0)
s <- cov(iris[, 1:4])
x <- racg(100, s)
mod <- acg.mle(x)
mod
cov2cor(mod$cova) ## estimated covariance matrix turned into a correlation matrix
cov2cor(s) ## true covariance matrix turned into a correlation matrix
vmf.mle(x)
x <- rbind( rvmf(100,rnorm(4), 10), rvmf(100,rnorm(4), 20) )
a <- multivmf.mle(x, rep(1:2, each = 100) )

MLE of continuous univariate distributions defined on the positive line

Description
MLE of continuous univariate distributions defined on the positive line.

Usage
gammamle(x, tol = 1e-09)
chisq.mle(x, tol = 1e-09)
weibull.mle(x, tol = 1e-09, maxiters = 100)
lomax.mle(x, tol = 1e-09)
foldnorm.mle(x, tol = 1e-09)
betaprime.mle(x, tol = 1e-09)
logcauchy.mle(x, tol = 1e-09)
loglogistic.mle(x, tol = 1e-09)
halfnorm.mle(x)
invgauss.mle(x)
lognorm.mle(x)
pareto.mle(x)
expmle(x)
exp2.mle(x)
maxboltz.mle(x)
rayleigh.mle(x)
normlog.mle(x)
lindley.mle(x)

Arguments
x A vector with positive valued data (zeros are not allowed).
tol The tolerance level up to which the maximisation stops; set to 1e-09 by default.
maxiters The maximum number of iterations the Newton-Raphson will perform.

Details
Instead of maximising the log-likelihood via a numerical optimiser we have used a Newton-Raphson algorithm, which is faster. See Wikipedia for the equations to be solved. For the t distribution we need the degrees of freedom and estimate the location and scatter parameters. If you want to fit an inverse gamma distribution simply do "gammamle(1/x)". The log-likelihood and the parameters are then those of the inverse gamma. The "normlog.mle" is simply the normal distribution where all values are positive. Note, this is not the log-normal. It is the normal with a log link.
The same holds for the inverse Gaussian distribution, where the mean is exponentiated. This comes from the GLM theory.

Value
Usually a list with three elements, but not in all cases.
iters The number of iterations required for the Newton-Raphson to converge.
loglik The value of the maximised log-likelihood.
param The vector of the parameters.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Kalimuthu Krishnamoorthy, Meesook Lee and Wang Xiao (2015). Likelihood ratio tests for comparing several gamma distributions. Environmetrics, 26(8):571-583.
N.L. Johnson, S. Kotz & N. Balakrishnan (1994). Continuous Univariate Distributions, Volume 1 (2nd Edition).
N.L. Johnson, S. Kotz & N. Balakrishnan (1970). Distributions in statistics: continuous univariate distributions, Volume 2.
Tsagris M., Beneki C. and Hassani H. (2014). On the folded normal distribution. Mathematics, 2(1):12-28.
Sharma V. K., Singh S. K., Singh U. & Agiwal V. (2015). The inverse Lindley distribution: a stress-strength reliability model with application to head and neck cancer data. Journal of Industrial and Production Engineering, 32(3): 162-173.
You can also check the relevant Wikipedia pages for these distributions.

See Also
zip.mle, normal.mle, beta.mle

Examples
x <- rgamma(100, 3, 4)
system.time( for (i in 1:20) gammamle(x) )
## system.time( for (i in 1:20) fitdistr(x,"gamma") )
a <- glm(x ~ 1, gaussian(log) )
normlog.mle(x)

MLE of continuous univariate distributions defined on the real line

Description
MLE of continuous univariate distributions defined on the real line.
Usage
normal.mle(x)
gumbel.mle(x, tol = 1e-09)
cauchy.mle(x, tol = 1e-09)
logistic.mle(x, tol = 1e-07)
ct.mle(x, tol = 1e-09)
tmle(x, v = 5, tol = 1e-08)
wigner.mle(x, tol = 1e-09)
laplace.mle(x)

Arguments
x A numerical vector with data.
v The degrees of freedom of the t distribution.
tol The tolerance level up to which the maximisation stops. The default differs by function (see Usage); it is 1e-09 in most cases.

Details
Instead of maximising the log-likelihood via a numerical optimiser we have used a Newton-Raphson algorithm, which is faster. See Wikipedia for the equation to be solved. For the t distribution we need the degrees of freedom and estimate the location and scatter parameters.
The Cauchy is the t distribution with 1 degree of freedom. If you want to fit such a distribution, use cauchy.mle rather than tmle with 1 degree of freedom, as it is faster. The Laplace distribution is also called the double exponential distribution.
The wigner.mle refers to the Wigner semicircle distribution.

Value
Usually a list with three elements, but not in all cases.
iters The number of iterations required for the Newton-Raphson to converge.
loglik The value of the maximised log-likelihood.
param The vector of the parameters.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Johnson, Norman L. Kemp, Adrianne W. Kotz, Samuel (2005). Univariate Discrete Distributions (third edition). Hoboken, NJ: Wiley-Interscience.
https://en.wikipedia.org/wiki/Wigner_semicircle_distribution

See Also
zip.mle, gammamle, vm.mle

Examples
x <- rt(1000,10)
a <- ct.mle(x)
tmle(x, v = a$nu)
cauchy.mle(x)
normal.mle(x)
logistic.mle(x)
gumbel.mle(x)

MLE of count data (univariate discrete distributions)

Description
MLE of count data.
Usage
zip.mle(x, tol = 1e-09)
ztp.mle(x, tol = 1e-09)
negbin.mle(x, type = 1, tol = 1e-09)
binom.mle(x, N = NULL, tol = 1e-07)
borel.mle(x)
geom.mle(x, type = 1)
logseries.mle(x, tol = 1e-09)
poisson.mle(x)

Arguments
x A vector with discrete valued data.
type This argument is for the negative binomial and the geometric distribution. For the negative binomial you can choose which type you prefer: type 1 is for small sample sizes, whereas type 2 is for larger ones, as it is faster. For the geometric it is related to its two forms: type 1 refers to the case where the minimum is zero and type 2 to the case where the minimum is 1.
N This is for the binomial distribution only, specifying the total number of successes. If NULL, it is estimated from the data.
tol The tolerance level up to which the maximisation stops. The default differs by function (see Usage); it is 1e-09 in most cases.

Details
Instead of maximising the log-likelihood via a numerical optimiser we used a Newton-Raphson algorithm, which is faster.
See Wikipedia for the equation to be solved in the case of the zero inflated distribution: https://en.wikipedia.org/wiki/Zero-inflated_model. In order to avoid negative values we have used link functions, log for the lambda and logit for the π, as suggested by Lambert (1992). As for the zero truncated Poisson see https://en.wikipedia.org/wiki/Zero-truncated_Poisson_distribution.
zip.mle is for the zero inflated Poisson, whereas ztp.mle is for the zero truncated Poisson distribution.

Value
A list including:
mess This is for the negbin.mle only. If there is no reason to use the negative binomial distribution a message will appear, otherwise this is NULL.
iters The number of iterations required for the Newton-Raphson to converge.
loglik The value of the maximised log-likelihood.
prob The probability parameter of the distribution. In some distributions this element might have a different name. For example, param in the zero inflated Poisson.
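To illustrate the Newton-Raphson approach described in the Details, the following base R sketch estimates the rate of a zero truncated Poisson by solving lambda / (1 - exp(-lambda)) = mean(x). The function name ztp_newton and the choice to iterate on lambda directly (without a link function) are assumptions made for this illustration; ztp.mle may be implemented differently.

```r
# Hypothetical helper, not part of Rfast: Newton-Raphson for the rate of the
# zero truncated Poisson. The score equation is lambda / (1 - exp(-lambda)) = mean(x).
ztp_newton <- function(x, tol = 1e-09, maxiters = 100) {
  stopifnot(all(x >= 1))
  n <- length(x)
  xbar <- mean(x)
  lambda <- xbar                               # starting value
  iters <- 0
  repeat {
    iters <- iters + 1
    q <- 1 - exp(-lambda)
    g <- lambda / q - xbar                     # function whose root is the MLE
    dg <- (q - lambda * exp(-lambda)) / q^2    # its derivative with respect to lambda
    new <- lambda - g / dg
    done <- abs(new - lambda) < tol
    lambda <- new
    if (done || iters >= maxiters) break
  }
  loglik <- sum(x) * log(lambda) - n * log(exp(lambda) - 1) - sum(lgamma(x + 1))
  list(iters = iters, loglik = loglik, lambda = lambda)
}

set.seed(1)
x <- rpois(500, 2)
x <- x[x > 0]          # keep only the strictly positive counts
fit <- ztp_newton(x)
fit$lambda
```

The sketch assumes mean(x) is larger than 1; if every observation equals 1 the likelihood is maximised as lambda approaches zero and the iteration is not meaningful.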
Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Lambert Diane (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics. 34 (1): 1-14.
Johnson Norman L., Kotz Samuel and Kemp Adrienne W. (1992). Univariate Discrete Distributions (2nd ed.). Wiley.

See Also
poisson_only, colrange

Examples
x <- rpois(100, 2)
zip.mle(x)
poisson.mle(x)
## small difference in the two log-likelihoods as expected.
x <- rpois(100, 10)
x[x == 0 ] <- 1
ztp.mle(x)
poisson.mle(x)
## significant difference in the two log-likelihoods.
x <- rnbinom(100, 10, 0.6)
poisson.mle(x)
negbin.mle(x)

MLE of distributions defined in the (0, 1) interval

Description
MLE of distributions defined in the (0, 1) interval.

Usage
beta.mle(x, tol = 1e-09)
ibeta.mle(x, tol = 1e-09)
logitnorm.mle(x)
hsecant01.mle(x, tol = 1e-09)

Arguments
x A numerical vector with proportions, i.e. numbers in (0, 1) (zeros and ones are not allowed).
tol The tolerance level up to which the maximisation stops.

Details
Maximum likelihood estimation of the parameters of the beta distribution is performed via Newton-Raphson. The distributions, and hence the functions, do not accept zeros. "logitnorm.mle" fits the logistic normal, hence no Newton-Raphson is required, and "hsecant01.mle" uses the golden ratio search as it is faster than the Newton-Raphson (fewer calculations).

Value
A list including:
iters The number of iterations required by the Newton-Raphson.
loglik The value of the log-likelihood.
param The estimated parameters. In the case of "hsecant01.mle" this is called "theta" as there is only one parameter.
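The Details note that the logistic normal needs no Newton-Raphson. The reason is that its MLE is available in closed form: the estimates are the mean and the (divide-by-n) variance of the logit-transformed data. A short base R sketch follows; the function name logitnorm_closed_form and the names of the returned elements are assumptions for illustration and need not match the output of logitnorm.mle.

```r
# Hypothetical helper, not part of Rfast: closed-form MLE of the logistic normal.
logitnorm_closed_form <- function(x) {
  stopifnot(all(x > 0 & x < 1))
  y <- qlogis(x)                     # logit transform, log(x / (1 - x))
  mu <- mean(y)
  s2 <- mean((y - mu)^2)             # MLE of the variance divides by n, not n - 1
  # normal log-likelihood of y plus the log-Jacobian of the logit transform
  loglik <- sum(dnorm(y, mu, sqrt(s2), log = TRUE)) - sum(log(x) + log(1 - x))
  list(loglik = loglik, param = c(mean = mu, variance = s2))
}

set.seed(1)
x <- plogis(rnorm(1000, 0.5, 1.2))   # simulated proportions in (0, 1)
fit <- logitnorm_closed_form(x)
fit$param
```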
Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

See Also
diri.nr2

Examples
x <- rbeta(1000, 1, 4)
system.time( for(i in 1:1000) beta.mle(x) )
beta.mle(x)
ibeta.mle(x)
x <- runif(1000)
hsecant01.mle(x)
logitnorm.mle(x)
ibeta.mle(x)
x <- rbeta(1000, 2, 5)
x[sample(1:1000, 50)] <- 0
ibeta.mle(x)

MLE of some circular distributions

Description
MLE of some circular distributions.

Usage
vm.mle(x, tol = 1e-09)
spml.mle(x, tol = 1e-09, maxiters = 100)
wrapcauchy.mle(x, tol = 1e-09)

Arguments
x A numerical vector with the circular data. They must be expressed in radians. For the "spml.mle" this can also be a matrix with two columns, the cosine and the sine of the circular data.
tol The tolerance level to stop the iterative process of finding the MLEs.
maxiters The maximum number of iterations to implement.

Details
The parameters of the von Mises, the bivariate angular Gaussian and wrapped Cauchy distributions are estimated. For the wrapped Cauchy, the iterative procedure described by Kent and Tyler (1988) is used. As for the von Mises distribution, we use a Newton-Raphson to estimate the concentration parameter. The angular Gaussian is described, in the regression setting, in Presnell et al. (1998).

Value
A list including:
iters The iterations required until convergence. This is returned in the wrapped Cauchy distribution only.
loglik The value of the maximised log-likelihood.
param A vector consisting of the estimates of the two parameters, the mean direction for both distributions and the concentration parameter kappa and the rho for the von Mises and wrapped Cauchy respectively.
gamma The norm of the mean vector of the angular Gaussian distribution.
mu The mean vector of the angular Gaussian distribution.
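As an illustration of the Newton-Raphson step mentioned in the Details for the von Mises concentration parameter, the base R sketch below solves A(kappa) = R, where A is the ratio of modified Bessel functions I1/I0 and R is the mean resultant length. The function name vm_newton, the starting value (a commonly used closed-form approximation) and the safeguard against non-positive steps are assumptions for illustration, not a description of vm.mle.

```r
# Hypothetical helper, not part of Rfast: von Mises MLE for angles in radians.
vm_newton <- function(x, tol = 1e-09, maxiters = 100) {
  n <- length(x)
  C <- mean(cos(x))
  S <- mean(sin(x))
  mu <- atan2(S, C) %% (2 * pi)      # mean direction in [0, 2 * pi)
  R <- sqrt(C^2 + S^2)               # mean resultant length
  # A(k) = I1(k) / I0(k), computed with scaled Bessel functions to avoid overflow
  A <- function(k) besselI(k, 1, expon.scaled = TRUE) / besselI(k, 0, expon.scaled = TRUE)
  kappa <- R * (2 - R^2) / (1 - R^2) # approximate starting value
  iters <- 0
  repeat {
    iters <- iters + 1
    a <- A(kappa)
    # Newton step, using A'(k) = 1 - A(k) / k - A(k)^2
    new <- kappa - (a - R) / (1 - a / kappa - a^2)
    if (new <= 0) new <- kappa / 2   # safeguard: keep kappa positive
    done <- abs(new - kappa) < tol
    kappa <- new
    if (done || iters >= maxiters) break
  }
  loglik <- n * kappa * R - n * log(2 * pi) -
    n * (log(besselI(kappa, 0, expon.scaled = TRUE)) + kappa)
  list(iters = iters, loglik = loglik, param = c(mean = mu, kappa = kappa))
}

set.seed(1)
x <- rnorm(200, 1, 0.5) %% (2 * pi)  # simulated circular data in radians
fit <- vm_newton(x)
fit$param
```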
Author(s)
Michail Tsagris and Stefanos Fafalios
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Stefanos Fafalios <stefanosfafalios@gmail.com>

References
Mardia K. V. and Jupp P. E. (2000). Directional statistics. Chichester: John Wiley & Sons.
Sra S. (2012). A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of I_s(x). Computational Statistics, 27(1): 177-190.
Presnell Brett, Morrison Scott P. and Littell Ramon C. (1998). Projected multivariate linear models for directional data. Journal of the American Statistical Association, 93(443): 1068-1077.
Kent J. and Tyler D. (1988). Maximum likelihood estimation for the wrapped Cauchy distribution. Journal of Applied Statistics, 15(2): 247-254.

See Also
vmf.mle, rvonmises, rvmf

Examples
y <- rcauchy(100, 3, 1)
x <- y
vm.mle(x)
spml.mle(x)
wrapcauchy.mle(x)
x <- NULL

MLE of the inverted Dirichlet distribution

Description
MLE of the inverted Dirichlet distribution.

Usage
invdir.mle(x, tol = 1e-09)

Arguments
x A matrix with strictly positive data (no zeros are allowed).
tol The tolerance level up to which the maximisation stops.

Details
Maximum likelihood estimation of the parameters of the inverted Dirichlet distribution is performed via Newton-Raphson. We took the initial values suggested by Bdiri T. and Bouguila N. (2012) and modified them a bit.

Value
A list including:
iters The number of iterations required by the Newton-Raphson.
loglik The value of the log-likelihood.
param The estimated parameters.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

References
Bdiri T. and Bouguila N. (2012). Positive vectors clustering using inverted Dirichlet finite mixture models. Expert Systems with Applications, 39(2): 1869-1882.
See Also
diri.nr2, multinom.mle

Examples
x <- as.matrix(iris[, 1:4])
system.time( for(i in 1:100) invdir.mle(x) )
invdir.mle(x)

MLE of the multivariate normal distribution

Description
MLE of the multivariate normal distribution.

Usage
mvnorm.mle(x)

Arguments
x A matrix with numerical data.

Details
The mean vector, the covariance matrix and the value of the log-likelihood are calculated.

Value
A list including:
iters The number of iterations required for the Newton-Raphson to converge.
loglik The value of the maximised log-likelihood.
param The vector of the parameters for the zero inflated Poisson.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Johnson Norman L., Kotz Samuel and Balakrishnan (1997). Discrete Multivariate Distributions. Wiley.

See Also
multinom.mle, dmvnorm, gaussian.nb

Examples
x <- matrnorm(100, 4)
mvnorm.mle(x)
x <- NULL

MLE of the ordinal model without covariates

Description
MLE of the ordinal model without covariates.

Usage
ordinal.mle(y, link = "logit")

Arguments
y A numerical vector with values 1, 2, 3,..., not zeros, or an ordered factor.
link This can either be "logit" or "probit". It is the link function to be used.

Details
Maximum likelihood estimation of the ordinal model (proportional odds) is implemented. See for example the "polr" command in R or the examples.

Value
A list including:
loglik The log-likelihood of the model.
a The intercepts (threshold coefficients) of the model.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

References
Agresti, A. (2002). Categorical Data. Second edition. Wiley.
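Without covariates, the ordinal model has one threshold per category boundary, and the fitted category probabilities equal the observed proportions. The intercepts are therefore the link-transformed cumulative proportions. The base R sketch below shows this; the function name ordinal_intercepts is an assumption for illustration and the sketch is not claimed to be how ordinal.mle computes its output.

```r
# Hypothetical helper, not part of Rfast: intercepts of an ordinal model
# without covariates, obtained from the observed cumulative proportions.
ordinal_intercepts <- function(y, link = c("logit", "probit")) {
  link <- match.arg(link)
  counts <- as.vector(table(y))
  counts <- counts[counts > 0]               # drop empty categories
  prop <- counts / sum(counts)               # fitted category probabilities
  cum <- cumsum(prop)[-length(prop)]         # cumulative proportions, last one (1) dropped
  a <- if (link == "logit") qlogis(cum) else qnorm(cum)
  loglik <- sum(counts * log(prop))          # multinomial log-likelihood kernel
  list(loglik = loglik, a = a)
}

y <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3)
fit <- ordinal_intercepts(y)
fit$a                                        # logit(0.2) and logit(0.5)
ordinal_intercepts(y, link = "probit")$a
```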
See Also
beta.mle, diri.nr2

Examples
y <- factor( rbinom(100,3,0.5), ordered = TRUE )
ordinal.mle(y)
ordinal.mle(y, link = "probit")

MLE of the tobit model

Description
MLE of the tobit model.

Usage
tobit.mle(y, tol = 1e-09)

Arguments
y A vector with positive valued data and zero values. If there are no zero values, a simple normal model is fitted in the end.
tol The tolerance level up to which the maximisation stops; set to 1e-09 by default.

Details
The tobit model is useful for (univariate) positive data with left censoring at zero. There is the assumption of a latent variable. The values of that variable which are positive coincide with the observed values. If some values are negative, they are left censored and the observed values are zero. Instead of maximising the log-likelihood via a numerical optimiser we have used a Newton-Raphson algorithm, which is faster.

Value
A list with three elements including:
iters The number of iterations required for the Newton-Raphson to converge.
loglik The value of the maximised log-likelihood.
param The vector of the parameters.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Tobin James (1958). Estimation of relationships for limited dependent variables. Econometrica. 26(1):24-36.
https://en.wikipedia.org/wiki/Tobit_model

See Also
gammamle, normal.mle

Examples
x <- rnorm(300, 3, 5)
x[ x < 0 ] <- 0 ## left censoring. Values below zero become zero
system.time( for (i in 1:100) tobit.mle(x) )

Moment and maximum likelihood estimation of variance components

Description
Moment and maximum likelihood estimation of variance components.
Usage
rint.mle(x, ina, ranef = FALSE, tol = 1e-09, maxiters = 100)
varcomps.mom(x, ina)
varcomps.mle(x, ina, tol = 1e-09)

Arguments
x A numerical vector with the data.
ranef Should the random effects be returned as well? The default value is FALSE.
ina A numerical vector with 1s, 2s, 3s and so on, indicating the groups. Be careful, the function is designed to accept numbers greater than zero. Alternatively it can be a factor variable.
tol The tolerance level to terminate the golden ratio search. The default value is 10^(-9).
maxiters The maximum number of iterations Newton-Raphson will implement.

Details
Note that the "varcomps.mle" and "varcomps.mom" work for balanced designs only, i.e. for each subject the same number of measurements have been taken. The "rint.mle" works for both balanced and unbalanced designs.
The variance components, i.e. the between-measurements variance and the within variance, are estimated using moment estimators. The "colvarcomps.mom" is the moment analogue of a random effects model which uses likelihood estimation ("colvarcomps.mle"). It is much faster, but can give a negative variance of the random effects, in which case it is set to zero.
The maximum likelihood version is a bit slower (try it yourselves to see the difference), but statistically speaking it is to be preferred when only small samples are available. The reason why it is only a little bit slower, and not a lot slower as one would imagine, is that we are using a closed formula to calculate the two variance components (Demidenko, 2013, pg. 67-69). Yes, there are closed formulas for linear mixed models.

Value
For the "varcomps.mom": A vector with 5 elements: the MSE, the estimate of the between variance, the variance components ratio and a 95% confidence interval for the ratio.
For the "varcomps.mle": a list with a single component called "info".
That is a matrix with 3 columns: the MSE, the estimate of the between variance and the log-likelihood value. If ranef = TRUE a list including "info" and an extra component called "ranef" containing the random effects. It is a matrix with the same number of columns as the data. Each column contains the random effects of each variable.

Author(s)
Michail Tsagris and Manos Papadakis
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
D.C. Montgomery (2001). Design and analysis of experiments (5th Edition). New York: John Wiley & Sons.
Charles S. Davis (2002). Statistical methods for the analysis of repeated measures. New York: Springer-Verlag.
Demidenko E. (2013). Mixed Models: Theory and Applications with R (2nd Edition). New Jersey: John Wiley & Sons (Excellent book).

See Also
colvarcomps.mle, rint.reg, rint.regbx

Examples
## example from Montgomery, pages 514-517
x <- c(98,97,99,96,91,90,93,92,96,95,97,95,95,96,99,98)
ina <- rep(1:4, each = 4)
varcomps.mom(x, ina)
varcomps.mle(x, ina)

Multi-sample tests for vectors

Description
Multi-sample tests for vectors.

Usage
ftest(x, ina, logged = FALSE)
anova1(x, ina, logged = FALSE)
kruskaltest(x, ina, logged = FALSE)
var2test(x, y, alternative = "unequal", logged = FALSE)
mcnemar(x, y, logged = FALSE)
ttest2(x, y, paired = FALSE, logged = FALSE)
cqtest(x, treat, block, logged = FALSE)
block.anova(x, treat, block, logged = FALSE)
twoway.anova(y, x1, x2, interact = FALSE, logged = FALSE)

Arguments
x A numerical vector with the data.
y A numerical vector with the data.
ina A numerical vector with 1s, 2s, 3s and so on, indicating the groups. Be careful, the function is designed to accept numbers greater than zero. Alternatively it can be a factor variable.
paired This is for the two sample t-test only and is TRUE or FALSE, specifying whether the two samples are paired or not.
alternative This can either be "unequal", "greater" or "less".
treat In the case of the blocking ANOVA and Cochran's Q test, this argument plays the role of the "ina" argument.
block This item (in the blocking ANOVA and Cochran's Q test) denotes the subjects which are the same. Similarly to "ina", a numeric vector with 1s, 2s, 3s and so on.
x1 The first factor in the two way ANOVA.
x2 The second factor in the two way ANOVA. The order is not important.
interact Should interaction in the two way ANOVA be included? The default value is FALSE (no interaction).
logged Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
Welch's F-test (without assuming equal variances) is performed with the "ftest" function. The "anova1" function performs the classical (Fisher's) one-way analysis of variance (ANOVA), which assumes equal variance across the groups.
The "kruskaltest" performs the Kruskal-Wallis non parametric alternative to the analysis of variance test. The "var2test" implements the classical F test for the equality of two sample variances. The "cqtest" performs Cochran's Q test for the equality of more than two groups whose values are strictly binary (0 or 1). This is a generalisation of McNemar's test to the multi-sample case. The "block.anova" is the ANOVA with blocking, i.e. the randomised complete block design (RCBD). In this case, for every combination of the block and treatment values, there is only one observation. The mathematics are the same as in the case of "twoway.anova", but the assumptions and the testing procedure are different. In addition, no interaction is present.

Value
A vector with the test statistic and the p-value of each test. For the case of the t-test, an extra column with the degrees of freedom is given.
For the two way ANOVA there can be either two or three F test statistics and hence the same number of p-values.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
B.L. Welch (1951). On the comparison of several mean values: an alternative approach. Biometrika, 38(3/4), 330-336.
D.C. Montgomery (2001). Design and analysis of experiments (5th Edition). New York: John Wiley & Sons.
McNemar Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 12(2):153-157.

See Also
ttests, ftests

Examples
x <- rnorm(200)
ina <- rbinom(200, 3, 0.5) + 1
anova1(x, ina)
ftest(x, ina)
ina <- rbinom(200, 1, 0.5) + 1
x1 <- x[ ina == 1 ] ; x2 <- x[ ina == 2 ]
ttest2(x1, x2)
var2test(x1, x2)
## RCBD example 4.1 from Montgomery (2001), page 131-132
x <- c(9.3, 9.4, 9.2, 9.7, 9.4, 9.3, 9.4, 9.6, 9.6, 9.8, 9.5, 10, 10, 9.9, 9.7, 10.2)
tr <- rep(1:4, 4)
bl <- rep(1:4, each = 4)
block.anova(x, tr, bl)

Multinomial regression

Description
Multinomial regression.

Usage
multinom.reg(y, x, tol = 1e-07, maxiters = 50)

Arguments
y The response variable. A numerical or a factor type vector.
x A matrix or a data.frame with the predictor variables.
tol The tolerance value to terminate the Newton-Raphson algorithm.
maxiters The maximum number of iterations Newton-Raphson will perform.

Value
A list including:
iters The number of iterations required by the Newton-Raphson.
loglik The value of the maximised log-likelihood.
be A matrix with the estimated regression coefficients.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Bohning, D. (1992). Multinomial logistic regression algorithm. Annals of the Institute of Statistical Mathematics, 44(1): 197-200.
See Also
glm_logistic, score.multinomregs, logistic_only

Examples
y <- iris[, 5]
x <- matrnorm(150, 3)
multinom.reg(y, x)

Multivariate kurtosis

Description
Multivariate kurtosis.

Usage
mvkurtosis(x)

Arguments
x A numerical matrix.

Details
The multivariate kurtosis is calculated.

Value
A number, the multivariate kurtosis.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>.

References
K. V. Mardia (1970). Measures of Multivariate Skewness and Kurtosis with Applications. Biometrika, 57(3):519-530.

See Also
colskewness, skew.test2, colmeans, colVars, colMedians

Examples
x <- as.matrix(iris[, 1:4])
mvkurtosis(x)

Multivariate Laplace random values simulation

Description
Multivariate Laplace random values simulation.

Usage
rmvlaplace(n, lam, mu, G)

Arguments
n The sample size, a numerical value.
lam The parameter of the exponential distribution, a positive number.
mu The mean vector.
G A d × d covariance matrix with determinant 1.

Details
The algorithm uses univariate normal random values and transforms them to multivariate via a spectral decomposition.

Value
A matrix with the simulated data.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Eltoft T., Kim T., and Lee T.W. (2006). On the multivariate Laplace distribution. Signal Processing Letters, IEEE, 13(5):300-303.

See Also
rmvnorm, racg, rmvt

Examples
m <- colmeans( as.matrix( iris[, 1:4] ) )
s <- cov(iris[,1:4])
s <- s / det(s)^0.25
lam <- 3
x <- rmvlaplace(100, lam, m, s)

Multivariate normal and t random values simulation

Description
Multivariate normal and t random values simulation.

Usage
rmvnorm(n, mu, sigma)
rmvt(n, mu, sigma, v)

Arguments
n The sample size, a numerical value.
mu The mean vector in R^d.
sigma The d × d covariance matrix.
v The degrees of freedom.

Details
The algorithm uses univariate normal random values and transforms them to multivariate via a spectral decomposition. It is faster than the command "mvrnorm" available from MASS, and it allows for singular covariance matrices.

Value
A matrix with the simulated data.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Aitchison J. (1986). The statistical analysis of compositional data. Chapman & Hall.

See Also
racg, rmvlaplace, rmvt

Examples
x <- as.matrix(iris[, 1:4])
m <- colmeans(x)
s <- cov(x)
y <- rmvnorm(1000, m, s)
colmeans(y)
cov(y)
y <- NULL

Naive Bayes classifiers

Description
Gaussian, Poisson, geometric and multinomial naive Bayes classifiers.

Usage
gaussian.nb(xnew = NULL, x, ina)
poisson.nb(xnew, x, ina)
multinom.nb(xnew, x, ina)
geom.nb(xnew, x, ina, type = 1)
gammanb(xnew = NULL, x, ina, tol = 1e-07)

Arguments
xnew A numerical matrix with new predictor variables whose group is to be predicted. For the Gaussian naive Bayes, this defaults to NULL, as you might want just the model and not to predict the membership of new observations. For the Gaussian case this contains any numbers, but for the multinomial and Poisson cases, the matrix must contain integer valued numbers only.
x A numerical matrix with the observed predictor variable values. For the Gaussian case this contains any numbers, but for the multinomial and Poisson cases, the matrix must contain integer valued numbers only.
ina A numerical vector with strictly positive numbers, i.e. 1, 2, 3, indicating the groups of the dataset. Alternatively this can be a factor variable.
type This argument is for the geometric distribution. Type 1 refers to the case where the minimum is zero and type 2 to the case where the minimum is 1.
tol The tolerance value to terminate the Newton-Raphson algorithm in the gamma distribution.

Value
For the Poisson and multinomial naive Bayes classifiers the estimated group, a numerical vector with 1, 2, 3 and so on. For the Gaussian naive Bayes classifier a list including:
mu A matrix with the mean vector of each group based on the dataset.
sigma A matrix with the variance of each group and variable based on the dataset.
ni The sample size of each group in the dataset.
est The estimated group of the xnew observations. It returns a numerical value regardless of whether the target variable is numerical or a factor. Hence, it is suggested that you do "as.numeric(target)" in order to see what the predicted class of the new data is.

For the Gamma classifier a list including:
a A matrix with the shape parameters.
b A matrix with the scale parameters.
est The estimated group of the xnew observations. It returns a numerical value regardless of whether the target variable is numerical or a factor. Hence, it is suggested that you do "as.numeric(target)" in order to see what the predicted class of the new data is.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
colmeans, colVars

Examples
x <- as.matrix(iris[, 1:4])
a <- gaussian.nb(x, x, iris[, 5])
x1 <- matrix( rpois(100 * 4, 5), ncol = 4)
x2 <- matrix( rpois(50 * 4, 10), ncol = 4)
x <- rbind(x1, x2)
ina <- c( rep(1, 100), rep(2, 50) )
poisson.nb(x, x, ina)
geom.nb(x, x, ina)
multinom.nb(x, x, ina)

Natural logarithm of each element of a matrix

Description
Natural logarithm of each element of a matrix.

Usage
Log(x, na.rm = FALSE)

Arguments
x A matrix with data.
na.rm A boolean value (TRUE/FALSE) for removing NA.
Details
The argument must be a matrix. For vectors the timing was the same as that of R's "log" function, so vector support was not added.

Value
A matrix where each element is the natural logarithm of the corresponding element of the given argument.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
Lbeta, Lchoose, Choose

Examples
x <- matrix( runif( 100 * 100), ncol = 100 )
a <- log(x)
b <- Log(x)
all.equal(a, b) # true
x<-a<-b<-NULL

Natural logarithm of the beta function

Description
Natural logarithm of the beta function.

Usage
Lbeta(x, y)

Arguments
x A numerical matrix, or a vector or just a number, with positive numbers in either case.
y A numerical matrix, or a vector or just a number, with positive numbers in either case. The dimensions of y must match those of x.

Details
The function is faster than R's lbeta when the dimensions of x and y are large. If you have only two numbers, then lbeta is faster. But if you have for example two vectors of 1000 values each, Lbeta becomes two times faster than lbeta.

Value
The matrix, vector or number with the resulting values.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

References
Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions. New York: Dover. https://en.wikipedia.org/wiki/Abramowitz_and_Stegun provides links to the full text, which is in the public domain. Chapter 6: Gamma and Related Functions.

See Also
Lgamma, beta.mle, diri.nr2

Examples
x <- rexp(1000)
y <- rexp(1000)
a1 <- Lbeta(x, y)
x<-y<-a1<-NULL

Natural logarithm of the gamma function and its derivatives

Description
Natural logarithm of the gamma function and its derivatives.
Usage
Lgamma(x)
Digamma(x)
Trigamma(x)

Arguments
  x  A numerical matrix or vector with positive numbers in either case.

Details
Time savings appear when there are more than 50 elements, for either a vector or a matrix.

Value
The matrix or the vector with the resulting values.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

References
Abramowitz, M. and Stegun, I. A. (1972) Handbook of Mathematical Functions. New York: Dover. https://en.wikipedia.org/wiki/Abramowitz_and_Stegun provides links to the full text which is in public domain. Chapter 6: Gamma and Related Functions.

See Also
beta.mle, diri.nr2

Examples
x <- matrix( rnorm(500 * 500), ncol = 500 )
a1 <- Lgamma(x)
a2 <- lgamma(x)
all.equal(as.vector(a1), as.vector(a2))
a1 <- Digamma(x)
a2 <- digamma(x)
all.equal(as.vector(a1), as.vector(a2))
x<-a1<-a2<-NULL

Norm of a matrix

Description
Norm of a matrix.

Usage
Norm(x, type = "F")

Arguments
  x  A matrix with numbers.
  type  The type of norm to be calculated. The default is "F" standing for Frobenius norm ("f" in R’s norm). The other options are "C" standing for the one norm ("o" in R’s norm), "R" for the infinity norm ("I" in R’s norm) and "M" for the maximum modulus among elements of a matrix ("M" in R’s norm).

Value
A number, the norm of the matrix.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
Dist, dista, colmeans

Examples
x <- matrix( rnorm(10 * 10), ncol = 10 )
Norm(x, "F")
norm(x, "F")
Norm(x, "M")
norm(x, "M")

Number of equal columns between two matrices

Description
Number of equal columns between two matrices.

Usage
mat.mat(x, y)

Arguments
  x  A numerical matrix. See details for more information. It must have the same number of rows as y.
  y  A numerical matrix. See details for more information.
It must have the same number of rows as x.

Details
The function takes each column of x and counts the number of times it matches a column of y. In the example below, we take the first 3 columns of iris as the x matrix. The y matrix is the whole of iris. We will see how many times each column of x appears in the y matrix. The answer is 1 for each column.

Value
A numerical vector of size equal to the number of columns of x.

Author(s)
Manos Papadakis
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
Match, colmeans, colMedians

Examples
x <- as.matrix(iris[, 1:3])
y <- iris
y[, 5] <- as.numeric(y[, 5])
y <- as.matrix(y)
mat.mat(x, y)
x<-y<-NULL

Odds ratio and relative risk

Description
Odds ratio and relative risk.

Usage
odds.ratio(x, a = 0.05, logged = FALSE)
rel.risk(x, a = 0.05, logged = FALSE)

Arguments
  x  A 2 x 2 matrix or a vector with 4 elements. In the case of the vector make sure it corresponds to the correct table.
  a  The significance level, set to 0.05 by default.
  logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
The odds ratio and the confidence interval are calculated.

Value
A list including:
  res  The estimated odds ratio and the p-value for the null hypothesis test that it is equal to 1.
  ci  The 100(1-a)% confidence interval for the true value of the odds ratio.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Giorgos Athineou <athineou@csd.uoc.gr>

References
Mosteller Frederick (1968). Association and Estimation in Contingency Tables. Journal of the American Statistical Association. 63(321):1-28.
Edwards A.W.F. (1963). The measure of association in a 2x2 table. Journal of the Royal Statistical Society, Series A. 126(1):109-114.
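To make the quantities returned by odds.ratio concrete, here is a minimal base R sketch of a common large-sample (Wald) construction on the log scale. Whether Rfast uses exactly these formulas is an assumption, the helper name odds_ratio_wald is hypothetical, and a vector input is assumed to fill the 2 x 2 table column by column, as matrix(x, ncol = 2) does.

```r
# Hypothetical helper, base R only; not Rfast's odds.ratio.
# x: a 2 x 2 matrix, or a length-4 vector filled column by column (assumed).
odds_ratio_wald <- function(x, a = 0.05) {
  x <- matrix(x, ncol = 2)
  or <- (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])
  se <- sqrt(sum(1 / x))               # standard error of log(or)
  z <- log(or) / se                    # Wald statistic for H0: or = 1
  pvalue <- 2 * pnorm(-abs(z))
  ci <- exp(log(or) + c(-1, 1) * qnorm(1 - a / 2) * se)
  list(res = c(odds.ratio = or, p.value = pvalue), ci = ci)
}

tab <- c(10, 20, 30, 40)
out <- odds_ratio_wald(tab)
```

The interval is built for log(or) and then exponentiated, so it is not symmetric around the estimate but always stays positive.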
See Also
odds, g2Test

Examples
x <- rpois(4, 30)+2
odds.ratio(x)
odds.ratio( matrix(x, ncol = 2) )

One sample empirical and exponential empirical likelihood test

Description
One sample exponential empirical likelihood test.

Usage
eel.test1(x, mu, tol = 1e-09, logged = FALSE)
el.test1(x, mu, tol = 1e-07, logged = FALSE)

Arguments
  x  A numerical vector.
  mu  The hypothesised mean value.
  tol  The tolerance value to stop the iterations of the Newton-Raphson.
  logged  Should the logarithm of the p-value be returned? TRUE or FALSE.

Details
Exponential empirical likelihood is a non parametric method. In this case we use it as the non parametric alternative to the t-test. Newton-Raphson is used to maximise the log-likelihood ratio test statistic. In the case of no solution, NULL is returned. Despite the function having been written in R, it is pretty fast. As for the empirical likelihood ratio test, there is a condition for the range of possible values of mu. If mu is outside this range the hypothesis is rejected immediately.

Value
  iters  The number of iterations required by the Newton-Raphson algorithm. If no convergence occurred this is NULL. This is not returned for the empirical likelihood ratio test.
  info  A vector with three elements, the value of the λ, the likelihood ratio test statistic and the relevant p-value. If no convergence occurred, the value of the λ becomes NA, the value of the test statistic is 10^5 and the p-value is 0. No convergence can be interpreted as rejection of the hypothesis test.
  p  The estimated probabilities, one for each observation. If no convergence occurred this is NULL.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Owen A. B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.
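As a rough illustration of the empirical likelihood ratio test for a mean (the textbook formulation in Owen, 2001), the base R sketch below finds the multiplier λ with uniroot instead of Newton-Raphson. It is not Rfast's el.test1: the helper name el_mean_sketch is hypothetical, and returning NULL when mu falls outside the range of the data is this sketch's own choice.

```r
# Hypothetical helper, base R only; a sketch of the empirical likelihood
# ratio test for a mean, not Rfast's el.test1.
el_mean_sketch <- function(x, mu) {
  n <- length(x)
  z <- x - mu
  # mu must lie strictly inside the range of the data (sketch returns NULL otherwise)
  if (min(z) >= 0 || max(z) <= 0) return(NULL)
  # the multiplier lambda solves sum( z / (1 + lambda * z) ) = 0
  g <- function(lambda) sum(z / (1 + lambda * z))
  # bracket that keeps every estimated probability in (0, 1]
  lower <- (1 / n - 1) / max(z)
  upper <- (1 / n - 1) / min(z)
  lambda <- uniroot(g, c(lower, upper), tol = 1e-12)$root
  p <- 1 / (n * (1 + lambda * z))        # estimated probabilities
  stat <- 2 * sum(log(1 + lambda * z))   # likelihood ratio statistic
  pvalue <- pchisq(stat, 1, lower.tail = FALSE)
  list(info = c(lambda = lambda, stat = stat, p.value = pvalue), p = p)
}

set.seed(1)
x <- rnorm(50)
out <- el_mean_sketch(x, 0.2)
```

At the solution the probabilities sum to one and the weighted mean sum(p * x) equals the hypothesised mu, which gives a quick sanity check on the result.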
See Also
ftest, ttest1

Examples
x <- rnorm(500)
system.time(a1 <- eel.test1(x, 0) )
system.time(a2 <- el.test1(x, 0) )

One sample t-test for a vector

Description
One sample t-test for a vector.

Usage
ttest1(x, m, alternative = "unequal", logged = FALSE, conf = NULL)

Arguments
  x  A numerical vector with the data.
  m  The mean value under the null hypothesis.
  alternative  The alternative hypothesis, "unequal", "greater" or "less".
  logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?
  conf  If you want a confidence interval supply the confidence level.

Details
The usual one sample t-test is implemented, only faster.

Value
A list including:
  res  A two valued vector with the test statistic and its (logged) p-value.
  ci  If you supplied a number in the input argument "conf", the relevant confidence interval is returned as well.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

See Also
ttest, anova1, ttests

Examples
x <- rnorm(500)
t.test(x, mu = 0)
ttest1(x, 0, conf = 0.95)

Operations between two matrices or matrix and vector

Description
Operations between two matrices or matrix and vector.

Usage
XopY.sum(x, y = NULL, oper = "*")
eachrow(x,y,oper = "*",method = NULL)
eachcol.apply(x,y,indices = NULL,oper = "*",apply = "sum")

Arguments
  x  A numerical matrix.
  y  A second numerical matrix for "XopY.sum" whose dimensions must match the ones of x, or a vector for "eachrow" and "eachcol.apply", whose length must match the columns of x for "eachrow" and the rows of x for "eachcol.apply" (see Details).
  oper  The operation to be performed, either "*", "/", "+" or "-".
  method  A character value choosing the function to apply to the result. Options: 1) sum 2) max 3) min
  indices  An integer vector with indices to specific columns. Only for "eachcol.apply".
  apply  A character value with the function to be applied in the columns of the matrix. Only for "eachcol.apply". Options: 1) sum 2) median 3) max 4) min

Details
XopY.sum: sum(X op Y) where op can be one of "+,-,*,/".
eachrow: X op Y by row or FUNCTION(X op Y) where "x" is a matrix, "y" is a vector with length equal to the number of columns of x, "op" is one of "+,-,*,/", and "FUNCTION" is a specific method applied to the result matrix (see argument method).
eachcol.apply: FUNCTION(X op Y) by column where "x" is a matrix, "y" is a vector with length equal to the number of rows of x, "op" is one of "+,-,*,/" and "FUNCTION" is a specific method (see argument apply).

Value
XopY.sum: sum(X op Y) where "op" can be one of "+,-,*,/".
eachrow: operation by row between a matrix and a vector. "op" can be one of "+,-,*,/". If "suma=TRUE" then returns the sum of this operation.
eachcol.apply: operation by column between a matrix and a vector, followed by a specific function. "op" can be one of "+,-,*,/".

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
Dist, dista, colmeans, Diag.fill, colMads, rowMads

Examples
x <- matrix( rnorm(5 * 5), ncol = 5 )
y <- matrix( rnorm(5 * 5), ncol = 5 )
XopY.sum(x, y, oper = "*")
y <- x[,1]
eachrow(x,y)
all.equal(eachcol.apply(x,y),colsums(x*y))
x<-y<-NULL

Orthogonal matching pursuit regression

Description
Orthogonal matching pursuit regression.

Usage
ompr(y, x, method = "BIC", tol = 2 )
omp(y, x, tol = qchisq(0.95, 1) + log( length(y) ), type = "logistic" )

Arguments
  y  The response variable, a numeric vector. For "ompr" this is a continuous variable. For "omp" this can be either a vector with discrete (count) data, 0 and 1, non negative values, strictly positive or proportions including 0 and 1.
  x  A matrix with the data, where the rows denote the samples and the columns are the variables.
  method
You can choose between the change in the BIC ("BIC"), the adjusted R^2 or the SSE ("SSE").
  tol  The tolerance value to terminate the algorithm. This is the change in the criterion value between two successive steps. For "ompr" the default value is 2 because the default method is "BIC". For "omp" the default value is the 95% quantile of the χ^2 distribution with 1 degree of freedom plus the logarithm of the sample size.
  type  This denotes the parametric model to be used each time. It depends upon the nature of y. The possible values are "logistic", "poisson", "quasipoisson", "quasibinomial", "normlog", "weibull", or "mv" (for multivariate response variable).

Value
A matrix with two columns. The selected variable(s) and the criterion value at every step.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>.

References
Pati Y. C., Rezaiifar R. & Krishnaprasad P. S. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Signals, Systems and Computers. 1993 Conference Record of The Twenty-Seventh Asilomar Conference on. IEEE.
Mazin Abdulrasool Hameed (2012). Comparative analysis of orthogonal matching pursuit and least angle regression. MSc thesis, Michigan State University. https://www.google.gr/url?sa=t&rct=j&q=&esrc=s&source=web&c
Lozano A., Swirszcz G., & Abe N. (2011). Group orthogonal matching pursuit for logistic regression. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics.

See Also
cor.fbed, cor.fsreg, correls, fs.reg

Examples
x <- matrnorm(100, 400)
y <- rnorm(100)
a <- ompr(y, x)
a
x <- NULL

Permutation

Description
Permute the given vector.
Usage
permutation(x, nperm = gamma(length(x)+1))
permutation.next(x, nperm = gamma(length(x)+1))
permutation.prev(x, nperm = gamma(length(x)+1))
bincomb(n)

Arguments
  x  A numeric vector with data.
  nperm  An integer value for returning a specific number of combinations. By default it is set to all combinations. It must satisfy 0 <= nperm <= gamma(length(x)+1).
  n  An integer value for the length of the binary number.

Details
This function implements "Permutation", which means all the possible combinations. For permutation.next and permutation.prev, if there are no further possible combinations, the same vector is returned. "Binary Combinations" for "bincomb" means all the possible combinations for the binary number with length "n".

Value
Returns a matrix with all possible combinations of the given vector, or a one-row matrix with one possible combination.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>

See Also
combn, comb_n

Examples
y <- rnorm(3)
b <- permutation(y)
b <- permutation.next(y)
b <- permutation.prev(y)
g <- bincomb(3)

Permutation based p-value for the Pearson correlation coefficient

Description
Permutation based p-value for the Pearson correlation coefficient.

Usage
permcor(x, y, R = 999)

Arguments
  x  A numerical vector with the first variable.
  y  A numerical vector with the second variable.
  R  The number of permutations to be conducted; set to 999 by default.

Details
The p-value is computed at a very low computational cost. Try it yourself.

Value
A vector consisting of two values, the Pearson correlation and the permutation based p-value.
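The idea behind a permutation based p-value for a correlation can be sketched in a few lines of base R. This is a plain, unoptimised version under stated assumptions: the two-sided comparison of absolute correlations and the "+ 1" correction are common choices, not confirmed details of Rfast's permcor, and the helper name perm_cor_sketch is hypothetical.

```r
# Hypothetical helper, base R only; not Rfast's permcor.
perm_cor_sketch <- function(x, y, R = 999) {
  r_obs <- cor(x, y)
  # correlations after randomly permuting y, which breaks any association
  r_perm <- replicate(R, cor(x, sample(y)))
  # the "+ 1" terms count the observed statistic as one of the permutations
  pvalue <- (sum(abs(r_perm) >= abs(r_obs)) + 1) / (R + 1)
  c(cor = r_obs, p.value = pvalue)
}

set.seed(1)
out <- perm_cor_sketch(iris[, 1], iris[, 2])
```

With R permutations the smallest attainable p-value is 1 / (R + 1), which is why R = 999 or R = 9999 are convenient choices.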
Author(s)
Marios Dimitriadis and Michail Tsagris
R implementation and documentation: Marios Dimitriadis <kmdimitriadis@gmail.com> and Michail Tsagris <mtsagris@csd.uoc.gr>

See Also
pc.skel

Examples
x <- iris[, 1]
y <- iris[, 2]
permcor(x, y)
permcor(x, y, R = 9999)

Prediction with some naive Bayes classifiers

Description
Prediction with some naive Bayes classifiers.

Usage
gaussiannb.pred(xnew, m, s, ni)
poissonnb.pred(xnew, m)
multinomnb.pred(xnew, m)
gammanb.pred(xnew, a, b)
geomnb.pred(xnew, prob)

Arguments
  xnew  A numerical matrix with new predictor variables whose group is to be predicted. For the Gaussian case this contains any numbers, but for the multinomial and Poisson cases, the matrix must contain integer valued numbers only.
  m  A matrix with the group means. Each row corresponds to a group.
  s  A matrix with the group column-wise variances. Each row corresponds to a group.
  ni  A vector with the frequencies of each group.
  a  A vector with the shape parameters of each group.
  b  A vector with the scale parameters of each group.
  prob  A vector with the probability parameters of each group.

Value
A numerical vector with 1, 2, ... denoting the predicted group.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
gaussian.nb, colpoisson.mle, colVars

Examples
ina <- sample(1:150, 100)
x <- as.matrix(iris[, 1:4])
id <- as.numeric(iris[, 5])
a <- gaussian.nb(xnew = NULL, x[ina, ], id[ina])
est <- gaussiannb.pred(x[-ina, ], a$mu, a$sigma, a$ni)
table(id[-ina], est)

Quasi binomial regression for proportions

Description
Quasi binomial regression for proportions.
Usage
prop.reg(y, x, varb = "quasi", tol = 1e-09, maxiters = 100)
prop.regs(y, x, varb = "quasi", tol = 1e-09, logged = FALSE, maxiters = 100)

Arguments
  y  A numerical vector of proportions. 0s and 1s are allowed.
  x  For the "prop.reg" a matrix with data, the predictor variables. This can be a matrix or a data frame. For the "prop.regs" this must be a numerical matrix, where each column denotes a variable.
  tol  The tolerance value to terminate the Newton-Raphson algorithm. This is set to 10^-9 by default.
  varb  The type of estimate to be used in order to estimate the covariance matrix of the regression coefficients. There are two options, either "quasi" (default value) or "glm". See the references for more information.
  logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?
  maxiters  The maximum number of iterations before the Newton-Raphson is terminated automatically.

Details
Newton-Raphson is used but, unlike R’s built-in function "glm", no checks and no extra calculations are performed; only the model is fitted. The "prop.regs" is to be used for very many univariate regressions. The "x" is a matrix in this case and the significance of each variable (column of the matrix) is tested. The function accepts binary responses as well (0 or 1).

Value
For the "prop.reg" function a list including:
  iters  The number of iterations required by the Newton-Raphson.
  varb  The covariance matrix of the regression coefficients.
  phi  The phi parameter is returned if the input argument "varb" was set to "glm", otherwise this is NULL.
  info  A table similar to the one produced by "glm" with the estimated regression coefficients, their standard error, Wald test statistic and p-values.
For the "prop.regs" a two-column matrix with the test statistics (Wald statistic) and the associated p-values (or their logarithm).
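The Newton-Raphson fit described above can be sketched in base R as follows. This is a bare illustration assuming a logit link and only the coefficient update; it does not compute the "quasi" or "glm" covariance estimates, it is not Rfast's implementation, and the helper name prop_reg_sketch is hypothetical. The result is compared against glm with the quasibinomial family.

```r
# Hypothetical helper, base R only; a sketch of Newton-Raphson for a
# logit-link fractional (quasi binomial) regression, not Rfast's prop.reg.
prop_reg_sketch <- function(y, x, tol = 1e-09, maxiters = 100) {
  X <- cbind(1, x)                            # add the intercept column
  be <- numeric(ncol(X))                      # start from zero coefficients
  for (iters in seq_len(maxiters)) {
    p <- as.vector(plogis(X %*% be))          # fitted means
    score <- crossprod(X, y - p)              # gradient of the quasi log-likelihood
    hess <- crossprod(X * (p * (1 - p)), X)   # X' W X with W = diag(p * (1 - p))
    step <- as.vector(solve(hess, score))
    be <- be + step
    if (sum(abs(step)) < tol) break           # stop once the update is negligible
  }
  list(iters = iters, be = be)
}

set.seed(1)
y <- rbeta(100, 1, 4)
x <- matrix(rnorm(100 * 3), ncol = 3)
a <- prop_reg_sketch(y, x)
b <- coef(glm(y ~ x, family = quasibinomial))  # base R reference fit
```

The coefficients in a$be should agree with b up to the convergence tolerance of glm; skipping glm's input checks and extra output is where a stripped-down routine can save time.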
Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Papke L. E. & Wooldridge J. (1996). Econometric methods for fractional response variables with an application to 401(K) plan participation rates. Journal of Applied Econometrics, 11(6): 619-632.
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also
anova_propreg, univglms, score.glms, logistic_only

Examples
y <- rbeta(100, 1, 4)
x <- matrix(rnorm(100 * 3), ncol = 3)
a <- prop.reg(y, x)
y <- rbeta(100, 1, 4)
x <- matrix(rnorm(400 * 100), ncol = 400)
b <- prop.regs(y, x)
mean(b[, 2] < 0.05)

Quasi Poisson regression for count data

Description
Quasi Poisson regression.

Usage
qpois.reg(x, y, full = FALSE, tol = 1e-09,maxiters = 100)
qpois.regs(x, y, tol = 1e-09, logged = FALSE)

Arguments
  x  For the "qpois.reg" a matrix with data, the predictor variables. This can be a matrix or a data frame. For the "qpois.regs" this must be a numerical matrix, where each column denotes a variable.
  y  A numerical vector with positive discrete data.
  full  If this is FALSE, the coefficients, the deviance and the estimated phi parameter will be returned only. If this is TRUE, more information is returned.
  tol  The tolerance value to terminate the Newton-Raphson algorithm. This is set to 10^-9 by default.
  logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?
  maxiters  The maximum number of iterations before the Newton-Raphson is terminated automatically.

Details
Newton-Raphson is used but, unlike R’s built-in function "glm", no checks and no extra calculations are performed; only the model is fitted, unless the user requests the Wald tests of the coefficients. The "qpois.regs" is to be used for very many univariate regressions.
The "x" is a matrix in this case and the significance of each variable (column of the matrix) is tested.

Value
For the "qpois.reg" a list including:
When full is FALSE
  be  The regression coefficients.
  devi  The deviance of the model.
  varb  The covariance matrix of the beta coefficients.
  phi  The phi parameter, the estimate of dispersion.
When full is TRUE, the additional item is:
  info  The regression coefficients, their standard error, their Wald test statistic and their p-value.
For the "qpois.regs" a two-column matrix with the test statistics (Wald statistic) and the associated p-values (or their logarithm).

Author(s)
Manos Papadakis and Marios Dimitriadis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com> and Marios Dimitriadis <kmdimitriadis@gmail.com>.

References
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also
prop.reg, univglms, score.glms, poisson_only

Examples
y <- rnbinom(100, 10, 0.6)
x <- matrix(rnorm(100*3), ncol = 3)
mod1 <- glm(y ~ x, quasipoisson)
summary(mod1)
qpois.reg(x, y, full = TRUE)
qpois.regs(x, y)

Random intercepts linear mixed models

Description
Random intercepts linear mixed models (for balanced data with a single identical covariate).

Usage
rint.reg(y, x, id ,tol = 1e-08, ranef = FALSE, maxiters = 100)
rint.regbx(y, x, id)

Arguments
  y  A numerical vector with the data. The subject values.
  x  For the case of "rint.reg" this can be a vector or a numerical matrix with data. In the case of "rint.regbx" this is a numerical vector with the same length as y indicating the fixed predictor variable. Its values are the same for all levels of y. An example of this x is time which is the same for all subjects.
  id  A numerical variable with 1, 2, ... indicating the subject.
  tol  The tolerance level to terminate the generalised least squares algorithm.
  ranef
If you want to obtain the random effects (random intercepts) set this equal to TRUE.
  maxiters  The max number of iterations that can take place in a regression.

Details
A random intercepts linear mixed model with a compound symmetric covariance structure is fitted in both functions. The "rint.reg" allows any numerical matrix, with balanced or unbalanced data. See Demidenko (2013, pg. 65-67) for more information.
The "rint.regbx" is a special case of a balanced random intercepts model with a compound symmetric covariance matrix and one single covariate which is constant for all replicates. An example is time, which is the same for all subjects. Maximum likelihood estimation has been performed. In this case the mathematics exist in a closed formula (Demidenko, 2013, pg. 67-69).

Value
A list including:
  info  A vector with the random intercepts variance (between), the variance of the errors (within), the log-likelihood, the deviance (minus twice the log-likelihood) and the BIC. In the case of "rint.reg" it also includes the number of iterations required by the generalised least squares.
  be  The estimated regression coefficients, which in the case of "rint.regbx" are simply two: the constant and the slope (time effect).
  ranef  The random intercepts effects.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Eugene Demidenko (2013). Mixed Models: Theory and Applications with R, 2nd Edition. New Jersey: Wiley & Sons (excellent book).

See Also
rm.lines, varcomps.mom, colvarcomps.mom

Examples
y <- rnorm(100)
x <- rnorm(10)
x <- rep(x, 10)
id <- rep(1:10, each = 10)
system.time( for (i in 1:40) a <- rint.reg(y, x, id) )

Random values simulation from a von Mises distribution

Description
It generates random vectors following the von Mises distribution.
The data can be spherical or hyper-spherical.

Usage
rvonmises(n, m, k, rads = TRUE)

Arguments
  n  The sample size.
  m  The mean angle expressed in radians or degrees.
  k  The concentration parameter. If k is zero the sample will be generated from the uniform distribution over (0, 2π).
  rads  If the mean angle is expressed in radians, this should be TRUE and FALSE otherwise. The simulated data will be expressed in radians or degrees depending on how the mean angle is expressed.

Details
The mean direction is transformed to the Euclidean coordinates (i.e. unit vector) and then the fvmf function is employed. It uses rejection sampling as suggested by Andrew Wood (1994). The algorithm is described as found in Dhillon and Sra (2003). Finally, the data are transformed to radians or degrees.

Value
A vector with the simulated data.

Author(s)
Michail Tsagris and Manos Papadakis
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm85@gmail.com>

References
Wood, A. T. (1994). Simulation of the von Mises Fisher distribution. Communications in Statistics - Simulation and Computation, 23(1): 157-164.
Dhillon, I. S., & Sra, S. (2003). Modeling data using directional distributions. Technical Report TR03-06, Department of Computer Sciences, The University of Texas at Austin. http://citeseerx.ist.psu.edu/viewdoc/download?d

See Also
vm.mle, rvmf

Examples
x <- rvonmises(1000, 2, 25, rads = TRUE)
vm.mle(x)

Ranks of the values of a vector

Description
Ranks of the values of a vector.

Usage
Rank(x,method = "average",descending = FALSE,stable = FALSE)

Arguments
  x  A numerical vector with data.
  method  A character string for choosing method. Must be one of "average", "min", "max", "first".
  descending  A boolean value (TRUE/FALSE) for sorting the vector in descending order. By default the vector is sorted in ascending order.
  stable  A boolean value (TRUE/FALSE) for choosing a stable sort algorithm. A stable sort keeps equal elements in their original relative order. Only for the method "first".

Details
The ranks of the values are returned, the same job as "rank". Descending or ascending order can be chosen for all methods. Stable sorting is available only for method "first"; choose it if you have many duplicate values.

Value
A vector with the ranks of the values.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
colRanks, correls

Examples
x <- rnorm(100)
a1 <- Rank(x)
a2 <- rank(x)

Reading the files of a directory

Description
Reading the files of a directory.

Usage
read.directory(path.directory)
read.examples(path.man,dont.read = "")

Arguments
  path.directory  The full path to the directory. For example: "C:\Users\username\Documents\R\Rfast_1.8.0\R\"
  path.man  The full path to the directory with the Rd files in it. For example: "C:\Users\username\Documents\R\Rfast
  dont.read  A character vector with the names of the files that you wish not to read. By default it’s empty "".

Details
For function "read.directory": Takes as an argument a full path to a directory and returns the names of the files.
For function "read.examples": Takes as an argument a full path to the directory of the Rd files and the names of the files that should not be read.

Value
For function "read.directory": The names of the files.
For function "read.examples": a list with the following fields
  examples  A character vector with the examples of each Rd file.
  files  A character vector with the name of the file that each example belongs to.
  long_lines  A character vector with the name of the file that has large examples.

Author(s)
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
See Also
AddToNamespace, sourceR, sourceRd, checkRd, checkExamples

Examples
# for example: path="C:\some_file\"
# system.time( read.directory(path) )
# system.time( list.dirs(path) )
# for example: path.man="C:\some_file\man\"
# system.time( read.examples(path.man) )
# system.time( read.examples(path.man,dont.read=c("somef_1.Rd",...,"somef_n.Rd") ) )

Repeated measures anova

Description
Repeated measures anova.

Usage
rm.anova(y, logged = FALSE)

Arguments
  y  A matrix with the data, where each column refers to a different measurement. The rows denote the subjects.
  logged  Should the p-values be returned (FALSE) or their logarithm (TRUE)?

Details
The usual repeated measures ANOVA, as found in Davis (2002), is implemented. In this case, suppose you have taken measurements on one or more variables from the same group of people. See the example below on how to arrange such data.

Value
A vector with the test statistic (t-test) and its associated p-value.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Charles S. Davis (2002). Statistical methods for the analysis of repeated measures. Springer-Verlag, New York.

See Also
rm.anovas, rint.reg, varcomps.mle

Examples
y <- c(74.5,81.5,83.6,68.6,73.1,79.4,
75.5,84.6,70.6,87.3,73.0,75.0,
68.9,71.6,55.9,61.9,60.5,61.8,
57.0,61.3,54.1,59.2,56.6,58.8,
78.3,84.9,64.0,62.2,60.1,78.7,
54.0,62.8,63.0,58.0,56.0,51.5,
72.5,68.3,67.8,71.5,65.0,67.7,
80.8,89.9,83.2,83.0,85.7,79.6)
y <- matrix(y, ncol = 6, byrow = TRUE)
rm.anova(y)

Replicate columns/rows

Description
Replicate columns/rows.

Usage
rep_col(x,n)
rep_row(x,n)

Arguments
  x  A vector with data.
  n  Number of new columns/rows.

Value
A matrix where each column/row is equal to "x".

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
See Also
rowMins, rowFalse, nth, colrange, colMedians, colVars, sort_mat, rowTrue

Examples
x <- runif(10)
all.equal(rep_col(x,10),matrix(x,nrow=length(x),ncol=10))
all.equal(rep_row(x,10),matrix(x,ncol=length(x),nrow=10,byrow=TRUE))

Round each element of a matrix/vector

Description
Round each element of a matrix/vector.

Usage
Round(x,digit=0,na.rm = FALSE)

Arguments
  x  A numeric matrix/vector with data or NA. NOT integer values.
  digit  An integer value in 0...N-1, where N is the number of digits. By default it is 0.
  na.rm  TRUE or FALSE for removing NAs if they exist.

Details
Round is a very fast C++ implementation, especially for large data. It handles NA.

Value
A vector/matrix where each element has been rounded to the given digit.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
Lchoose, Log, Choose

Examples
x <- matrix( rnorm( 500 * 100), ncol = 100 )
system.time( a <- Round(x,5) )
system.time( b <- round(x,5) )
all.equal(a,b) # true
x <- rnorm( 1000)
system.time( a <- Round(x,5) )
system.time( b <- round(x,5) )
all.equal(a,b) # true

Row - Wise matrix/vector count the frequency of a value

Description
Row - Wise matrix/vector count the frequency of a value.

Usage
count_value(x, value)
colCountValues(x, values, parallel = FALSE)
rowCountValues(x, values, parallel = FALSE)
countNA(x)

Arguments
  x  A vector with the data (numeric or character) or a numeric matrix.
  value  The value, numeric or NA, to check its frequency in the vector "x".
  values  A vector with the values to check their frequency in the matrix "x" by row or column.
  parallel  Do you want to do it in parallel in C++? TRUE or FALSE. Works with every other argument.

Details
The functions are written in C++ in order to be as fast as possible.
The "x" and "value" must have the same type. The type can be numeric or character.

Value
The frequency of a value/values in a vector in linear time or by row/column in a matrix.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
med, binary_search, Order, nth

Examples
x <- rnorm(100)
value <- x[50]
system.time( count_value(x,value) )
countNA(x)
y <- sample(letters,replace=TRUE)
value <- "r"
system.time( count_value(y,value) )
values <- sample(x,100,replace=TRUE)
x <- matrix(x,100,100)
colCountValues(x,values)
rowCountValues(x,values)
x<-value<-values<-y<-NULL

Row-wise minimum and maximum

Description
Row-wise minimum and maximum of a matrix.

Usage
rowMins(x, value = FALSE)
rowMaxs(x, value = FALSE)
rowMinsMaxs(x)

Arguments
  x  A numerical matrix with data.
  value  If the value is FALSE it returns the indices of the minimum/maximum, otherwise it returns the minimum and maximum values.

Value
A vector with the relevant values.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
colMins, colMaxs, nth, rowrange, colMedians, colVars, sort_mat

Examples
x <- matrix( rnorm(500 * 500), ncol = 500 )
system.time( s1 <- rowMins(x) )
system.time( s2 <- apply(x, 1, min) )
system.time( s1 <- rowMaxs(x) )
system.time( s2 <- apply(x, 1, max) )
system.time( s1 <- c(apply(x, 1, min),apply(x, 1, max) ))
system.time( s2 <- rowMinsMaxs(x) )
x<-s1<-s2<-NULL

Row-wise true value

Description
Row-wise true value of a matrix.

Usage
rowTrue(x)
rowFalse(x)
rowTrueFalse(x)

Arguments
  x  A logical matrix with data.

Value
An integer vector where item "i" is the number of true/false values in row "i".

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
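For orientation, the counts of true and false values per row can be obtained in base R with rowSums, as in the short sketch below. It is assumed, not verified here, that rowTrue and rowFalse return the same counts; the variable names are illustrative only.

```r
# Base R counterparts (assumed to match rowTrue/rowFalse in output, not in speed).
set.seed(1)
x <- matrix(as.logical(rbinom(100 * 100, 1, 0.5)), 100, 100)
n_true <- rowSums(x)     # number of TRUE values in each row
n_false <- rowSums(!x)   # number of FALSE values in each row
```

The two vectors always add up to the number of columns, which makes a simple consistency check.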
See Also
rowMins, colFalse, nth, rowrange, rowMedians, rowVars, sort_mat, colTrue

Examples
x <- matrix(as.logical(rbinom(100*100,1,0.5)),100,100)
s1 <- rowTrue(x)
s1 <- rowFalse(x)
s1 <- rowTrueFalse(x)
x<-s1<-NULL


Search for variables with zero range in a matrix

Description
Search for variables with zero range in a matrix.

Usage
check_data(x, ina = NULL)

Arguments
x      A matrix or a data.frame with the data, where the rows denote the observations and the columns contain the dependent variables.
ina    If your data are grouped, for example there is a factor or numerical variable indicating the groups of the data, supply it here; otherwise leave it NULL.

Details
The function identifies the variables with zero range, instead of zero variance, as this is faster. It works with matrices and data.frames.

Value
A numerical vector of length zero if no zero-range variable exists, or of length at least one with the index (or indices) of the variable(s) that need attention or need to be removed.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
colrange, colVars

Examples
x <- matrix( rnorm(100 * 100), ncol = 100 )
check_data(x)
## some variables have a constant value
x[, c(1,10, 50, 70)] <- 1
check_data(x)
id <- rep(1:4, each = 25 )
x[1:25, 2] <- 0
check_data(x) ## did not use the id variable
check_data(x, id) ## see now
x <- NULL


Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression

Description
Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression.
Usage
anova_propreg(mod, poia = NULL)
anova_qpois.reg(mod, poia = NULL)

Arguments
mod     An object as returned by the "prop.reg" or the "qpois.reg" function.
poia    If you want to test the significance of a single coefficient this must be a number. In this case, the output of the "prop.reg" or the "qpois.reg" function already contains this information. If you want more coefficients to be tested simultaneously, e.g. for a categorical predictor, then this must contain the positions of the coefficients. If you want to see whether all coefficients are zero, like an overall F-test, leave this NULL.

Details
Even though the name of this function starts with anova, it is not an ANOVA type significance test, but a Wald type one.

Value
A vector with three elements: the test statistic value, its associated p-value and the relevant degrees of freedom.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References
Papke L. E. & Wooldridge J. (1996). Econometric methods for fractional response variables with an application to 401(K) plan participation rates. Journal of Applied Econometrics, 11(6): 619-632.
McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also
prop.reg, qpois.reg, univglms, score.glms, logistic_only

Examples
y <- rbeta(1000, 1, 4)
x <- matrix(rnorm(1000 * 3), ncol = 3)
a <- prop.reg(y, x)
## all coefficients are tested
anova_propreg(a)
## the first predictor variable is tested
anova_propreg(a, 2)
a ## this information is already included in the model output
## the first and the second predictor variables are tested
anova_propreg(a, 2:3)


Simulation of random values from a Bingham distribution
Simulating from a Bingham distribution

Description
Simulation from a Bingham distribution using the code suggested by Kent et al. (2013).

Usage
rbing(n, lam)

Arguments
n      Sample size.
lam    Eigenvalues of the diagonal symmetric matrix of the Bingham distribution. See Details for more information on this.

Details
The user must have calculated the eigenvalues of the diagonal symmetric matrix of the Bingham distribution. The function accepts the q-1 eigenvalues only. This means that the user must have subtracted the lowest eigenvalue from the rest and must supply the non-zero ones. The function uses rejection sampling.

Value
A matrix with the simulated data.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

References
Kent J.T., Ganeiber A.M. and Mardia K.V. (2013). A new method to simulate the Bingham and related distributions in directional data analysis with applications. http://arxiv.org/pdf/1310.8110v1.pdf
C.J. Fallaize and T. Kypraios (2014). Exact Bayesian Inference for the Bingham Distribution. Statistics and Computing (No volume assigned yet). http://arxiv.org/pdf/1401.2894v1.pdf

See Also
rvmf

Examples
x <- rbing( 100, c(1, 0.6, 0.1) )
x


Simulation of random values from a Bingham distribution with any symmetric matrix

Description
Simulation of random values from a Bingham distribution with any symmetric matrix.

Usage
rbingham(n, A)

Arguments
n    Sample size.
A    A symmetric matrix.

Details
The eigenvalues of the q x q symmetric matrix A are calculated and the smallest of them is subtracted from the rest. The q - 1 non-zero eigenvalues are then passed to rbing. The generated data are then right multiplied by V^T, where V is the matrix of eigenvectors of the matrix A.

Value
A matrix with the simulated data.
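The eigenvalue preparation described in the Details of rbingham can be sketched in base R. This is only an illustration of the steps stated above, not the package's internal code; the helper name bingham_eigen_prep is hypothetical.

```r
# Hypothetical helper (not part of Rfast): prepares the inputs the way the
# Details of rbingham describe, using base R only.
bingham_eigen_prep <- function(A) {
  eig <- eigen(A, symmetric = TRUE)          # eigenvalues come back in decreasing order
  shifted <- eig$values - min(eig$values)    # subtract the smallest eigenvalue from all
  list(lam = shifted[-length(shifted)],      # drop the resulting zero: q - 1 values remain
       V = eig$vectors)                      # eigenvectors of A, one per column
}

A <- cov(iris[, 1:4])
prep <- bingham_eigen_prep(A)
prep$lam
```

With Rfast loaded, the remaining step would presumably be rbing(n, prep$lam) %*% t(prep$V), following the "right multiplied by V^T" wording; that line is an assumption drawn from the text rather than from the package source.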
Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

References
Kent J.T., Ganeiber A.M. and Mardia K.V. (2013). A new method to simulate the Bingham and related distributions in directional data analysis with applications. http://arxiv.org/pdf/1310.8110v1.pdf
C.J. Fallaize and T. Kypraios (2014). Exact Bayesian Inference for the Bingham Distribution. Statistics and Computing (No volume assigned yet). http://arxiv.org/pdf/1401.2894v1.pdf

See Also
rvmf

Examples
A <- cov( iris[, 1:4] )
x <- rbingham(100, A)
x


Simulation of random values from a normal distribution

Description
Simulation of random values from a normal distribution.

Usage
Rnorm(n, m = 0, s = 1)

Arguments
n    The sample size.
m    The mean, set to 0 by default.
s    The standard deviation, set to 1 by default.

Details
By using the Ziggurat method of generating standard normal variates, this function is really fast when you want to generate large vectors. For fewer than 2,000 values this might make no difference when compared with R's "rnorm", but for 10,000 values it will be 6-7 times faster.

Value
A vector with n values.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

See Also
matrnorm, rvonmises, rvmf, rmvnorm

Examples
x <- Rnorm(500)


Simulation of random values from a von Mises-Fisher distribution
Random values simulation from a von Mises-Fisher distribution

Description
It generates random vectors following the von Mises-Fisher distribution. The data can be spherical or hyper-spherical.

Usage
rvmf(n, mu, k)

Arguments
n     The sample size.
mu    The mean direction, a unit vector.
k     The concentration parameter. If k = 0, random values from the spherical uniform will be drawn.
In that case, values are generated from a multivariate normal distribution with zero mean vector and the identity matrix as the covariance matrix, and then each vector is scaled to a unit vector.

Details
It uses rejection sampling as suggested by Andrew Wood (1994).

Value
A matrix with the simulated data.

Author(s)
Michail Tsagris and Manos Papadakis
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

References
Wood A. T. A. (1994). Simulation of the von Mises Fisher distribution. Communications in statistics-simulation and computation, 23(1): 157-164.
Dhillon I. S. & Sra S. (2003). Modeling data using directional distributions. Technical Report TR03-06, Department of Computer Sciences, The University of Texas at Austin. http://citeseerx.ist.psu.edu/viewdoc/download?d

See Also
vmf.mle, rvonmises, iag.mle

Examples
m <- rnorm(4)
m <- m/sqrt(sum(m^2))
x <- rvmf(1000, m, 25)
m
vmf.mle(x)


Skeleton of the PC algorithm
The skeleton of a Bayesian network produced by the PC algorithm

Description
The skeleton of a Bayesian network produced by the PC algorithm.

Usage
pc.skel(dataset, method = "pearson", alpha = 0.01, R = 1, stat = NULL, ini.pvalue = NULL)

Arguments
dataset       A numerical matrix with the variables. If you have a data.frame (i.e. categorical data) turn it into a matrix using data.frame.to_matrix. Note that for categorical data the numbers must start from 0. No missing data are allowed.
method        If you have continuous data, you can choose either "pearson" or "spearman". If you have categorical data though, this must be "cat". In this case, make sure the minimum value of each variable is zero. The g2Test and the relevant functions work that way.
alpha         The significance level (suitable values in (0, 1)) for assessing the p-values. Default (preferred) value is 0.01.
R             The number of permutations to be conducted. The p-values are assessed via permutations.
Use the default value if you want no permutation based assessment.
stat          If the initial test statistics (univariate associations) are available, pass them through this parameter.
ini.pvalue    If the initial p-values of the univariate associations are available, pass them through this parameter.

Details
The PC algorithm as proposed by Spirtes et al. (2000) is implemented. The variables must be either all continuous or all categorical. The skeleton of the PC algorithm is order independent, since we are using the third heuristic (Spirtes et al., 2000, pg. 90). At every stage of the algorithm, the pairs which are least statistically associated are used. The conditioning set consists of variables which are most statistically associated with each of the pair of variables.

For example, for the pair (X, Y) there can be two conditioning sets, say (Z1, Z2) and (W1, W2). All p-values, test statistics and degrees of freedom have been computed at the first step of the algorithm. Take the p-values between (Z1, Z2) and (X, Y) and between (W1, W2) and (X, Y). The conditioning set with the minimum p-value is used first. If the minimum p-values are the same, the second lowest p-value is used. In the unlikely, but not impossible, event of all p-values being the same, the test statistic divided by the degrees of freedom is used as a means of choosing which conditioning set is to be used first.

If two or more p-values are below the machine epsilon (.Machine$double.eps, which is equal to 2.220446e-16), all of them are set to 0. To make the comparison or the ordering feasible we use the logarithm of the p-value. Hence, the logarithm of the p-values is always calculated and used.

In the case of the G2 test of independence (for categorical data) with no permutations, we have incorporated a rule of thumb.
If the number of samples is at least 5 times the number of the parameters to be estimated, the test is performed; otherwise, independence is not rejected, according to Tsamardinos et al. (2006). We have modified it so that it calculates the p-value using permutations.

Value
A list including:
stat          The test statistics of the univariate associations.
ini.pvalue    The initial p-values of the univariate associations.
pvalue        The logarithm of the p-values of the univariate associations.
runtime       The amount of time it took to run the algorithm.
kappa         The maximum value of k, the maximum cardinality of the conditioning set at which the algorithm stopped.
n.tests       The number of tests conducted during each k.
G             The adjacency matrix. A value of 1 in G[i, j] appears in G[j, i] also, indicating that i and j have an edge between them.
sepset        A list with the separating sets for every value of k.

Author(s)
Marios Dimitriadis
R implementation and documentation: Marios Dimitriadis <kmdimitriadis@gmail.com>

References
Spirtes P., Glymour C. and Scheines R. (2001). Causation, Prediction, and Search. The MIT Press, Cambridge, MA, USA, 2nd edition.
Tsamardinos I., Borboudakis G. (2010) Permutation Testing Improves Bayesian Network Learning. In Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. 322-337.
Tsamardinos I., Brown L.E. and Aliferis C.F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine learning 65(1):31-78.

See Also
g2Test, g2Test_univariate, cora, correls

Examples
# simulate a dataset with continuous data
dataset <- matrix(rnorm(1000 * 50, 1, 100), nrow = 1000)
a <- pc.skel(dataset, method = "pearson", alpha = 0.01)


Skewness and kurtosis coefficients

Description
Skewness and kurtosis coefficients.

Usage
skew(x, pvalue = FALSE)
kurt(x, pvalue = FALSE)

Arguments
x         A numerical vector with data.
pvalue    If you want a hypothesis test of whether the skewness or kurtosis is significant, set this to TRUE. This checks whether the skewness is significantly different from 0 and whether the kurtosis is significantly different from 3.

Details
The sample skewness and kurtosis coefficients are calculated. For the kurtosis we do not subtract 3.

Value
If "pvalue" is FALSE (default value) the skewness or kurtosis coefficient is returned. Otherwise, the p-value of the significance of the coefficient is returned.

Author(s)
Klio Lakiotaki
R implementation and documentation: Klio Lakiotaki <kliolak@gmail.com>.

References
https://en.wikipedia.org/wiki/Skewness
https://en.wikipedia.org/wiki/Kurtosis

See Also
colskewness, skew.test2, colmeans, colVars, colMedians

Examples
x <- rgamma(500,1, 4)
skew(x)
kurt(x, TRUE)


Some summary statistics of a vector for each level of a grouping variable

Description
Some summary statistics of a vector for each level of a grouping variable.

Usage
group.var(x, ina,ina.max = max(ina))
group.all(x, ina,ina.max = max(ina))
group.any(x, ina,ina.max = max(ina))
group.mad(x, ina,method = "median")
group.mean(x, ina,ina.max = max(ina))
group.med(x, ina)
group.min(x, ina,ina.max = max(ina))
group.max(x, ina,ina.min = NULL,ina.max = NULL)
group.min_max(x, ina,ina.max = max(ina))

Arguments
x          A numerical vector with data.
ina        A numerical vector or a factor variable. For every distinct value of "ina" the variance (or standard deviation) of "x" will be calculated. Note that negative values are not allowed as this can cause R to run forever or crash.
method     A character vector with values "median", for median absolute deviation, or "mean", for mean absolute deviation.
ina.max    Maximum number for vector ina.
ina.min    Minimum number for vector ina.
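As a point of reference for what the group.* functions are described as returning (one summary per distinct value of "ina"), the same quantities can be computed with base R's tapply. This is a sketch for checking results, not the package's implementation; the ref_* names are only for illustration.

```r
set.seed(1)
x <- rgamma(100, 1, 4)
ina <- sample(1:5, 100, TRUE)

# Base R reference computations: one value per distinct level of ina,
# returned as plain vectors without names.
ref_var  <- as.vector(tapply(x, ina, var))
ref_mean <- as.vector(tapply(x, ina, mean))
ref_med  <- as.vector(tapply(x, ina, median))
ref_min  <- as.vector(tapply(x, ina, min))
ref_max  <- as.vector(tapply(x, ina, max))
```

With Rfast loaded, these vectors could be compared against group.var(x, ina), group.mean(x, ina) and so on using all.equal, assuming the groups are reported in increasing order of "ina".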
Details
This is like "groupcolVars" (or rowsum), but only for vectors. No names are returned, simply a vector, similar to "groupcolVars". Note that this command works only for vectors. Median absolute deviation, mean, median, minimum and maximum are some of the options offered.

Value
A vector with the variance, or standard deviation, or mean, or minimum, or maximum, or median, or minimum-maximum of x for each distinct value of ina.

Author(s)
Manos Papadakis and Michail Tsagris
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com> and Michail Tsagris <mtsagris@yahoo.gr>.

See Also
groupcolVars, group.sum, colmeans, colVars, colMedians

Examples
x <- rgamma(100,1, 4)
ina <- sample(1:5, 100, TRUE)
group.var(x, ina)
#group.mean(x, ina) #run
#group.med(x, ina) #run


Sort - Sort a vector corresponding to another

Description
Fast sorting of a vector.

Usage
Sort(x,descending=FALSE,partial=NULL,stable=FALSE,na.last=NULL)
sort_cor_vectors(x, base, stable = FALSE, descending = FALSE)

Arguments
x             A numerical/character vector with data.
base          A numerical/character vector that determines the order in which x is sorted.
descending    A boolean value (TRUE/FALSE) for sorting the vector in descending order. By default the vector is sorted in ascending order.
partial       This argument has two usages. The first is an index number for partially sorting the vector. The second is a vector with 2 values, start and end, c(start,end); only the elements between start and end will be sorted. Not available for character vectors.
stable        A boolean value (TRUE/FALSE) for choosing a stable sort algorithm. A stable sort keeps equal elements in their original relative order. Not available for character vectors.
na.last       Accepts 4 values: TRUE, FALSE, NA, NULL. TRUE/FALSE: put NAs last or first. NA: remove NAs completely from the vector. NULL: the default. Leave it like that if there are no NA values.
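The idea of sorting one vector according to another, together with what a stable sort guarantees, can be illustrated in base R. The sketch assumes that sort_cor_vectors(x, base) behaves like x[order(base)]; that reading is taken from the argument descriptions and is not confirmed against the package source.

```r
x    <- c("d", "a", "c", "b", "e")
base <- c(2, 1, 2, 1, 3)

# order() leaves tied elements of base in their original relative order,
# which is the behaviour a stable sort guarantees.
ord <- order(base)
x_by_base <- x[ord]
x_by_base
```

Here "a" precedes "b" and "d" precedes "c" because each pair shares the same value in base and therefore keeps its original order.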
Details
This function uses the sorting algorithm from C++. The implementation is very fast and highly optimised, especially for large data.

Value
Sort: The sorted vector.
sort_cor_vectors: The first argument, sorted according to the second.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
nth, colnth, rownth, sort_unique, Round

Examples
x <- rnorm(1000)
system.time( s1 <- Sort(x) )
system.time( s2 <- sort(x) )
all.equal(s1,s2) #true but not if many duplicates.
system.time( s1 <- Sort(x,partial=100) )
system.time( s2 <- sort(x,partial=100) )
all.equal(s1,s2) #true
system.time( s1 <- Sort(x,stable=TRUE) )
system.time( s2 <- sort(x) )
all.equal(s1,s2) #true
x <- as.character(x)
system.time( s1 <- Sort(x) )
system.time( s2 <- sort(x) )
all.equal(s1,s2) #true
y <- runif(1000)
b <- sort_cor_vectors(x,y)
x<-y<-b<-s1<-s2<-NULL


Sort and unique numbers
Sort and unique

Description
Sort and unique numbers.

Usage
sort_unique(x)
sort_unique.length(x)

Arguments
x    A numeric vector.

Details
The "sort_unique" function implements R's "unique" function using a C++ function, but also sorts the result. The "sort_unique.length" function returns the number of unique values.

Value
Returns the distinct values, sorted, or their number (depending on the function you use).

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>

See Also
sort_mat, sort_cor_vectors

Examples
y <- rnorm(100)
a <- sort_unique(y)
b <- sort.int(unique(y))
all.equal(as.vector(a),as.vector(b))
x <- rpois(1000,10)
sort_unique.length(x)
length(sort_unique(x))
x<-a<-b<-NULL


Sorting of the columns-rows of a matrix

Description
Fast sorting of the columns-rows of a matrix.
Usage
sort_mat(x, by.row = FALSE, descending = FALSE, stable = FALSE,parallel=FALSE)

Arguments
x             A numerical matrix with data.
by.row        If you want to sort the rows of the matrix set this to TRUE.
descending    If you want the sorting in descending order, set this to TRUE.
stable        If you want the stable version, so that the results are the same as R's (in the case of ties), set this to TRUE. If this is TRUE, the algorithm is a bit slower.
parallel      Do you want to do it in parallel in C++? TRUE or FALSE. Works with every other argument.

Value
The matrix with its columns (or rows) independently sorted.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
nth, colMaxs, colMins, colrange, sort_cor_vectors, sort_unique

Examples
x <- matrix( rnorm(100 * 500), ncol = 500 )
system.time( s1 <- sort_mat(x) )
system.time( s2 <- apply(x, 2, sort) )
all.equal(as.vector(s1), as.vector(s2))
x<-NULL


Source many R files

Description
Source many R/Rd files.

Usage
sourceR(path,local=FALSE,encode = "UTF-8",print.errors=FALSE)
sourceRd(path,print.errors=FALSE)

Arguments
path            A full path to the directory where the R files are.
local           TRUE, FALSE or an environment, determining where the parsed expressions are evaluated. FALSE (the default) corresponds to the user's workspace (the global environment) and TRUE to the environment from which source is called.
encode          Character vector. The encoding(s) to be assumed when file is a character string: see file. A possible value is "unknown" when the encoding is guessed: see the "Encodings" section.
print.errors    A boolean value (TRUE/FALSE) for printing the errors, if any, for every file.

Details
Reads many R files and sources them.

Value
Returns the files that produced errors while being sourced.

Author(s)
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.
See Also
read.directory, AddToNamespace

Examples
# for example: path="C:\some_file\R\" where is R files are
# system.time( a<-sourceR(path) )
# for example: path="C:\some_file\man\" where is Rd files are
# system.time( a<-sourceRd(path) )


Spatial median for Euclidean data

Description
Spatial median for Euclidean data.

Usage
spat.med(x, tol = 1e-09)

Arguments
x      A matrix with Euclidean data, continuous variables.
tol    A tolerance level to terminate the process. This is set to 1e-09 by default.

Details
The spatial median for Euclidean data is calculated using a fixed point iterative algorithm. It is a robust location estimate.

Value
A vector with the spatial median.

Author(s)
Manos Papadakis and Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>

References
Jyrki Mottonen, Klaus Nordhausen and Hannu Oja (2010). Asymptotic theory of the spatial median. In Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in honor of Professor Jana Jureckova.
T. Karkkainen and S. Ayramo (2005). On computation of spatial median for robust data mining. Evolutionary and Deterministic Methods for Design, Optimization and Control with Applications to Industrial and Societal Problems, EUROGEN 2005, R. Schilling, W. Haase, J. Periaux, H. Baier, G. Bugeda (Eds) FLM, Munich. http://users.jyu.fi/~samiayr/pdf/ayramo_eurogen05.pdf

See Also
colMedians

Examples
spat.med( as.matrix( iris[, 1:4] ) )
colMeans( as.matrix(iris[, 1:4]) )
colMedians( as.matrix(iris[, 1:4]) )


Spherical and hyperspherical median
Fast calculation of the spherical and hyperspherical median

Description
It calculates, very fast, the (hyper-)spherical median of a sample.

Usage
mediandir(x)

Arguments
x    The data, a numeric matrix with unit vectors.
Details
The "mediandir" function employs a fixed point iterative algorithm stemming from the first derivative (Cabrera and Watson, 1990) to find the median direction as described in Fisher (1985) and Fisher, Lewis and Embleton (1987).

Value
The median direction.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References
Fisher N. I. (1985). Spherical medians. Journal of the Royal Statistical Society. Series B, 47(2): 342-348.
Fisher N. I., Lewis T. and Embleton B. J. (1987). Statistical analysis of spherical data. Cambridge university press.
Cabrera J. and Watson G. S. (1990). On a spherical median related distribution. Communications in Statistics-Theory and Methods, 19(6): 1973-1986.

See Also
vmf.mle

Examples
m <- rnorm(3)
m <- m / sqrt( sum(m^2) )
x <- rvmf(100, m, 10)
mediandir(x)
x <- NULL


Standardisation

Description
Standardisation.

Usage
standardise(x, center = TRUE, scale = TRUE)

Arguments
x         A matrix with data. It has to be a matrix; if it is a data.frame, for example, the function does not turn it into a matrix.
center    Should the data be centred as well? TRUE or FALSE.
scale     Should the columns have unit variance, yes (TRUE) or no (FALSE)?

Details
Similar to R's built-in function "scale", there is the option for centering only, scaling only, or both (default).

Value
A matrix with the standardised data.

Author(s)
Michail Tsagris
R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also
colVars, colmeans, colMads

Examples
x <- matrnorm( 100, 100 )
a1 <- scale(x)[1:100, ]
a2 <- standardise(x)
all.equal(as.vector(a1), as.vector(a2))
x <- NULL


Sub-matrix

Description
Sub-matrix.

Usage
submatrix(x,rowStart=1,rowEnd=1,colStart=1,colEnd=1)

Arguments
x           A Matrix, List, Dataframe or Vector.
rowStart    Start of the row.
rowEnd      End of the row.
colStart    Start of the col.
colEnd      End of the col.
Value
A sub-matrix like R's x[rowStart:rowEnd, colStart:colEnd]. Fast, especially for big sub-matrices.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
Match, mvbetas, correls, univglms, colsums, colVars

Examples
x <- matrix( rnorm(100 * 100), ncol = 100 )
submatrix(x,1,50,1,25) # x[1:50,1:25]
x<-NULL


Sum of a matrix

Description
Sum of a matrix.

Usage
matrix.sum(x)

Arguments
x    A matrix with data.

Value
The sum of the matrix "x".

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
rowMins, rowFalse, nth, colrange, colMedians, colVars, sort_mat, rowTrue

Examples
x <- matrix(runif(100*100),100,100)
f <- matrix.sum(x)
f==sum(x)
f<-NULL


Sum of all pairwise distances in a distance matrix

Description
Sum of all pairwise distances in a distance matrix.

Usage
total.dist(x, method = "euclidean", square = FALSE, p = 0)
total.dista(x, y, square = FALSE)

Arguments
x         A matrix with numbers.
y         A second matrix with data. The number of columns of this matrix must be the same as that of the matrix x. The number of rows can be different.
method    This is either "euclidean", "manhattan", "canberra1", "canberra2", "minimum", "maximum", "minkowski", "bhattacharyya", "hellinger", "total_variation" or "kullback_leibler/jensen_shannon". The last two options are basically the same.
square    If you choose "euclidean" or "hellinger" as the method, then you have the option to return the squared distances by setting this argument to TRUE.
p         This is for the Minkowski distance, the power of the metric.

Details
In order to compute what total.dist returns, one would normally have to calculate the distance matrix and sum it. We do this internally in C++ without creating the matrix. For total.dista it is the same thing.
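The quantities described in the Details can be reproduced in base R for the Euclidean case, which is useful as a reference when checking results. This is a sketch under the assumption that each unordered pair of rows is counted once; the text does not say whether total.dist counts pairs once or twice, so results may differ by a factor of 2.

```r
set.seed(1)
x <- matrix(rnorm(50 * 10), ncol = 10)
y <- matrix(rnorm(40 * 10), ncol = 10)

# Within x: dist() stores each unordered pair of rows exactly once.
total_within <- sum(dist(x))

# Between y and x: accumulate one row of y at a time, so the full
# 40 x 50 cross-distance matrix is never stored.
total_between <- 0
for (i in seq_len(nrow(y))) {
  d <- sqrt(rowSums(sweep(x, 2, y[i, ])^2))  # distances from y[i, ] to every row of x
  total_between <- total_between + sum(d)
}
```

The loop mirrors the idea of summing distances without materialising the distance matrix; the C++ code in the package is of course a different implementation.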
Value
A numerical value, the sum of the distances.

Author(s)
Manos Papadakis
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
Dist, dista

Examples
x <- matrix( rnorm(50 * 10), ncol = 10 )
total.dist(x)
y <- matrix( rnorm(40 * 10), ncol = 10)
total.dista(x, y)
total.dista(y, x)
x<-y<-NULL


Sums of a vector for each level of a grouping variable

Description
Sums of a vector for each level of a grouping variable.

Usage
group.sum(x, ina,ina.max = NULL,ina.min = NULL)

Arguments
x          A numerical vector whose sums are to be calculated for each value of "ina".
ina        A numerical vector or a factor variable. For every distinct value of "ina" the sum of "x" will be calculated. Note that negative values are not allowed as this can cause R to run forever.
ina.min    Minimum INTEGER number for vector ina as vector or NULL.
ina.max    Maximum INTEGER number for vector ina as vector or NULL.

Details
This is the rowsum, but only for vectors. No names are returned, simply a matrix with one column, just like rowsum. Note that this command works only for vectors.

Value
A matrix with one column, the sum of x for each distinct value of ina.

Author(s)
We found the C++ code (written by Francois Romain) in http://stackoverflow.com/questions/16975034/rcppequivalent-for-rowsum
Manos Papadakis then added it to Rfast.
R Documentation: Michail Tsagris <mtsagris@yahoo.gr>.

See Also
colmeans, colVars, Var, med

Examples
x <- rnorm(1000)
ina <- sample(1:5, 1000, replace = TRUE)
a1 <- group.sum(x, ina)
a2 <- rowsum(x, ina)


Table Creation - Frequency of each value

Description
Table Creation - Frequency of each value.

Usage
Table(x,y=NULL,names = TRUE,useNA = FALSE,rm.zeros = FALSE)
Table.sign(x,names = TRUE,useNA = FALSE)

Arguments
x           A vector with numeric/character data.
names       A logical value (TRUE/FALSE) for adding names.
y           A vector with numeric/character data. Doesn't work with "useNA".
rm.zeros    A logical value for removing zero columns/rows. Only for integer vectors for now.
useNA       Table: Integer/logical value. FALSE: there are no NA values in the vector. TRUE: count the NAs and add the count in the last position of the returned vector. Any other integer except 0, 1: just remove the NAs.
            Table.sign: Logical value, TRUE for counting NAs, otherwise FALSE.
            Doesn't work with character data.

Details
Table: Like R's "table".
For one argument, "x": If "names" is FALSE then, if "useNA" is TRUE the NAs will be counted, if it is FALSE it means there are no NAs, and for any other integer value the NAs will be ignored.
For two arguments, "x", "y": If "names" is FALSE then it creates the contingency table, otherwise it also sets the col-row names with the discrete values. If "rm.zeros" is FALSE then it won't remove the zero columns/rows from the result, but it will work only for positive integers for now. In this case, if "names" is TRUE then the col-row names will be the seq(min(),max()) for "x", "y". This will be changed in future updates.
For both algorithms: You can't use "useNA" with "names" for now. It is much faster to get the result without names (names = FALSE), but all the algorithms are more efficient than R's.

Table.sign: Like R's "table(sign())" but more efficient. It counts the frequencies of positive, negative, zero and NA values. If the argument "names" is FALSE then the returned vector doesn't have names, otherwise the names are "-1,0,+1,NA". If "useNA" is TRUE then the NAs will be counted, otherwise not. You can use "useNA" with "names".
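The counts that Table.sign is described as returning can be written out in base R. This is a sketch of the described behaviour (negatives, zeros, positives, then NAs), assuming the naming "-1,0,+1,NA" given above; it is not the package's code.

```r
x <- c(-2.5, 0, 3, NA, 1, -1, 0, 7)

s <- sign(x)
# Frequencies in the order: negatives, zeros, positives, NAs.
counts <- c("-1" = sum(s == -1, na.rm = TRUE),
            "0"  = sum(s == 0,  na.rm = TRUE),
            "+1" = sum(s == 1,  na.rm = TRUE),
            "NA" = sum(is.na(s)))
counts
```

Unlike table(sign(x)), this always reports all four categories, including those with a count of zero.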
Value
Table:
For one argument, "x": if "names" is TRUE then it returns a vector whose names are the discrete values of "x" and whose values are their frequencies, otherwise only the frequencies.
For two arguments, "x", "y": if "names" is TRUE then it returns a contingency matrix whose rownames are the discrete values of "x", whose colnames are the discrete values of "y" and whose values are the frequencies of the pairs, otherwise only the frequencies of the pairs.

Table.sign: A vector with 4 values/frequencies:
index 1: negatives
index 2: zeros
index 3: positives
If "names" is TRUE then the returned vector has names "-1,0,+1". If "useNA" is TRUE then the 4th value holds the frequency of NAs and the returned vector will have one more name, "-1,0,+1,NA", if "names" is also TRUE.

Author(s)
R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also
colShuffle, colVars, colmeans, read.directory, is_integer, as_integer

Examples
x<-runif(10)
y1<-Table(x)
y2<-as.vector(table(x)) # Needs a lot of time.
all.equal(y1,y2)
y1<-Table(x,names=FALSE)
all.equal(y1,y2) # the name attribute of y1 is null
y1<-Table.sign(x)
y2<-table(sign(x))
all.equal(y1,y2)
x<-y1<-y2<-NULL


Tests for the dispersion parameter in Poisson distribution

Description
Tests for the dispersion parameter in Poisson distribution.

Usage
poisdisp.test(y, alternative = "either", logged = FALSE)
pois.test(y, logged = FALSE)

Arguments
y              A numerical vector with count data, 0, 1,...
alternative    Do you want to test for either over- or underdispersion ("either"), specifically for overdispersion ("over") or specifically for underdispersion ("under")?
logged         Set to TRUE if you want the logarithm of the p-value.

Value
A vector with two elements, the test statistic and the (logged) p-value.
Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Yang Zhao, James W. Hardin, and Cheryl L. Addy (2009). A score test for overdispersion in Poisson regression based on the generalized Poisson-2 model. Journal of Statistical Planning and Inference, 139(4): 1514-1521.

Dimitris Karlis and Evdokia Xekalaki (2000). A Simulation Comparison of Several Procedures for Testing the Poisson Assumption. Journal of the Royal Statistical Society. Series D (The Statistician), 49(3): 355-382.

Bohning, D., Dietz, E., Schaub, R., Schlattmann, P. and Lindsay, B. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46: 373-388.

See Also

poisson.mle, negbin.mle, poisson.anova, poisson.anovas, poisson_only

Examples

    y <- rnbinom(500, 10, 0.6)
    poisdisp.test(y, "either")
    poisdisp.test(y, "over")
    pois.test(y)
    y <- rpois(500, 10)
    poisdisp.test(y, "either")
    poisdisp.test(y, "over")
    pois.test(y)

Topological sort of a DAG

Description

Topological sort of a DAG.

Usage

    topological_sort(dag)

Arguments

dag   A square matrix representing a directed graph, which contains 0s and 1s. G[i, j] = 1 means there is an arrow from node i to node j. If there is no edge between nodes i and j, then G[i, j] = 0.

Details

The function is an R translation of old MATLAB code.

Value

A vector with numbers indicating the sorting. If the dag is not a directed acyclic graph, NA will be returned.

Author(s)

Michail Tsagris and Manos Papadakis

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Chickering, D.M. (1995). A transformational characterization of equivalent Bayesian network structures.
Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, Montreal, Canada, 87-98.

See Also

floyd, pc.skel

Examples

    G <- matrix(0, 5, 5)
    G[2, 1] <- 1
    G[3, 1] <- 1
    G[4, 2] <- 1
    G[5, 4] <- 1
    topological_sort(G)
    G[2, 4] <- 1   # together with G[4, 2] this creates a cycle between nodes 2 and 4
    topological_sort(G)

Transpose of a matrix

Description

Transpose of a matrix.

Usage

    transpose(x)

Arguments

x   A numerical square matrix with data.

Value

The transposed matrix.

Author(s)

Manos Papadakis

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

References

Gilbert Strang (2006). Linear Algebra and its Applications (4th edition).

See Also

nth, colMaxs, colMins, colrange

Examples

    x <- matrix( rnorm(500 * 500), ncol = 500, nrow=500 )
    system.time( transpose(x) )
    system.time( t(x) )
    x<-NULL

Two sample exponential empirical likelihood test

Description

Two sample exponential empirical likelihood test.

Usage

    eel.test2(x, y, tol = 1e-09, logged = FALSE)

Arguments

x        A numerical vector.
y        Another numerical vector.
tol      The tolerance value to stop the iterations of the Newton-Raphson algorithm.
logged   Should the logarithm of the p-value be returned? TRUE or FALSE.

Details

Exponential empirical likelihood is a non parametric method. Here it is used as the non parametric alternative to the t-test. Newton-Raphson is used to maximise the log-likelihood ratio test statistic. In the case of no solution, NULL is returned.

Value

iters   The number of iterations required by the Newton-Raphson algorithm. If no convergence occurred this is NULL.
info    A vector with three elements: the value of λ, the likelihood ratio test statistic and the relevant p-value. If no convergence occurred, the value of λ becomes NA, the value of the test statistic is 10^5 and the p-value is 0. No convergence can be interpreted as rejection of the null hypothesis.
p1      The estimated probabilities, one for each observation of the first sample. If no convergence occurred this is NULL.
p2      The estimated probabilities, one for each observation of the second sample. If no convergence occurred this is NULL.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

References

Owen A. B. (2001). Empirical likelihood. Chapman and Hall/CRC Press.

See Also

ftests, ttests, ttest

Examples

    x <- rnorm(100)
    y <- rnorm(200)
    system.time( eel.test2(x, y) )
    x <- rnorm(100)
    system.time( eel.test2(x, y) )
    x <- rnorm(50)
    y <- rexp(100)
    system.time( eel.test2(x, y) )

Uniformity test for circular data

Uniformity tests for circular data

Description

Hypothesis tests of uniformity for circular data.

Usage

    kuiper(u)
    watson(u)

Arguments

u   A numeric vector containing the circular data, expressed in radians.

Details

These tests are used to test the hypothesis that the data come from a circular uniform distribution.

Value

A vector with two elements, the value of the test statistic and its associated p-value.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References

Jammalamadaka, S. Rao and SenGupta, A. (2001). Topics in Circular Statistics, pg. 153-55 (Kuiper's test) & 156-157 (Watson's test).

See Also

vmf.mle, rvonmises

Examples

    x <- rvonmises(n = 50, m = 2, k = 10)
    kuiper(x)
    watson(x)
    x <- runif(50, 0, 2 * pi)
    kuiper(x)
    watson(x)

Variance of a vector

Variance (and standard deviation) of a vector

Description

Variance (and standard deviation) of a vector.

Usage

    Var(x, std = FALSE,na.rm = FALSE)

Arguments

x       A vector with data.
std     If you want the standard deviation set this to TRUE, otherwise leave it FALSE.
na.rm   TRUE or FALSE; whether to remove NAs if any exist.

Details

This is a faster calculation of the usual variance of a vector.
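The manual does not say how the speed-up is obtained. As a minimal sketch of the quantity involved, assuming the usual unbiased sample variance, the base-R function below computes it from the sums of x and x^2. The name var_sketch is hypothetical and the shortcut formula is an assumption for illustration, not a description of the Rfast code.

```r
# Hypothetical base-R sketch (NOT the Rfast implementation): unbiased
# sample variance via the sum-of-squares shortcut formula.
var_sketch <- function(x, std = FALSE, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]          # drop NAs on request
  n <- length(x)
  v <- (sum(x^2) - sum(x)^2 / n) / (n - 1)
  if (std) sqrt(v) else v               # standard deviation if std = TRUE
}

set.seed(1)
x <- rnorm(1000)
all.equal(var_sketch(x), var(x))
all.equal(var_sketch(x, std = TRUE), sd(x))
var_sketch(c(1, 2, 3, NA), na.rm = TRUE)
```

The shortcut formula can lose precision when the mean is large relative to the spread, so this is meant as a reading aid rather than a replacement for var().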
Value

The variance of the vector.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr> and Manos Papadakis <papadakm95@gmail.com>.

See Also

colVars, cova

Examples

    x <- rnorm(1000)
    system.time( for (i in 1:100) Var(x) )
    #system.time( for (i in 1:100) var(x) )
    x<-NULL

Vector allocation in a symmetric matrix

Description

Vector allocation in a symmetric matrix.

Usage

    squareform(x)

Arguments

x   A numerical vector whose size must match the dimensions of the final matrix. See examples.

Details

The function is written in C++ in order to be as fast as possible.

Value

A symmetric matrix. The vector is allocated in the upper and in the lower part of the matrix. The diagonal is filled with zeros.

Author(s)

R implementation and documentation: Manos Papadakis <papadakm95@gmail.com>.

See Also

colShuffle, colVars, colmeans

Examples

    x <- rnorm(1)
    squareform(x) ## OK
    x <- rnorm(3)
    squareform(x) ## OK
    x <- rnorm(4)
    squareform(x) ## not OK

Weibull regression model

Description

Weibull regression model.

Usage

    weib.reg(y, x, tol = 1e-07, maxiters = 100)

Arguments

y          The dependent variable; a numerical vector with strictly positive data, i.e. greater than zero.
x          A matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. This can be a matrix or a data.frame (with factors).
tol        The tolerance value to terminate the Newton-Raphson algorithm.
maxiters   The maximum number of iterations that can take place in each regression.

Details

The function is written in C++ and this is why it is very fast. No standard errors are returned as they are not correctly estimated. We focused on speed.

Value

A list including:

iters    The iterations required by the Newton-Raphson algorithm.
loglik   The log-likelihood of the model.
shape    The shape parameter of the Weibull regression.
be       The regression coefficients.

Author(s)

Stefanos Fafalios

R implementation and documentation: Stefanos Fafalios <stefanosfafalios@gmail.com>.

References

McCullagh, Peter, and John A. Nelder. Generalized linear models. CRC press, USA, 2nd edition, 1989.

See Also

poisson_only, logistic_only, univglms, regression

Examples

    x <- matrix(rnorm(100 * 2), ncol = 2)
    y <- rexp(100, 1)
    weib.reg(y, x)
    x <- NULL

Yule's Y (coefficient of colligation)

Description

Yule's Y (coefficient of colligation).

Usage

    yule(x)

Arguments

x   A 2 x 2 matrix or a vector with 4 elements. In the case of the vector make sure it corresponds to the correct table.

Details

Yule's coefficient of colligation is calculated.

Value

Yule's Y is returned.

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris <mtsagris@yahoo.gr>

References

Yule G. Udny (1912). On the Methods of Measuring Association Between Two Attributes. Journal of the Royal Statistical Society, 75(6):579-652.
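The entry for yule does not spell out the formula. For a 2 x 2 table with cells a, b (first row) and c, d (second row), the usual textbook definition is Y = (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)). The base-R sketch below follows that definition; the name yule_sketch is hypothetical, and the assumption that a 4-element vector is read column by column (as matrix(x, ncol = 2) would fill it) is ours, not a statement from the manual.

```r
# Hypothetical base-R sketch of Yule's Y (NOT the Rfast implementation).
# A 2 x 2 matrix is read column by column: x[1] = a, x[2] = c, x[3] = b, x[4] = d.
yule_sketch <- function(x) {
  x <- as.numeric(x)
  stopifnot(length(x) == 4)
  ad <- sqrt(x[1] * x[4])   # square root of the main-diagonal product
  bc <- sqrt(x[2] * x[3])   # square root of the off-diagonal product
  (ad - bc) / (ad + bc)
}

x <- c(10, 20, 30, 40)
yule_sketch(x)
yule_sketch(matrix(x, ncol = 2))   # same table, same value

# Equivalent form through the odds ratio: (sqrt(OR) - 1) / (sqrt(OR) + 1)
or <- (x[1] * x[4]) / (x[2] * x[3])
(sqrt(or) - 1) / (sqrt(or) + 1)
```

Because the products ad and bc are symmetric in the off-diagonal cells, reading the vector row by row instead would give the same value; the ordering matters only for keeping the diagonal and off-diagonal cells apart.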
See Also

col.yule, odds.ratio

Examples

    x <- rpois(4, 30) + 2
    yule(x)
    yule( matrix(x, ncol = 2) )
Bradley-Terry model Fitted probabilities of the Terry-Bradley model, 80 255 256 INDEX ∗Topic Column-Row wise checking Check if any column or row is fill with zeros, 19 ∗Topic Column-wise Any Column and row-wise Any/All, 29 ∗Topic Column-wise Shuffle Column and row-wise Shuffle, 37 ∗Topic Column-wise median absolute deviations Column and rows-wise mean absolute deviations, 41 ∗Topic Column-wise medians Column and row-wise medians, 31 ∗Topic Column-wise minimum Column-wise minimum and maximum, 46 ∗Topic Column-wise nth Column and row-wise nth smallest value of a matrix/vector, 32 ∗Topic Column-wise ranges Column and row-wise range of values of a matrix, 35 ∗Topic Column-wise tabulate Column and row-wise tabulate, 39 ∗Topic Column-wise true Column-wise true/false value, 49 ∗Topic Column-wise variances Column and row-wise variances and standard deviations, 40 ∗Topic Column-wise Column-wise MLE of some univariate distributions, 47 ∗Topic Combinatorics All k possible combinations from n elements, 7 ∗Topic Continuous distributions MLE of continuous univariate distributions defined on the positive line, 169 MLE of continuous univariate distributions defined on the real line, 171 ∗Topic Correlations Correlation between pairs of variables, 53 Correlations, 55 ∗Topic Covariance matrix Covariance and correlation matrix, 56 ∗Topic Create - Fill Diagonal Matrix, 63 ∗Topic DAG Topological sort of a DAG, 246 ∗Topic Dataframe to Matrix data.frame.to_matrix, 60 ∗Topic Design Matrix Design Matrix, 62 ∗Topic Determinant Check if any column or row is fill with zeros, 19 ∗Topic Diagonal Matrix Diagonal Matrix, 63 ∗Topic Differences Column-wise differences, 43 ∗Topic Directional k-NN algorithm k-NN algorithm using the arc cosinus distance, 103 ∗Topic Dirichlet distribution Fitting a Dirichlet distribution via Newton-Rapshon, 81 ∗Topic Discrimination Prediction with some naive Bayes classifiers, 204 ∗Topic Distance correlation Distance correlation, 65 ∗Topic Distance covariance Distance 
variance and covariance, 67 ∗Topic Distance matrix Distance matrix, 66 ∗Topic Distance variance Distance variance and covariance, 67 ∗Topic Distances Distance between vectors and a matrix, 64 Sum of all pairwise distances in a distance matrix, 240 ∗Topic Divide and Qonquer Binary search algorithm, 16 Find element, 78 ∗Topic Eigenvalues Eigenvalues and eigenvectors in high dimensional principal component analysis, 68 ∗Topic Energy distances Energy distance between matrices, 257 INDEX 70 ∗Topic Equality check Equality of objects, 71 ∗Topic Euclidean distance Distance matrix, 66 ∗Topic Exponential regressions Many exponential regressions, 117 ∗Topic Export functions Insert new function names in the NAMESPACE file, 98 Source many R files, 235 ∗Topic Extract columns/rows Get specific columns/rows fo a matrix, 89 ∗Topic F-tests Many F-tests with really huge matrices, 118 Many multi-sample tests, 125 ∗Topic F-test Multi-sample tests for vectors, 183 ∗Topic Factor variables Index of the columns of a data.frame which are factor variables, 97 ∗Topic Factorials Binomial coefficient and its logarithm, 17 ∗Topic Find Value Find the given value in a hash table, 79 ∗Topic Find element Find element, 78 ∗Topic Floyd-Warshall algorithm Floyd-Warshall algorithm, 82 ∗Topic Forward regression BIC (using partial correlation) forward regression, 14 BIC forward regression with generalised linear models, 15 Correlation based forward regression, 52 Forward selection with generalised linear regression models, 84 ∗Topic GLMS Many score based GLM regressions, 136 ∗Topic GLMs Quasi binomial regression for proportions, 205 Quasi Poisson regression for count data, 207 Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression, 221 ∗Topic G^2 test of conditional independence G-square test of conditional indepdence, 85 ∗Topic G^2 test of independence Matrix with G-square tests of indepedence, 159 ∗Topic G^2 tests of independence Many G-square tests of indepedence, 
119 ∗Topic Gini coefficient Many Gini coefficients, 121 ∗Topic Goodness of fit test Hypothesis test for von Mises-Fisher distribution over Kent distribution, 95 ∗Topic Grouped sums Sums of a vector for each level of a grouping variable, 242 ∗Topic Gumbel distribution MLE of continuous univariate distributions defined on the real line, 171 ∗Topic Hash Function Find the given value in a hash table, 79 Hash - Pair function, 90 ∗Topic Hash tables Hash object to a list object, 91 ∗Topic Hellinger distance Distance matrix, 66 ∗Topic High dimensional data High dimensional MCD based detection of outliers, 92 ∗Topic Hypothesis testing Many one sample tests, 131 One sample empirical and exponential empirical likelihood test, 197 Two sample exponential empirical likelihood test, 248 258 ∗Topic Hypothesis test Exponential empirical likelihood for a one sample mean vector hypothesis testing, 74 ∗Topic Inverse matrix Inverse of a symmetric positive definite matrix, 99 ∗Topic Inverted Dirichlet distribution MLE of the inverted Dirichlet distribution, 177 ∗Topic James test Multi-sample tests for vectors, 183 ∗Topic Kent distribution Hypothesis test for von Mises-Fisher distribution over Kent distribution, 95 ∗Topic Laplace distribution MLE of continuous univariate distributions defined on the real line, 171 ∗Topic Linear mixed models Column and row wise coefficients of variation, 28 Many random intercepts LMMs for balanced data with a single identical covariate., 133 Random intercepts linear mixed models, 209 ∗Topic Linear models Linear models for large scale data, 104 ∗Topic Linear time Find element, 78 ∗Topic Log matrix Natural Logarithm each element of a matrix, 191 ∗Topic Logarithm of gamma function Natural logarithm of the gamma function and its derivatives, 193 ∗Topic Logistic distribution MLE of continuous univariate distributions defined on the real line, 171 ∗Topic Logistic regressions Many univariate simple binary logistic regressions, 151 ∗Topic Logistic regression 
INDEX Logistic and Poisson regression models, 105 Logistic or Poisson regression with a single categorical predictor, 107 ∗Topic Lower and Upper triangular of a matrix Lower and Upper triangular of a matrix, 108 ∗Topic MCD estimation High dimensional MCD based detection of outliers, 92 ∗Topic Mahalanobis distance Mahalanobis distance, 110 ∗Topic Manhattan distance Distance matrix, 66 ∗Topic Many betas in regression Many multivariate simple linear regressions coefficients, 127 Many simple linear regressions coefficients, 145 ∗Topic Match Function Match, 157 ∗Topic Matrices Number of equal columns between two matrices, 195 ∗Topic McNemar’s test Many 2 sample tests, 113 ∗Topic Median direction Spherical and hyperspherical median, 237 ∗Topic Multinomial distribution MLE for multivariate discrete data, 166 Multinomial regression, 185 ∗Topic Multivariate analysis of variance James multivariate version of the t-test, 100 ∗Topic Multivariate data Multivariate kurtosis, 186 ∗Topic Multivariate hypothesis testing Exponential empirical likelihood hypothesis testing for two mean vectors, 75 ∗Topic Multivariate normal distribution Density of the multivariate INDEX normal and t distributions, 61 MLE of the multivariate normal distribution, 178 ∗Topic Namespace file Check Namespace and Rd files, 21 Insert new function names in the NAMESPACE file, 98 Source many R files, 235 ∗Topic Newton-Raphson Fitting a Dirichlet distribution via Newton-Rapshon, 81 MLE of distributions defined in the (0, 1) interval, 174 ∗Topic Norm of a matrix Norm of a matrix, 194 ∗Topic Odds ratios Many odds ratio tests, 129 ∗Topic Odds ratio Odds ratio and relative risk, 196 ∗Topic One sample t-test One sample t-test for a vector, 198 ∗Topic Orderings Column and row-wise Order - Sort Indices, 33 ∗Topic Ordinal model MLE of the ordinal model without covariates, 179 ∗Topic PC algorithm Skeleton of the PC algorithm, 227 ∗Topic Pair Function Hash - Pair function, 90 ∗Topic Pairs of vectors Column-row wise 
minima and maxima of two matrices, 42 Minima and maxima of two vectors/matrices, 163 ∗Topic Pareto MLE of continuous univariate distributions defined on the positive line, 169 ∗Topic Pearson correlation Correlation based forward regression, 52 ∗Topic Permutation Function Permutation, 202 ∗Topic Poisson distribution Analysis of variance with a count variable, 9 259 Many analysis of variance tests with a discrete variable, 114 Many tests for the dispersion parameter in Poisson distribution, 147 MLE of count data (univariate discrete distributions), 172 Prediction with some naive Bayes classifiers, 204 Tests for the dispersion parameter in Poisson distribution, 244 ∗Topic Poisson regressions Many univariate simple poisson regressions, 154 Many univariate simple quasi poisson regressions, 155 ∗Topic Poisson regression Logistic or Poisson regression with a single categorical predictor, 107 ∗Topic Poisson Forward selection with generalised linear regression models, 84 ∗Topic Products Column and row-wise products, 34 ∗Topic Quasi Poisson regression Quasi Poisson regression for count data, 207 ∗Topic Quasi regression Quasi binomial regression for proportions, 205 Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression, 221 ∗Topic Random values simulation Random values simulation from a von Mises distribution, 210 Simulation of random values from a von Mises-Fisher distribution, 225 ∗Topic Read Examples Reading the files of a directory, 212 ∗Topic Read directory Reading the files of a directory, 212 ∗Topic Repeated measures 260 INDEX Many regression based tests for single sample repeated measures, 134 Repeated measures anova, 214 ∗Topic Replicate in columns/rows Replicate columns/rows, 215 ∗Topic Replicate in columns Sum of a matrix, 240 ∗Topic Round vector/matrix Round each element of a matrix/vector, 216 ∗Topic Row - Wise matrix/vector count the frequency of a value Row - Wise matrix/vector count the frequency of a value, 217 ∗Topic Row 
sums Column and row-wise sums of a matrix, 38 ∗Topic Row-wise Any Column and row-wise Any/All, 29 ∗Topic Row-wise Shuffle Column and row-wise Shuffle, 37 ∗Topic Row-wise false Row-wise true value, 219 ∗Topic Row-wise medians Column and row-wise medians, 31 ∗Topic Row-wise minimum Row-wise minimum and maximum, 218 ∗Topic Row-wise nth Column and row-wise nth smallest value of a matrix/vector, 32 ∗Topic Row-wise tabulate Column and row-wise tabulate, 39 ∗Topic Row-wise true-false Row-wise true value, 219 ∗Topic Row-wise true Row-wise true value, 219 ∗Topic Shapiro-Francia Many Shapiro-Francia normality tests, 139 ∗Topic Significance testing Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression, 221 ∗Topic Simple linear regressions Many univariate simple linear regressions, 152 ∗Topic Skewness coefficient Column-wise kurtosis and skewness coefficients, 44 ∗Topic Skewness Hypothesis testing betseen two skewness or kurtosis coefficients, 96 Skewness and kurtosis coefficients, 229 ∗Topic Sort function Sort and unique numbers, 233 ∗Topic Sorting 2 vectors Sort - Sort a vector coresponding to another, 231 ∗Topic Sorting Sort - Sort a vector coresponding to another, 231 Sorting of the columns-rows of a matrix, 234 ∗Topic Stable Sorting Sort - Sort a vector coresponding to another, 231 ∗Topic Standardisation Standardisation, 238 ∗Topic Sub-matrix Sub-matrix, 239 ∗Topic Sum Operations between two matrices or matrix and vector, 200 ∗Topic Supervised classification k-NN algorithm using the arc cosinus distance, 103 ∗Topic Symmetric matrix Check whether a square matrix is symmetric, 23 ∗Topic Table Creation Table Creation - Frequency of each value, 243 ∗Topic Time series Estimation of an AR(1) model, 72 ∗Topic Tobit model MLE of the tobit model, 180 ∗Topic Topological sort Topological sort of a DAG, 246 ∗Topic Transpose Transpose of a matrix, 247 ∗Topic Two-way ANOVA 261 INDEX Many two-way ANOVAs, 148 ∗Topic Unequality of the covariance 
matrices James multivariate version of the t-test, 100 ∗Topic Univariate normality test Many Shapiro-Francia normality tests, 139 ∗Topic Variance components Moment and maximum likelihood estimation of variance components, 181 ∗Topic Variance Some summary statistics of a vector for each level of a grouping variable, 230 Variance of a vector, 250 ∗Topic Weibull regressions Many score based regression models, 138 ∗Topic Weibull MLE of continuous univariate distributions defined on the positive line, 169 ∗Topic Wigner semicircle distribution MLE of continuous univariate distributions defined on the real line, 171 ∗Topic Zero range Search for variables with zero range in a matrix, 220 ∗Topic analysis of variance Logistic or Poisson regression with a single categorical predictor, 107 Many analysis of variance tests with a discrete variable, 114 Many F-tests with really huge matrices, 118 Many multi-sample tests, 125 Many non parametric multi-sample tests, 128 Multi-sample tests for vectors, 183 ∗Topic balanced design Column and row wise coefficients of variation, 28 Many random intercepts LMMs for balanced data with a single identical covariate., 133 Random intercepts linear mixed models, 209 ∗Topic beta prime MLE of continuous univariate distributions defined on the positive line, 169 ∗Topic beta regressions Many score based regression models, 138 ∗Topic bias corrected Distance correlation, 65 ∗Topic binary data Forward selection with generalised linear regression models, 84 ∗Topic binomial distribution MLE of count data (univariate discrete distributions), 172 ∗Topic bivariate angular Gaussian MLE of some circular distributions, 176 ∗Topic blocking ANOVA Many multi-sample tests, 125 Multi-sample tests for vectors, 183 ∗Topic categorical variables Many univariate simple linear regressions, 152 ∗Topic censored observations MLE of the tobit model, 180 ∗Topic central angular Gaussian distribution MLE of (hyper-)spherical distributions, 167 ∗Topic circular data MLE of some 
circular distributions, 176 ∗Topic column-wise false Column-wise true/false value, 49 ∗Topic column-wise maximum Column-wise minimum and maximum, 46 ∗Topic column-wise minimum-maximum Column-wise minimum and maximum, 46 ∗Topic column-wise true-false Column-wise true/false value, 49 ∗Topic combinatorics Binomial coefficient and its logarithm, 17 262 ∗Topic conditional MLE Estimation of an AR(1) model, 72 ∗Topic continuous distributions Column-wise MLE of some univariate distributions, 47 ∗Topic cross-validation Cross-Validation for the k-NN algorithm, 58 ∗Topic data check Search for variables with zero range in a matrix, 220 ∗Topic density values Density of the multivariate normal and t distributions, 61 ∗Topic dependent binary data Multi-sample tests for vectors, 183 ∗Topic derivatives Natural logarithm of the gamma function and its derivatives, 193 ∗Topic digamma function Natural logarithm of the gamma function and its derivatives, 193 ∗Topic directed graph Floyd-Warshall algorithm, 82 ∗Topic directional data Angular central Gaussian random values simulation, 10 MLE of (hyper-)spherical distributions, 167 ∗Topic discrete distributions Column-wise MLE of some univariate distributions, 47 ∗Topic dispersion parameter Many tests for the dispersion parameter in Poisson distribution, 147 Tests for the dispersion parameter in Poisson distribution, 244 ∗Topic equality of variances Many multi-sample tests, 125 Multi-sample tests for vectors, 183 ∗Topic excessive zeros MLE of count data (univariate discrete distributions), 172 ∗Topic exponential regressions Many score based regression INDEX models, 138 ∗Topic fitted probabilities Fitted probabilities of the Terry-Bradley model, 80 ∗Topic folded normal MLE of continuous univariate distributions defined on the positive line, 169 ∗Topic fractional response Quasi binomial regression for proportions, 205 Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression, 221 ∗Topic gamma distribution MLE 
of continuous univariate distributions defined on the positive line, 169 ∗Topic gamma regressions Many score based regression models, 138 ∗Topic generalised linear models Logistic and Poisson regression models, 105 Many univariate simple binary logistic regressions, 151 Many univariate simple poisson regressions, 154 Many univariate simple quasi poisson regressions, 155 ∗Topic geometric distribution Analysis of variance with a count variable, 9 Many analysis of variance tests with a discrete variable, 114 MLE of count data (univariate discrete distributions), 172 ∗Topic grouppings Some summary statistics of a vector for each level of a grouping variable, 230 ∗Topic half normal MLE of continuous univariate distributions defined on the positive line, 169 ∗Topic harmonic means Column and row-wise means of a 263 INDEX matrix, 30 ∗Topic high dimensional data Eigenvalues and eigenvectors in high dimensional principal component analysis, 68 ∗Topic huge datasets Many F-tests with really huge matrices, 118 ∗Topic hypersecant distribution for proportions MLE of distributions defined in the (0, 1) interval, 174 ∗Topic hypothesis testing Column-wise uniformity Watson test for circular data, 50 Hypothesis testing betseen two skewness or kurtosis coefficients, 96 Uniformity test for circular data, 249 ∗Topic inflated beta distribution MLE of distributions defined in the (0, 1) interval, 174 ∗Topic interaction Many two-way ANOVAs, 148 ∗Topic is_integer Creation Check if values are integers and convert to integer, 20 ∗Topic k-NN algorithm Cross-Validation for the k-NN algorithm, 58 k nearest neighbours algorithm (k-NN), 101 ∗Topic kurtosis coefficient Column-wise kurtosis and skewness coefficients, 44 ∗Topic kurtosis Hypothesis testing betseen two skewness or kurtosis coefficients, 96 Multivariate kurtosis, 186 Skewness and kurtosis coefficients, 229 ∗Topic large scale data Linear models for large scale data, 104 ∗Topic left censoring MLE of the tobit model, 180 ∗Topic list Hash 
object to a list object, 91 ∗Topic logarithm Natural logarithm of the beta function, 192 ∗Topic logistic normal distribution MLE of distributions defined in the (0, 1) interval, 174 ∗Topic matrix Column and row-wise Order - Sort Indices, 33 Column and row-wise products, 34 Column-wise differences, 43 Transpose of a matrix, 247 ∗Topic maximum frequency minimum and maximum frequencies, 165 ∗Topic maximum likelihood estimation Column and row wise coefficients of variation, 28 Fitting a Dirichlet distribution via Newton-Rapshon, 81 Many random intercepts LMMs for balanced data with a single identical covariate., 133 MLE of (hyper-)spherical distributions, 167 MLE of distributions defined in the (0, 1) interval, 174 Moment and maximum likelihood estimation of variance components, 181 Random intercepts linear mixed models, 209 ∗Topic maximum Column-row wise minima and maxima of two matrices, 42 Minima and maxima of two vectors/matrices, 163 minimum and maximum, 164 ∗Topic mean vector Exponential empirical likelihood for a one sample mean vector hypothesis testing, 74 ∗Topic minimum frequency minimum and maximum frequencies, 165 264 INDEX ∗Topic minimum Column-row wise minima and maxima of two matrices, 42 Minima and maxima of two vectors/matrices, 163 minimum and maximum, 164 ∗Topic moments estimation Moment and maximum likelihood estimation of variance components, 181 ∗Topic multinomial distribution Prediction with some naive Bayes classifiers, 204 ∗Topic multinomial regressions Many score based GLM regressions, 136 ∗Topic multivariate Laplace distribution Multivariate Laplace random values simulation, 187 ∗Topic multivariate discrete data MLE for multivariate discrete data, 166 ∗Topic multivariate normal distribution Multivariate normal and t random values simulation, 188 ∗Topic multivariate t distribution Density of the multivariate normal and t distributions, 61 ∗Topic naive Bayes Prediction with some naive Bayes classifiers, 204 ∗Topic negative binomial MLE of count 
data (univariate discrete distributions), 172 ∗Topic non parametric statistics Many non parametric multi-sample tests, 128 ∗Topic non parametric test Exponential empirical likelihood hypothesis testing for two mean vectors, 75 One sample empirical and exponential empirical likelihood test, 197 Two sample exponential empirical likelihood test, 248 ∗Topic normal distribution Prediction with some naive Bayes classifiers, 204 ∗Topic nth elements Column and row-wise nth smallest value of a matrix/vector, 32 Median of a vector, 162 ∗Topic one sample Many one sample tests, 131 One sample empirical and exponential empirical likelihood test, 197 ∗Topic operations Operations between two matrices or matrix and vector, 200 ∗Topic outliers High dimensional MCD based detection of outliers, 92 ∗Topic partial correlation BIC (using partial correlation) forward regression, 14 Correlation based forward regression, 52 ∗Topic percentages Hypothesis test for two means of percentages, 94 Many hypothesis tests for two means of percentages, 122 ∗Topic poisson regression Logistic and Poisson regression models, 105 ∗Topic positive definite Inverse of a symmetric positive definite matrix, 99 ∗Topic positive multivariate data MLE of the inverted Dirichlet distribution, 177 ∗Topic projected normal distribution MLE of (hyper-)spherical distributions, 167 ∗Topic projected normal Circular or angular regression, 25 Many simple circular or angular regressions, 141 ∗Topic proportion test Many one sample tests, 131 ∗Topic proportional odds MLE of the ordinal model without INDEX covariates, 179 ∗Topic proportions Forward selection with generalised linear regression models, 84 MLE of distributions defined in the (0, 1) interval, 174 ∗Topic random values simulation Angular central Gaussian random values simulation, 10 Multivariate Laplace random values simulation, 187 Multivariate normal and t random values simulation, 188 ∗Topic regression Many regression based tests for single sample repeated 
measures, 134 Multinomial regression, 185 Repeated measures anova, 214 ∗Topic robust statistics Spatial median for Euclidean data, 236 ∗Topic row means Column and row-wise means of a matrix, 30 ∗Topic row-wise maximum Row-wise minimum and maximum, 218 ∗Topic row-wise variances Column and row-wise variances and standard deviations, 40 ∗Topic score based tests Many score based GLM regressions, 136 Many score based regression models, 138 ∗Topic shortest paths Floyd-Warshall algorithm, 82 ∗Topic single categorical predictor Logistic or Poisson regression with a single categorical predictor, 107 ∗Topic sorting Median of a vector, 162 ∗Topic spatial median Spatial median for Euclidean data, 236 ∗Topic spherical data MLE of (hyper-)spherical distributions, 167 ∗Topic summary statistics Many regression based tests for single sample repeated measures, 134 Repeated measures anova, 214 ∗Topic symmetric matrix Inverse of a symmetric positive definite matrix, 99 Vector allocation in a symmetric matrix, 251 ∗Topic t distribution MLE of continuous univariate distributions defined on the real line, 171 ∗Topic t-tests Many 2 sample tests, 113 Many hypothesis tests for two means of percentages, 122 Matrix with all pairs of t-tests, 158 ∗Topic t-test Hypothesis test for two means of percentages, 94 Many one sample tests, 131 ∗Topic total sum Energy distance between matrices, 70 Sum of all pairwise distances in a distance matrix, 240 ∗Topic trigamma function Natural logarithm of the gamma function and its derivatives, 193 ∗Topic two samples Two sample exponential empirical likelihood test, 248 ∗Topic uniformity tests Column-wise uniformity Watson test for circular data, 50 ∗Topic uniformity test Uniformity test for circular data, 249 ∗Topic unique numbers Sort and unique numbers, 233 ∗Topic univariate approach Many regression based tests for single sample repeated measures, 134 Repeated measures anova, 214 ∗Topic variable selection Forward selection with generalised linear
regression models, 84 ∗Topic variance test Many one sample tests, 131 ∗Topic variances of many samples Column and row-wise variances and standard deviations, 40 ∗Topic von Mises distribution MLE of some circular distributions, 176 ∗Topic von Mises-Fisher distribution Hypothesis test for von Mises-Fisher distribution over Kent distribution, 95 MLE of (hyper-)spherical distributions, 167 Random values simulation from a von Mises distribution, 210 Simulation of random values from a von Mises-Fisher distribution, 225 ∗Topic wrapped Cauchy distribution MLE of some circular distributions, 176 ∗Topic zero inflated Poisson MLE of count data (univariate discrete distributions), 172 ∗Topic zero truncated Poisson MLE of count data (univariate discrete distributions), 172 .lm.fit, 105 AddToNamespace (Insert new function names in the NAMESPACE file), 98 acg.mle, 11, 26, 142 acg.mle (MLE of (hyper-)spherical distributions), 167 AddToNamespace, 23, 213, 235 All k possible combinations from n elements, 7 all_equals (Equality of objects), 71 allbetas, 54, 55, 87, 105, 127, 143, 145, 151–154, 156, 157 allbetas (Many simple linear regressions coefficients), 145 allttests (Matrix with all pairs of t-tests), 158 Analysis of covariance, 8 Analysis of variance with a count variable, 9 ancova1 (Analysis of covariance), 8 ancovas, 9, 149 ancovas (Many ANCOVAs), 116 Angular central Gaussian random values simulation, 10 anova, 10, 81, 108, 115 ANOVA for two quasi Poisson regression models, 11 anova1, 9, 199 anova1 (Multi-sample tests for vectors), 183 anova_propreg, 207 anova_propreg (Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression), 221 anova_qpois.reg, 12 anova_qpois.reg (Significance testing for the coefficients of Quasi binomial or the quasi Poisson regression), 221 anova_quasipois.reg (ANOVA for two quasi Poisson regression models), 11 anovas, 114, 116 anovas (Many multi-sample tests), 125 ar1 (Estimation of an AR(1) model), 72
as_integer, 21, 244 as_integer (Check if values are integers and convert to integer), 20 auc, 74 auc (Many (and one) area under the curve values), 111 Backward selection regression, 13 bc (Estimation of the Box-Cox transformation), 73 bcdcor, 94 bcdcor (Distance correlation), 65 beta.mle, 82, 170, 180, 193, 194 beta.mle (MLE of distributions defined in the (0, 1) interval), 174 betaprime.mle (MLE of continuous univariate distributions defined on the positive line), 169 BIC (using partial correlation) forward regression, 14 BIC forward regression with generalised linear models, 15 bic.corfsreg, 16 bic.corfsreg (BIC (using partial correlation) forward regression), 14 bic.fs.reg (BIC forward regression with generalised linear models), 15 Binary search algorithm, 16 binary_search, 79, 217 binary_search (Binary search algorithm), 16 bincomb (Permutation), 202 binom.mle (MLE of count data (univariate discrete distributions)), 172 Binomial coefficient and its logarithm, 17 block.anova (Multi-sample tests for vectors), 183 block.anovas, 129 block.anovas (Many multi-sample tests), 125 boot.ttest2 (Bootstrap t-test for 2 independent samples), 18 Bootstrap t-test for 2 independent samples, 18 borel.mle (MLE of count data (univariate discrete distributions)), 172 bs.reg (Backward selection regression), 13 btmprobs (Fitted probabilities of the Terry-Bradley model), 80 cat.goftests (Many one sample goodness of fit tests for categorical data), 130 cauchy.mle (MLE of continuous univariate distributions defined on the real line), 171 Check if any column or row is fill with zeros, 19 Check if values are integers and convert to integer, 20 Check Namespace and Rd files, 21 Check whether a square matrix is symmetric, 23 check_data (Search for variables with zero range in a matrix), 220 checkAliases (Check Namespace and Rd files), 21 checkExamples, 213 checkExamples (Check Namespace and Rd files), 21 checkNamespace (Check Namespace and Rd files), 21 checkRd, 213 checkTF
(Check Namespace and Rd files), 21 chisq.mle (MLE of continuous univariate distributions defined on the positive line), 169 cholesky, 24, 100 cholesky (Cholesky decomposition of a square matrix), 24 Cholesky decomposition of a square matrix, 24 Choose, 192, 216 Choose (Binomial coefficient and its logarithm), 17 circlin.cor (Circular-linear correlation), 27 Circular or angular regression, 25 Circular-linear correlation, 27 col.coxpoisrat (Cox confidence interval for the ratio of two Poisson variables), 57 col.yule, 254 col.yule (Column-wise Yule’s Y (coefficient of colligation)), 51 colAll (Column and row-wise Any/All), 29 colanovas (Many Welch’s F-tests), 156 colAny (Column and row-wise Any/All), 29 colar1 (Estimation of an AR(1) model), 72 colaucs (Many (and one) area under the curve values), 111 colCountValues (Row - Wise matrix/vector count the frequency of a value), 217 colcvs (Column and row wise coefficients of variation), 28 coldiffs, 34, 35 coldiffs (Column-wise differences), 43 colexp2.mle (Column-wise MLE of some univariate distributions), 47 colexpmle (Column-wise MLE of some univariate distributions), 47 colFalse, 63, 109, 219 colFalse (Column-wise true/false value), 49 colgammamle (Column-wise MLE of some univariate distributions), 47 colgeom.mle (MLE for multivariate discrete data), 166 colhameans (Column and row-wise means of a matrix), 30 colinvgauss.mle (Column-wise MLE of some univariate distributions), 47 colkurtosis (Column-wise kurtosis and skewness coefficients), 44 collaplace.mle (Column-wise MLE of some univariate distributions), 47 collindley.mle (Column-wise MLE of some univariate distributions), 47 colMads, 30, 162, 201, 238 colMads (Column and rows-wise mean absolute deviations), 41 colmaxboltz.mle (Column-wise MLE of some univariate distributions), 47 colMaxs, 8, 35, 42, 218, 234, 247 colMaxs (Column-wise minimum and maximum), 46 colMeans, 29, 31, 32, 38, 41 colmeans, 21, 38, 39, 41, 43, 44, 93, 97, 98, 110, 122, 187, 191, 195,
196, 201, 229, 231, 238, 242, 244, 251 colmeans (Column and row-wise means of a matrix), 30 colMedians, 20, 29, 30, 32, 34, 35, 38, 41, 42, 44, 47, 49, 67, 89, 93, 97, 162, 164, 165, 187, 196, 215, 218, 229, 231, 236, 240 colMedians (Column and row-wise medians), 31 colMins, 8, 30, 35, 42, 163, 218, 234, 247 colMins (Column-wise minimum and maximum), 46 colMinsMaxs (Column-wise minimum and maximum), 46 colnormal.mle (Column-wise MLE of some univariate distributions), 47 colnormlog.mle (Column-wise MLE of some univariate distributions), 47 colnth, 232 colnth (Column and row-wise nth smallest value of a matrix/vector), 32 colOrder (Column and row-wise Order Sort Indices), 33 colpareto.mle (Column-wise MLE of some univariate distributions), 47 colPmax (Column-row wise minima and maxima of two matrices), 42 colPmin (Column-row wise minima and maxima of two matrices), 42 colpois.mle (Column-wise MLE of some univariate distributions), 47 colpois.tests (Many tests for the dispersion parameter in Poisson distribution), 147 colpoisdisp.tests (Many tests for the dispersion parameter in Poisson distribution), 147 colpoisson.mle, 205 colpoisson.mle (MLE for multivariate discrete data), 166 colprods, 34 colprods (Column and row-wise products), 34 colrange, 8, 20, 41, 47, 49, 89, 164, 165, 174, 215, 221, 234, 240, 247 colrange (Column and row-wise range of values of a matrix), 35 colRanks, 212 colRanks (Column and row-wise ranks), 36 colrayleigh.mle (Column-wise MLE of some univariate distributions), 47 colrint.regbx, 124 colrint.regbx (Many random intercepts LMMs for balanced data with a single identical covariate.), 133 colrow.zero (Check if any column or row is fill with zeros), 19 colShuffle, 39, 98, 244, 251 colShuffle (Column and row-wise Shuffle), 37 colskewness, 97, 122, 187, 229 colskewness (Column-wise kurtosis and skewness coefficients), 44 colsums, 28, 30, 34, 35, 71, 146, 239 colsums (Column and row-wise sums of a matrix), 38 colTabulate, 46 colTabulate
(Column and row-wise tabulate), 39 colTrue, 63, 109, 219 colTrue (Column-wise true/false value), 49 colTrueFalse (Column-wise true/false value), 49 Column and row wise coefficients of variation, 28 Column and row-wise Any/All, 29 Column and row-wise means of a matrix, 30 Column and row-wise medians, 31 Column and row-wise nth smallest value of a matrix/vector, 32 Column and row-wise Order - Sort Indices, 33 Column and row-wise products, 34 Column and row-wise range of values of a matrix, 35 Column and row-wise ranks, 36 Column and row-wise Shuffle, 37 Column and row-wise sums of a matrix, 38 Column and row-wise tabulate, 39 Column and row-wise variances and standard deviations, 40 Column and rows-wise mean absolute deviations, 41 Column-row wise minima and maxima of two matrices, 42 Column-wise differences, 43 Column-wise kurtosis and skewness coefficients, 44 Column-wise matching coefficients, 45 Column-wise minimum and maximum, 46 Column-wise MLE of some univariate distributions, 47 Column-wise true/false value, 49 Column-wise uniformity Watson test for circular data, 50 Column-wise Yule’s Y (coefficient of colligation), 51 columns (Get specific columns/rows of a matrix), 89 colvarcomps.mle, 134, 183 colvarcomps.mle (Many moment and maximum likelihood estimations of variance components), 123 colvarcomps.mom, 210 colvarcomps.mom (Many moment and maximum likelihood estimations of variance components), 123 colVars, 20, 21, 28, 31, 35, 38, 39, 41, 44, 47, 49, 57, 71, 89, 93, 97, 98, 112, 146, 187, 191, 205, 215, 218, 221, 229, 231, 238–240, 242, 244, 251 colVars (Column and row-wise variances and standard deviations), 40 colvm.mle (Column-wise MLE of some univariate distributions), 47 colwatsons (Column-wise uniformity Watson test for circular data), 50 colweibull.mle (Column-wise MLE of some univariate distributions), 47 comb_n, 18, 203 comb_n (All k possible combinations from n elements), 7 combn, 203 cor, 56, 57 cor.fbed, 202 cor.fbed (FBED variable selection
method using the correlation), 77 cor.fsreg, 13, 15, 16, 78, 85, 103, 105, 202 cor.fsreg (Correlation based forward regression), 52 cora, 24, 228 cora (Covariance and correlation matrix), 56 corpairs, 122 corpairs (Correlation between pairs of variables), 53 Correlation based forward regression, 52 Correlation between pairs of variables, 53 Correlations, 55 correls, 37, 54, 58, 71, 74, 78, 86, 105, 121, 127, 146, 151–153, 157, 160, 202, 212, 228, 239 correls (Correlations), 55 count_value (Row - Wise matrix/vector count the frequency of a value), 217 countNA (Row - Wise matrix/vector count the frequency of a value), 217 cov, 57 cova, 24, 100, 251 cova (Covariance and correlation matrix), 56 Covariance and correlation matrix, 56 Cox confidence interval for the ratio of two Poisson variables, 57 cox.poisrat (Cox confidence interval for the ratio of two Poisson variables), 57 cqtest (Multi-sample tests for vectors), 183 cqtests (Many non parametric multi-sample tests), 128 Cross-Validation for the k-NN algorithm, 58 ct.mle (MLE of continuous univariate distributions defined on the real line), 171 data.frame.to_matrix, 60, 227 dcor, 68 dcor (Distance correlation), 65 dcor.ttest, 65, 66 dcor.ttest (Hypothesis test for the distance correlation), 93 dcov, 66, 94 dcov (Distance variance and covariance), 67 Density of the multivariate normal and t distributions, 61 Design Matrix, 62 design_matrix (Design Matrix), 62 Diag.fill, 201 Diag.fill (Diagonal Matrix), 63 Diag.matrix (Diagonal Matrix), 63 Diagonal Matrix, 63 Digamma (Natural logarithm of the gamma function and its derivatives), 193 diri.nr2, 175, 178, 180, 193, 194 diri.nr2 (Fitting a Dirichlet distribution via Newton-Raphson), 81 dirimultinom.mle (MLE for multivariate discrete data), 166 dirknn (k-NN algorithm using the arc cosinus distance), 103 Dist, 43, 60, 65, 70, 195, 201, 241 Dist (Distance matrix), 66 dista, 43, 60, 67, 70, 110, 195, 201, 241 dista (Distance between vectors and a matrix), 64
Distance between vectors and a matrix, 64 Distance correlation, 65 Distance matrix, 66 Distance variance and covariance, 67 dmvnorm, 179 dmvnorm (Density of the multivariate normal and t distributions), 61 dmvt (Density of the multivariate normal and t distributions), 61 dvar, 70 dvar (Distance variance and covariance), 67 eachcol.apply (Operations between two matrices or matrix and vector), 200 eachrow (Operations between two matrices or matrix and vector), 200 edist, 66, 68, 94 edist (Energy distance between matrices), 70 eel.test1 (One sample empirical and exponential empirical likelihood test), 197 eel.test2 (Two sample exponential empirical likelihood test), 248 Eigenvalues and eigenvectors in high dimensional principal component analysis, 68 el.test1 (One sample empirical and exponential empirical likelihood test), 197 Energy distance between matrices, 70 Equality of objects, 71 Estimation of an AR(1) model, 72 Estimation of the Box-Cox transformation, 73 exp2.mle (MLE of continuous univariate distributions defined on the positive line), 169 expmle (MLE of continuous univariate distributions defined on the positive line), 169 Exponential empirical likelihood for a one sample mean vector hypothesis testing, 74 Exponential empirical likelihood hypothesis testing for two mean vectors, 75 expregs (Many exponential regressions), 117 FBED variable selection method using the correlation, 77 Find element, 78 Find the given value in a hash table, 79 fish.kent (Hypothesis test for von Mises-Fisher distribution over Kent distribution), 95 Fitted probabilities of the Terry-Bradley model, 80 Fitting a Dirichlet distribution via Newton-Raphson, 81 floyd, 246 floyd (Floyd-Warshall algorithm), 82 Floyd-Warshall algorithm, 82 foldnorm.mle (MLE of continuous univariate distributions defined on the positive line), 169 Forward selection with generalised linear regression models, 84 freq.max (minimum and maximum frequencies), 165 freq.min (minimum and maximum
frequencies), 165 fs.reg, 13, 16, 78, 103, 202 fs.reg (Forward selection with generalised linear regression models), 84 ftest, 19, 95, 123, 198 ftest (Multi-sample tests for vectors), 183 ftests, 9, 111, 112, 114, 116, 118, 119, 129, 131, 132, 140, 149, 157, 159, 185, 249 ftests (Many multi-sample tests), 125 G-square test of conditional independence, 85 g2Test, 10, 121, 160, 197, 227, 228 g2Test (G-square test of conditional independence), 85 g2Test_perm, 121, 160 g2Test_perm (G-square test of conditional independence), 85 g2Test_univariate, 85, 86, 130, 159, 228 g2Test_univariate (Matrix with G-square tests of independence), 159 g2Test_univariate_perm, 85, 86 g2Test_univariate_perm (Matrix with G-square tests of independence), 159 g2tests, 81, 115 g2tests (Many G-square tests of independence), 119 g2tests_perm (Many G-square tests of independence), 119 gammamle, 48, 172, 181 gammamle (MLE of continuous univariate distributions defined on the positive line), 169 gammanb (Naive Bayes classifiers), 189 gammanb.pred (Prediction with some naive Bayes classifiers), 204 Gaussian regression with a log-link, 87 gaussian.nb, 179, 205 gaussian.nb (Naive Bayes classifiers), 189 gaussiannb.pred (Prediction with some naive Bayes classifiers), 204 Generates random values from a normal and puts them in a matrix, 88 geom.anova (Analysis of variance with a count variable), 9 geom.anovas (Many analysis of variance tests with a discrete variable), 114 geom.mle (MLE of count data (univariate discrete distributions)), 172 geom.nb (Naive Bayes classifiers), 189 geom.regs (Many simple geometric regressions), 143 geomnb.pred (Prediction with some naive Bayes classifiers), 204 Get specific columns/rows of a matrix, 89 ginis (Many Gini coefficients), 121 glm_logistic, 85, 186 glm_logistic (Logistic and Poisson regression models), 105 glm_poisson, 85 glm_poisson (Logistic and Poisson regression models), 105 group.all (Some summary statistics of a vector for each level of a grouping variable), 230
group.any (Some summary statistics of a vector for each level of a grouping variable), 230 group.mad (Some summary statistics of a vector for each level of a grouping variable), 230 group.max (Some summary statistics of a vector for each level of a grouping variable), 230 group.mean (Some summary statistics of a vector for each level of a grouping variable), 230 group.med (Some summary statistics of a vector for each level of a grouping variable), 230 group.min (Some summary statistics of a vector for each level of a grouping variable), 230 group.min_max (Some summary statistics of a vector for each level of a grouping variable), 230 group.sum, 231 group.sum (Sums of a vector for each level of a grouping variable), 242 group.var (Some summary statistics of a vector for each level of a grouping variable), 230 groupcolVars, 231 groupcolVars (Column and row-wise variances and standard deviations), 40 groupcorrels (Correlations), 55 gumbel.mle (MLE of continuous univariate distributions defined on the real line), 171 halfnorm.mle (MLE of continuous univariate distributions defined on the positive line), 169 Hash - Pair function, 90 Hash object to a list object, 91 hash.find, 90, 91 hash.find (Find the given value in a hash table), 79 hash.list, 80, 91 hash.list (Hash - Pair function), 90 hash2list (Hash object to a list object), 91 hd.eigen (Eigenvalues and eigenvectors in high dimensional principal component analysis), 68 High dimensional MCD based detection of outliers, 92 hsecant01.mle (MLE of distributions defined in the (0, 1) interval), 174 Hypothesis test for the distance correlation, 93 Hypothesis test for two means of percentages, 94 Hypothesis test for von Mises-Fisher distribution over Kent distribution, 95 Hypothesis testing between two skewness or kurtosis coefficients, 96 iag.mle, 26, 61, 96, 142, 226 iag.mle (MLE of (hyper-)spherical distributions), 167 ibeta.mle (MLE of distributions defined in the (0, 1) interval), 174 Index of the columns
of a data.frame which are factor variables, 97 Insert new function names in the NAMESPACE file, 98 invdir.mle (MLE of the inverted Dirichlet distribution), 177 Inverse of a symmetric positive definite matrix, 99 invgauss.mle (MLE of continuous univariate distributions defined on the positive line), 169 is.symmetric, 25, 61 is.symmetric (Check whether a square matrix is symmetric), 23 is_element, 17 is_element (Find element), 78 is_integer, 244 is_integer (Check if values are integers and convert to integer), 20 james, 75, 77 james (James multivariate version of the t-test), 100 James multivariate version of the t-test, 100 k nearest neighbours algorithm (k-NN), 101 k-NN algorithm using the arc cosinus distance, 103 knn, 60, 104 knn (k nearest neighbours algorithm (k-NN)), 101 knn.cv (Cross-Validation for the k-NN algorithm), 58 kruskaltest (Multi-sample tests for vectors), 183 kruskaltests (Many non parametric multi-sample tests), 128 kuiper (Uniformity test for circular data), 249 kurt (Skewness and kurtosis coefficients), 229 kurt.test2 (Hypothesis testing between two skewness or kurtosis coefficients), 96 laplace.mle (MLE of continuous univariate distributions defined on the real line), 171 Lbeta, 18, 192 Lbeta (Natural logarithm of the beta function), 192 Lchoose, 192, 216 Lchoose (Binomial coefficient and its logarithm), 17 Lgamma, 18, 193 Lgamma (Natural logarithm of the gamma function and its derivatives), 193 lindley.mle (MLE of continuous univariate distributions defined on the positive line), 169 Linear models for large scale data, 104 list.ftests (Many F-tests with really huge matrices), 118 lm, 105 lm.fit, 105 lmfit (Linear models for large scale data), 104 Log, 216 Log (Natural Logarithm each element of a matrix), 191 logcauchy.mle (MLE of continuous univariate distributions defined on the positive line), 169 Logistic and Poisson regression models, 105 Logistic or Poisson regression with a single categorical predictor, 107 logistic.cat1, 10
logistic.cat1 (Logistic or Poisson regression with a single categorical predictor), 107 logistic.mle (MLE of continuous univariate distributions defined on the real line), 171 logistic_only, 15, 16, 53, 85, 88, 103, 106, 108, 118, 137, 139, 145, 151, 154, 156, 186, 207, 222, 253 logistic_only (Many univariate simple binary logistic regressions), 151 logitnorm.mle (MLE of distributions defined in the (0, 1) interval), 174 loglogistic.mle (MLE of continuous univariate distributions defined on the positive line), 169 lognorm.mle (MLE of continuous univariate distributions defined on the positive line), 169 logseries.mle (MLE of count data (univariate discrete distributions)), 172 lomax.mle (MLE of continuous univariate distributions defined on the positive line), 169 Lower and Upper triangular of a matrix, 108 lower_tri (Lower and Upper triangular of a matrix), 108 mad2 (Mean - Median absolute deviation of a vector), 161 mahala, 65 mahala (Mahalanobis distance), 110 Mahalanobis distance, 110 Many (and one) area under the curve values, 111 Many 2 sample proportions tests, 112 Many 2 sample tests, 113 Many analysis of variance tests with a discrete variable, 114 Many ANCOVAs, 116 Many exponential regressions, 117 Many F-tests with really huge matrices, 118 Many G-square tests of independence, 119 Many Gini coefficients, 121 Many hypothesis tests for two means of percentages, 122 Many moment and maximum likelihood estimations of variance components, 123 Many multi-sample tests, 125 Many multivariate simple linear regressions coefficients, 127 Many non parametric multi-sample tests, 128 Many odds ratio tests, 129 Many one sample goodness of fit tests for categorical data, 130 Many one sample tests, 131 Many random intercepts LMMs for balanced data with a single identical covariate., 133 Many regression based tests for single sample repeated measures, 134 Many score based GLM regressions, 136 Many score based regression models, 138 Many Shapiro-Francia normality
tests, 139 Many simple circular or angular regressions, 141 Many simple Gaussian regressions with a log-link, 142 Many simple geometric regressions, 143 Many simple linear mixed model regressions, 144 Many simple linear regressions coefficients, 145 Many simple multinomial regressions, 146 Many tests for the dispersion parameter in Poisson distribution, 147 Many two-way ANOVAs, 148 Many univariate generalised linear models, 149 Many univariate simple binary logistic regressions, 151 Many univariate simple linear regressions, 152 Many univariate simple poisson regressions, 154 Many univariate simple quasi poisson regressions, 155 Many Welch’s F-tests, 156 mat.mat (Number of equal columns between two matrices), 195 Match, 61, 71, 97, 157, 196, 239 match, 158 match.coefs (Column-wise matching coefficients), 45 Matrix with all pairs of t-tests, 158 Matrix with G-square tests of independence, 159 matrix.sum (Sum of a matrix), 240 matrnorm, 225 matrnorm (Generates random values from a normal and puts them in a matrix), 88 maxboltz.mle (MLE of continuous univariate distributions defined on the positive line), 169 mcnemar (Multi-sample tests for vectors), 183 mcnemars (Many 2 sample tests), 113 Mean - Median absolute deviation of a vector, 161 med, 29, 31, 32, 38, 162, 217, 242 med (Median of a vector), 162 Median of a vector, 162 mediandir (Spherical and hyperspherical median), 237 min_max (minimum and maximum), 164 Minima and maxima of two vectors/matrices, 163 minimum and maximum, 164 minimum and maximum frequencies, 165 MLE for multivariate discrete data, 166 MLE of (hyper-)spherical distributions, 167 MLE of continuous univariate distributions defined on the positive line, 169 MLE of continuous univariate distributions defined on the real line, 171 MLE of count data (univariate discrete distributions), 172 MLE of distributions defined in the (0, 1) interval, 174 MLE of some circular distributions, 176 MLE of the inverted Dirichlet distribution, 177 MLE of the
multivariate normal distribution, 178 MLE of the ordinal model without covariates, 179 MLE of the tobit model, 180 model.matrix, 62 Moment and maximum likelihood estimation of variance components, 181 Multi-sample tests for vectors, 183 multinom.mle, 178, 179 multinom.mle (MLE for multivariate discrete data), 166 multinom.nb (Naive Bayes classifiers), 189 multinom.reg (Multinomial regression), 185 multinom.regs (Many simple multinomial regressions), 146 Multinomial regression, 185 multinomnb.pred (Prediction with some naive Bayes classifiers), 204 Multivariate kurtosis, 186 Multivariate Laplace random values simulation, 187 Multivariate normal and t random values simulation, 188 multivmf.mle (MLE of (hyper-)spherical distributions), 167 mv.eeltest1, 77 mv.eeltest1 (Exponential empirical likelihood for a one sample mean vector hypothesis testing), 74 mv.eeltest2, 75, 101 mv.eeltest2 (Exponential empirical likelihood hypothesis testing for two mean vectors), 75 mvbetas, 54, 71, 105, 146, 153, 239 mvbetas (Many multivariate simple linear regressions coefficients), 127 mvkurtosis (Multivariate kurtosis), 186 mvnorm.mle, 61 mvnorm.mle (MLE of the multivariate normal distribution), 178 Naive Bayes classifiers, 189 Natural Logarithm each element of a matrix, 191 Natural logarithm of the beta function, 192 Natural logarithm of the gamma function and its derivatives, 193 negbin.mle, 148, 167, 245 negbin.mle (MLE of count data (univariate discrete distributions)), 172 Norm (Norm of a matrix), 194 Norm of a matrix, 194 normal.mle, 48, 170, 181 normal.mle (MLE of continuous univariate distributions defined on the real line), 171 normlog.mle (MLE of continuous univariate distributions defined on the positive line), 169 normlog.reg, 143 normlog.reg (Gaussian regression with a log-link), 87 normlog.regs, 87 normlog.regs (Many simple Gaussian regressions with a log-link), 142 nth, 8, 20, 35, 47, 49, 63, 89, 97, 109, 162, 164, 165, 215, 217–219, 232, 234, 240, 247 nth (Column
and row-wise nth smallest value of a matrix/vector), 32 Number of equal columns between two matrices, 195 odds, 46, 51, 197 odds (Many odds ratio tests), 129 Odds ratio and relative risk, 196 odds.ratio, 130, 254 odds.ratio (Odds ratio and relative risk), 196 omp (Orthogonal matching pursuit regression), 201 ompr, 78 ompr (Orthogonal matching pursuit regression), 201 One sample empirical and exponential empirical likelihood test, 197 One sample t-test for a vector, 198 Operations between two matrices or matrix and vector, 200 Order, 217 Order (Column and row-wise Order Sort Indices), 33 ordinal.mle (MLE of the ordinal model without covariates), 179 Orthogonal matching pursuit regression, 201 pareto.mle (MLE of continuous univariate distributions defined on the positive line), 169 pc.skel, 204, 246 pc.skel (Skeleton of the PC algorithm), 227 percent.ttest (Hypothesis test for two means of percentages), 94 percent.ttests (Many hypothesis tests for two means of percentages), 122 permcor (Permutation based p-value for the Pearson correlation coefficient), 203 Permutation, 202 permutation, 61 permutation (Permutation), 202 Permutation based p-value for the Pearson correlation coefficient, 203 Pmax (Minima and maxima of two vectors/matrices), 163 Pmin (Minima and maxima of two vectors/matrices), 163 Pmin_Pmax (Minima and maxima of two vectors/matrices), 163 pois.test (Tests for the dispersion parameter in Poisson distribution), 244 poisdisp.test (Tests for the dispersion parameter in Poisson distribution), 244 poisson.anova, 81, 108, 115, 148, 245 poisson.anova (Analysis of variance with a count variable), 9 poisson.anovas, 10, 108, 148, 245 poisson.anovas (Many analysis of variance tests with a discrete variable), 114 poisson.cat1 (Logistic or Poisson regression with a single categorical predictor), 107 poisson.mle, 10, 48, 81, 115, 148, 167, 245 poisson.mle (MLE of count data (univariate discrete distributions)), 172 poisson.nb, 167 poisson.nb (Naive Bayes
classifiers), 189 poisson_only, 10, 15, 16, 53, 81, 85, 106, 108, 115, 118, 137, 139, 144, 147, 148, 151, 152, 156, 174, 208, 245, 253 poisson_only (Many univariate simple poisson regressions), 154 poissonnb.pred (Prediction with some naive Bayes classifiers), 204 Prediction with some naive Bayes classifiers, 204 prop.reg, 94, 95, 122, 123, 208, 222 prop.reg (Quasi binomial regression for proportions), 205 prop.regs, 87, 143, 144, 147 prop.regs (Quasi binomial regression for proportions), 205 proptest (Many one sample tests), 131 proptests (Many 2 sample proportions tests), 112 qpois.reg, 12, 222 qpois.reg (Quasi Poisson regression for count data), 207 qpois.regs (Quasi Poisson regression for count data), 207 Quasi binomial regression for proportions, 205 Quasi Poisson regression for count data, 207 quasi.poisson_only (Many univariate simple quasi poisson regressions), 155 quasipoisson.anova, 12 quasipoisson.anova (Analysis of variance with a count variable), 9 quasipoisson.anovas (Many analysis of variance tests with a discrete variable), 114 racg, 168, 188, 189 racg (Angular central Gaussian random values simulation), 10 Random intercepts linear mixed models, 209 Random values simulation from a von Mises distribution, 210 Rank, 37 Rank (Ranks of the values of a vector), 211 Ranks of the values of a vector, 211 rayleigh.mle (MLE of continuous univariate distributions defined on the positive line), 169 rbing, 224 rbing (Simulation of random values from a Bingham distribution), 222 rbingham (Simulation of random values from a Bingham distribution with any symmetric matrix), 223 read.directory, 21, 23, 98, 235, 244 read.directory (Reading the files of a directory), 212 read.examples, 23 read.examples (Reading the files of a directory), 212 Reading the files of a directory, 212 regression, 15, 16, 53, 105, 106, 118, 137, 139, 151, 152, 154, 156, 157, 253 regression (Many univariate simple linear regressions), 152 rel.risk (Odds ratio and relative risk), 196
rep_col (Replicate columns/rows), 215 rep_row (Replicate columns/rows), 215 Repeated measures anova, 214 Replicate columns/rows, 215 Rfast-package, 6 rint.mle (Moment and maximum likelihood estimation of variance components), 181 rint.reg, 134, 136, 145, 183, 214 rint.reg (Random intercepts linear mixed models), 209 rint.regbx, 133, 134, 136, 183 rint.regbx (Random intercepts linear mixed models), 209 rint.regs (Many simple linear mixed model regressions), 144 rm.anova (Repeated measures anova), 214 rm.anovas, 73, 214 rm.anovas (Many regression based tests for single sample repeated measures), 134 rm.lines, 73, 134, 210 rm.lines (Many regression based tests for single sample repeated measures), 134 rmdp, 69 rmdp (High dimensional MCD based detection of outliers), 92 rmvlaplace, 11, 189 rmvlaplace (Multivariate Laplace random values simulation), 187 rmvnorm, 11, 61, 88, 188, 225 rmvnorm (Multivariate normal and t random values simulation), 188 rmvt, 11, 61, 188, 189 rmvt (Multivariate normal and t random values simulation), 188 Rnorm, 88 Rnorm (Simulation of random values from a normal distribution), 224 Round, 232 Round (Round each element of a matrix/vector), 216 Round each element of a matrix/vector, 216 Row - Wise matrix/vector count the frequency of a value, 217 Row-wise minimum and maximum, 218 Row-wise true value, 219 rowAll (Column and row-wise Any/All), 29 rowAny (Column and row-wise Any/All), 29 rowCountValues (Row - Wise matrix/vector count the frequency of a value), 217 rowcvs (Column and row wise coefficients of variation), 28 rowFalse, 20, 49, 89, 215, 240 rowFalse (Row-wise true value), 219 rowhameans (Column and row-wise means of a matrix), 30 rowMads, 201 rowMads (Column and rows-wise mean absolute deviations), 41 rowMaxs, 35, 47, 164, 165 rowMaxs (Row-wise minimum and maximum), 218 rowmeans (Column and row-wise means of a matrix), 30 rowMedians, 41, 63, 109, 219 rowMedians (Column and row-wise medians), 31 rowMins, 20, 35, 47, 49, 63, 89,
109, 164, 165, 215, 219, 240 rowMins (Row-wise minimum and maximum), 218 rowMinsMaxs (Row-wise minimum and maximum), 218 rownth, 232 rownth (Column and row-wise nth smallest value of a matrix/vector), 32 rowOrder (Column and row-wise Order Sort Indices), 33 rowprods (Column and row-wise products), 34 rowrange, 63, 109, 218, 219 rowrange (Column and row-wise range of values of a matrix), 35 rowRanks (Column and row-wise ranks), 36 rows (Get specific columns/rows fo a matrix), 89 rowShuffle (Column and row-wise Shuffle), 37 rowsums, 30 rowsums (Column and row-wise sums of a matrix), 38 rowTabulate (Column and row-wise tabulate), 39 INDEX rowTrue, 20, 49, 89, 215, 240 rowTrue (Row-wise true value), 219 rowTrueFalse (Row-wise true value), 219 rowVars, 63, 109, 219 rowVars (Column and row-wise variances and standard deviations), 40 rvmf, 88, 168, 177, 211, 223–225 rvmf (Simulation of random values from a von Mises-Fisher distribution), 225 rvonmises, 50, 88, 177, 225, 226, 250 rvonmises (Random values simulation from a von Mises distribution), 210 score.betaregs (Many score based regression models), 138 score.expregs (Many score based regression models), 138 score.gammaregs (Many score based regression models), 138 score.geomregs, 144, 147 score.geomregs (Many score based regression models), 138 score.glms, 15, 16, 53, 87, 118, 139, 143, 145, 207, 208, 222 score.glms (Many score based GLM regressions), 136 score.invgaussregs (Many score based regression models), 138 score.multinomregs, 186 score.multinomregs (Many score based GLM regressions), 136 score.negbinregs (Many score based GLM regressions), 136 score.weibregs (Many score based regression models), 138 score.ztpregs (Many score based regression models), 138 Search for variables with zero range in a matrix, 220 sftest (Many Shapiro-Francia normality tests), 139 sftests, 44, 88 sftests (Many Shapiro-Francia normality tests), 139 Significance testing for the coefficients of Quasi 279 binomial or the quasi Poisson 
regression, 221 Simulation of random values from a Bingham distribution, 222 Simulation of random values from a Bingham distribution with any symmetric matrix, 223 Simulation of random values from a normal distribution, 224 Simulation of random values from a von Mises-Fisher distribution, 225 Skeleton of the PC algorithm, 227 skew, 44, 97 skew (Skewness and kurtosis coefficients), 229 skew.test2, 44, 187, 229 skew.test2 (Hypothesis testing betseen two skewness or kurtosis coefficients), 96 Skewness and kurtosis coefficients, 229 Some summary statistics of a vector for each level of a grouping variable, 230 Sort, 42, 163 Sort (Sort - Sort a vector coresponding to another), 231 Sort - Sort a vector coresponding to another, 231 Sort and unique numbers, 233 sort_cor_vectors, 233, 234 sort_cor_vectors (Sort - Sort a vector coresponding to another), 231 sort_mat, 20, 35, 42, 47, 49, 63, 83, 89, 109, 163–165, 215, 218, 219, 233, 240 sort_mat (Sorting of the columns-rows of a matrix), 234 sort_unique, 232, 234 sort_unique (Sort and unique numbers), 233 Sorting of the columns-rows of a matrix, 234 Source many R files, 235 sourceR, 23, 213 sourceR (Source many R files), 235 sourceRd, 23, 213 sourceRd (Source many R files), 235 spat.med (Spatial median for Euclidean data), 236 280 Spatial median for Euclidean data, 236 spdinv (Inverse of a symmetric positive definite matrix), 99 Spherical and hyperspherical median, 237 spml.mle, 26, 104, 142 spml.mle (MLE of some circular distributions), 176 spml.reg, 27 spml.reg (Circular or angular regression), 25 spml.regs (Many simple circular or angular regressions), 141 squareform (Vector allocation in a symmetric matrix), 251 Standardisation, 238 standardise (Standardisation), 238 Sub-matrix, 239 submatrix (Sub-matrix), 239 Sum of a matrix, 240 Sum of all pairwise distances in a distance matrix, 240 Sums of a vector for each level of a grouping variable, 242 Table, 58 Table (Table Creation - Frequency of each value), 243 Table Creation 
- Frequency of each value, 243 Tests for the dispersion parameter in Poisson distribution, 244 tmle (MLE of continuous univariate distributions defined on the real line), 171 tobit.mle (MLE of the tobit model), 180 Topological sort of a DAG, 246 topological_sort (Topological sort of a DAG), 246 total.dist, 65, 70 total.dist (Sum of all pairwise distances in a distance matrix), 240 total.dista, 65, 70 total.dista (Sum of all pairwise distances in a distance matrix), 240 transpose (Transpose of a matrix), 247 Transpose of a matrix, 247 INDEX Trigamma (Natural logarithm of the gamma function and its derivatives), 193 ttest, 111, 114, 131, 140, 159, 199, 249 ttest (Many one sample tests), 131 ttest1, 198 ttest1 (One sample t-test for a vector), 198 ttest2, 19, 95, 123 ttest2 (Multi-sample tests for vectors), 183 ttests, 9, 111, 112, 116, 119, 126, 131, 132, 140, 149, 159, 185, 199, 249 ttests (Many 2 sample tests), 113 ttests.pairs (Matrix with all pairs of t-tests), 158 Two sample exponential empirical likelihood test, 248 twoway.anova (Multi-sample tests for vectors), 183 twoway.anovas (Many two-way ANOVAs), 148 Uniformity test for circular data, 249 univglms, 12, 13, 15, 16, 53, 55, 71, 86, 106, 118, 121, 127, 137, 139, 145, 146, 153, 154, 156, 160, 207, 208, 222, 239, 253 univglms (Many univariate generalised linear models), 149 univglms2 (Many univariate generalised linear models), 149 upper_tri (Lower and Upper triangular of a matrix), 108 Var, 242 Var (Variance of a vector), 250 var2test (Multi-sample tests for vectors), 183 var2tests (Many 2 sample tests), 113 varcomps.mle, 73, 124, 136, 214 varcomps.mle (Moment and maximum likelihood estimation of variance components), 181 varcomps.mom, 134, 210 varcomps.mom (Moment and maximum likelihood estimation of variance components), 181 Variance of a vector, 250 vartest (Many one sample tests), 131 INDEX vartests (Many multi-sample tests), 125 vecdist (Distance matrix), 66 Vector allocation in a symmetric matrix, 251 
vm.mle, 48, 168, 172, 211 vm.mle (MLE of some circular distributions), 176 vmf.mle, 50, 96, 104, 177, 226, 238, 250 vmf.mle (MLE of (hyper-)spherical distributions), 167 watson, 50 watson (Uniformity test for circular data), 249 weib.reg (Weibull regression model), 252 Weibull regression model, 252 weibull.mle (MLE of continuous univariate distributions defined on the positive line), 169 which_isFactor (Index of the columns of a data.frame which are factor variables), 97 wigner.mle (MLE of continuous univariate distributions defined on the real line), 171 wrapcauchy.mle (MLE of some circular distributions), 176 XopY.sum (Operations between two matrices or matrix and vector), 200 yule, 51 yule (Yule’s Y (coefficient of colligation)), 253 Yule’s Y (coefficient of colligation), 253 zip.mle, 167, 170, 172 zip.mle (MLE of count data (univariate discrete distributions)), 172 ztp.mle, 167 ztp.mle (MLE of count data (univariate discrete distributions)), 172 281
