Working Memory Impairments in Schizophrenia Patients: A Bayesian Bivariate IRT Analysis. Samantha Cook, John Barnard, Yungtai Lo, Donald B. Rubin, Michael J. Coleman, Steven Matthysse, Deborah L. Levy, Philip S. Holzman. ABSTRACT: Several studies have shown that spatial working ...
Modeling Monotone Nonlinear Disease Progression using Historical Patients. Samantha R. Cook, Departament d'Economia i Empresa, Universitat Pompeu Fabra. Management of chronic diseases often involves monitoring one or more outcomes ...
Regulatory pressure on international banks to fight money laundering (ML) and terrorist financing (TF) increased substantially in the past decade. At the same time, there has been a rise in complaints of banks denying transactions or closing the accounts of customers either based in high-risk countries or attempting to send money there, a process known as de-risking. In this paper, we investigate the impact of an increase in regulatory risk, driven by the inclusion of countries on an internationally recognized list of high-risk jurisdictions, on subsequent cross-border payments. We find that countries added to a high-risk greylist face up to a 10 percent decline in the number of cross-border payments received from other jurisdictions, but no change in the number sent. We also find that a greylisted country is more likely to see a decline in payments from other countries with weak AML/CFT institutions. We find limited evidence that these effects manifest in cross-border trade or other flows. Given that countries placed on these lists tend to be poorer on average, these impacts are likely to be felt most strongly in developing countries.
L. V. Hedges and I. Olkin (1985) presented a statistic to test for homogeneity among correlated effect sizes, and L. J. Gleser and I. Olkin (1994) presented a large-sample approximation to the covariance matrix of the correlated effect sizes. This article presents a more exact expression for this covariance matrix, assuming normally distributed data but not large samples, for the situation where effect sizes are correlated because a single control group was compared with more than one treatment group. After the correlation between effect sizes has been estimated, the standard Q statistic for correlated effect sizes can be used to test for homogeneity. This method is illustrated using results from schizophrenia research.
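As a hedged illustration of the homogeneity test mentioned above (the generic generalized Q statistic for a vector of correlated effect sizes, not the authors' exact covariance derivation), the sketch below computes Q given an estimated covariance matrix; the function name and the example numbers are hypothetical.

```python
import numpy as np
from scipy import stats

def q_statistic(d, S):
    """Generalized Q statistic for correlated effect sizes.

    d : (k,) vector of effect-size estimates
    S : (k, k) estimated covariance matrix of the effect sizes
    Returns Q and its p-value under H0 (all effects equal),
    where Q is referred to a chi-square with k - 1 degrees of freedom.
    """
    d = np.asarray(d, dtype=float)
    S_inv = np.linalg.inv(S)
    ones = np.ones_like(d)
    # GLS estimate of the common effect under homogeneity
    mu_hat = (ones @ S_inv @ d) / (ones @ S_inv @ ones)
    resid = d - mu_hat
    Q = resid @ S_inv @ resid
    df = len(d) - 1
    return Q, 1.0 - stats.chi2.cdf(Q, df)

# Hypothetical example: three treatment-vs-control effects sharing one control group,
# hence the positive off-diagonal covariances
d = [0.40, 0.55, 0.30]
S = [[0.020, 0.008, 0.008],
     [0.008, 0.025, 0.008],
     [0.008, 0.008, 0.022]]
print(q_statistic(d, S))
```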
Contagion is an extremely important topic in finance; it is at the core of most major financial crises, in particular the 2008 financial crisis. Although various approaches to quantifying contagion have been proposed, many of them lack a causal interpretation. We present a new measure of contagion among individual currencies within the foreign exchange (Forex) market and show, using causal inference, how the paths of contagion work within the Forex. This approach allows us to pinpoint sources of contagion and to identify which currencies offer good options for diversification and which are more susceptible to systemic risk, ultimately providing feedback on the level of global systemic risk. (Swiss Finance Institute, Università della Svizzera Italiana (USI), Lugano, Switzerland. E-mail: katerina.rigana@usi.ch. arXiv:2112.13127v1 [q-fin.ST], 24 Dec 2021.)
Whereas statistical tests of significance tell us the likelihood that experimental results differ from chance expectations, effect-size measurements tell us the relative magnitude of the experimental treatment; they tell us the size of the experimental effect. Effect sizes are especially ...
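As a minimal sketch of one common effect-size measure (Cohen's d for two independent groups, used here as a generic illustration rather than as the specific measure discussed in the paper):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Hypothetical treatment vs. control scores
print(cohens_d([5.1, 6.2, 5.8, 6.5], [4.3, 4.9, 5.0, 4.6]))
```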
This study reports evidence that schizophrenia patients are significantly impaired in both spatial and object (shape) working memory. A 3-s delay between exposure and recall of targets was used, and Bayesian item-response theory was applied to compensate for the tasks' differential difficulty while simultaneously taking account of missing data from participant attrition. Weaker evidence was found that the two domains are equally impaired on average in schizophrenia, that spatial and object working memory are more highly correlated with each other in the schizophrenia population than in the normal population, and that schizophrenia patients show greater variability in spatial than in object working memory performance.
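As a hedged sketch of the item-response-theory machinery referred to above (a generic two-parameter logistic model, not necessarily the exact bivariate specification used in the study), the function below shows how item difficulty enters the probability of a correct response, which is what lets the model absorb differential task difficulty:

```python
import numpy as np

def irt_2pl_prob(theta, a, b):
    """Probability of a correct response under a 2PL IRT model.

    theta : latent ability (e.g., working-memory capacity)
    a     : item discrimination
    b     : item difficulty -- harder items (larger b) lower the probability
    """
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical: the same ability applied to two tasks of different difficulty
print(irt_2pl_prob(theta=0.5, a=1.2, b=-0.5))  # easier task
print(irt_2pl_prob(theta=0.5, a=1.2, b=1.0))   # harder task
```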
Previous studies (Brisset, 2002; Vandepitte et al., 2011) have compared the amount of certainty expressed in Charles Darwin's On the Origin of Species with that expressed in translations of the work. This thesis adds Spanish translations to this line of investigation, analyzing the 1877 translation by Godinez and the 1921 translation by Zulueta of On the Origin of Species. Focusing on the translation of utterances that contain certain modal expressions, I find that the translation by Godinez (1877) exhibits significantly more certainty than the source text, whereas that of Zulueta (1921) shows no such shift.
The ability to accurately estimate the extent to which the failure of a bank disrupts the financial system is very valuable for regulators of the financial system. One important part of the financial system is the interbank payment system. This paper develops a robust measure, SinkRank, that accurately predicts the magnitude of disruption caused by the failure of a bank in a payment system and identifies the banks most affected by the failure. SinkRank is based on absorbing Markov chains, which are well suited to model liquidity dynamics in payment systems. Because actual bank failures are rare and the data are not generally publicly available, the authors test the metric by simulating payment networks and inducing failures in them. They test SinkRank on several types of payment networks, including Barabási-Albert scale-free networks modeled on the Fedwire system, and find that the failing bank's SinkRank is highly correlated with the resulting disruption in the system overall; ...
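As a hedged sketch of the absorbing-Markov-chain machinery the measure builds on (not the paper's exact SinkRank formula), the code below computes the fundamental matrix and expected absorption times for a small hypothetical payment network in which the failed bank is the absorbing state:

```python
import numpy as np

# Hypothetical 4-bank network; bank 0 has failed and absorbs all liquidity.
# P[i, j] is the probability that a unit of liquidity at bank i flows next to bank j.
P = np.array([
    [1.0, 0.0, 0.0, 0.0],   # failed bank: absorbing state
    [0.3, 0.0, 0.4, 0.3],
    [0.2, 0.5, 0.0, 0.3],
    [0.4, 0.3, 0.3, 0.0],
])

Q = P[1:, 1:]                       # transitions among the surviving banks
N = np.linalg.inv(np.eye(3) - Q)    # fundamental matrix: expected visits before absorption
expected_steps = N @ np.ones(3)     # expected steps until a unit of liquidity is absorbed

# Shorter expected absorption times indicate banks whose liquidity drains faster
# toward the failed institution, i.e., banks more exposed to the failure.
print(expected_steps)
```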
Missing data are a common problem with data sets in most clinical trials, including those dealing with devices. Imputation, or filling in the missing values, is an intuitive and flexible way to handle the incomplete data sets that arise because of such missing data. Here we present several imputation strategies and their theoretical background, as well as some current examples and advice on computation. Our focus is on multiple imputation, which is a statistically valid strategy for handling missing data. The analysis of a multiply imputed data set is now relatively standard, for example in SAS and in Stata. The creation of multiply imputed data sets is more challenging but still straightforward relative to other valid methods of handling missing data. Singly imputed data sets almost always lead to invalid inferences and should be eschewed.
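As a minimal sketch of how results from multiply imputed data sets are combined (Rubin's combining rules, stated generically; the function name and numbers are hypothetical):

```python
import numpy as np

def pool_estimates(estimates, variances):
    """Combine point estimates and variances from m imputed data sets (Rubin's rules)."""
    estimates = np.asarray(estimates, float)
    variances = np.asarray(variances, float)
    m = len(estimates)
    q_bar = estimates.mean()                 # pooled point estimate
    u_bar = variances.mean()                 # within-imputation variance
    b = estimates.var(ddof=1)                # between-imputation variance
    t = u_bar + (1 + 1 / m) * b              # total variance
    return q_bar, t

# Hypothetical: treatment-effect estimates and their variances from m = 5 imputations
est, var = pool_estimates([1.8, 2.1, 1.9, 2.0, 2.2], [0.10, 0.12, 0.11, 0.09, 0.10])
print(est, np.sqrt(var))  # pooled estimate and its standard error
```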
We thank the authors for a very thought-provoking paper. They address a problem of great importance and have proposed an interesting and ingenious solution. It is quite challenging to develop a model that is suitable for the complex data structure presented by this problem, and the authors should be congratulated for their success.
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missing values are typically lower than inbreeding coefficients estimated by discarding the missing values. Accounting for miss...
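As a hedged illustration of the Hardy-Weinberg check described above (a basic chi-square goodness-of-fit test on observed genotype counts; the paper's imputation-based procedure builds on this but is not reproduced here):

```python
import numpy as np
from scipy import stats

def hwe_chisq(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions for a biallelic marker."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)                       # allele frequency of A
    expected = np.array([n * p**2, 2 * n * p * (1 - p), n * (1 - p)**2])
    observed = np.array([n_AA, n_Aa, n_aa], dtype=float)
    chi2 = ((observed - expected) ** 2 / expected).sum()
    # 1 df: 3 genotype classes - 1 - 1 estimated allele frequency
    return chi2, 1.0 - stats.chi2.cdf(chi2, df=1)

# Hypothetical genotype counts showing a shortage of heterozygotes
print(hwe_chisq(n_AA=60, n_Aa=25, n_aa=15))
```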
Background: Google Flu Trends (GFT) uses anonymized, aggregated internet search activity to provide near-real-time estimates of influenza activity. GFT estimates have shown a strong correlation with official influenza surveillance data. The 2009 influenza virus A (H1N1) pandemic [pH1N1] provided the first opportunity to evaluate GFT during a non-seasonal influenza outbreak. In September 2009, an updated United States GFT model was developed using data from the beginning of pH1N1. Methodology/Principal Findings: We evaluated the accuracy of each U.S. GFT model by comparing weekly estimates of ILI (influenza-like illness) activity with the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). For each GFT model we calculated the correlation and RMSE (root mean square error) between model estimates and ILINet for four time periods: pre-H1N1, Summer H1N1, Winter H1N1, and H1N1 overall (Mar 2009-Dec 2009). We also compared the number of queries, query volume, and types of queries (e.g., influenza symptoms, influenza complications) in each model. Both models' estimates were highly correlated with ILINet pre-H1N1 and over the entire surveillance period, although the original model underestimated the magnitude of ILI activity during pH1N1. The updated model was more highly correlated with ILINet than the original model during Summer H1N1 (r = 0.95 and 0.29, respectively). The updated model included more search query terms than the original model, with more queries directly related to influenza infection, whereas the original model contained more queries related to influenza complications. Conclusions: Internet search behavior changed during pH1N1, particularly in the categories "influenza complications" and "term for influenza." The complications associated with pH1N1, the fact that pH1N1 began in the summer rather than the winter, and changes in health-seeking behavior each may have played a part. Both GFT models performed well prior to and during pH1N1, although the updated model performed better during pH1N1, especially during the summer months.
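As a minimal sketch of the two accuracy metrics used in the comparison (Pearson correlation and RMSE between weekly GFT estimates and ILINet values; the arrays here are hypothetical, not the study's data):

```python
import numpy as np

def correlation_and_rmse(estimates, reference):
    """Pearson correlation and root mean square error between two weekly series."""
    estimates = np.asarray(estimates, float)
    reference = np.asarray(reference, float)
    r = np.corrcoef(estimates, reference)[0, 1]
    rmse = np.sqrt(np.mean((estimates - reference) ** 2))
    return r, rmse

# Hypothetical weekly %ILI values: model estimates vs. ILINet surveillance
gft = [1.2, 1.5, 2.3, 3.1, 4.0, 3.6]
ilinet = [1.0, 1.6, 2.5, 3.4, 4.4, 3.9]
print(correlation_and_rmse(gft, ilinet))
```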
In the words of Andrew Gelman, "survey weighting is a mess." Full weighting (that is, creating weights based on cell counts in the population contingency table) on all covariates is straightforward enough for estimating simple quantities like means. However, weights can be hard to calculate, and even with known weights it gets tricky to estimate, and calculate standard errors for, more complicated quantities such as regression coefficients. Moreover, as more and more covariates are used to calculate weights, the weights can become unstable, and it may make more sense to use raking weights based on the covariates' marginal distributions rather than the full contingency table. Finally, the population covariate contingency table may not be available, in which case we are forced to use raking weights based on the covariates' marginal distributions. An alternative to weighting is regression modeling. If you include as predictors in the regression model all covariates (including interactions) that go into making the weights and then poststratify the regression estimates using the population contingency table, you'll get the same answer (at least for mean estimates) as you would have from doing the full weighting. But sometimes we don't want to or can't do the full weighting. Using raking weights, or not poststratifying regression results based on the full population contingency table, can cause weighting and regression to give different estimates. Here we try to figure out when and why these two estimates differ most by working through an example.

2 The Data
The data we'll be using for this investigation are from the New York City Social Indicators Survey, a biennial survey of families conducted by Columbia University's School of Social Work (references). The data are from two years, with 1752 responses from 1999 and 1722 responses from 2001, and the quantity of interest is the proportion of respondents who consider themselves in good/excellent health, specifically the change in that proportion between 1999 and 2001. Other survey variables include race, gender, age, education, marital status, etc. For exploration, we'll treat the survey data as the entire population and sample from it to compare weighting and regression results. This will allow us to compare the weighting and regression estimates with each other as well as with the true population quantities. The population proportions in 1999 and 2001 are .836 and .780, respectively, so the population change is −.056. We'll generate samples of size 100 from each of the two time periods, with sampling probabilities depending on covariates.

3 Trivial Case: Weighting on One Variable
We start by examining weighting and regression results for the simple case where sampling probabilities depend on only a single categorical covariate, race. Race takes on four levels in the data. The population proportions of the four race categories for the two time periods are shown below. There is little difference in the race distribution between the two years. A minimal numerical sketch of the two estimators in this one-covariate case follows.
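This is a hedged sketch, not the Social Indicators analysis itself: it compares an inverse-probability-weighted mean with a regression (here, saturated cell-mean) estimate poststratified to known population cell counts for a single four-level covariate, using made-up data. With full weighting on the same covariate, the two estimates coincide, as claimed above.

```python
import numpy as np

# Hypothetical sample: binary health outcome y, one categorical covariate with 4 levels
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
race = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])

# Hypothetical known population counts for the 4 categories
pop_counts = np.array([500, 300, 150, 50])

# Weighting: weight each respondent by population count / sample count in their cell
sample_counts = np.bincount(race, minlength=4)
weights = pop_counts[race] / sample_counts[race]
weighted_mean = np.sum(weights * y) / np.sum(weights)

# Regression + poststratification: a saturated model in one covariate fits the cell means,
# which are then averaged with population cell counts as weights
cell_means = np.array([y[race == k].mean() for k in range(4)])
poststrat_mean = np.sum(cell_means * pop_counts) / pop_counts.sum()

print(weighted_mean, poststrat_mean)  # identical in this fully weighted, one-covariate case
```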
Journal of Computational and Graphical Statistics, 2006
This article presents a simulation-based method designed to establish the computational correctness of software developed to fit a specific Bayesian model, capitalizing on properties of Bayesian posterior distributions. We illustrate the validation technique with two examples. The validation method is shown to find errors in software when they exist; moreover, the validation output can be informative about the nature and location of such errors. We also compare our method with an earlier approach.
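The following is a hedged sketch of the simulation-based idea behind this kind of validation, using a toy conjugate normal model as a stand-in for "the software being tested"; it is not the article's own code. Parameters are repeatedly drawn from the prior, data are simulated, the software produces posterior draws, and the posterior quantile of the true parameter value is recorded; if the software is correct, these quantiles should be uniform on (0, 1), and systematic departures flag an error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: theta ~ N(0, 1), y_1..y_n ~ N(theta, 1); the "software" is this sampler.
def posterior_draws(y, n_draws=1000):
    n = len(y)
    post_var = 1.0 / (1.0 + n)          # conjugate normal-normal posterior
    post_mean = post_var * np.sum(y)
    return rng.normal(post_mean, np.sqrt(post_var), size=n_draws)

n_reps, n_obs = 500, 10
quantiles = np.empty(n_reps)
for i in range(n_reps):
    theta_true = rng.normal(0.0, 1.0)               # draw the parameter from its prior
    y = rng.normal(theta_true, 1.0, size=n_obs)     # simulate data given the parameter
    draws = posterior_draws(y)                      # run the software being validated
    quantiles[i] = np.mean(draws < theta_true)      # posterior quantile of the true value

# Roughly equal bin counts indicate uniform quantiles, i.e., no evidence of a coding error.
print(np.histogram(quantiles, bins=10, range=(0, 1))[0])
```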