Papers by Matthias Scholz
Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered a natural generalization of linear principal component analysis. This raises the question of how to use nonlinear features for data compression, reconstruction, and de-noising, which are common applications of linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high-dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real-world data.
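The pre-image idea can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a Gaussian kernel k(x, y) = exp(-||x - y||² / c), projects a point onto the leading kernel principal components (with centring in feature space), and then searches for an input-space point z whose image lies close to that projection. The search uses a fixed-point iteration, which is an assumption here. The names `fit_kpca`, `denoise`, and the width `c` are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, c):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / c) between rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)


def fit_kpca(X, c, n_components):
    """Kernel PCA on the rows of X using a centred Gaussian kernel matrix."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, c)
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one            # centre the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)                  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    alpha = eigvecs[:, order] / np.sqrt(eigvals[order])    # unit-norm feature-space eigenvectors
    return {"X": X, "K": K, "alpha": alpha, "c": c}


def denoise(model, x, n_iter=200, tol=1e-10):
    """Project x onto the kernel principal components and return an approximate pre-image."""
    X, K, alpha, c = model["X"], model["K"], model["alpha"], model["c"]
    N = X.shape[0]
    k_x = gaussian_kernel(X, x[None, :], c)[:, 0]
    k_x_c = k_x - k_x.mean() - K.mean(axis=1) + K.mean()   # centred kernel values k~(x_i, x)
    beta = alpha.T @ k_x_c                                 # projections onto the components
    delta = alpha @ beta
    gamma = delta + (1.0 - delta.sum()) / N                # expansion coefficients incl. the mean
    z = np.array(x, dtype=float)                           # start the search at the input point
    for _ in range(n_iter):
        w = gamma * gaussian_kernel(X, z[None, :], c)[:, 0]
        denom = w.sum()
        if abs(denom) < 1e-12:                             # update undefined here; stop early
            break
        z_new = w @ X / denom                              # fixed-point update for Gaussian kernels
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z
```

The design relies on k(z, z) = 1 for the Gaussian kernel. Minimising the feature-space distance between the image of z and the projection therefore reduces to maximising a weighted sum of kernel values, and setting its gradient to zero yields the simple weighted-mean update used in the loop.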
Sociology: community structure
Network nodes can be regarded as elements whose properties or behaviour follow their own dynamics. A coupling (interaction) between these units leads to an alignment of their behavioural patterns.
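One way to make the coupling idea concrete is a linear consensus (diffusive coupling) model. The text does not specify a model, so this is purely an assumed illustration. Each node moves its state toward those of its neighbours. On a connected network with a small enough step size `eps`, all states converge to a common value. The function name `couple` is hypothetical.

```python
import numpy as np


def couple(states, adjacency, eps=0.1, steps=200):
    """Diffusive coupling: x_i <- x_i + eps * sum_j A_ij (x_j - x_i)."""
    x = np.asarray(states, dtype=float).copy()
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    for _ in range(steps):
        x = x + eps * (A @ x - degree * x)  # each node moves toward its neighbours
    return x


# Four nodes on a path 0-1-2-3, each starting with a different behaviour.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
print(couple([0.0, 1.0, 2.0, 3.0], path))
```

With a symmetric adjacency matrix the coupling preserves the average state, so in this example all four values should end up close to the initial mean, 1.5.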
At this point I would like to sincerely thank everyone who supported me in preparing this work. Special thanks go to Prof. Dr. Klaus-Robert Müller and all other members of the IDA (Intelligent Data Analysis) group at the Fraunhofer Institute FIRST.
Abstract: Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. However, exploiting the benefit of curved components requires careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and, more generally, the use of an independent test set, fail when applied to nonlinear PCA because of its inherently unsupervised nature.
Wine micro-oxygenation is a globally used treatment; its effects were studied here by analysing the wine metabolomic fingerprint with untargeted LC-MS. Eight procedural variations, defined by the addition of oxygen (four levels) and iron (two levels), were applied to Sangiovese wine before and after malolactic fermentation.
Summary: This thesis deals with the analysis of large-scale molecular data. It focuses on the identification of biologically meaningful components and explains the potential of such analyses to gain deeper insight into biological issues. Many aspects are discussed, including component search criteria for capturing the major information in the data and the interpretation of components.
Abstract: Who connects to whom in social communities? A popular belief is that preferential attachment to highly connected network nodes is an adequate model for real networks. By contrast, this work reveals that node similarity is the fundamental mechanism that gives complex networks their typical scale-free, power-law characteristics. Additionally, it turns out that power-law node-degree distributions are restricted to sparsely connected communities.
Summary: Principal component analysis (PCA) is a widely used and versatile method for dimensionality reduction and feature extraction. It is used for compression, for de-noising data, or more generally as a preprocessing step for classification, regression, or source separation tasks. PCA is limited to detecting linear structures in data spaces.
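As a reference point for the linear case, compression and de-noising with PCA can be sketched in NumPy. The centred data are projected onto the leading principal directions and then mapped back to data space. The function name and the toy data are illustrative assumptions. Because PCA captures only linear structure, this works for data lying near a line or plane, but not for curved structures.

```python
import numpy as np


def pca_reconstruct(X, n_components):
    """Compress X to n_components principal components and map back to data space."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T          # principal directions as columns
    scores = Xc @ V                  # compressed representation
    return scores @ V.T + mean       # reconstruction from the leading components


rng = np.random.default_rng(1)
t = rng.normal(size=(100, 1))
clean = t @ np.array([[1.0, 2.0, -1.0]])            # data on a line (a linear structure)
noisy = clean + 0.05 * rng.normal(size=clean.shape)
denoised = pca_reconstruct(noisy, n_components=1)   # removes noise orthogonal to the line
```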
Metabolite fingerprinting is a technology that provides information from spectra of the total composition of metabolites. We show that independent component analysis (ICA) applied to such high-dimensional data has a higher informative power than classical principal component analysis (PCA).
Abstract: Statistical mining and integration of complex molecular data, including metabolites, proteins, and transcripts, is one of the critical goals of systems biology (Ideker, T., Galitski, T., and Hood, L. (2001) A new approach to decoding life: systems biology. Annu. Rev. Genomics Hum. Genet. 2, 343–372). A number of studies have demonstrated the parallel analysis of metabolites and large-scale transcript expression.
Model: A model is a highly simplified description of a real system (an abstraction). Problems can be solved more easily on the model than on the real system, because the model contains only the properties relevant to a given question. Example, the Königsberg bridge problem: how far apart the bridges are, or how long they are, are properties that are not needed to solve the problem and can therefore be left out.
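The Königsberg example can be written down as exactly this kind of abstraction. Land masses become nodes and bridges become edges, and only the node degrees matter. By Euler's theorem, a connected multigraph has a walk that crosses every edge exactly once if and only if it has zero or two nodes of odd degree. The node labels A to D and the function names are arbitrary choices for this sketch.

```python
from collections import Counter

# Seven bridges between four land masses (A = central island); lengths and distances are omitted.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]


def odd_degree_nodes(edges):
    """Return the nodes with an odd number of incident edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """Euler's criterion; assumes the multigraph is connected (not checked here)."""
    return len(odd_degree_nodes(edges)) in (0, 2)


print(odd_degree_nodes(BRIDGES))  # ['A', 'B', 'C', 'D']
print(has_euler_walk(BRIDGES))    # False
```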

Metabolomics, Jan 1, 2005
A novel approach is presented that combines quantitative metabolite and protein data with multivariate statistics for the analysis of time-related regulatory effects of plant metabolism at a systems level. For the analysis of metabolites, gas chromatography coupled to a time-of-flight mass analyzer (GC-TOF-MS) was used. Proteins were identified and quantified using a novel procedure based on shotgun sequencing as described recently (Weckwerth et al., 2004b, Proteomics 4, 78–83). For comparison, leaves of Arabidopsis thaliana wild type plants and starchless mutant plants deficient in phosphoglucomutase activity (PGM) were sampled at intervals throughout the day/night cycle. Using principal and independent components analysis, each dataset (metabolites and proteins) displayed discrete characteristics. Compared to the analysis of only metabolites or only proteins, independent components analysis (ICA) of the integrated metabolite/protein dataset resulted in an improved ability to distinguish between WT and PGM plants (first independent component) and, in parallel, to see diurnal variations in both plants (second independent component). Interestingly, levels of photorespiratory intermediates such as glycerate and glycine best characterized phases of the diurnal rhythm and were not influenced by high sugar accumulation in PGM plants. In contrast to WT plants, PGM plants showed an inversely regulated cluster of N-rich amino acid metabolites and carbohydrates, indicating a shift in C/N partitioning. This observation corresponds to altered utilization of urea cycle intermediates in PGM plants, suggesting enhanced protein degradation and carbon utilization due to growth inhibition.
Among the proteins, chloroplastidic GAPDH (At3g26650) was the best discriminator between WT and PGM plants, in contrast to the cytosolic isoform (At1g13440), consistent with the primary effect of the mutation being located in the chloroplast. The described method is applicable to all kinds of biological systems and enables the unbiased identification of biomarkers embedded in correlative metabolite–protein networks.
Bioinformatics/Computer Applications in the Biosciences, Jan 1, 2004
Genome Biol, Jan 1, 2008
Subtelomeric early transcription in Plasmodium: a mathematical model of the intraerythrocytic developmental cycle identifies a delay between subtelomeric and central chromosomal gene activities in the malaria parasite, Plasmodium falciparum. Abstract. Background: The malaria parasite, Plasmodium falciparum, replicates asexually in a well-defined infection cycle within human erythrocytes (red blood cells). The intraerythrocytic developmental cycle (IDC) proceeds with a 48-hour periodicity.

Proceedings …, Jan 1, 2004
Changes in enzymatic activities in response to carbon starvation were investigated in Arabidopsis thaliana in two distinct experiments. One compares the Columbia ecotype (Col-0) and its starch-deficient pgm mutant (plastidial phosphoglucomutase); the other investigates the enzymatic activities of Col-0 under extended night conditions. A classical technique for detecting and visualizing relevant information in the measured data is principal component analysis (PCA). We show that independent component analysis (ICA) is more suitable for our questions and that its results are more precise than those obtained with PCA. This higher informative power is only achieved when ICA is combined with suitable pre-processing and evaluation criteria. It is essential to first reduce the dimensionality of the data set using PCA. The number of principal components significantly determines the quality of ICA; we therefore propose a criterion for estimating the optimal dimension automatically. The kurtosis measure is used to sort the extracted components. We found that ICA could detect both the time component of the extended night experiment and a discriminating component in the pgm mutant experiment. In both components the most important enzymes were the same, confirming the carbon starvation phenotype of the mutant.
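The pipeline described above (PCA reduction, then ICA, then kurtosis-based sorting) can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the authors' code. ICA is implemented here as symmetric FastICA with a tanh nonlinearity. The number of components is passed in by hand rather than estimated with the proposed criterion. Components are sorted by decreasing excess kurtosis, and this sort direction is also an assumption. All function names are hypothetical.

```python
import numpy as np


def pca_whiten(X, n_components):
    """Reduce X (samples x variables) to n_components whitened principal component scores."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * np.sqrt(X.shape[0] - 1)   # unit-variance, uncorrelated scores


def fast_ica(Z, n_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh nonlinearity on whitened data Z."""
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.normal(size=(m, m)))           # random orthogonal start
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)
        W_new = g.T @ Z / n - np.diag((1.0 - g**2).mean(axis=0)) @ W
        U, _, Vt = np.linalg.svd(W_new)
        W_new = U @ Vt                                      # symmetric decorrelation
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def ica_sorted_by_kurtosis(X, n_components):
    """PCA reduction, then ICA, then components sorted by decreasing excess kurtosis."""
    S = fast_ica(pca_whiten(X, n_components))
    kurt = np.mean(S**4, axis=0) / np.mean(S**2, axis=0) ** 2 - 3.0
    order = np.argsort(kurt)[::-1]
    return S[:, order], kurt[order]
```

Whitening with PCA first also fixes the dimension in which ICA searches. Because the unmixing matrix stays orthogonal, the extracted components remain uncorrelated with unit variance regardless of how far the iteration has converged.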

Bioinformatics Research and Development, Jan 1, 2007
Experimental time courses often reveal nonlinear behaviour. Analysing these nonlinearities is even more challenging when the observed phenomenon is cyclic or oscillatory. In general, this means that the data describe a circular trajectory caused by periodic gene regulation. Nonlinear PCA (NLPCA) is used to approximate this trajectory by a curve referred to as a nonlinear component. To analyse cyclic phenomena, this curve must be closed, that is, a circular component. Here, a neural network with circular units is used to generate circular components. This circular PCA is applied to gene expression data from a time course of the intraerythrocytic developmental cycle (IDC) of the malaria parasite Plasmodium falciparum. As a result, circular PCA provides a model that continuously describes the transcriptional variation throughout the IDC. Such a computational model can then be used to comprehensively analyse molecular behaviour over time, including the identification of relevant genes at any chosen time point.
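A circular unit can be pictured as a pair of network outputs (p, q) that is normalised onto the unit circle. The component is then effectively a single angle, and the resulting curve is automatically closed. The sketch below shows only this unit, not the full network. The normalisation follows a common circular-node construction and is an assumption here. So is the hypothetical helper that converts the angle into hours of a 48-hour cycle, the IDC period of Plasmodium falciparum.

```python
import numpy as np


def circular_unit(p, q):
    """Project the pair (p, q) onto the unit circle and return (p', q', angle)."""
    r = np.sqrt(p**2 + q**2)
    return p / r, q / r, np.arctan2(q, p)


def angle_to_cycle_time(theta, period=48.0):
    """Map an angle in (-pi, pi] to a time in [0, period); hypothetical helper."""
    return (theta % (2 * np.pi)) / (2 * np.pi) * period


p_n, q_n, theta = circular_unit(0.0, 2.0)
print(p_n, q_n, theta)              # 0.0 1.0 1.5707963267948966
print(angle_to_cycle_time(theta))   # 12.0
```

Because only the angle carries information, moving along the component eventually returns to its starting point. A straight or open curve cannot provide that property.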
Principal manifolds for data …, Jan 1, 2008
Nonlinear principal component analysis (NLPCA), as a nonlinear generalisation of standard principal component analysis (PCA), generalises the principal components from straight lines to curves. This chapter aims to provide an extensive description of the autoassociative neural network approach to NLPCA. Several network architectures are discussed, including the hierarchical, the circular, and the inverse model, with special emphasis on missing data. Results are shown from applications in molecular biology, including metabolite data analysis of a cold stress experiment in the model plant Arabidopsis thaliana and gene expression analysis of the reproductive cycle of the malaria parasite Plasmodium falciparum within infected red blood cells.
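The core idea behind handling missing data with an inverse model can be sketched in a deliberately simplified linear form. The component values are treated as free parameters, and the error is measured only on observed entries. This is an assumed analogue, not the chapter's neural network. The nonlinear decoder is replaced by a linear map `W z + b`, and training uses alternating least squares instead of gradient descent. Missing values are marked as NaN, every sample and variable needs at least one observed entry, and `inverse_linear_fit` is a hypothetical name.

```python
import numpy as np


def inverse_linear_fit(X, n_components=1, n_iter=200, seed=0):
    """Fit X ~ Z @ W.T + b using only the observed (non-NaN) entries of X."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, n_components))
    b = np.nanmean(X, axis=0)
    Z = np.zeros((n, n_components))
    for _ in range(n_iter):
        # Component values are free parameters: solve for each sample's z on its observed entries.
        for i in range(n):
            m = observed[i]
            Z[i] = np.linalg.lstsq(W[m], X[i, m] - b[m], rcond=None)[0]
        # Decoder: solve for each variable's weights and offset on its observed entries.
        Z1 = np.hstack([Z, np.ones((n, 1))])
        for j in range(d):
            m = observed[:, j]
            coef = np.linalg.lstsq(Z1[m], X[m, j], rcond=None)[0]
            W[j], b[j] = coef[:-1], coef[-1]
    return Z, W, b


t = np.linspace(-1.0, 1.0, 10)
X_true = np.outer(t, [1.0, 2.0, 3.0]) + 0.5
X = X_true.copy()
X[2, 1] = np.nan
X[7, 0] = np.nan
Z, W, b = inverse_linear_fit(X)
X_hat = Z @ W.T + b   # the fitted model also fills in the missing entries
```

Since there is no encoder, a sample with missing values never has to be fed through the network. Its component value is simply whatever best explains the entries that were actually measured.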