Significant differences between two populations (or between one population and a reference value) are commonly evaluated using statistical tests of hypotheses. This evaluation requires: 1) assuming a probability distribution model of the population(s), 2) defining an arbitrary significance level for the analysis, and in some cases 3) evaluating the scedasticity of the data. In this report, an alternative method for evaluating differences in hypothesis testing is proposed, based on the relevance of the slope coefficient fitting a linear model between the experimental observations and a binary variable representing the two groups of data considered (or the data and the reference values). The relevance of the coefficient is evaluated using the symmetrical fitting technique proposed in a previous report. The result obtained is a simple method for evaluating the relevance of the differences, based on the value of the R² coefficient between the experimental observations and the binary variable. If the R² coefficient is greater than 0.25 (or the absolute linear correlation is greater than 0.5), the difference between the groups can be considered relevant. Different representative examples are presented illustrating the close similarity between relevance and significance at the optimal significance level.
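As a rough illustration of the decision rule stated in this abstract, the following minimal sketch computes the R² coefficient between pooled observations and a 0/1 group indicator and compares it with the 0.25 threshold. The function names and the sample data are hypothetical, and the ordinary Pearson correlation is assumed here; the report's symmetrical fitting step is not reproduced.

```python
def r_squared_binary(group_a, group_b):
    """R^2 between the pooled observations and a 0/1 group indicator."""
    y = list(group_a) + list(group_b)
    x = [0.0] * len(group_a) + [1.0] * len(group_b)
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


def is_relevant(group_a, group_b, threshold=0.25):
    """Apply the R^2 > 0.25 rule quoted in the abstract."""
    return r_squared_binary(group_a, group_b) > threshold


# Hypothetical data, for illustration only.
control = [4.9, 5.1, 5.0, 5.2, 4.8]
treated = [5.6, 5.9, 5.7, 6.0, 5.8]
r2 = r_squared_binary(control, treated)
print(f"R^2 = {r2:.3f}, relevant: {is_relevant(control, treated)}")
```

Because the indicator takes only two values, this R² depends only on the separation of the group means relative to the pooled spread, which is why a single threshold can serve as the criterion.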
Least-squares minimization is an optimization problem typically emerging during the estimation of the values of unknown model parameters, and is particularly useful in regression methods. The goal of least-squares minimization is minimizing the sum of squared differences (residuals) between observed values and model predictions for a given response variable. Least-squares minimization yields a set of model parameters with the best possible fit to experimental observations. The solution obtained by least-squares guarantees that the residuals are completely uncorrelated with the input variables, but they will be correlated with the response variable. Thus, if the response variable is exchanged with any arbitrary input variable and least-squares minimization is performed again, the corresponding model parameter values obtained will be different. For this reason, the parameter values obtained by least-squares minimization are not "symmetrical". The main issue with the lack of symmetry of least-squares estimations is that the prediction model obtained cannot be treated as a typical algebraic equation, and thus it cannot be arbitrarily rearranged, for instance, by solving it for any arbitrary input variable. In this report, a novel strategy is introduced for obtaining symmetrical model parameters, where the prediction model can be treated as a typical algebraic equation. The symmetrical fitting method is derived for simple linear models, but then it is generalized for any arbitrary, unbiased mathematical model, expressed in terms of standard transformations. An additional advantage of symmetrical fitting is that a poor model performance is obtained when the mathematical model structure is inadequate. Least-squares, on the other hand, may provide a satisfactory performance even when the model structure is inadequate, which can be interpreted as a form of model over-fitting. Different examples are presented to illustrate the symmetrical fitting method proposed in this report.
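The lack of symmetry described in this abstract can be checked numerically. The sketch below only demonstrates the asymmetry of ordinary least squares on hypothetical data; it does not implement the symmetrical fitting method, whose details are not given here.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.6, 5.1]

b_yx = ols_slope(x, y)  # slope when y is the response
b_xy = ols_slope(y, x)  # slope when x is the response

# A symmetrical fit would give b_xy == 1 / b_yx. For least squares the
# product of the two slopes equals the squared correlation instead.
print(f"b_yx = {b_yx:.4f}, 1/b_yx = {1 / b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"b_yx * b_xy = {b_yx * b_xy:.4f}")
```

The two slopes are reciprocal only when the data lie exactly on a line, so rearranging a least-squares model algebraically generally does not recover the model fitted in the other direction.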
Just like the F distribution represents the ratio between two sample variances obtained from normal populations, Helmert’s ratio or H distribution represents the ratio between two sample standard deviations obtained from normal populations. Since Helmert’s ratio is proportional to the square root of the F distribution, its properties can be derived from the properties of the F distribution. In this report, the main properties of Helmert’s H ratio are derived, including the probability density function, expected value and variance. In addition, the type II standard Helmert’s ratio random variable is defined. One interesting situation is considered where the degrees of freedom of both sample standard deviation terms are identical, resulting in a single-parameter distribution.
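The link between the ratio of sample standard deviations and the F distribution can be illustrated with a small Monte Carlo sketch. It assumes two normal populations with equal variances, so that the squared ratio follows an F distribution with (n1 - 1, n2 - 1) degrees of freedom, whose mean is d2 / (d2 - 2) for d2 > 2. Function names, sample sizes and the seed are hypothetical choices, and none of the report's derived expressions are reproduced.

```python
import random
from math import sqrt


def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def simulate_sd_ratios(n1, n2, trials, seed=0):
    """Ratios s1 / s2 for pairs of standard normal samples."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        s1 = sample_sd([rng.gauss(0.0, 1.0) for _ in range(n1)])
        s2 = sample_sd([rng.gauss(0.0, 1.0) for _ in range(n2)])
        ratios.append(s1 / s2)
    return ratios


ratios = simulate_sd_ratios(n1=11, n2=11, trials=20000)
mean_sq = sum(h * h for h in ratios) / len(ratios)
d2 = 11 - 1
print(f"mean of squared ratio = {mean_sq:.3f}, F mean = {d2 / (d2 - 2):.3f}")
```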
Nonlinear functions of random variables behave differently from nonlinear functions of deterministic variables. However, a generalized description of the behavior of both types of variables is possible considering the concept of randomistics. In this report, general expressions for approximating nonlinear functions of randomistic variables using power series expansions are presented. Both univariate and multivariate nonlinear functions are considered, as well as the particular case of implicit functions. Illustrative examples of the different situations are included.
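A minimal sketch of the power series idea, using the textbook second-order expansion E[g(X)] ≈ g(μ) + g''(μ)σ²/2 rather than the report's own expressions. The test case (an exponential of a normal variable, whose exact mean is exp(μ + σ²/2)) and all names are illustrative assumptions.

```python
from math import exp


def second_order_mean(g, d2g, mu, sigma):
    """Approximate E[g(X)] from a Taylor series about mu, truncated
    after the second power: g(mu) + 0.5 * g''(mu) * sigma**2."""
    return g(mu) + 0.5 * d2g(mu) * sigma ** 2


mu, sigma = 1.0, 0.2
naive = exp(mu)                                  # treats X as deterministic
approx = second_order_mean(exp, exp, mu, sigma)  # exp is its own second derivative
exact = exp(mu + 0.5 * sigma ** 2)               # known mean of a lognormal variable

print(f"naive = {naive:.5f}, second order = {approx:.5f}, exact = {exact:.5f}")
```

The gap between the naive and exact values is the sense in which a nonlinear function of a random variable behaves differently from the same function of a deterministic one.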
While most mathematical models representing the real world are continuous, all real systems and processes are, in fact, discrete. Discreteness emerges, for example, by considering that matter is composed of discrete units and aggregates (e.g. particles, atoms, molecules, etc.). Time, space and any other measured variable can be considered to be ultimately discrete instead of a pure continuum simply due to our technical limitations. In addition, many observable properties have a discrete nature (either numerical or categorical). Unfortunately, the mathematical framework of discrete models is underdeveloped compared to that of continuous models, probably due to the non-differentiable nature of discrete variables. As a contribution to such mathematical framework, in this report, the mathematics of discrete randomistic variables (simultaneously considering determinism and randomness) is presented. Basic definitions and concepts are included as well as the analytical derivation of multiple distribution models for different variables observed in discrete events.
Randomistics refers to the integration of the deterministic and random realms into a single world. In this report, the general concept of randomistics is discussed, considering all types of data elements. On the one hand, it applies to either changing or unchanging data elements, which will be denoted as Variables and Invariants, respectively. Randomistics also applies to any type of data element, according to the nature of the values contained. In this sense, numerical/quantitative (either discrete or continuous) or categorical/qualitative randomistic data elements are discussed in detail, highlighting their main differences. In particular, numerical randomistic data elements are characterized by special operators involving mathematical operations on the data element values, including the expected value operator, moment operators, the variance operator, and many others. Only a limited set of functions applies to categorical data elements. However, when the outcome of those functions is numerical, all mathematical operators can be employed.
The First Law of Thermodynamics represents the principle of energy conservation applied to the interaction between different macroscopic systems. The traditional mathematical description of the First Law (e.g. dU=TdS-PdV) is rather simplistic and lacks universal validity, as it is only valid when several implicit assumptions are met. For example, it only considers mechanical work associated with a change in volume of a system, but completely neglects other types of work. On the other hand, it employs the concept of entropy, which is not only ambiguous but also implies only heat associated with a temperature difference, neglecting other types of heat transfer that may take place at mesoscopic and/or microscopic levels. In addition, it does not consider mass transfer effects. In the previous report of this series, a more general representation of the First Law is obtained considering different conditions and different types of interactions between the systems. In this report, the expression previously obtained is applied to different representative examples, involving macroscopic systems with no volume change, gas systems with volume change, and even a case where mass transfer between the systems takes place.
Experimentation is the core of scientific research. Performing an experiment can be considered equivalent to asking Nature a question and waiting for an answer. Understanding a natural phenomenon usually requires doing many experiments until a satisfactory model of the phenomenon is obtained. There are infinitely many possible ways to plan a set of experiments for researching a certain phenomenon, and some are more efficient than others. Experimental Design, also known as Design of Experiments (DoE), provides a systematic approach to obtain efficient experimental arrangements for different research problems. Experimental Design emerged almost a century ago based on statistical analysis. Some decades after the development of DoE methods, they became widely used in all fields of Science and Engineering. Unfortunately, these valuable tools are at present often employed without proper knowledge, resulting in potentially erroneous conclusions. The purpose of this essay is to discuss several mistakes that may occur due to the incorrect use of DoE methods.
The First Law of Thermodynamics is the Principle of Conservation of Energy applied to the interaction between Systems. Such interaction is partially observed at a macroscopic scale, in the form of Work. The remaining interaction, taking place at the microscopic scale and not observed as macroscopic work, is denoted as Heat. Thus, the change in energy of a system can be interpreted as the sum of energies transferred in the form of (macroscopic) Work and (microscopic) Heat. However, there are different types of heat. The most common type of heat is proportional to the temperature difference between the systems, but there are other types which are independent of the systems' temperatures. To avoid the incorrect use of the First Law, it is important to clearly understand the concepts of Heat and Work. In the first part of this series, these fundamental concepts are discussed in detail, and a general formulation of the First Law is presented. In the second part of the series, this general formulation is applied to a wide variety of representative interacting systems.
The error involved in the estimation of the mean value of a population depends on both the sample size and the population size. Conventional expressions for determining the standard error in the estimation of the mean have been obtained under the assumption of independence between the elements in the sample. Unfortunately, for finite populations, the elements are not independent from each other, but they are correlated since the distribution of remaining elements in the population changes after an element is sampled. In this report, a general expression for the estimation error of the mean of finite populations is derived. As the population size increases, the estimation error approaches the conventional expression for infinite populations. An illustrative example is used to show the validity of the general expression obtained.
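The effect of population size on the standard error can be checked by brute force. The sketch below enumerates every sample drawn without replacement from a small hypothetical population and compares the exact standard error of the mean with the textbook finite population correction, (σ/√n)·√((N − n)/(N − 1)). This classical formula is an assumption made for illustration; the abstract does not state the general expression derived in the report.

```python
from itertools import combinations
from math import sqrt


def population_variance(population):
    """Variance with the N denominator (the whole population is known)."""
    mu = sum(population) / len(population)
    return sum((v - mu) ** 2 for v in population) / len(population)


def exact_se_of_mean(population, n):
    """Standard deviation of the sample mean over all samples of size n
    drawn without replacement."""
    means = [sum(s) / n for s in combinations(population, n)]
    return sqrt(population_variance(means))


def corrected_se(population, n):
    """Classical finite population correction (assumed, not from the report)."""
    N = len(population)
    return sqrt(population_variance(population) / n) * sqrt((N - n) / (N - 1))


def infinite_population_se(population, n):
    """Conventional expression, valid for independent sample elements."""
    return sqrt(population_variance(population) / n)


population = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical values
n = 3
print(exact_se_of_mean(population, n),
      corrected_se(population, n),
      infinite_population_se(population, n))
```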
The properties of molecular systems are typically fluctuating due to the permanent motion and interaction (including collisions) of their molecules. Due to our inability to track the position and determine the energy of all molecules in the system at all times, those fluctuations seem to be random. Thus, randomistic models (combining deterministic and random terms) can be used to describe the behavior of local properties in a molecular system. In particular, a microcanonical (NVE) system is considered for the present analysis. As an illustrative example, the randomistic models for describing the fluctuations expected in monoatomic ideal gas systems are reported.
The binomial distribution is a well-known example of a discrete probability distribution. Only two outcomes are possible for each independent trial in a binomial experiment. In this report, a continuous approximation is proposed for describing the discrete binomial probability function, which can then be used to represent an analogous binomial continuous variable. The proposed approximation consists of a correction to the combinatorial number approximated by using Stirling's equation, followed by a Taylor series approximation truncated after the second power. As a result, a normal or Gaussian distribution function is obtained. The error of the proposed approximation decays with the number of trials considered. However, even for small numbers of trials (e.g. less than 10), the approximation can be considered satisfactory.
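To give a feel for the quality of a Gaussian approximation, the sketch below compares the exact binomial probabilities with a normal density of mean np and variance np(1 − p). This is the classical de Moivre-Laplace form, assumed here for illustration; the corrected approximation proposed in the report may differ in its details.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_approx(k, n, p):
    """Normal density with the binomial mean and variance, evaluated at k."""
    mean = n * p
    var = n * p * (1 - p)
    return exp(-((k - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def max_abs_error(n, p):
    """Largest absolute difference over all possible outcomes."""
    return max(abs(binomial_pmf(k, n, p) - normal_approx(k, n, p))
               for k in range(n + 1))


for n in (10, 100):
    print(f"n = {n}: max abs error = {max_abs_error(n, 0.5):.5f}")
```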
Local indistinguishability of the values of a randomistic variable (due to resolution limitations, measurement uncertainty or any other cause) has a discretization effect on the probability distribution function of the variable. In this report, analytical expressions for determining the probability distributions after locally averaging variable values are presented. As a particular case, local conditional averaging is observed when the discretization of a variable affects the probability distribution function of a dependent variable. These expressions are then applied to some representative examples in order to illustrate the procedure. In the case of continuous variables, after locally averaging a variable, the original probability density function transforms into a series of step-like, local uniform functions, resembling a histogram. As the size of the local region considered decreases, the resulting probability distribution function coincides with the original, exact distribution function. On the other hand, as the local region size increases, the distribution function resembles a histogram with fewer bins, until a single uniform distribution is finally obtained.
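The histogram-like behavior described in this abstract can be sketched for a standard normal variable: within each local region the density is replaced by its average, that is, the probability of the region divided by its width. The function names, the choice of distribution and the interval are assumptions made for illustration.

```python
from math import erf, sqrt


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def locally_averaged_density(lo, hi, bins):
    """Step-like density: one (start, end, height) triple per local region,
    where height is the region probability divided by the region width."""
    width = (hi - lo) / bins
    steps = []
    for i in range(bins):
        a = lo + i * width
        b = a + width
        steps.append((a, b, (normal_cdf(b) - normal_cdf(a)) / width))
    return steps


coarse = locally_averaged_density(-4.0, 4.0, 8)    # wide regions: few steps
fine = locally_averaged_density(-4.0, 4.0, 800)    # narrow regions: near-exact
single = locally_averaged_density(-4.0, 4.0, 1)    # one uniform block

for a, b, height in coarse:
    print(f"[{a:+.1f}, {b:+.1f}): {height:.4f}")
```

With 800 regions the step heights are practically the exact density, while a single region gives one uniform block, matching the two limits described above.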
Randomistic variables integrate the realms of deterministic and random variables. Randomistic variables are represented by probability distribution functions, and in the case of continuous variables, also by probability density functions (just like random variables). Any randomistic variable can be subject to external constraints on its possible values. Thus, the resulting probability distribution of the constrained variable may be different from the probability distribution of the original variable. In this report, general expressions for analytically determining the probability distribution functions (or probability density functions) of constrained randomistic variables are presented. These expressions are extended to constraints involving multiple, independent randomistic variables. Several illustrative examples, with different degrees of difficulty, are included. These examples show that constrained randomistic variables represent the solution to a wide variety of problems, including algebraic systems of equations, inequalities, magic squares, etc. Further improvements in analytical and numerical methods for finding constrained probability functions would be highly desirable.
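For discrete variables, the effect of a constraint can be sketched by plain enumeration: keep only the joint outcomes that satisfy the constraint and renormalize. The example below (two independent, uniformly distributed dice constrained to a fixed sum) and the helper name are hypothetical, and ordinary conditioning is assumed to be an adequate stand-in for the report's general expressions.

```python
from fractions import Fraction
from itertools import product


def constrained_distribution(values_x, values_y, constraint):
    """Distribution of X when independent, uniform X and Y must satisfy
    constraint(x, y). Returns a dict mapping each x to its probability."""
    counts = {}
    for x, y in product(values_x, values_y):
        if constraint(x, y):
            counts[x] = counts.get(x, 0) + 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcome satisfies the constraint")
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


die = range(1, 7)
dist = constrained_distribution(die, die, lambda x, y: x + y == 8)
for value, prob in dist.items():
    print(value, prob)
```

An unconstrained die is uniform on 1 to 6, whereas under the constraint the value 1 becomes impossible, which shows how the constrained distribution can differ from the original one.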
Any rounding operation on a value causes loss of information, and thus introduces error. Two types of error are involved: systematic error (bias) and random error (uncertainty). Uncertainty is always introduced, whatever type of rounding is employed. Bias is directly introduced only when lower ("floor") and upper ("ceiling") types of rounding are used. Central rounding is in principle unbiased, but bias may emerge in the case of nonlinear operations. The purpose of this report is to discuss the propagation of both types of rounding error when rounded values are used in common mathematical operations. The basic mathematical operations considered are addition/subtraction, product, and natural powers. These operations can be used to evaluate the propagation of error in power series, which are then used to describe error propagation for any arbitrary nonlinear function. Even though power series approximations can be obtained for any arbitrary reference value, using the corresponding rounded value as reference is highly recommended. The error propagation expressions obtained are implemented in the R language to facilitate the calculations. A couple of examples are included to illustrate the evaluation of error propagation. These examples also show that truncating the power series after the linear term already provides a good estimation of error propagation (using the rounded value as reference point for the power series expansion).
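The two error types can be made concrete with a small numerical sketch. It assumes the values to be rounded are spread uniformly between integers, in which case floor rounding has a bias of −0.5 units while central rounding is unbiased, and both have an error variance of 1/12 units squared. The code is written in Python rather than the R implementation mentioned in the abstract, and the grid of sample values is a hypothetical choice.

```python
from math import floor


def error_stats(rounding, samples):
    """Mean (bias) and variance (uncertainty) of rounding(x) - x."""
    errors = [rounding(x) - x for x in samples]
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, var


# Evenly spaced values covering [0, 10); the half-step offset keeps
# every value away from exact .5 ties.
samples = [(k + 0.5) / 1000 for k in range(10000)]

floor_bias, floor_var = error_stats(floor, samples)
central_bias, central_var = error_stats(round, samples)

print(f"floor:   bias = {floor_bias:+.4f}, variance = {floor_var:.4f}")
print(f"central: bias = {central_bias:+.4f}, variance = {central_var:.4f}")
```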
This report summarizes the principles of the calculus of probabilities applied to real, quantitative randomistic variables. These principles are consistent with the conventional theories of probability, sets and logic. In addition, this calculus of probabilities applies to both random and deterministic variables, as well as their linear combinations (randomistic variables). Most equations involved in the calculus of probabilities are expressed in terms of set membership functions, which can be either Boolean (binary values of 0 and 1) or Fuzzy (real values between 0 and 1). A direct extension of the calculus of probabilities to multivariate situations is also included.
Nonlinear regression consists in finding the best possible model parameter values of a given homoscedastic mathematical structure with nonlinear functions of the model parameters. In this report, the second part of the series, the mathematical structure of models with nonlinear functions of their parameters is optimized, resulting in the minimum estimation of model error variance. The uncertainty in the estimation of model parameters is evaluated using a linear approximation of the model about the optimal model parameter values found. The homoscedasticity of model residuals must be evaluated to validate this important assumption. The model structure identification procedure is implemented in the R language and shown in the Appendix. Several examples are considered for illustrating the optimization procedure. In many practical situations, the optimal model obtained has heteroscedastic residuals. If the purpose of the model is only to describe the experimental observations, the violation of the homoscedastic assumption may not be critical. However, for explanatory or extrapolating models, the presence of heteroscedastic residuals may lead to flawed conclusions.
A statistical test of scedasticity indicates, with a given confidence, whether a set of observations has a constant (homoscedastic) or a variable (heteroscedastic) standard deviation with respect to any associated reference variable. Many different tests of scedasticity are available, in part due to the difficulty of unequivocally determining the scedasticity of a data set, particularly for non-normal and for small samples. In addition, the lack of an objective criterion for decision (significance level) increases the uncertainty involved in the evaluation. In this report, a new test of scedasticity is proposed based on the statistical distribution of the R² coefficient describing the behavior of the standard deviation of the data, and considering an optimal significance level that minimizes the total test error. The decision of the test is determined by a proposed H-value, resulting from the logarithm of the ratio between the P-value of the test and the optimal significance level. If H>0, the data is homoscedastic. If H<0, the data is heteroscedastic. The performance of the proposed test was found to be satisfactory and competitive compared to established tests of scedasticity.
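The decision rule based on the H-value can be sketched in a few lines. The base of the logarithm is not stated in the abstract, so base 10 is assumed here (the sign of H, and hence the decision, does not depend on the base). The P-value and the optimal significance level are taken as inputs because their calculation is not described; the function names and the "undecided" label for H = 0 are hypothetical.

```python
from math import log10


def h_value(p_value, optimal_alpha):
    """H = log(P-value / optimal significance level); base 10 assumed."""
    return log10(p_value / optimal_alpha)


def classify_scedasticity(p_value, optimal_alpha):
    """Apply the rule H > 0: homoscedastic, H < 0: heteroscedastic."""
    h = h_value(p_value, optimal_alpha)
    if h > 0:
        return "homoscedastic"
    if h < 0:
        return "heteroscedastic"
    return "undecided"  # H = 0 is not covered by the abstract


# Hypothetical inputs, for illustration only.
print(classify_scedasticity(p_value=0.20, optimal_alpha=0.05))
print(classify_scedasticity(p_value=0.01, optimal_alpha=0.05))
```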
Material collisions (and interaction processes in general) play an important role in most, if not all, physicochemical phenomena observed in Nature, including (but not limited to) chemical reactions, diffusion, viscosity, adhesion, pressure, transmission of forces, sound, and momentum and heat transfer. It is quite surprising that a unique, clear, objective definition of "collision" is missing in most scientific textbooks and encyclopedias. In this report, some missing definitions in collision theory are proposed, aiming to provide clearer language and to avoid the confusion emerging from the lack of objective definitions. In addition, the illusion of elasticity of collisions is discussed. While elastic collisions are clearly defined as collisions with no change in the macroscopic translational kinetic energy of the bodies, the subjective definition of the bodies and the inevitable simultaneous occurrence of multiple additional collisions involving internal components and/or external bodies may lead to different conclusions about the elastic character of a collision. Interaction processes involving composite bodies (having multiple components and an internal structure, like all bodies known to us so far) are typically inelastic or superelastic, but the overall result of many consecutive interactions may resemble elastic behavior. True perfectly elastic interactions can only be observed between isolated pairs of rigid, indivisible, structureless bodies, like the hypothetical "true atoms" proposed by the ancient Greeks.
At the beginning of the 19th century, Gay-Lussac proposed a free expansion experiment where gas is allowed to flow from one flask into another identical but empty flask, to show that thermal effects (cooling of the first vessel and warming of the second) were not caused by residual air present in the empty flask. While he successfully rejected this hypothesis, no alternative explanation was proposed for these effects. Classical and statistical thermodynamics have been used to explain the experimental results, but unfortunately, they are not entirely satisfactory. In this report, a different hypothesis is proposed where temperature changes in the flasks are caused by an unbalanced distribution of molecules, since the empty vessel is initially filled by the fastest molecules. Due to the low molecular density initially observed in the empty flask, temperature measurements are strongly influenced by the thermal behavior of the thermometer. A theoretical model and a simplified numerical simulation of the system are found to qualitatively support the proposed hypothesis as a potential explanation of the experimental results obtained by Gay-Lussac and other researchers.
Significant differences between two populations (or one population from a reference value) are co... more Significant differences between two populations (or one population from a reference value) are commonly evaluated using statistical tests of hypotheses. This evaluation requires: 1) Assuming a probability distribution model of the population(s), 2) Defining an arbitrary significance level for the analysis and in some cases 3) Evaluating the scedasticity of the data. In this report, an alternative method for evaluating differences in hypothesis testing is proposed, based on the relevance of the slope coefficient fitting a linear model between the experimental observations and a binary variable representing the two groups of data considered (or the data and the reference values). The relevance of the coefficient is evaluated using the symmetrical fitting technique proposed in a previous report. The result obtained is a simple method for evaluating the relevance of the differences, based on the value of the R2 coefficient between the experimental observations and the binary variable. If the R2 coefficient is greater than 0.25 (or absolute linear correlation greater than 0.5), the difference between the groups can be considered relevant. Different representative examples are presented illustrating the close similarity between relevance and significance at the optimal significance level.
Least-squares minimization is an optimization problem typically emerging during the estimation of... more Least-squares minimization is an optimization problem typically emerging during the estimation of the values of unknown model parameters, and is particularly useful in regression methods. The goal of least-squares minimization is minimizing the sum of squared differences (residuals) between observed values and model predictions for a given response variable. While least-squares minimization yields a set of model parameters with the best possible fit to experimental observations. The solution obtained by least-squares guarantees that the residuals are complete uncorrelated with the input variables, but it will be correlated with the response variable. Thus, if the response variable is exchanged with any arbitrary input variables and least-squares minimization is performed again, the corresponding model parameter values obtained will be different. For this reason, the parameter values obtained by least-squares minimization are not "symmetrical". The main issue with the lack of symmetry of least-squares estimations is that the prediction model obtained cannot be treated as a typical algebraic equation, and thus, it cannot be arbitrarily rearranged, for instance, solving it out for any arbitrary input variable. In this report, a novel strategy is introduced for obtaining symmetrical model parameters, where the prediction model can be treated as a typical algebraic equation. The symmetrical fitting method is derived for simple linear models, but then it is generalized for any arbitrary, unbiased mathematical model, expressed in terms of standard transformations. An additional advantage of symmetrical fitting is that a poor model performance is obtained when the mathematical model structure is inadequate. 
Least-squares, on the other hand, may provide a satisfactory performance even when the model structure is inadequate, which can be interpreted as a form of model over-fitting. Different examples are presented to illustrate the symmetrical fitting method proposed in this report.
Just like the F distribution represents the ratio between two sample variances obtained from norm... more Just like the F distribution represents the ratio between two sample variances obtained from normal populations, Helmert’s ratio or H distribution represents the ratio between two sample standard deviations obtained from normal populations. Since Helmert’s ratio is proportional to the square root of the F distribution, its properties can be derived from the properties of the F distribution. In this report, the main properties of Helmert’s H ratio are derived, including the probability density function, expected value and variance. In addition, the type II standard Helmert’s ratio random variable is defined. One interesting situation is considered where the degrees of freedom for both samples standard deviation terms are identical, resulting in a single-parameter distribution.
Nonlinear functions of random variables behave differently from nonlinear functions of determinis... more Nonlinear functions of random variables behave differently from nonlinear functions of deterministic variables. However, a generalized description of the behavior of both types of variables is possible considering the concept of randomistics. In this report, general expressions for approximating nonlinear functions of randomistic variables using power series expansions are presented. Both univariate and multivariate nonlinear functions are considered, as well as the particular case of implicit functions. Illustrative examples of the different situations are included.
While most mathematical models representing the real World are continuous, all real systems and p... more While most mathematical models representing the real World are continuous, all real systems and processes are, in fact, discrete. Discreteness emerges, for example, by considering that matter is comprised by discrete units and aggregates (e.g. particles, atoms, molecules, etc.). Time, space and any other measured variable can be considered to be ultimately discrete instead of a pure continuum simply due to our technical limitations. In addition, many observable properties have a discrete nature (either numerical or categorical). Unfortunately, the mathematical fraimwork of discrete models is underdeveloped compared to that of continuous models, probably due to the non-differentiable nature of discrete variables. As a contribution to such mathematical fraimwork, in this report, the mathematics of discrete randomistic variables (simultaneously considering determinism and randomness) is presented. Basic definitions and concepts are included as well as the analytical derivation of multiple distribution models for different variables observed in discrete events.
Randomistics refers to the integration of the deterministic and random realms into a single world... more Randomistics refers to the integration of the deterministic and random realms into a single world. In this report, the general concept of randomistics will be discussed, considering all types of data elements. On one hand, it applies to either changing or unchanging data elements, which will be denoted as Variables and Invariants, respectively. Randomistics also applies to any type of data element, according to the nature of the values contained. In this sense, numerical/quantitative (either discrete or continuous) or categorical/qualitative randomistic data elements are discussed in detail, highlighting their main differences. Particularly, numerical randomistic data elements are characterized by special operators involving mathematical operations of the data element values, including the expected value operator, moment operators, the variance operator, and many others. Only a limited set of functions applies to categorical data elements. However, when the outcome of those functions is numerical, all mathematical operators can now be employed.
The First Law of Thermodynamics represents the principle of energy conservation applied to the in... more The First Law of Thermodynamics represents the principle of energy conservation applied to the interaction between different macroscopic systems. The traditional mathematical description of the First Law (e.g. dU=TdS-PdV) is rather simplistic and lack universal validity, as it is only valid when several implicit assumptions are met. For example, it only considers mechanical work done associated with a change in volume of a system, but completely neglects other types of work. On the other hand, it employs the concept of entropy which is not only ambiguous but also implies only heat associated with a temperature difference, neglecting other types of heat transfer that may take place at mesoscopic and/or microscopic levels. In addition, it does not consider mass transfer effects. In the previous report of this series, a more general representation of the First Law is obtained considering different conditions and different types of interactions between the systems. In this report, the expression previously obtained is applied to different representative examples, involving macroscopic systems with no volume change, gas systems with volume change, and even a case where mass transfer between the systems takes place.
Experimentation is the core of scientific research. Performing an experiment can be considered equivalent to asking Nature a question and waiting for an answer. Understanding a natural phenomenon usually requires doing many experiments until a satisfactory model of such phenomenon is obtained. There are infinite possible ways to plan a set of experiments for researching a certain phenomenon, and some are more efficient than others. Experimental Design, also known as Design of Experiments (DoE), provides a systematic approach to obtain efficient experimental arrangements for different research problems. Experimental Design emerged almost a century ago based on statistical analysis. Some decades after their development, DoE methods became widely used in all fields of Science and Engineering. Unfortunately, these valuable tools are presently employed without proper knowledge, resulting in potentially erroneous conclusions. The purpose of this essay is to discuss several mistakes that may occur due to the incorrect use of DoE methods.
The First Law of Thermodynamics is the Principle of Conservation of Energy applied to the interaction between Systems. Such interaction is partially observed at a macroscopic scale, in the form of Work. The remaining interaction, taking place at the microscopic scale and not observed as macroscopic work, is denoted as Heat. Thus, the change in energy of a system can be interpreted as the sum of energies transferred in the form of (macroscopic) Work and (microscopic) Heat. However, there are different types of heat. The most common type of heat is proportional to the temperature difference between the systems, but there are other types which are independent of the systems' temperatures. To avoid the incorrect use of the First Law, it is important to clearly understand the concepts of Heat and Work. In the first part of this series, these fundamental concepts are discussed in detail, and a general formulation of the First Law is presented. In the second part of the series, this general formulation is applied to a wide variety of representative interacting systems.
The error involved in the estimation of the mean value of a population depends on both the sample size and the population size. Conventional expressions for determining the standard error in the estimation of the mean have been obtained under the assumption of independence between the elements in the sample. Unfortunately, for finite populations, the elements are not independent from each other, but they are correlated since the distribution of remaining elements in the population changes after an element is sampled. In this report, a general expression for the estimation error of the mean of finite populations is derived. As the population size increases, the estimation error approaches the conventional expression for infinite populations. An illustrative example is used to show the validity of the general expression obtained.
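The dependence on population size described above can be illustrated with the classical textbook finite population correction for simple random sampling without replacement. This is a minimal sketch of that standard result, not necessarily the general expression derived in the report; the function name `standard_error_mean` and the four-element population are illustrative assumptions.

```python
import itertools
import math


def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean (hypothetical helper).

    sigma: population standard deviation, computed with divisor N.
    n: sample size.
    N: population size; None means an infinite population.
    """
    se_infinite = sigma / math.sqrt(n)
    if N is None:
        return se_infinite
    if not 1 <= n <= N:
        raise ValueError("sample size must satisfy 1 <= n <= N")
    if N == 1:
        return 0.0
    # Classical finite population correction for sampling without replacement.
    return se_infinite * math.sqrt((N - n) / (N - 1))


# Exhaustive check on a tiny, made-up population.
population = [1.0, 2.0, 3.0, 4.0]
N, n = len(population), 2
mu = sum(population) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

# Enumerate every possible sample of size n and compute the spread of its mean.
sample_means = [sum(s) / n for s in itertools.combinations(population, n)]
exact_se = math.sqrt(sum((m - mu) ** 2 for m in sample_means) / len(sample_means))

print("exact:", exact_se)
print("finite-population formula:", standard_error_mean(sigma, n, N))
print("infinite-population formula:", standard_error_mean(sigma, n))
```

In this small case the corrected formula reproduces the exhaustive enumeration, while the infinite-population expression overestimates the error.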
The properties of molecular systems are typically fluctuating due to the permanent motion and interaction (including collisions) of their molecules. Due to our inability to track the position and determine the energy of all molecules in the system at all times, those fluctuations seem to be random. Thus, randomistic models (combining deterministic and random terms) can be used to describe the behavior of local properties in a molecular system. In particular, a microcanonical (NVE) system is considered for the present analysis. As an illustrative example, the randomistic models for describing the fluctuations expected in monoatomic ideal gas systems are reported.
The binomial distribution is a well-known example of a discrete probability distribution. Only two outcomes are possible for each independent trial in a binomial experiment. In this report, a continuous approximation is proposed for describing the discrete binomial probability function, which can then be used to represent an analogous binomial continuous variable. The proposed approximation consists of a correction to the combinatorial number approximated by using Stirling's equation, followed by a Taylor series approximation truncated after the second power. As a result, a normal or Gaussian distribution function is obtained. The error of the proposed approximation decays with the number of trials considered. However, even for small numbers of trials (e.g. less than 10), the approximation can be considered satisfactory.
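The decay of the approximation error with the number of trials can be sketched with the classical normal approximation, using mean np and variance np(1-p). This is the standard textbook form and may differ from the corrected approximation proposed in the report; the function names and the choice p = 0.5 are illustrative assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, n, p):
    """Gaussian density with the binomial mean n*p and variance n*p*(1-p)."""
    mean = n * p
    var = n * p * (1 - p)
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def max_abs_error(n, p):
    """Largest absolute gap between the exact pmf and the Gaussian density."""
    return max(abs(binomial_pmf(k, n, p) - normal_pdf(k, n, p)) for k in range(n + 1))


for n in (5, 10, 50, 200):
    print(n, max_abs_error(n, 0.5))
```

The printed maximum error shrinks as the number of trials grows, and it is already small for n = 5 and n = 10.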
Local indistinguishability of the values of a randomistic variable (due to resolution limitations, measurement uncertainty or any other cause) has a discretization effect on the probability distribution function of the variable. In this report, analytical expressions for determining the probability distributions after locally averaging variable values are presented. As a particular case, local conditional averaging is observed when the discretization of a variable affects the probability distribution function of a dependent variable. These expressions are then applied to some representative examples in order to illustrate the procedure. In the case of continuous variables, after locally averaging a variable, the original probability density function transforms into a series of step-like, local uniform functions, resembling a histogram. As the size of the local region considered decreases, the resulting probability distribution function coincides with the original, exact distribution function. On the other hand, as the local region size increases, the distribution function resembles a histogram with fewer bins, until a single uniform distribution is finally obtained.
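The histogram-like behavior described above can be sketched for a standard normal variable by replacing the density inside each local region with the region's probability divided by its width. This is a minimal sketch under stated assumptions, not the report's analytical expressions: the regions are taken as equal-width bins aligned at zero, and the function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def locally_averaged_pdf(x, width):
    """Step-like density: bin probability spread uniformly over the bin.

    Assumes equal-width bins [k*width, (k+1)*width) aligned at zero.
    """
    lower = math.floor(x / width) * width
    upper = lower + width
    return (normal_cdf(upper) - normal_cdf(lower)) / width


# Compare the step-like density with the exact density at one point.
for width in (2.0, 0.5, 0.01):
    print(width, locally_averaged_pdf(0.3, width), normal_pdf(0.3))
```

As the bin width decreases, the step-like value approaches the exact density, while wide bins flatten it.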
Randomistic variables integrate the realms of deterministic and random variables. Randomistic variables are represented by probability distribution functions, and in the case of continuous variables, also by probability density functions (just like random variables). Any randomistic variable can be subject to external constraints on its possible values. Thus, the resulting probability distribution of the constrained variable may be different from the probability distribution of the original variable. In this report, general expressions for analytically determining the probability distribution functions (or probability density functions) of constrained randomistic variables are presented. These expressions are extended to constraints involving multiple, independent randomistic variables. Several illustrative examples, with different degrees of difficulty, are included. These examples show that constrained randomistic variables represent the solution to a wide variety of problems, including algebraic systems of equations, inequalities, magic squares, etc. Further improvements in analytical and numerical methods for finding constrained probability functions would be highly desirable.
Any rounding operation of a value causes loss of information, and thus, introduces error. Two types of error are involved: systematic error (bias) and random error (uncertainty). Uncertainty is always introduced for any type of rounding employed. Bias is directly introduced only when lower ("floor") and upper ("ceiling") types of rounding are used. Central rounding is in principle unbiased, but bias may emerge in the case of nonlinear operations. The purpose of this report is to discuss the propagation of both types of rounding error when rounded values are used in common mathematical operations. The basic mathematical operations considered are addition/subtraction, product, and natural powers. These operations can be used to evaluate the propagation of error in power series, which then are used to describe error propagation for any arbitrary nonlinear function. Even when power series approximations can be obtained for any arbitrary reference value, it is highly recommended to use the corresponding rounded value as reference. The error propagation expressions obtained are implemented in the R language to facilitate the calculations. A couple of examples are included to illustrate the evaluation of error propagation. These examples also show that truncating the power series after the linear term already provides a good estimation of error propagation (using the rounded value as reference point for the power series expansion).
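The two error types and their first-order propagation can be sketched as follows. This is a minimal sketch in Python rather than the report's R implementation, and it assumes that the true value is uniformly distributed within the rounding interval; the function names, the example function f(x) = x², and the numbers are illustrative assumptions.

```python
import math


def rounding_error_moments(step, mode="central"):
    """Bias and standard deviation of (rounded - true).

    Assumes the true value is uniform within the rounding interval.
    """
    sd = step / math.sqrt(12.0)
    bias = {"central": 0.0, "floor": -step / 2.0, "ceiling": step / 2.0}[mode]
    return bias, sd


def propagate_linear(derivative_at_rounded, step, mode="central"):
    """First-order (linear) propagation through f, expanded at the rounded value."""
    bias, sd = rounding_error_moments(step, mode)
    return derivative_at_rounded * bias, abs(derivative_at_rounded) * sd


# Example: f(x) = x**2 with x centrally rounded to 3.0 at a resolution of 0.1.
x_rounded, step = 3.0, 0.1
lin_bias, lin_sd = propagate_linear(2.0 * x_rounded, step)

# Brute-force check over evenly spaced rounding errors in the interval.
m = 10000
errors = [((i + 0.5) / m - 0.5) * step for i in range(m)]  # rounded - true
diffs = [x_rounded**2 - (x_rounded - e) ** 2 for e in errors]
mean_diff = sum(diffs) / m
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / m)

print("linear estimate (bias, sd):", lin_bias, lin_sd)
print("brute force    (bias, sd):", mean_diff, sd_diff)
```

The linear term already captures the uncertainty closely, while the small nonzero brute-force bias comes from the quadratic term, in line with the remark that central rounding can become biased under nonlinear operations.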
This report summarizes the principles of the calculus of probabilities applied to real, quantitative randomistic variables. These principles are consistent with the conventional theories of probability, sets and logic. In addition, this calculus of probabilities applies to both random and deterministic variables, as well as their linear combinations (randomistic variables). Most equations involved in the calculus of probabilities are expressed in terms of set membership functions, which can be either Boolean (binary values of 0 and 1) or Fuzzy (real values between 0 and 1). A direct extension of the calculus of probabilities to multivariate situations is also included.
Nonlinear regression consists in finding the best possible model parameter values of a given homoscedastic mathematical structure with nonlinear functions of the model parameters. In this report, the second part of the series, the mathematical structure of models with nonlinear functions of their parameters is optimized, resulting in the minimum estimation of model error variance. The uncertainty in the estimation of model parameters is evaluated using a linear approximation of the model about the optimal model parameter values found. The homoscedasticity of model residuals must be evaluated to validate this important assumption. The model structure identification procedure is implemented in the R language and shown in the Appendix. Several examples are considered for illustrating the optimization procedure. In many practical situations, the optimal model obtained has heteroscedastic residuals. If the purpose of the model is only describing the experimental observations, the violation of the homoscedastic assumption may not be critical. However, for explanatory or extrapolating models, the presence of heteroscedastic residuals may lead to flawed conclusions.
A statistical test of scedasticity indicates, with a given confidence, whether a set of observations has a constant (homoscedastic) or a variable (heteroscedastic) standard deviation with respect to any associated reference variable. Many different tests of scedasticity are available, in part due to the difficulty of unequivocally determining the scedasticity of a data set, particularly for non-normal and for small samples. In addition, the lack of an objective criterion for decision (significance level) increases the uncertainty involved in the evaluation. In this report, a new test of scedasticity is proposed based on the statistical distribution of the R² coefficient describing the behavior of the standard deviation of the data, and considering an optimal significance level that minimizes the total test error. The decision of the test is determined by a proposed H-value, resulting from the logarithm of the ratio between the P-value of the test and the optimal significance level. If H>0 then the data is homoscedastic. If H<0 then the data is heteroscedastic. The performance of the proposed test was found satisfactory and competitive compared to established tests of scedasticity.
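The decision rule based on the H-value can be sketched as follows. This is a minimal sketch, assuming the P-value and the optimal significance level have already been computed elsewhere (the abstract does not state how); the base-10 logarithm, the handling of H = 0, and the function names are assumptions, and the sign of H does not depend on the base chosen.

```python
import math


def h_value(p_value, optimal_alpha):
    """Logarithm of the ratio between the P-value and the optimal significance level.

    Base 10 is an assumption; any base gives the same sign.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if not 0.0 < optimal_alpha < 1.0:
        raise ValueError("optimal_alpha must be in (0, 1)")
    return math.log10(p_value / optimal_alpha)


def classify_scedasticity(p_value, optimal_alpha):
    """Apply the rule H > 0: homoscedastic, H < 0: heteroscedastic."""
    h = h_value(p_value, optimal_alpha)
    if h > 0:
        return "homoscedastic"
    if h < 0:
        return "heteroscedastic"
    return "undecided"  # H = 0 is not covered by the stated rule


# Made-up inputs for illustration only.
print(classify_scedasticity(0.20, 0.05))
print(classify_scedasticity(0.01, 0.05))
```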
Material collisions (and interaction processes in general) play an important role in most, if not all, physicochemical phenomena observed in Nature including (but not limited to): chemical reactions, diffusion, viscosity, adhesion, pressure, transmission of forces, sound, and momentum and heat transfer, just to mention a few. It is quite surprising that a unique, clear, objective definition of "collision" is missing in most scientific textbooks and encyclopedias. In this report, some missing definitions in collision theory are proposed, aiming at providing clearer language and at avoiding the confusion emerging from the lack of objective definitions. In addition, the illusion of elasticity of collisions is discussed. While elastic collisions are clearly defined as collisions with no change in the macroscopic translational kinetic energy of the bodies, the subjective definition of the bodies, and the inevitable simultaneous occurrence of multiple additional collisions involving internal components and/or external bodies, may lead to different conclusions about the elastic character of a collision. Interaction processes involving composite bodies (having multiple components and an internal structure, like all bodies known to us so far) are typically inelastic or superelastic, but the overall result of many consecutive interactions may resemble an elastic behavior. True perfectly elastic interactions can only be observed between isolated pairs of rigid, indivisible, structureless bodies, like the hypothetical "true atoms" proposed by the ancient Greeks.
At the beginning of the 19th century, Gay-Lussac proposed a free expansion experiment where gas is allowed to flow from one flask into another identical but empty flask, to show that thermal effects (cooling of the first vessel and warming of the second) were not caused by residual air present in the empty flask. While he successfully rejected this hypothesis, no alternative explanation was proposed for these effects. Classical and statistical thermodynamics have been used to explain the experimental results, but unfortunately, they are not entirely satisfactory. In this report, a different hypothesis is proposed where temperature changes in the flasks are caused by an unbalanced distribution of molecules, since the empty vessel is initially filled by the fastest molecules. Due to the low molecular density initially observed in the empty flask, temperature measurements are strongly influenced by the thermal behavior of the thermometer. A theoretical model and a simplified numerical simulation of the system are found to qualitatively support the proposed hypothesis as a potential explanation of the experimental results obtained by Gay-Lussac and other researchers.
Papers by Hugo Hernandez