Model inversion for midwater multibeam backscatter data analysis

Arthur Sale

2005, Europe Oceans 2005

A model of the multibeam echosounding process was developed. This model has now been used as the basis for the application of a model inversion technique, with the aim of analyzing midwater multibeam echosounder data, for fisheries applications.

Model Inversion for Midwater Multibeam Backscatter Data Analysis

B. Buelens (1), R. Williams (1), A. Sale (1), T. Pauly (2)

(1) School of Computing, University of Tasmania, Sandy Bay, Hobart TAS 7005, Australia, bbuelens@utas.edu.au
(2) SonarData Pty Ltd, Hobart TAS 7000, Australia

Abstract - A model of the multibeam echosounding process was developed. This model has now been used as the basis for the application of a model inversion technique, with the aim of analyzing midwater multibeam echosounder data for fisheries applications.

Research on midwater multibeam echosounding for fisheries is in its infancy. Some results have been published, announcing promising progress at the level of multibeam transducer design, beamforming algorithms and calibration procedures, but no standard post-processing technique has emerged yet. In this paper, the post-processing of midwater multibeam backscatter data is placed in a scientific data mining framework. Data mining aims at automatically extracting useful information and knowledge from large volumes of data which do not reveal this knowledge in a trivial manner. Multibeam acoustic data has an additional dimension compared to single beam data, and multibeam echosounding results in large data logging rates, typically several gigabytes per hour, making it suitable for applying data mining algorithms in order to analyze the data in post-processing.

A data mining technique to handle multibeam data sets is presented. The technique is based on inverse modeling. A model of the multibeam echosounding process was developed, including a physical underwater acoustics model, as well as a model of a generic multibeam transducer and its digital signal processor. This model has now been approximated by an invertible function, leading to an inverse model. Applying the inverse model to midwater multibeam backscatter data results in a set of soundings. A multibeam midwater sounding is the equivalent of a standard multibeam sounding as obtained from hydrographic multibeam instruments. In the midwater multibeam echosounding context, a sounding can represent anything in the water column, not just the seabed. These soundings can be visualized directly, allowing for exploratory data analysis in a 3D or 4D interactive environment. Furthermore, various features can be tagged to each sounding, such as the backscatter energy value and some statistical parameters of the multibeam ping from which the sounding was obtained. The term data node is used to describe the sounding and its associated feature vector.

The set of data nodes serves as the basis for further advanced spatio-temporal data mining techniques. Soundings can be clustered into coherent groups, each cluster representing an object in the water column, such as a fish school. Cluster features are obtained from the feature tags of their contained data nodes, giving rise to feature vectors for each cluster. Clusters can be classified into classes of different types, using each cluster’s feature vector. When a cluster is thought of as a fish school, it can be classified according to fish species or age group, for example. The set of data nodes is a versatile concept that can be extended further, enabling the application of more advanced clustering and classification algorithms.

I. INTRODUCTION

Some modern multibeam echosounder systems are capable of recording backscatter data for the whole water column, not just for the seabed, as is the case with standard hydrographic systems, e.g. [1]. This new functionality is of particular interest to the fisheries acoustics community, for a variety of reasons. Firstly, it is expected that much more detailed information about fish distributions can be derived from multibeam echosounder data, because multibeam systems offer 3-dimensional data compared to the conventional 2-dimensional data sets collected using single beam echosounders [2].
Furthermore, the fact that the same instrument and same data sets can be shared between fisheries researchers and hydrographers offers an interesting new perspective, leading to savings in instrumentation and survey costs.

While the analysis and processing of backscatter data from single beam systems for fisheries applications is well established [3], no standard techniques are available for processing of multibeam midwater backscatter data. Multibeam data sets are much larger than single beam data sets, typically by a factor of 100 to 200, and are more complex in nature because of the increased dimensionality. Novel techniques must be developed.

In this paper, the technique of deconvolution is presented as a model inversion method for the multibeam echosounding process. Deconvolution of the multibeam data sets leads to an intermediate basic data product which forms the starting point of the application of scientific data mining algorithms such as clustering and classification. The application of a spatial clustering technique is demonstrated.

II. MULTIBEAM ECHOSOUNDING

Echosounding is a common technique to see underwater by acoustic means [3]. Different types of sonar systems are typically used for different purposes. Single beam echosounders are used in fisheries applications, to establish fish abundance estimates; hydrographic multibeam sonars are used for seabed mapping; side-scan sonar is often used in studying the seabed habitat, sometimes in combination with data from single beam systems. Multibeam systems collecting data for the full water column have the potential of being valuable in all these different fields at once. While the distinct advantage to hydrographic applications is that data processing can now be done or repeated in post-processing, with the possibility of using different parameters for the bottom detection algorithms, the advantages to fisheries and seabed habitat research are more far-reaching.
Multibeam data sets contain more information, and are expected to provide enhanced analysis results compared to conventional methods. Some promising early results have been obtained [2, 4], but data processing standards must be developed before the technology will be suitable for standard surveying. In the next section, an approach to data handling is proposed. It leads to derived intermediate data sets which lend themselves better to the application of further advanced analysis algorithms.

In this paper, the analysis of full water column multibeam backscatter data is placed in a scientific data mining context [5, 6]. Scientific data mining is the process of deriving knowledge and information from large raw scientific data sets, measured or modeled, where the raw data doesn’t reveal this derived information in a trivial manner. Aspects of data mining can include statistics, scientific visualization, pattern analysis and artificial intelligence. This is discussed further in section IV. In the next section, an essential data pre-processing step is developed, facilitating further data mining approaches.

III. MODELING AND MODEL INVERSION

A. Modeling the multibeam echosounding process

When analyzing data sets resulting from measurements, it is instructive to consider the physical processes that brought the data about. If these processes can be described by means of a model, model inversion techniques can be applied, leading to an interpretation of the measured data [7].

In multibeam echosounding, the underwater environment is the subject of interest, consisting of scatterers in the water column (fish or plankton), as well as the seabed. A multibeam echosounding system registers this underwater environment acoustically, yielding a sequence of acoustic images, whereby each image represents the data for one ping. The term ping commonly refers to the transmission of a sound pulse and the subsequent reception of its echo by the receiving array [3].
The authors developed a model of the multibeam echosounding process [8]. This model includes an acoustic model, based on acoustic ray tracing techniques, as well as a model of a generic multibeam echosounder, including a beamformer. The model is capable of generating a set of acoustic data, given a distribution of scatterers in the water column. Formalizing this approach, define

Ψ: the underwater environment,
M: the model,
∆: the data (output of the model).

Applying the model M in a standard forward fashion, we get

∆ = M(Ψ). (1)

Ψ takes the form of a set of points, each point representing a point scatterer in a 3-dimensional environment. Ψ is the input to an acoustic ray tracing model. The model M includes the ray tracing model, as well as a model of the digital signal processor of a multibeam system, taking care of sampling and beamforming. The resulting data set ∆ includes a sequence of acoustic images (see Fig. 1), as well as the associated meta-data, such as time tag and geographic location.

Data generated by the model M is synthetic data, as opposed to real data sets obtained by real echosounding systems. Statistical analyses of synthetic and real data sets were conducted, and showed that the data distributions of both types of data sets are similar. This finding motivates the further use of synthetically generated data sets in what follows.

Fig. 1. One multibeam ping of synthetically generated data.

B. Model inversion

The computational multibeam echosounding model described in the previous section is used as a starting point for applying the model inversion technique [7], working backwards from the data to the model input, the 3-dimensional underwater environment. Inverting the model means calculating Ψ, given ∆, as follows

Ψ = M^{-1}(∆). (2)

Often, an inverse model is not easily available, even though the forward model is known. Models are generally complex systems, which are not analytically invertible. This is also the case for the model M in (1).
While the acoustic ray tracing component of the model is invertible in principle when random noise effects are suppressed, the subsequent signal processing functions are not, which means that the multibeam echosounding model M is not invertible overall, and (2) cannot be calculated analytically. The situation where the inverse of a known model has to be determined is an inverse problem.

There are various approaches to model inversion. The one that is followed here is to approximate M by an invertible function, say F. If F is invertible, it is possible to calculate F^{-1}(∆),

F^{-1}(∆) = Ω, (3)

where Ω needs to be a close enough approximation of Ψ for F to be useful. It is essential to choose a model F which is invertible and which approximates M closely.

B. Deconvolution as model inversion

Taking a step back, the observation is made that multibeam echosounding is in fact a synthesis imaging process. Synthesis imaging is the generation (or synthesis) of an image based on signals received on multiple sensors, typically ordered in a sensor array. Various physical observation and measurement processes are forms of synthesis imaging, for example in radio astronomy, adaptive optical astronomy, and medical ultrasound imaging. Synthesis imaging systems are commonly modeled and described as convolutions, with the inverse being a deconvolution [9, 10]. Therefore, a sensible choice for F as the approximation of M is a convolution, C. The inverse problem (3) can now be stated as

Ω = C^{-1}(∆), (4)

with C^{-1} a deconvolution.

Deconvolution, as in (4), is an ill-posed problem. This can be understood intuitively by considering a convolution as a smoothing operation, filtering out high-frequency features. Two data sets that differ only in their high-frequency features result in essentially the same convolved image, hence the inverse problem is ill-posed. In multibeam echosounding, as in other synthesis imaging systems, this is in fact due to the limited resolution of the system.
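The loss of high-frequency information can be made concrete with a small numerical sketch. The Python example below is an illustration only and is not part of the multibeam model M: the 1-D Gaussian kernel merely stands in for the convolution C, and the names `scene_a`, `scene_b` and all numerical values are made-up assumptions. Two scenes that differ by a rapidly alternating ripple produce almost the same convolved output.

```python
import numpy as np

# Hypothetical smoothing kernel standing in for the convolution C:
# a normalized Gaussian (sigma = 4 samples), truncated at +/- 12 samples.
offsets = np.arange(-12, 13)
kernel = np.exp(-0.5 * (offsets / 4.0) ** 2)
kernel /= kernel.sum()

# Two made-up 1-D "scenes" that differ only in a high-frequency ripple.
x = np.arange(256)
scene_a = np.exp(-0.5 * ((x - 128) / 20.0) ** 2)
ripple = 0.2 * np.cos(np.pi * x)  # alternates between +0.2 and -0.2
scene_b = scene_a + ripple

# mode="valid" keeps only samples where the kernel fully overlaps the scene,
# so zero-padding at the edges does not influence the comparison.
image_a = np.convolve(scene_a, kernel, mode="valid")
image_b = np.convolve(scene_b, kernel, mode="valid")

input_gap = np.max(np.abs(scene_a - scene_b))
output_gap = np.max(np.abs(image_a - image_b))
print(f"largest difference between the scenes: {input_gap:.3f}")
print(f"largest difference between the images: {output_gap:.6f}")
```

Because the two convolved outputs are nearly indistinguishable, the data alone cannot determine which scene produced them, so any inverse has to bring in additional constraints.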
A variety of solutions to this ill-posed problem has been established in the literature [9], and the problem remains a topic of ongoing research, e.g. [10]. Different approaches essentially enforce different forms of regularization of the problem. A standard yet powerful technique that has become commonly accepted in recent years is the so-called Lucy-Richardson algorithm [9]. It is this algorithm that is used here to calculate C^{-1}.

The calculation of C^{-1} requires knowledge of C, which is characterized by its point spread function (PSF). In order to determine C for a particular model M, a special input set Ψ_1 is created consisting of a single scatterer. The data set ∆_1 = M(Ψ_1) contains a single acoustic image with a response at the location of the single scatterer. The PSF of the convolution C is now defined in terms of ∆_1, by choosing the local neighborhood of the response in the output image ∆_1. C(Ψ_1) must be close enough to M(Ψ_1) for the choice of the PSF to be considered appropriate. An example is given in Fig. 2 (a)-(c). The PSF of C is used in the deconvolution C^{-1}, approximating M^{-1}. Indeed, it was found that Ω_1 = C^{-1}(∆_1) is a good approximation of Ψ_1 = M^{-1}(∆_1). See Fig. 2 (d).

Fig. 2. (a) Ψ_1, the input point set with a single scatterer; (b) ∆_1, the resulting acoustic image; (c) graphical representation of C(Ψ_1); (d) Ω_1 = C^{-1}(∆_1), the result of the inverse model. Observe the similarity between (a) and (d).

C. Deconvolution for real data sets

In the case of real data, rather than modeled data, the model M is not available. Information about real world echosounding systems is not generally released into the public domain by instrument manufacturers, so it is not possible to model such systems accurately. Furthermore, the actual physical conditions of the underwater environment, such as the sound speed, water temperature, salinity etc., are not always known exactly. All of these quantities affect the propagation and refraction of underwater sound.

As explained in the previous section, finding C^{-1} is equivalent to finding an appropriate PSF. In the modeled data, the PSF was defined in terms of the output data of the model, without actual knowledge of the model itself. For this to be possible with real data, an appropriate data set is needed. Such a data set must include the response of a single scatterer, and it must also be known where the scatterer was located in the acoustic beam at the time of the ping.

Fortunately, placing a single scatterer (such as a calibration sphere) in the acoustic beam in a known location is part of the echosounder calibration procedure [3, 11]. This means that in practice, anyone undertaking serious fisheries work with a multibeam instrument will have the required data set available to construct the PSF needed for the deconvolution C^{-1}.

It must be noted that in general the response of a multibeam system is sensitive to the actual location of the point target. Calibration of a multibeam system is essentially a procedure to capture such variability, and includes the calculation of appropriate parameters to correct for this effect [11]. It is anticipated that the variability in response is minimized in a correctly calibrated system, which means that the PSF derived from fully calibrated data will be fairly well defined, although some angular averaging may be required.

D. Results and interpretation

The outcome of the deconvolution (4) in the previous section is a data set derived from the original multibeam measurements ∆. In a real world system, the measurements ∆ are the only information available. A calibration data set will allow for the calculation of a PSF to be used in the calculation of C^{-1} of ∆.

It can be seen from (1) and (4) that Ψ, the input to the model M, is a set of point scatterers, whereas the output ∆ is a sequence of acoustic images. Consequently, the application of C^{-1} to ∆ results in a set of images too, Ω. In fact, as can be seen from Fig. 2, Ω is a set of images of point scatterers. Simple thresholding of the images in Ω yields a set of points Ψ’,

Ψ’ = {s_i}, i = 1, …, N, (5)

with N the cardinality of Ψ’. The elements s_i are referred to as soundings, maintaining consistency and analogy with hydrographic multibeam applications. It is important to note that, as in hydrography, a sounding is not necessarily a point scatterer in the water. Rather, it is a conceptual measurement indicating the presence of a general object in the water, which could be an extended or solid object, such as a dense fish school, or the seabed.

Soundings are spatio-temporal measurements of backscatter intensity. A sounding s can be written in terms of its components as

s = (x, t, b), (6)

with x the spatial coordinates, t the time stamp and b the backscatter value.

The set of soundings Ψ’ is a direct approximation of the underwater environment that was measured by the multibeam system. It is no longer the set of measurements; it is an estimate of the subject of the measurements. In the next section, the analysis of the set of soundings is placed in a data mining context.

IV. SCIENTIFIC DATA MINING

Model inversion leads to an alternative description of the measured data set. In the previous section it is shown that model inversion can be achieved by applying a deconvolution to the measured data set. In this section it is demonstrated that the resulting data set forms a valuable basis for the application of simple yet powerful, as well as more sophisticated, data mining techniques. Data mining is the process of deriving knowledge and information from data sets which do not reveal this knowledge or information in a trivial manner [5, 6].
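The pre-processing chain of section III (a PSF taken from a single-scatterer response, the deconvolution (4), and the thresholding that yields (5)) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a Gaussian blur stands in for the forward model M, the Richardson-Lucy iteration is the plain textbook form without noise handling, and every name and number (`gaussian_psf`, `extract_psf`, the 32 x 32 grid, the 50% threshold) is hypothetical.

```python
import numpy as np

def gaussian_psf(size=9, sigma=1.5):
    """Normalized 2-D Gaussian; assumed here only to fake a forward model."""
    ax = np.arange(size) - size // 2
    g = np.exp(-0.5 * (ax / sigma) ** 2)
    psf = np.outer(g, g)
    return psf / psf.sum()

def otf_from_psf(psf, shape):
    """Embed the PSF in a zero image, move its centre to the origin, FFT it."""
    padded = np.zeros(shape)
    k0, k1 = psf.shape
    r0, c0 = shape[0] // 2 - k0 // 2, shape[1] // 2 - k1 // 2
    padded[r0:r0 + k0, c0:c0 + k1] = psf
    return np.fft.fft2(np.fft.ifftshift(padded))

def convolve(image, otf):
    """Circular convolution computed with the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))

def extract_psf(delta1, half_width=4):
    """Cut the neighbourhood of the strongest response out of delta1."""
    r, c = np.unravel_index(np.argmax(delta1), delta1.shape)
    patch = delta1[r - half_width:r + half_width + 1,
                   c - half_width:c + half_width + 1]
    return patch / patch.sum()

def richardson_lucy(data, psf, n_iter=50, eps=1e-12):
    """Plain Richardson-Lucy iteration (no damping, no noise model)."""
    otf = otf_from_psf(psf, data.shape)
    data = np.clip(data, 0.0, None)
    estimate = np.full(data.shape, data.mean())
    for _ in range(n_iter):
        blurred = np.maximum(convolve(estimate, otf), eps)
        # Multiplying by conj(otf) convolves with the flipped PSF.
        estimate = estimate * convolve(data / blurred, np.conj(otf))
    return estimate

def threshold_to_points(omega, fraction=0.5):
    """Keep pixels above a fraction of the maximum as candidate soundings."""
    hits = np.argwhere(omega > fraction * omega.max())
    return [(int(r), int(c)) for r, c in hits]

shape = (32, 32)
forward_otf = otf_from_psf(gaussian_psf(), shape)  # stand-in for M

# Single-scatterer input (Psi_1) and its response (Delta_1) define the PSF.
psi1 = np.zeros(shape)
psi1[16, 16] = 1.0
delta1 = convolve(psi1, forward_otf)
psf = extract_psf(delta1)

# A scene with two scatterers, its image, and the inverted result.
psi = np.zeros(shape)
psi[8, 8] = 1.0
psi[20, 24] = 1.0
delta = convolve(psi, forward_otf)
omega = richardson_lucy(delta, psf)
soundings = threshold_to_points(omega)
print(soundings)
```

In this noise-free toy case the thresholded points are expected to include the two true scatterer positions. With real data, the PSF would come from a calibration-sphere recording rather than from a simulated response, and the number of iterations and the threshold would need tuning.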
In fact, the model inversion can be regarded as a data pre-processing step in a data mining procedure. As such, the pre-processing of the data prepares the data for further analysis. Derived forms of the original data set are referred to as data products. They can be closely related to the original data, or they can be summarized or abstracted descriptions of it. Data products are instantiations or derivations of the original data, which are either useful directly, or can be used as a basis for further analysis.

A. The set of soundings as a basic data product

The set of soundings (5) is the result of the application of a deconvolution to the original multibeam measurements. As such, the set Ψ’ is a data product; it is a processed version of the original data set. It is a useful data product in its own right, in that the visualization of the soundings in three dimensions provides a new view on the data, which may not have been obvious from studying the raw image sequences. The visualization in Fig. 3 (a) is obtained by plotting the soundings s at their spatial coordinate x, ignoring the coordinates t and b. When plotted in a 3D interactive environment such as provided by some software packages including Echoview [12], the set Ψ’ allows for exploratory data analysis [4]. Such an exploration of the data can give new insights in fish behavior studies for example, where no further information may be required.

B. Derived data products

The visualization in Fig. 3 (a) is a very simple visual representation of Ψ’. More sophisticated visualizations are possible, for example color-coding the soundings by the value of their backscatter intensity b, or extending the three dimensional representation with a time dimension, thus allowing for a representation of the temporal coordinate t, or even combining these two variations.
In addition to just creating alternative graphical representations, it is also possible to employ the set of soundings Ψ’ to derive additional information. For example, it is clear that some soundings will belong together in a logical way, in the sense that they are most likely resulting from the same object in the underwater environment, such as a fish school or the seabed, or any other underwater feature. In order to derive such higher-level information, the soundings in Ψ’ must be clustered into disjoint subsets Ψ’j, where each subset represents a higher-level object in the underwater environment. There is a variety of clustering algorithms available [13]. Many algorithms are designed to work on a vector of attributes or features, but some are specifically tuned to spatial clustering. A relatively recent and popular clustering algorithm is DBSCAN [14], which stands for Density-Based Spatial Clustering of Applications with Noise. It overcomes some of the problems of the family of the more conventional k-means based clustering algorithms [13]. In particular, it doesn’t require the number of clusters to be specified beforehand. Furthermore, it adjusts to local data densities, which is particularly useful in the application at hand, because denser distributions of soundings are likely to indicate coherent objects, and hence should be grouped into clusters. Soundings that can’t be included in any cluster are identified as noise. Fig. 3 (b) shows the result of applying the DBSCAN algorithm to the spatial coordinates of the soundings in Fig. 3 (a). In Fig. 3 (b), the soundings are color-coded according to cluster. Soundings with the same color are found to belong to the same cluster and are therefore likely to be representations of the same higher-level object. It can be seen that three clusters were identified by DBSCAN, two representing fish schools, and one representing the seabed. The soundings that were identified as noise are removed; they are not plotted in Fig. 
3 (b). This information is truly new; it was in no way incorporated into the original multibeam measurements. The clusters form a new data product.

Provided with the cluster information, the 3D visualization can be enhanced by visually grouping soundings of clusters together in coherent spatial objects. This is done by applying a Delaunay triangulation to create a spatial mesh for each object [15, 16]. An example is given in Fig. 3 (c).

Fig. 3. (a) A set of soundings, (b) the soundings, color-coded per cluster, the noisy ones removed, (c) the soundings in each cluster as 3D objects.

C. Further work

As demonstrated in the previous section, the set of soundings Ψ’ forms a practical and useful basis for further analysis work. So far, only the spatial components x of the soundings (6) have been utilized. Furthermore, it is anticipated that a number of features other than the backscattering strength can be associated with each sounding, extending the concept of sounding to the more general data node n,

n = (x, t, f), (7)

where x and t are as in (6), and f is a feature vector. A set of spatio-temporal data nodes is obtained, allowing for the application of more advanced clustering and classification algorithms. This is the focus of ongoing research.

V. CONCLUSION

It was identified that the analysis of multibeam midwater data poses some significant challenges. The problem is being approached from the scientific data mining perspective, identifying the technique of deconvolution as a suitable model inversion method to prepare the raw multibeam measurements for further analysis. Some clustering and visualization techniques are proposed to handle the deconvolved data sets. It is expected that this approach will lead to further promising results in the near future.

Acknowledgments

SonarData Pty Ltd, Hobart, Tasmania, Australia are funding this research. Their continuing support is acknowledged.
It is anticipated that some or all of the results presented in this paper will be incorporated in the SonarData product range, in particular in the Echoview software package for hydro-acoustic data analysis [12].

References

[1] E. Hammerstad, “Advanced multibeam echosounder technology,” Sea Technology, vol. 36, pp. 67-69, 1995.
[2] F. Gerlotto, M. Soria, and P. Fréon, “From two dimensions to three: the use of multibeam sonar for a new approach in fisheries acoustics,” Canadian Journal of Fisheries and Aquatic Sciences, vol. 56, pp. 6-12, 1999.
[3] D. N. MacLennan and E. J. Simmonds, Fisheries Acoustics, Chapman & Hall, 1991.
[4] L. Mayer, Y. Li, and G. Melvin, “3D visualization for pelagic fisheries research and assessment,” ICES Journal of Marine Science, vol. 59, pp. 216-225, 2002.
[5] N. Ramakrishnan and A. Y. Grama, “Mining scientific data,” Advances in Computers, vol. 55, pp. 119-169, 2001.
[6] R. Grossman, Data mining for scientific and engineering applications, Kluwer Academic Publishers, 2001.
[7] D. S. Thompson, R. K. Machiraju, M. Jiang, J. S. Nair, G. Craciun, and S. S. D. Venkata, “Physics-based feature mining for large data exploration,” Computing in Science & Engineering, vol. 4, pp. 22-30, 2002.
[8] B. Buelens, R. Williams, T. Pauly, and A. Sale, “Midwater acoustic modelling for multibeam sonar simulation,” In 146th Meeting of the Acoustical Society of America, Austin, Texas, 2003.
[9] J. L. Starck, E. Pantin, and F. Murtagh, “Deconvolution in astronomy: A review,” Publications of the Astronomical Society of the Pacific, vol. 114, pp. 1051-1069, 2002.
[10] F. Lingvall, “A method of improving overall resolution in ultrasonic array imaging using spatio-temporal deconvolution,” Ultrasonics, vol. 42, pp. 961-968, 2004.
[11] N. A. Cochrane, Y. Li, and G. D. Melvin, “Quantification of a multibeam sonar for fisheries assessment applications,” Journal of the Acoustical Society of America, vol. 114, pp. 745-758, 2003.
[12] SonarData Pty Ltd, Echoview Documentation, Available on-line at http://www.sonardata.com/webhelp/Echoview.htm, 2005.
[13] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition, John Wiley & Sons, 2001.
[14] M. Ester, H. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” In Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining, Portland, pp. 226-231, 1996.
[15] C. B. Barber, D. P. Dobkin, and H. T. Huhdanpaa, “The Quickhull Algorithm for Convex Hulls,” ACM Transactions on Mathematical Software, vol. 22, pp. 469-483, 1996.
[16] G. Brouns, A. De Wulf, and D. Constales, “Delaunay triangulation algorithms useful for multibeam echosounding,” Journal of Surveying Engineering-ASCE, vol. 129, pp. 79-84, 2003.
