SB Mirza
Mohamed Radhouene Aniba, Sophie Siguenza, Anne Friedrich, Frederic Plewniak, Olivier Poch, Aron Marchler-Bauer and Julie Dawn Thompson
ABSTRACT
The volume of bioinformatic information being produced, together with the various applications used to study it, presents a major data management and analysis challenge to researchers. It is impossible to analyse all of this information manually, and new approaches are needed that are capable of processing large-scale heterogeneous data to extract the pertinent information.
Cont
A general methodology for building knowledge-based expert systems is described, focusing on the Unstructured Information Management Architecture (UIMA), which provides facilities for both data and process management.
New Challenges
Such system-level studies necessitate a combination of experimental, theoretical and computational approaches. A major challenge for bioinformaticians in the post-genomic era is clearly the management, validation and analysis of this mass of experimental and predicted data.
Approaches
One approach has been data warehousing, where all the relevant databases are stored in a unified format and mined through a uniform interface. Distributed systems implement software to access heterogeneous databases that are dispersed over the internet and provide a query facility to access the data.
SRS and Entrez are probably the most widely used database query and navigation systems for the life science community. Semantic Web-based methods have been introduced which add meaning to the raw data by using formal descriptions of the concepts.
Cont
Neural networks implement software simulations of massively parallel processes involving the processing of elements that are interconnected in a network architecture. Fuzzy expert systems use fuzzy logic, which deals with uncertainty and is used in areas where the results are not always binary.
The knowledge base contains domain expertise in the form of facts that the expert system will use to make determinations. The working storage is a database containing data specific to the problem being solved. The inference engine is the code at the core of the system which derives recommendations from the knowledge base and the problem-specific data in the working storage.
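As a minimal sketch of the fuzzy-logic idea, the Python snippet below grades a value with a degree of membership between 0 and 1 instead of a binary yes/no. The function names, the triangular membership shape and the percent-identity thresholds are illustrative assumptions and are not taken from any particular fuzzy expert system.

```python
def triangular(x, low, peak, high):
    """Degree of membership (0..1) of x in a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def fuzzy_and(a, b):
    """Fuzzy conjunction, taken here as the minimum of two memberships."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy disjunction, taken here as the maximum of two memberships."""
    return max(a, b)


# Hypothetical fuzzy sets over percent sequence identity.
identity = 35.0
moderately_similar = triangular(identity, 20.0, 40.0, 60.0)
highly_similar = triangular(identity, 40.0, 70.0, 100.0)
similar_at_all = fuzzy_or(moderately_similar, highly_similar)
```

A value of 35% identity is thus partly "moderately similar" rather than being forced into a single class.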
Cont
The knowledge acquisition module is used to update or expand dynamic knowledge bases. The user interface controls the dialog between the user and the system.
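The components described above can be sketched in a few lines of Python. This is a toy forward-chaining engine under stated assumptions, not the design of any specific expert system: the `Rule` class, the `infer` and `acquire` functions and the biological rules themselves are hypothetical names and examples chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # facts that must all be present
    conclusion: str        # fact added when the rule fires


# Knowledge base: domain expertise (hypothetical example rules).
knowledge_base = [
    Rule("r1", frozenset({"has_signal_peptide"}), "secreted"),
    Rule("r2", frozenset({"secreted", "binds_receptor"}), "candidate_ligand"),
]


def infer(rules, working_storage):
    """Inference engine: forward chaining until no new fact is derived."""
    facts = set(working_storage)
    fired = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conditions <= facts and rule.conclusion not in facts:
                facts.add(rule.conclusion)
                fired.append(rule.name)
                changed = True
    return facts, fired


def acquire(rules, rule):
    """Knowledge acquisition: extend the knowledge base, not the engine."""
    rules.append(rule)


# Working storage: data specific to the problem being solved.
working_storage = {"has_signal_peptide", "binds_receptor"}
facts, fired = infer(knowledge_base, working_storage)

# Changing the system's behaviour means adding knowledge, not code.
acquire(knowledge_base, Rule("r3", frozenset({"candidate_ligand"}), "flag_for_review"))
facts_after, fired_after = infer(knowledge_base, working_storage)
```

The inference loop never changes when `r3` is added, which is the separation between knowledge and procedure that the architecture is meant to provide.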
Cont
Problem solving is accomplished by applying specific knowledge rather than a specific technique. When the expert system does not produce the desired results, the solution is to expand the knowledge base rather than to reprogramme the procedures.
An expert system shell is a piece of software which provides a development framework containing the user interface. Examples include CLIPS and JESS. The use of a shell can reduce the amount of maintenance required and increase the reusability and flexibility of the application. An alternative is to build a customised expert system using conventional languages. It has been recommended to use a
Cont
An active area of research has been the reconstruction of functional networks from data such as expression data or interaction data, using intelligent systems such as neural networks and genetic algorithms. These approaches are also finding applications in, e.g., drug discovery and design or medical diagnosis. An important task in bioinformatics is the extraction of knowledge from the biomedical literature.
Cont
FIGENIX addresses the problems of automatic structural and functional annotation under the supervision of a rule-based expert system.
Case-based approaches and rule-based systems have also been applied widely. However, there is no standard architecture that would allow the exchange of information and code between the many different applications.
Cont
OpenNLP, GATE and UIMA: OpenNLP is an umbrella structure for open-source projects for the treatment of natural language. OpenNLP and GATE contain many powerful and robust algorithms for the processing of natural language texts. UIMA allows the analysis of structured and unstructured data. It provides powerful capabilities for distributed computing through services and is a scalable and extensible software architecture.
Overview of UIMA
A Collection Reader (CR), which allows the Collection Processing Engine (CPE) to treat all the data in a given collection.
A corpus of Analysis Engines (AEs) and Aggregate Analysis Engines (AAEs), possibly managed by Flow Controllers (FCs).
A CAS Consumer (CC), which takes the results from the Common Analysis Structure (CAS) and stores them in an exploitable format.
Cont
The first layer of a UIMA application represents the Working Storage and contains the system's memory. It is used to store the metadata associated with each AE.
Cont
AAEs and each module in the application layer can then update the metadata in the CAS at each step of the data analysis. In an expert system, the analysis pipeline is not pre-defined by the developer or user, but depends on the previous experience gained by the system. The third layer corresponds to the User Interface between the final user and the system.
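The flow of data through these components can be illustrated with a small Python sketch. It is a conceptual stand-in only and does not use the Apache UIMA API (which is Java/C++). All names here (`CAS`, `collection_reader`, `flow_controller`, `cas_consumer`, the two toy engines and the length threshold) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CAS:
    """Stand-in for the shared analysis structure: data plus metadata."""
    document: str
    metadata: dict = field(default_factory=dict)


def collection_reader(collection):
    """CR: wrap every item of the collection in its own CAS."""
    for item in collection:
        yield CAS(document=item)


def length_engine(cas):
    """AE: annotate the sequence length."""
    cas.metadata["length"] = len(cas.document)


def composition_engine(cas):
    """AE: annotate the fraction of G and C characters."""
    gc = sum(cas.document.count(base) for base in "GC")
    cas.metadata["gc_fraction"] = gc / len(cas.document)


def flow_controller(cas):
    """FC: choose the next engine from metadata already in the CAS."""
    if "length" not in cas.metadata:
        return length_engine
    if cas.metadata["length"] >= 4 and "gc_fraction" not in cas.metadata:
        return composition_engine
    return None  # nothing left to run for this CAS


def cas_consumer(cas, store):
    """CC: persist the results in an exploitable format (a list of dicts)."""
    store.append({"document": cas.document, **cas.metadata})


def run_pipeline(collection):
    store = []
    for cas in collection_reader(collection):
        engine = flow_controller(cas)
        while engine is not None:
            engine(cas)
            engine = flow_controller(cas)
        cas_consumer(cas, store)
    return store


results = run_pipeline(["GATTACA", "GC"])
```

Because the next step is chosen from the metadata accumulated so far, the short sequence skips the composition engine: the pipeline is decided at run time rather than fixed in advance.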
Cont
UIMA is now exploited by numerous applications related to the biological domain.
Cont
This method involves three main steps:
1: Pairwise sequence alignment and distance matrix calculation
2: Guide tree construction
3: Multiple alignment
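The first two steps can be sketched in Python as follows. This is a simplified illustration under explicit assumptions, not the ClustalW implementation: a normalised edit distance is swapped in for a scored pairwise alignment, and plain average-linkage clustering is used to build the guide tree. The function names and example sequences are hypothetical, and step 3 (the multiple alignment itself) is not shown.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, a simple stand-in for a pairwise alignment."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Step 1: normalised distance for every pair of named sequences."""
    return {
        frozenset((x, y)): edit_distance(seqs[x], seqs[y])
        / max(len(seqs[x]), len(seqs[y]))
        for x, y in combinations(seqs, 2)
    }


def guide_tree(seqs):
    """Step 2: average-linkage clustering, returned as nested tuples."""
    dist = distance_matrix(seqs)
    clusters = {name: [name] for name in seqs}  # subtree -> its leaf names

    def average(c1, c2):
        pairs = [(p, q) for p in clusters[c1] for q in clusters[c2]]
        return sum(dist[frozenset(pq)] for pq in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: average(*ab))
        leaves = clusters.pop(a) + clusters.pop(b)
        clusters[(a, b)] = leaves  # the closest pair becomes one subtree
    return next(iter(clusters))


sequences = {"s1": "GATTACA", "s2": "GATTACC", "s3": "TTTTTTT"}
tree = guide_tree(sequences)
```

The two most similar sequences are joined first, so in the third step they would be aligned to each other before the more distant sequence is added.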
The final enhancement to the ClustalW algorithm involved the addition of a post-processing refinement step. Many approaches have been developed to improve multiple sequence alignment (MSA), based on an optimisation of the Sum of Pairs objective function.
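To make the Sum of Pairs objective concrete, the Python sketch below scores an alignment by summing a score over every pair of residues in every column. The scoring values (match, mismatch, gap) and the decision to ignore gap-gap pairs are assumptions made for illustration. Real implementations typically use a substitution matrix and more elaborate gap penalties.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue          # gap-gap pairs are ignored in this sketch
            if x == "-" or y == "-":
                total += gap      # residue aligned against a gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


score = sum_of_pairs(["GA-T", "GACT", "GTCT"])
```

A refinement step can then compare candidate alignments of the same sequences and keep the one with the higher score.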
CONCLUSION
A number of requirements are identified for a general expert system for systems biology:
Easy integration of heterogeneous, distributed data: structured, semi-structured and unstructured.
Easy integration of different analysis modules and reuse of existing modules.
Complex workflow capabilities.
Support for decision rules and automatic reasoning.
Facilities for implementation in a distributed grid computing environment.