DNA Computing
Abstract
The aim of this manuscript is to illustrate the current state of the art of DNA computing achievements, especially new approaches or methods that contribute to solving either theoretical or application problems. Since Adleman solved an NP-complete problem by means of a wet-lab DNA experiment in 1994, DNA has become one of the promising alternatives for overcoming the limitations of silicon computers. Today, many researchers all over the world concentrate either on improving the methods available for DNA computing or on proposing new ways to solve engineering or application problems with a DNA computing approach. This paper gives an overview of research achievements in DNA computing, covering both improvements to the methods employed in DNA computing and the solution of application problems. At the end of the discussion, we address several challenges that DNA computing faces.
1. Introduction
DNA computing is an interdisciplinary research area that has grown quickly since DNA molecules were first used in a computational process. One of its main objectives is to produce, in the near future, a biologically inspired computer based on DNA molecules that can replace, or at least beneficially complement, silicon-based computers. R. Feynman suggested constructing a computer from molecules in 1964 [1], but it was not until 1994 that Adleman provided a proof-of-principle demonstration that DNA molecules can solve an instance of an NP-complete problem, the Hamiltonian Path Problem (HPP), through biochemical procedures [2]. DNA is the basic storage medium of all living cells; its main function is to store and transmit the genetic information of life, as it has done for billions of years. Roughly 10 trillion DNA molecules could fit into a space the size of a marble. Since all these molecules can process data simultaneously, in theory 10 trillion computations could be carried out at once in a small space. DNA computing is more generally known as molecular computing. It is an interdisciplinary field that combines biology, chemistry, mathematics, and computer science, and it offers a completely new paradigm for computation. The main idea is to encode data in the form of DNA strands and to use laboratory techniques of molecular biology, called bio-operations, to manipulate the strands in a test tube so as to simulate arithmetic and logical operations. It is estimated that a mix of 10^18 DNA strands could operate 10^4 times faster than today's advanced supercomputers [3]. Since Adleman's experiment, DNA computing has become an exciting area of multidisciplinary research. Rozenberg et al. in 1999 distinguished two major lines of research in DNA computing: (i) the theoretical line, concerned with models, algorithms, and paradigms for DNA computing, and (ii) the experimental line, concerned with the design of laboratory experiments to test biochemical feasibility [4]. Although there is still a long way to go before DNA algorithms are applied to real-life problems, researchers are interested in modeling and testing solutions on case studies in order to probe the limitations of DNA itself. Today, many active research groups in this field develop models and carry out laboratory experiments, especially on questions of biochemical feasibility, while other groups concentrate on developing a real DNA computer and on building DNA algorithms to solve engineering or application problems. The paper is organized as follows. Section 1 introduces this research topic and briefly defines DNA computing.
To give a better understanding of the DNA computing approach, Section 2 discusses the basic structure of DNA and the molecular biology techniques available for DNA processing. These techniques can be considered a basic toolbox for DNA computing experiments. Section 3 discusses in detail Adleman's experiment, the first ever experiment in DNA computing. Achievements in improving experimental and theoretical methods are then discussed in Section 4, with emphasis on improved models, algorithms, and paradigms for solving engineering and application problems. Section 5 addresses some challenges and promises of this area, and Section 6 gives concluding remarks.
2. DNA Computing
Adleman [32] discovered a technique from molecular biology for solving combinatorial problems that are hard to solve. The example in his experiment was the directed Hamiltonian path problem, which is NP-complete. The vertices and edges of the graph were encoded in DNA oligonucleotides, from which the Hamiltonian path was produced through hybridization, ligation, and amplification [33].
The directed Hamiltonian path problem asks, for an input (G, Vin, Vout), whether G has a Hamiltonian path from Vin to Vout. With n denoting the number of vertices of G, Adleman uses the following nondeterministic algorithm to solve it:
1. Generate a set of random paths in G.
2. Extract all paths beginning with Vin and ending with Vout.
3. Extract all paths with length exactly n − 1 (that is, n − 1 edges).
4. Extract all paths that contain every vertex at most once.
5. Accept that there is a Hamiltonian path if any paths are left; otherwise, reject.
These steps are realized as molecular computation phases, with the vertices and edges of G coded by DNA polymers. In step 1, ligation builds DNA strands that represent random paths in G. In step 2, the Watson-Crick complements of the codings of Vin and Vout are used to extract the strands with the correct start and end. In step 3, the DNA strands are separated by size in agarose gel to keep the codings of length n − 1, and the DNA is then denatured. In step 4, each vertex is checked, using the Watson-Crick complement of its coding, so that no vertex occurs in a path more than once. In step 5, gel electrophoresis is used to test whether any strand is left. Between the steps, the polymerase chain reaction (PCR) is used to amplify the intermediate results [34].
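The five-step nondeterministic algorithm for the directed Hamiltonian path problem can be mimicked in silico as a generate-and-filter pipeline. The Python sketch below is only an illustration under stated assumptions: the 5-vertex graph is hypothetical (not Adleman's seven-city instance), and exhaustive enumeration of walks stands in for the random ligation of step 1 so that the result is deterministic. Each list comprehension plays the role of one extraction step.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # adjacency list of a directed graph
Path = Tuple[int, ...]


def all_walks(graph: Graph, max_vertices: int) -> List[Path]:
    """Step 1 stand-in: every walk with at most max_vertices vertices.

    Vertices may repeat, as with random ligation. Exhaustive enumeration
    replaces random generation so that the sketch is deterministic.
    """
    walks: List[Path] = []
    frontier: List[Path] = [(v,) for v in graph]
    while frontier:
        walks.extend(frontier)
        frontier = [
            walk + (nxt,)
            for walk in frontier
            if len(walk) < max_vertices
            for nxt in graph.get(walk[-1], [])
        ]
    return walks


def adleman_hpp(graph: Graph, v_in: int, v_out: int) -> List[Path]:
    n = len(graph)
    pool = all_walks(graph, n)                                   # step 1
    pool = [w for w in pool if w[0] == v_in and w[-1] == v_out]  # step 2
    pool = [w for w in pool if len(w) == n]                      # step 3: n - 1 edges
    # Step 4: every vertex present; with exactly n vertices, each occurs once.
    pool = [w for w in pool if all(v in w for v in graph)]
    return pool                                                  # step 5: non-empty => accept


# Hypothetical 5-vertex graph (not Adleman's instance).
G: Graph = {0: [1, 3], 1: [2, 3], 2: [4], 3: [2, 4], 4: []}
paths = adleman_hpp(G, 0, 4)
print("Yes" if paths else "No", paths)  # Yes [(0, 1, 3, 2, 4)]
```

In the test tube all of these filters act on every strand in parallel; the sketch only shows why the surviving strands must be Hamiltonian paths.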
3. Adleman's breakthrough
The first wet-lab experiment to prove that DNA and biochemical processes could be used as computing tools to solve a complex computational problem was carried out by L.M. Adleman in 1994 [2]. In seven days of laboratory work, Adleman successfully solved a seven-city instance of the Hamiltonian Path Problem (HPP). HPP is a special case of the traveling salesman problem, obtained by setting the distance between two cities to a finite constant if they are adjacent and to infinity otherwise. In HPP, we can assume that G consists of vertices v1, v2, ..., vn together with vstart and vend. The directed graph G has a Hamiltonian path if and only if there exists a sequence of compatible one-way edges that begins with vstart, ends with vend, and enters every other vertex exactly once. Figure 2 shows the HPP instance that Adleman solved in his first wet experiment. To find the unique Hamiltonian path in this directed graph, Adleman followed the nondeterministic algorithm below:
Step 1: Generate random paths through the graph.
Step 2: Keep only those paths that begin with vstart and end with vend.
Step 3: If the graph has n vertices, keep only those paths that enter exactly n vertices.
Step 4: Keep only those paths that enter all of the vertices of the graph at least once.
Step 5: If any path remains, say Yes; otherwise, say No.
In step 1, Adleman randomly generated a 20-mer DNA sequence O_i to represent each city i. To represent an edge connecting two different cities i and j, he designed a 20-mer combining the second half of O_i with the first half of O_j. After all the DNA representing the cities and edges had been synthesized, 50 pmol of each edge oligonucleotide and 50 pmol of each complement of a city code (the Watson-Crick complements of the O_i) were mixed together in one test tube in a single ligation reaction. This step randomly produces a large pool of candidate paths through the graph.
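Adleman's encoding can be illustrated with strings. The Python sketch below is a simplification under stated assumptions: the city codes are random 20-mers drawn with a fixed seed (hypothetical, not Adleman's sequences), the complement is taken base by base without modeling strand orientation, and, as we read Adleman's design, edges leaving the start city or entering the end city keep that city's full 20-mer. Under that assumption an idealized ligation product for a seven-city path is the concatenation of the city codes, which is why the relevant band is 140 bp.

```python
import random
from typing import Dict, List

CODE_LEN = 20  # Adleman's city codes were 20-mers
HALF = CODE_LEN // 2
WC = str.maketrans("ACGT", "TGCA")


def random_code(rng: random.Random) -> str:
    return "".join(rng.choice("ACGT") for _ in range(CODE_LEN))


def wc_complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement (A<->T, C<->G); orientation ignored."""
    return seq.translate(WC)


def edge_code(city: Dict[int, str], i: int, j: int, start: int, end: int) -> str:
    """Edge i->j: second half of O_i + first half of O_j.

    Assumption: edges leaving the start city or entering the end city keep the
    full 20-mer of that city.
    """
    left = city[i] if i == start else city[i][HALF:]
    right = city[j] if j == end else city[j][:HALF]
    return left + right


def ligate(city: Dict[int, str], path: List[int], start: int, end: int) -> str:
    """Idealized ligation product for one path through the graph."""
    return "".join(edge_code(city, a, b, start, end) for a, b in zip(path, path[1:]))


rng = random.Random(1994)  # fixed seed; the codes themselves are hypothetical
city = {v: random_code(rng) for v in range(7)}
strand = ligate(city, [0, 1, 2, 3, 4, 5, 6], start=0, end=6)
print(len(strand))  # 140, the band size selected for seven cities
# The complement of O_1 acts as a splint across the junction of edges 0->1 and 1->2.
print(strand[CODE_LEN:2 * CODE_LEN] == city[1])  # True
```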
Only paths that pass through all cities are feasible solutions, so the pool must be filtered. Adleman first employed PCR with primers derived from the codes of vstart and vend, so that strands beginning with vstart and ending with vend were selectively amplified while unrelated strands that did not fulfill this requirement were left behind. At the end of this process, Adleman had mainly strands that start with the start city and end with the end city, as required. This process realizes step 2 of his algorithm. The output of step 2 then undergoes gel electrophoresis, which sorts the strands by size. Because there are seven cities and each city code is 20 bases long, only the 140-bp (base pair) band, corresponding to the double-stranded product, was excised and soaked in double-distilled water (ddH2O). Thus only strands that pass through seven cities are extracted and passed on to the next process. This realizes step 3 of Adleman's algorithm. To realize step 4, he employed magnetic bead separation, the most labor-intensive part of the work. In this process, the complement of each city's code was used in turn to retain only the strands containing that city, and the procedure was repeated for every city. Although the experiment took almost seven days of bench work and considerable labor, the result was convincing, and it opened a new approach to very complicated calculations, especially those involving a huge number of variables.
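Steps 2 to 4 of the readout can be sketched as three string filters: PCR with start and end primers, gel selection by length, and one affinity (magnetic bead) separation per city. The Python below is a toy model under stated assumptions: the 4-mer city codes and the candidate pool are hypothetical, city codes are listed with the start city first and the end city last, and substring matching stands in for hybridization (it can be fooled by a code that accidentally spans a junction, which real sequence design tries to avoid).

```python
from typing import List, Sequence


def pcr_select(pool: List[str], start_code: str, end_code: str) -> List[str]:
    """Step 2 stand-in: keep strands that begin with the start city and end with the end city."""
    return [s for s in pool if s.startswith(start_code) and s.endswith(end_code)]


def gel_select(pool: List[str], n_cities: int, code_len: int) -> List[str]:
    """Step 3 stand-in: keep only the band whose length is n_cities * code_len."""
    return [s for s in pool if len(s) == n_cities * code_len]


def bead_select(pool: List[str], city_codes: Sequence[str]) -> List[str]:
    """Step 4 stand-in: one affinity separation per city; keep strands containing it."""
    for code in city_codes:
        pool = [s for s in pool if code in s]
    return pool


def read_out(pool: List[str], city_codes: Sequence[str]) -> List[str]:
    code_len = len(city_codes[0])
    pool = pcr_select(pool, city_codes[0], city_codes[-1])
    pool = gel_select(pool, len(city_codes), code_len)
    return bead_select(pool, city_codes)  # step 5: non-empty => "Yes"


# Hypothetical 4-mer city codes A, B, C, D; A is the start and D the end city.
A, B, C, D = "AAAC", "CCGA", "GGTA", "TTCG"
pool = [A + B + C + D, A + C + D, A + B + B + D, B + A + C + D]
print(read_out(pool, [A, B, C, D]))  # ['AAACCCGAGGTATTCG']
```

The wrong-start strand is lost at the PCR step, the short strand at the gel step, and the strand that repeats B instead of visiting C at the bead step.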
In this section, because many researchers deal with numerical variables when solving application and engineering problems, we focus on the achievements of DNA computing methods in representing numerical values. Representing numerical values is an important part of solving weighted graph problems in this research area, and an incorrect representation of weighted graph values can lead to a wrong result. Four ways of representing numerical values are widely employed in DNA computing: (i) constant-length based [2][5], (ii) direct-proportional length based [6], (iii) concentration control [7], and (iv) the temperature gradient method [8]. In his 1994 experiment, Adleman employed the constant-length based method, in which every city and edge has a code of the same length [2]; however, he did not put labels on the arcs to represent the distances between cities. Likewise, Lipton [9] proposed an enhancement of Adleman's model in 1995, but it still did not handle any information about distances between cities. The first models that dealt with information or labels on arcs were proposed by Narayanan and Zorbalas in 1998 [5] for a weighted graph problem. Narayanan and Zorbalas proposed a constant-length based encoding to represent the information on arcs, in their case the distances between cities. In their algorithm, for example, distance 1 is represented by a 3-mer of DNA, distance 2 by a 6-mer, and so on. As a result, longer distances are represented by longer DNA strands and shorter distances by shorter strands, and at the end of the algorithm the shortest strand corresponds to the optimal solution. However, this technique can represent only a limited range of distances, because every additional unit of distance makes the strands longer.
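The length-based idea described for Narayanan and Zorbalas, where distance 1 maps to a 3-mer and distance 2 to a 6-mer, can be sketched as follows. This is an illustrative Python model, not their published algorithm: the filler unit "ACG", the weighted graph, and the candidate paths are hypothetical, city codes are omitted so that only the weight segments contribute to length, and picking the minimum length stands in for selecting the shortest strand in the lab.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
UNIT = "ACG"  # hypothetical 3-mer: distance 1 -> 3-mer, distance 2 -> 6-mer, ...


def weight_segment(distance: int) -> str:
    """Only the length of the segment carries information (3 bases per unit)."""
    return UNIT * distance


def path_strand(path: List[int], weights: Dict[Edge, int]) -> str:
    return "".join(weight_segment(weights[(a, b)]) for a, b in zip(path, path[1:]))


def shortest_by_length(candidates: List[List[int]],
                       weights: Dict[Edge, int]) -> Tuple[List[int], int]:
    """Return the candidate whose strand is shortest, with that strand's length."""
    best = min(candidates, key=lambda p: len(path_strand(p, weights)))
    return best, len(path_strand(best, weights))


weights = {(0, 1): 1, (1, 3): 1, (0, 2): 1, (2, 3): 3, (0, 3): 4}
candidates = [[0, 1, 3], [0, 2, 3], [0, 3]]
print(shortest_by_length(candidates, weights))  # ([0, 1, 3], 6)
```

The sketch also makes the limitation visible: strand length grows linearly with the distance values, so large or numerous weights quickly demand long strands.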
Employing very long DNA strands in DNA computing is not advisable, because errors such as mutations or reading errors arise during the other processes. Narayanan and Zorbalas did not implement their method in any laboratory experiment. Ibrahim et al. therefore took the initiative in 2004 to solve a weighted graph problem experimentally with a direct-proportional length based technique [6], solving a shortest path problem with 5 cities and 7 weighted edges. Ibrahim et al. proposed this direct-proportional length based technique as an alternative that overcomes the disadvantage of the constant-proportional length based approach. In this technique, the cost of an edge is encoded as an oligonucleotide whose length is directly proportional to the cost. After initial pool generation and amplification, numerous feasible candidates are present, and standard bio-molecular laboratory operations make it possible to extract the optimal combination, which represents a solution to the shortest path problem. In contrast, Yamamoto et al. proposed concentration control for solving weighted graph problems in 2000 [7]. Because chemical reactions are controlled through DNA concentration in this technique, the concentrations of DNA serve as input and output data. Yamamoto et al. argued that the technique can reduce the cost of the detection step in DNA computing, because only the relatively intense bands need to be extracted and analyzed. The concentrations of the complementary oligonucleotides encoding vertices are all set to the same value, and the relative concentration Dij of each oligonucleotide encoding edge i -> j with cost Cij is calculated as explained in [7]. In 2004, Lee et al. proposed a novel encoding method for weighted graph problems that uses a temperature gradient to overcome a drawback of the previous methods [8]. This melting temperature method uses fixed-length DNA strands and represents costs by the melting temperatures of the strands. The cost or weight of each arc is designed with a melting temperature that depends on its value: a smaller cost is represented by a DNA sequence with a lower melting temperature, so a more economical path has a lower melting temperature. Each city sequence, by contrast, is designed with the same melting temperature, because city sequences should contribute equally to the thermal stability of paths. Finally, road sequences that connect two cities are generated from the sequences of the departure city, the arrival city, and the cost. Researchers are still searching for the best way to represent numerical values.
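The melting temperature idea can be illustrated with a rough toy model. The Python sketch below is not Lee et al.'s design: it assumes the Wallace rule of thumb (about 2 deg C per A/T and 4 deg C per G/C, valid only for short oligonucleotides) as the Tm estimate, builds hypothetical fixed-length cost codes whose G/C count equals the cost, gives every city the same hypothetical code, and sums segment estimates as a crude proxy for path stability. Ranking paths by this score stands in for the temperature-gradient selection that favors low-Tm paths.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
CITY = "ACGTACGTAC"  # hypothetical city code; every city gets the same Tm
COST_LEN = 10        # fixed length of every cost sequence


def wallace_tm(seq: str) -> int:
    """Wallace rule of thumb for short oligos: 2 deg C per A/T + 4 deg C per G/C."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def cost_sequence(cost: int) -> str:
    """Toy fixed-length cost code whose number of G/C bases equals the cost."""
    if not 0 <= cost <= COST_LEN:
        raise ValueError("cost must be between 0 and COST_LEN")
    return ("GC" * COST_LEN)[:cost] + ("AT" * COST_LEN)[: COST_LEN - cost]


def path_tm_score(path: List[int], weights: Dict[Edge, int]) -> int:
    """Additive Tm proxy: city segments plus cost segments (a simplification)."""
    cities = wallace_tm(CITY) * len(path)
    costs = sum(wallace_tm(cost_sequence(weights[(a, b)])) for a, b in zip(path, path[1:]))
    return cities + costs


weights = {(0, 1): 2, (1, 3): 5, (0, 2): 1, (2, 3): 1}
candidates = [[0, 1, 3], [0, 2, 3]]
print(min(candidates, key=lambda p: path_tm_score(p, weights)))  # [0, 2, 3]
```

Because every cost code has the same length, its estimated Tm here is 20 + 2 x cost, so a cheaper path scores lower, mirroring the statement that a more economical path has a lower melting temperature.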
Solving this problem would open a new horizon in this field for engineering and application problems.
4.2 Engineering applications
Although many researchers are still focusing on shortest path problems, other research groups have taken the initiative to tackle other applications such as cryptography [11] [12] [22], scheduling [16] [10] [19] [20] [26] [27], clustering [28] [29], encryption [18] [21] [13], and forecasting [23], and have even tried to employ DNA computing in signal and image processing
applications [15] [17]. In this section, we briefly discuss recent application problems that have been solved using DNA computing methods. Yang et al. proposed a DNA computing based algorithm to solve the job shop scheduling problem in 2006 [20]. The authors illustrated their proposed model with a working operation problem involving six tasks, and to solve it they mimicked the method used for HPP. This was not the first time DNA had been employed for scheduling problems: in 2005, Watada et al. [16] proposed a DNA algorithm to schedule elevator systems, and this work was refined by Jeng et al. [23], Jeng [36], and Muhammad et al. [10] [35] in 2006. In 2007, Bakar et al. [26] [27] proposed another DNA computing model to solve the re-arrangement of flexible manufacturing systems (FMS) in a production line. However, because of the limitations in representing numerical values in DNA, all researchers so far have considered only small or medium-sized scheduling problems to illustrate their proposed solutions. Other researchers in this field are working on DNA algorithms for information security technology [14]. For example, Boneh et al. [21] and Adleman et al. [18] proposed models for breaking the Data Encryption Standard (DES) by molecular computation. DNA cryptography has been proposed by Gehani et al. [11], Kartalopoulos [12], and Tanaka et al. [13] as a newborn field of cryptography. Besides DNA cryptography and DES, there have been developments in DNA steganography and DNA certification. Recently, DNA has been employed in an intrusion detection model for computer and telecommunication systems by Boukerche et al. [30]. Among all the DNA computing models proposed in this research area, DNA certification is the most mature, and its application is the most widely studied [14]. In optimization fields, several methods or models have been proposed to solve application problems.
Most of these problems require huge processing time and the consideration of a large number of feasible combinations to find the optimum solution. For example, Bakar et al. proposed models to solve clustering problems based on mutual order [29] and on proximity [28]. Jeng et al. introduced a combination of DNA computing and fuzzy sets to forecast currency exchange rates [23], while Kim et al. solved the optimum re-arrangement of clique density in a company using DNA computing [25].
The main issue in applying DNA computing technologies to real applications is how to represent numerical values, especially when several related numerical values must be encoded in DNA strands. Various studies have addressed this problem, and investigation continues; researchers have proposed the techniques discussed above. However, all the solutions proposed so far are suitable only for a limited number of numerical values and have not been tested with larger sets of values, so the problem remains open. Solving it would make DNA computing far more practical for many engineering and real application problems. Developing robust wet-lab methods for engineering problems is also critically important. One important step in a wet-lab experiment is reading out the end result. At present, gel electrophoresis, in which strands are sorted into bands by size, is the most popular technique for this step. However, gel electrophoresis has its own limitations, which are disadvantages for DNA computing. One drawback is that, when strands span a wide range of base-pair lengths, we cannot analyze all the bands in a gel image at one time, because some bands may not yet have reached the reading window of the reader. We can therefore read only part of the bands at a time, which makes proper analysis difficult. Another important wet-lab technique in DNA computing is PCR. PCR is used to amplify the number of copies of a specific region of DNA in order to produce enough DNA to be adequately tested. This technique can be used to identify, with very high probability, disease-causing viruses and/or bacteria, a diseased person, or a criminal suspect.
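The reason PCR can turn a few template strands into enough DNA to test is exponential amplification. The Python sketch below uses the standard idealized model, in which each cycle multiplies the number of copies by (1 + E) for an amplification efficiency E between 0 and 1; real reactions plateau as reagents run out, which this model ignores, and the function names are illustrative only.

```python
import math


def pcr_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the template by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be in [0, 1]")
    return initial * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial: float, target: float, efficiency: float = 1.0) -> int:
    """Cycles needed (rounded up) for the idealized yield to reach the target."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(math.log(target / initial) / math.log(1.0 + efficiency))


print(pcr_copies(1, 30))         # 1073741824.0, about a billion copies of one template
print(cycles_to_reach(1, 1e6))   # 20
print(pcr_copies(10, 3, 0.5))    # 33.75
```

Real-time PCR exploits the same exponential behavior by monitoring product accumulation cycle by cycle instead of only at the end.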
However, traditional PCR itself has several limitations that may affect results in DNA computing, so several researchers focus on enhancing this technique. In particular, real-time PCR has been employed in several wet experiments to improve the readability of their end results. Even though the current difficulties in translating theoretical DNA computing models into real life have not been sufficiently overcome, there is still potential for development in other areas. DNA computing offers a new approach to solving combinatorial problems, such as NP-hard problems, in parallel. This advantage offers the potential to solve problems that traditional machines face when processing a large number of tasks. Considering this benefit, researchers are able to tackle problems that involve many calculations, such as clustering optimization and scheduling problems. In addition, DNA can store far more information in a small space than digital storage. Today we deal with huge amounts of information, not only words and documents but also images and video, and these formats require huge storage. DNA therefore seems to offer an appropriate answer to today's storage problem. As early research in related directions, Tsaftaris et al. [15] proposed employing DNA in the signal processing field, and Tsuboi et al. in image processing [17]. Adams stated in his study [31] three main reasons for pursuing special-purpose rather than universal DNA computation: first, a specific computer will be easier to design and implement, with less need for functional complexity and flexibility; second, DNA computing may prove entirely inefficient for a wide range of problems, so directing efforts toward universal models may divert energy away from its true calling; third, the types of hard computational problems that DNA-based computers may be able to solve effectively are of sufficient economic importance that a dedicated processor would be financially reasonable. With so many possible advantages over conventional techniques, DNA computing has bright potential for practical use. Future work in this field should incorporate cost-benefit analysis so that comparisons with existing techniques are more appropriate.
management engineering problems. If we consider all the advantages offered by DNA computing, it could become an alternative way to overcome the difficulties faced by current silicon computers. However, there are still obstacles to applying this method to engineering problems, as discussed in Section 4.
Acknowledgement
The second author would thank Universiti Malaysia Pahang and Kementerian Pengajian Tinggi Malaysia for supporting her study leave.
References
[1] R.P. Feynman, Miniaturization, New York, Reinhold, 1991, pp. 282-296.
[2] L.M. Adleman, Molecular computation of solutions to combinatorial problems, Science, Vol. 266, No. 5187, 1994, pp. 1021-1024.
[3] L. Kari, From micro-soft to bio-soft: Computing with DNA, Biocomputing and Emergent Computation: Proceedings of BCEC97, World Scientific, Skovde, Sweden, 1997, pp. 146-164.
[4] G. Rozenberg and A. Salomaa, DNA computing: New ideas and paradigms, Lecture Notes in Computer Science (LNCS), Springer-Verlag, Vol. 7, 2006, pp. 188-200.
[5] A. Narayanan and S. Zorbalas, DNA algorithm for computing shortest paths, In J.R. Koza (ed.), Proceedings of the Genetic Programming, Morgan Kaufmann, 1998, pp. 718-723.
[6] Z. Ibrahim, Y. Tsuboi, O. Ono and M. Khalid, Direct-proportional length-based DNA computing for shortest path problem, International Journal of Computer Sciences & Applications, Vol. 1, No. 1, 2004, pp. 46-60.
[7] M. Yamamoto, N. Matsuura, T. Shiba, Y. Kawazoe and A. Ohuchi, DNA solution of the shortest path problem by concentration control, Lecture Notes in Computer Science (LNCS), Vol. 2340, 2004, pp. 203-212.
[8] J.Y. Lee, S.Y. Shin, T.H. Park and B.-T. Zhang, Solving traveling salesman problems with DNA molecules encoding numerical values, Biosystems, Vol. 78, No. 1-2, 2004, pp. 39-47.
[9] R.J. Lipton, DNA solution of hard computational problems, Science, Vol. 268, No. 5210, 1995, pp. 542-545.
[10] M.S. Muhammad, Z. Ibrahim, O. Ono and M. Khalid, Application of length-based DNA computing for complex scheduling problem, International Journal of Information Technology, Vol. 12, No. 3, 2006, pp. 100-110.
[11] A. Gehani, T. LaBean and J. Reif, DNA-based cryptography, Lecture Notes in Computer Science (LNCS), Vol. 2950, 2004, pp. 167-188.
[12] S.V. Kartalopoulos, DNA-inspired cryptographic method in optical communications, authentication and data mimicking, Military Communications Conference (MILCOM 2005), IEEE, Vol. 2, 2005, pp. 774-779.
[13] K. Tanaka, A. Okamoto and I. Saito, Public-key system using DNA as a one-way function for key distribution, Biosystems, Vol. 81, No. 1, 2005, pp. 25-29.
[14] G. Cui, L. Qin, Y. Wang and X. Zhang, Information Security Technology Based on DNA Computing, 2007 IEEE International Workshop on Anti-counterfeiting, Security, Identification, Xiamen, China, 2007, pp. 288-291.
[15] S.A. Tsaftaris, A.K. Katsaggelos, T.N. Pappas and E.T. Papoutsakis, How can DNA computing be applied to digital signal processing?, IEEE Signal Processing Magazine, Vol. 21, No. 6, 2004, pp. 57-61.
[16] J. Watada, S. Kojima, S. Ueda and O. Ono, DNA computing approach to optimal decision problems, Proceedings of the 2004 IEEE International Conference on Fuzzy Systems, Vol. 3, 2004, pp. 1579-1584; also in International Journal of Innovative Computing, Information and Control (IJICIC), Vol. 2, No. 1, 2006, pp. 273-282.
[17] Y. Tsuboi and O. Ono, Applicability of DNA computing algorithm solving image recognition in intelligent visual mechanics, The 29th Annual Conference of the IEEE Industrial Electronics Society (IECON '03), Vol. 3, 2003, pp. 3800.
[18] L.M. Adleman, P.W.K. Rothemund, S.T. Roweis and E. Winfree, On applying molecular computation to the data encryption standard, Journal of Computational Biology, Vol. 6, No. 1, 1999, pp. 53-64.
[19] F. Zhang, B. Liu, W. Liu and J. Xu, A DNA computing model based on Acrydite gel technology to solve the timetable problem, IEEE/ICME International Conference on Complex Medical Engineering, 2007, pp. 1632-1635.
[20] Z.X. Yang, J. Cui, Y. Yang and Y. Ma, Job shop scheduling problem based on DNA computing, Journal of Systems Engineering and Electronics, Vol. 17, No. 3, 2006, pp. 654-659.
[21] D. Boneh, C. Dunworth and R. Lipton, Breaking DES using a molecular computer, Technical Report CS-TR-489-95, Princeton University, 1995.
[22] G. Xiao, M. Lu, L. Qin and X. Lai, New field of cryptography: DNA cryptography, Chinese Science Bulletin, Vol. 51, No. 12, 2006, pp. 1413-1420.
[23] D.J.-F. Jeng, J. Watada, B. Wu and J.-Y. Wu, Fuzzy forecasting with DNA computing, Lecture Notes in Computer Science (LNCS), Vol. 4287, 2006, pp. 324-336.
[24] D.J.-F. Jeng, I. Kim and J. Watada, Bio-soft computing with fixed-length DNA to a group control optimization problem, Soft Computing, Springer, Vol. 12, No. 13, 2007, pp. 223-228.
[25] I. Kim, D.J.-F. Jeng and J. Watada, Redesigning subgroups in a personnel network based on DNA computing, International Journal of Innovative Computing, Information and Control (IJICIC), Vol. 2, No. 4, 2006, pp. 885-896.
[26] R.B.A. Bakar and J. Watada, A bio-soft computing approach to re-arrange a flexible manufacturing robot, Lecture Notes in Computer Science (LNCS), Vol. 4694, 2007, pp. 308-315.
[27] R.B.A. Bakar and J. Watada, Biological computation of optimal task arrangement to multiple robot flexible machining cell, Proceedings of International Conference on Soft-Computing and Human Sciences, Kitakyushu, Japan, 2007, pp. 61-66.
[28] R.A.B. Bakar, J. Watada and W. Pedrycz, A proximity approach to DNA based clustering analysis, International Journal of Innovative Computing, Information and Control (IJICIC), New Horizon of Soft Computing and Its Applications, Vol. 4, No. 5, 2008.
[29] R.A.B. Bakar, J. Watada and W. Pedrycz, DNA approach to solve clustering problem based on a mutual order, Biosystems, Vol. 91, No. 1, 2008, pp. 1-12.
[30] A. Boukerche, K.R.L. Juca, J.B. Sobral and M.S.M.A. Notare, An artificial immune based intrusion detection model for computer and telecommunication systems, Parallel Computing, Vol. 30, No. 5-6, 2004, pp. 629-646.
[31] J.C. Adams, On the application of DNA based computing, available online at: http://publish.uwo.ca/~jadams/dnaapps1.html, last reviewed 14 January 2008.
[32] L. Adleman, Molecular computation of solutions to combinatorial problems, Science, Vol. 266, 1994, pp. 1021-1024.
[33] R. Deaton, R.C. Murphy, M. Garzon, D.R. Franceschetti and S.E. Stevens, Jr., Good encoding for DNA-based solutions to combinatorial problems, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 44, 1999, pp. 247-258.
[34] D. Rooss, Recent developments in DNA-computing, Proceedings of the International Symposium on Multiple-Valued Logic, 1997, pp. 3-9.
[35] R.A.B. Bakar and J. Watada, DNA computing and its applications: Survey, ICIC Express Letters, Vol. 2, No. 1, 2008, pp. 101-108.
[36] D.J.-F. Jeng, Application-Oriented Bio-Soft Computing, PhD Thesis, Graduate School of Information, Production and Systems, Waseda University, March 2008.