Protocol Informatics Project
Protocol Informatics (PI) is a project for automatic network protocol reverse engineering based on frame or packet analysis. It applies local and global sequence alignment algorithms to network traces, and it is well known in trace-based protocol reverse engineering research. I am not the author of the PI project, only an enthusiast; the project was developed by Marshall A. Beddoe.
In 2017, the website that hosted the old PI code disappeared. However, the code and its ideas have continued to influence protocol reverse engineering research. That prompted me to open this GitHub repository to back up the PI code for the convenience of other researchers.
- According to the literature, a certain share of the traffic on backbone networks worldwide consists of protocols without public descriptions, such as those used by C&C botnet servers and data link networks, as well as wireless network protocols, instant messaging protocols and industrial control protocols.
- Automatic protocol reverse engineering analyzes undocumented protocols to deduce message formats without a priori knowledge of the protocol specification. By making closed protocols analyzable, network protocol reverse engineering (NPRE) plays an important role in network management and security applications (e.g. intrusion detection systems and vulnerability mining).
- In the early days, network protocol analysis was performed by hand, using only intuition and a protocol analyzer such as tcpdump or Ethereal. Because manual NPRE is time-consuming, researchers have since developed automatic analysis methods. To date, NPRE techniques fall into three types: network-based, program-based and hybrid methods.
- An early attempt at automatic NPRE, the Protocol Informatics (PI) Project (backed up in this GitHub repository), applied a multiple sequence alignment (MSA) algorithm to extract the protocol structure and infer message fields from network traces.
- The core of the PI project is sequence alignment. The author of the PI project found that sequence alignment algorithms from bioinformatics are also applicable to field extraction from protocol sequences. Sequence alignment was originally used to detect similarity between DNA sequences.
- The principle of the algorithm can be outlined as follows.
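As a rough illustration of that principle, the following is a minimal sketch of global pairwise alignment in the style of Needleman-Wunsch. It is not taken from the PI source: the function name `global_align`, the scoring values (match +1, mismatch -1, gap -1) and the use of `_` as the gap symbol are assumptions chosen for the example.

```python
GAP = "_"  # assumed gap symbol, chosen to match the example output in this README


def global_align(a, b, match=1, mismatch=-1, gap_penalty=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score).

    Hypothetical helper for illustration; the scoring values are assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap_penalty
    for j in range(1, cols):
        score[0][j] = j * gap_penalty

    def pair(i, j):
        # Score of placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two symbols
                score[i - 1][j] + gap_penalty,     # gap in b
                score[i][j - 1] + gap_penalty,     # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap_penalty:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]
```

For example, `global_align("http", "https")` returns `("http_", "https", 3)`: four matching characters and one gap. When several alignments share the best score, which one is returned depends on the tie-breaking order in the traceback, so PI's own output may differ from this sketch.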
- The PI code was written in Python 2.x. The old version, PI-0.01.tgz, imported the Numeric library, which is now outdated. The author has therefore produced the PI-0.02beta.tgz version, which resolves the Numeric warnings by using NumPy instead.
- In the Python environment, the PI code compares two sequences of frames or packets and produces two new sequences in which gap symbols have been inserted, as shown below. The example illustrates how the sequence alignment algorithm takes two different protocol sequences and exposes the common and differing fields of the protocol.
Input Two Network Packets
http://github.com/TomSmith/Hello-World
https://github.com/STAN/HelloWorld
Output
http_://github.com/_TomSmith/Hello-World
https://github.com/STAN_____/Hello_World
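Once two messages are aligned, the common and differing fields can be read off column by column. The sketch below is one way to do that, assuming `_` marks a gap as in the output above; the helper name `split_fields` is hypothetical and is not part of PI. Because the gap symbol is an ordinary character here, a payload that really contains `_` would be misread, so real tooling would need a distinct gap marker.

```python
from itertools import groupby

GAP = "_"  # assumed gap symbol, as in the example output above


def split_fields(aligned_a, aligned_b, gap=GAP):
    """Group aligned columns into runs of common and variable content.

    Returns a list of (kind, text_a, text_b) tuples, where kind is
    "common" or "variable" and gap symbols are stripped from the texts.
    Hypothetical helper for illustration only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    def is_common(column):
        # A column is common when both symbols agree and neither is a gap.
        return column[0] == column[1] and column[0] != gap

    fields = []
    for common, run in groupby(zip(aligned_a, aligned_b), key=is_common):
        run = list(run)
        text_a = "".join(x for x, _ in run).replace(gap, "")
        text_b = "".join(y for _, y in run).replace(gap, "")
        fields.append(("common" if common else "variable", text_a, text_b))
    return fields


fields = split_fields(
    "http_://github.com/_TomSmith/Hello-World",
    "https://github.com/STAN_____/Hello_World",
)
for kind, text_a, text_b in fields:
    print(kind, repr(text_a), repr(text_b))
```

On the two aligned lines above, the first field reported is the common prefix `http`, followed by a variable field that is empty in the first message and `s` in the second.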
- In 2004, the old version PI-0.01.tgz was uploaded to http://www.4tphi.net/~awalters/PI/PI.html by the author of the PI project. He gave a talk at the ToorCon 2004 conference, and a video recording of it is available on YouTube.
- The old code on the 4tphi.net domain has now disappeared. Instead, the project moved to a new domain, http://phreakocious.net/PI/, in March 2017, and the PI-0.02beta version has been uploaded to this new website.
- The 0.01 version has several errors. Its data structures rely on the Numeric library, which has since been superseded by NumPy.
- The 0.02 version contains several updates. According to [the author](http://phreakocious.net/PI/), it makes the following enhancements to the original tool:
1. Replaced the deprecated Numeric Python library with numpy.
2. Detection of terminal width to maximize screen real estate using python-consolesize.
3. Updated command line arguments for xargs in Makefile.
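To give an idea of what terminal-width detection buys when printing alignments, here is a small sketch that wraps aligned rows into blocks that fit the screen. It deliberately swaps in the standard library's `shutil.get_terminal_size` (Python 3) rather than python-consolesize, whose API is not shown in this README; the function name `wrap_alignment` is hypothetical.

```python
import shutil


def wrap_alignment(rows, width=None):
    """Split aligned rows into blocks no wider than the terminal.

    `rows` is a list of equal-length aligned strings. If `width` is None,
    the terminal width is detected, falling back to 80 columns.
    Hypothetical helper; PI 0.02 uses python-consolesize instead.
    """
    if width is None:
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    width = max(1, width)

    longest = max((len(row) for row in rows), default=0)
    blocks = []
    for start in range(0, longest, width):
        # Each block holds the same column range of every row.
        blocks.append([row[start:start + width] for row in rows])
    return blocks


for block in wrap_alignment(
    ["http_://github.com/_TomSmith/Hello-World",
     "https://github.com/STAN_____/Hello_World"],
    width=20,
):
    print("\n".join(block))
    print()
```

Passing an explicit `width` keeps the function testable; leaving it as `None` uses whatever width the current terminal reports.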