-
The Protocol Informatics (PI) project is a project for automatic network protocol reverse engineering based on protocol frames or packets. "PI" is short for “Protocol Informatics”; the project applies local and global sequence alignment algorithms. The PI project is well known in network protocol reverse engineering based on network traces. I am not the author of the PI project, only an enthusiast of it; the project was developed by its author, Marshall A. Beddoe.
-
In 2017, the website that hosted the old PI project code disappeared. However, the code and its ideas have greatly advanced research on protocol reverse engineering. That prompted me to back up the PI project code on GitHub for the convenience of other researchers.
-
According to the literature, a certain share of traffic on backbone networks worldwide consists of protocols without public specifications, such as those used by C&C botnet servers, data link networks, wireless networks, instant messaging and industrial control systems.
-
Automatic protocol reverse engineering processes undocumented protocols to deduce message formats without a priori knowledge of protocol specifications. With the help of closed-protocol analysis, network protocol reverse engineering (NPRE) plays an important role in network management and security applications (e.g. intrusion detection systems and vulnerability mining).
-
In the early days, network protocol analysis was performed manually and by intuition. Later, protocol analyzer tools such as tcpdump or Ethereal came into use. Because manual work is time-consuming and varies widely from one NPRE task to another, researchers now develop methods for automatic protocol analysis. To date, NPRE techniques fall into three types: network-based, program-based and hybrid methods.
-
An early attempt at automatic NPRE, the Protocol Informatics (PI) project, applied a multiple sequence alignment (MSA) algorithm to extract the protocol structure and infer message fields from network traces.
-
The core of the PI project is sequence alignment. Sequence alignment algorithms were originally used to detect similarity between DNA sequences. The author of the PI project found that these algorithms from bioinformatics can also be applied to extracting fields from protocol sequences.
-
The paper entitled "Network Protocol Analysis using Bioinformatics Algorithms" was presented by its author, Marshall A. Beddoe. The principle of the algorithm is outlined in that paper.
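As a rough illustration of the alignment principle, the sketch below implements a textbook Smith-Waterman local alignment over two byte strings represented as Python strings. It is not taken from the PI source: the function name `local_alignment`, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the `_` gap symbol are assumptions chosen for readability.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-1, gap_char="_"):
    """Textbook Smith-Waterman local alignment (illustrative sketch only).

    Returns (best_score, aligned_part_of_a, aligned_part_of_b).
    Scoring values are assumptions, not the ones used by PI.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(gap_char)
            i -= 1
        else:
            out_a.append(gap_char)
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(local_alignment("GET /index", "PUT /index"))
```

Local alignment finds the best-matching region shared by two messages, which is useful when only part of each message has a common structure.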
-
The PI code was written in Python 2.x. The old version, PI-0.01.tgz, imported the Numeric library, which is outdated. Another author, @phreakocious, therefore produced the PI-0.02beta.tgz version, which resolves the Numeric warnings by using NumPy instead. Marshall A. Beddoe and @phreakocious have both done excellent work.
-
In a Python environment, the PI code compares two sequences of frames or packets and produces two new, aligned sequences in which gap symbols have been inserted. The following example illustrates how sequence alignment exposes the common and differing fields of packets or frames when two different protocol sequences are compared.
Input Two Network Packets
http://github.com/TomSmith/Hello-World
https://github.com/STAN/HelloWorld
Output
http_://github.com/_TomSmith/Hello-World
https://github.com/STAN_____/Hello_World
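The example above can be approximated with a minimal Needleman-Wunsch global alignment, sketched below. This is a textbook version written for illustration, not the PI implementation; the function name `global_alignment` and the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are assumptions.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1, gap_char="_"):
    """Textbook Needleman-Wunsch global alignment (illustrative sketch only).

    Returns the two input strings padded with gap_char so that they have
    equal length. Scoring values are assumptions, not the ones used by PI.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Trace back from the bottom-right corner to the top-left corner.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(gap_char)
            i -= 1
        else:
            out_a.append(gap_char)
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    first, second = global_alignment(
        "http://github.com/TomSmith/Hello-World",
        "https://github.com/STAN/HelloWorld",
    )
    print(first)
    print(second)
```

The exact placement of gaps depends on the scoring values and on how ties are broken during traceback, so the printed result may differ in detail from the illustration above. Columns where both rows hold the same character suggest shared fields; columns with gaps or differing characters suggest variable fields.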
- In 2004, the author of the PI project uploaded the old version, PI-0.01.tgz, to the website http://www.4tphi.net/~awalters/PI/PI.html. He also gave a talk at the ToorCon 2004 conference, a video recording of which is available on YouTube.
- The old code hosted on the 4tphi.net domain has since disappeared. In March 2017, a new site, http://phreakocious.net/PI/, took its place, and the PI-0.02beta version was uploaded there. Its author, phreakocious, has made an important contribution as well.
-
The 0.01 version has several errors. Its data structures rely on the Numeric library, which has been superseded by NumPy.
-
The 0.02 version brings several updates. According to its author, it adds the following enhancements to the original tool:
-
Replaced the deprecated Numeric Python library with numpy.
-
Detection of terminal width to maximize screen real estate using python-consolesize.
-
Updated command line arguments for xargs in Makefile.
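
To illustrate the terminal-width enhancement listed above, the sketch below wraps aligned sequences into blocks that fit the current terminal. It uses the standard-library `shutil.get_terminal_size` as a stand-in, since the way PI 0.02 calls python-consolesize is not shown here; the function name `wrap_alignment` is hypothetical.

```python
import shutil


def wrap_alignment(rows, width=None):
    """Split aligned rows into blocks no wider than the terminal.

    Illustrative sketch only: uses the standard library instead of
    python-consolesize, and is not taken from the PI source.
    """
    if width is None:
        # Falls back to 80 columns when the size cannot be detected.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    width = max(1, width)

    longest = max((len(row) for row in rows), default=0)
    blocks = []
    for start in range(0, longest, width):
        blocks.append([row[start:start + width] for row in rows])
    return blocks


if __name__ == "__main__":
    aligned = [
        "http_://github.com/_TomSmith/Hello-World",
        "https://github.com/STAN_____/Hello_World",
    ]
    for block in wrap_alignment(aligned):
        print("\n".join(block))
        print()
```

Printing the rows block by block keeps matching columns stacked above each other even when the aligned sequences are longer than one terminal line.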