Skip to content
Adam edited this page Feb 27, 2013 · 2 revisions

Table of Contents

URL

 This project provides a hardware accelerated URL extraction system. The NetFPGA reference router has been modified to identify HTTP packets containing URLs and send a copy to the host system. Software running on the host extracts and stores URLs and search terms into a database, and then displays them through a graphical user interface.

Project summary

Status :
Released
Version :
1.0
Authors :
Michael Ciesla ([email protected]), Vijay Sivaraman, Aruna Seneviratne
NetFPGA base source :
2.0

Download

  1. Install the URL Extraction Project.

Description

System Overview

The URL extraction system consists of two main components: hardware and software. The hardware component is an extended NetFPGA IPv4 reference router that identifies packets containing a HTTP GET request in hardware and sends a copy to the host. The software component is composed of three parts: URL Extractor, database, and graphical user interface. The URL Extractor parses HTTP GET packets, extracts the contained URLs and search terms, and then stores them into a database. The GUI queries the database for top occurring URLs and search terms, and displays them on-screen. A system diagram is shown in below.

Image adapted from [1]

Packet Life Cycle

The packet life cycle of the URL extraction system can be explained in the following six step sequence:

  1. A packet enters the NetFPGA through the gigabit Ethernet ports and is put in a MAC RxQ .
  2. It then traverses the User Data Path, which processes the packet to determine the output port, and places the packet in the TxQ corresponding to the output port. The User Data Path duplicates HTTP GET packets, sending a copy up to the host by placing it into the CPU TxQ, and forwarding the other along its normal path.
  3. The packet in the MAC TxQ is sent out onto the Ethernet, whereas the packet in the CPU TxQ is transfered across the PCI Bus to the NetFPGA kernel driver.
  4. The URL Extractor software then receives the packet by reading from a socket bound to the NetFPGA software interface (nf2c0).
  5. The URL Extractor parses the HTTP GET packet and extracts the contained URL, storing it into the database. The URL is then checked for embedded search engine terms, and if found, they are also extracted and stored into the database.
  6. Finally, the GUI queries the database for top occuring URLs and search terms, displaying them on-screen.

URL Extractor

Below is a screenshot of the output produced by the URL Extractor.

GUI

Below is a screenshot of the GUI. The left pane displays the top occurring URLs in the the database. The right pane displays the top occurring Google search terms (some asian characters distort the search term count alignment).

Regression Tests

The regression tests verify the functionality of the hardware component of the URL extractor system. In order to run the tests, you need to have the machine connected for the regression tests as stated in the Run Regression Tests section of the Guide.

After connecting the cables, ensure dhclient is not running. Then execute the following command to run the regression tests.

 nf2_regress_test.pl --project url_extraction

Regression Tests

The URL extraction router contains all the same regression tests as the reference router, with the addition of three new test (below). The definition of the reference router regression tests can be found on Router Tests wiki page.

Test 1: Verify duplication of Unix GET packets

Name :
test_get_unix Description: Tests the identification and duplication of packets containing a HTTP GET request method from a Unix client (TCP header length = 32B).
  1. Initialize netfpga hardware (same as test_packet_forwarding)
  2. Send 20 Unix GET packets from eth1 to eth2 and nf2c0.
  3. Send 20 Unix GET packets from eth2 to eth1 and nf2c0.
  4. Check the number of forwarded packets register and verify the value is correct.
Location
projects/url_extraction/regress/test_get_unix
Output
 SUCCESS!

Test 2: Verify duplication of Windows GET packets

Name :
test_get_win Description: Tests the identification and duplication of packets containing a HTTP GET request method from a Windows client (TCP header length = 20B).
  1. Initialize netfpga hardware (same as test_packet_forwarding)
  2. Send 20 Windows GET packets from eth1 to eth2 and nf2c0.
  3. Send 20 Windows GET packets from eth2 to eth1 and nf2c0.
  4. Check the number of forwarded packets register and verify the value is correct.
Location
projects/url_extraction/regress/test_get_win
Output
 SUCCESS!

Test 3: Verify non-GET packets are not duplicate

Name :
test_get_nondup Description: Tests that packets not containing a HTTP GET request method are forwarded correctly without being duplicated.
  1. Initialize netfpga hardware (same as test_packet_forwarding)
  2. Send 20 packets from eth1 to eth2 with an ip_len < MIN_LEN, and proto = TCP.
  3. Send 20 packets from eth2 to eth1 with an ip_len < MIN_LEN, and proto = TCP.
  4. Send 20 packets from eth1 to eth2 with an ip_len < MIN_LEN, and proto != TCP.
  5. Send 20 packets from eth2 to eth1 with an ip_len < MIN_LEN, and proto != TCP.
  6. Send 20 packets from eth1 to eth2 with an ip_len > MIN_LEN, and proto != TCP.
  7. Send 20 packets from eth2 to eth1 with an ip_len > MIN_LEN, and proto != TCP.
  8. Send 20 packets from eth1 to eth2 with an ip_len > MIN_LEN, proto = TCP, and dst port != HTTP.
  9. Send 20 packets from eth2 to eth1 with an ip_len > MIN_LEN, proto = TCP, and dst port != HTTP
  10. Check the number of forwarded packets register and verify the value is correct.
Location
projects/url_extraction/regress/test_get_nondup
Output
 SUCCESS!

Usage

Installation

  • Install packages required by software components:
 yum install mysql-server mysql-devel gtk2-devel
  • Start the MySQL server:
 service mysqld start

Setup the MySQL Database

  • Set a password for the root database user:
 mysqladmin -u root password netfpga
 mysqladmin -u root --password reload
  • Create the database:
 mysqladmin -u root -p create db
  • Create the database tables:
 cd projects/url_extraction/sw/db
 mysql -u root -p db < search_term_table.sql 
 mysql -u root -p db < url_tbl.sql 

Compile the URL Extractor

  • Compile the URL Extractor from the source:
 cd projects/url_extraction/sw/urlx
 make

Compile the GUI

  • Compile the GUI from the source:
 cd projects/url_extraction/sw/gui
 make

Hardware Component

  1. Ensure that the NetFPGA kernel driver is loaded and that the CPCI has been reprogrammed.
  2. Download the URL extraction bitfile:
 nf2_download url_extraction.bit

Network Configuration

There are two main ways to configure the router:

  1. Using SCONE. Note: This hasn't been throughly tested. Connecting hosts on port MAC-0 may have weird effects since SCONE will received extra unexpected GET packets. However, it has been tested and works in testbed topology below.
  2. Statically configure all networking information using the cli or Java gui. Adjacent nodes will also require a static ARP entry for the router.

Software Component

URL Extractor

The URL Extractor is started by running the urlx binary. The interface_name argument specifies the network interface to receive GET packets from, e.g. nf2c0.

 cd project/url_extraction/sw/urlx
 ./urlx 
    Usage: ./urlx interface_name

GUI

The GUI is started by running the gui binary:

 cd project/url_extraction/sw/gui
 ./gui

Testbed Setup

The URL Extraction system can be tested using the below topology. The NetFPGA interfaces use IP addresses 192.168.x.1, where 'x' is the interface number (starting at 1). Connect the PC to the 2nd NetFPGA port. Connect the NAT router to the 3rd NetFPGA port.

On the host system :

  • Run SCONE. The cpuhw and rtable files have been provided for this topology (projects/url_extraction/sw/scone). The NetFPGA has a default route through the NAT router.
 rtable: 
 0.0.0.0 192.168.3.2 0.0.0.0 eth2
  • Run the URL Extractor:
 ./urlx nf2c0
  • Run the GUI:
 ./gui
  • View accessed URLs and search terms.
On the PC :
  • Set the default route to 192.168.2.1
  • Start web browsing.
NAT Router :
  • The NAT router can be a PC running iptables or a home router/gateway. Here is a sample bash script to enable NAT where eth1 is connected to the LAN and eth0 has the public IP:
 #!/bin/sh
 echo "Enabling IP forwarding...\n"
 echo 1 > /proc/sys/net/ipv4/ip_forward
 echo "Flashing iptables...\n" 
 iptables -F
 echo "Adding iptables rules...\n"
 iptables -A FORWARD -i eth1 -j ACCEPT
 iptables -A FORWARD -o eth1 -j ACCEPT
 iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

References

[1] J.W. Lockwood, J. Naous, G. Gibb. (2008, Aug) Building Gigabit-rate Routers with the NetFPGA: NICTA Tutorial at UNSW. Sydney, Australia.

Clone this wiki locally