Roadshow Demo for Fraud Detection using Gemfire, Greenplum, Madlib and Postgis

rais2-pivotal/FraudDectDemo

GemFire - Greenplum Demo

Steps to run the demo

0 - Pre-requisites

Ensure that the Greenplum database, Java, Maven and GemFire are installed and healthy. The output below was captured on Greenplum Database 4.3.7.2.

[gpadmin@localhost staging]$ psql -c "checkpoint; select version()"
                                                                       version
------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.15 (Greenplum Database 4.3.7.2 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 17 2016 12:49:03
(1 row)
[gpadmin@localhost staging]$ java -version
java version "1.8.0_73"
Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
[gpadmin@localhost staging]$ gfsh
    _________________________     __
   / _____/ ______/ ______/ /____/ /
  / /  __/ /___  /_____  / _____  /
 / /__/ / ____/  _____/ / /    / /
/______/_/      /______/_/    /_/    v8.2.0

Monitor and Manage GemFire
gfsh>quit;
Exiting...
[gpadmin@localhost apache-maven-3.3.9]$ mvn -v
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T08:41:47-08:00)
Maven home: /staging/apache-maven-3.3.9
Java version: 1.8.0_73, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_73/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-431.el6.x86_64", arch: "amd64", family: "unix"
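
The four checks above can also be run in one pass. The following is a minimal sketch, not part of the demo repository; check_prereqs is a hypothetical helper name, and it only confirms that each command is on the PATH, not that the versions match.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the demo VM (tool names taken from the checks above):
#   check_prereqs psql java gfsh mvn
```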

1 - Setup Greenplum Database

On an existing GPDB installation:

  • Download the MADlib and PostGIS binaries from the following link

       https://network.pivotal.io/products/pivotal-gpdb#/releases/1377/file_groups/250
    [NOTE]: The binaries have already been downloaded and are available under the '/staging' directory of this virtual machine. You can run the following commands to install them.
  • Extract and install the MADlib binaries

cd /staging
tar -xzf madlib-ossv1.8_pv1.9.4_gpdb4.3orca-rhel5-x86_64.tgz
cd GPDB_4.3
[gpadmin@localhost GPDB_4.3]$ gppkg -i madlib-ossv1.8_pv1.9.4_gpdb4.3orca-rhel5-x86_64.gppkg

$GPHOME/madlib/bin/madpack install -s madlib -p greenplum -c gpadmin@localhost:5432/gemfire
  • Extract and install the PostGIS binaries

[gpadmin@localhost staging]$ gppkg -i postgis-ossv2.0.3_pv2.0.1_gpdb4.3orca-rhel5-x86_64.gppkg

psql -d gemfire -f $GPHOME/share/postgresql/contrib/postgis-2.0/postgis.sql
  • Download the demo binaries using git.

[gpadmin@localhost staging]$ git clone https://github.com/rais2-pivotal/FraudDectDemo
git: /usr/local/greenplum-db-4.3.7.2/lib/libz.so.1: no version information available (required by git)
Initialized empty Git repository in /staging/FraudDectDemo/.git/
git-remote-https: /usr/local/greenplum-db-4.3.7.2/lib/libz.so.1: no version information available (required by git-remote-https)
git: /usr/local/greenplum-db-4.3.7.2/lib/libz.so.1: no version information available (required by git)
remote: Counting objects: 763, done.
remote: Compressing objects: 100% (317/317), done.
git: /usr/local/greenplum-db-4.3.7.2/lib/libz.so.1: no version information available (required by git)
remote: Total 763 (delta 318), reused 763 (delta 318), pack-reused 0
Receiving objects: 100% (763/763), 127.95 MiB | 3.66 MiB/s, done.
Resolving deltas: 100% (318/318), done.
  • Change directory to /staging/FraudDectDemo/Server/scripts

  • Log in to the Greenplum DB instance and run the model1.sql script. This will create the table structures we’re using. Please ignore the '"transaction_info" doesn’t exist' error, if it’s your first time running it.

[gpadmin@localhost Server]$ psql -d gemfire -f model1.sql
psql:scripts/model1.sql:15: NOTICE:  table "pos_device" does not exist, skipping
DROP TABLE
CREATE TABLE
psql:scripts/model1.sql:23: NOTICE:  table "transaction" does not exist, skipping
DROP TABLE
CREATE TABLE
psql:scripts/model1.sql:33: NOTICE:  table "zip_codes" does not exist, skipping
DROP TABLE
psql:scripts/model1.sql:36: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'zip' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
COPY 42180
psql:scripts/model1.sql:40: NOTICE:  table "suspect" does not exist, skipping
DROP TABLE
CREATE TABLE
CREATE VIEW
psql:scripts/model1.sql:55: NOTICE:  table "pos_data" does not exist, skipping
DROP TABLE
psql:scripts/model1.sql:59: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 0
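
Once the packages are installed, it may be worth confirming that the database can see both extensions. The sketch below is an illustration only: madpack_conn and verify_extensions are hypothetical helper names, and the queries assume MADlib was installed into the "madlib" schema as in the madpack command above.

```shell
#!/bin/sh
# Build the connection string that madpack's -c option expects:
#   user@host:port/database
madpack_conn() {
    echo "$1@$2:$3/$4"
}

# Ask the database which MADlib and PostGIS versions it sees.
# Requires psql on PATH; the database name is passed as $1.
verify_extensions() {
    psql -d "$1" -Atc "SELECT madlib.version();" &&
    psql -d "$1" -Atc "SELECT postgis_version();"
}

# Usage on the demo VM:
#   madpack_conn gpadmin localhost 5432 gemfire   # prints gpadmin@localhost:5432/gemfire
#   verify_extensions gemfire
```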

2 - Start the GemFire cluster

First we’ll add the GemFire-Greenplum beta jars to a local Maven repository (this won’t be necessary once the bits are released).

[gpadmin@localhost FraudDectDemo]$ cd /staging/FraudDectDemo/
$ mvn install:install-file -Dfile=lib/gemfire-greenplum-1.0.0-beta-6-SNAPSHOT.jar -DgroupId=io.pivotal.gemfire -DartifactId=gemfire-greenplum -Dversion=1.0.0-beta-6-SNAPSHOT -Dpackaging=jar

[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-install-plugin:2.4:install-file (default-cli) @ standalone-pom ---
[INFO] Installing /Users/fmelo/FraudDetection-wwko/lib/gemfire-greenplum-1.0.0-beta-6-SNAPSHOT.jar to /Users/fmelo/.m2/repository/io/pivotal/gemfire/gemfire-greenplum/1.0.0-beta-6-SNAPSHOT/gemfire-greenplum-1.0.0-beta-6-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.465 s
[INFO] Finished at: 2016-02-10T20:41:44-07:00
[INFO] Final Memory: 7M/123M
[INFO] ------------------------------------------------------------------------
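
The "Installing ... to ..." line above shows where Maven places the jar. As a small sketch, the same path can be derived from the coordinates to confirm the install; maven_repo_path is a hypothetical helper, and the usage assumes the default local repository under ~/.m2/repository.

```shell
#!/bin/sh
# Print the path, relative to the local repository root, where Maven
# stores a jar for the given groupId, artifactId and version.
maven_repo_path() {
    group_dir=$(printf '%s' "$1" | tr '.' '/')   # dots in the groupId become directories
    echo "$group_dir/$2/$3/$2-$3.jar"
}

# Usage: check that the connector jar landed in the local repository.
#   jar="$HOME/.m2/repository/$(maven_repo_path io.pivotal.gemfire gemfire-greenplum 1.0.0-beta-6-SNAPSHOT)"
#   [ -f "$jar" ] && echo "installed: $jar"
```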

Now, let’s compile and start the GemFire cluster

$ cd Server
$ ./gradlew serverJar
:compileJava UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:jar
:serverJar

BUILD SUCCESSFUL

$ ./startup.sh

1. Executing - start locator --name=locator --J=-Dgemfire.http-service-port=7575

.............................
Locator in /Users/fmelo/sko/Server/locator on frederimelosmbp[10334] as locator is currently online.
Process ID: 33127
Uptime: 15 seconds
GemFire Version: 8.2.0
Java Version: 1.8.0_40
Log File: /Users/fmelo/sko/Server/locator/locator.log
JVM Arguments: -Dgemfire.enable-cluster-configuration=true -Dgemfire.load-cluster-configuration-from-dir=false -Dgemfire.http-service-port=7575 -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path: /Users/fmelo/gemfire/lib/gemfire.jar:/Users/fmelo/gemfire/lib/locator-dependencies.jar

Successfully connected to: [host=frederimelosmbp, port=1099]

Cluster configuration service is up and running.

2. Executing - start server --name=server1 --cache-xml-file=src/main/resources/server-cache.xml --classpath='../../lib/gemfire-greenplum-1.0.0-beta-6-SNAPSHOT.jar:../../lib/postgresql-9.4-1206-jdbc4.jar:../build/libs/Server.jar' --J=-Dgemfire.start-dev-rest-api=true --J=-Dgemfire.http-service-port=8888 --locators=geode-server[10334]

...........
Server in /Users/fmelo/sko/Server/server1 on frederimelosmbp[40404] as server1 is currently online.
Process ID: 33128
Uptime: 5 seconds
GemFire Version: 8.2.0
Java Version: 1.8.0_40
Log File: /Users/fmelo/sko/Server/server1/server1.log
JVM Arguments: -Dgemfire.cache-xml-file=/Users/fmelo/sko/Server/src/main/resources/server-cache.xml -Dgemfire.locators=geode-server[10334] -Dgemfire.use-cluster-configuration=true -Dgemfire.start-dev-rest-api=true -Dgemfire.http-service-port=8888 -XX:OnOutOfMemoryError=kill -KILL %p -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path: /Users/fmelo/gemfire/lib/gemfire.jar:../../lib/gemfire-greenplum-1.0.0-beta-6-SNAPSHOT.jar:../../lib/postgresql-9.4-1206-jdbc4.jar:../build/libs/Server.jar:/Users/fmelo/gemfire/lib/server-dependencies.jar
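
The later steps depend on the locator (port 10334) and the REST API (port 8888) being reachable. A minimal way to wait for them, assuming bash is available, is sketched below; wait_for_port is a hypothetical helper and only tests that the port accepts a TCP connection, not that GemFire is fully initialised.

```shell
#!/usr/bin/env bash
# Wait until a TCP port accepts connections, or give up after N attempts
# (one attempt per second). Uses bash's /dev/tcp pseudo-device.
wait_for_port() {
    host=$1 port=$2 attempts=$3
    while [ "$attempts" -gt 0 ]; do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep 1
    done
    return 1
}

# Usage, with the ports shown in the startup output above:
#   wait_for_port localhost 10334 30 && echo "locator is up"
#   wait_for_port localhost 8888 30  && echo "REST API is up"
```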

3 - Start the Web Console

  • In case you’re not deploying to Cloud Foundry, export the "locatorHost" and "locatorPort" environment variables to point to your GemFire locator endpoint. They default to "geode-server" and 10334 respectively.

$ export locatorHost=localhost
$ export locatorPort=10334
  • Compile the app

As the GemFire-Greenplum connector is not GA yet, we’ll add the provided bits (under the "lib" directory) to your local Maven repository in order to compile the source code (this requires Maven to be installed):

$ cd /staging/FraudDectDemo/
$ mvn install:install-file -Dfile=lib/gemfire-greenplum-1.0.0-beta-6-SNAPSHOT.jar -DgroupId=io.pivotal.gemfire -DartifactId=gemfire-greenplum -Dversion=1.0.0-beta-6-SNAPSHOT -Dpackaging=jar
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-install-plugin:2.4:install-file (default-cli) @ standalone-pom ---
[INFO] Installing /Users/fmelo/sko/lib/gemfire-greenplum-1.0.0-beta-6-SNAPSHOT.jar to /Users/fmelo/.m2/repository/io/pivotal/gemfire/gemfire-greenplum/1.0.0-beta-6-SNAPSHOT/gemfire-greenplum-1.0.0-beta-6-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.271 s
[INFO] Finished at: 2016-02-01T19:50:39-08:00
[INFO] Final Memory: 8M/309M
[INFO] ------------------------------------------------------------------------
$ cd WebConsole
$ ./gradlew jar
:compileJava UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:jar

BUILD SUCCESSFUL
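
Before running the app, it can help to confirm which locator endpoint the exported variables resolve to. This sketch only mirrors the defaults described above with shell parameter expansion; it is not the application's own lookup code, and locator_endpoint is a hypothetical name.

```shell
#!/bin/sh
# Print the locator endpoint in GemFire's host[port] notation, falling back
# to the documented defaults when the variables are unset or empty.
locator_endpoint() {
    echo "${locatorHost:-geode-server}[${locatorPort:-10334}]"
}

# Usage:
#   export locatorHost=localhost
#   locator_endpoint        # prints localhost[10334]
```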

Run the app

$ cd WebConsole
$ ./gradlew bootRun
(...)
Feb 01, 2016 4:52:51 PM io.pivotal.demo.sko.ui.WebConsoleApp logStarted
INFO: Started WebConsoleApp in 4.958 seconds (JVM running for 5.227)

Make sure you can access the application at http://<host>:8080/index.html
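
From a terminal, the same check could be done with curl. This is a sketch under the assumption that curl is installed; console_url and console_is_up are hypothetical helper names.

```shell
#!/bin/sh
# Build the Web Console URL for a host (port 8080 and /index.html as above).
console_url() {
    echo "http://$1:8080/index.html"
}

# Succeed when the console answers the request without an HTTP error
# (curl -f turns HTTP status codes of 400 and above into a failure).
console_is_up() {
    curl -fsS -o /dev/null "$(console_url "$1")"
}

# Usage:
#   console_is_up localhost && echo "Web Console is reachable"
```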

4 - Generate a few transactions to train the Machine Learning process

We’ll tell the generator to set up the PoS devices and add an initial batch of transactions (100000 in the sample run logged below).

  • Ensure the application.properties file looks like the following:

$ cd PoS_Emulator
$ more src/main/resources/application.properties

# replace with your GemFire/Geode endpoint
geodeUrl=http://localhost:8888/gemfire-api/v1/
delayInMs=5
skipSetup=false
numberOfAccounts=5000

# negative number means it will keep posting continuously
numberOfTransactions=50000

$ ./gradlew bootRun

2016-02-01 17:23:47.075  INFO 33355 --- [           main] i.p.demo.sko.TransactionEmulatorApp      : Starting TransactionEmulatorApp on FrederiMelosMBP with PID 33355 (/Users/fmelo/sko/PoS_Emulator/build/classes/main started by fmelo in /Users/fmelo/sko/PoS_Emulator)
2016-02-01 17:23:47.078  INFO 33355 --- [           main] i.p.demo.sko.TransactionEmulatorApp      : No active profile set, falling back to default profiles: default
2016-02-01 17:23:47.111  INFO 33355 --- [           main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@25bbf683: startup date [Mon Feb 01 17:23:47 PST 2016]; root of context hierarchy
2016-02-01 17:23:47.672  INFO 33355 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2016-02-01 17:23:47.689  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : >>>>> RUNNING SETUP
2016-02-01 17:23:47.689  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : --------------------------------------
2016-02-01 17:23:47.689  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : >>> Geode rest endpoint: http://192.168.9.1:8888/gemfire-api/v1/
2016-02-01 17:23:47.690  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : --------------------------------------
2016-02-01 17:23:47.690  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : >>> Adding 3143 devices ...
2016-02-01 17:23:55.508  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : >>>>> RUNNING SIMULATION
2016-02-01 17:23:55.508  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : --------------------------------------
2016-02-01 17:23:55.509  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : >>> Geode rest endpoint: http://192.168.9.1:8888/gemfire-api/v1/
2016-02-01 17:23:55.509  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : --------------------------------------
2016-02-01 17:23:55.509  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : >>> Posting 100000 transactions ...
2016-02-01 17:48:24.855  INFO 33355 --- [           main] io.pivotal.demo.sko.Emulator             : done
2016-02-01 17:48:24.933  INFO 33355 --- [           main] i.p.demo.sko.TransactionEmulatorApp      : Started TransactionEmulatorApp in 1478.061 seconds (JVM running for 1478.397)
2016-02-01 17:48:24.940  INFO 33355 --- [       Thread-1] s.c.a.AnnotationConfigApplicationContext : Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@25bbf683: startup date [Mon Feb 01 17:23:47 PST 2016]; root of context hierarchy
2016-02-01 17:48:24.954  INFO 33355 --- [       Thread-1] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown

BUILD SUCCESSFUL
  • Create the pos_data and transaction_info objects, which will retrieve the data generated by the transactions

[gpadmin@localhost Server]$ psql -d gemfire -f model2.sql

On the Greenplum server, run

$  psql -d gemfire -f train.sql

You will also configure this to run every 10 minutes using a cron job (next step)
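
As a rough sanity check on how long the initial load in this step takes, the arithmetic can be sketched as follows. It assumes the emulator posts transactions one at a time and sleeps delayInMs between posts, which this README does not state explicitly; min_runtime_seconds is a hypothetical helper.

```shell
#!/bin/sh
# Lower bound, in whole seconds, on the time needed to post N transactions
# when the emulator pauses delay_ms milliseconds between them.
#   $1 = number of transactions, $2 = delay in milliseconds
min_runtime_seconds() {
    echo $(( $1 * $2 / 1000 ))
}

# Usage:
#   min_runtime_seconds 100000 5   # prints 500
```

In the sample log above, the simulation phase ran from 17:23:55 to 17:48:24, about 1469 seconds for 100000 transactions, so under this assumption most of the time went to the requests themselves rather than to the configured delay.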

6 - Set up the Machine Learning training and evaluation on cron

On the Greenplum server, run

[gpadmin@gpdb-sandbox ~]$ chmod u+x /home/gpadmin/*.sh
[gpadmin@gpdb-sandbox ~]$ sudo su
[root@gpdb-sandbox gpadmin]# echo "* *  *  *  * gpadmin  . /home/gpadmin/.bashrc;/home/gpadmin/prediction.sh" >> /etc/crontab
[root@gpdb-sandbox gpadmin]# echo "*/10 *  *  *  * gpadmin  . /home/gpadmin/.bashrc;/home/gpadmin/train.sh" >> /etc/crontab
[root@gpdb-sandbox gpadmin]# /etc/init.d/crond reload;exit

This will make sure the ML model is evaluated every minute and is re-trained every 10 minutes.
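
Note that the echo commands above append to /etc/crontab each time they run, so repeating the setup would schedule the jobs twice. One way to guard against that is sketched below; add_cron_line is a hypothetical helper, not something shipped with the demo.

```shell
#!/bin/sh
# Append a line to a crontab-style file only if an identical line is not
# already there, so re-running the setup does not duplicate the entry.
add_cron_line() {
    file=$1 line=$2
    # -x matches whole lines, -F treats the entry as a literal string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage as root, with the entries from the step above:
#   add_cron_line /etc/crontab "* *  *  *  * gpadmin  . /home/gpadmin/.bashrc;/home/gpadmin/prediction.sh"
#   add_cron_line /etc/crontab "*/10 *  *  *  * gpadmin  . /home/gpadmin/.bashrc;/home/gpadmin/train.sh"
```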

8 - Access the WebConsole and run the emulator to see results

Open a browser and point it to http://localhost:8080/index.html for a local deployment, or to the URL given by Cloud Foundry if deploying there.

Now we’ll configure the generator not to set up the PoS devices (we’ve already done the setup before), set your preferred number of transactions (-1 makes it keep posting continuously) and add the desired delay between transactions (helpful to show scalability):

  • If not using Cloud Foundry, edit the application.properties file to look like the following and start the emulator:

$ cd PoS_Emulator
$ more src/main/resources/application.properties

# replace with your GemFire/Geode endpoint
geodeUrl=http://192.168.9.1:8888/gemfire-api/v1/
delayInMs=50
skipSetup=true
numberOfAccounts=5000

# negative number means it will keep posting continuously
numberOfTransactions=-1

$ ./gradlew bootRun
2016-02-01 16:53:54.764  INFO 33149 --- [           main] i.p.demo.sko.TransactionEmulatorApp      : Starting TransactionEmulatorApp on FrederiMelosMBP with PID 33149 (/Users/fmelo/sko/PoS_Emulator/build/classes/main started by fmelo in /Users/fmelo/sko/PoS_Emulator)
2016-02-01 16:53:54.766  INFO 33149 --- [           main] i.p.demo.sko.TransactionEmulatorApp      : No active profile set, falling back to default profiles: default
2016-02-01 16:53:54.808  INFO 33149 --- [           main] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@25bbf683: startup date [Mon Feb 01 16:53:54 PST 2016]; root of context hierarchy
2016-02-01 16:53:55.450  INFO 33149 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2016-02-01 16:53:55.466  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : >>>>> RUNNING SETUP
2016-02-01 16:53:55.466  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : --------------------------------------
2016-02-01 16:53:55.466  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : >>> Geode rest endpoint: http://192.168.9.1:8888/gemfire-api/v1/
2016-02-01 16:53:55.466  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : --------------------------------------
2016-02-01 16:54:04.909  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : >>>>> RUNNING SIMULATION
2016-02-01 16:54:04.909  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : --------------------------------------
2016-02-01 16:54:04.909  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : >>> Geode rest endpoint: http://192.168.9.1:8888/gemfire-api/v1/
2016-02-01 16:54:04.909  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : --------------------------------------
2016-02-01 16:54:04.909  INFO 33149 --- [           main] io.pivotal.demo.sko.Emulator             : >>> Posting 2147483647 transactions ...
(...)
  • If using Cloud Foundry, use the manifest at PoS_Emulator/manifest.yml to configure the properties and push the app:

$ more manifest.yml
---
applications:
- name: pos_emulator
  memory: 512M
  instances: 1
  host: pos_emulator
  path: build/libs/PoS_Emulator.jar
  no-route: true
  services:
    - gemfire
  env:
    skipSetup: true
    numberOfTransactions: -1
    delayInMs: 50

$ ./gradlew build
:compileJava UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:findMainClass
:jar
:bootRepackage
:assemble
:compileTestJava UP-TO-DATE
:processTestResources UP-TO-DATE
:testClasses UP-TO-DATE
:test UP-TO-DATE
:check UP-TO-DATE
:build
BUILD SUCCESSFUL

$ cf push --no-start
Using manifest file /Users/fmelo/sko/PoS_Emulator/manifest.yml

Creating app pos_emulator in org fmelo-org / space dev as fmelo...
OK

App pos_emulator is a worker, skipping route creation
Uploading pos_emulator...
Uploading app files from: /Users/fmelo/sko/PoS_Emulator/build/libs/PoS_Emulator.jar
Uploading 322.2K, 86 files
Done uploading
OK
Binding service gemfire to app pos_emulator in org fmelo-org / space dev as fmelo...
OK

$ cf set-health-check pos_emulator none
Updating pos_emulator health_check_type to 'none'
OK

$ cf start pos_emulator
(...)
     state     since                    cpu    memory         disk          details
#0   running   2016-02-01 06:33:23 PM   0.0%   692K of 512M   26.7M of 1G
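
For the local (non Cloud Foundry) path, the property changes could be scripted instead of edited by hand. The following is a minimal sketch: set_prop is a hypothetical helper, and it assumes plain key=value lines and values that contain no "|", "&" or backslash characters.

```shell
#!/bin/sh
# Set key=value in a properties file: replace the line if the key is
# already present, append it otherwise.
set_prop() {
    file=$1 key=$2 value=$3
    if grep -q "^$key=" "$file"; then
        sed "s|^$key=.*|$key=$value|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage, from the PoS_Emulator directory:
#   f=src/main/resources/application.properties
#   set_prop "$f" skipSetup true
#   set_prop "$f" numberOfTransactions -1
#   set_prop "$f" delayInMs 50
```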

You can also scale the emulator to several instances in order to show scalability.
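
To get a feel for what scaling buys, the upper bound on the posting rate can be estimated from the delay. This sketch assumes each instance posts one transaction, then waits delayInMs, and ignores request latency; max_tps is a hypothetical helper.

```shell
#!/bin/sh
# Upper bound on transactions per second across all emulator instances
# (integer division).
#   $1 = number of instances, $2 = delay in milliseconds
max_tps() {
    echo $(( $1 * 1000 / $2 ))
}

# Usage:
#   max_tps 1 50   # prints 20
#   max_tps 4 50   # prints 80
#
# On Cloud Foundry, the instance count can be changed with the cf CLI:
#   cf scale pos_emulator -i 4
```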

Let it run for at least one minute while checking your browser. You should notice transactions and possible frauds being shown.

Demo Screenshot
