Apache NiFi Prototype

Using Apache NiFi to perform OCR on PDF files.

NiFi is a system for automating and managing the flow of data between systems. It was developed by the National Security Agency (NSA), where it was called NiagaraFiles, and was later contributed to the Apache Software Foundation. Its main purpose is to automate the data flow between two systems, one of which produces data while the other consumes it.

NiFi is built around guaranteed delivery. It spreads load effectively and sustains high transaction rates. It buffers data, queuing it until it reaches its intended destination, and it supports prioritized queuing for cases where the largest, the newest, or some other class of data should be processed first. The overall goal is a more reliable flow of data between the systems NiFi connects.
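The buffering, back pressure, and prioritized queuing described above can be illustrated with a small sketch. This is not NiFi code: `BoundedPriorityQueue`, `offer`, and `poll` are hypothetical names, and the toy queue only mimics the idea that a full queue pushes back on its producer while more urgent items leave first.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    # Entries are ordered by priority, then by arrival order (seq).
    priority: int
    seq: int
    payload: bytes = field(compare=False)


class BoundedPriorityQueue:
    """Toy queue (hypothetical, not a NiFi class): the lowest priority
    number is served first, and new items are refused when the queue is full."""

    def __init__(self, max_items: int):
        self.max_items = max_items
        self._heap = []
        self._counter = itertools.count()

    def offer(self, payload: bytes, priority: int) -> bool:
        """Return False (back pressure) instead of accepting when full."""
        if len(self._heap) >= self.max_items:
            return False
        heapq.heappush(self._heap, _Entry(priority, next(self._counter), payload))
        return True

    def poll(self) -> bytes:
        """Remove and return the most urgent payload."""
        return heapq.heappop(self._heap).payload

    def __len__(self) -> int:
        return len(self._heap)


queue = BoundedPriorityQueue(max_items=2)
accepted = [
    queue.offer(b"routine", priority=5),
    queue.offer(b"urgent", priority=1),
    queue.offer(b"overflow", priority=1),  # refused: the queue is already full
]
first_out = queue.poll()  # the priority-1 item leaves before the priority-5 item
```

In this sketch the producer sees `False` and is expected to retry later, which is one simple way to model back pressure. NiFi itself configures back pressure thresholds and prioritizers per connection.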

Features

  • Guaranteed delivery
  • Data buffering with back pressure and pressure release
  • Prioritized queuing
  • Flow-specific QoS
  • Lineage and Provenance
  • Fine-grained history
  • Extensible
  • Visual command and control
  • Clustering

Abstractions

  • FlowFile - an object moving through the system consisting of a byte array and a key/value map.
  • FlowFile Processor - software that does a unit of work.
  • Connection - linkage between processors.
  • Flow Controller - broker facilitating exchange of FlowFiles between processors.
  • Process Group - set of processors and connections which has input and output ports.
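To make these abstractions concrete, here is a minimal sketch in plain Python. It is an illustration only and does not use the NiFi API: the `FlowFile` dataclass, the two processor functions, `run_flow`, and the `ocr.status` attribute are all hypothetical, and a `deque` stands in for a connection.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class FlowFile:
    # Content bytes plus a key/value attribute map.
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


# A processor is modeled as a function doing one unit of work on a FlowFile.
Processor = Callable[[FlowFile], FlowFile]


def tag_pdf(flow_file: FlowFile) -> FlowFile:
    """Label the FlowFile as a PDF if its content starts with the PDF magic bytes."""
    is_pdf = flow_file.content.startswith(b"%PDF")
    mime = "application/pdf" if is_pdf else "unknown"
    return FlowFile(flow_file.content, {**flow_file.attributes, "mime.type": mime})


def mark_ocr_pending(flow_file: FlowFile) -> FlowFile:
    """Flag PDFs for a later OCR step (the attribute name is hypothetical)."""
    status = "pending" if flow_file.attributes.get("mime.type") == "application/pdf" else "skipped"
    return FlowFile(flow_file.content, {**flow_file.attributes, "ocr.status": status})


def run_flow(source: Deque[FlowFile], processors: List[Processor]) -> Deque[FlowFile]:
    """Move every FlowFile through each processor; each deque acts as a connection."""
    current = source
    for processor in processors:
        downstream: Deque[FlowFile] = deque()
        while current:
            downstream.append(processor(current.popleft()))
        current = downstream
    return current


inbox = deque([FlowFile(b"%PDF-1.7 sample", {"filename": "scan.pdf"})])
result = run_flow(inbox, [tag_pdf, mark_ocr_pending])
```

Here `run_flow` plays the role of a very simplified flow controller, and the list of processors with its input and output queues loosely corresponds to a process group.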

Questions

  • What is the NiFi CA server?
  • How to start Zookeeper with Terraform?
  • What is Ranger authentication?
  • How are processes checkpointed?
  • How are flow versions controlled?
  • In the editing window, can there be read-only users?

Concepts

  • Flow Development Life Cycle - FDLC

Metrics

  • Is flow running correctly?
  • Number of flow files?
  • Size of flow files?
  • Size of queues?
  • Memory utilization?
  • CPU utilization?
  • Disk utilization?
  • Error counts?
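One way to turn a few of these questions into checks is sketched below. It assumes the numbers have already been collected somehow: the `QueueSnapshot` fields, the `find_problems` helper, and the thresholds are all hypothetical and do not correspond to any NiFi API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueSnapshot:
    # Hypothetical point-in-time measurements for one queue.
    name: str
    flow_file_count: int
    queued_bytes: int
    error_count: int


def find_problems(snapshots: List[QueueSnapshot], max_count: int, max_bytes: int) -> List[str]:
    """Return one human-readable message per threshold that is exceeded."""
    problems = []
    for snap in snapshots:
        if snap.flow_file_count > max_count:
            problems.append(f"{snap.name}: {snap.flow_file_count} flow files queued")
        if snap.queued_bytes > max_bytes:
            problems.append(f"{snap.name}: {snap.queued_bytes} bytes queued")
        if snap.error_count > 0:
            problems.append(f"{snap.name}: {snap.error_count} errors")
    return problems


snapshots = [
    QueueSnapshot("fetch-to-ocr", flow_file_count=12, queued_bytes=4_000, error_count=0),
    QueueSnapshot("ocr-to-store", flow_file_count=250, queued_bytes=90_000, error_count=3),
]
problems = find_problems(snapshots, max_count=100, max_bytes=50_000)
```

An empty `problems` list would suggest the flow is within the chosen limits. Memory, CPU, and disk utilization would need separate host-level measurements.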

Links

NiFi People

  • Andy LoPresto - @yolopey - member of Apache NiFi PMC
  • Bryan Bende - @bbende - member of Apache NiFi PMC
  • Mark Payne - @dataflowmark - member of Apache NiFi PMC
  • Matt Burgess - @mattyb149 - member of Apache NiFi PMC
  • Matt Gilman - @mattgilman - member of Apache NiFi PMC
  • Pierre Villard - @pvillard31 - member of Apache NiFi PMC
  • Steven Koon - [email protected]
  • Yolanda M. Davis - [email protected]