DISCONTINUATION OF PROJECT
This project will no longer be maintained by Intel.
Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
Intel no longer accepts patches to this project.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
Contact: [email protected]
Googletest changed its library name in recent versions, so our latest code will not link against older versions. Please update googletest if you see link errors. (Feb. 6, 2019) Linking changed again for googletest. (Oct. 16, 2019)
Now includes integration with OPAE (Oct. 8, 2018). For an example, see vectoradd in the chisel subdirectory.
This project includes code and documentation to help software programmers and hardware engineers efficiently use platforms that include multi-core servers and FPGAs.
One challenge in using such systems is to utilize the FPGA's resources effectively given the bandwidth and latency constraints imposed by the system. The link between the Accelerator Functional Unit (AFU) and the rest of the system is limited (at least) by the clock frequency and bit width of the interface. Read latency for items requiring DRAM access can be scores to hundreds of times longer than the FPGA clock period. Our design methodology can help you develop applications that hide this latency and utilize the link at full system bandwidth.
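As a rough illustration of why latency hiding matters, the sketch below applies Little's law (requests in flight = issue rate × latency) to estimate how many read requests must be outstanding to keep the link busy. The latency values and the one-request-per-cycle issue rate are assumptions chosen for illustration, not measurements of any platform, and the function names are hypothetical.

```python
import math


def outstanding_requests_needed(latency_cycles, requests_per_cycle=1.0):
    """Little's law: requests in flight = issue rate * latency."""
    return math.ceil(latency_cycles * requests_per_cycle)


def link_utilization(in_flight, latency_cycles, requests_per_cycle=1.0):
    """Fraction of peak link bandwidth reached with a given number in flight."""
    needed = latency_cycles * requests_per_cycle
    return min(1.0, in_flight / needed)


if __name__ == "__main__":
    # Assumed latencies, in FPGA clock cycles ("scores to hundreds").
    for latency in (40, 200):
        needed = outstanding_requests_needed(latency)
        blocking = link_utilization(1, latency)
        print(f"latency={latency} cycles: need {needed} requests in flight; "
              f"a single blocking read reaches {blocking:.1%} of peak")
```

Under these assumptions, an AFU that waits for each response before issuing the next request uses only a small fraction of the link, which is why address generation and data consumption are split into separate concurrent threads in the example later in this document.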
We use SystemC to express the parallelism available in the AFU, and rely on commercial High-Level Synthesis (HLS) tools to map this hardware description to Verilog (needed by the FPGA synthesis, place, and route tools). We provide code generation flows to help new SystemC users become proficient quickly. Standardized memory interfaces (optimized for both streaming and random access patterns) are also provided, as well as schemes for creating multiple parallel AFUs that interface to the same (single) memory system.
After describing the design's memory interfaces and its SystemC module and process structure in a Python-based DSL like this,
from cog_acctempl import *
dut = DUT("vectoradd")
dut.add_ut( UserType("Blk",[ArrayField(UnsignedIntField("words"),16)]))
dut.add_rds( [TypedRead("Blk","ina","__inp_Slots__","1 << 30","1")])
dut.add_rds( [TypedRead("Blk","inb","__inp_Slots__","1 << 30","1")])
dut.add_wrs( [TypedWrite("Blk","out")])
dut.add_extra_config_fields( [BitReducedField(UnsignedIntField("n"),32)])
dut.module.add_cthreads( [CThread("fetcher",writes_to_done=True),
                          CThread("ina_addr_gen",ports=[RdReqPort("ina")]),
                          CThread("inb_addr_gen",ports=[RdReqPort("inb")]),
                          CThread("out_addr_gen",ports=[WrReqPort("out")])])
dut.get_cthread( "fetcher").add_ports( [RdRespPort("ina"),
                                        RdRespPort("inb"),
                                        WrDataPort("out")])
you need to write code like this to complete the functionality of a vector addition accelerator.
void fetcher() { // (generated)
  inaRespIn.reset_get(); // type: MemTypedReadRespType<Blk> (generated)
  inbRespIn.reset_get(); // type: MemTypedReadRespType<Blk> (generated)
  outDataOut.reset_put(); // type: MemTypedWriteDataType<Blk> (generated)
  unsigned int ip = 0;
  done = false; // (generated)
  wait(); // (generated)
  while (1) { // (generated)
    if ( start) { // (generated)
      // check if it was the last one (we process 16 elements at a time)
      if ( ip != (config.read().get_n() >> 4)) {
        // read two Blk objects, vector add (from class Blk), and write
        outDataOut.put( inaRespIn.get().data + inbRespIn.get().data);
        ++ip;
      } else {
        done = true;
      }
    } // (generated)
    wait(); // (generated)
  } // (generated)
} // (generated)
In all, more than 700 lines of SystemC infrastructure code, in addition to the entire memory subsystem, are generated from fewer than 90 lines of Python DSL spec and kernel code.
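To make the intended behavior of the kernel above concrete, here is a minimal software reference model in plain Python. It mirrors the loop bound `n >> 4` and the 16-word `Blk` from the DSL spec. The 32-bit wraparound for each word is an assumption (the spec above does not state the word width), and the names `blk_add` and `vectoradd_model` are hypothetical, not part of this project.

```python
WORDS_PER_BLK = 16       # matches ArrayField(UnsignedIntField("words"), 16)
WORD_MASK = 0xFFFFFFFF   # assumption: each word is a 32-bit unsigned integer


def blk_add(a, b):
    """Element-wise add of two 16-word blocks with unsigned wraparound."""
    return [(x + y) & WORD_MASK for x, y in zip(a, b)]


def vectoradd_model(ina, inb, n):
    """Software model of the fetcher loop: process n >> 4 whole blocks."""
    out = []
    for ip in range(n >> 4):
        lo = ip * WORDS_PER_BLK
        hi = lo + WORDS_PER_BLK
        out.extend(blk_add(ina[lo:hi], inb[lo:hi]))
    return out


if __name__ == "__main__":
    a = list(range(32))
    b = [1] * 32
    print(vectoradd_model(a, b, 32))
```

Because the loop bound is `n >> 4`, any trailing elements beyond a multiple of 16 are not processed, in the model as in the kernel. A model like this could serve as the expected-value source in a testbench.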
Download SystemC and Googletest, then compile and install them. Set the environment variables SC_DIR, GTEST_DIR, and HLD_ROOT.
See the wiki setup page.
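Before building, it can help to confirm that the three variables are set and point at real directories. The following is a small, hypothetical preflight check, not a script shipped with this project. It only verifies that each variable names an existing directory and makes no assumption about what the directories contain.

```python
import os

REQUIRED_VARS = ("SC_DIR", "GTEST_DIR", "HLD_ROOT")


def missing_or_invalid(env=None):
    """Return a list of problems found with the required variables."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name}={value} is not a directory")
    return problems


if __name__ == "__main__":
    found = missing_or_invalid()
    for problem in found:
        print(problem)
    if not found:
        print("environment looks ready")
```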
cd $HLD_ROOT/tutorials/memcpy/systemc; make
See the Wiki for tutorials and more documentation.
File an Issue for questions and bug sightings.
This design infrastructure is a component of Intel's Hardware Accelerator Research Program V2.
Copyright (c) 2016, Intel Corporation All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.