1. Introduction
---------------

This archive contains the design files for our FPGA optimized
Network-on-Chip (NoC) architecture and the Wishbone <-> NoC bridge.

2. License
----------

This work uses the MIT license. Although it only talks about
Software, in the context of the license you may consider the RTL
source code to be software.

Copyright (c) 2007 Andreas Ehliar

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

3. Software used
----------------

To use the files you will probably need Linux. You also need ISE
(tested with at least ISE 9.1) and ModelSim (most versions should
work, I guess). In some cases you will need Verilog-Perl as well.

4. NoC protocol
---------------

The NoC protocol itself is very simple: it is packet switched and
based on the fire-and-forget model. Once a packet has been accepted
by the network, it is guaranteed to be delivered to its destination.

Protocol: (for a NoC with DATAWIDTH `defined to 36)

Bits | `define in noc_pkt.vh | Explanation
-----+-----------------------+---------------------------------------------------------
35-0 | DATAWIDTH             | Data
40-36| DESTWIDTH             | Destination ID
41   | LASTBIT               | Last word in transaction
44-41| NEXTHOPWIDTH          | Destination in current switch (one hot coded)
45   | SENDBIT               | Set to 1 to send the current word
46   | SENDOKBIT             | Signals that this port is ready to receive more data
-----+-----------------------+---------------------------------------------------------

5. Routing
----------

Bits 44-41 merit an additional explanation. Normally a route lookup
happens in the current switch to decide where the packet should go.
Unfortunately the route lookup turned out to be in the critical path.
Therefore the route lookup happens in the previous node.

        0      1
        |      |
       +-+    +-+
  7 ---|a|----|b|--- 2
       +-+    +-+
        |      |
       +-+    +-+
  6 ---|c|----|d|--- 3
       +-+    +-+
        |      |
        5      4

Consider an example where node 4 wants to send a packet to node 0 and
X-Y routing is used. Node 4 would therefore say <WEST, 0>, where WEST
is the direction the packet should take right now and 0 is the
ultimate destination. Switch 'd' would then send the packet to the
west and at the same time perform the next route lookup and send
<NORTH, 0> to switch 'c'.

Since NEXTHOPWIDTH is one-hot-coded, what node 4 would say would
actually be <4'b1000, 5'd0>. The encoding of nexthop is WEST, NORTH,
EAST, SOUTH, LOCAL. (There is no way to send a packet back the way it
came, so that particular bit is removed from the NEXTHOP vector,
reducing it to only 4 bits.) In the case of the south port of switch
'd', the nexthop bits would signify <WEST, NORTH, EAST, LOCAL>.

6. Topology
-----------

Any topology can be used as long as it has a deadlock free routing
algorithm. (Well, you could probably also create a topology which does
not utilize deadlock free routing as long as the traffic patterns are
guaranteed to never generate deadlocks. Not something I would
recommend though.)

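To make the nexthop encoding from section 5 a bit more concrete, here
is a small Python sketch of it. This is not part of the distribution:
the names DIRECTIONS, nexthop_onehot and as_verilog are made up for
this example. It also assumes that the bit for the incoming direction
is simply dropped and that the remaining directions keep their order
with the first one in the most significant bit. This README only
confirms that for the south port (WEST = 4'b1000), so check the RTL
before relying on it for the other ports.

```python
# Direction order as listed in section 5, most significant bit first.
DIRECTIONS = ["WEST", "NORTH", "EAST", "SOUTH", "LOCAL"]


def nexthop_onehot(in_port, out_dir):
    """Return the 4-bit one-hot nexthop value for a word entering on in_port.

    Assumption (only confirmed for the south port): the incoming direction
    is removed from the list and the remaining four directions keep their
    relative order, with the first one mapped to bit 3.
    """
    if in_port not in DIRECTIONS or out_dir not in DIRECTIONS:
        raise ValueError("unknown direction")
    if in_port == out_dir:
        raise ValueError("a packet cannot be sent back the way it came")
    remaining = [d for d in DIRECTIONS if d != in_port]
    # remaining[0] -> bit 3 ... remaining[3] -> bit 0
    return 1 << (3 - remaining.index(out_dir))


def as_verilog(value):
    """Format a 4-bit value as a Verilog binary literal."""
    return "4'b{:04b}".format(value)


if __name__ == "__main__":
    # Node 4 sits on the south port of switch 'd' and wants to go west.
    print(as_verilog(nexthop_onehot("SOUTH", "WEST")))
```

With the example from section 5 (south port of switch 'd', packet
heading west) the sketch gives 4'b1000, which matches what node 4 is
said to send above.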
All example scripts in this distribution use X-Y based routing.
(Mainly because it is easy to understand and debug.)

7. Additional protocol information
----------------------------------

There is unfortunately no further protocol documentation available
here. If the esteemed reader of this document is familiar with
academia he or she will know that most research projects are usually
not so polished; they are just worked on until they are good enough.
There is seldom any time left to really clean up the research project
except in a few rare cases. This NoC project is no exception to this.

If you are interested in the protocol, take a look at the files in
"tb" for a test source and a test sink that should be fairly
understandable.

8. NoC switch design
--------------------

Most of the components of the NoC switch are instantiated from FPGA
primitives. Initially this was because we wanted to floorplan the
design to a fixed pattern and didn't want all the nodes in the
synthesized design to change names continually. It turned out that the
performance of our RLOCed design was almost the same as the
performance of a non-floorplanned design, so almost all RLOCs were
removed. The manual instantiation of FPGA components is still left
however. (If anyone wants to try their hand at RLOC again they are
certainly welcome to try it out...)

Our FPL2007 paper has some additional information about the design as
well which we hesitate to reproduce here due to copyright reasons.

9. Wishbone bridge
------------------

The Wishbone bridge has this general architecture:

NoC -->   FIFO     -->      FIFO for read requests
           |                 |
        +--+-----------------+---+
        |    Wishbone Master     |-------------> Wishbone bus
        +------------------------+
                    |
        +-----------+     <-- Path for read replies
        |
        |
        V
        +------------------------+
NoC <---+     Wishbone slave     |<------------- Wishbone bus
        +------------------------+

A read request is automatically queued in the read request FIFO
whereas a write request is serviced directly.
A read request will only be serviced if the NoC indicates that it is
able to receive additional data.

It is important to make sure that the read request FIFO is large
enough to hold all read requests which can arrive at it. If not, a
deadlock might occur. Otherwise, we believe this design to be
deadlock-free as long as a deadlock free routing algorithm is used,
since all messages that are sent are guaranteed to be accepted (sooner
or later) by the endnodes.

However, a slave on a Wishbone bus could still cause a deadlock by
doing something stupid. An example could be an accelerator which will
not accept a value written to it until it has written (or read) a
couple of memories from the Wishbone bus. (Note that this would
deadlock on a normal bus as well, without any NoC involved.)

Also, the Wishbone bridge does not handle the rty and err signals.
(Handling them in a Wishbone compliant way would be hideously
expensive, at least for writes, since posted writes would no longer be
allowed.)

Also, in order to improve read performance, the slave Wishbone
interface of the bridge has been augmented with a signal that tells
how many words to request.

Finally, there is a route lookup table in the bridge so that the
nexthop (see section 5) field can be filled in correctly.

The bridge should probably be cleaned up and optimized. The NoC switch
itself has been optimized fairly well, but the WB bridge turned out to
be far trickier than expected and only rudimentary optimizations have
been done. (A clean redesign might be in order at some point, taking
into account all the experience gained from the first implementation.)

NOTE: The Wishbone bridge requires DATAWIDTH to be 36!

10. Makefile targets
--------------------

clean:      As usual
sx35.bit:   Build a 2x2 NoC with 12 memories and 12 transaction
            generators. The UCF file is designed to work with AVnet's
            XC4VSX35-10 board. Use util/talk.c to test the design
            briefly.
sx35.sim:   Simulate the above mentioned NoC.
tb_pkt.sim: Simulate a 2x2 NoC
testall:    Run a couple of testbenches. Results should be in the
            testlog directory.

If you want to use ISE instead it should be easy to create sx35.bit
from an ISE project. The other targets might be a bit more difficult
(testall in particular).

11. Noteworthy files/directories
--------------------------------

bus:           Contains some Wishbone bus components
generated:     The testbenches put some generated files here
include:       Definition file for the NoC. Configure the bitwidth
               here.
noc_pkt:       The implementation of a NoC switch
README:        You are reading it... :)
serial_wb:     A small 8-bit MCU based system meant to interface a
               Wishbone bus to an RS232 interface.
simulator:     This directory contains the compiled RTL model when
               simulating it
sx35:          The design for the AVnet SX35 board
synthdir:      The synthesized files will appear here
tb:            Testbench files
testlog:       Testbench output goes here
tests:         Some small regression tests
wbmaster:      A transaction generator that will write to memory and
               read back the content to see if it is identical
wbslave:       A Wishbone slave containing a byte writable memory
               consisting of 8 RAMB16.
wrapper:       The Wishbone bridge
Documentation: Mostly missing in action...

12. Final words
---------------

Even though there are some limitations and the documentation is sadly
lacking, we still hope that these designs might be useful for people,
especially in the NoC research field, who want to compare their
architecture to other architectures.

The NoC switches should not be very hard to get working at least. The
Wishbone bridge will be a bit more difficult. Your best bet is to look
at the existing design/testbench to see how it is used.

Do bear in mind that this should be considered Alpha code at the
moment.

If you are wondering about anything specific, feel free to send an
email and ask about it and I'll see if I can answer it.
If I get lots of email about this, chances are I will be motivated to
update the documentation instead of answering every email separately.

/Andreas