RISE Lab NITT

Equipment Details

The gem5 Simulator System

  • A modular platform for computer system architecture research.

  • Encompasses system-level architecture as well as processor microarchitecture.

  • Full-system capability.

Bluespec

  • A high-level functional hardware description programming language.

  • It leads to shorter, more abstract, and verifiable (provably correct) source code.

  • More than 50% improvement compared to conventional design methods.

Altera Modelsim

  • Recommended for simulating all FPGA designs

  • 33 percent faster simulation performance than ModelSim®-Altera® Starter Edition software

  • No line limitations

Xilinx ISE Design Suite

  • Xilinx ISE (Integrated Synthesis Environment) is a software tool produced by Xilinx for synthesis and analysis of HDL designs, enabling the developer to synthesize their designs, perform timing analysis, examine RTL diagrams, simulate a design's reaction to different stimuli, and configure the target device with the programmer.

  • It is a design environment for FPGA products from Xilinx. It is tightly coupled to the architecture of those chips and cannot be used with FPGA products from other vendors.

  • Primarily used for circuit synthesis and design, while the ModelSim logic simulator is used for system-level testing.

  • Other components shipped with the Xilinx ISE include the Embedded Development Kit (EDK), a Software Development Kit (SDK) and ChipScope Pro.

CACTI (Cache Access Cycle Time Indicator)

  • Cache and memory access time indicator

  • Capable of modelling direct-mapped and set-associative caches.

  • Capable of estimating

    • Cycle time

    • Access time

    • Area

    • Leakage power

Hardware Components

 

  • Intel Galileo Boards

    • Galileo is a microcontroller board based on the Intel® Quark SoC X1000 Application Processor, a 32-bit Intel Pentium-class system on a chip. It’s the first board based on Intel® architecture designed to be hardware and software pin-compatible with Arduino shields designed for the Uno R3.

    • It is also software compatible with the Arduino Software Development Environment (IDE), which makes usability and introduction a snap.

    • In addition to Arduino hardware and software compatibility, the Galileo board has several PC industry standard I/O ports and features to expand native usage and capabilities beyond the Arduino shield ecosystem.

    • A full sized mini-PCI Express slot, 100Mb Ethernet port, Micro-SD slot, RS-232 serial port, USB Host port, USB Client port, and 8MByte NOR flash come standard on the board.

  • NXP LPC 11U24 microcontroller boards

    • The mbed Microcontrollers are a series of ARM microcontroller development boards designed for rapid prototyping.

    • The mbed NXP LPC11U24 Microcontroller in particular is designed for prototyping low cost USB devices, battery powered applications and 32-bit ARM® Cortex™-M0 based designs.

    • It is packaged as a small DIP form-factor for prototyping with through-hole PCBs, strip board and breadboard, and includes a built-in USB FLASH programmer.

  • NXP LPC 1768 microcontroller boards

    • The mbed NXP LPC1768 Microcontroller in particular is designed for prototyping all sorts of devices, especially those including Ethernet, USB, and the flexibility of lots of peripheral interfaces and FLASH memory.

    • It is packaged as a small DIP form-factor for prototyping with through-hole PCBs, strip board and breadboard, and includes a built-in USB FLASH programmer.

    • It is based on the NXP LPC1768, with a 32-bit ARM Cortex-M3 core running at 96MHz. It includes 512KB FLASH, 32KB RAM and lots of interfaces including built-in Ethernet, USB Host and Device, CAN, SPI, I2C, ADC, DAC, PWM and other I/O interfaces.

  • Zed Boards

    • ZedBoard is a low-cost development board for the Xilinx Zynq™-7000 All Programmable SoC (AP SoC).

    • This board contains everything necessary to create a Linux, Android, Windows® or other OS/RTOS based design.

    • Additionally, several expansion connectors expose the processing system and programmable logic I/Os for easy user access.

    • Designers can take advantage of the Zynq-7000 AP SoC’s tightly coupled ARM® processing system and 7 series programmable logic to create unique and powerful designs with the ZedBoard.

    • The ZedBoard kit is supported by the www.zedboard.org community website where users can collaborate with other engineers also working on Zynq designs.

  • Zybo Boards

    • The ZYBO (Zynq Board) is a feature-rich, ready-to-use, entry-level embedded software and digital circuit development platform built around the smallest member of the Xilinx Zynq-7000 family, the Z-7010.

    • It is based on the Xilinx All Programmable System-on-Chip (AP SoC) architecture, which tightly integrates a dual-core ARM Cortex-A9 processor with Xilinx 7-series Field Programmable Gate Array (FPGA) logic.

    • When coupled with the rich set of multimedia and connectivity peripherals available on the ZYBO, the Zynq Z-7010 can host a whole system design. The on-board memories, video and audio I/O, dual-role USB, Ethernet and SD slot will have your design up-and-ready with no additional hardware needed.

    • Additionally, six Pmod connectors are available to put any design on an easy growth path. The ZYBO provides an ultra-low cost alternative to the ZedBoard for designers that don't require the high-density I/O of the FMC connector, but still wish to leverage the massive processing power and extensibility of the Zynq AP SoC architecture.

List of M.Tech Projects in the RISE Lab

 

S.No | Roll No | Name | Project Title | Date
1 | 206110022 | Srinivas V.V. | Enhancing QoS For Efficient Video Streaming In A Typical Cloud Environment | Dec-11
2 | 206110018 | Chaitanya Vagga | Block Scheduling For Multicore Platform | Dec-11
3 | 206110022 | Srinivas V.V. | Performance Evaluation of Stream Log Collection Using HADOOP Distributed File System | May-12
4 | 206110018 | Chaitanya Vagga | Deadlock Free Scheduling For Distributed Systems | May-12
5 | 206112028 | V.V.Varadhan | Hardware Assisted Scheduler for Multicore Architecture | Dec-13
6 | 206112001 | Divya Patel | Automation of Power and Performance Validation Flow of Server Processor | May-14
7 | 206112028 | V.V.Varadhan | A Prototype of Multi-core Cryptographic Processor with Hardware Scheduler | May-14
8 | 206113004 | T. Vidya | RABED: Design of a reconfigurable associativity and block size embedded dynamic data cache | July-14
9 | 206113009 | Ankam Koti Veera Kumar | Implementation of parallel Irregular and Recursive Algorithms using OpenMP | July-14
10 | 206113004 | T. Vidya | Design of an interconnect topology for multi-cores and scale-out workloads | Dec-14
11 | 206113009 | Ankam Koti Veera Kumar | Design and implementation of Lightstor protocol controller | Dec-14
12 | 206113004 | T. Vidya | Scaling a NoC topology and implementation of routing algorithms | May-15
13 | 206113009 | Ankam Koti Veera Kumar | Parallel Implementation of ACO with local search for Multi-Dimensional Knapsack Problem | May-15
14 | 206114023 | S. Hemanthkumar | Mobile application for voice Messaging | July-15
15 | 206114026 | Revathi Uddaraju | Implementation of MSI Cache coherence Protocol using Bluespec in NoC-(cache) | July-15
16 | 206114004 | Prakash Borkar | Implementation of MSI Cache coherence Protocol using Bluespec in NoC-(Directory) | July-15
17 | 206114023 | S. Hemanthkumar | Improving performance of H.264 video encoding on CPU+GPU systems | Dec-15
18 | 206114004 | Prakash Borkar | Dynamic Cache Reconfiguration for Improved Performance | Dec-15
19 | 206114026 | Revathi Uddaraju | Design of fault tolerance framework for faults that occur in SoC | Dec-15
20 | 206114023 | S. Hemanthkumar | Improving Performance of Text data Compression on CPU+GPU Systems | May-15
21 | 206114004 | Prakash Borkar | Design of dynamic reconfigurable cache to improve energy efficiency | May-15
22 | 206114026 | Revathi Uddaraju | Fault tolerance system for information and time redundancy | May-15
23 | 206115023 | M Karthikeyan | CUDA Implementation of Speech Processing Algorithms | May-16
24 | 206115023 | M Karthikeyan | Scaling Existing Lock Based Applications using Adaptive Lock Elision | Dec-16

 

Ongoing Projects in RISE Lab 2014-2017

Performance Analysis of Deep Architectures

A manycore architecture integrates a large number of processing elements to improve performance while staying within power constraints. Accelerating applications on heterogeneous manycore computing elements involves a large amount of memory copying, computation and thread management. Applications of manycore architectures range from desktop computers to warehouse-scale computers. Deep learning applications exploit the full capability of manycore architectures such as GPUs to handle their computational complexity. Visual understanding and speech processing are two major applications of deep learning that require efficient deep architecture models to train the system for prediction and classification tasks. Hybrid deep architecture models comprising convolutional neural networks and recurrent neural networks are used to achieve good accuracy in computer vision tasks.

Designing a trustworthy system in programmable SOC

An embedded system is an electronic system in which software is embedded in computer hardware. Embedded systems are an evergreen technology used everywhere in modern life, from consumer electronics, education, telecommunication and home appliances to transportation, industry, medicine and the military. An embedded system may be programmable or non-programmable depending on the application.

The complexity of real-time systems and time-to-market pressure are increasing tremendously because of demands for speed and adaptability. A single-processor SoC is no longer a sufficient solution for such designs, so multiprocessor SoCs are essential today. A framework has been proposed to manage resources, reduce power and design a trustworthy system in a programmable SoC for the many components that are plugged into the FPGA. As chips increase in complexity, trustworthy processing of sensitive information becomes increasingly difficult to achieve because of extensive on-chip resource sharing and the lack of corresponding protection mechanisms.

A Physical Unclonable Function (PUF) is a function with certain desirable properties: it should be easy to make but “impossible” to duplicate. A PUF is basically a variability-aware circuit that detects the mismatch in circuit components caused by manufacturing process variation. If a PUF circuit is instantiated on several different chips, each instantiation is expected to produce a unique response when supplied with the same challenge. The challenges in designing a security primitive like a PUF are multiple: achieving reliability, cryptographic strength, resistance to attacks, low power consumption, small area and easy system-level integration.
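The challenge-response behaviour of a PUF can be illustrated with a small software model. The sketch below uses the additive delay model commonly used to describe arbiter PUFs; it is not the circuit used in this project. The class name `ArbiterPufModel`, the `chip_seed` parameter (a stand-in for manufacturing variation) and the Gaussian delay values are all assumptions made for illustration. The model is also noise-free, so it says nothing about reliability.

```python
import random


class ArbiterPufModel:
    """Toy additive-delay model of an arbiter PUF (illustrative only)."""

    def __init__(self, n_stages, chip_seed):
        # The seed stands in for per-chip manufacturing process variation.
        rng = random.Random(chip_seed)
        # One delay-difference weight per stage plus a final bias term.
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]

    def response(self, challenge):
        """Return one response bit for a list of 0/1 challenge bits."""
        n = len(challenge)
        # Parity features: phi[i] is the product of (1 - 2*c[j]) for j >= i.
        phi = [1.0] * (n + 1)
        for i in range(n - 1, -1, -1):
            phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
        delay_difference = sum(w * p for w, p in zip(self.weights, phi))
        return 1 if delay_difference > 0 else 0


def response_word(puf, challenges):
    """Collect the response bits of one chip for a list of challenges."""
    return [puf.response(c) for c in challenges]


def hamming_fraction(a, b):
    """Fraction of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    challenges = [[rng.randint(0, 1) for _ in range(64)] for _ in range(256)]
    chip_a = ArbiterPufModel(64, chip_seed=1)
    chip_b = ArbiterPufModel(64, chip_seed=2)
    diff = hamming_fraction(response_word(chip_a, challenges),
                            response_word(chip_b, challenges))
    print(f"fraction of differing response bits between chips: {diff:.2f}")
```

Two model instances built with different seeds receive the same challenges but produce different response words, which mirrors the uniqueness property described above.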

Enhancing Performance of On-Chip Cache for Multicore Architecture

Modern embedded systems must be designed to keep pace with rapid changes in speed and technology. A multi-core processor is a single computing component with two or more independent processing units called “cores”, which read and execute program instructions. The instructions are ordinary CPU instructions such as add, move data, and branch, but the multiple cores can run multiple instructions at the same time, increasing overall speed for programs amenable to parallel computing. The most important concern in the design of low-power embedded applications is reducing the energy consumed by on-chip processor caches, as the on-chip cache consumes approximately 40% of the power fed to the processor. On-chip cache size grows as feature sizes shrink, which in turn increases overall energy consumption. Multilevel cache hierarchies (L1, L2, L3 and L4) were introduced to reduce energy consumption. Cache energy consumption has several components. It can be reduced by optimizing cache access, dynamically partitioning caches, reconfiguring caches and predicting tag access. Each of these contributes significantly to the energy consumed by caches, so optimizing any one of them will reduce it.
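The effect of reconfiguring a cache can be explored with a small trace-driven model. The following is a minimal sketch, assuming LRU replacement and a byte-addressed trace; the class `SetAssociativeCache` and the helper `miss_count` are hypothetical names, and the model counts hits and misses only. It does not estimate energy, for which a tool such as CACTI would be needed.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Trace-driven cache model with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, block_bytes, ways):
        if size_bytes % (block_bytes * ways) != 0:
            raise ValueError("size must be a multiple of block_bytes * ways")
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = size_bytes // (block_bytes * ways)
        # Each set maps tag -> True, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Access one byte address; return True on a hit, False on a miss."""
        block = address // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        self.misses += 1
        return False


def miss_count(trace, size_bytes, block_bytes, ways):
    """Run a list of addresses through a fresh cache and return the misses."""
    cache = SetAssociativeCache(size_bytes, block_bytes, ways)
    for address in trace:
        cache.access(address)
    return cache.misses


if __name__ == "__main__":
    # Two addresses that map to the same set in a 1 KB direct-mapped cache.
    trace = [0, 1024] * 4
    for ways in (1, 2):
        print(ways, "way(s):", miss_count(trace, 1024, 16, ways), "misses")
```

With `ways=1` the model behaves as a direct-mapped cache; raising `ways` at constant size trades sets for associativity, which is the kind of knob a reconfigurable cache exposes.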

Design Space Exploration for Architectural Synthesis

System designers develop models in high-level languages such as C or C++, which offer higher levels of abstraction. This makes verification of the model and its functionality easy and also makes the code reusable. To develop the corresponding hardware, hardware designers must analyze the high-level code and select suitable hardware for the given code. An important challenge in converting this high-level design to an equivalent hardware design is the large number of design possibilities that need to be considered. The design spaces usually involve multiple metrics of interest such as timing, resource usage, energy usage and cost, and multiple design parameters such as the number and type of processing cores, the sizes and organization of memories, the interconnect, and scheduling and arbitration policies. The relation between design choices on the one hand and metrics of interest on the other is often very difficult to establish because of aspects such as concurrency, dynamic application behavior and resource sharing. No single modeling approach or analysis tool is fit to cope with all the challenges of modern hardware design. The architecture should be optimized to achieve the best trade-offs in the selected metrics of interest. This selection process is called Design Space Exploration and it is iterative in nature. It includes a vast set of design choices and relies largely on the decision of the architect. The number of design constructs for a particular design is huge and it exponentially increases the problem complexity. Design Space Exploration involves optimizing the design by selecting components that minimize or maximize the metrics of interest as needed. This optimization problem has multiple, often conflicting objectives that need to be achieved during design. For example, the optimization problem may need to minimize power consumption under an execution time constraint or vice versa.
An exploration algorithm that can achieve performance equivalent to complete exploration of the design space within a practical execution time needs to be developed.
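The multi-objective nature of the problem can be made concrete with a small exhaustive exploration. The sketch below enumerates a tiny design space, keeps the Pareto-optimal points, and then picks the lowest-power design that meets an execution time constraint. The `evaluate` function and all of its constants are hypothetical analytical models invented for illustration; a real flow would obtain these metrics from synthesis or simulation, and the exhaustive search shown here is exactly what a practical exploration algorithm aims to avoid.

```python
from itertools import product


def evaluate(cores, cache_kb):
    """Hypothetical cost models; the constants are illustrative only."""
    exec_time = 100.0 / cores + 40.0 / cache_kb   # arbitrary time units
    power = 1.5 * cores + 0.05 * cache_kb         # arbitrary power units
    return exec_time, power


def explore(core_options, cache_options):
    """Exhaustively evaluate every combination of the design parameters."""
    return [
        {"cores": c, "cache_kb": k, "metrics": evaluate(c, k)}
        for c, k in product(core_options, cache_options)
    ]


def dominates(a, b):
    """True if a is no worse than b in every metric and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the design points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q["metrics"], p["metrics"]) for q in points)
    ]


def min_power_under_deadline(points, deadline):
    """Lowest-power design whose execution time meets the deadline."""
    feasible = [p for p in points if p["metrics"][0] <= deadline]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p["metrics"][1])


if __name__ == "__main__":
    points = explore([1, 2, 4, 8], [16, 32, 64])
    print("Pareto-optimal designs:")
    for p in pareto_front(points):
        print(" ", p)
    print("Best under deadline 30:", min_power_under_deadline(points, 30.0))
```

Swapping the arguments of the final helper, for example minimizing time under a power budget, covers the "vice versa" case mentioned above.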

Implementation of RAVEL & GRUU Feature for IMS 3GPP Release-11

IP Multimedia Subsystem (IMS) is a standardized 3GPP architectural framework for providing access-independent services to users. IMS carries all data in the packet-switched (PS) domain while maintaining service interoperability with the circuit-switched (CS) domain. IMS uses the SIP protocol as the base for all its communication and signaling. The Globally Routable User Agent URI (GRUU) feature allows a specific UE instance to be uniquely identified in situations where multiple contacts are registered under the same public user identity. There is therefore provision for both unicasting and multicasting of requests. GRUUs are created for the universally unique identification of users.
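As a rough illustration of how a GRUU distinguishes device instances, the sketch below builds a public GRUU by appending a `gr` URI parameter carrying an instance identifier to the address of record, following the form shown in RFC 5627. The exact format should be verified against that specification and the relevant 3GPP procedures, which are not modelled here. The helper names and the example address `sip:alice@example.com` are hypothetical.

```python
import uuid


def make_instance_id():
    """Generate a URN-style instance identifier for one device (UE)."""
    return f"urn:uuid:{uuid.uuid4()}"


def public_gruu(address_of_record, instance_id):
    """Combine the shared public identity with a per-device instance ID."""
    return f"{address_of_record};gr={instance_id}"


if __name__ == "__main__":
    aor = "sip:alice@example.com"   # hypothetical public user identity
    phone = public_gruu(aor, make_instance_id())
    laptop = public_gruu(aor, make_instance_id())
    # A request sent to the plain AOR may reach every registered contact,
    # while a request sent to one GRUU targets a single device instance.
    print(phone)
    print(laptop)
```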

Design and Implementation of Lightstor Protocol Controller

Lightstor is a new interface specification that allows host software to communicate with a non-volatile memory subsystem. This interface is optimized for client solid-state drives attached using a RapidIO fabric. It aims to build a comprehensive storage and backup system with unlimited size and bandwidth scalability. The Lightstor interface provides optimized command submission and completion paths. It also supports parallel operation through multiple I/O command queues. While other routable fabrics like InfiniBand or quasi-fabrics like PCIe can be used, Lightstor's benefits are best demonstrated when RapidIO is used.

Design of a Multi-Core Interconnect for Scale-Out Workloads

Scale-out workloads are applications that are typically executed in a cloud environment and exhibit a high level of request-level parallelism. Such workloads benefit from processor organizations with very high core counts (on the order of hundreds to thousands), since multiple requests can be serviced simultaneously by threads running on these cores. The characteristics of these workloads indicate that they have large instruction footprints, operate on large datasets with limited reuse and have minimal coherence activity because little data is shared. Since most of the instructions reside in the Last Level Cache (LLC) and are actively shared by all the cores, reducing the latency to fetch a block of instructions will improve the performance of these workloads and thereby the performance of the system as a whole. The focus of the current work is to minimize this latency through appropriate design of the network that interconnects the multiple cores. Pejman Lotfi-Kamran et al., in the paper "NOC-Out: Microarchitecting a Scale-Out Processor", advocate separating the LLC tiles from the core slice and placing them in a separate part of the chip to reduce LLC access latency. The current work takes this approach, and a new network topology connecting cores, LLC slices and routers has been designed. In this design, four cores and an LLC slice connect to a router forming a star topology, and the routers form a 2D flattened butterfly topology. The current design targets 8 cores and has been implemented using the Bluespec SystemVerilog HDL (Hardware Description Language). The design has been synthesized using Xilinx Vivado 2013.2 targeting the Zynq-7000 product family of FPGA boards. It has been tested for different amounts of offered traffic, and the average latency and throughput of the interconnection network have been calculated for a uniform random traffic pattern.
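The hop counts implied by this topology can be sketched in a few lines. In a 2D flattened butterfly, each router has a direct link to every other router in its row and in its column, so the router-to-router distance is the number of coordinates that differ (0, 1 or 2). The core numbering, the row-major router placement and the function names below are assumptions made for illustration and are not taken from the actual design.

```python
CORES_PER_ROUTER = 4  # from the text: four cores and one LLC slice per router


def router_of_core(core_id):
    """Assumed numbering: cores 0-3 attach to router 0, 4-7 to router 1, ..."""
    return core_id // CORES_PER_ROUTER


def router_coords(router_id, columns):
    """Assumed row-major placement of routers on a grid; returns (row, col)."""
    return divmod(router_id, columns)


def router_hops(src_router, dst_router, columns):
    """Router-to-router hops in a 2D flattened butterfly (0, 1 or 2)."""
    src_row, src_col = router_coords(src_router, columns)
    dst_row, dst_col = router_coords(dst_router, columns)
    return int(src_row != dst_row) + int(src_col != dst_col)


def core_to_llc_links(core_id, llc_router, columns):
    """Links crossed: core to router, router to router, router to LLC slice."""
    return 1 + router_hops(router_of_core(core_id), llc_router, columns) + 1


if __name__ == "__main__":
    # 8 cores -> 2 routers, assumed here to sit side by side in one row.
    for core in range(8):
        print("core", core, "-> LLC slice at router 1:",
              core_to_llc_links(core, llc_router=1, columns=2), "links")
```

Because the router-to-router distance never exceeds two hops, the worst-case number of links on this path stays at four however many routers are added to the grid.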

List of Students

 

S.No | Name | Roll No | Category | Year of Admission | Lab
1. | Jobin Jose | 406913051 | PhD, Part Time | 2013 | RISE
2. | Shameedha Begum | 406913001 | PhD, Part Time | 2013 | RISE
3. | J. Kokila | 406114002 | PhD, DIETY Scheme, MHRD, Full Time | 2014 | RISE
4. | B. Krishna Priya | 406114055 | PhD, Full Time, Institute | 2015 | RISE