The Zesto simulator has been used at Georgia Tech in the course CS8803 Advanced Microarchitecture. This page describes the course and also provides a link to all of the slides used in the most recent offering of this course. The slides and simulator are free to use for the development of a similar course at your university, or simply to augment existing courses. The materials here are provided "as is" with no guarantees of correctness or accuracy, but they may be freely used so long as the usage is not for profit (e.g., you cannot sell slides, books, etc., based on this material). If you have any questions, just ask (lohX@Xcc.gatech.edu). If you decide to make use of any of this material in a course you are teaching, it would be great if you could let us know (just for our own internal tracking) and include some form of attribution in any slides that you use.

Course Context and Organization

The Advanced Microarchitecture course (referred to as just AMA from now on) at Georgia Tech (GT) is positioned for somewhat more experienced computer architecture PhD and Masters students. At GT, we offer a basic "entry-level" computer architecture course that many students take, including both those focusing on computer architecture and those in fields as distant as theory and HCI. This first-level course typically uses the Hennessy and Patterson "Quantitative Approach" textbook or similar, and covers the basics of modern processor architectures at a level of detail that is appropriate for the wide variety of student backgrounds. As a result, there are many low-level details and issues (particularly regarding circuit and microarchitectural-level implementation) that are glossed over, only alluded to, or completely omitted. The follow-on AMA course is therefore targeted at those students whose PhD research area is in computer architecture or a closely aligned field, or Masters students who intend to pursue employment with companies involved in hardware design and implementation.

Our AMA course basically walks through the entire pipeline of a modern out-of-order processor from fetch to commit. The idea is not to explain exactly how every operation is implemented (because the truth is we don't know exactly how each company has implemented each and every block, and even if we did, we'd be under NDAs preventing us from teaching it!), but instead to think about what the issues are in implementing each operation and to discuss possible ways of doing it (and the pros and cons of each). As a result, there will be slides where a proposed "solution", while possible, may actually not be very practical at all. One of the goals of the course is to make our computer architecture students think hard about what it takes to "simply add one bit" to some microarchitectural structure (and why it may be acceptable for some structures, but not for others), "simply add one pipeline stage", or any of the many hardware changes required to implement many of the proposed techniques that we see in computer architecture research papers.

To the best of our knowledge, there currently do not exist any textbooks that are really well suited for such a course. As a result, many PowerPoint slides were created for teaching this course. While there were no required "textbooks" in the traditional sense, we did include Bob Colwell's "The Pentium Chronicles" (TPC) book as required reading. This provided a really nice, less academic, perspective on designing processors and how technical decisions are made in real projects. This also was very useful in the discussion-oriented portion of the course. Below is a lecture-by-lecture schedule for how the first part of the AMA course was taught in the Spring semester of 2008 (there were two 90-minute lectures per week).

  • Week 1-a (L1): Review (Design objectives and metrics)
  • Week 1-b (L2): Review (Pipelining and superscalars)
  • Week 2-a (L3): Fetch
  • Week 2-b (L4): Branch Prediction
  • Week 3-a (L5): Advanced Fetch
  • Week 3-b (L6): Decode
  • Week 4-a (L7): Register Renaming
  • Week 4-b (L8): Data Capture Scheduling
  • Week 5-a (L9): More Scheduling
  • Week 5-b (L10): ALU and Bypass
  • Week 6-a (L11): Memory Scheduling
  • Week 6-b (L12): Caches and Memory
  • Week 7-a (L13): Commit
  • Week 7-b (L14): Prefetching
  • Week 8-a (L15): Power
  • Week 8-b (L16): Multithreading, Multicore

Following this lecture-based portion of the course, we switched gears into a more discussion-based/paper-reading phase. For the rest of the semester, each lecture period consisted of group discussions based on readings about a processor design (or a family of processors). The readings are papers from conferences, magazines (especially IEEE Micro), corporate/official documentation and white papers, and even Internet hardware rumor websites. The idea is that through the first part of the course, the students have had a great deal of exposure to the many implementation details, issues and challenges throughout the processor. The students have all read Colwell's TPC by this point (regular reading summaries were required), and so they also have some additional perspective on many non-technical issues regarding processor design. Between these two, the students would have a variety of interesting discussions, debates and arguments over how and why different things were done in the different real-world processor designs. We update the reading list every time the course is offered to reflect a mix of classic and cutting-edge designs. This is the list used in Spring 2008:

  • Week 9-a: Intel Pentium-Pro, Pentium-M
  • Week 9-b: Intel P4 (Willamette and Prescott)
  • Week 10-a: Intel Core, Core 2
  • Week 10-b: AMD Opteron, Barcelona
  • Week 11-a: DEC Alpha 21264/364/464
  • Week 11-b: IBM Power 4/5/6
  • Week 12-a: Itanium (Madison and Montecito)
  • Week 12-b: Sun Niagara, Niagara 2
  • Week 13-a: Sony/IBM Cell, Cell 2
  • Week 13-b: Microsoft XBox, XBox 360
  • Week 14-a: Intel Nehalem (Core i7)

The remaining class slots were used for a variety of things related to term projects. The course ended with a mini-conference where students gave short talks on their term projects. Prior to the conference portion, students (or groups of students) were required to submit short write-ups in the form of 4-6 page papers (similar in format to typical workshop papers), which were then anonymized and redistributed among the students for double-blind peer review. This provided the students with a little taste of how conference organization works, and at the end of the conference best paper and best presentation awards were handed out. (I got the idea of running this project portion of the course in a conference style from Prof. Alex Orso.)

Assignments and Projects

In addition to reading Colwell's TPC and the papers for the discussions, students were given a series of homework assignments during the lecture-based first half of the semester. Besides regular pencil-and-paper problems, the homeworks included a variety of simple Zesto hacking components to get the students more familiar with the code so that they would be more productive once we got to working on term projects. Examples included modifying the simulator to model additional delays between resource allocation and insertion into the reservation stations, implementing separate banks of reservation stations for integer and FP operations, adding options for cache insertion policies, and other tasks.
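
To give a flavor of what a "cache insertion policy" option means, here is a minimal sketch in Python. It is not Zesto code and does not reflect how Zesto models caches; the class `CacheSet`, the `insert_at` option, and the helper `count_hits` are all hypothetical names made up for this illustration. It models a single cache set as a recency stack and compares two assumed policies: filling a missed line at the MRU position (conventional LRU replacement) versus filling it at the LRU position.

```python
class CacheSet:
    """One set of a set-associative cache, kept as a recency stack.

    Index 0 is the most-recently-used (MRU) end; the last index is the
    least-recently-used (LRU) end, which is the eviction victim.
    All names here are hypothetical; this is not Zesto's cache model.
    """

    def __init__(self, ways, insert_at="mru"):
        if insert_at not in ("mru", "lru"):
            raise ValueError("insert_at must be 'mru' or 'lru'")
        self.ways = ways
        self.insert_at = insert_at
        self.stack = []  # line tags, MRU first

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.stack:
            # Hit: promote the line to MRU under either insertion policy.
            self.stack.remove(tag)
            self.stack.insert(0, tag)
            return True
        # Miss: evict the LRU line if the set is full.
        if len(self.stack) == self.ways:
            self.stack.pop()
        # The insertion policy only decides where the new line lands.
        if self.insert_at == "mru":
            self.stack.insert(0, tag)
        else:
            self.stack.append(tag)
        return False


def count_hits(insert_at, trace, ways=4):
    """Replay a trace of line tags against one set and count the hits."""
    cache_set = CacheSet(ways, insert_at)
    return sum(cache_set.access(tag) for tag in trace)


# A cyclic scan over 5 lines is one more line than a 4-way set can hold.
trace = [0, 1, 2, 3, 4] * 4
mru_hits = count_hits("mru", trace)
lru_hits = count_hits("lru", trace)
```

On this trace, MRU insertion thrashes and `mru_hits` is 0, while LRU insertion keeps part of the working set resident and `lru_hits` is 9 out of 20 accesses. The point of such an exercise is less the policy itself than finding where in the simulator a fill chooses its position.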

The students then made use of the Zesto simulator for implementing their own ideas for their term projects. To our delight, two of the projects from this course (out of seven) resulted in papers that were presented at the CMP-MSI workshop held in conjunction with ISCA 2008 in Beijing (paper 1, paper 2).

Potential Usage of this Material

Obviously you could take this material and offer a course structured in the same exact way. If your university does not offer both introductory and advanced computer architecture courses at the graduate level, then you could sample/subset some of the information from these slides and incorporate them in your existing course. Because our students already had a lot of background in basic out-of-order processor architectures, we were able to maintain a pretty quick pace for the course (often covering 50+ slides in a single 90-minute lecture). Depending on the makeup of your students, you may wish to cut down on the paper-discussion portion of the course and present the material in the slides at a more relaxed pace. If you have any other creative or different ways of incorporating these slides into your course, please let us know!

Updates, To-Do's, ...

The slides provided below were last prepared in Spring of 2008.

Give me the Slides Already!

Click here to download all of the slides in one .zip file. All slides are in PowerPoint 2007 format. Requests for other formats will be silently ignored. Some slides contain material that many ECE curricula may consider pretty elementary; these slides were prepared for CS students who may not have a strong ECE/EE background (although there were ECE students enrolled in this course as well).

Zesto Simulator
Georgia Institute of Technology
College of Computing
Copyright 2009
Last modified 29 Jan '09