Resources

Recommended readings, tools, and reference materials.

Textbooks

Computer Organization and Design (Patterson & Hennessy)

The hardware/software interface (RISC-V edition)

Programming Massively Parallel Processors (Kirk & Hwu)

GPU programming fundamentals

Deep Learning (Goodfellow, Bengio, Courville)

Foundational text on neural networks

Tools & Software

llama.cpp

High-performance LLM inference in C/C++ with Vulkan support

ROCm / HIP

AMD GPU compute platform for Linux

DirectML

Hardware-accelerated ML on Windows (AMD/Intel/NVIDIA)

Python + PyTorch

For prototyping and experimentation

Key Papers & Specifications

Attention Is All You Need (Vaswani et al., 2017)

The paper that introduced the transformer architecture

Fast Inference from Transformers via Speculative Decoding (Leviathan et al., 2023)

Core speculative decoding paper

GGML/GGUF Format Specification

Model file format used by llama.cpp