Resources
Recommended readings, tools, and reference materials.
Textbooks
Computer Organization and Design (Patterson & Hennessy)
The Hardware/Software Interface, RISC-V edition
Programming Massively Parallel Processors (Kirk & Hwu)
GPU programming fundamentals
Deep Learning (Goodfellow, Bengio, Courville)
Foundational text on neural networks
Tools & Software
llama.cpp
High-performance LLM inference in C/C++ with Vulkan support
ROCm / HIP
AMD GPU compute platform for Linux
DirectML
Hardware-accelerated ML on Windows (AMD/Intel/NVIDIA)
Python + PyTorch
For prototyping and experimentation
Key Papers
Attention Is All You Need (Vaswani et al., 2017)
The paper that introduced the transformer architecture
Fast Inference from Transformers via Speculative Decoding (Leviathan et al., 2023)
Core speculative decoding paper
GGML/GGUF Format Specification
Model file format used by llama.cpp