I followed Alex Huth's semantic encoding notebook on story-listening fMRI and came away with a practical recipe: build an encoding model that maps timelines of sound, phonemes, and meaning onto a predicted fMRI signal for each voxel; smooth and then downsample the features with a Lanczos low-pass filter so they match the fMRI TR without aliasing fast fluctuations into spurious slow patterns; model the hemodynamic lag with a small set of FIR delays (time-shifted copies of the features); and score on held-out story segments by voxelwise correlation, then map where predictions are strongest. The resulting maps show that speech engages auditory and language networks rather than the whole brain. This matters for interpretable science, for future non-invasive decoding of the gist of language, for clinical language mapping, and for testing which AI features align with cortex.
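
To make the feature-preparation steps concrete, here is a minimal numpy sketch of Lanczos-windowed resampling down to the TR grid followed by FIR delay stacking. This is not the notebook's code: the function names, the window of 3, the row normalization, and the 1–4 TR delays are my illustrative assumptions.

```python
# Sketch (assumed, not the notebook's implementation): low-pass resample fast
# stimulus features onto the fMRI TR grid, then add FIR delays for the lag.
import numpy as np

def lanczos_resample(features, old_times, new_times, window=3):
    """Resample `features` (n_old x n_dim) from `old_times` onto `new_times`
    (e.g., one sample per TR) with a Lanczos-windowed sinc kernel, so fast
    fluctuations are smoothed away instead of aliasing into slow patterns."""
    new_dt = np.median(np.diff(new_times))         # spacing of the slow grid
    cutoff = 1.0 / (2.0 * new_dt)                  # Nyquist of the new grid

    # (n_new x n_old) matrix of Lanczos weights over the time differences.
    t = new_times[:, None] - old_times[None, :]
    x = 2.0 * cutoff * t                           # time in units of new samples
    kernel = np.sinc(x) * np.sinc(x / window)      # windowed sinc
    kernel[np.abs(x) > window] = 0.0               # finite support
    kernel /= np.maximum(kernel.sum(axis=1, keepdims=True), 1e-10)  # normalize rows

    return kernel @ features                       # (n_new x n_dim)

def add_fir_delays(X, delays=(1, 2, 3, 4)):
    """Concatenate time-shifted copies of the design matrix so a linear model
    can absorb the hemodynamic lag (an FIR basis over a few TRs)."""
    n_tr = X.shape[0]
    delayed = []
    for d in delays:
        Xd = np.zeros_like(X)
        Xd[d:] = X[:n_tr - d]                      # push features d TRs later
        delayed.append(Xd)
    return np.hstack(delayed)                      # (n_tr x n_dim * len(delays))
```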
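
For the model itself, encoding models in this line of work are typically fit with regularized linear regression; below is a minimal sketch assuming ridge regression, with held-out scoring by voxelwise Pearson correlation. The single alpha value, the array shapes, and the random stand-in data are assumptions for illustration, not the notebook's settings.

```python
# Sketch (assumed ridge regression): fit per-voxel weights on training stories,
# then correlate predicted and measured responses on a held-out segment.
import numpy as np

def fit_ridge(X_train, Y_train, alpha=100.0):
    """Ridge weights W = (X'X + alpha*I)^-1 X'Y, one column per voxel.
    X_train is (n_tr x n_features), Y_train is (n_tr x n_voxels)."""
    n_feat = X_train.shape[1]
    gram = X_train.T @ X_train + alpha * np.eye(n_feat)
    return np.linalg.solve(gram, X_train.T @ Y_train)   # (n_features x n_voxels)

def voxelwise_correlation(X_test, Y_test, weights):
    """Pearson correlation between predicted and measured held-out time
    courses, one value per voxel."""
    Y_pred = X_test @ weights
    Zp = (Y_pred - Y_pred.mean(0)) / (Y_pred.std(0) + 1e-10)
    Zm = (Y_test - Y_test.mean(0)) / (Y_test.std(0) + 1e-10)
    return (Zp * Zm).mean(0)                             # (n_voxels,)

# Illustrative usage with random stand-ins for delayed features and BOLD data.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((300, 40))    # delayed features, training stories
Y_train = rng.standard_normal((300, 500))   # BOLD responses, 500 voxels
X_test = rng.standard_normal((100, 40))     # held-out story segment
Y_test = rng.standard_normal((100, 500))

w = fit_ridge(X_train, Y_train)
r = voxelwise_correlation(X_test, Y_test, w)  # project these values onto cortex
```

The correlation map is what shows prediction concentrated in auditory and language cortex rather than spread across the whole brain.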