Quote: "I know in the old days the video memory just held the entire screen and the DA converter converted the data on the video memory into analog signals that were sent to the screen. "
Oddly enough the principle hasn't really changed: we've always had 'video memory' and chips that decode the frame buffer(s) and spit the result out to the display device (TV/monitor).
Many older systems (8-bit through 32-bit) used a shared memory approach, though: graphics, audio and CPU code all had access to one shared pool of memory, rather than the separate dedicated memory pools that have become the norm today.
The interfaces for programming old-school graphics chips, for example, amounted to setting chip modes by writing directly to the device's hardware interface. The interface is mapped into visible memory (normally as an overlay), so you can write mode values to a certain address and set the chip's display properties. If you wanted a sprite, or wanted an async memory transfer, you wrote the relevant info to the device's registers and enabled it.
Many people here probably have a C64 background, so if you're really interested I'd recommend downloading a C64 emulator and having a crack at programming its VIC chip, which you can do from its built-in BASIC.
Programming hardware interfaces directly was/is fun, but it doesn't really work if you want interchangeable components. So these days the hardware interfaces are abstracted from the programmer behind drivers. The driver gives us (via the OS) a common interface; how it's implemented behind closed doors is mostly irrelevant to us.
Quote: "I would guess that in that area of graphics processing the entire video is sent to video memory then played by the GPU"
Given the size of any reasonable video clip, caching the entire thing in video memory isn't a viable option.
In the simplest model, the data packets are streamed from the source (wherever it may be), decompressed into frame data, and pushed into video memory for display; the same goes for audio.
Given the diversity of PC hardware, the decompression passes have to run on the CPU by default. But there's nothing stopping the decompression code from using local hardware where it's available: perhaps the system has a hardware decoder (an MPEG chip, a DSP), or GPU shaders. Then the data path can be offloaded from the CPU's back.