I think in the end I preferred the article over the video. There's a lot of filler (e.g. just you behind a lectern) and stock/Googled images, some of which fit the context only loosely (a jumping cat to illustrate "subtle"?). Some of them are pretty distracting, to the point that I have to pause the video just to figure out how what I just saw is connected to what I just heard (e.g. charts and xkcd strips). If you really wanna be like Yahtzee, I'd suggest less "lol" pictures and more original art/animation. And one way to keep the images relevant is to imagine you're playing a game of Pictionary... without the words, are people still going to know what you're talking about?
More importantly though, not every noun has to have a corresponding picture. It doesn't surprise me that these videos are exhausting to make, because they're exhausting to *watch* too. One thing I noticed is that you use several images over a short period of time for words that are just synonyms for a single concept (e.g. "interested," "engaged," "care about"). Slow down, bro! One image per concept is usually sufficient. Although sometimes using a lot of similar words is a clue that you're actually repeating yourself, so in those cases you could just cut the fat and move on. Less words, less pictures.
By the way, I resent your using Number Munchers as an example of "sucking all the fun out" of games. That game was AWESOME.