I'm convinced that so-called 'v-blogging' is a fad. Why?
Because I can read ten times faster than anyone can speak, and I'm not alone. In the time it takes a YouTube video to buffer, you'll already have read this paragraph.
Other reasons?
Text is searchable; video isn't. That's the big roadblock. I know there are many sites out there claiming to be able to search video content using speech recognition, but until speech recognition takes a quantum leap, the only option is manual transcription. And who's going to do that for the tens of thousands of videos on sites around the net? No one.