29.97 fps? what's up with that...
The Firewire 400 spec (1394a) uses a native 125 µs bus cycle, which works out to 8000 bus frames per second. When video transfers over Firewire isochronously, the necessary bandwidth is reserved first. Once this is done, you just start streaming and wait for the video to pour in. Each frame of video is divided across bus frames. When the first bus frame's reserved bandwidth is filled, the remaining data is sent in subsequent bus frames. Well, it turns out PAL video (25 fps) transfers just fine, since a new video frame starts every 320th bus frame (8000/25). However, NTSC video (30 fps) has a problem. 8000/30 is 266.666 repeating, which is not a whole number of bus frames. This means that a minimum of 267 bus frames are necessary for each video frame (the last bus frame is padded out). So 1/30th of a second is really 267/8000 of a second, which gives you a frame rate of 8000/267 = 29.962546816479 fps.
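The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation described in this post; the function name and the constant are made up for illustration, and the only inputs are the 8000 bus frames per second and the nominal frame rates mentioned above.

```python
from fractions import Fraction
from math import ceil

# 8000 bus frames per second, i.e. one every 125 microseconds.
BUS_FRAMES_PER_SECOND = 8000


def effective_frame_rate(nominal_fps):
    """Return (bus frames used per video frame, resulting frame rate).

    Assumes each video frame must start on a bus frame boundary, so the
    bus frame count is rounded up to a whole number.
    """
    exact = Fraction(BUS_FRAMES_PER_SECOND, nominal_fps)
    used = ceil(exact)  # the last bus frame is padded out if needed
    return used, Fraction(BUS_FRAMES_PER_SECOND, used)


for fps in (25, 30):
    used, rate = effective_frame_rate(fps)
    print(f"{fps} fps nominal: {used} bus frames per video frame, "
          f"{float(rate):.6f} fps effective")
```

For 25 fps the division is exact, so nothing changes. For 30 fps the rounding up to 267 bus frames is what pulls the effective rate below 30.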
Audio is different. Audio has no inherent audio frame the way that video has a video frame. You can sample audio at whatever frequency you want, with as much precision as you want, from as many independent sources as you want. You can divide it up into packets of whatever size is convenient for transport without any fuss at all.
So what happens when you try to sync separate audio and video sources? Well, if the video is really timed at 8000/267 fps (the "29.97" above) but the software assumes it is 30 fps, then after an hour the lip movements will be almost 135 video frames, or about 4.5 seconds, out of sync.
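Here is a rough sketch of where the "almost 135 frames" figure comes from. It assumes the video really arrives at 8000/267 fps as worked out earlier in this post, and the function name is invented for this example.

```python
from fractions import Fraction


def drift_in_frames(assumed_fps, actual_fps, seconds):
    """Frames the software expects minus frames that really arrived."""
    return (Fraction(assumed_fps) - Fraction(actual_fps)) * seconds


actual = Fraction(8000, 267)           # effective rate from the bus timing
drift = drift_in_frames(30, actual, 3600)

print(f"drift after one hour: {float(drift):.2f} frames")
print(f"which is about {float(drift / 30):.2f} seconds at 30 fps")
```

The gap is 10/267 of a frame every second, which does not sound like much until you let it run for an hour.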
It is important to realize that you can avoid this problem completely if you change your camera so that video frame boundaries do NOT have to coincide with bus frame boundaries. Then your video data is just a stream of bytes that will always fit neatly into bus frames, with each video frame generally starting somewhere inside a bus frame instead of at the beginning. You need to add information to your video data so the receiver can recognize where a video frame starts. A little bit of overhead and complexity solves this problem.
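To make the idea concrete, here is a minimal sketch of one way to mark frame starts in a byte stream. Everything in it is hypothetical: the `VFRM` marker, the length-prefixed header and the function names are invented for illustration and are not taken from the Firewire spec or from any real camera.

```python
import struct

MAGIC = b"VFRM"                  # hypothetical frame-start marker
HEADER = struct.Struct(">4sI")   # marker + 32-bit video frame length


def to_stream(frames):
    """Prefix each video frame with a header and join into one byte stream."""
    return b"".join(HEADER.pack(MAGIC, len(f)) + f for f in frames)


def to_bus_payloads(stream, payload_size):
    """Cut the stream into fixed-size chunks, ignoring video frame boundaries."""
    return [stream[i:i + payload_size]
            for i in range(0, len(stream), payload_size)]


def from_bus_payloads(payloads):
    """Reassemble the stream and split it back into video frames."""
    stream = b"".join(payloads)
    frames, pos = [], 0
    while pos + HEADER.size <= len(stream):
        magic, length = HEADER.unpack_from(stream, pos)
        if magic != MAGIC:
            raise ValueError(f"lost frame sync at byte {pos}")
        pos += HEADER.size
        if pos + length > len(stream):
            break                # last video frame is still incomplete
        frames.append(stream[pos:pos + length])
        pos += length
    return frames


video = [b"a" * 10, b"b" * 7, b"c" * 13]
payloads = to_bus_payloads(to_stream(video), payload_size=8)
assert from_bus_payloads(payloads) == video
```

The 8 bytes of header per video frame are the "little bit of overhead". In exchange, no bus frame except possibly the very last one is padded, so the video frame rate no longer depends on the bus timing.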
So there you go. Crazy frame rates explained.