Oscar Papel's Web Log

Friday, November 26, 2004

29.97 fps? what's up with that...

I've recently been delving into the IEEE-1394 specs (also known as Firewire or i.Link). It is a general purpose bus that, unlike USB, is not PC specific. The various specs work together in stack fashion, similar to the ISO network stack, with each layer building upon the layer below. The bus is capable of asynchronous communication, but it is its isochronous capabilities that make it suitable for carrying time-based data such as video and audio.
The Firewire 400 spec (1394a) uses a native 125 µs clock, which gives 8000 bus frames per second. When video transfers over Firewire isochronously, the necessary bandwidth is reserved first. Once this is done, you just start streaming and wait for the video to pour in. Each frame of video is divided into bus frames. When the first bus frame's reserved bandwidth is filled, the remaining data is sent in subsequent bus frames. Well, it turns out PAL video (25 fps) transfers just fine, since a new video frame starts every 320th (8000/25) bus frame. However, NTSC video (30 fps) has a problem. 8000/30 is 266.666 repeating, so a minimum of 267 bus frames are necessary (the last one is padded out). What should be 1/30th of a second is really 267/8000 of a second, which gives you a frame rate of 29.962546816479 fps.
Audio is different. Audio has no inherent audio frame the way that video has a video frame. You can sample audio at whatever frequency you want and with as much precision you want from as many independent sources you want. You can divide it up into packets of whatever is convenient for transport without any fuss at all.
So what happens when you try and sync separate audio and video sources? Well, if the video is really arriving at about 29.96 fps but the software assumes it is 30 fps, then after an hour the lip movements will be almost 135 video frames (about 4.5 seconds) out of sync.
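The arithmetic above can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and the one input taken from the post is the figure of 8000 bus frames per second.

```c
#include <assert.h>

/* Bus frames per second, as used in the 8000/25 and 8000/30 figures above. */
#define BUS_FRAMES_PER_SECOND 8000

/* Whole bus frames needed to carry one video frame.
   Rounds up, because the last bus frame is padded out. */
static int bus_frames_per_video_frame(int nominal_fps)
{
    assert(nominal_fps > 0);
    return (BUS_FRAMES_PER_SECOND + nominal_fps - 1) / nominal_fps;
}

/* Frame rate that results once every video frame occupies whole bus frames. */
static double effective_fps(int nominal_fps)
{
    return (double)BUS_FRAMES_PER_SECOND / bus_frames_per_video_frame(nominal_fps);
}

/* Video frames of drift after `seconds` if software assumes the nominal rate. */
static double drift_in_frames(int nominal_fps, double seconds)
{
    return (nominal_fps - effective_fps(nominal_fps)) * seconds;
}
```

Calling drift_in_frames(30, 3600.0) gives the one-hour drift for NTSC, while the same call with 25 gives zero for PAL.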
It is important to realize that you can avoid this problem completely if you change your camera so that video frame boundaries do NOT coincide with bus boundaries. Then, your video data is just a stream of bytes that will always fit neatly into bus frames, albeit starting generally somewhere inside a bus frame instead of at the beginning. You need to add information to your video data to recognize a frame start. A little bit of overhead and complexity solves this problem.
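To make the idea concrete, here is a minimal sketch in C of where each video frame would begin if the video were treated as a plain byte stream cut into fixed-size bus frame payloads. The names (FrameStart, locate_frame_start) and any sizes you feed it are hypothetical and not taken from the 1394 specs, and the extra bytes needed for the frame start marker are ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t bus_frame;  /* index of the bus frame in which the video frame begins */
    size_t offset;     /* byte offset of the video frame's first byte in that bus frame */
} FrameStart;

/* Locate the start of video frame number `video_frame` (counting from 0)
   when frames of `video_frame_bytes` bytes are packed back to back into
   bus frames that each carry `bus_payload_bytes` bytes. */
static FrameStart locate_frame_start(size_t video_frame,
                                     size_t video_frame_bytes,
                                     size_t bus_payload_bytes)
{
    FrameStart start;
    size_t stream_position;

    assert(bus_payload_bytes > 0);
    stream_position = video_frame * video_frame_bytes;
    start.bus_frame = stream_position / bus_payload_bytes;
    start.offset    = stream_position % bus_payload_bytes;
    return start;
}
```

A receiver would use the marker in the data, rather than this calculation, to find frame starts; the sketch only shows that the starts drift through the bus frames instead of forcing padding.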
So there you go. Crazy frame rates explained.

Thursday, November 18, 2004

How sharp is sharp?...

Ok, I just came across an article over at cross-platform.net that talks about doing precisely what I am doing with Damocles, namely, writing cross-platform native code that has a managed cross-platform binding in C#. The article mentions that my design pattern has a very subtle bug that can occur due to the fact that the C# garbage collector might kick in AFTER marshalling the args to an interop call but before the call actually finishes, causing the (admittedly highly unlikely) problem where the finalizer gets called and frees the unmanaged object too soon.
Now, this can't occur if you always dispose of your objects properly. It is slightly more likely to occur in a multithreaded scenario since the GC can run simultaneously with unmanaged code. Regardless, the solution is easy, and so I've decided to add it to the DamoclesSharp project.
It boils down to this: instead of storing a private IntPtr, you store a HandleRef, which contains an IntPtr. It also contains a reference to the managed object, tying the managed and unmanaged sides together. When the HandleRef is passed to an interop call, the managed object is kept alive until the call returns, so the finalizer cannot free the unmanaged object while native code is still using it.
Is it a bug? Is it just a theoretical possibility that would never happen in real life? It sure hasn't happened yet, but I haven't run any multithreaded tests either. Also, even though the possibility is very small, the consequences are huge. Processing memory that has been freed can result in a segfault, or even cause the Blue Screen of Death if you are processing a buffer owned by the kernel (say, a video buffer that was wrapped).

So the code now looks like this:


using System;
using System.Runtime.InteropServices;

public class Image : IDisposable {

    // Pairs the native pointer with this managed object for interop calls.
    private HandleRef native;

    public Image() { ... }
    public Image(IntPtr NativePtr) { native = new HandleRef(this, NativePtr); ... }
    ~Image() { ... }

    #region IDisposable implementation
    ...
    #endregion

    #region Interop calls
    [DllImport(Damocles.DllName)] private static extern IntPtr AllocateImage(...);
    [DllImport(Damocles.DllName)] private static extern void AddImageReference(...);
    [DllImport(Damocles.DllName)] private static extern void FreeImage(...);
    // ... other Interop functions here ...
    #endregion

}


So this way, we avoid getting cut with DamoclesSharp (I'm sorry, I couldn't resist the pun!)

Thursday, November 11, 2004

Damocles is Sharp!

Damocles can also be used from any managed language. Known as DamoclesSharp, it is a library that provides managed wrappers or equivalents to the C "objects" of Damocles. A typical class looks like the following:


using System;
using System.Runtime.InteropServices;

public class Image : IDisposable {

    // Address of the underlying C "object"; IntPtr.Zero until one is attached.
    private IntPtr nativePtr = IntPtr.Zero;

    public Image() { ... }
    public Image(IntPtr NativePtr) { ... }
    ~Image() { ... }

    #region IDisposable implementation
    ...
    #endregion

    #region Interop calls
    [DllImport(Damocles.DllName)] private static extern IntPtr AllocateImage(...);
    [DllImport(Damocles.DllName)] private static extern void AddImageReference(...);
    [DllImport(Damocles.DllName)] private static extern void FreeImage(...);
    // ... other Interop functions here ...
    #endregion

}



The class is implemented with the unmanaged resources design pattern. That is, it creates, modifies, and frees an unmanaged resource (in this case, our C "object"). It implements IDisposable to signal that it contains unmanaged resources and it implements a finalizer that works with the IDisposable implementation to ensure that the C "object" will be freed when the managed object is. Again, the object can create the underlying C "object" or can be created to wrap an existing one.

Just like the ObjC implementation, the C# implementation works with the C memory management to ensure no memory leaks.

Also like the ObjC implementation, there are some optimizations in place. Since the IntPtr is just a managed wrapper around a pinned address to memory, there is no need to cross the interop boundary to read/write the contents of the underlying object. Care has been taken to compensate for the size of the pointer during this. As a result, reading and writing the object properties can be done without incurring an interop penalty. Again, just like the ObjC optimizations, any code using the class is completely unchanged and continues to work as before.
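Seen from the C side, "compensating for the size of the pointer" comes down to knowing each field's byte offset. The sketch below uses the Image struct from the memory management post, with S32 and U32 assumed to map to the <stdint.h> types; the helper names are hypothetical and this is not the actual DamoclesSharp code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t  S32;   /* assumption: os.h is not shown in the posts */
typedef uint32_t U32;

typedef struct {
    void *pData;
    S32   RefCount;
    U32   width;
    U32   height;
} Image;

/* Byte offsets a wrapper would add to the raw address. On common 32-bit
   and 64-bit ABIs, width sits at sizeof(void *) + 4, so the offset moves
   with the pointer size. */
static size_t image_width_offset(void)  { return offsetof(Image, width); }
static size_t image_height_offset(void) { return offsetof(Image, height); }

/* Read a U32 field given only a base address and an offset, the way a
   wrapper holding just a raw pointer would. */
static U32 read_u32_at(const void *base, size_t offset)
{
    U32 value;
    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```

A managed wrapper would do the equivalent read at IntPtr plus offset, which is why no interop call is needed for simple property access.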

This code wraps Damocles.dll on the PC and libDamocles.dylib on the Mac. A simple config file makes this interop layer code portable.

Damocles from C to ObjC

Now that we have some memory managed "objects" in C with some functions that use them, we need to make real OO objects. Each ObjC object has a corresponding C "object".


@interface NSImage : NSObject {
Image *image;
}
- (id)init;
- (id)initWithImage:(Image *)pImage;
- (void)dealloc;
..... other messages ...
@end


When you allocate an NSImage object with the init message, it allocates an underlying C "object". When the dealloc message is called, either through a release message or indirectly via a [NSAutoreleasePool release] message, the underlying C "object" is released. This way, ObjC object memory management works with C "objects" to make memory management seamless.

When you allocate an NSImage object with the initWithImage message, it performs an AddRef on the C "object". This way, it gets its own reference to the C "object", which it can release in the same way that the first NSImage object does.

All these ObjC objects get compiled into a native Mac Framework and can be directly used by any Cocoa-based Mac project. This is how EyeImage (one of our commercial products) gets its functionality.

There are some minor variations. For instance, an ObjC NSImage object might temporarily allocate a C Image "object" in order to generate a displayable image then immediately deallocate it. For performance reasons, no ObjC object wraps this temporary object. Such cases are rare, however. Premature optimization is the root of all software evil. In this case, however, the optimization was deemed worth it. This optimization does not cause a change in any software that uses NSImage. Optimization across an interface boundary is almost always an indication of bad design.

Memory management in Damocles

Memory management is a tricky subject when dealing with native code. ObjC uses reference counting similar to COM's. Managed code uses a garbage collector. In order to be flexible, Damocles has to be able to "wrap" buffers it didn't allocate and support multiple pointer references. This way, it can support multiple views/subviews of the same data, as well as be "held" by more than one object.

I came up with a reference counting mechanism that does both and simplifies memory management for non-trivial cases.

Each "object" is really a C-style structure that has fields, a pointer to its data payload and a reference count.

example:

typedef struct {
void *pData;
S32 RefCount;
U32 width;
U32 height;
} Image;


S32 and U32 have been defined in os.h as a signed and unsigned (respectively) 32bit integer. os.h shields Damocles from platform differences in C's integer definition making the code more portable. Also, void * will be a 32bit value on 32bit systems and a 64bit value on 64bit systems.
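The contents of os.h are not shown here. As a rough sketch only, a header like it could be written today on top of C99's <stdint.h> (an assumption; the original may well use per-platform typedefs), with a small self-check that the widths are what the rest of the code expects. The function names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef int32_t  S32;   /* signed 32-bit integer */
typedef uint32_t U32;   /* unsigned 32-bit integer */

/* Number of bits in a data pointer on the current platform. */
static unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Sanity check to run once at startup: aborts if the typedefs are wrong. */
static void os_self_check(void)
{
    assert(sizeof(S32) * CHAR_BIT == 32);
    assert(sizeof(U32) * CHAR_BIT == 32);
    assert((U32)((U32)0 - 1) == 0xFFFFFFFFu);  /* unsigned wraps modulo 2^32 */
    assert((S32)-1 < 0);                       /* S32 really is signed */
}
```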

When an Image struct is allocated, you have 2 options. You can pass a pointer to an existing buffer in which case RefCount will be set to -1. Calling AddRefCount on Image will cause RefCount to go to -2. Calling Free on Image causes RefCount to go back to -1. Calling Free again causes Image to throw away the pointer and deallocate the structure.

If you DON'T pass a pointer to an existing buffer, one is allocated for you and RefCount is set to 1. Calling AddRefCount on Image will cause RefCount to go to 2. Calling Free on Image causes RefCount to go back to 1. Calling Free again causes the pointer to be freed and then the structure to be freed.

In this way, you can have multiple references to a struct that holds data that may or may not be wrapped. You can Free the references in any order, regardless of the order in which they were Allocated or AddRef'd, and the last Free will either give up the wrapped buffer or deallocate the owned one.
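The two cases can be sketched in C as follows. This is a minimal illustration of the scheme as described, not the Damocles source: the parameter lists are guesses, S32 and U32 are assumed to map to <stdint.h> types, the buffer is one byte per pixel for simplicity, and FreeImage returns a flag here only so the behaviour is easy to observe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t  S32;
typedef uint32_t U32;

typedef struct {
    void *pData;
    S32   RefCount;  /* > 0: buffer owned by the Image; < 0: buffer wrapped */
    U32   width;
    U32   height;
} Image;

/* Pass an existing buffer to wrap it, or NULL to have one allocated. */
static Image *AllocateImage(U32 width, U32 height, void *existing)
{
    Image *img = malloc(sizeof *img);
    if (img == NULL) return NULL;
    img->width  = width;
    img->height = height;
    if (existing != NULL) {
        img->pData    = existing;
        img->RefCount = -1;              /* wrapped: counts run -1, -2, ... */
    } else {
        img->pData = calloc((size_t)width * height, 1);
        if (img->pData == NULL) { free(img); return NULL; }
        img->RefCount = 1;               /* owned: counts run 1, 2, ... */
    }
    return img;
}

/* Move the count one step away from zero, keeping its sign. */
static void AddImageReference(Image *img)
{
    assert(img != NULL && img->RefCount != 0);
    img->RefCount += (img->RefCount > 0) ? 1 : -1;
}

/* Drop one reference. Returns 1 if the struct was destroyed, 0 otherwise. */
static int FreeImage(Image *img)
{
    assert(img != NULL && img->RefCount != 0);
    if (img->RefCount > 1)  { img->RefCount--; return 0; }
    if (img->RefCount < -1) { img->RefCount++; return 0; }
    if (img->RefCount == 1) free(img->pData);  /* owned buffer goes too */
    /* RefCount == -1: the wrapped buffer is left for its real owner. */
    free(img);
    return 1;
}
```

Keeping the sign of RefCount as the "wrapped" flag means no extra field is needed, and the last Free can tell which cleanup to perform.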

This was critical to correct operation where you need to return a linked list of objects. This way, the list holds its own reference. You can select an object from the list, AddRef it, destroy the list, use the object, then Free it, and everything works the way it should.
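Here is a small sketch of that linked list scenario in C. It covers only the owned-buffer case, and the Node type, list_push, list_destroy and the live_images counter are all hypothetical names added for illustration (the counter exists purely so the effect can be observed).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t  S32;
typedef uint32_t U32;

typedef struct { void *pData; S32 RefCount; U32 width; U32 height; } Image;

static int live_images = 0;   /* instrumentation for this sketch only */

static Image *AllocateImage(U32 width, U32 height)
{
    Image *img = malloc(sizeof *img);
    if (img == NULL) return NULL;
    img->pData = malloc((size_t)width * height);
    if (img->pData == NULL) { free(img); return NULL; }
    img->RefCount = 1;
    img->width = width;
    img->height = height;
    live_images++;
    return img;
}

static void AddImageReference(Image *img)
{
    assert(img != NULL && img->RefCount > 0);
    img->RefCount++;
}

static void FreeImage(Image *img)
{
    assert(img != NULL && img->RefCount > 0);
    if (--img->RefCount > 0) return;      /* someone else still holds it */
    free(img->pData);
    free(img);
    live_images--;
}

typedef struct Node { Image *image; struct Node *next; } Node;

/* Prepend an image; the list takes its own reference. */
static Node *list_push(Node *head, Image *img)
{
    Node *node = malloc(sizeof *node);
    if (node == NULL) return head;
    AddImageReference(img);
    node->image = img;
    node->next  = head;
    return node;
}

/* Release the list's reference on every image, then the nodes themselves. */
static void list_destroy(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        FreeImage(head->image);
        free(head);
        head = next;
    }
}
```

A caller that wants to keep one image calls AddImageReference on it before list_destroy; every other image disappears with the list, and the kept one lives until the caller's own FreeImage.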

Obsidian, meet Damocles.

As stated in the previous post, Damocles was born out of the need to get native performance in my C# classes. Damocles is implemented in portable ANSI C. It is a collection of very low level functions to do imaging. It is not optimized to favour one CPU or another, although I did do a lot of work to minimize the working set and instruction count. It is fast, but not hand-tuned assembly fast. It does NOT use the Altivec instructions on the PPC or the MMX/SSE instructions on the PC. These will be addressed later. Damocles was designed to be a well performing PORTABLE base.

Considering the lack of niceties in plain C, it is amazing that elegant code can be written using it. Granted, when you are writing 18 versions of Boolean XOR, programming niceties aren't what gets the job done. It's small, efficient, tight code that does.

I mentioned 18 because that's how many different pixel formats Damocles currently supports. That does NOT count palettized versions. Due to the need to support higher than 8 bits per channel images as efficiently as possible, there is direct support for 8, 10, 12, 14 & 16 bits per channel as well as single precision floating point support. Since there can be Gray, RGB, and RGBA colorspaces, that makes 6 choices for channel depth x 3 colorspaces = 18 pixel formats.
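The 6 x 3 count can be written down as two C enums. The enum and function names below are invented for this sketch and are not the identifiers Damocles uses.

```c
#include <assert.h>

typedef enum {
    DEPTH_8, DEPTH_10, DEPTH_12, DEPTH_14, DEPTH_16, DEPTH_FLOAT,
    DEPTH_COUNT                      /* 6 channel depths */
} ChannelDepth;

typedef enum {
    SPACE_GRAY, SPACE_RGB, SPACE_RGBA,
    SPACE_COUNT                      /* 3 colorspaces */
} ColorSpace;

/* Total number of depth/colorspace combinations. */
static int pixel_format_count(void)
{
    return DEPTH_COUNT * SPACE_COUNT;
}

/* Unique index in the range 0..17 for a depth/colorspace pair. */
static int pixel_format_index(ChannelDepth depth, ColorSpace space)
{
    assert((int)depth >= 0 && (int)depth < DEPTH_COUNT);
    assert((int)space >= 0 && (int)space < SPACE_COUNT);
    return (int)depth * SPACE_COUNT + (int)space;
}

/* Number of channels stored per pixel for each colorspace. */
static int channels_per_pixel(ColorSpace space)
{
    switch (space) {
    case SPACE_GRAY: return 1;
    case SPACE_RGB:  return 3;
    case SPACE_RGBA: return 4;
    default:         return 0;
    }
}
```

An index like this is one way to pick the right variant of an operation, for example from a table of 18 XOR routines.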

How to write Software... (part 4)

I should mention that I have a bias towards C# as a language. It is modern, OO, and has lots of language features that lend themselves to writing elegant programs. I also follow the progress of the Mono:: and DotGNU Portable.NET open source projects that allow C# to run on several platforms. The platforms I am most concerned about are (in order) Windows XP, Mac OS/X, and Linux (x86 & PPC).

What is NOT available is a cross platform UI for C#. There are candidates (wxNET, GTKSharp, TK#, SWF, SWT) but they are either incomplete, defunct, or just not ready yet for all my platforms.

Well, the obvious choice was to separate UI from functionality (always a good decision anyway) and write my imaging library first in C#. By then, surely, there would be a UI that I could use. If I put my UI behind a C# interface, then I could even load a different UI for each different platform. The same with hardware support. Everything behind an interface. I just need to do a little reflection discovery at load time to see what I have available.

I designed the project and called it Obsidian. Why not. I wrote the discovery code and the library. Lots of formal interfaces.
It was implemented in several assemblies. A real medium-to-large size C# app sans a UI.

A year passed. I kept up to date with all kinds of projects, both commercial and free. And guess what. I got burned. No UI materialized that was even a 1.0 release. I even tried starting interest in creating a project called CocoaSharp. It was supposed to bring Cocoa (the Mac's native Object framework) into the managed world. I worked a few weeks on it and released a version 0.1 that wrapped a few key Cocoa UI classes. Nobody was interested in contributing to the work and I was looking at wrapping by hand hundreds of classes (which were changing underneath me with new OS/X updates). I gave up, e-mailed the code to someone who asked for it, and called it a life lesson. Note that my CocoaSharp project predated but was otherwise unrelated to the current CocoaSharp project underway, at least to my knowledge.

As it turns out, processing pixels (even in unsafe code) is not a performance demon under dotNET 1.0 or 1.1 so I got to thinking that C# might not be the full answer.

I had some experience with Interop (the mechanism by which managed code interfaces with native code) from writing managed camera classes that linked with some native libraries on the PC and the Mac. That code even chose at runtime which native library to use, through the same discovery mechanism I had written for the rest of Obsidian.

I decided it was time to bite the bullet. Write the lowest level functions in C. Plain C. K&R C. no modern conveniences C. portable C. Then, I could wrap them into ObjC objects on the Mac, C++ objects on Linux, and C# objects wherever a CLR was available. Damocles was born.

How to write Software... (part 3)

So, let's start. I need to write imaging software that is cross-platform. It should perform and feel like a native application on every platform it supports. It should also be able to take advantage of any platform specific features but degrade gracefully on platforms that lack those features. It should be capable of taking advantage of multiple processors. It should be pointer size agnostic (32 or 64 bit).

Programming language choices available to me:
C
C++
Managed C++
ObjC
C#
Java
Visual Basic
VB.NET

C : Granddaddy of them all
pros - ubiquitous, performs well, portable.
cons - low level, lack of object oriented concepts, verbose.

C++ : C with objects
pros - same as C but with OO concepts
cons - interfacing and binding are more difficult than C

Managed C++ : Managed C with objects
pros - same as C++ but runs within the confines of the CLR.
cons - performance is not as good as C++, not portable.

ObjC : Native objects on OS/X
pros - interfacing done at runtime
cons - not portable, message based calls slower than direct function calls.

C# : Managed C with objects
pros - easy language, portable using mono or pnet on non MS platforms.
cons - performance is not as good as C, C++

Java : portable C-like language
pros - portable
cons - different platforms have slightly different implementations, performance

Visual Basic, VB.NET - missing critical pointer manipulation

In the next part, we'll see what I chose and why I chose it.

How to write Software... (part 2)

I guess I should mention that I'm talking here about writing imaging software. In other words, the software that I write creates, captures, enhances, processes, analyzes, and extracts images. If I were a database programmer then this thread would be very different.

The first step in writing any software is gathering your requirements.

The "What's"
What should the software do?
What O/S does it need to run on?
What software does it need to interact with?
What hardware does it need to interact with?
What are the time constraints?

The "Who's"
Who is the end user/operator?
Who is paying for the software?
Who will maintain the software?

The "Where's"
Where does the software need to be developed?
Where will training/installation take place?

The "When's"
When does the software need to be done?
When is the software finished?
When are updates needed?

The "How's"
How will feedback take place?
How will upgrades occur?

There are a lot of questions. And I haven't asked them all.
Since software is never really done, the more questions you ask now, the better off you will be later in the product life cycle.
Note that these questions assume that you are being paid to write software. When you write software for free, most of these questions can be answered by "whenever I get to it" or "next time I'm doing something with that". There is a lot of freedom in NOT getting paid.

How to write Software... (part 1)

This seems like such an easy thing to do. I mean, people write software every day. I don't pretend to know everything about writing software. There are so many different kinds of software out there that it would be foolish to think that there is ONE way that is best. But I have been thinking about this, so I thought I'd write down what I've come up with as well as talk about some of my recent attempts at it. I don't have all the answers. I don't even have all the questions. Let's dance.