Oscar Papel's Web Log

Tuesday, April 05, 2005

Windows.Forms and Mac OS/X

I decided to create a step-by-step tutorial on how to get Windows.Forms based PC apps running on Mac OS/X. As an outsider to the project, I'm going to restrict myself to public releases.

Prerequisites:

Before we begin, we need to check the minimum system requirements:
- Mac OS/X Panther 10.3.x (I used 10.3.8)
- X11 for Mac OS/X ( this is an install option for Panther located on the third CD ). This requirement may go away in the future but for now just make sure you have it installed.

Step 1: Get Mono

mono

Download Mono.Framework-1.1.6.dmg from the Mono website (http://www.mono-project.com/Downloads).

drive

Double click on the .dmg file to mount the disk image. A drive icon like the one above should now be on your desktop. Double click on the drive in order to see it's contents.

dir

Step 2: Install Mono

Double click on the package file. This will start an installation of Mono onto your hard drive.

installer

You need to follow the instructions and complete the installation. Once you are done, you will be ready to begin writing your first Mac Windows.Forms "Hello World" program.

Step 3: Creating Hello World

In this step, we will be creating a simple Message Box based program to test our installation. Start TextEdit and lets write a simple C# Windows.Forms program.

source

This program won't win any design awards but it will create a Message Box with a text message (the ubiquitous "Hello World"), and an OK button. Make sure that when you save this file that you don't add a .txt extension to the file!

We now need to compile this simple program. We need to start Terminal in order to do this. Launch Terminal and change your directory to the directory that you saved your Hello.cs file to. I saved it to a subdirectory of "Documents" so at the command prompt I typed:

cd ~/Documents/Hello

You'll need to do something similar.

we can now launch the compiler. At the prompt, type:

mcs Hello.cs /r:System.Windows.Forms.dll /out:Hello.exe

If you didn't make any typing errors, you should now have Hello.exe in addition to Hello.cs.

If this was a PC, we would be done now but since we are on the Mac, we need another step. We need to MacPack the exe we created so that it can play nice with the Mac's GUI. Type the following command at the same prompt you used to compile.

macpack -n:Hello -a:Hello.exe -o:. -m:1

This will create a Mac bundle that contains our sample and that can be launched by clicking on it in the finder.

finder

Step 4: Hello World

Clicking on the Hello App in the finder should give you the following message box.

message

Clicking the OK button closes the app.

Addendum: MacPack, an explanation.

Mac Applications exist as App bundles. A bundle is just a subdirectory containing an executable as well as all the resources that the executable may need (such as icons, images, sounds, plugins, help, language packs, etc). In fact, the bundle we created above (called Hello) is really just a subdirectory called Hello.app. These bundles need to conform to a particular structure so that they interact with the finder and dock properly. This is where macpack comes in. It creates the necessary directory tree, a configuration file, and a launch script for your exe.

The finder reads the directory tree and realizes that it is really an App and treats the bundle like a single file. If you want to explore the subdirectory, you need to right click (or Ctrl-click for single button mice) and select "Show Package Contents". This will let you bypass the Mac "magic" and explore the contents of the subdirectory. This magic doesn't exist when you are in terminal. Terminal always shows the App as a directory. The only thing that should be in the root directory of the App is a Contents subdirectory. Inside this subdirectory are the Resources and MacOS subdirectories as well as the Info.plist file.

Info.plist
The Info.plist file configures the bundle. Double clicking on it should launch the property list editor. There are two entries created by default when you use macpack. CFBundleExecutable specifies which executable (located in the MacOS sub) should be run when the app is launched. CFBundleIdentifier specifies what name should be displayed for this app in the finder. There are a few others that you might like to use.

Custom Icons
First, you may want to use a custom icon. Create a new Hello.icns file using Icon Composer (part of the XCode Developer tools, another optional install included with Panther) or just borrow an icon from the Resources section of another App. Copy this file to the Resources section of Hello.app and add a new entry to the Info.plist file where key=CFBundleIconFile and value=Hello.icns (or whatever it is called). You now need to force the finder to update it's picture for this app. Dragging the app to the desktop and then back again should do it. Making a duplicate of the App also works.

"APPL" status
Your app is ALMOST a fully fledged Mac App. One thing it can't do, however is get dragged to the dock. That's because the dock doesn't know that it is an App. There are many bundle types used by Panther and an App is only one of them. Frameworks are also bundles. So are installation packages. Obviously it doesn't make sense to drag a framework (like Mono.Framework, for instance) to the dock so we need to tell the Dock that this bundle is really an App. We do this by adding another entry to the Info.plist file where the key=CFBundlePackageType and the value="APPL" (for application). Once you have done this, your app can now be dragged to the dock like any other Mac App.


Epilogue:

The astute among you will realize that although I stated that X11 was a requirement, we didn't actually do anything with it. It does not even need to be running for S.W.F apps to work. This is because, as it has been explained to me, the only part of X11 that is being used is in the text rendering. This means that it is theoretically possible to remove the X11 requirement in the future. As you can see from the screenshot, the X11 rendering is far from exceptional. Also, I have been told that S.W.F on Mac is still considered to be in the Alpha stage. Therefore, not all S.W.F apps will run flawlessly. My advice is to not rely on full S.W.F functionality until at least beta :)

Oscar

Tuesday, January 18, 2005

When is a sort not a sort...

I've been expanding on the classical image processing in Damocles lately. Specifically the Rank Operator. It works like this, take all the pixel values around a central pixel, sort them by value, then replace the central pixel with the minimum / maximum / median / r th pixel from the sorted list.
Now, it seems obvious that you don't need to actually sort data to find the min or max value from a list. Finding the min/max of a list is an O(n) operation. Any other position in the list, however, is usually calculated by doing a full sort, then returning the rth ranked item from the list. A quick glance at this shows that you are performing an O(n log(n)) or O(n^2) sort in order to generate one number, then throwing away most of the data. Yet, this is how most software performs median filtering. It's really no problem when your neighbourhood is small (3x3 cross or box) but as your kernel size grows, the sort becomes the biggest concern.
So how do we avoid it? By using a slightly different definition of median. Traditional processing says that for any N items, the median value is the N/2th value after sorting. We can also use the following definition: the median value is the value that has the same number of values greater than it as smaller than it. Formally,

Median when either
1) smaller=larger
2) smaller < larger AND smaller+equal>=larger

The second test allows for duplicate values in the list and for even values of N. This is a generalization of the min test or max test. The min test is smaller==0 while the max test is larger==0.
You can generalize this to any position in the list as follows:

(zero based) r'th in list when either
1) smaller=r
2) smaller<r AND smaller+equal>=r

In this manner, we only need to keep track of how many items are smaller and how many are equal.

Also, every time we test a value, we can use the results to fine tune the next iteration. For example, if a value of, say, 63 is determined to be too small to be the median, we can ignore any values in the list <=63. Likewise for values that are too big. By doing this, you progressively narrow the allowable range and avoid testing many values in the list.

Finally, when largest_possible==smallest_possible, you don't have to do any more testing at all, just return smallest_possible.


// ----------------------------------------------
// Function: Rank
// Parameters:
//    list: an array of (type)
//    length: number of entries in list
//    rank: rank within list [0..length-1]
// (floating point data needs code changes)
// ----------------------------------------------
(type) Rank((type) *list,int length,int rank) {
   (type) temp;
   (type) smallest=Smallest(type), largest=Largest(type);
   int i,j,lower,same;
   for (i=0;i<length;i++) {
      temp=list[i];
      if ((temp<=largest) && (temp>=smallest)) {
         for (lower=0,same=0,j=0;j<length;j++) {
            if (list[j]<temp) lower++;
            else if (list[j]==temp) same++;
         }
         if (lower==rank) return temp;
         else if (lower>rank) {
            if (temp<=largest) {
               largest=temp-1;
               if (smallest==largest) return smallest;
            }
         } else {
            if (lower+same>rank) return temp;
            if (temp>=smallest) {
               smallest=temp+1;
               if (smallest==largest) return smallest;
            }
         }
      }
   }
   return (smallest+largest)/2;
}


A 3x3 rank is usually done with a modified bubble sort (for simplicity, code size, and partial sorting capability) that stops after r iterations in the outer loop. This routine compare well with this.
This code performs well on random data usually beating O(n log(n)) (quicksort et al). On identical data, it is O(1). The pathological worst case is when the numbers are sorted into max, min, max-1, min-1, max-2, min-2, etc. In this case, the operation performs O(n^2) or O(n*range(type)) whichever is smaller. This is not a naturally occuring pattern in image data, however. On large lists, the range(type) clamps the log(n) term to log(range(type)) so performing extremely large ranking operations becomes possible with no additional computation (free!)

Oscar

Wednesday, December 08, 2004

Firewire is an onion...

I've spent about a week pouring over Firewire documents. There are a few. I've summarized the important specs and tech data below.
The first document is commonly known as CSR although it is formally known by ANSI/IEEE 1212-2001 or by ISO/IEC 13213:1994. CSR stands for Control and Status Register and describes a method for devices to communicate via a shared memory address (64 bits). The basic operations are read/write a quadlet (a 32 bit value), read/write an octlet (a 64-bit value), read/write a block (arbitrary size) and lock. The commands are completed asynchronously (i.e. non-blocking) but a software abstraction layer can create the illusion of synchronous calls. CSR describes how a bus can work but is not tied to any particular implementation.
The next document is known as IEEE 1394-1995 or 1394a-2000 or 1394b-2002 High Speed Serial Bus. It is an implementation of CSR. It describes a hot-pluggable bus that can provide power as well as data transmission, typically at 400Mbps (the most common) but can go faster (3200Mbps) or slower (100Mbps) with the newer "b" spec. The bus creates a tree topology with a "root" having possibly multiple children. This tree can contain up to 63 nodes but due to bandwidth reasons there is usually far fewer (eg 1 or 2) devices. Unlike the popular USB2.0 spec, this bus is equally at home connecting computer to device, device to device or computer to computer. This allows it to be equally at home connecting consumer A/V equipment or connecting two (or more) computers. It does have a weakness, however. It's enumeration mechanism does not handle circular connections well. In other words, there can only be one path along the tree from one device to any other.
Another key capability of this bus is isochronous communications. This allows you to pre-allocate bandwidth so that time based data (audio/video) will be guaranteed to have the bandwidth available. Asynchronous communications have to use the remaining unallocated bandwith. There is a minimum amount of bandwidth reserved for asynchronous so that other devices are not starved out (typically 20%). This means that the maximum isochronous bandwidth a S400 device can allocate is about 320Mbps.
Due to it's "hot pluggable" nature, every time a device is plugged in or unplugged, the bus is reset and re-enumerated. The devices that were transmitting isochronously get a chance to reallocate their bandwidth. This can cause a "blip" in the data stream. Software can be notified of a bus reset and take appropriate action. Since the node address can change with a reset, each device has a unique ID available known as an EUI-64. This ID is formed from a 24 bit Vendor ID (assigned by a central authority) as well as a 40 bit serial number assigned by the vendor. Two otherwise identical devices will have distinct EUI-64's.
Every Firewire device must implement a minimum amount of CSR registers. This allows the enumeration mechanism to work as well as determine what protocols the device understands. It also determines the bus capabilites of the device. A device can be a cycle master (it can generate the bus start of cycle packets), an isochronous manager (isoch. bandwidth and channel allocation mechanism), or bus manager (supply and manage power, bus optimization and root determination). The bus can operate without a bus manager but both devices must supply their own power. A device does not need to be any of the above.
The CSR's form directories with offsets pointing to where to find other directories. This allows for a generic tree configuration. The Bus Info Block is first and is always located at offset 0x400. Immediately following is the Root Directory that describes the device and contains offsets to unit directories. A device can support multiple protocols and multiple versions of the same protocol. Each unit dependant directory lists a protocol by Protocol ID and spec version # and the offset where to find the CSR's that control the device using that protocol and version #.
The next spec to read up on is the IIDC 1394-based Digital Camera specification version 1.30, 1.20 or 1.04 (also known as DCAM). It is a defined by the 1394 Trade Association Instrumentation and Industrial Control Working Group, Digital Camera Sub Working group (II-WG DC-SWG). It's purpose is to define a command set for digital cameras that use the firewire bus. This is a completely separate group from the DV group which comprise a whole laundry list of specs. More importantly, it defines uncompressed digital camera operations. DV cameras use a lossy MPEG2 video compression variant to transmit data.
The spec provides several inquiry registers that allow you to determine what data formats, frame rates, image sizes, etc. are available. This allows software to choose a configuration that it can deal with and ignore the others if it wants. There are 5 standard formats each with standard image sizes and frame rates available for compatibility purposes as well as a fixed image format (for still cameras) and a flexible format where you can configure your camera's options known as Format 7. This is by far the most flexible format and allows you to change every aspect of your image data.
The spec also defines standard image controls (Brightness, Auto-expose, Sharpness, White Balance, Hue, Saturation, Gamma, Shutter, Gain, Iris, Focus, Temperature, Trigger, Zoom, Pan, Tilt, Optical Filter) as well as the level of support for each of these capabilities (not present, read only, read-write, auto-mode, one-push mode, disable mode, valid range). Format 7 also allows you to specify the Region of Interest within the image as well as the valid sizes available. Finally, you can choose your data format (YUV4:2:2, YUV4:1:1, RGB8, Y8, Y16, etc). Additional, camera specific, controls can be added in an advanced CSR controls directory but additional info is required from the manufacturer in order to identify and translate these CSR's into camera features. The most important of these is binning options. Finally, once you have set up your camera, you can tell it to acquire one image (one-shot) a series of images (multi-shot) or a stream of images (stream). Of course, if your camera is a still camera, isoch stream will not be available.
In general, the specs work together as a protocol stack to cover and define a very wide range of objects. There is a spec for Hard Disks, printers, keyboard, mice, generic masss storage (media keys), CD/DVD drives known as SBP-2 and more recently SBP-3. As mentioned before, consumer DV video falls under many specs such as AV/C and HAVi. RFC 2734 defines how to layer the IP network protocol on the 1394 bus. Peruse these at your leisure the next time you get insomnia. Oh, and by the way, the official PDF of the specs are all sold online for about $100 a pop. Happy reading.

Friday, November 26, 2004

29.97 fps? what's up with that...

I've been recently delving into the IEEE-1394 specs (also known as Firewire or i.Link). It is a general purpose bus that, unlike USB, is not PC specific. The various specs work together in stack fashion similar to the ISO network stack with each layer building upon the layer below. The bus is capable of asynchronous communication but it is it's isochronous capabilities that make it suitable for carrying time-based data such as video and audio.
The Firewire 400 spec (1394a) uses a native 125 ns clock. When video transfers over firewire isochronously, the necessary bandwidth is reserved first. Once this is done, you just start streaming and wait for the video to pour in. Each frame of video is divided into bus frames. When the first frame's reserved bandwidth is filled, the remaining data is sent in subsequent bus frames. Well, it turns out PAL video (25fps) transfers just fine since a new video frame starts every (8000/25) 320th bus frame. However, NTSC video (30fps) has a problem. 8000/30 is approximately 266.6666 repeating. This means that a minimum of 267 frames are necessary (the last frame is padded out). This means that a 1/30th of a second is really 267/8000 of a second which gives you a frame rate of 29.962546816479 fps.
Audio is different. Audio has no inherent audio frame the way that video has a video frame. You can sample audio at whatever frequency you want and with as much precision you want from as many independent sources you want. You can divide it up into packets of whatever is convenient for transport without any fuss at all.
So what happens when you try and sync separate audio and video sources? Well, if the video is timed at 29.97 fps but the software assumes it is 30fps then after an hour, the lip movements will be almost 135 video frames out of sync.
It is important to realize that you can avoid this problem completely if you change your camera so that video frame boundaries do NOT coincide with bus boundaries. Then, your video data is just a stream of bytes that will always fit neatly into bus frames, albeit starting generally somewhere inside a bus frame instead of at the beginning. You need to add information to your video data to recognize a frame start. A little bit of overhead and complexity solves this problem.
So there you go. Crazy frame rates explained.

Thursday, November 18, 2004

How sharp is sharp?...

Ok, I just came across an article over at cross-platform.net that talks about doing precisely what I am doing with Damocles, namely, writing cross-platform native code that has a managed cross-platform binding in C#. The article mentions that my design pattern has a very subtle bug that can occur due to the fact that the C# garbage collector might kick in AFTER marshalling the args to an interop call but before the call actually finishes, causing the (admittedly highly unlikely) problem where the finalizer gets called and frees the unmanaged object too soon.
Now, this can't occur if you always dispose of your objects properly. It is slightly more likely to occur in a multithreaded scenario since the GC can run simultaneously with unmanaged code. Regardless, the solution is easy, and so I've decided to add it to the DamoclesSharp project.
It boils down to this: Instead of storing a private IntPtr, you instead store a HandleRef which contains an IntPtr. It also contains a reference to the managed object forever tying the managed and unmanaged sides together. This way, the managed object is held until the HandleRef is disposed of, thereby guaranteeing the lifetime of the original object.
Is it a bug? Is it just a theoretical possibility that would never happen in real life? It sure hasn't happened yet, but I haven't run any multithreaded tests either. Also, even though the possibility is very small, the consequences are huge. Processing memory that has been freed can result in a segfault or even cause the Blue Screen of Death if processing a buffer owned by the kernel, (say a video buffer that was wrapped).

So the code now looks like this:


public class Image : IDisposable {

private HandleRef native=NULL;

public Image() { ... }
public Image(IntPtr NativePtr) { native=new HandleRef(this,NativePtr); ... }
~Image() {....}

#region IDisposable implementation
...
#endregion

#region Interop calls
[DllImport(Damocles.DllName)] private static extern IntPtr AllocateImage(...)
[DllImport(Damocles.DllName)] private static extern void AddImageReference(...)
[DllImport(Damocles.DllName)] private static extern void FreeImage(...)
... other Interop functions here ...
#endregion

}


So this way, we avoid getting cut with DamoclesSharp (I'm sorry, I couldn't resist the pun!)