I have a new source of time-suckage, and its name is the Kinect.
Back in December, someone hacked Microsoft’s Kinect, allowing computers to interface with the full-body motion/gesture-sensing device and receive raw data from it. A week or two ago, in early January, another group created an AS3 solution. It consists of a command-line socket server (which acts as a bridge between the Kinect and the client Flash application) and an AS3 codebase containing many of the basic Kinect control and data-processing routines.
I debated briefly whether to pick up a Kinect and join the hackathon. Now, it should be said that a) I don’t have an Xbox and b) I know very little about actual hardware-level hacking, but neither of those stopped me from purchasing a Kinect, bringing it home, and parking it on my desk next to my MacBook Pro.
I’ve now set up as3-server (the command-line Kinect connector) and have been playing with OpenKinect and the OpenKinect AS3 API. I’m still trying to figure out what I can and should do with this beast, but I’m super-giddy about it nonetheless.
Working on creating an x/y/z visual representation of the data coming from the Kinect. Basically, I’m taking the data stream from the RGB camera and adjusting the z position of the individual pixels, placing each one at the appropriate depth in z-space (thus creating a real-time 3D model of the scene). This was inspired by this clip, though I think the guy there is using something other than Flash for his visualization. Running into difficulties. To control the z position of parts of the image, I’m creating an array (actually a Vector) of sprites, one for each pixel in the video frame. Currently I can’t handle more than 1/10 of the video frame without latency becoming a problem or Flash just choking in general. Going to have to optimize my methods.
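As a rough illustration of the per-pixel idea, here is a minimal sketch in Python rather than AS3 (so it is not the code behind this experiment). It assumes the RGB and depth frames arrive as flat, row-major lists of the same width and height, with colors as 0xRRGGBB integers and depth as raw integer readings; the function name `build_point_cloud` and its `step` parameter are hypothetical. Each sampled pixel becomes one `(x, y, z, color)` point, which is the data each sprite stands in for.

```python
from typing import List, Tuple

Point = Tuple[int, int, int, int]  # (x, y, z, color)


def build_point_cloud(rgb: List[int], depth: List[int],
                      width: int, height: int, step: int = 1) -> List[Point]:
    """Pair each sampled RGB pixel with the depth reading at the same (x, y).

    Assumes rgb and depth are flat, row-major lists of width * height values
    taken from frames of identical size. `step` skips pixels to subsample
    the frame (step=1 keeps every pixel).
    """
    if len(rgb) != width * height or len(depth) != width * height:
        raise ValueError("frames must both contain width * height values")
    points: List[Point] = []
    for y in range(0, height, step):
        for x in range(0, width, step):
            i = y * width + x  # flat index of pixel (x, y)
            points.append((x, y, depth[i], rgb[i]))
    return points
```

Raising `step` thins the frame quadratically: `step=3` keeps roughly 1/9 of the pixels, which is the kind of trade-off behind sampling only part of the frame.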
Moved instantiation of the sprite vector so that it happens once, when the app itself is instantiated. Created another vector to hold depth frame information (updated as data is received from the depth camera). So, on frame updates, all Flash has to do is call setPixel on each sprite’s bitmap data (to show the proper image) and adjust that sprite’s z property according to what the depth camera returns. With these optimizations, I’ve been able to increase frame sampling to 1/4 of the full frame. Latency issues still loom, and there seems to be a mismatch between the video and depth data. I think this is due to a difference in the resolutions of the depth and RGB cameras. Investigating that now. Anyone know offhand what the resolution of each is? I’ve recorded a sample screen video.
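Both changes in this entry, a buffer allocated once and refilled every frame plus a lookup that tolerates RGB and depth frames of different sizes, can be pictured with the Python sketch below. The class `DepthMapper` is hypothetical, and the rescaling only matters if the two streams really do differ in resolution, which is an assumption here rather than something established above.

```python
from typing import List


class DepthMapper:
    """Hypothetical helper: keeps a z buffer sized to the RGB frame,
    allocated once, and refills it in place from each new depth frame.
    Coordinates are rescaled (nearest neighbour) in case the two
    streams have different resolutions."""

    def __init__(self, rgb_w: int, rgb_h: int, depth_w: int, depth_h: int):
        self.rgb_w, self.rgb_h = rgb_w, rgb_h
        self.depth_w = depth_w
        # Allocated once, analogous to creating the sprite vector at start-up.
        self.z: List[int] = [0] * (rgb_w * rgb_h)
        # Precompute the nearest depth column/row for every RGB column/row.
        self.col_map = [x * depth_w // rgb_w for x in range(rgb_w)]
        self.row_map = [y * depth_h // rgb_h for y in range(rgb_h)]

    def update(self, depth: List[int]) -> List[int]:
        """Refill self.z from a flat, row-major depth frame without
        allocating a new list; returns the same buffer each time."""
        for y in range(self.rgb_h):
            src_row = self.row_map[y] * self.depth_w
            dst_row = y * self.rgb_w
            for x in range(self.rgb_w):
                self.z[dst_row + x] = depth[src_row + self.col_map[x]]
        return self.z
```

Nearest-neighbour rescaling like this only lines up frames of different sizes; it does not account for the RGB and depth cameras being separate lenses with their own viewpoints, so some offset between the two images could remain even at matching resolutions.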
3d Kinect output in Flash – trial 1/15/2011 11:30
Worked around the performance issues for now… Increased the size of each sprite from a single pixel to a 5×5 pixel block. I’m copying a 5×5 block of the image into each sprite and setting its depth from a single pixel (from the depth camera stream) within that block. This has let me run a full-frame sample with very little lag. (Before, I was running a partial, 1/5-frame sample, and it was very laggy.) Test video posted here.
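The block trick can be sketched as follows, again in Python and with a hypothetical function name. It assumes a flat, row-major depth frame and takes each tile’s depth from its centre pixel (the entry above only says “a single pixel within that block”, so the centre is a guess); partial tiles at the right and bottom edges are simply dropped.

```python
from typing import List, Tuple


def block_depths(depth: List[int], width: int, height: int,
                 block: int = 5) -> List[Tuple[int, int, int]]:
    """Return (block_x, block_y, z) for each full block x block tile,
    taking z from the pixel at the centre of the tile."""
    half = block // 2
    out: List[Tuple[int, int, int]] = []
    for by in range(height // block):
        for bx in range(width // block):
            cx = bx * block + half  # centre pixel column of this tile
            cy = by * block + half  # centre pixel row of this tile
            out.append((bx, by, depth[cy * width + cx]))
    return out
```

Assuming a 640×480 frame, 5×5 tiles give 128 × 96 = 12,288 sprites instead of 307,200, a 25-fold reduction in display objects to update each frame.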
Mostly just been working on bringing the depth and RGB cameras into sync. Not sure if it’s a difference in the cameras or something odd in my bitmap sampling/drawing methods; been trying to track that down. Also figured out that I should set each individual sprite’s rotationY to the inverse of the container’s rotationY in order to display it properly. Another test video here.
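The counter-rotation works because rotations about the same axis simply add their angles: a child turned by −θ inside a container turned by +θ ends up with no net rotation about Y, so it keeps facing the viewer. The Python sketch below checks this with plain 3×3 rotation matrices; it does not reproduce Flash’s own transform pipeline, and the front-facing direction `(0, 0, -1)` is an assumption chosen for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rot_y(deg: float) -> Matrix:
    """3x3 rotation matrix about the Y axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def facing_after(container_deg: float, child_deg: float,
                 normal: Sequence[float] = (0.0, 0.0, -1.0)) -> List[float]:
    """Direction a child's face points once the container's rotation and
    the child's own rotation (both about Y) are applied."""
    world = matmul(rot_y(container_deg), rot_y(child_deg))
    return apply(world, normal)
```

For example, `facing_after(30.0, -30.0)` comes back as (approximately) the original `(0, 0, -1)`, while `facing_after(30.0, 0.0)` is turned away from it.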
Before getting out of bed this morning, I lay there thinking that maybe I was going about things the wrong way, and that maybe I should steer clear of Flash’s native 3D and use one of the many 3D Flash engines that have popped up over the years.
After getting up, feeding the animals, and brewing some coffee, I set about porting what I’d done on Saturday over to the Away3D Flash engine. The results of the initial tests made me optimistic. After a much-needed three-hour midday break from dev, I settled back into my seat and recorded this test video. The bitmap mesh was divided into a grid of roughly 100×100 vertices. So, while it’s nowhere near super-fine, it should be enough to show depth differences.
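A displaced grid mesh of this kind boils down to sampling the depth frame at evenly spaced vertex positions. The Python sketch below only builds the vertex list; it does not touch Away3D’s API, and `displaced_grid`, `cols`, `rows`, and `scale` are hypothetical names. It assumes a flat, row-major depth frame and picks the nearest pixel for each vertex.

```python
from typing import List, Tuple


def displaced_grid(depth: List[int], width: int, height: int,
                   cols: int = 100, rows: int = 100,
                   scale: float = 1.0) -> List[Tuple[int, int, float]]:
    """Return rows * cols (x, y, z) vertices spread evenly across the frame,
    with z taken from the nearest depth pixel and multiplied by scale."""
    verts: List[Tuple[int, int, float]] = []
    for r in range(rows):
        # Spread vertex rows evenly from the first to the last pixel row.
        py = round(r * (height - 1) / (rows - 1)) if rows > 1 else 0
        for c in range(cols):
            px = round(c * (width - 1) / (cols - 1)) if cols > 1 else 0
            verts.append((px, py, depth[py * width + px] * scale))
    return verts
```

With `cols=rows=100` that is 10,000 vertices. They are emitted row by row, so the vertex at row `r`, column `c` sits at index `r * cols + c`, which is convenient when stitching neighbouring vertices into triangles.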
As you can see in the video, I’m now running into trouble retrieving multiple levels of depth. I think this is mostly because I’ve been basing my depths on the color values of a bitmap generated from the depth camera stream. I really want to process the depth camera stream’s byte array directly, but I can’t seem to track down any notes on the format of that byte array. Is it signed or unsigned? Is it bytes, ints, floats, or shorts?
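For what it’s worth, libfreenect’s (unpacked) 11-bit depth format stores each pixel’s reading as an unsigned 16-bit integer with only the low 11 bits used (values 0 to 2047). Whether as3-server forwards the data in that layout, and in which byte order, isn’t confirmed here, so the Python sketch below simply assumes unsigned 16-bit little-endian samples; `parse_depth_frame` and `to_gray` are hypothetical helpers. It also shows one plausible reason for the lost depth levels: if the bitmap squeezes 11-bit readings linearly into an 8-bit channel, every 8 neighbouring readings collapse into one gray value.

```python
import struct
from typing import List


def parse_depth_frame(data: bytes, width: int, height: int) -> List[int]:
    """Decode a raw depth frame, ASSUMING one unsigned 16-bit little-endian
    sample per pixel whose low 11 bits hold the reading."""
    expected = width * height * 2
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    values = struct.unpack(f"<{width * height}H", data)
    return [v & 0x7FF for v in values]  # keep only the low 11 bits


def to_gray(value11: int) -> int:
    """Squeeze an 11-bit reading (0-2047) linearly into 8 bits (0-255)."""
    return value11 >> 3  # divide by 8: 2048 levels -> 256 levels
```

Here readings of 1000 and 1007 both map to gray level 125, so a bitmap built that way can’t tell them apart. In AS3 the corresponding read would be `ByteArray.readUnsignedShort()`, but `ByteArray` defaults to big-endian, so its `endian` property would need to be set to `Endian.LITTLE_ENDIAN` if the samples really are little-endian.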
More to follow.