But I would be remiss if I did not provide something to show for the last five months. So here is a picture of our cats, Fitzcarraldo (left) and Alan "Bonesaw" Turing (right). Alan was born June 2014, so I guess he happened too.
An offering of cats to please the internet gods.
But to the topic at hand: I've been interested in cameras and image processing for many, many years. I've written all kinds of code to filter, sort, encrypt, convolve, and de-convolve images, and I've built gadgets that were made to be photographed. But one thing I hadn't done until this project was build a camera.
What's the easiest way to build a camera? My guess would be to build a simple pinhole camera with real film. All you would need is something like a shoebox and some film. Assuming that you could figure out how to handle the film in a dark enough room so it didn't get exposed to light before you wanted it to, and that you had a place you could take the film to get processed, this might be the simplest way of building a camera.
But I'm picky, and wanted a digital color camera. Most modern digital cameras work on the same principle: replace the sheet of film from our simple pinhole camera with a dense array of light sensors, and replace the pinhole with a glass lens (or a set of glass lenses). The lenses gather far more light than a pinhole could admit and focus it onto the sensor array. The array contains millions of sensors (pixels), each with either a red, green, or blue filter in front. You point the camera at the object you wish to photograph, focus the lenses, and tell the pixel array to gather light for a little while. When you read out the values each pixel has recorded, and use a bit of software to arrange the colors and pixels on a screen, you see the image you want. Simple enough!
But let's say I didn't want to cough up the money for a pixel array or even lenses. In fact, all I could afford was a single color light sensor, some basic electronic components, and whatever I could print on a 3D printer. How do we make the MacGyver of digital cameras? First we need a hand-wavy theory of how to build an image with only one sensor.
Hand-Wavy Theory
Imagine you are standing outside in a field on a clear day with the sun shining in front of you. Why are you in this field? I don't know, maybe you are having a picnic. When you stare forward, your eyes help your brain form an image of the scene in front of you. But your eye has a lens and a dense array of sensors (your retina), and we don't want that luxury. So you close your eyes. Your entire retina is reduced to picking up just the light that hits your eyelids, giving you just one measure of the scene in front of you: the total integrated light hitting your face. In this single-pixel scenario, you measure the light from every object in the scene all at once, but you don't know where each object is or how much of the total light came from it. But you can start to build a mental image of where things sit by using your hand. Still standing in a field with your eyes closed, you hold your hand out to block light and wave your arm around (all good hand-wavy theories involve actual hand-waving). As your hand blocks light from different angles, the total light hitting your face is reduced somewhat. By knowing where you've placed your hand in front of you (assuming decent spatial awareness), you can get an estimate of how much light was coming from that direction. For example, you can figure out what direction the sun is by waving your arm around until you notice most of the light has disappeared. Then you know that the sun is in the direction connecting your face to your hand, and extending off into the sky.
Using this principle, we can build a digital camera with a single pixel, as long as we can also build some kind of 'arm' that can be moved around in front of the sensor. The sensor continuously records the amount of light hitting it while the arm moves around in front. A computer reads the sensor measurement, considers the arm placement, and builds up an image in memory. The next question is: how well can this method actually produce an image?
Testing the Theory
A good first step for any potentially crazy idea is to simulate it on a computer. If it doesn't work there, either your simulation is crap or the idea is bad. If it does work, either your simulation is crap or the idea might work in the real world. While these options don't make simulation sound very worthwhile, it is often much simpler than trying things in real life, and can save a lot of money and tears.
Below, I've provided a couple of little programs that demonstrate the theory of producing an image by blocking light in a systematic way. The first simulates a light-blocking arm sweeping across the field of view of a sensor, along with a plot of the total light collected as a function of arm position. The panel near the top right shows the color corresponding to the combined red, green, and blue values at any given moment.
This gives us an idea of the data we can record by moving a blocking arm back and forth across the field of view. Next, we not only vary the position of the blocking arm, but also the angle at which it moves. By sweeping the arm across at many different angles, we can build up a two-dimensional plot of the total integrated light hitting the sensor as a function of position and angle. This is shown in the next program as a color map.
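For anyone who wants to poke at the idea without the fancy animations, a stripped-down version of that simulation fits in a few dozen lines of Python. This is not the code behind the animations above; the toy scene, arm width, and step counts are all placeholder values, but it produces the same kind of angle-versus-position map:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy scene: a bright white blob and a dimmer red blob on a dark background.
N = 64
y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
scene = np.zeros((N, N, 3))
scene[(x - 0.4) ** 2 + (y - 0.3) ** 2 < 0.05] = [1.0, 1.0, 1.0]  # white "lamp"
scene[(x + 0.5) ** 2 + (y + 0.2) ** 2 < 0.08] = [0.8, 0.1, 0.1]  # red object

arm_width = 0.08                       # width of the blocking arm (scene units)
angles = np.linspace(0, np.pi, 90)     # directions the arm sweeps along
offsets = np.linspace(-1.4, 1.4, 90)   # arm positions within each sweep

# For each (angle, position), block a strip of the scene and record the total
# light that still reaches the single sensor, one value per color channel.
recording = np.zeros((len(offsets), len(angles), 3))
for j, theta in enumerate(angles):
    # signed distance of every point in the scene from the arm's center line
    dist = x * np.cos(theta) + y * np.sin(theta)
    for i, s in enumerate(offsets):
        blocked = np.abs(dist - s) < arm_width / 2
        visible = scene * (~blocked)[..., None]
        recording[i, j] = visible.sum(axis=(0, 1))

plt.imshow(recording / recording.max(), aspect='auto')
plt.xlabel('arm angle step')
plt.ylabel('arm position step')
plt.title('total light reaching the sensor')
plt.show()
```

Each column of the map is one sweep of the arm at a fixed angle, which is exactly the kind of data the real camera will record.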
The resulting image is fascinating, but doesn't look like the original scene at all. At each arm angle (plotted along the x-axis), the same object in the scene is 'recorded' at a different arm position (plotted along the y-axis). This causes each object to become a stretched-out, curved streak in the recorded image. Luckily, there is some clever math that can get us from the stretched-out recording to a nice reconstruction of the original scene.
Radon Transforms
When I first looked at one of these simulated recordings, I was ecstatic. I immediately recognized it as the Radon Transform of the original scene. Admittedly, this is not a normal thing to immediately recognize. A significant portion of my PhD work has been spent thinking about similar transforms, so I have things like this on my mind.
Why care about transforms like this? An integral transform is essentially a method for taking a signal and mushing it up in a particular way that makes the result more understandable. The most common example is the Fourier Transform, which is useful for breaking down a signal into waves. An example of when a Fourier Transform might be useful is when analyzing an audio source. This transform will separate the audio into its component frequencies and tell you how much 'power' is in each separately. The Radon Transform is a little more obscure, but one example of its usefulness is detecting hard edges in an image. An important feature of many integral transforms is that they can be inverted to give you back the original signal. The inverse of the Radon Transform (also called a filtered back projection) is most commonly used in CT scans, where a sensor array measures the projected image of someone's insides and the final image is reconstructed from many projections.
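If you want to see the transform pair in action without building anything, scikit-image ships both directions. The snippet below is just a minimal demonstration on a standard test image, not anything to do with the camera yet; note that it operates on a single grayscale channel, so a color scan has to be transformed one channel at a time:

```python
import numpy as np
import matplotlib.pyplot as plt
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()  # the classic CT test image

# Forward Radon transform: integrate the image along lines at each angle.
theta = np.linspace(0.0, 180.0, 180, endpoint=False)
sinogram = radon(image, theta=theta)

# Inverse transform (filtered back projection): recover the image.
reconstruction = iradon(sinogram, theta=theta)

fig, axes = plt.subplots(1, 3, figsize=(10, 4))
for ax, img, title in zip(axes, (image, sinogram, reconstruction),
                          ('original', 'Radon transform', 'reconstruction')):
    ax.imshow(img, cmap='gray', aspect='auto')
    ax.set_title(title)
plt.show()
```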
Our simulated sensor-and-arm camera has given us the negative Radon Transform of the scene it tried to image. All we need to do is apply the inverse transform to the recorded image and we should get back an image of the original scene. One issue with this procedure is resolution. When making the camera, we need to pick the width of the blocking arm, how finely we can step it across the scene, and how many angles we sample. All three of these choices determine the resolution of both the recorded transform and the final reconstructed image. After a bit of playing around with some test images, I settled on a resolution that would keep the total scan time for the camera reasonable.
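The trade-off is easy to estimate on the back of an envelope. The numbers below are placeholders rather than my actual settings, but they show how quickly the measurement count, and therefore the scan time, grows:

```python
# Back-of-the-envelope scan time estimate (placeholder numbers, not the
# actual settings used for the camera).
n_angles = 60        # arm-sweep directions
n_positions = 100    # arm positions per sweep
dwell_time = 0.1     # seconds to settle the servo and read the sensor

total_measurements = n_angles * n_positions
total_minutes = total_measurements * dwell_time / 60
print(f"{total_measurements} measurements, about {total_minutes:.0f} minutes per image")
# -> 6000 measurements, about 10 minutes per image
```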
With a solid theory in place, backed by math fancy enough to look impressive, I could move on to building the single-pixel Radon Transform camera in real life.
Constructing the Camera
The main components I used to build this camera were:
- Arduino Pro Mini
- Color Light Sensor
- MicroSD Card Board
- 2 Servos
- Battery Pack
The SD card board and battery pack added a bit to the cost, but they made the camera portable, with the hope that I could take it outdoors and photograph mountains and fields and things. They were not necessary for indoor, tethered use of the camera.
The first big hurdle was designing the parts to be printed. I'm not an engineer, I'm a scientist. I don't even enjoy saying that physicists are good at doing an engineer's job. Making things that don't immediately self-destruct is non-trivial. This was probably the most complicated thing I've had to design, and the number of revisions I had to go through shows that. About half-way through, my 3D printer died an ungraceful death (after a year of use and a few pounds of filament), so I upgraded to a better printer and enjoyed a marked increase in print quality by the end.
Printed in 8 parts, in case you were wondering.
Left: old dead printer. Right: new not-dead printer.
Various attempts at being an engineer.
Glam shot with all electronics magically completed.
MicroSD card reader on the side.
Attached to my tripod, ready for testing.
The circuit to control the camera was as simple as I could make it. I soldered the Arduino board to a protoboard and added connections for the servos, sensor, battery pack, and SD card board. Once I confirmed that everything could be controlled by the Arduino, I moved on to the code.
The main problem I encountered while building the camera was that the servos would not rotate the full 180 degrees I expected. In fact, using the default Servo library for Arduino and the standard servo.write() command, I only saw about 100 degrees of rotation. As it turns out, different servos have different standards for how to make them turn to various positions. The Servo library assumes that a pulse-width of 1000 us corresponds to 0 degrees and a pulse-width of 2000 us corresponds to 180 degrees. In the servos I bought, these pulse-widths corresponded to about 70 degrees and 170 degrees, respectively. By manually sending a pulse-width of 2100 us, I could get the servos to rotate close enough to 180 degrees, but getting the low-angle end was trickier. At a pulse-width of 450 us, I was getting down to around 30 degrees, but anything lower caused the servo to swing to zero and bounce around. My guess is that the internal servo electronics can't handle pulse-widths shorter than about 450 us.
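One way to think about the workaround is as a linear map between pulse width and measured angle, clamped at the short-pulse floor. Here is a rough sketch of that idea in Python (the real camera drives the servos from the Arduino; the calibration points are the approximate values quoted above, and the mapping clearly breaks down near the low end):

```python
# Rough servo calibration sketch. Calibration points are the approximate
# values measured above: 1000 us -> ~70 deg, 2000 us -> ~170 deg.
PULSE_LO, ANGLE_LO = 1000, 70    # microseconds, degrees
PULSE_HI, ANGLE_HI = 2000, 170
PULSE_MIN = 450                  # below this the servo starts misbehaving

def pulse_for_angle(angle_deg):
    """Linearly interpolate (or extrapolate) a pulse width for a target angle."""
    slope = (PULSE_HI - PULSE_LO) / (ANGLE_HI - ANGLE_LO)  # us per degree
    pulse = PULSE_LO + slope * (angle_deg - ANGLE_LO)
    return max(PULSE_MIN, pulse)  # never send pulses the servo can't handle

print(pulse_for_angle(180))  # 2100 us, which did get the servo close to 180 degrees
print(pulse_for_angle(0))    # would be 300 us, clamped to 450 us; in practice
                             # 450 us only reached about 30 degrees, so the
                             # response is clearly nonlinear at the low end
```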
So in the end, I could only get around 150 degrees of rotation out of the servos. This wasn't too much of an issue for the servo that moved the blocking arm in front of the sensor, since the sensor probably couldn't tell if light was hitting it from those extreme angles anyway. But the limited range was a significant problem for the other servo, which rotated the whole arm contraption around the axis of the sensor. This limitation is like chopping off the right end of the simulated Radon Transform images in the animations above. Without information from some band of angles, the image reconstruction is unconstrained and will show significant banding. I thought about this problem for a few days before running out of ideas and searching the internet for a solution. The good news is that this problem is not uncommon and there are research groups around the world thinking about it. The bad news is that there isn't a magical way to recover the information lost at those angles, only methods to mitigate the artifacts they introduce in the inverse transform. The best solution is to record as many angles as possible and be aware of the limitations of what you've recorded.
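If you want to see what that missing wedge of angles does to a reconstruction, it is easy to fake with the same scikit-image tools used earlier: reconstruct a test image once from the full 180 degrees and once from a range trimmed to roughly what my servos could reach. The 150-degree cutoff below is the only number taken from the real camera; everything else is synthetic:

```python
import numpy as np
import matplotlib.pyplot as plt
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()

# Full angular coverage vs. roughly the 150 degrees the servos could reach.
theta_full = np.linspace(0.0, 180.0, 180, endpoint=False)
theta_limited = np.linspace(30.0, 180.0, 150, endpoint=False)

recon_full = iradon(radon(image, theta=theta_full), theta=theta_full)
recon_limited = iradon(radon(image, theta=theta_limited), theta=theta_limited)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(recon_full, cmap='gray')
axes[0].set_title('full 180 degrees')
axes[1].imshow(recon_limited, cmap='gray')
axes[1].set_title('30 degrees missing')
plt.show()
```

The limited-angle version shows the same kind of streaking and banding that turns up in the real images below.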
Programming the Processor
I decided to save the inverse Radon Transform computation for post-processing on my workstation, so that all the camera had to do was record the sensor values and store them in a sensible way. This led to a fairly simple code flow for the on-board controller:
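The actual firmware is Arduino C++; the sketch below is just a Python-flavored outline of the same loop, with the hardware calls reduced to stand-in stubs:

```python
# Python-flavored outline of the on-board control flow (the real firmware is
# Arduino C++).  The stub functions stand in for the servo, sensor, and SD
# card code.

def set_sensor_gain():            # pick a gain that doesn't saturate the sensor
    pass

def rotate_arm_assembly(step):    # servo #2: rotate the whole arm contraption
    pass

def move_blocking_arm(step):      # servo #1: sweep the arm across the view
    pass

def read_color_sensor():          # total (r, g, b) light reaching the pixel
    return (0, 0, 0)

def log_measurement(angle, position, rgb):  # append one record to the SD card
    print(angle, position, *rgb)

def run_scan(n_angles=60, n_positions=100):  # placeholder step counts
    set_sensor_gain()
    for angle in range(n_angles):
        rotate_arm_assembly(angle)
        for position in range(n_positions):
            move_blocking_arm(position)
            log_measurement(angle, position, read_color_sensor())

run_scan()
```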
I've excluded some bits of code that do things like set the servo position and write data out, mostly to simplify the code above. These extra bits aren't too important and can be done in a variety of ways.
Here is the flow of how the camera works: once the camera initializes, it determines the correct sensor gain and begins scanning the scene. Each measurement is saved to a file on the SD card. I collect this data and move it to a file on my laptop for post-processing. Before the raw data can be transformed from Radon-space into real-space, it needs to be adjusted.
One issue with the camera is the long exposure time. During the ten or so minutes it takes to record a single image, the lighting in the scene can change dramatically. The biggest cause of lighting changes I ran into was clouds. This kind of change in lighting would cause the recorded data to vary in brightness over time, which would imprint into the final image as a variation with position. To ameliorate this, I computed a weighted mean of the total brightness at each arm angle and used it to compute the negative image. This way, the light could change slowly over time and the moving average would account for it.
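In rough Python form, the correction looks something like the sketch below. I've used a plain moving average as the weighting here; the idea is the same even if the exact weights differ:

```python
import numpy as np

def to_negative(raw, window=5):
    """Turn one channel of a raw scan (arm positions x arm angles) into a
    'light blocked' scan.  A smoothed per-angle brightness baseline absorbs
    slow lighting changes (clouds drifting by) so they don't imprint on the
    reconstructed image.  The window size is a placeholder."""
    per_angle = raw.mean(axis=0)                 # average brightness at each arm angle
    weights = np.ones(window) / window           # simple moving-average weights
    baseline = np.convolve(per_angle, weights, mode='same')
    return baseline[None, :] - raw               # light blocked relative to the baseline
```

Each color channel gets the same treatment before being handed off to the inverse transform.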
Results
The first test scene was of a lamp and a red flashlight. This provided a few simple diagnostics, like checking to see if a bright point source would get smeared at all in the final reconstruction.
The first panel shows the raw values from the camera mapped as a function of arm angle and position and scaled between 0 and 255 in each color channel. The second panel shows the negative scan, where I've gone from measuring the total light at every arm position to measuring the light blocked at every arm position. The third panel shows the inverse transformed scan, which is ideally the reconstruction of the actual scene. The final panel is a comparison image taken with a normal camera.
The raw scan and negative show exactly what we would expect from the Radon transform. The two main objects show up as smooth curves as a function of arm angle. The resulting image isn't quite as clear. You can tell that it picked up on the bright light and a bit of the red light, but there is a huge amount of aberration, particularly around the bright light. This butterfly-looking pattern shows up when the full 180 degrees of arm rotation aren't recorded. The angled lines radiating from the light show the limits of the angles the arm could reach. With this limitation in mind, I moved on to recording more complex scenes.
Close enough?
Next up were a few objects that were sitting around on my desk. I wanted to know how well the camera could record different colors, so I placed some yellow sticky notes and red wire next to the blue painter's tape I had on my printer. The raw scan and negative look pretty sensible, and the resulting image actually shows some resemblance to the real scene. You can pick out the blue region, the bright yellow spot on the right, the red blur on the bottom left, and even the red printer nozzle above the blue region. The whole image looks like it was taken underwater or something, but then again, it's an image made by a single-pixel camera. I'm surprised it works at all.
View from my balcony.
Next up, I took the camera out to my balcony and took a picture of my morning view. No simple point sources to make things easy, but the result looks pretty good. The raw scan and negative look nice and smooth, and the result has most of the main features present in the real scene. You can see the horizon, the tree right in the center, and even a little of the parking lot in the bottom right. Of course, I spent time and money making the camera portable, so next up I needed to go on a little photo trip.
Living in Colorado has its benefits.
One of the nice things about living in Boulder, CO is that you are never far away from beautiful scenery. I took the camera up the Front Range and let it record a view of the Rocky Mountains in the distance. The raw scan and negative look pretty good, but have a lot of sharp jumps as a function of arm angle. I'm really not sure where these came from, but I suspect either my battery pack was dying or my servos were giving out. Even with these problems, the resulting image looks great.
So there you have it, a single pixel camera that takes color images. It may not produce the highest quality photographs, but it's certainly enough of a proof-of-concept to be worth a post. Given some more time and money, a better version of this camera could be made to take better pictures. But I'll leave that as an exercise for the reader.
This project took much longer to complete than it would have a year ago. As I inch closer to finishing my PhD work, I can afford less and less time for projects outside of work. I think it's time to say that I won't be updating this blog any more, or at least not until I'm done with school. Of course, I say this now, but who knows what kind of small projects I'll be able to do in a day later on. Fortunately, I think this blog has had a good run, from Daft Punk helmets to LED Planets, from Neural Networks to Robots that Slap People. While this may just look like a collection of eccentric and technical projects, it has really helped me figure out what I might be good at (outside of science) and what I enjoy doing. I know my write-ups here have helped a few people with their own projects, and I hope they continue to do so in the future. Thanks for reading!