c© 2013 Brian Harding IONOSPHERIC IMAGING WITH COMPRESSED SENSING BY BRIAN HARDING THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering in the Graduate College of the University of Illinois at Urbana-Champaign, 2013 Urbana, Illinois Adviser: Associate Professor Jonathan J. Makela ABSTRACT Compressed sensing is a novel theory of sampling and reconstruction that has emerged in the past several years. It seeks to leverage the inherent sparsity of natural images to reduce the number of necessary measurements to a sub-Nyquist level. We discuss how ideas from compressed sensing can benefit ionospheric imaging in two ways. Compressed sensing suggests signal reconstruction techniques that take advantage of sparsity, offering us new ways of inter- preting data, especially for undersampled problems. One example is radar imaging. We explain how compressed sensing can be used for radar imaging and show results that suggest improved performance over existing techniques. In addition to benefitting the way we use data, compressed sensing can improve how we gather data, allowing us to shift complexity from sensing to reconstruction. One example is airglow imaging, wherein we propose re- placing CCD-based imagers with single-pixel, compressive imagers. This will reduce the cost of airglow imagers and allow access to spatial information at infrared wavelengths. We show preliminary simulation results suggesting this technique may be feasible for airglow imaging. ii ACKNOWLEDGMENTS First and foremost, I would like to thank my adviser, Jonathan Makela, for his unspoken support and patience in allowing me to pursue my interests and for providing opportunities to cultivate these interests. I have truly enjoyed our work together and look forward to the coming years. I would also like to thank the staff at the Jicamarca Radio Observatory, where many of the ideas in this thesis were first generated. Thanks go specifically to Marco Milla and Koki Chau. Back at the University of Illinois, I must thank Nancy Morris for administrative support. To my fellow graduate students, thanks for all your help, academic and otherwise: Henry, Dan, Tim, Tom, Tony, Levent, and Kirk, to name a few. Of course, none of this would have been possible without the love and support from my family. Finally, thanks to Eileen for her love, encouragement, and understanding. This material is based upon work supported by the National Science Foun- dation Graduate Research Fellowship under Grant No. DGE-1144245. Any opinion, findings, and conclusions or recommendations expressed in this ma- terial are those of the author and do not necessarily reflect the views of the National Science Foundation. iii TABLE OF CONTENTS CHAPTER 1 MOTIVATION . . . . . . . . . . . . . . . . . . . . . . 1 CHAPTER 2 GENERAL SPATIAL ELECTROMAGNETIC IMAG- ING WITH COMPRESSED SENSING . . . . . . . . . . . . . . . . 3 2.1 Spatial Electromagnetic Imaging . . . . . . . . . . . . . . . . . 3 2.2 Compressed Sensing . . . . . . . . . . . . . . . . . . . . . . . 16 CHAPTER 3 RADAR IMAGING WITH COMPRESSED SENSING 26 3.1 Radar Imaging Background . . . . . . . . . . . . . . . . . . . 26 3.2 Compressed Sensing Results . . . . . . . . . . . . . . . . . . . 36 CHAPTER 4 OPTICAL AIRGLOW IMAGING WITH COMPRESSED SENSING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.1 Airglow Imaging Background . . . . . . . . . . . . . . . . . . 47 4.2 Compressive Single-Pixel Airglow Imager . . . . . . . . . . . . 50 4.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.4 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 CHAPTER 5 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . 64 REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 iv CHAPTER 1 MOTIVATION We have known since 1938 of the development of ionospheric irregularities and their effect on radio wave propagation [1]. The most spectacular of these irregularities are massive turbulent upwellings of plasma in the equa- torial ionosphere that can reach heights of 1000 km or more, a phenomenon commonly known as “equatorial spread-F ” [2, 3]. A more phenomenologi- cally appropriate name is “convective ionospheric storm,” but we will use the former to match previous literature [4]. In the intervening 75 years, we have gained an understanding of the seasonal variability in the occurrence of equatorial spread-F, but an explanation of its day-to-day variability remains elusive. Understanding the conditions that lead to its growth is a necessary first step to predicting the resulting communication outages, an overarching goal of the National Space Weather Program [5]. Equatorial spread-F comprises irregularities in electron density on spa- tial scales ranging from 100s of kilometers to fractions of a meter. Imaging the spatial and temporal dynamics on all scales is important in developing a holistic understanding of the physics of irregularity development and growth. The two most popular ionospheric imaging techniques are radar and optical airglow imaging. A backscatter radar imager is sensitive to small-scale ir- regularities on the order of half the radar wavelength because the dominant type of scattering is Bragg scatter. An airglow imager observes large, km- scale morphology of structures associated with equatorial spread-F, due to the fact that photon emission in the airglow layer is proportional to electron density, and the combination of nonzero pixel size, integration times, and lifetime of the emission results in a blurring of small-scale features. Both modalities are necessary for effectively monitoring irregularities; however, both have challenges. With radar imaging, we must overcome the underde- termined nature of the problem. With airglow imaging, typical CCD-based imagers are too expensive for global coverage and are only cost-effective at 1 visible wavelengths. The emerging theory of compressed sensing can address both concerns. In this thesis, we show how compressed sensing can be used to improve imaging of equatorial spread-F, though the ideas presented here are easily generalized to other phenomena. In Chapter 2, we describe the problem of spatial elec- tromagnetic imaging, of which radar and optical imaging are examples. We introduce the theory of compressed sensing and show how it can be used to solve such problems. In Chapter 3, we apply these ideas to radar imaging, and show results suggesting that compressed sensing can outperform exist- ing methods on simulated and actual data. We use compressed sensing for airglow imaging in Chapter 4, which details the design of the single-pixel airglow imager and describes initial simulation results. We then conclude in Chapter 5. 2 CHAPTER 2 GENERAL SPATIAL ELECTROMAGNETIC IMAGING WITH COMPRESSED SENSING Imaging is the process by a which a signal of space is estimated from mea- surements. This process comprises both the data gathering and the data processing. While these areas are often treated as the separate fields of in- strumentation and image processing, a recent trend is to consider a more holistic approach. The idea is that deep knowledge of the sensor can inform better reconstruction algorithms, and conversely, a thorough understanding of the possibilities offered by signal processing can improve instrument de- sign. In this vein, this chapter reviews both the hardware and software of imaging. The first section develops a general theory of electromagnetic imag- ing, oriented to the applications of optical and radio imaging. The second section introduces compressed sensing, an emerging signal processing theory that can be applied to solve electromagnetic imaging problems. 2.1 Spatial Electromagnetic Imaging Students of electromagnetism learn as early as high school physics that light and radio are manifestations of the same underlying phenomenon, electro- magnetic waves. However, in subsequent coursework the two fields diverge rapidly, developing independent language, notation, and conventions. This is likely due to the fact that the technologies in the two fields are substantially different; optical sensors can only measure amplitude, not phase, while radio sensors can measure both. This section is an attempt to bridge this gap in the context of imaging, providing a generalized framework for electromag- netic imaging that applies to both optics and radio. 3 2.1.1 Fundamentals A traveling electromagnetic plane wave is characterized by an electric field of the following form: E(r, t) = E0 cos(ωt− k · r + ψ) (2.1) This describes a wave as a function of location in space, r, and time, t, with an initial amplitude and direction given by E0, phase offset, ψ, oscillating with a radial frequency, ω, and traveling in the direction k. To avoid having to deal with messy trigonometric identities, we leverage Euler’s formula by performing all subsequent calculations with phasor notation. This phasor, E˜(r), is complex-valued, and the true time-domain signal can be recovered through: E(r, t) = < { E˜(r)ejωt } (2.2) where < denotes the real part. All additions and multiplications in the phasor domain are thus the identical operations in the time domain. However, we will have no need to revert to the time domain, so we will drop the tilde and simply write E(r). Our traveling wave phasor takes the form: E(r) = Ae−jk·r Eˆ0 (2.3) where A = ‖E0‖ejψ is a complex-valued amplitude and Eˆ0 is a unit vector in the direction of E0. For our purposes we need only consider transverse elec- tromagnetic waves, for which Eˆ0 ⊥ k. Since we always know the relationship of Eˆ0 to k, we can ignore the direction of E(r), simply writing: E(r) = Ae−jk·r (2.4) This simplified notation will allow us to have general results that apply to linearly, circular, and elliptically polarized waves without getting lost in no- tation. 2.1.2 Single Plane Wave Propagation Consider the two-dimensional case of wave propagation in the xz-plane, as shown in Figure 2.1. In this work we will ignore the third dimension because 4 the generalization is trivial and the added complication would dilute the important concepts. Figure 2.1: An electromagnetic wave impinging upon the z = 0 plane. This wave is arriving at an angle φ with the z axis, so the wavevector is k = [ kx kz ] = [ −k sinφ −k cosφ ] (2.5) where k = |k| = 2pi λ and λ is the wavelength. We can now write the phasor as: E(x, z) = Aejk(x sinφ+z cosφ) (2.6) If we were to measure the electric field somewhere on the z = 0 plane, we would measure: E(x, 0) = Aejkx sinφ (2.7) 2.1.3 Plane Wave Continuum We can think of E(x, 0) as a spatial signal, a function of x. Imagine now that we have a multitude of plane waves arriving from a range of angles between −φ0 and φ0. The plane wave from each angle has its own complex amplitude, a function we denote A(φ). The signal on the plane z = 0 is now the integration of all of these plane waves. This new signal is: q(x) = ∫ φ0 −φ0 A(φ) ejkx sinφ dφ (2.8) In many applications, we are interested in determining the angular dis- tribution of incoming waves, A(φ), by making measurements of the spatial signal on a plane, q(x). Equation (2.8) shows that these two functions have 5 a Fourier-transform-like relationship. This relationship is made more clear if one makes the small angle approximation: max | sinφ| = sinφ0 << 1 ⇒ sinφ ≈ φ (2.9) which yields an approximation for (2.8): q(x) ≈ ∫ φ0 −φ0 A(φ) ejkxφ dφ (2.10) This is, precisely speaking, an inverse Fourier transform. If we take the inverse Fourier transform of A and evaluate it at kx, we get the function q(x). Conversely, if we define a new function by rescaling q: p(kx) = q(x) (2.11) then we see that the functions p and A are Fourier transform pairs, which we denote with the following: p←→ A (2.12) 2.1.4 Practical Issues This is promising, since it gives us hope that by measuring p we can infer A. However, there are two serious obstacles that we must first overcome. First, to uniquely determine the function A(φ) we would need to know p(x) for all x. This would require an uncountably infinite number of samples, but all we have in practice are finitely many measurements. Fortunately, we know that A(φ) is support-limited to within ±φ0. In the language of signal processing, we can say that p(kx) is bandlimited. The Shannon-Nyquist sampling theorem states that we can exactly reconstruct a bandlimited signal from its discrete samples, provided that the sampling rate is greater than twice the signal’s bandwidth. Therefore, we can uniquely determine A(φ) as long as the sensors measuring p(kx) are spaced by a distance less than the 6 Nyquist requirement: fsampling > 2B = 2 ( φ0 2pi ) 1 ∆(kx) > φ0 pi ∆ (x λ ) < 1 2φ0 (2.13) This Nyquist requirement relates the maximum sampling interval (measured in wavelengths) to the angular width of the signal. For example, consider a 50-MHz imaging radar with a transmitter beam width of 10◦. The required antenna spacing is then about 17 meters. In the optical case (λ = 400 nm), the required spacing is 1 µm. The second obstacle is that the Shannon-Nyquist theorem assumes we have a countably infinite number of samples spanning the entire x-axis. In practice, we only have finitely many sensors and thus can only sample out to a maximum distance x0. One attractive solution to this problem is to continue using the Fourier-transform-based processing scheme, substituting zeros for those samples we are unable to obtain. Effectively, we are windowing p(kx) with a rectangle function: r(kx) = 1 if |kx| ≤ kx00 if |kx| > kx0 (2.14) If we then attempt to recover A(φ) with a Fourier transform, we get a blurred version, as shown below: p · r ←→ A ∗F{r} = A ∗R (2.15) whereF denotes a Fourier transform, and we have used the Fourier transform property that multiplication in one domain is convolution in the other. The Fourier transform of a rectangle function is a sinc function, so after the correct scaling we can find the blurring function R: R(φ) = 2kx0 sinc(kx0φ) (2.16) 7 where sinc(x) ≡ sinx x (2.17) Roughly speaking, the width of the main lobe of the sinc function is 2 √ 3. This is found from the approximate value of x for which sinc(x) = 1 2 . There- fore, the width of the main lobe of R(φ) is: ∆φ ≈ 2 √ 3 kx0 = 2 √ 3 2pi ( x0 λ ) ≈ 1 2 ( x0 λ ) (2.18) Features of A(φ) on scales less than ∆φ will be smeared out by the convolu- tion with R(φ). The farther out we sample (higher x0), the narrower R(φ) will be, and the more detail we can resolve in A(φ). The preceding discussion inspires some questions. Since using the Fourier transform introduces a blur, is there some other inversion scheme with better resolution properties? Instead of assuming the non-measured samples to be zero, is there some clever way of extrapolating p(kx) out past x0, either implicitly or explicitly? What if we cannot, or prefer not to, have regular sampling? More seriously, what if we cannot sample at the Nyquist rate? All of these software-related questions find an answer in more advanced signal processing schemes such as compressed sensing, to be discussed in the second section of this chapter. For now, we will answer a hardware-related question. The procedure above relies on sampling p(kx), a complex quantity. This is only possible with sensors that measure phase, but phase-sensitive sensors in the optical regime have not yet been realized. The mathematical problem of extracting the phase from an amplitude signal is known as phase retrieval, and despite decades of research, little headway has been made. Is there a way to perform the necessary Fourier transform without sampling and processing? The answer is yes, and the solution is of course to use a lens. 2.1.5 Analog Fourier Transform Consider that instead of distributing sensors on the z = 0 plane, we place a lens, as shown in Figure 2.2. An ideal lens is simply a “phase screen,” adding a position-dependent phase shift to the signal without affecting its amplitude. The ideal lens has a quadratic phase shift; the signal passing 8 through it is modulated by the following complex function [6]: t(x) = ej kx2 2f (2.19) where f is the focal length. Thus, the spatial signal just below the z = 0 plane is the product of (2.10) and (2.19): q˜(x) = q(x)t(x) (2.20) Figure 2.2: An electromagnetic wave impinging upon a lens at the z = 0 plane, and propagating to the z = −z0 plane. We now propagate q˜(x) to a new plane at z = −z0. In keeping with the Huygens-Fresnel principle [6], we calculate this propagation by treating every point just below the z = 0 plane as the source of a new spherical wave. Consider just one ray emanating from just one point, x, on this plane to a point, x′, on the z = −z0 plane, as shown in Figure 2.3. This ray has initial amplitude q˜(x)dx and propagates a distance d. After propagation, it encounters a phase decrease of kd. Its amplitude is also reduced by 4pid2. If we call the signal on the z = −z0 plane s(x′), we can write: s(x′)dx = 1 4pid2 q˜(x)e−jkd dx (2.21) The distance d is: d = √ (x− x′)2 + z20 (2.22) which substituted into (2.21) gives: s(x′)dx = 1 4pi ((x− x′)2 + z20) q˜(x)e−jk √ (x−x′)2+z20 dx (2.23) 9 Figure 2.3: Using the Huygens-Fresnel principle to model electromagnetic propagation from one plane to another. To simplify this expression, we use the Fraunhofer approximation [6], which basically assumes: z20 >> (x− x′)2 (2.24) This assumption is valid when the planes are separated by a distance much greater than the size of the lens. Except to make this assumption, we will not explicitly assume a finite-sized lens, but will simply note that a finite lens produces a small blurring effect in an identical way to the finite sampling effect described above. Using the Fraunhofer approximation, we can approximate the amplitude term and phase term in different ways, since the function 1/d is much less sensitive to small changes in d than ejkd. The amplitude term can simply be approximated as: 1 4pi ((x− x′)2 + z20) ≈ 1 4piz20 (2.25) which is a constant amplitude term that is ignored in practice. For the phase term, we utilize the approximation (1 + δ)n ≈ 1 + nδ for small δ. e−jk √ (x−x′)2+z20 = e −jkz0 √ 1+ ( x−x′ z0 )2 ≈ e−jkz0 ( 1+ 1 2 ( x−x′ z0 )2) = e−jkz0 e−jk x′2 2z0 e jkx x ′ z0 e −jk x2 2z0 (2.26) Integrating both sides of (2.21) and substituting the approximations (2.25) and (2.26), we find an expression for the signal on the z = −z0 plane: s(x′) = e−jkz0 4piz20 e −jk x′2 2z0 ∫ ∞ −∞ q˜(x)e jkx x ′ z0 e −jk x2 2z0 dx (2.27) We recognize that the integral is almost a Fourier transform, excepting the 10 quadratic phase term e −jk x2 2z0 . To remove this quadratic phase we use the definition of q˜(x), Equation (2.20), to write s(x′) = e−jkz0 4piz20 e −jk x′2 2z0 ∫ ∞ −∞ q(x)e jkx x ′ z0 t(x)e −jk x2 2z0 dx (2.28) Not coincidentally, the ideal lens phase t(x), given in Equation (2.19), exactly cancels this quadratic phase term if we choose f = z0 to be the focal length of the lens, yielding: s(x′) = e−jkz0 4piz20 e −jk x′2 2z0 ∫ ∞ −∞ q(x)e jkx x ′ z0 dx (2.29) This Fourier transform cancels the Fourier transform used to create q(x), giving us back the desired function A(φ). This last statement is made con- crete by considering a new function u that is a rescaled and phase-adjusted version of s, defined such that: u(−z0x′) e −jkz0 4piz20 e −jk x′2 2z0 = s(x′) (2.30) With this substitution and the previously-defined substitution p(kx) = q(x), Equation (2.29) can be succinctly written as a simple Fourier transform re- lationship between the signal p just above the plane z = 0 and the signal u on the plane z = −z0: p←→ u (2.31) Coupled with the previous Fourier transform result, Equation (2.12), p←→ A we conclude that u = A. In other words, the signal on the image plane (ignoring constants and the small phase modulation e −jk x′2 2z0 ) is a scaled ver- sion of the desired sky image A(φ). Therefore, to sample A(φ) we simply sample u(− x z0 ) where x is measured on the z = −z0 plane. A popular way to perform this sampling is with a charge-coupled device (CCD), an array of sensors. While without the lens we sampled and performed the Fourier transform in software, here we let nature perform the Fourier transform, with the help of the lens to cancel the quadratic phase. 11 2.1.6 Generalized Sampling In the previous sections we detailed the relationship between different elec- tromagnetic imaging modalities and recognize that it is simply a matter of which domain is being sampled. A typical radio imager samples q(x) directly, so it samples A(φ) in the Fourier domain. A typical optical imager uses a lens and samples s(x′) on the image plane, essentially sampling A(φ) directly. This is known as the canonical, or Dirac, domain. To generalize our notion of sampling, we model each sample as an inner product between our image A(φ) and a “test function” h(φ): gm = 〈A, hm〉 ≡ ∫ ∞ −∞ A(φ)h∗m(φ) dφ (2.32) where gm is a sample, and the asterisk denotes complex conjugation. If we take the set of test functions to be complex exponentials, then we recognize that each measurement is a sample of the Fourier transform, which is the radio imaging case. If we take the test functions to be Dirac delta functions, we have the optical imaging case. Actually, since a photosensor cannot have infinitesimal area, but rather integrates the signal over a small range of values of x, the appropriate {hm} for CCD-based optical imaging consists of rectan- gle functions. However, more exotic test functions are certainly possible and can be beneficial, an example of which is the single-pixel camera discussed in Chapter 4. Equation (2.32) generalizes all forms of linear sampling. Digital computing requires discrete problems, so much of the remainder of this thesis will be formulated in the discrete case. In order to discretize (2.32), we consider its associated Riemann sum approximation. gm = ∞∑ i=−∞ A(φi)h ∗ m(φi)∆φ (2.33) Here, {φi} is a set of equally spaced independent variables separated by ∆φ. Since A has limited support, we only need to consider a finite number of values of i, a number we call N . gm = N−1∑ i=0 A(φi)h ∗ m(φi)∆φ (2.34) 12 Suppose we have M such measurements, each with a different test function. By stacking these M measurements into a vector, we can rewrite (2.34) in matrix-vector notation: g = H∗f (2.35) where the columns of the matrix H are the discretized test functions, and we have relabeled the desired image vector f as the discretized version of A(φ). The asterisk denotes the Hermitian operator (conjugate transpose). The number of elements in g is M , the number of measurements, and the number of elements in f is N , the desired resolution. The matrix H is known as the sampling matrix. Equation (2.35) will serve as our primary generalized forward model for the remainder of this work. 2.1.7 Extension to Variance Imaging Until now, for notational simplicity we have assumed that A(φ), the com- plex amplitude distribution, is the function we wish to estimate. However, in many applications, we do not care about the phase of A(φ), only the mag- nitude. For example, in optical imaging of a noncoherent source, the phase carries no useful information. To use a radar example, we often wish to know if a target is present in some direction and at some approximate range, but do not care whether this range is 5000 meters versus 5001. In these applica- tions, the phase is a random variable while the magnitude is deterministic. In some applications, even the magnitude is a random variable. To capture all of these cases, we say that instead of estimating A(φ), we wish to estimate f(φ) = E [|A(φ)|2] (2.36) where E[·] is the expected value operator. This is the variance of the random variable A(φ) (assuming it has zero mean). Our previous measurements {gm} are related to A(φ), not its variance, so to get information on the second moment of A, we must find the second moment in our measurement space. 13 To do so, we calculate the correlation between two measurements: E[gmg∗k] = E [(∫ ∞ −∞ A(φ)h∗m(φ) dφ )(∫ ∞ −∞ A(φ′)h∗k(φ ′) dφ′ )∗ ] = ∫∫ ∞ −∞ E[A(φ)A∗(φ′)]h∗m(φ)hk(φ′)dφdφ′ (2.37) The function E[A(φ)A∗(φ′)] is the autocorrelation of A(φ). Assume that signals coming from different directions are uncorrelated. Then the autocor- relation function is: E[A(φ)A∗(φ′)] = f(φ)δ(φ− φ′) (2.38) This assumption is justified, for example, in the case when A(φ) has a deter- ministic magnitude and a uniform random phase, ψ, that is spatially uncor- related: E[A(φ)A∗(φ′)] = |A(φ)| |A(φ′)|E [ ej(ψ1(φ)−ψ2(φ ′)) ] = |A(φ)| |A(φ′)| δ(φ− φ′) = f(φ)δ(φ− φ′) (2.39) This assumption is also justified if the magnitude is random, but is not shown here. In any case, substituting (2.38) into (2.37) yields a simplified expression: E[gmg∗k] = ∫∫ ∞ −∞ f(φ)δ(φ− φ′)h∗m(φ)hk(φ′)dφdφ′ = ∫ ∞ −∞ f(φ)h∗m(φ)hk(φ)dφ (2.40) Equation (2.40) is of the same form as (2.32): our “measurements” (actu- ally cross-correlations of measurements) are generalized linear samples of the function we wish to estimate (the variance, f(φ)), and the “test functions” are actually products of test functions. This means that Equation (2.35) is still valid as a forward model for variance imaging. For the example of radio imaging, this result is particularly simple. In this case, as discussed above, the individual test functions are complex exponen- 14 tials, so (2.40) becomes: E[gmg∗k] = ∫ ∞ −∞ f(φ)e−jkxmφejkxkφdφ = ∫ ∞ −∞ f(φ)e−jk(xk−xm)φdφ (2.41) where xm and xk are the locations of the two sensors measuring gm and gk. Here we see that the new test function is also a complex exponential, maintaining the Fourier transform relationship between what we have (cross- correlations between sensors) and what we want (the distribution of signal energy across the sky). This is a well-known result in radio interferometry, radio astronomy [7], and more recently in radar imaging [8]. Cross-correlating the signal from two spatially separated antennas (the “visibility” function) yields a sample of the Fourier transform of the angular energy distribution (the “brightness” function). It is worth pausing briefly to review the important distinction between correlation and sample correlation, since we have referred to the correlation as a measurement. Colloquially, the term “correlation” is often used to refer to the operation of taking two measurements and averaging their product over time or space. However, in the context used above, and in the strict sense, correlation is an ensemble average, a statistical concept. It is a fun- damental property of a random signal; it cannot be measured. A temporal or spatial average can be used to estimate, not measure, the correlation, and only when the signal is ergodic. Any such temporal or spatial average is necessarily finite, leading to uncertainty in the estimate. As such, in a prac- tical experiment, the left-hand side of Equation (2.40), E[gmg∗k], which is the “measurement,” contains uncertainty. One must estimate this uncertainty in order to effectively use this data. This uncertainty is based on the properties of both the random signal, A(φ), and random sensor noise. It is tempting to lump statistical uncertainty and sensor noise together as “noise,” but an understanding of the distinction is essential for proper use of one’s data. A formal treatment and useful reference for these considerations for the case of an incoherent radar experiment is given by [9]. Finally, we note that this discussion about variance imaging relates pri- marily to the radio case, since optical sensors cannot measure phase and thus directly measure power, which is related to signal variance. However, as dis- 15 cussed above, Equation (2.35) encapsulates these complexities and continues to provide a valid discrete forward model for our problem. With the forward model for spatial electromagnetic imaging developed in this section, we now turn to the latter half of our sampling and reconstruction problem. 2.2 Compressed Sensing In the case of electromagnetic imaging, and in many other applications, our measurements are governed by the linear forward model (Equation (2.35)): g = H∗f where g is a vector consisting of our measurements, f is the image vector we want, indexed by space, and H∗ is the M × N linear operator that relates the signal of interest to our data. In the special case where M ≥ N and H∗ is full rank (analogous to sampling at or above the Nyquist rate), we can reconstruct f exactly with a simple matrix inversion or least-squares solution. A more interesting and practical case is where M < N , a situation in which we refer to the problem as underdetermined. In this case, there are an infinite number of possible estimates, fˆ , that agree with the data, g, because H∗ has a nonempty null space, N (H∗). This is because, for any plausible solution fˆ , we have: g = H∗fˆ = H∗fˆ + 0 = H∗fˆ + H∗n ∀n ∈ N (H∗) = H∗(ˆf + n) ∀n ∈ N (H∗) (2.42) This means that, for any plausible solution fˆ , an equally plausible solution is fˆ + n. How we are to choose among this infinitude of candidate solutions is the subject of regularization. The basis of regularization is the contention that not all candidate solu- tions are equally plausible when we incorporate prior information. Typical regularization is to choose the solution that minimizes some cost function c(f) while prescribing to the data. Formally, the procedure is to find the f 16 in order to: minimize : c(f) subject to : g = H∗f (2.43) There are many possible choices for the cost function c(f). For example, choosing c(f) = ‖f‖2 (2.44) yields Tikhonov, or 2-norm, regularization, which finds the candidate signal with the minimum energy. Choosing c(f) to be the Shannon entropy: c(f) = − N−1∑ i=0 fi ln fi (2.45) results in a form of regularization known as the Maximum Entropy Method [10]. Recently, a compelling new form of regularization has emerged from theo- retical results in sampling and reconstruction theory known as “compressed sensing” or “compressive sampling” [11–15]. Briefly, the compressed sensing scheme is to regularize by the transform sparsity: c(f) = ‖Ψ∗f‖0 (2.46) where ‖ · ‖0 is the “0-norm,” which counts the number of nonzero elements and Ψ is an N × N orthogonal matrix, whose columns define a basis in which f is believed to be sparse (or at least compressible, as we will see). The operator Ψ∗ thus implements a transformation of f into a basis in which it is well-represented by a small number of coefficients. In the following sections we introduce the basic theory of compressed sens- ing and its extensions, and discuss the reasons compressed sensing appears to be an improvement to current regularization schemes. 2.2.1 Basic Theory We argue that compressed sensing offers a more compelling form of regular- ization in part due to surprising theoretical recovery results. Historically, the Shannon-Nyquist sampling theorem has guided engineers’ thinking in instru- ment design and signal analysis. This theorem states that a signal is exactly 17 recoverable from its discrete samples provided its bandwidth is less than half the sample rate. Recent results indicate that exact signal recovery is possible under much looser constraints (e.g., [11, 12]). Instead of requiring the signal to be bandlimited, we require the signal to be sparse, meaning it consists mostly of zeros, with a small number of nonzero elements. Furthermore, we generalize our notion of sampling to include all linear samples instead of point samples (as discussed in Section 2.1.6). If the signal f is indeed sparse, K = ‖f‖0 (2.47) where K << N , then exact recovery of f by the measurements g = H∗f is possible by solving the following problem [11]: minimize : ‖f‖0 subject to : g = H∗f (2.48) The solution to the above problem is equal to the original signal provided H∗ satisfies the restricted isometry property (RIP) [11], which roughly requires that all sets of 2K columns of H∗ are approximately orthogonal. This ensures that most of the signal energy is captured by the sensing matrix. Of course, typical signals are not directly sparse. However, most natu- ral signals are sparse when transformed into an appropriate basis. This as- sumption of transform sparsity is the basis of widely-used image compression algorithms such as JPEG, for example. To formalize this into our sensing framework, we may think of our desired signal not as f but rather as its transformation: s = Ψ∗f (2.49) where the columns of the N×N matrix Ψ define a basis in which f is sparsely representable. With this substitution, Problem (2.48) becomes: minimize : ‖s‖0 subject to : g = H∗Ψs (2.50) Once the sparse transformed signal s is recovered by Problem (2.50), then the desired signal f can be recovered through f = Ψs. Note that Problem (2.48) is a special case of Problem (2.50) by considering Ψ = I, the identity 18 matrix, or Dirac basis. Solving Problem (2.50) only succeeds if the matrix H∗Ψ satisfies the RIP. The language of the RIP becomes slightly awkward and hard to interpret in this case. Since the validity of the RIP only depends on the relationship be- tween the sensing matrix H and the sparsity matrix Ψ, we will speak instead of the “incoherence” between these two bases, a property that determines the maximum K for which the RIP is satisfied [12]. Define the coherence of two bases as: µ ≡ √ N max i,j |〈hi, ψj〉| (2.51) where 〈hi, ψj〉 is the inner product between a column of H and a column of Ψ. In this case, the RIP is satisfied and exact recovery is therefore possible if [11]: K ≤ C 1 µ2 M log4N (2.52) where C is a constant. The less coherent the measurement and sparsity bases are, the fewer measurements are needed for exact recovery. The idea of incoherence is often described in terms of an uncertainty prin- ciple. It effectively measures how spread out basis vectors of H are when expressed in the Ψ-basis, and vice versa. For example, the Fourier and Dirac bases are maximally incoherent, since a delta function in one domain has full support in the other. This is important because it guarantees that informa- tion from a sparse signal s will be spread out over all possible measurements; undersampling will not accidentally miss an important coefficient. To give a negative example, assume that we design a system in which the sensing and sparsity matrices are identical and therefore maximally coherent. Perhaps our signal is sparse in the Fourier domain, meaning it can be well-represented by a few sinusoids (but we do not know which ones), and our sensor measures a few of these sinusoids. In this case, there is no hope of exact recovery since we would have to be extraordinarily lucky to sample the important sinusoids. Most (if not all) of our measurements would be zero. Research has revealed several rules of thumb for incoherence [16]. For example, most wavelet bases are reasonably incoherent with both the Dirac and Fourier bases (indicating the non-transitive nature of incoherence). Ran- domly selected basis vectors from noiselet domains are incoherent with wavelets. Finally, even completely random Bernoulli ±1 or Gaussian matrices are, with 19 overwhelming probability, incoherent with almost all sparsity bases of prac- tical interest. Further intuition can be gained on the motivation for using compressed sensing as a regularizer by considering a modern digital camera. These de- vices make millions of measurements with a CCD, and then the signal is fed to an on-board image compression chip. This chip performs a basis transfor- mation (a Discrete Cosine Transform for JPEG, or a wavelet transform for JPEG-2000), and then keeps only the most important coefficients, yielding a compressed version of the image that is often perceptually lossless. Most of the data obtained in the measurement step are thus immediately discarded. Compressed sensing offers a framework to remove this inefficiency, collecting a form of the signal that is already compressed These preliminary compressed sensing results show that if we have a signal that is sparse in a known basis, and a sensing basis that is reasonably inco- herent with the sparsity basis, we can get away with a severely undersampled problem. By solving Problem (2.50), we can recover the sparse signal with many fewer measurements than previously thought possible. This raises the question of how to solve Problem (2.50). The problem of `0-minimization is an NP Hard problem, which means that global optimization would re- quire an exhaustive search of exponential complexity. Fortunately, as we will see in the next section, Problem (2.50) has the same global minimum as `1-minimization, provided the RIP is satisfied. 2.2.2 Leveraging `1-minimization Perhaps one of the greatest breakthroughs in compressed sensing theory is the proof that `1-minimization is a “sparsity-inducing” regularizer. This has allowed compressed sensing to be elevated from a mere mathematical curiosity to a powerful practical framework. Optimization theory is rich with `1-minimization problems, which are solvable with a variety of techniques such as linear programming. These results show that the solution to Problem (2.50) is also the solution to the following problem: minimize : ‖s‖1 subject to : g = H∗Ψs (2.53) 20 where the `1 norm is the sum-of-magnitudes: ‖s‖1 = N−1∑ i=0 |si| (2.54) This equivalence is by no means obvious and is a deep mathematical result. In this work, we will simply provide a geometrical example to build intuition [15]. Consider the case where M = 1 and N = 2, and for simplicity, Ψ = I, meaning our signal is directly sparse. Thus, we have one measurement of the form: g0 = h ∗ 0f0 + h ∗ 1f1 (2.55) and two unknowns f0 and f1. The act of recovering the signal, f , is identical to choosing a point in RN , which is R2 in our current example. The data constrain us to only consider points on the line described by (2.55). Any point on this line agrees with the data, so regularization decides which point on this line we choose. Tikhonov regularization, as discussed above, chooses the point with mini- mum `2 norm. Geometrically, as shown in Figure 2.4, this can be thought of as starting with a small sphere (the sphere being the “`2 ball”, or the locus of points with a given `2 norm), and expanding it until it is tangent to the line. The tangent point is the minimum `2-norm solution. Sparse solutions lie along axes, so the minimum `2-norm solution is almost never sparse. Figure 2.4: Regularizing with `2 minimization (Tikhonov regularization) does not produce sparse solutions. 21 On the other hand, consider `1-norm regularization. Here, we start with a small “`1 ball”, and expand it until it touches the line, as shown in Figure 2.5. In R2, the `1 ball, the locus of all points with a given `1 norm, is a diamond. Here we can see that `1-minimization offers a sparse solution with high probability. Of course, low-dimensional intuition can only take us so far. In higher dimensions, the `1 ball becomes even more pointy, increasing the probability that the correct solution is recovered. Figure 2.5: Regularizing with the `1 norm induces sparsity. We have seen that `1-minimization is a substitute for `0-minimization, allowing for practical recovery of sparse signals. However, before compressed sensing can become truly practical, we must ensure that the problem is stable. 2.2.3 Statistical Estimation The preceding discussions have all made two unrealistic assumptions: the data are noiseless, and the signal is actually sparse. In practice, noise is om- nipresent, and signals, while possibly compressible, are never exactly sparse. We first address the noise concern. Although our immediate intuition is that such a nonlinear problem would be extremely sensitive to noise, and that the surprising recovery results described above must only be true in the noiseless case, the truth is that this recovery scheme is indeed stable. However, one adjustment must be made. Instead of adhering exactly to the noisy data, we allow the algorithm to deviate from the data in search of 22 sparse solutions, provided that the deviation is within some estimate of the noise power, 2: minimize : ‖s‖1 subject to : ‖g −H∗Ψs‖22 ≤ 2 (2.56) If this recovery scheme is used, then it can be proved that under some con- ditions related to the RIP, the worst case reconstruction error is bounded by a linear function of the noise level [13]: ‖sˆ− s‖2 ≤ C1 (2.57) where sˆ is the estimation of s and C1 is a constant that in image processing applications is empirically around 10 [13]. Thus, sparse noisy recovery is stable. The second concern is how this recovery scheme responds to non-sparse signals. Although many natural signals are compressible, meaning that they can be well-approximated by a small number of coefficients in some basis, they are not exactly sparse. Their lower-order coefficients are small but nonzero. Compression is necessarily lossy. Can we still use compressed sens- ing algorithms to reconstruct signals even if the signals are not sparse? The answer is a qualified yes. In these cases, we cannot hope to recover the true signal s, but rather a compressed version of the signal, sK , which has all but the largest K coefficients set to zero. By solving Problem (2.56) in this case, we can be guaranteed [13]: ‖sˆ− s‖2 ≤ C1+ C2‖s− sK‖1√ K (2.58) Here we see that the reconstruction error is bounded by two error terms. The first is a function of the noise, as we have seen. The second is the inevitable error associated with approximating s as sparse. This term guarantees that the recovered solution is close to a compressed version of the actual signal. Again, C2 is a constant that is empirically well-behaved [13]. Thus, our problem is indeed stable even when the data are noisy and the signal is not sparse, and since the solution is unique, the problem is also well-posed. 23 2.2.4 Compressed Sensing In Practice In practice, inversions using compressed sensing depend heavily on the cho- sen sparsity basis, Ψ. While a general-purpose basis may be discovered in the future, it does not appear that one basis will work for all problems. Nevertheless, certain bases have been found to be practical for many prob- lems. For example, smoothly varying signals are compressible in a Fourier basis, piecewise constant signals are compressible with the Haar wavelets, and piecewise smooth signals are compressible with any number of alternate wavelet bases. Perhaps one of the most appealing properties of regularizing with compressed sensing is just this choice: allowing the user to choose the sparsity basis in order to tune the algorithm to certain signals of interest, or to bring out certain features in an image. Algorithms to solve Problem (2.56) abound. This problem is known as Basis Pursuit Denoise, and algorithms for its solution are still emerging. Some are classical greedy algorithms, for example Matching Pursuit, Orthog- onal Matching Pursuit (OMP), and Compressive Sampling Matching Pursuit (CoSaMP) [17]. These bear a striking resemblance to many ad hoc deconvo- lution methods such as the CLEAN algorithm [18]. Others solve smoothed versions of the `1 norm, for example Iterative Reweighted Least Squares [19]. Many others are completely novel approaches to the inversion problem; a few that we have found useful are SPGL1 [20], SpaRSA [21], and NESTA [22]. A major goal of current research in this direction is to find efficient methods for large scale problems. A recent trend in compressed sensing is to use “overcomplete dictionaries” instead of an orthogonal sparsity basis. Essentially, this replaces the N ×N matrix Ψ with a wider cN × N matrix, where c is an integer greater than one. This applies for signals which are sparse when expressed not in a single basis but as a combination of vectors from different bases. For example, some astronomical images consist of a smoothly varying background punctuated by stars, so the appropriate Ψ might be the concatenation of Fourier and Dirac bases. Another popular procedure for recovery of two-dimensional signals is to assume the signal is sparse in its Total Variation (TV) [12]: TV (f) = ∑ i |∇fi| (2.59) 24 where ∇fi is an approximation to the gradient of the underlying function f at location i. This is not strictly a basis but has found empirical success for signals that are piecewise constant (e.g., magnetic resonance imaging of the brain [23]). The recent practical results and rich theoretical underpinning of com- pressed sensing make it a compelling choice for regularizing electromagnetic imaging problems. For the design of electromagnetic imagers, it is even more compelling. While current imaging architectures employ a “sample first, ask questions later” [24] approach to signal acquisition, compressed sensing is a more holistic approach to sampling. Furthermore, if a suitable measurement basis such as noiselets are chosen, the samples are universal (applying to al- most any type of signal and sparsity basis), and future-proof (can be stored and decoded at a later date when processing algorithms have improved). Of course, the true value of compressed sensing for electromagnetic imaging must be judged not on theoretical grounds, but rather on the basis of sim- ulation of real imaging scenarios, and application to real data. This will be the focus of the next two chapters. 25 CHAPTER 3 RADAR IMAGING WITH COMPRESSED SENSING A conventional radar uses either a single antenna or multiple antennas con- nected together to simulate a large antenna. The radar can access range information by range-gating the signal returns, but it cannot access angular information in this simple configuration. Signals from different directions are summed together, weighted by the antenna beam pattern. Imaging requires access to this angular information, as discussed in Chapter 2. The only way to image in this case is to steer the antenna, either electronically or mechan- ically, or to keep the antenna stationary and allow the target to drift across the beam. In both cases, the spatial resolution is limited by the antenna beam width. Also, beam steering creates an ambiguity between temporal and spatial variation, as the true image may change as it is being observed. A better solution is to use multiple antennas without connecting them together in a phased array, but rather recording the signals separately on each antenna. In this way, we obtain samples of the desired angular distribution, f(φ), in the Fourier domain. As we saw, this leads to an underdetermined problem in most practical cases. The optimal way to infer f(φ) from these incomplete measurements is a topic of debate in the field of radar imaging. In this chapter we show how compressed sensing can be used to invert radar imaging data. We present simulation results in which compressed sensing outperforms existing methods and show some examples of its application to data collected at the Jicamarca Radio Observatory. 3.1 Radar Imaging Background Before discussing the application of compressed sensing to invert radar imag- ing data, we first describe the history of ionospheric radar imaging and detail popular inversion methods. 26 3.1.1 History Ionospheric radar imaging has its roots in early two-antenna interferometry experiments at the Jicamarca Radio Observatory in the 1970s and 80s [25,26]. These experiments were the first to make measurements from two antennas separately, and correlate the results to obtain angular information. As we know now, they were just measuring a single Fourier component of f(φ), but in these works they assumed a Gaussian shape for f(φ) and solved for the center location (which is related to the relative phase between the antennas), and the angular spread of the scattering region (known as the “coherence”). By monitoring the center location over time, they were able to deduce a drift velocity. This new method proved to be a rich source of information to study the horizontal dynamics of ionospheric irregularities. While it is impressive what these researchers were able to achieve with one frequency component, it is clear that using more antennas would provide more angular information. However, it was not until 1991 that this tech- nique was generalized, and the Fourier transform relationship between the measurements and the desired image was discovered in the context of radar imaging [27]. Interestingly, radio astronomers had known this result since at least the 1950s, and the formation of images in radio astronomy is extremely closely related to imaging with radar [7]. 3.1.2 Basic Fourier Inversion Although the 1991 work was a breakthrough in radar imaging, the method with which the data were inverted was naive, so the resolution of the resulting images was suboptimal. This method was based on the fact that the true image, f(φ), and the measurements, samples of g(k∆x), are related with a Fourier transform, which we may write as g(k∆x) = ∫ ∞ −∞ f(φ)ejk∆xφ dφ (3.1) where k is the wavenumber and g(k∆x) is obtained by cross-correlating the signals from two antennas separated by a distance ∆x (see Section 2.1.7). This method then claimed that the best way to recover f(φ) is to attempt 27 to invert the Fourier transform as such: f(φ) = ∫ ∞ −∞ g(k∆x)e−jk∆xφ dk∆x (3.2) Evaluating (3.2) requires measuring g(k∆x) for all possible antenna sep- arations. Since this is not practical, this early method attempts to mimic (3.2) using only the available measurements: fˆ(φ) = M−1∑ m=0 g(k∆xm)e −jk∆xφ (3.3) where fˆ is the estimate of f , ignoring constant multipliers. Note that this method implicitly assumes that the non-measured Fourier components are zero, a rather arbitrary choice not made in more advanced methods. A closer look at the ramifications of this assumption can provide some intuition on the resolution limits of this method. Assuming that the non-measured values are zero is tantamount to as- suming that the visibility function, g(k∆x), is well-approximated by delta functions at its measured points: gˆ(k∆x) = g(k∆x) M−1∑ m=0 δ(k∆x− k∆xm) ≡ g(k∆x) d(k∆x) (3.4) where we define the function d as the delta function mask. Taking the Fourier transform of both sides and applying (3.3), we find: fˆ = f ∗D (3.5) where D is the Fourier transform of d, and the asterisk denotes convolution. In other words, the estimate of the image is the true image blurred by the function D. Evaluating D, we find: D(φ) = M−1∑ m=0 ejk∆xmφ (3.6) Instead of indexing by the antenna separations, we will instead index by 28 the antenna locations directly, resulting in a double sum: D(φ) = ∑ i ∑ j ejk(xi−xj)φ = (∑ i ejkxiφ )(∑ i ejkxiφ )∗ (3.7) which we recognize as the squared magnitude of the receiving array beam pattern. The true image is convolved with the beam pattern, resulting in a resolution limit on the order of the beam width and ringing due to side- lobes. The sidelobes can be reduced by tapering the amplitudes elements in the array. For this reason, this method is often referred to in the literature as “spectral windowing,” but, as we show above, it is no different than cre- ating an image by sweeping the beam across the scene with a conventional single-receiver radar. Using this inversion method, the only advantage over a conventional radar is to separate temporal and spatial variation, but we do not overcome the spatial resolution limitations. To fully realize the advantages of radar imaging, we must use more ad- vanced techniques for inverting our data. We will describe in detail the two leading methods, Capon’s method of imaging and the Maximum Entropy method. 3.1.3 Capon’s Method Capon’s method of imaging is basically a minimum-variance beamformer. The original method was developed in the late 1960s to analyze seismic mea- surements [28], and it was applied to radar imaging in 1998 [29]. It can be viewed as a generalization of the beam steering approach. Instead of choos- ing a constant beam pattern in the form of a linear phased array, Capon’s method optimizes the antenna weights in order to minimize interference, and it chooses different weights for different look directions. Following the notation from Chapter 2, we denote the phasor measured at antenna i as q(xi). Stacking the measurements from all antennas yields the vector q. The unknown antenna weights are denoted by w. The output of the array is then y = w∗q (3.8) 29 where w∗ denotes the conjugate transpose of w. We want y to be an optimal estimate of A(φ), the complex amplitude of the signal coming from the angle φ. One option is to choose the weights in the form of a linear phased array: wi = e −jkxiφ (3.9) However, this is no different than the basic Fourier inversion described above. Signals from other directions will leak into the sidelobes of the beam pat- tern. We may think of the resulting output as being the sum of the desired signal with interference. To minimize the interference, we minimize the total output power while constraining the beam pattern to unity in the desired direction, φ. This constraint ensures that the desired signal, A(φ), passes to the output unmodified, while minimizing the interference power. Formally, Capon’s method is: minimize: E [|y|2] subject to: e*w = 1 (3.10) where the vector e is: e = [ ejkx0φ ejkx1φ . . . ejkxI−1φ ]T (3.11) and I is the number of antennas. Minimization under constraints is identical to unconstrained minimization of the following function with respect to the weight vector, w, and the La- grange multiplier, Λ: L(w,Λ) = E[|y|2]+ Λ(e*w − 1) = E[yy∗] + Λ(e*w − 1) = E [ w*qq*w ] + Λ(e*w − 1) = w*E [ qq* ] w + Λ(e*w − 1) = w*Rw + Λ(e*w − 1) (3.12) where we have defined the correlation matrix, R = E [ qq* ] , and q∗ denotes the conjugate transpose. Setting the partial derivatives of L equal to zero 30 results in a closed form for the weight vector: w = R−1e e∗R−1e (3.13) Recall from Section 2.1.7 that we are interested in imaging the variance of A(φ), so our estimate is: fˆ(φ) = E [|y|2] = w∗Rw = 1 e∗R−1e (3.14) This provides a simple formula for estimating the power from a given direc- tion based on the cross-correlations (which are the effective measurements, as described in Section 2.1.7). This procedure is repeated for many values of φ in order to generate an estimate of the function f(φ). Of all estimates of f(φ) that are linear functions of the received signals, Capon’s method pro- vides the minimum mean-square estimate. Another advantage of Capon’s method is its relatively low computational complexity. For each pixel in the recovered image, we must only solve a linear system. Although Capon’s method provides the optimal linear estimate, there is no reason we must re- strict ourselves to linear processing schemes. Although nonlinear processing is generally more computationally complex, it allows access to more inversion methods with more theoretical and empirical justification. Both Maximum Entropy and compressed sensing use nonlinear inversions. 3.1.4 Maximum Entropy The Maximum Entropy method (MaxEnt) is an attempt to adhere to the “first principle of data reduction” by Ables: “The result of any transforma- tion imposed on the experimental data shall incorporate and be consistent with all relevant data and be maximally non-committal with regard to un- available data.” [30]. Many argue that the only way to conform to this principle and avoid making unfounded assumptions is to regularize by the entropy functional (e.g., [31–33]). In other words, of all the infinite candi- date solutions that agree with the data, we should pick the solution with the 31 maximum entropy. The entropy is defined as: S = − N=1∑ i=0 fi log fi F (3.15) where F = N−1∑ i=0 fi (3.16) and fi denotes the i th element of f , the discretized version of f(φ). MaxEnt has a number of desirable properties. First, if we understand en- tropy as information content, then MaxEnt recovers an estimate with only as much information as allowed by the experimental data. To choose a solu- tion with a higher entropy is to imply information not contained in the data. Second, entropy is a smoothness metric, in a certain sense. Images that are closer to the flat, uniform image have a higher entropy. Maximizing the en- tropy is equivalent to minimizing the relative entropy (or Kullback-Leibler divergence) between f and the uniform image. Of course, the entropy metric does not explicitly enforce spatial smoothness, since (3.15) is invariant un- der a relabeling of the indices. Finally, regularizing with entropy implicitly imposes a non-negativity constraint on f , since negative images have an un- defined entropy. In many practical applications, the desired image is known to be nonnegative. In the case of radar imaging, this is because the image represents the variance, which is related to scattered power. Critics of MaxEnt claim that since there is nothing resembling “informa- tion” in the radio interferometry problem, the information-theoretical argu- ments that justify MaxEnt are not valid. They claim the practical success of MaxEnt arises because it keeps solutions close to the uniform image and enforces non-negativity, not because it has roots in information theory [34]. The apparent success of other similar regularization functions seems to sup- port this claim [35]. Although the entropy expression seems somewhat arbitrary at first glance, it has an interpretation as a prior distribution. Many regularization schemes can be understood as Maximum a Posteriori (MAP) estimators, and Max- Ent is no exception, although this fact is rarely discussed in the literature. Imagine the proverbial monkey with a container of F radio photons, and an initially empty pixelated image, f . The monkey takes a photon from the 32 bucket, chooses a pixel at random with uniform probability, and places the photon in this pixel. The monkey repeats this process until all photons have been distributed. Of all the possible images that agree with the data, we will choose the image that the monkey is most likely to create. The a pri- ori probability of creating an image, f , is proportional to the multiplicity of f , the number of ways the monkey can create f . By a simple counting argument, the multiplicity is given by the multinomial coefficient: m(f) = F !∏N−1 i=0 fi! (3.17) We seek to maximize the prior probability, which we can do by maximizing the multiplicity. To make the computation easier, we will maximize the logarithm of the multiplicity, which leads to the same solution since the logarithm operator is convex. As we show below, the log-multiplicity is equal to the entropy. logm(f) = logF !− N−1∑ i=0 log fi! ≈ F logF − F − N−1∑ i=0 (fi log fi − fi) = F logF − N−1∑ i=0 fi log fi − F + N−1∑ i=0 fi = F logF − N−1∑ i=0 fi log fi = N−1∑ i=0 fi logF − N−1∑ i=0 fi log fi = − N−1∑ i=0 fi log fi F We recognize the last line as the entropy. The second line used Stirling’s approximation: log n! ≈ n log n− n (3.18) which is valid for large n. Note that this approximation is valid in our case, since the quantum nature of values of f are only a mathematical convenience. 33 We have the freedom to define the quanta size to be arbitrarily small, yielding arbitrarily large values of fi. This effectively generalizes this procedure to our continuous case. The MaxEnt procedure is often formulated as follows, in order to account for noisy measurements: maximize: − N−1∑ i=0 fi log fi F subject to: ‖g −H∗f‖22 ≤ 2 F = N−1∑ i=0 fi (3.19) where H∗ is the M ×N linear operator implementing the forward model, g is the vector of measurements, and 2 is an estimate of the noise power (see Chapter 2). Of all images that could be described by the data corrupted by noise of power 2, we choose the one with the maximum entropy. Instead of performing the constrained minimization in (3.19) directly, we can convert it to an unconstrained minimization of a function of Lagrange multipliers. The details are not shown here but can be found in the literature (e.g., [36, 37]). This results in the following set of equations: fi = Fe ∑ j h ∗ ijλj∑ i e ∑ j h ∗ ijλj ∀ i ∈ [0, N − 1] λj = −2Λ ( gj − ∑ i h∗ijfi ) ∀ j ∈ [0,M − 1] Λ2 = ∑ j λ2j 42 (3.20) These M + N + 1 coupled equations can be reduced to M equations in the Lagrange multipliers {λj}. Performing a MaxEnt inversion is a matter of solving these M nonlinear equations. In the noiseless case, these equations have a unique solution, but we were unable to find an analogous result for finite 2. Empirically, applying Newton’s method to solve these equations is found to yield a unique solution and converges for a wide range of initial guesses [35, 37]. MaxEnt inversions attain a higher resolution than basic Fourier inversions 34 with fewer sidelobe artifacts. An example of this can be seen in Figure 3.1, which shows actual data from observations of equatorial spread-F at the Jicamarca Radio Observatory. In the top pane, the width of main beam causes smearing artifacts in the reconstruction. In the bottom pane, we see that MaxEnt is able to resolve smaller-scale features. For example, a bifurcation is evident at 320 km, and vertically elongated structures on the bottomside layer between 250 and 275 km are seen to drift eastward and shear [35]. Subsequent work on this method has shown that further resolution improvements can be attained by accounting for correlated errors; since each measurement uses information from pairs of antennas, we should not expect our errors to be uncorrelated [36]. Figure 3.1: A comparison of the basic Fourier inversion (top) with MaxEnt (bottom) using observations of equatorial spread-F at the Jicamarca Radio Observatory. After [35]. Reproduced with permission of the American Geophysical Union. An initial comparison of the Fourier, Capon, and MaxEnt inversions us- ing a few representative image models and antenna layouts found that both Capon’s method and MaxEnt are superior to the Fourier inversion. The results of Capon’s method and MaxEnt were comparable, but MaxEnt was slightly superior, especially at low SNR [38]. A subsequent study reached similar conclusions [39]. As discussed in Chapter 2, radar imaging seems an ideal application for compressed sensing, yet to our knowledge there is no formal treatment in the literature. In the following section, we seek to bridge this gap by comparing compressed sensing to current methods in the context 35 of radar imaging. 3.2 Compressed Sensing Results Radar imaging is usually an underdetermined problem. Each of the meth- ods described above deals with this fact in different ways. The philosophy of compressed sensing is to recover the most parsimonious image consistent with the data, where parsimony is defined by a preference for a low-bit rep- resentation. In other words, we reconstruct a sparse image. It is our task to choose a domain in which the image will be sparse. This choice allows us to tailor the inversion to particular classes of signals or to bring out certain image features, a tool unavailable in the above methods. However, even with a simple sparsity domain such as wavelets, compressed sensing inversions can outperform existing methods, as we show in the following sections. 3.2.1 Simulation Details We simulate a simple radar imaging experiment to test the compressed sens- ing method against Capon’s method and MaxEnt. We used the antenna locations from the imaging configuration of the Jicamarca Radio Observa- tory near Lima, Peru. These are shown in Figure 3.2. For this experiment, we assume that ionospheric structure does not depend on magnetic latitude, and therefore we can reconstruct an image purely as a function of range and magnetic longitude. This assumes that scatterers are constant along mag- netic field lines. This is a typical assumption made in ionospheric imaging that is theoretically valid for large, km-scale features, which tend to be elon- gated along the Earth’s magnetic field [40]. For each range gate, we seek to recover the angular distribution in the magnetic east-west direction, f(φ). In this work we simulate this one-dimensional imaging problem. For the antenna locations, the relevant coordinate is the magnetic east coordinate. In keeping with previous simulation studies, we start with a Gaussian as our true image, f(φ). We note that a real ionospheric image will not be as simple as a single Gaussian, nor is it the case that the results of this sim- ulation generalize using superposition, since all of these techniques (except the Fourier inversion) are nonlinear in the sense that they adapt to the data. 36 −50 −40 −30 −20 −10 0 10 20 30 −60 −50 −40 −30 −20 −10 0 10 20 30 Wavelengths W av el en gt hs N Figure 3.2: The antenna array at the Jicamarca Radio Observatory. The locations of the receiving antenna modules used in the simulation are shown in gray. Each module comprises 36 half-wave dipoles. The arrow indicates geomagnetic north, which is within 1◦ of geographic north. Thus, we use a Gaussian as a proxy, and interpret our results conservatively. We use a Gaussian centered at φ = −3◦, with a width of 2◦. We have tried other image models, such as multiple Gaussians, rectangles, and triangle functions, and we obtain qualitatively similar results. The measurements, g(k∆x), are Fourier coefficients of f(φ) specified by the antenna separations, as indicated by Equation (3.1). We add error to these measurements and reconstruct with all four methods. Realistically, this error arises from the finite time intervals used to estimate the signal correlations and is a function of the probability distribution of the signal and receiver noise (see Section 2.1.7). In the present simulation, we model this error as uncorrelated Gaussian noise for simplicity. This simplifying assumption was used in a previous simulation study [39]. A more detailed forward model [38] and a first-order error model for the inversion [36] can be found in the literature. We define the signal-to-noise ratio (SNR) as: SNR ≡ ‖gexact‖2‖g − gexact‖2 (3.21) where gexact is the exact measurement vector, comprising correlations without 37 error, and g includes the introduced error. We note that this definition of SNR does not correspond to the actual SNR at the radar receiver front end. Our definition includes the effect of signal variance and finite integration times. In any case, in this study we are not concerned with the absolute value of SNR, only with how the comparisons change as we change the uncertainty. We reconstruct an image, fˆ(φ), onto the domain φ ∈ [−9◦, 9◦], which is large enough to include virtually all of the energy in f(φ). We reconstruct onto a grid with a coarser resolution than was used to simulate the measure- ments. The results of our inversions do not change qualitatively when we change the grid size. Both MaxEnt and compressed sensing require building a discrete forward linear operator as a matrix H∗, such that our measure- ments are modeled linearly as g = H∗f (3.22) as described in Section 2.1.6. In order to reconstruct a real image, we must be careful with how we build H∗, since the cross-correlations are in general complex. One option is to use the fact that the Fourier transform of a real function is even. This constraint creates two effective measurements per correlation, one each for the positive and negative spatial frequencies. This is identical to treating the antenna pair (i, j) as separate from the pair (j, i). However, we use a different but effectively identical option. We operate entirely with real numbers, treating the real and imaginary parts of g(k∆x) as two separate measurements. This way, the matrix H∗ is purely real and consists of sines and cosines instead of complex exponentials. MaxEnt and compressed sensing both have a free parameter, , the esti- mate of the magnitude of the error. In this simulation, we assume this value is known exactly. In a real scenario, one can sample range and Doppler bins without signal to obtain an estimate of the receiver noise power and use this, along with the known integration time, to obtain an estimate of the error magnitude. Of course, this estimate of error magnitude is subject to error, but this is a complication we do not consider. Obtaining solutions for the Fourier and Capon’s method is straightforward. To obtain the MaxEnt solution, we solve theM nonlinear equations generated by (3.20) using MATLAB’s fsolve function. We use a small, constant initial guess of λj = 10 −5, but we find that the solution is not sensitive to the initial 38 guess. As discussed in Section 2.2.3, the problem to solve for compressed sensing is (2.56): minimize : ‖s‖1 subject to : ‖g −H∗Ψs‖22 ≤ 2 (3.23) where Ψ is the sparsity basis, and the image is subsequently recovered with fˆ = Ψs. This problem is often referred to as Basis Pursuit Denoise. To solve this problem, we use SPGL1 [20,41]. For the sparsity basis, Ψ, we use the Daubechies-4 wavelets. The moti- vation for using wavelets comes from the widespread success of the JPEG compression standard. The original JPEG uses a block-version of the Dis- crete Cosine Transform, which can be seen as an approximation to wavelets, and JPEG-2000 uses wavelets directly. Their success supports the claim that many natural images are inherently sparse; they have structure that is lo- calized both in space and spatial frequency, and are thus compressible in the wavelet domain. Another attractive feature of wavelet transforms is that extremely fast implementations have been developed, of complexity O(N), which is even faster than a Fast Fourier Transform. To perform wavelet transforms we utilize MATLAB’s Wavelet Toolbox. The Daubechies-4 wavelets in particular were chosen because they were found useful in image processing applications [42]. However, this choice is somewhat arbitrary, as there exist many varieties of wavelets. The Daubechies- N wavelets are characterized by a maximum number of vanishing moments, meaning that they are optimal for polynomials up to order N. The higher N, the more vanishing moments, but the wider the support of the wavelet atom. Other wavelets have different design criteria and will give quantita- tively different results, though the broad morphological features remain the same. 3.2.2 Simulation Results In this section we show typical inversion results for high-, mid-, and low-SNR cases, and quantitatively compare the four inversion methods. A represen- tative mid-SNR inversion is shown in Figure 3.3. The SNR is 15 dB. The forward model is calculated with 16,384 grid points, and the reconstruction grid is 256 points. Since the Fourier and Capon’s methods do not return 39 meaningful absolute values, we have multiplied these reconstructions in or- der to make their maximum value equal the maximum value of the true im- age. The MaxEnt and compressed sensing inversions have not been modified. Qualitatively, it appears as if compressed sensing provides a reconstruction that is closer to the truth and has fewer artifacts. −10 −5 0 5 10 0.5 1 1.5 2 2.5 3 x 10−4 Po w er (ar bit rar y u nit s) Fourier Truth −10 −5 0 5 10 0.5 1 1.5 2 2.5 3 x 10−4 Po w er (ar bit rar y u nit s) Capon’s Truth −10 −5 0 5 10 0.5 1 1.5 2 2.5 3 x 10−4 Po w er (ar bit rar y u nit s) Degrees MaxEnt Truth −10 −5 0 5 10 0.5 1 1.5 2 2.5 3 x 10−4 Degrees Po w er (ar bit rar y u nit s) Comp. Sens. Truth Figure 3.3: Inversions of simulated data using four methods with SNR = 15 dB. The black dotted line is the true image. The red line is the recovered image, using the basic Fourier inversion (top left), Capon’s method (top right), MaxEnt (bottom left), and compressed sensing (bottom right). At very high SNR, MaxEnt and compressed sensing both appear to con- verge to the true image. A representative inversion for a 25 dB SNR is shown in Figure 3.4. Capon’s method does not converge to the true image, since side lobes in the beam pattern are still present, even though they have been arranged to minimize interference, and the main beam has nonzero width. In the Fourier method, we see a rather severe artifact near the right edge of the recovered image, possibly due to a large sidelobe or grating lobe. At very low SNR, the signal is buried so deep in the noise that recovery is practically impossible with any method. With the definition of SNR used here, this occurs for SNR below about 5 dB. A representative inversion of data with a 5 dB SNR is shown in Figure 3.5. We can see that each method deals with this high-noise situation in different ways. With the Fourier inver- sion, white noise in the frequency domain looks like white noise in the space domain. Capon returns spikes at locations where it was unable to cancel the noise. Since the constraints imposed by the data are so loose in low-SNR cases, MaxEnt provides an image that is close to the globally maximum en- tropy image, which is constant. Compressed sensing provides the sparsest 40 −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) Degrees −10 −5 0 5 10 0 1 2 3 x 10−4 Degrees Po w er (ar bit rar y u nit s) Fourier Truth Capon’s Truth MaxEnt Truth Comp. Sens. Truth Figure 3.4: Inversions of simulated data using four methods with SNR = 25 dB. The black dotted line is the true image. The red line is the recovered image, using the basic Fourier inversion (top left), Capon’s method (top right), MaxEnt (bottom left), and compressed sensing (bottom right). image consistent with the data, which consists of one wavelet in this case, but is sometimes as low as zero if the data do not support even a single wavelet. −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) Fourier Truth −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) Capon’s Truth −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) Degrees MaxEnt Truth −10 −5 0 5 10 0 1 2 3 x 10−4 Degrees Po w er (ar bit rar y u nit s) Comp. Sens. Truth Figure 3.5: Inversions of simulated data using four methods with SNR = 5 dB. The black dotted line is the true image. The red line is the recovered image, using the basic Fourier inversion (top left), Capon’s method (top right), MaxEnt (bottom left), and compressed sensing (bottom right). In order to quantitatively compare these four methods, we use a correlation metric as a proxy for image quality, following previous work [38]. We define the correlation as: corr ≡ fˆc T fc ‖fˆc‖2‖fc‖2 (3.24) where the subscript c denotes removal of the DC value. Removing the DC value is important so that this metric evaluates image structure, without in- 41 cluding the effect of a constant offset. In other words, a flat image should have a correlation of 0 with our Gaussian, and a perfect reconstruction should have a correlation of 1. The higher the correlation, the closer the reconstructed image matches the true image. We run simulations for different values of SNR between 0 dB and 25 dB and measure the correlation for each of the four methods. For each SNR, we run 100 trials with randomized error realizations in order to minimize statistical fluctuations. The average correlation is calculated for each SNR and plotted in Figure 3.6. Compressed sensing outperforms all other methods for virtually all SNR, except for a small dip at 6 dB. Surprisingly, Capon’s method is inferior for most SNR. Other image metrics than correlation could be used, such as mean-square error or other modified correlative measures. While quantitatively different, results from using some of these other methods are found to be qualitatively similar. These results match qualitatively the results in previous simulation studies, although other studies have shown better performance for Capon’s method [38,39]. 0 5 10 15 20 25 0 0.2 0.4 0.6 0.8 1 SNR Co rre la tio n wi th T ru th Comp. Sens MaxEnt Fourier Capon Figure 3.6: A statistical comparison of the four methods of inverting simulated radar imaging data. The average correlation is reported for each SNR for each method. In comparing these four methods, one must also consider the tradeoff be- tween the quality of the reconstruction and the running time. In our recon- structions, we find that Capon’s method takes the same amount of time as the Fourier method. MaxEnt requires the most running time, at about 100 times that of the Fourier method. Compressed sensing inversions take about 30 times longer than Fourier inversions. We note that these values are not general. Since iterative methods are used in MaxEnt and compressed sens- 42 ing, running times can vary with SNR, image model, or even the particular error realization. Furthermore, if the number of grid points or number of measurements changes, these comparisons could change significantly. Before moving on to real data, we must supply a word of caution. Nonlinear processing schemes can achieve higher resolution, but they can also produce spurious reconstructions. With linear reconstructions such as the Fourier method, the output is a linear function of the signal and noise. Thus, as the noise increases, the reconstruction degrades smoothly. This is not the case for nonlinear imaging, which often exhibits a “thresholding” behavior. In partic- ular for compressed sensing, a small change in the noise level may lead to one more or fewer atom in the reconstruction, which can significantly change the interpretation of the image. Consider as an example the inversion in Figure 3.7, which uses 5 dB SNR. The compressed sensing inversion consists of a sin- gle wavelet in the wrong location. The Fourier inversion is noisy everywhere. Even though the MaxEnt inversion is nonlinear, the entropy constraint seems to inhibit this nonlinear thresholding effect. We see this pathology in about 5% of the low-SNR compressed sensing inversions. However, this effect is often not practically important, since low-SNR reconstructions are not given much weight in the formation of the full two-dimensional image. −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) Fourier Truth −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) Capon’s Truth −10 −5 0 5 10 0 1 2 3 x 10−4 Po w er (ar bit rar y u nit s) Degrees MaxEnt Truth −10 −5 0 5 10 0 1 2 3 x 10−4 Degrees Po w er (ar bit rar y u nit s) Comp. Sens. Truth Figure 3.7: An example of the problem with nonlinear reconstructions, specifically compressed sensing. It can suggest signal structure where there is none. The SNR is 5 dB. The black dotted line is the true image. The red line is the recovered image, using the basic Fourier inversion (top left), Capon’s method (top right), MaxEnt (bottom left), and compressed sensing (bottom right). 43 3.2.3 Jicamarca Radio Observatory Data To further evaluate the performance of these four methods of inverting radar imaging data, we apply them to data from the Jicamarca Radio Observatory (JRO) near Lima, Peru (geographic: 11.95◦S, 76.87◦W). The magnetic dip angle at JRO is approximately 1◦. The data were acquired at 22:18:30 LT on day 277 of 2011. An interval of 12 seconds was used to estimate the signal correlations. It is assumed that image features are stationary over such a short interval. We perform inversions for every range gate and Doppler bin, and add the results from different Doppler bins to obtain a grayscale two- dimensional image. In some applications, the Doppler bins are assigned colors, resulting in a color image, but we do not do so here. The results of different inversion methods are shown in Figure 3.8. As discussed in Chapter 1, these images represent backscattered power at 50 MHz, which is mostly determined by 3m-irregularities in electron density. These large turbulent plumes of plasma are indicative of a phenomenon known as equatorial spread- F. The genesis and growth of equatorial spread-F is of great interest in ionospheric research [2, 4]. The true image in this case is unknown, so it is impossible to accurately assess image quality. However, we believe that compressed sensing gives the best resolution and image clarity of the four methods. The MaxEnt method is more blurred and has a nonzero background level in some range gates. The Fourier and Capon images have severe artifacts that could be due to the large beam width and grating lobes. However, we suspect that this could also be due to a miscalibration with the phase measurements or antenna locations. MaxEnt and compressed sensing are affected differently because the data format for these methods was provided pre-calibrated. Additionally, MaxEnt and compressed sensing have more freedom to deviate from the data to correct any possible miscalibration. 3.2.4 Conclusion Ionospheric radar imaging is an underdetermined problem, so it is necessary to incorporate prior information in order to recover a solution. In this chap- ter, we showed how compressed sensing can be used to invert these data. Simulations using a simple Gaussian image suggest that compressed sensing 44 Figure 3.8: An image of equatorial spread-F recovered using four inversion methods. Data is from the Jicamarca Radio Observatory, year 2011, day 277, 22:18:30 LT. can outperform MaxEnt, Capon’s method, and the basic Fourier inversion, which are the popular current methods for radar imaging. This result appears to hold for most values of SNR. A more realistic forward and error model will be necessary to corroborate these results. The application to real data fur- ther suggests that higher resolution is possible with compressed sensing. One item for future work is to robustly impose a nonnegativity constraint, since scattered power cannot be negative. While there are some results regard- ing nonnegative compressed sensing, to our knowledge, there are no known algorithms for an arbitrary sparsity basis [43]. We also wish to investigate whether compressed sensing theory can aid in not only the analysis, but also the design of radar imaging experiments. The optimal antenna layout may 45 be the one yielding maximum incoherence with the sparsity basis. Radar imaging is one example of how compressed sensing can be used for spatial electromagnetic imaging. In this case, our instrument was already built, and we use compressed sensing to improve our image reconstruction. We recognized our radar imager as a compressive sensor. However, a powerful impact of compressed sensing is to inform alternative instrument designs which optimize the data acquisition and the image reconstruction together, resulting in more efficiency in both steps. In the next chapter, we will discuss such an instrument and its application to optical imaging of airglow layers in the ionosphere. 46 CHAPTER 4 OPTICAL AIRGLOW IMAGING WITH COMPRESSED SENSING In this chapter we review the concept of the single-pixel camera, an optical imaging instrument inspired by compressed sensing. This instrument can be applied to image irregularities and wave structure in naturally-occurring airglow emissions. A problem with conventional CCD-based imagers is that sensitive detectors are expensive and only viable at visible wavelengths. By replacing the CCD with a single photodiode, the single-pixel camera enables imaging for a larger range of wavelengths and at lower cost compared to conventional imagers. We present preliminary simulations of the single-pixel camera observing a typical airglow image, and suggest ideas for future work. 4.1 Airglow Imaging Background The aurora is one well-known phenomenon resulting from natural optical emission in the ionosphere. It is a result of photon emission when atoms transition from an excited state to a ground state. These atoms are excited by collisions with solar wind particles precipitating down along magnetic field lines. However, ionospheric photon emission is not constrained to the polar regions. Indeed, at all latitudes there exist excited atoms and therefore optical emissions, referred to as airglow. Instead of particle precipitation, the source of airglow is ultraviolet radiation from the sun that ionizes neutral particles. As these ionized particles recombine over the course of the night, they create excited atoms that emit photons, resulting in airglow [44]. This airglow offers a natural probe for ionospheric measurements. The interplay of neutral and plasma density profiles constrains particular emissions to certain altitude ranges. For example, the dissociative recom- bination of O+2 at 630.0 nm occurs almost entirely between 200 and 350 km. However, the majority of this emission occurs near 250 km, so a com- 47 mon approximation is to assume that the airglow layer is a thin shell [44]. Under this assumption, an image from a ground-based imager provides a two-dimensional map of airglow emission in the latitude-longitude plane. Of course, this assumption loses validity for low elevation angle observations, for which the thickness of the airglow layer leads to a blurring of the horizontal structure. If we are imaging the airglow layer during equatorial spread-F conditions, the airglow structures are known to map along magnetic field lines, and therefore, the latitudinal direction in the images corresponds to the altitudinal direction at the magnetic equator [45]. Thus, much like in Chapter 3, we obtain a two-dimensional altitude- longitude image. However, the physical interpretation of the image differs from Chapter 3, in which we described radar imaging of 3m-scale irregulari- ties. In the airglow case, the emission is mostly, but not entirely, determined by electron density. Variations in electron density are associated with polar- ization electric fields. Large, km-scale features of this field map efficiently along magnetic field lines. Therefore, the image we obtain with optical imag- ing is of the large, km-scale features of the electron density. Of course, airglow imaging is useful for more than just equatorial spread-F observations. It has a rich history of productive measurements in the both the mesosphere and thermosphere. These data provide a spatial context for studying the dynamics of waves and instabilities. In addition to equatorial spread-F, they have contributed significantly to our understanding of gravity wave propagation, for example [3, 46]. 4.1.1 Existing Instrumentation Before the advent of CCDs, airglow imaging was either performed with film or by spatially scanning a single photodetector. With film, the interpreta- tion of absolute intensity was difficult, and with scanning, the separation of temporal and spatial structure was difficult. The use of CCDs bridged this gap and initiated a new era in airglow imaging with unprecedented spatial and temporal resolution [3]. The physical model for a typical CCD-based airglow imager is essentially captured by the discussion in Section 2.1.5 and Figure 2.2. Using a lens, the desired image, f(φ), is well-approximated by the signal on the image plane. 48 Sampling this signal corresponds to sampling f(φ). Using a CCD, this signal is directly sampled. To use the language developed in Section 2.1.6, the test functions are delta functions, which means that the measurement matrix, H, is the identity matrix. This greatly simplifies signal reconstruction, since we can create an image directly from the measurements. However, using a CCD has several disadvantages. The primary disadvan- tage is cost. To achieve the sensitivity needed for observations at 630.0 nm, for example, a CCD costs around $25,000 (based off the Andor iKon-M 934 CCD). This cost is a hindrance to obtaining a global understanding of iono- spheric dynamics, which requires many imagers deployed around the world. Also, while silicon has a reasonable quantum efficiency at visible wavelengths, the quantum efficiency falls off in the infrared band. To make infrared mea- surements, we would need to use other materials, which can make arrays such as CCDs prohibitively expensive. To make low-cost measurements of infrared airglow emissions, CCDs are not a viable option. Another disadvantage of CCDs is evident when one considers the widespread success of image compression. In Figure 4.1, we show a rudimentary image compression example. In the left panel is a typical airglow image where the stars have been removed. To obtain the compressed version, we perform a wavelet transform using the Daubechies-4 wavelets, set to zero all but the largest 5% of the wavelet coefficients, and transform back. The resulting image is shown in the right panel. This compressed image retains most of the signal structure, including the correct absolute values, and even arguably provides some denoising. We are tempted to question the necessity of obtain- ing so many measurements of f(φ) when we could obtain a virtually identical image with many fewer pieces of data. Compressed sensing, introduced in Chapter 2, offers a useful framework with which to address this concern. Indeed, using compressed sensing for op- tical imaging has inspired the so-called single-pixel camera. By replacing the CCD with a single photosensor, this single-pixel camera can reduce the cost of conventional airglow imagers and, because photosensors can be sensitive to the infrared, allow us access to spatial information at these wavelengths. In the remainder of this chapter we discuss this instrument and present some simulations pertaining to airglow imaging. 49 CC D Co un ts 800 1000 1200 1400 1600 CC D Co un ts 800 1000 1200 1400 1600 Figure 4.1: (Left) A typical airglow image showing the optical signature of Equatorial Spread F. (Right) The compressed version of the image, using the largest 5% of the wavelet coefficients. 4.2 Compressive Single-Pixel Airglow Imager The intuition from Figure 4.1 suggests that if we were able to measure the important wavelets directly, then we could faithfully reconstruct the image with 20 times fewer measurements than a CCD. However, compressed sensing theory tells us that the optimal strategy is the exact opposite. We do not know ahead of time which wavelet coefficients will be important, so instead of measuring in the wavelet domain, we should measure in an incoherent domain. Although there exist many choices, one example used in practice is a domain consisting of test functions with random Bernoulli 0/1 entries, which is incoherent with wavelet domains [16]. Figure 4.2: The signal path of the single-pixel camera. After [24]. Reproduced with permission of SPIE and the authors. 50 The single-pixel camera, originally developed in 2006, is one implementa- tion of these ideas [24]. The architecture is shown in Figure 4.2. The initial portion of the signal path is identical to the CCD-based imager, but instead of a CCD, we place a digital micromirror device (DMD) in the image plane. A DMD is an array of programmable pivoting mirrors. The mirrors can be independently commanded to reflect light towards a photodiode, or away from it (represented by white and black squares in Figure 4.2). The DMD is the primary component of digital light projection (DLP) chips, which are an integral part of modern digital projectors and are thus widely available. The DMD pattern (the “mask”) determines which pixels get measured and which ignored. The second lens focuses the reflected pixels into the photodi- ode, which records the intensity for this mask. After recording, the DMD is commanded to a new mask, and the process repeated. When enough mea- surements have been gathered, a nonlinear compressed sensing inversion is performed to estimate the image from these measurements. Each measurement is simply the sum of intensities of certain regions of our image. These regions correspond to the locations of the reflective mirrors and can be thought of as the “pixels” of the image we wish to recover. If we index the N pixels in our image into a vector, f , we can model a measurement as, for example: gm = [ 1 0 0 1 0 1 0 . . . 0 1 ] f (4.1) Stacking M of these measurements into a vector, g, gives us our generalized forward model from Section 2.1.6, Equation (2.35): g = H∗f (4.2) where each column of the measurement matrix, H, corresponds to a DMD mask. The reconstruction is performed by solving the Basis Pursuit Denoise prob- lem from Section 2.2.3 (Equation (2.56)): minimize : ‖s‖1 subject to : ‖g −H∗Ψs‖22 ≤ 2 (4.3) where Ψ is the sparsity basis (e.g., wavelets), and s is the transformation of f into this basis, so f = Ψs. 51 An advantage of the single-pixel camera is the “sample first, ask questions later” philosophy [24]. Incoherent measurements are universal, which means that a string of measurements can be acquired without regard to how they will be used in the future. No knowledge of the signal details is necessary to design and run the instrument. Upon reconstruction, the important decisions are made: setting the acquisition time, choosing a sparsity basis, and choosing an algorithm with which to invert the data. This is in opposition to current imaging systems, where the important decisions are often made in the design phase. It is assumed that the image f(φ) is constant over the time interval the measurements are being acquired. If f(φ) changes significantly from one mask to the next, then (4.2) would no longer provide a valid forward model since each measurement would correspond to a different f . CCD-based imag- ing does not suffer from this problem since all measurements are taken si- multaneously. However, the advantage of single-pixel imaging is that each measurement is a sum of pixels and therefore has a higher SNR, requiring a shorter exposure time per measurement. As we show below, as long as M < N/2, then the total exposure time for the single-pixel camera is actu- ally shorter than that for a CCD. Suppose we wish to ensure that the SNR of each measurement is equal to the SNR of one CCD pixel. We assume that Poisson noise dominates over read noise and dark current and that readout time is zero. Poisson noise has an SNR of: SNRPoisson = Pt√ Pt = √ Pt (4.4) where P is the incident photon rate, and t is the exposure time per mea- surement. We denote the incident photon rate for a CCD pixel as P0. For a single-pixel camera, if we assume that half the pixels in the mask are reflected into the photodiode, the photon rate is N 2 P0. Equating the SNRs, we find: SNRSPC = SNRCCD√ N 2 P0 tSPC = √ P0 tCCD tSPC = 2 N tCCD (4.5) The total exposure time for the single-pixel camera is given by MtSPC, where 52 M is the number of measurements used in a single inversion. The total exposure time for the CCD is simply tCCD. Thus, to ensure that the exposure time for the single-pixel camera is less than for the CCD, we must have: M tSPC < tCCD M < N 2 (4.6) As we increase M , we have more measurements of our scene and thus recon- struct a more accurate image with a better spatial resolution. However, a larger M decreases our temporal resolution. As we show above, the “break- even point” for temporal resolution is at M = N/2. This is the point at which the exposure times for CCD and single-pixel camera are equal. The question is whether this many measurements can provide a suitable spatial resolution. To answer this question we must resort to simulations, presented in the next section. Before moving on, we note that in practice this bound is probably im- proved, due to two effects. First, the nonlinear inversion used for the single- pixel camera tends to suppress noise for sparse images, so one must ques- tion whether setting the SNRs equal is the appropriate criterion. Second, a quantum efficiency of 100% was assumed. In reality, we can obtain higher quantum efficiencies with photodiodes than with CCDs. Both of these effects will reduce the necessary exposure time of the single-pixel camera compared to a CCD and thus improve upon the simple analysis above. 4.3 Simulations As discussed in Chapter 2, it is difficult to theoretically prove the performance of compressed sensing techniques when applied to real problems. Thus, we must judge these techniques on the basis of simulation. For our simulations, we must construct a truth image, choose a measurement matrix, H, add noise to the measurements, and invert the data to create an image. 53 4.3.1 Procedure Although we could design an artificial truth image, we choose to create a truth image based off of data, in order to capture the complex details of realistic airglow images. We start with the raw 512 x 512-pixel image from an imager in Cajazeiras, Brazil, taken November 6, 2010 using a 630.0-nm filter. This image is shown in Figure 4.3. For an initial proof of concept, we remove the stars and increase the pixel size by a factor of 4 in each dimension. This downsampling is used to speed up the inversion and reduce noise in the image. The resulting 128 x 128-pixel image is shown in Figure 4.4, which will be our truth image. CC D Co un ts 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 Figure 4.3: The raw image that is used to generate the truth image for the simulation (512 x 512 pixels). CC D Co un ts 1.2 1.4 1.6 1.8 2 2.2 2.4 x 104 Figure 4.4: The truth image used in the simulation (128 x 128 pixels). Our next choice is to design the pixel masks which compose the measure- ment matrix H. Although many choices are possible, there are a few criteria that we would like H to satisfy. First, to ensure that each measurement sees 54 roughly the same incident power, we require that each mask contain N/2 ones and N/2 zeros. Second, we need reasonable incoherence with wavelets. Few results exist regarding the incoherence of arbitrary matrices, but since the wavelets are well-localized in space and frequency, we would do well to pick measurement atoms that are spread-out in both space and frequency. Third, it would be advantageous for each measurement to provide as much new information as possible. In the language of linear algebra, we wish the measurements to be orthogonal. Finally, because two-dimensional image processing often requires large N on the order of 105, storing the N ×M ma- trix H in memory is often impossible. Thus, we require a matrix for which multiplication can be implemented as a fast transform. For these reasons, we do not use completely random matrices as above. In- stead, we choose to construct H by choosing random columns of a Hadamard matrix, which have all of the above properties, with the possible exception of incoherence. It has been reported that some groups have found Hadamard matrices useful for related problems [24]. A Hadamard matrix of size N = 32 is shown in Figure 4.5. An attractive feature of the Hadamard transform is its fast O(N logN) implementation. The wavelet transform is even faster, O(N). Indeed, we find that even with its fast implementation, the Hadamard transform is our computational bottleneck, using over 90% of the computa- tional resources of the inversion, so it is important to choose an H with an efficient implementation. 5 10 15 20 25 30 5 10 15 20 25 30 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 Figure 4.5: A 32× 32 Hadamard matrix. One problem with the Hadamard matrix is that it contains negative entries, which are not physically realizable with the proposed architecture. How- 55 ever, there exists a simple transformation that allows us to utilize the fast Hadamard transform. Simply put, we replace all instances of −1 with 0. Let us denote the (−1, 1) Hadamard matrix H˜ and the (0, 1) Hadamard matrix H. We acquire measurements, g, using H. These measurements are not orthogonal and they do not allow us to use the fast implementation. We can correct this by the simple transformation: g˜j = 2gj − F (4.7) where gj denotes the j th element of g, and:. F = N−1∑ i=0 fi (4.8) We show why this transformation is effective by expanding the right-hand side: g˜j = 2〈hj , f〉 − 〈1 , f〉 g˜j = 〈2 hj − 1 , f〉 (4.9) where the brackets denote an inner product and 1 is a vector of ones. We recognize that the vector 2 hj − 1 is simply the vector hj where 0 has been replaced with −1. In other words, it is h˜j, so: g˜ = 〈h˜j , f〉 (4.10) which shows that our transformed measurements are related to the image through the (1,−1) Hadamard matrix. By transforming our measurements with (4.7), it is as if we acquired them using H˜, which has a fast transform. Recognizing that the vector 1 is simply the first column of H, we can write this transformation succinctly in matrix notation as: g˜ = Tg (4.11) and H˜ = TH (4.12) 56 where the transformation matrix T is: T =  1 0 0 0 . . . 0 0 −1 2 0 0 . . . 0 0 −1 0 2 0 . . . 0 0 −1 0 0 2 . . . 0 0 ... ... ... ... . . . ... ... −1 0 0 0 . . . 2 0 −1 0 0 0 . . . 0 2  (4.13) To summarize using matrix notation, the actual measurements use the (0, 1) Hadamard matrix and obey: g = Hf (4.14) where, for simplicity, we used the Hadamard matrix property that H∗ = H. To obtain the transformed measurements, we multiply both sides by T: Tg = THf g˜ = H˜f (4.15) which gives us the formulation that we can use in compressed sensing algo- rithms. We simulate the actual measurements, g, and add noise. We assume that Poisson noise dominates over read noise and dark current. With photon counts on the order of 104, we can approximate the Poisson distribution as Gaussian. Therefore, to the measurement gj is added a Gaussian random variable with mean zero and variance gj. We assume that the noise in g is uncorrelated. To recover the image, we transform to g˜ and solve (4.3), the Basis Pursuit Denoise problem. For two-dimensional problems, it is important to use fast algorithms. The fastest we were able to find as of yet is Orthogonal Matching Pursuit (OMP) [47]. We assume the magnitude of the noise, , is known exactly. We use the 2-D Daubechies wavelets for the sparsity basis, Ψ, but have found similar results using other wavelets. 57 4.3.2 Preliminary Results The remaining design parameter is to choose the number of measurements, M . Here, we show results for different values of what we call the “sampling factor”: α = M N (4.16) If there were no noise and we chose α = 1, we would have a square system of equations, and the reconstruction could be achieved with a simple matrix inversion. However, we are always limited by noise, and the sparsity of our image indicates that choosing such a large α is likely a waste of measurements. As we saw above, to a first approximation, choosing α = 0.5 will match the temporal resolution of a CCD. Truth CC D Co un ts 1 1.5 2 2.5 x 104 Recovered, α=0.5 CC D Co un ts 1 1.5 2 2.5 x 104 Figure 4.6: (Left) The original truth image. (Right) The recovered image using a sampling factor α = 0.5. Figure 4.6 shows an example reconstruction using α = 0.5. The left panel shows the true image, and the right panel shows the recovered image. While the overall structure is discernible, we see small-scale artifacts, some of which are clearly due to single wavelets. We define the relative reconstruction error between the true image, f , and the reconstructed image, fˆ , as: relative error (%) = 100 ‖f − fˆ‖2 ‖f‖2 (4.17) 58 The inversion shown in Figure 4.6 has a relative error of 3.1%. This can vary based on the particular noise realization. Despite the artifacts, it is promising that such a small error is possible with a problem that is both severely underdetermined and also noisy. Indeed, Poisson noise creates a relative error in the measurements, g˜, of 20%. However, the level of error in this reconstruction is likely not low enough to support detailed airglow measurements such as drift velocity, although it may be good enough to study simple features such as bubble occurrence. A more detailed analysis of the usability of these images is let for future work. Truth CC D Co un ts 1 1.5 2 2.5 x 104 Recovered, α=0.3 CC D Co un ts 1 1.5 2 2.5 x 104 Figure 4.7: (Left) The original truth image. (Right) The recovered image using a sampling factor α = 0.3. Inversions with much smaller α typically fail to reconstruct important sig- nal features. Shown in Figure 4.7 is such an example where α = 0.3 and the relative error is 11.6%. The outline of the aperture is discernible, but there are virtually no distinguishable signal features. It appears that many of the initial wavelet atoms are being wasted in faithfully reconstructing the outline of the image, suggesting that an alternative, specialized inversion algorithm may be worth developing. Inversions with larger α recover almost all of the signal features. Such an example is shown in Figure 4.8, for which α = 0.7 and the relative recon- struction error is 1.1%. Although some small-scale artifacts are still evident, the reconstruction appears suitable for most of the typical airglow analyses such as bubble drift velocity, growth, and occurrence rate. In order to better understand the relationship between α and the recon- struction error, we run many inversions for different values of α between 0.2 59 Truth CC D Co un ts 1 1.5 2 2.5 x 104 Recovered, α=0.7 CC D Co un ts 1 1.5 2 2.5 x 104 Figure 4.8: (Left) The original truth image. (Right) The recovered image using a sampling factor α = 0.7. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 2 4 6 8 10 12 14 16 18 Sampling Factor, α R el at iv e Er ro r ( %) Figure 4.9: A statistical analysis of airglow inversions for different sampling factors. The circles represent the average reconstruction error, and the vertical bars indicate ±1 standard deviation. and 0.8. For each value, we run 10 simulations with random realizations of noise and randomly chosen columns of the Hadamard matrix, and we report the statistics of the relative reconstruction error. In Figure 4.9, we show the mean and standard deviation of the relative error for each value of α. The circle indicates the mean and the error bar shows plus and minus one standard deviation. For low α, the signal is not reconstructed well, and the relative error is consistently above 10%. At α = 0.4, the error begins de- creasing but is still around 10%. Sampling factors of 0.5 and 0.6 exhibit a wide range of relative errors. Closer examination reveals that there appears to be a bimodal distribution, with a cluster of low errors around 2 − 3%, 60 and a cluster around 10%. The cause of this clustering is uncertain but may indicate that randomly choosing columns of H is not optimal, and we may find that certain sets of columns work better than others. At α = 0.7, all except one of the ten reconstructions are around 2% error. At α = 0.8, the reconstructions are consistently around 1.5%. 4.4 Future Work The inversions above show promise for eliminating the CCD in airglow im- agers, which would reduce cost and allow imaging at infrared wavelengths. However, a number of simplifying assumptions were made to achieve these preliminary results. For one, we assumed no read noise or dark current. A more realistic forward model will have to include these effects for a realistic photodiode. Moreover, we must carefully account for losses associated with lenses and DMD reflections. It was also assumed that the error magnitude, , was known exactly. In a realistic system, we will have to estimate this in a bootstrapped manner from the data, or use some a priori estimate. Another option is to treat  as a general regularization parameter and use any number of ad hoc methods for regularization parameter selection such as Generalized Cross Validation or L-curve regularization, but these can be computationally expensive. Another simplifying assumption was the removal of stars from the truth image. A real image has stars which can contribute to a large portion of the signal energy. In designing our algorithm, we will have to be careful to use the available signal energy to reconstruct airglow structure instead of wasting energy reconstructing minutiae of the star field. One possible solution would be to use a hybrid sparsity basis comprising wavelets and delta functions, to reconstruct airglow and stars, respectively. This hybrid basis is known as an overcomplete dictionary. This procedure could have the added benefit of improving the reliability of spatial calibration and star removal, since the star locations are effectively determined during the reconstruction. In general, the correct choice for the sparsity basis, Ψ, is still unknown and finding general- purpose bases is a topic of current research. A nonlinear extension such as Total Variation (TV) minimization could be employed, if a robust and fast algorithm is found. 61 In addition to Ψ, the optimal choice for H is unknown. Although attractive for certain reasons discussed above, the Hadamard basis may not be the best choice. It has been proven to be maximally incoherent with the Dirac basis [16], but the nature of its incoherence with wavelets is not known. By looking at Figure 4.5, we see that while no basis functions are localized in space, some appear to be localized in frequency, which may suggest some degree of coherence with the wavelets. Using a more incoherent measurement basis would allow for more accurate images to be reconstructed with fewer measurements. Ideally, we would like to find a measurement basis that is both incoherent with our sparsity basis and implementable with a fast transform. Additionally, the physical implementation of H could be changed. Instead of reflecting off of a DMD, we might use a MEMS-based shutter array in the image plane or aperture plane, a technique related to aperture coding. This may reduce reflective losses associated with the DMD mirrors. The running time of these inversions is a problem. Even for the 4 × 4- pixel reduced resolution version shown here, an image that only takes 90 seconds to acquire takes 10-30 minutes to reconstruct on an Intel Core 2 Duo processor. This suggests a full-resolution inversion time of several hours. We would like to improve the running time by finding more efficient algorithms or developing specialized methods. One possibility is to break the image into blocks, acquiring and reconstructing each block separately in a manner similar to the original JPEG standard. The measurement transformation T allows us to use the fast transform, but it also creates correlated errors. The implicit assumption in (4.3) is that errors are uncorrelated between measurements. This effect is probably small, but it should be accounted for. Another assumption is that the image is stationary. A real airglow image has temporal structure. It will be necessary to model this structure and sim- ulate our instrument in time in order to determine if the single-pixel camera is a viable option for motion capture of typical airglow images. This will create a lower limit on the temporal resolution and, consequently, an upper limit on the spatial resolution. The limits of and tradeoffs between temporal and spatial resolution will be important to estimate accurately before at- tempting to realize this device. A possible extension would be to treat the imaging problem as three-dimensional, with two spatial and one temporal di- mension. This would allow us to leverage temporal as well as spatial sparsity, 62 thereby improving the quality of the reconstruction. However, the present simulations suggest that this idea is not computationally feasible with current technology. 63 CHAPTER 5 CONCLUSION As we have shown, compressed sensing ideas are promising for ionospheric imaging applications. These ideas take advantage of the fact that many nat- ural images are inherently sparse when expressed in certain bases, a fact that underlies the widespread success of image compression techniques such as JPEG. This inherent sparsity reduces the number of necessary measure- ments in an imaging context, and we reviewed surprising theoretical results pertaining to this recovery in Chapter 2. In this chapter, we also described the general electromagnetic imaging problem and how compressed sensing can be used to solve it. Chapters 3 and 4 apply these ideas to radar imaging and airglow imaging. For radar imaging, compressed sensing offers us a better way to use our un- dersampled data, suggesting a sparsity-based reconstruction technique. We showed simulation results in which this technique returns better reconstruc- tions than state-of-the-art methods. We also showed results of inverting data collected at the Jicamarca Radio Observatory during equatorial spread- F conditions. For airglow imaging, compressed sensing offers us a new way to design instrumentation to minimize the number of redundant measure- ments, effectively transferring complexity from expensive sensors to low-cost computation. The proposed instrument is the single-pixel camera, which di- rectly gathers compressed data, combining the steps of sensing and compres- sion. We showed preliminary simulation results that suggest the single-pixel camera may be feasible for airglow imaging, allowing us to develop low- cost sensors for much-needed global distribution and also image at infrared wavelengths. However, much work remains to be done before designing a prototype. 64 REFERENCES [1] H. G. Booker and H. W. Wells, “Scattering of radio waves by the f-region of the ionosphere,” Terrestrial Magnetism and Atmospheric Electricity, vol. 43, no. 3, pp. 249–256, 1938. [Online]. Available: http://dx.doi.org/10.1029/TE043i003p00249 [2] D. L. Hysell, “An overview and synthesis of plasma irregularities in equatorial spread F,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 62, no. 12, pp. 1037–1056, Aug. 2000. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S136468260000095X [3] J. J. Makela, “A review of imaging low-latitude ionospheric irregularity processes,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 68, no. 13, pp. 1441–1458, Sep. 2006. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S136468260600109X [4] M. C. Kelley, J. J. Makela, O. de La Beaujardie`re, and J. Retterer, “Convective Ionospheric Storms: A Review,” Reviews of Geophysics, vol. 49, no. 2, p. RG2003, June 2011. [Online]. Available: http://www.agu.org/pubs/crossref/2011/2010RG000340.shtml [5] “The National Space Weather Program Strategic Plan,” National Space Weather Program Council, Tech. Rep. June, 2010. [Online]. Available: http://www.ofcm.gov/nswp-sp/fcm-p30.htm [6] R. Blahut, Theory of Remote Image Formation. Cambridge University Press, 2004. [Online]. Available: http://books.google.com/books?id= d8FMlHewYp0C [7] A. Thompson, J. Moran, and J. George W. Swenson, Interferometry and Synthesis in Radio Astronomy. John Wiley & Sons, 2008. [Online]. Available: http://books.google.com/books?id=AwBN5bpuEU0C [8] R. F. Woodman, “Coherent radar imaging: Signal processing and statistical properties,” Radio Science, vol. 32, no. 6, p. 2373, 1997. [Online]. Available: http://www.agu.org/pubs/crossref/1997/ 97RS02017.shtml 65 [9] Z. Feng, E. Kudeki, R. Woodman, J. Chau, and M. Milla, “F region plasma density estimation at Jicamarca using the complex cross-correlation of orthogonal polarized backscatter fields,” Radio science, vol. 39, no. 3, p. RS3015, 2004. [Online]. Available: http://www.agu.org/pubs/crossref/2004/2003RS002963.shtml [10] E. Jaynes, “The Relation of Bayesian and Maximum Entropy Methods,” in Maximum-Entropy and Bayesian Methods in Science and Engineering (Vol. 1), 1988, pp. 25–29. [11] E. Cande`s, “Compressive sampling,” in Proceedings of the International Congress of Mathematicians, 2006. [Online]. Available: http://dialnet. unirioja.es/servlet/dcart?info=link\&codigo=3140640\&orden=237292 [12] D. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs\ all.jsp?arnumber=1614066 [13] E. J. Cande`s, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, Aug. 2006. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1002/ cpa.20124/abstract [14] E. Cande`s and M. Wakin, “An introduction to compressive sampling,” Signal Processing Magazine, IEEE, no. March 2008, pp. 21–30, 2008. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs\ all.jsp? arnumber=4472240 [15] J. Romberg, “Imaging via Compressive Sampling,” IEEE Signal Process- ing Magazine, vol. 25, no. 2, pp. 14–20, Mar. 2008. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs\ all.jsp?arnumber=4472239 [16] E. Cande`s and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Problems, vol. 23, no. 3, pp. 969–985, June 2007. [Online]. Available: http://stacks.iop.org/0266-5611/23/i=3/a=008? key=crossref.bdb1d0e3390f5616a4395b50273198a9 [17] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Har- monic Analysis, vol. 26, no. 3, pp. 301–321, May 2009. [Online]. Avail- able: http://linkinghub.elsevier.com/retrieve/pii/S1063520308000638 [18] J. Ho¨gbom, “Aperture synthesis with a non-regular distribution of interferometer baselines,” Astronomy and Astrophysics Supplement Series, vol. 15, pp. 417–426, 1974. [Online]. Available: http: //adsabs.harvard.edu/full/1974A\%26AS...15..417H 66 [19] I. Daubechies, R. DeVore, M. Fornasier, and C. S. Gu¨ntu¨rk, “Iteratively reweighted least squares minimization for sparse recovery,” Communications on Pure and Applied Mathematics, vol. 63, no. 1, pp. 1–38, Jan. 2010. [Online]. Available: http://www.stormingmedia.us/ 01/0158/A015825.html [20] E. van den Berg and M. P. Friedlander, “Probing the Pareto Frontier for Basis Pursuit Solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, Jan. 2009. [Online]. Available: http://epubs.siam.org/doi/abs/10.1137/080714488 [21] S. Wright, R. Nowak, and M. Figueiredo, “Sparse Reconstruction by Separable Approximation,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2479–2493, July 2009. [Online]. Available: http: //ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4799134 [22] S. Becker, J. Bobin, and E. J. Cande`s, “NESTA: A Fast and Accurate First-Order Method for Sparse Recovery,” SIAM Journal on Imaging Sciences, vol. 4, no. 1, pp. 1–39, Jan. 2011. [Online]. Available: http://epubs.siam.org/doi/pdf/10.1137/090756855 [23] S. Ramani and J. A. Fessler, “An accelerated iterative reweighted least squares algorithm for compressed sensing MRI,” in 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE, 2010. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs\ all.jsp?arnumber=5490364 pp. 257–260. [24] D. Takhar, J. N. Laska, M. B. Wakin, M. F. Duarte, D. Baron, S. Sarvotham, K. F. Kelly, and R. G. Baraniuk, “A new compressive imaging camera architecture using optical-domain compression,” in Proc. of Computational Imaging IV at SPIE Electronic Imaging, C. A. Bouman, E. L. Miller, and I. Pollak, Eds., Feb. 2006. [Online]. Available: http://proceedings.asmedigitalcollection.asme.org/ proceeding.aspx?articleid=728899 pp. 509–10. [25] R. F. Woodman, “Inclination of the geomagnetic field measured by an incoherent scatter technique,” Journal of Geophysical Research, vol. 76, no. 1, p. 178, 1971. [Online]. Available: http://www.agu.org/pubs/ crossref/1971/JA076i001p00178.shtml [26] D. T. Farley, H. M. Ierkic, and B. G. Fejer, “Radar interferometry: A new technique for studying plasma turbulence in the ionosphere,” Journal of Geophysical Research, vol. 86, no. A3, p. 1467, 1981. [Online]. Available: http://www.agu.org/pubs/crossref/1981/JA086iA03p01467. shtml 67 [27] E. Kudeki and F. Su¨ru¨cu¨, “Radar interferometric imaging of field- aligned plasma irregularities in the equatorial electrojet,” Geophysical Research Letters, vol. 18, no. 1, p. 41, 1991. [Online]. Available: http://www.agu.org/pubs/crossref/1991/90GL02603.shtml [28] J. Capon, “High-resolution frequency-wavenumber spectrum analysis,” Proceedings of the IEEE, vol. 57, no. 8, pp. 1408–1418, 1969. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper. htm?arnumber=1449208 [29] R. D. Palmer, S. Gopalam, T.-Y. Yu, and S. Fukao, “Coherent radar imaging using Capon’s method,” Radio Science, vol. 33, no. 6, p. 1585, Nov. 1998. [Online]. Available: http://www.agu.org/pubs/crossref/ 1998/98RS02200.shtml [30] J. Ables, “Maximum entropy spectral analysis,” Astronomy and Astrophysics Supplement Series, 1974. [Online]. Available: http: //adsabs.harvard.edu/full/1974A\%26AS...15..383A [31] B. R. Frieden, “Restoring with maximum likelihood and maximum entropy.” Journal of the Optical Society of America, vol. 62, no. 4, pp. 511–8, Apr. 1972. [Online]. Available: http://www.ncbi.nlm.nih.gov/ pubmed/5019595 [32] S. F. Gull and G. Daniell, “Image reconstruction from incomplete and noisy data,” Nature, vol. 272, 1978. [Online]. Available: http: //www.nature.com/nature/journal/v272/n5655/abs/272686a0.html [33] E. Jaynes, “On the rationale of maximum-entropy methods,” Proceedings of the IEEE, vol. 70, no. 9, pp. 939–952, 1982. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper. htm?arnumber=1456693 [34] T. J. Cornwell and K. Evans, “A simple maximum entropy deconvolution algorithm,” Astronomy and Astrophysics, vol. 143, pp. 77–83, 1985. [Online]. Available: http://adsabs.harvard.edu/full/ 1985A\&A...143...77C [35] D. L. Hysell, “Radar imaging of equatorial F region irregularities with maximum entropy interferometry,” Radio Science, vol. 31, no. 6, p. 1567, 1996. [Online]. Available: http://www.agu.org/pubs/crossref/ 1996/96RS02334.shtml [36] D. L. Hysell and J. L. Chau, “Optimal aperture synthesis radar imaging,” Radio Science, vol. 41, no. 2, p. RS2003, 2006. [Online]. Available: http://www.agu.org/pubs/crossref/2006/ 2005RS003383.shtml 68 [37] R. Wilczek and S. Drapatz, “A high accuracy algorithm for maximum entropy image restoration in the case of small data sets,” Astronomy and Astrophysics, vol. 142, pp. 9–12, 1985. [Online]. Available: http://adsabs.harvard.edu/full/1985A\&A...142....9W [38] T.-Y. Yu, R. D. Palmer, and D. L. Hysell, “A simulation study of coherent radar imaging,” Radio Science, vol. 35, no. 5, p. 1129, Sep. 2000. [Online]. Available: http://www.agu.org/pubs/crossref/2000/ 1999RS002236.shtml [39] J. L. Chau and R. F. Woodman, “Three-dimensional coherent radar imaging at Jicamarca: comparison of different inversion techniques,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 63, no. 2-3, pp. 253–261, Jan. 2001. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1364682600001425 [40] D. T. Farley, “A theory of electrostatic fields in the ionosphere at nonpolar geomagnetic latitudes,” Journal of Geophysical Research, vol. 65, no. 3, p. 869, 1960. [Online]. Available: http://www.agu.org/ pubs/crossref/1960/JZ065i003p00869.shtml [41] E. van den Berg and M. P. Friedlander, “SPGL1: A solver for large-scale sparse reconstruction,” June 2007, http://www.cs.ubc.ca/labs/scl/spgl1. [42] M. Duarte, M. Davenport, D. Takhar, J. Laska, K. Kelly, and R. Baraniuk, “Single-Pixel Imaging via Compressive Sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 83–91, Mar. 2008. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs\ all.jsp? arnumber=4472247 [43] P. D. O’Grady and S. T. Rickard, “Compressive sampling of non-negative signals,” in 2008 IEEE Workshop on Machine Learning for Signal Processing. IEEE, Oct. 2008. [Online]. Available: http: //ieeexplore.ieee.org/xpls/abs\ all.jsp?arnumber=4685468 pp. 133–138. [44] J. J. Makela, “Midlatitude Ionospheric Studies Using the Global Posi- tioning System and Airglow Cameras,” Ph.D. dissertation, Cornell Uni- versity, 2003. [45] E. S. Miller, J. J. Makela, K. M. Groves, M. C. Kelley, and R. T. Tsunoda, “Coordinated study of coherent radar backscatter and optical airglow depletions in the central Pacific,” Journal of Geophysical Research, vol. 115, no. A6, p. A06307, June 2010. [Online]. Available: http://www.agu.org/pubs/crossref/2010/2009JA014946.shtml 69 [46] M. Taylor, “A review of advances in imaging techniques for measuring short period gravity waves in the mesosphere and lower thermosphere,” Advances in Space Research, vol. 19, no. 4, pp. 667–676, Jan. 1997. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/ S0273117797001610 [47] S. Becker, “Orthogonal matching pursuit for sparse recovery,” Dec. 2012, http://www.mathworks.com/matlabcentral/fileexchange/32402- cosamp-and-omp-for-sparse-recovery. 70