1 Lecture 19: Introduction to Computer Vision

July 19, 2017
CSCI 1360E: Foundations for Informatics and Analytics

1.1 Overview and Objectives

In this lecture, we'll touch on some concepts related to image processing and computer vision. By the end of this lecture, you should be able to

• Read in and display any JPEG or PNG image using Scientific Python (SciPy)
• Understand core image processing techniques such as thresholding, blurring, and segmentation
• Recall some of the computer vision packages available in Python for more advanced image processing

1.2 Part 1: Computer Vision

Whenever you hear about or refer to an image analysis task, you've stepped firmly into territory occupied by computer vision, or the field of research associated with understanding images and designing algorithms to do the same.

In [1]: %matplotlib inline
        from sklearn.datasets import load_sample_image
        import matplotlib.pyplot as plt

        img = load_sample_image("flower.jpg")
        plt.imshow(img)

Out[1]: <matplotlib.image.AxesImage at 0x118aef240>
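The cell above uses a sample image bundled with scikit-learn, but reading your own JPEG or PNG works the same way through matplotlib's imread, which returns a NumPy array. A minimal self-contained sketch: the filename my_photo.png is made up for illustration, so the example first writes a tiny synthetic image to have something to read.

```python
import numpy as np
import matplotlib.pyplot as plt

# Write a tiny all-red 4x4 image so the example is self-contained.
# (Float convention: channel values range from 0.0 to 1.0.)
tiny = np.zeros((4, 4, 3))
tiny[:, :, 0] = 1.0
plt.imsave("my_photo.png", tiny)

# imread returns a NumPy array of shape (height, width, channels);
# channels is 3 for RGB, or 4 if the PNG carries an alpha channel.
img = plt.imread("my_photo.png")
print(img.shape)
plt.imshow(img)
```

For PNG files, imread yields floating-point values in [0, 1]; for JPEGs it yields 8-bit integers in [0, 255], matching the representations discussed below.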

1.2.1 Examples of Computer Vision

You can probably name numerous examples of computer vision already, but just to highlight a couple:

• Facebook and Google use sophisticated computer vision methods to perform facial recognition scans of photos that are uploaded to their servers. You've likely seen examples of this when Facebook automatically puts boxes around the faces of people in a picture and asks if you'd like to tag certain individuals.
• Tesla Motors' "Autopilot" and other semi-autonomous vehicles use arrays of cameras to capture outside information, then process these photos using computer vision methods in order to pilot the vehicle. Google's experimental self-driving cars use similar techniques, but are fully autonomous.
• The subarea of machine learning known as "deep learning" has exploded in the last five years, resulting in state-of-the-art image recognition capabilities. Google's DeepMind can recognize arbitrary images to an extraordinary degree, and similar deep learning methods have been used to automatically generate captions for these images.
• Automated categorization and annotation of YouTube videos (identification of illegal content?)

This is all to underscore: computer vision is an extremely active area of research and application!

[Figures: Facebook face tagging; Tesla Autopilot]

[Figure: oscars]

• Analyzing photos on your smartphones
• Law enforcement facial recognition
• Disabled access to web technologies
• Virtual reality

1.2.2 Data Representations

From the perspective of the computer, the simplest representation of an image is a large rectangular array of pixels. Each pixel has some value that corresponds to a color (or intensity).

For example, in black and white (or grayscale) images, pixels are typically represented by a single integer value. Full-color images are often represented in RGB (Red-Green-Blue) format, and their data structure consists of three rectangular arrays of pixels, one for each color channel. In this example, representing Red-Green-Blue requires three matrices stacked on each other: one for the red values, one for the green, and one for the blue. The number of rows is the height of the image, and the number of columns is the width of the image.

Both grayscale and RGB image pixels tend to be represented by 8-bit unsigned integers that range from 0 (black) to 255 (white), but can also be represented by floating point values that range from 0 (black) to 1 (white). There are many other image formats and representations, but they tend to be variations on this theme.

1.2.3 Image Processing

There are lots and lots of ways in which you can process and analyze your images, but in this lecture we'll discuss three methods: thresholding, blurring/sharpening, and segmentation, though these all interrelate with one another.

[Figure: the three stacked RGB channel arrays]

• Thresholding is the process by which you define a pixel threshold--say, the value 100--and set every pixel below that value to 0 and every pixel above that value to 255. In doing so, you binarize the image.
• Blurring and sharpening are self-explanatory: you've probably used these tools in an image editor like Photoshop or GIMP before. Formally, blurring is the process of "averaging" nearby pixel values together, smoothing out hard boundaries. Sharpening does the opposite.
• Segmentation is the process through which you divide your image up into logical pieces. Perhaps you're segmenting out people from the rest of the image to perform facial recognition, or you're segmenting distinct cells from a microscope image.

1.3 Part 2: Loading and Manipulating Images

Let's jump into it and get our hands dirty, shall we? We'll use the flower image we saw before.

In [2]: plt.imshow(img)

Out[2]: <matplotlib.image.AxesImage at 0x118c1a8d0>

(Recall the matplotlib method imshow, which is useful for visualizing images!)

As with any data, it's good to have a "feel" for what you're dealing with. This is the "data exploration" step, and it usually involves computing some basic statistics. Things like shape, value range, average, median... even the histogram of values (the distribution!) is useful information.

In [3]: print("Image dimensions: {}".format(img.shape))

Image dimensions: (427, 640, 3)

Remember our discussion of image formats. Images are basically NumPy arrays, so all the usual NumPy functionality is at your disposal here. Our image is 427 rows by 640 columns (height of 427, width of 640) and is in RGB format (hence the trailing dimension of 3--one 427x640 block for each of the three colors). Let's take a look at these three color channels, one at a time.

In [4]: print("Min/Max of Red: {} / {}".format(img[:, :, 0].min(), img[:, :, 0].max()))
        print("Min/Max of Green: {} / {}".format(img[:, :, 1].min(), img[:, :, 1].max()))
        print("Min/Max of Blue: {} / {}".format(img[:, :, 2].min(), img[:, :, 2].max()))

Min/Max of Red: 0 / 255
Min/Max of Green: 0 / 229
Min/Max of Blue: 0 / 197

Red seems to have the widest range, from the minimum possible of 0 to the maximum possible of 255. What about average, median, and standard deviation?
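Since images are just NumPy arrays, the channel-by-channel calls above can also be collapsed with an axis argument. A small sketch on a hypothetical random image (not the flower itself), also showing the 8-bit-to-float conversion from the representations section:

```python
import numpy as np

# A hypothetical RGB image in the same (height, width, 3) uint8 layout.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# Reduce over the two spatial axes, leaving one value per channel.
print(img.min(axis=(0, 1)))   # per-channel minima, shape (3,)
print(img.max(axis=(0, 1)))   # per-channel maxima, shape (3,)
print(img.mean(axis=(0, 1)))  # per-channel means

# Convert the 0-255 integer convention to the 0.0-1.0 float convention.
img_float = img.astype(np.float64) / 255.0
print(img_float.min(), img_float.max())  # both fall within [0.0, 1.0]
```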

In [5]: import numpy as np  # Need this for computing the median.

        print("Mean/Median (Stddev) of Red: {:.2f} / {:.2f} (+/- {:.2f})" \
            .format(img[:, :, 0].mean(), np.median(img[:, :, 0]), img[:, :, 0].std()))
        print("Mean/Median (Stddev) of Green: {:.2f} / {:.2f} (+/- {:.2f})" \
            .format(img[:, :, 1].mean(), np.median(img[:, :, 1]), img[:, :, 1].std()))
        print("Mean/Median (Stddev) of Blue: {:.2f} / {:.2f} (+/- {:.2f})" \
            .format(img[:, :, 2].mean(), np.median(img[:, :, 2]), img[:, :, 2].std()))

Mean/Median (Stddev) of Red: 55.13 / 4.00 (+/- 89.02)
Mean/Median (Stddev) of Green: 73.58 / 61.00 (+/- 45.51)
Mean/Median (Stddev) of Blue: 57.00 / 54.00 (+/- 33.23)

Well, this is certainly interesting. The mean and median for Blue are very similar, but for Red they're very different, which suggests some heavy skewing taking place. Let's go ahead and look at the histograms of each channel, then!

In [6]: import seaborn as sns

        fig = plt.figure(figsize = (16, 4))
        plt.subplot(131)
        _ = plt.hist(img[:, :, 0].flatten(), bins = 25, range = (0, 255), color = 'r')
        plt.subplot(132)
        _ = plt.hist(img[:, :, 1].flatten(), bins = 25, range = (0, 255), color = 'g')
        plt.subplot(133)
        _ = plt.hist(img[:, :, 2].flatten(), bins = 25, range = (0, 255), color = 'b')

Well, this certainly explains a few things! As we can see, the vast majority of pixels in the red channel are black! The green and blue channels are a bit more evenly distributed, though even with green we can see a hint of a second peak around the pixel value 150 or so.

Hopefully this illustrates why even all these basic statistics can be misleading! We can also visualize the image using only the pixel values from one channel at a time.

In [7]: fig = plt.figure(figsize = (16, 4))
        plt.subplot(131)
        plt.title("Red")
        plt.imshow(img[:, :, 0], cmap = "gray")
        plt.subplot(132)

        plt.title("Green")
        plt.imshow(img[:, :, 1], cmap = "gray")
        plt.subplot(133)
        plt.title("Blue")
        plt.imshow(img[:, :, 2], cmap = "gray")

Out[7]: <matplotlib.image.AxesImage at 0x11b154e10>

And there you have it. As we can see for ourselves, the red channel tends to be either black (the entire background) or pretty bright (the flower), whereas the green and blue channels are much more evenly spread.

1.3.1 Thresholding

So how would a threshold work? Let's start just with the green channel, for simplicity. We'll use the median pixel value (61) as the threshold.

In [8]: green_channel = img[:, :, 1]  # pull out the green channel, just so we don't have to keep indexing
        binarized = (green_channel > np.median(green_channel))
        plt.imshow(binarized, cmap = "gray")

Out[8]: <matplotlib.image.AxesImage at 0x11b78c2e8>

Whether or not this is a "good" binarization depends entirely on what your data are and what you're trying to do. If we were, for example, trying to separate the flower from the background, this would only be an OK start--as you can see, a lot of the background was selected along with the flower. Maybe we could try using the mean as a threshold? The mean value for the green channel is larger than the median, so it may cut out some of that background.

In [9]: binarized = (green_channel > green_channel.mean())
        plt.imshow(binarized, cmap = "gray")

Out[9]: <matplotlib.image.AxesImage at 0x11b73ce10>
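Thresholding is the only one of the three techniques demonstrated above; the other two, blurring and segmentation, can be sketched with SciPy's ndimage module. The tiny synthetic arrays and the sigma value below are illustrative choices, not taken from the lecture:

```python
import numpy as np
from scipy import ndimage

# Blurring: "average" nearby pixel values together. gaussian_filter does a
# Gaussian-weighted average; sigma controls the neighborhood size.
edge = np.zeros((8, 8))
edge[:, 4:] = 255.0                      # a hard black-to-white boundary
blurred = ndimage.gaussian_filter(edge, sigma=1)
print(edge[0, 3:5])     # the hard jump: 0 next to 255
print(blurred[0, 3:5])  # intermediate values: the boundary is smoothed out

# Segmentation: divide the image into logical pieces. Starting from a
# binarized mask (as thresholding would produce), ndimage.label assigns
# each connected region its own integer ID.
mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=bool)
labels, num_regions = ndimage.label(mask)
print(num_regions)      # two separate blobs, so two regions
print(labels)           # same shape as mask, with values 0 (background), 1, 2
```

On the flower image, labeling the binarized green channel would be one way to pick out the flower as the largest connected region and discard the smaller background speckles.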
