Humans process visual data in almost miraculous ways. The visual system can adjust contrast and brightness, and it can even recognize and track objects or faces, to mention some of its capabilities, all in real time.
We do not know precisely how the brain does most of its analysis. However, when we need to process images on a computer, some good abstractions can be made. Let's start with the most general definition of black and white images:
First, an image on the canvas @@0@@ x @@1@@ can be defined as:
This means that every point in the canvas is assigned a value in the grey scale, where 0 is white and 1 is black, as seen in the following palette:
A computer cannot handle this definition directly because of its discrete nature. However, it is the way to formalize further analysis and processing. For example, you can differentiate an image, or apply any mapping or filter that results in an interesting transformation. (Computer vision)
A definition which can be handled in a computer is made as follows:
All of it is only a discretization of the previous definition.
For example, let's say that @@5@@ and we choose random values for every @@6@@:
This turns out to be a noisy black and white @@0@@ pixel image.
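As a minimal sketch of the discrete definition, the snippet below fills a small matrix with random grey levels using NumPy. The canvas size and the names `n_rows`, `n_cols` and `noisy_image` are assumptions chosen for illustration, not the values used in the text.

```python
import numpy as np

# Hypothetical canvas size, chosen only for this illustration.
n_rows, n_cols = 10, 10

rng = np.random.default_rng(seed=0)

# One random grey level in [0, 1) per pixel.
# Following the convention in the text, 0 is white and 1 is black.
noisy_image = rng.random((n_rows, n_cols))

print(noisy_image.shape)
```

Each entry of `noisy_image` plays the role of one pixel of the discretized canvas.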
It gets a little more complicated for a color image. One definition is:
A color image @@1@@ of @@2@@ size corresponds to a real valued tensor or, in other words, a real multidimensional array of dimensions @@3@@. Again, @@4@@ and @@5@@ are the number of pixel rows and columns, and @@6@@. In this case a value of 0 or 1 is not so easy to interpret, as each entry of the last dimension corresponds to a scale of red, green or blue.
An easier way to understand this @@7@@ definition is to think of RGB pixels laid out on a @@8@@ matrix or "canvas".
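To make the "matrix of RGB pixels" picture concrete, here is a small sketch that builds a color image as a three dimensional NumPy array. The size and the three color stripes are assumptions for illustration only.

```python
import numpy as np

# Hypothetical size: 4 pixel rows, 6 pixel columns, 3 color channels.
n_rows, n_cols = 4, 6
color_image = np.zeros((n_rows, n_cols, 3))

# Paint three vertical stripes; each pixel is one (red, green, blue) triple.
color_image[:, 0:2] = [1.0, 0.0, 0.0]  # pure red
color_image[:, 2:4] = [0.0, 1.0, 0.0]  # pure green
color_image[:, 4:6] = [0.0, 0.0, 1.0]  # pure blue

# Indexing with (row, column) returns the RGB vector of that pixel.
pixel = color_image[0, 3]
print(color_image.shape, pixel)
```

Indexing the first two dimensions picks a position on the canvas, and the last dimension holds the point in the RGB cube for that position.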
With the following code you are able to interact with the RGB cube as you wish.
If you did not interact enough, some good intuition of the RGB cube space is:
Now that we know the basics of how images are represented, let's try to load one. With the following code we are able to open most computer representations of images, in this case @@5@@, which is the above definition with some metadata and possibly compression added.
The @@0@@ object returned by .imread() is an ndarray, which is just a multidimensional array.
The following code prints the dimension sizes:
which correspond to the pixel rows, pixel columns and RGB channels.
Now that we know the basics of an image, we will try some easy processing with our image data. As some of you may know, Instagram allows users to upload preferably square images, and it recently got a feature that lets you post multiple images in a single post. If you cut a panoramic image into multiple sequential squares, you can upload it as a multiple image post and slide through it smoothly.
For our algorithm we will first find out how many complete squares our original image can be cut into. Then we just slice the original image into squares and fit them into these individual squares. Each square will have side @@0@@.
If our panoramic image can't be cut completely into perfect squares, the last image will hold the end of the panorama as a rectangle. This rectangle will be fitted into a white square.
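The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the code used for the post: the function name `cut_into_squares` is hypothetical, the square side is assumed to be the image height, the image is assumed to be `uint8` RGB (so white is 255), and the leftover rectangle is placed at the left of the white square.

```python
import numpy as np

def cut_into_squares(image):
    """Cut a wide (height, width, 3) image into height x height squares.

    Assumption: the side of every square equals the image height.
    """
    height, width, channels = image.shape
    n_complete = width // height  # number of complete squares

    # Slice the complete squares from left to right.
    squares = [
        image[:, i * height:(i + 1) * height]
        for i in range(n_complete)
    ]

    # Fit the leftover rectangle, if any, into a white square.
    remainder = width - n_complete * height
    if remainder > 0:
        last = np.full((height, height, channels), 255, dtype=image.dtype)
        last[:, :remainder] = image[:, n_complete * height:]
        squares.append(last)
    return squares

# Tiny synthetic "panorama": 4 rows, 10 columns, all black.
panorama = np.zeros((4, 10, 3), dtype=np.uint8)
squares = cut_into_squares(panorama)
print(len(squares), squares[-1].shape)
```

With a 4 x 10 input, two complete 4 x 4 squares are produced and the remaining 4 x 2 strip is padded with white into a third one.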
For this we are going to use the following code:
As you can see, most of the hard work was done by the @@0@@ objects, which allow you to extract rectangular or square regions of our original image. The slice object is initialized as @@1@@, where @@2@@ is the index for the start, @@3@@ for the end, and @@4@@ helps you choose how dense your uniform sample is. You can also write it as @@5@@ in some cases.
You can find the Instagram result here.
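A short illustration of how slices pick regions out of an array, using Python's built-in `slice(start, stop, step)` and the equivalent colon notation. The array `grid` and the chosen indices are made up for this example.

```python
import numpy as np

# A 6 x 6 grid whose entries are just their flat index, for readability.
grid = np.arange(36).reshape(6, 6)

# Explicit slice objects: start, stop (exclusive) and step.
rows = slice(1, 5, 2)  # rows 1 and 3
cols = slice(0, 6, 3)  # columns 0 and 3
region = grid[rows, cols]

# The same selection written with the colon shorthand.
same_region = grid[1:5:2, 0:6:3]

print(region)
```

A step of 1 (the default) takes every pixel in the range, while a larger step samples the region more sparsely.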
In this section we are going to read a different image and try to extract its color "palette". Whatever this means, I believe it is some kind of human spectrography? For sure it reveals the content of whatever atom is inside that. The color palette can be interpreted as a sample of representative colors from a given image. In this approach I will get these representative colors using the @@0@@ of the clusters labeled by the K-means algorithm.
In simple words, the K-means algorithm starts with us manually choosing the number of clusters @@1@@ we want our data to be labeled by. In this case we are using the @@2@@ data of the whole image. Sometimes it can be interesting to label the @@3@@ data, which will cluster not only by color but also by the x and y pixel position.
After choosing @@4@@, the algorithm randomly selects @@5@@ cluster centers and iteratively tries to improve the @@6@@ clustering. In each iteration the algorithm assigns each @@7@@ point to its closest center (by Euclidean distance), and then recalculates each cluster center as the mean of its cluster. This is repeated until the new centers stop moving or, better stated, until each new center is in a sufficiently small neighbourhood of the previous one.
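The loop described above can be sketched in plain NumPy. This is a teaching sketch, not the library routine used in the post: the function name `kmeans`, its parameters `tol`, `max_iter` and `seed`, and the synthetic pixel data are all assumptions, and the initial centers are drawn from the data points themselves.

```python
import numpy as np

def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    """Label each row of `points` with one of k clusters."""
    rng = np.random.default_rng(seed)
    # Randomly pick k distinct data points as the initial centers.
    start = rng.choice(len(points), size=k, replace=False)
    centers = points[start].astype(float)

    for _ in range(max_iter):
        # Euclidean distance from every point to every center: shape (n, k).
        distances = np.linalg.norm(
            points[:, None, :] - centers[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)

        # Recompute each center as the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centers[j]
            for j in range(k)
        ])

        # Stop once every center stays within a small neighbourhood.
        moved = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if moved < tol:
            break
    return labels, centers

# Synthetic "pixels": a dark blob and a bright blob in RGB space.
data_rng = np.random.default_rng(1)
pixels = np.vstack([
    data_rng.normal(0.1, 0.02, size=(50, 3)),
    data_rng.normal(0.9, 0.02, size=(50, 3)),
])
labels, centers = kmeans(pixels, k=2)
print(centers)
```

For a real image, the `(rows, columns, 3)` array would first be reshaped to `(rows * columns, 3)` so that every pixel becomes one point.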
This is done with the following code:
This again loads an image and gets its size. And then:
This first chooses @@0@@ as the number of colors for the palette and then runs the K-means algorithm until it converges, to obtain the following palette:
As the data is labeled, we just find the median of each cluster, as it may be more robust than the mean, and after some scaling the palette is printed.
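A minimal sketch of the median step, assuming the pixels are already flattened to one row per pixel and every cluster has at least one pixel. The helper name `palette_from_labels` and the toy data are hypothetical, and dividing by 255 is only one possible reading of "some scaling".

```python
import numpy as np

def palette_from_labels(pixels, labels, k):
    """Return the per-channel median color of each of the k clusters."""
    return np.array([
        np.median(pixels[labels == j], axis=0) for j in range(k)
    ])

# Toy data: five RGB pixels already labeled with two clusters.
pixels = np.array([
    [0, 0, 0],
    [10, 10, 10],
    [200, 0, 0],
    [250, 0, 0],
    [220, 0, 0],
])
labels = np.array([0, 0, 1, 1, 1])

palette = palette_from_labels(pixels, labels, k=2)

# Assumed scaling: map 0..255 values to the 0..1 range for plotting.
scaled_palette = palette / 255.0
print(palette)
```

The median is taken channel by channel, so a palette color is not necessarily a color that appears in the image.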
In the next block of code we will reconstruct the original image but paint it with only the palette colors.
It can be stated more clearly with the following Python code:
In this example we used a different kind of @@0@@: a boolean array that indexes each cluster. This slicer is then used to paint each of the image clusters, slices or segments with the palette color assigned to it.
Of course, when the previous image is saved it uses the previous definition of an @@1@@ of @@2@@ dimensions. With this already clustered information we could develop a data structure holding exactly the same information as that new image file by saving only the assigned labels, the resulting color palette and the original @@3@@ dimensions. Some kind of topological RGB compression, I guess? The luminosity order of the palette also seems to be interesting information for the reconstruction used later.
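Boolean indexing of this kind can be illustrated on a tiny label image. The labels and the three palette colors below are made up for the example.

```python
import numpy as np

# Cluster label of each pixel in a 2 x 3 image (hypothetical values).
labels = np.array([
    [0, 0, 1],
    [1, 2, 2],
])

# One RGB color per cluster.
palette = np.array([
    [255, 0, 0],
    [0, 255, 0],
    [0, 0, 255],
], dtype=np.uint8)

repainted = np.zeros(labels.shape + (3,), dtype=np.uint8)
for j, color in enumerate(palette):
    mask = labels == j       # boolean array, True where the pixel is in cluster j
    repainted[mask] = color  # paint all those pixels at once

print(repainted[1, 2])
```

NumPy also allows the shorter `palette[labels]`, which looks up the color of every pixel in one step and gives the same array.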
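One way such a data structure might look is sketched below. The function names `pack` and `unpack`, the dictionary layout, and the use of `uint8` labels (which assumes at most 256 palette colors) are all assumptions for illustration.

```python
import numpy as np

def pack(labels, palette, shape):
    """Keep only the labels, the palette and the original dimensions."""
    return {
        "labels": np.asarray(labels, dtype=np.uint8).ravel(),
        "palette": np.asarray(palette),
        "shape": tuple(shape),
    }

def unpack(packed):
    """Rebuild the palette-painted image from the packed data."""
    flat = packed["palette"][packed["labels"]]  # one color per pixel
    return flat.reshape(packed["shape"])

# Hypothetical 2 x 3 image clustered into two colors.
labels = np.array([[0, 1, 1], [1, 0, 0]])
palette = np.array([[20, 20, 20], [240, 200, 10]], dtype=np.uint8)

packed = pack(labels, palette, shape=(2, 3, 3))
restored = unpack(packed)
print(restored.shape)
```

Here each pixel costs one byte for its label instead of three bytes for its color, plus the small fixed cost of the palette.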
To transfer a sequential palette, one that keeps luminosity order, we will first get the luminosity order from our clustered image.
In the following lines, I tried a way to order the palette by luminosity. If you have some understanding of RGB colors, luminosity can be interpreted as the @@4@@ of your RGB vector, especially the norm of the projection of your color vector onto the @@5@@ vector. This process basically computes the Euclidean norms and uses them to reorder the previous palette.
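A small sketch of ordering a palette by the Euclidean norm of its RGB vectors. The three colors are invented for the example, and the grey diagonal (1, 1, 1) used for the projection is an assumption about which reference vector is meant.

```python
import numpy as np

# Hypothetical palette: one RGB color per row.
palette = np.array([
    [200, 200, 200],
    [10, 10, 10],
    [120, 60, 30],
], dtype=float)

# Euclidean norm of each color vector, used here as a luminosity proxy.
norms = np.linalg.norm(palette, axis=1)

# Alternative proxy (assumption): length of the projection onto the
# unit vector along the grey diagonal of the RGB cube.
gray_axis = np.ones(3) / np.sqrt(3)
projections = palette @ gray_axis

# Indices that sort the palette from darkest to brightest.
order = np.argsort(norms)
sorted_palette = palette[order]
print(order)
```

For this palette both proxies give the same ordering, although in general they can disagree for strongly saturated colors.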
The following code will get a different palette and use it to reconstruct the image.
The code basically gets the new sequential palette and orders it with the previously extracted luminosity order. The following code is exactly the same as the previous reconstruction but with the transferred palette. I guess it is some kind of transfer learning? Especially if you use palettes from graphic artists or so.
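The reordering idea can be sketched like this: the darkest original cluster receives the darkest new color, and so on. The helper name `transfer_palette` and both palettes are hypothetical, the Euclidean norm is used as the luminosity proxy, and both palettes are assumed to have the same number of colors.

```python
import numpy as np

def transfer_palette(original_palette, new_palette):
    """Assign new colors to clusters so luminosity ranks are preserved."""
    # original_order[r] is the cluster whose color has luminosity rank r.
    original_order = np.argsort(np.linalg.norm(original_palette, axis=1))
    # New colors sorted from darkest to brightest.
    new_order = np.argsort(np.linalg.norm(new_palette, axis=1))
    new_sorted = new_palette[new_order]

    transferred = np.empty_like(new_sorted)
    transferred[original_order] = new_sorted
    return transferred

original_palette = np.array([
    [200, 200, 200],
    [10, 10, 10],
    [120, 60, 30],
], dtype=float)
new_palette = np.array([
    [0, 0, 50],
    [0, 0, 250],
    [0, 0, 150],
], dtype=float)

transferred = transfer_palette(original_palette, new_palette)
print(transferred)
```

Row `j` of `transferred` is the color used to repaint cluster `j`, so the same label image can be reused unchanged.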
Again the first step will be reading the image in a
The following code gets the median RGB color of every column in the img slice given by:
then it creates a new image that will include the obtained gradient in the bottom of it.
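A rough sketch of this step on synthetic data. The function name `append_column_gradient`, the chosen row slice and the band height are assumptions, since the actual slice used in the post is not shown here.

```python
import numpy as np

def append_column_gradient(image, rows, band_height):
    """Append a band showing the median color of every column.

    `rows` is the slice of image rows the medians are computed from.
    """
    # Median over the selected rows: one RGB color per column.
    column_colors = np.median(image[rows], axis=0)
    # Repeat that row of colors to form a band of the requested height.
    band = np.tile(column_colors[None, :, :], (band_height, 1, 1))
    return np.vstack([image, band.astype(image.dtype)])

# Synthetic 4 x 5 image whose columns get brighter from left to right.
image = np.zeros((4, 5, 3), dtype=np.uint8)
image[:] = (np.arange(5) * 10)[None, :, None]

result = append_column_gradient(image, rows=slice(0, 4), band_height=2)
print(result.shape)
```

The appended band has the same width as the image, so the gradient lines up with the columns it was computed from.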
Which appears to be similar to the
matplotlib color palette:
Basic image processing may be really easy with high level tools such as Photoshop. However, understanding some basics of computer image handling may open your eyes to developing by yourself any filtering, analysis or transformation you can imagine. I recommend taking a course in computer vision to further develop your skills in image processing, or just experimenting with the libraries that already exist.