Ekspozzer: creating panoramas from video and averaging a video stream

    Hi, Habrahabr!


    Let me say right away: there is nothing phenomenal in this article. It describes a program, hacked together "on the knee", for creating panoramas from video and for temporal averaging of video frames. The program can also be used as a virtual slit camera. The article should interest anyone who is fond of video and image processing, as well as glitch art. A very simple program gives very interesting results. There is a download link at the end of the article. Caution: heavy traffic!

    I have noticed that good and, most importantly, popular software is not born out of nowhere; it grows out of the need to solve a particular problem. I don't know how it is for others, but it happens to me all the time. This story is no exception.

    Once, on a fun graphics forum, I accidentally got into a dispute with a guy who claimed something like this: "It is impossible to get an image of an empty Red Square in the daytime." Of course, the statement is wildly debatable: yes, there are tons of people there during the day, but you could ask everyone to leave. It is theoretically possible. A mere mortal will not succeed, of course, but if you are some kind of Utin, then by trying very hard and calling the right people you could arrange something like that. And what does an "image" of the empty Red Square mean anyway? An image is not necessarily a photograph; I can draw one by hand on a piece of paper. After all these years I don't remember every detail of the dispute, but it was wildly heated, and we, damn pedants, arrived at roughly this refined statement: "Because of the huge number of people on Red Square, it is impossible, using only photo and video equipment, to get a photorealistic image (a photograph) of Red Square without a single person on it, taken from the eye level of a standing person during the warm tourist season (for example, summer), by an ordinary person with no leverage over the government." Now that is a clarification! Practically a legal document. Now everything fell into place.

    My opponent argued that such a photograph could only be produced with Photoshop (or another image editor) by removing all the people from a photograph of Red Square. The procedure is long and painstaking, and a decent result would take an experienced retoucher at least three or four hours. Yes, he is undoubtedly right that it can be done that way. But if this person (an artist and photographer by profession) had known even a little mathematics, or had imagined what elementary programming can do, he would never have claimed such a thing, let alone argued about it. And I proved him wrong.

    If you have ever held a camera in your hands, and especially if you have taken pictures, you probably know what exposure is. Put very simply (for people who shoot in "auto" mode), it is the length of time over which the frame is captured. Surely you have more than once ended up with photos blurred by motion. There you go! The exposure was set incorrectly: it was too long. And if a photo came out too dark, then most likely the exposure was too short. I am explaining this in very childish terms so that everyone follows; I am deliberately not touching aperture settings and other subtleties of shooting. So, while I was thinking about exposure, an idea for settling the dispute came to my mind.

    I thought: what if you take a very, very, very long exposure of Red Square, but with a very, very tightly closed aperture (so that the frame is not overexposed)? Then the people who walk through the frame will simply be smeared out, while the permanent details (the buildings, the Kremlin, the square itself) will stay in place. Yes! This is the key to the solution. I just had to try it somehow. But how? I have no opportunity to travel to Red Square at all, let alone just for an experiment. Fine, any other square will do for a test; that is not a problem. The problem is that my Canon 550D, which I bought a million years ago, can take pictures with a maximum exposure of 30 seconds, which is far too short for the experiment. I can't buy a new camera for the sake of an experiment either. We need a really long exposure, somewhere around 30 minutes. Why? To raise the chance that the people in the square really do change position and leave the spots where they stood at the start of the frame. Roughly speaking, over those 30 minutes every point in the frame should show the square more often than it shows people. I began to think about how to solve the photography problem with minimal effort.

    But we weren't born yesterday! After all, we hold in our hands a most powerful tool available to few: programming! I decided to create a "virtual" camera that would simply photograph the screen with an arbitrary shutter speed. You know all those screen recording programs: SnagIt, BandiCam, FRAPS... Only mine would not record a single frame (a photo) or a sequence of frames (a video); it would accumulate information (just as a long exposure does, only electronically) and average what it had collected at the end of the recording. Then, if the screen simply plays back a camera recording of the square, the result is the required picture! Hurrah! The problem is solved... in theory. All that remains is to write the software and find a video from a fixed camera that films a square for half an hour.

    Fortunately, the requirements for the program are trivial, and I easily implemented what was needed in one evening.
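    The accumulate-then-average idea can be sketched in a few lines. This is only an illustration, not Ekspozzer's actual code: it assumes the frames are already available as equally sized NumPy arrays (the real program grabs them from the screen), and the name `average_frames` is made up for the example.

```python
import numpy as np


def average_frames(frames):
    """Return the per-pixel mean of an iterable of equally sized 8-bit frames."""
    total = None
    count = 0
    for frame in frames:
        arr = np.asarray(frame, dtype=np.float64)
        if total is None:
            # Accumulate in floating point so the sum cannot overflow uint8.
            total = np.zeros_like(arr)
        total += arr
        count += 1
    if count == 0:
        raise ValueError("no frames to average")
    return np.clip(np.rint(total / count), 0, 255).astype(np.uint8)


# Toy scene: a static background (brightness 100) and a bright "pedestrian"
# (brightness 255) who occupies a different pixel in every frame.
background = np.full((1, 4), 100, dtype=np.uint8)
frames = []
for x in range(4):
    frame = background.copy()
    frame[0, x] = 255
    frames.append(frame)

result = average_frames(frames)
print(result)
```

    With only four frames the pedestrian still leaves a visible trace on every pixel he visited; the more frames go into the sum, the closer the result gets to the bare background.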

    The video was no problem either, since there are billions of them. Any reasonably good recording from a webcam or surveillance camera will do, since such cameras are mostly fixed and do not move during the clip.

    Experiment 1. So, the experiments began. Here is a time-lapse video of Red Square. But don't be surprised if you do not see familiar places in it. There is more than one Red Square in the world (just as with St. Petersburg and other painfully familiar names); there are about twenty of them. The one in the video is the Red Square at the University of Washington. It is a very crowded place and a landmark of the University and even of the city. The square is constantly full of students, tourists, travelers, applicants, teachers and ordinary passers-by. By the way, an interesting fact: our Red Square is "red" because in old times the word "red" meant "beautiful" (and the square itself was originally white, built of white brick), whereas the Red Square at the University of Washington is "red" precisely because it is made of reddish stone.

    And here is the irony: in the dispute with that guy we never specified exactly which Red Square was meant. Our native one was simply implied. Since there are several Red Squares in the world, perhaps one of them is so sparsely visited that at certain moments of the day there is nobody on it at all. Then I could just take a photo and win the dispute automatically.

    Well, enough lyrical digressions and irony. Here is what came out after averaging Red Square:

    The video lasts only 17 seconds, but since it is a time lapse, it actually covers much more than 17 seconds of real time. Maybe 5 minutes, maybe 15.


    As the result shows, the only people left in the photo are those who sat in the same place for a very long time. Some of them get up and leave during the video, which produces the so-called "ghosts". On the whole, the result is almost what we need.

    Now compare how many people are in the video and how many are in the photo. No matter how long I slaved away in Photoshop, cutting people out and hunting for frames where the cut-out areas happen to be empty so that I could paste those fragments in, the patches would still look ragged, because even the background lighting changes from frame to frame due to cloud shadows, recording artifacts and so on. My Ekspozzer did it in just 17 seconds, and the result came out smooth and effortless. Cool? Cool! And this is just the beginning!

    Experiment 2. Let us return to our own Red Square. I never found a good and sufficiently long video shot from the square itself with a fixed camera. Not even a time lapse: in those, the operators tend to move the camera smoothly on purpose all the time. I found only this video:


    Pay attention to the huge number of cars on the Bolshoy Kamenny (Big Stone) Bridge.


    It turned out very nice and smooth, despite the dancing shadows from the clouds.

    And where did the cars go after averaging? That's right: they disappeared. Look how clean the picture turned out. How could you ever catch such a frame in the daytime? Of course, every method has its errors, and so does mine: every now and then "ghosts" of cars or people remain in places. In fact, the math is pretty simple. If a person appears in 5 frames of a 100-frame video, then at that spot he will be 100 - 5 = 95 percent transparent. That is, 95 percent of the information comes from the square and 5 percent from the person. At this proportion he is practically invisible. And since people and cars are generally moving all the time, the percentage at any given pixel is even smaller! Sweet!

    Experiment 3. Let's go further and take the most crowded square in the world, Times Square in New York:


    Everything here is literally teeming with people and cars.


    And at the output we got only a lonely police car and a bunch of ghostly smears in the street on the left. Well, the method is not perfect.

    Experiment 4. One more:


    The video lasts only 16 seconds.


    So the result is worse!

    Experiment 5.

    Busy Wall Street.


    The result is impressive. Clean! This means that almost everyone keeps moving and nobody stands still.

    Everything is clear here: the shorter the exposure and the slower the objects move, the more pronounced the ghosts. And vice versa: the longer the exposure and the faster the objects move, the better the background shows through. So long time-lapse videos shot with a fixed camera are ideal. And there are thousands of such videos.
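    This rule of thumb, together with the 5-frames-out-of-100 arithmetic from Experiment 2, is easy to check numerically. The sketch below is illustrative only: `ghost_weight` and `averaged_pixel` are hypothetical helper names, and the brightness values 100 (background) and 200 (passer-by) are arbitrary.

```python
def ghost_weight(frames_with_object, total_frames):
    """Fraction of the averaged pixel that comes from the moving object."""
    if total_frames <= 0 or not 0 <= frames_with_object <= total_frames:
        raise ValueError("need 0 <= frames_with_object <= total_frames, total_frames > 0")
    return frames_with_object / total_frames


def averaged_pixel(background, obj, frames_with_object, total_frames):
    """Brightness of a pixel that shows `obj` in some frames and `background` in the rest."""
    w = ghost_weight(frames_with_object, total_frames)
    return (1 - w) * background + w * obj


# The same passer-by covers a pixel for 5 frames; only the exposure length changes.
for total in (20, 100, 1000):
    weight = ghost_weight(5, total)
    value = averaged_pixel(100, 200, 5, total)
    print(f"{total:5d} frames: ghost weight {weight:.3f}, pixel value {value:.1f}")
```

    A longer exposure (more frames) or a faster object (fewer frames spent on any one pixel) both shrink the ghost weight, which is exactly the dependence described above.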

    Experiment 6. Let's look at other results. Here is footage from a surveillance camera at an intersection where an accident occurs:


    Averaged from 0:20 to 0:40, just 20 seconds in total.


    Nice, clean junction with the ghost of a white car.

    Experiment 7. And here is the ideal specimen: a long time-lapse video of a crowded street in Arnsberg, Germany:


    Notice how the flags flutter in the wind.


    As a result, they appear to wave in the averaged photo too. People are barely noticeable.

    Experiment 8. And how could we do without the Eiffel Tower!


    This time lapse lasts almost a day! Ideal, but how will averaging behave when switching from day to night and vice versa?


    It turned out very tolerably and mysteriously. Some indefinite time of day.

    Experiment 9. Well, from here on you can just play around and average anything. For example, there is a video in which a guy, while traveling, takes a picture of himself every day. Let's see what comes of it.


    Interesting video. I want it too!


    It turned out very psychedelic. Is it just me, or does he look like Jesus? Or is Jesus himself a kind of averaged image?

    Experiment 10. Why, with this thing you can even look under water! Here is what I mean: when the sea sways, the waves distort the pattern of the bottom through refraction (provided, of course, that the bottom is visible at all). So by averaging the distorted pattern over a long exposure, we get an image of the bottom without the influence of the water. Cool! Let's pick a nice long video in which the bottom is visible through rippling water and watch:


    Try to find a flat patch of water surface. You won't!


    The averager averages out the fluctuations of the water surface and, with them, the refraction. We see the bottom pattern and a water surface as smooth as a mirror!

    Experiment 11. And if you average footage shot from the window of a moving vehicle, you get the effect of rapid forward motion. Let's take videos shot from trains and average literally 1 second!




    Experiment 12.




    Nice effects! Excellent. I began experimenting with different videos and getting interesting results. But once I had played enough, it somehow felt like too little. Cool, but not enough. And then another interesting idea came to me. While watching and averaging videos shot from the windows of moving trains (for the rapid-motion effect), I realized what functionality my little program lacked. Let it shoot panoramas too!

    Panoramas are conceptually simple. The train moves, the picture in the window changes, and you just need to take a sequence of images from the window at a certain offset and glue one to the next, left to right or right to left. You then get a huge panorama of everything that passed by outside the window. I began experimenting right away. I wrote a smart stitcher with an edge detector, but the result was very bad! Barrel distortion and illumination jumping from fragment to fragment got in the way all the time. I realized that my program was nowhere near giants such as Autopano Giga, so I started to get cunning and think about how to make the stitching smooth and continuous. The first idea that came to mind turned out to be decisive: stitch every frame rather than large fragments, with each frame contributing a single column to the resulting picture. We take the first frame and cut a thin vertical strip out of it, then take the next frame, cut out the same thin strip and glue it to the first one, on the left or on the right depending on the direction of camera movement, which can be specified explicitly. Since the second frame differs from the first by some offset, the two glued strips form something like a panoramic scan. A kind of cheap analogue of a slit camera. Got it? Let's go!
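    The strip-by-strip gluing can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's real code: frames are taken to be NumPy arrays of identical size, and `slit_scan`, `slit_x`, `slit_width` and `camera_moves_right` are names invented for the example.

```python
import numpy as np


def slit_scan(frames, slit_x, slit_width=1, camera_moves_right=True):
    """Glue the same vertical strip from every frame into one wide image."""
    strips = [np.asarray(f)[:, slit_x:slit_x + slit_width] for f in frames]
    if not strips:
        raise ValueError("no frames given")
    if not camera_moves_right:
        # Camera flies left: new scenery arrives on the left, so later
        # strips must end up to the left of earlier ones.
        strips.reverse()
    return np.concatenate(strips, axis=1)


# Toy "train window": a 4-pixel-wide view sliding one pixel per frame
# over a 10-pixel-wide scene whose pixel values equal their column index.
scene = np.tile(np.arange(10), (2, 1))
frames = [scene[:, t:t + 4] for t in range(7)]

panorama = slit_scan(frames, slit_x=2)
print(panorama)
```

    In this toy case the view shifts by exactly one slit width per frame, so the panorama reproduces the scene without distortion. When the shift per frame differs from the slit width, objects come out squashed or stretched.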

    Experiment 13. To begin with, I need a video in which a fixed camera films a moving object against a static background. If the object is long enough, it can be "scanned" in its entirety! Passing cars and trains are ideal for this. Here is the beauty I assembled, take a look:


    The train runs from 01:57 to 03:17.


    The cars came out with different lengths because the speed of the train kept changing.

    It turned out unexpectedly great! True, the program produces a panorama that is terribly squashed horizontally, and to restore the correct proportions you have to compress the resulting image vertically, which makes it small. This is probably a drawback of both the program and the input video: if the train in it moved much more slowly, the proportions would be normal.

    Experiment 14. Let's assemble another picture, but this time from the window of a moving train.


    The camera is held fairly steady from 3:25 onward.


    The result was a mini-panorama of the city.

    And here the flaws are obvious: strong distortion of objects. Objects closer to the train fly through the frame faster; objects farther from the train move more slowly. That is parallax. It means that foreground objects will be strongly squashed horizontally, while background objects will be strongly stretched. So you have to tune to one specific plane (a distance of objects from the camera). In this case, it is the houses in the distance. They came out fairly well "assembled". Everything closer (trees, wires, poles) is strongly squashed, and everything farther away is stretched. An image that is ideal in all planes cannot be obtained with slit scanning.

    Experiment 15. Let's take the next example and assemble the platform of the station in Kislovodsk:





    Here we see disproportionately squashed lampposts. I confess, my mistake: since we were assembling the platform, and the lampposts stand right in the middle of the platform, they should have come out perfectly undistorted. Now let's assemble a panorama from another video:





    One can see how unevenly the wires are suspended.

    Here I ignored the trees and tuned to the huts standing in the distance.

    Experiment 16. Now one more, with a suburb of St. Petersburg:


    The panorama is assembled starting from 02:43.


    The distant houses are slightly stretched, the near ones are slightly squashed. You can't beat the system.
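    The stretching and squashing in these train panoramas can be estimated with a little geometry. The sketch below assumes an idealized pinhole camera looking sideways, perpendicular to the direction of travel, and a constant train speed; the function names and all numbers (focal length in pixels, speed, frame rate, slit width) are illustrative assumptions, not measurements from the videos.

```python
def pixel_shift_per_frame(focal_px, speed_m_s, fps, depth_m):
    """How many pixels an object at the given depth moves between two frames."""
    return focal_px * speed_m_s / (fps * depth_m)


def horizontal_stretch(slit_width_px, focal_px, speed_m_s, fps, depth_m):
    """Horizontal scale of an object in the slit-scan panorama.

    1.0 means correct proportions, below 1.0 means squashed,
    above 1.0 means stretched.
    """
    shift = pixel_shift_per_frame(focal_px, speed_m_s, fps, depth_m)
    return slit_width_px / shift


# Assumed setup: 1000 px focal length, train at 20 m/s, 25 fps, 4 px slit.
for depth in (20, 200, 800):
    factor = horizontal_stretch(4, 1000, 20, 25, depth)
    print(f"objects {depth:4d} m away: horizontal scale {factor:.2f}")
```

    With these numbers only the plane about 200 m away is reproduced in proportion; poles 20 m from the track are squashed tenfold, and anything 800 m away is stretched fourfold. Changing the slit width moves the "correct" plane but cannot make all depths right at once.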

    Experiment 17. Why not try cars instead of trains? I found an interesting video from a parade in a square, where a fixed camera filmed the vehicles driving past:


    The video quality is terrible; the video in question was shot at a very low FPS.


    Hence the quality of the final panorama. Still, the formation of the parade column reads quite naturally.

    Having played enough, I started just fooling around.

    Experiment 18.

    Mad


    Slit-Mad

    Experiment 19.

    Michael


    Slit-Michael

    Experiment 20.

    Volodya


    Slit-Volodya

    Bewitching! =)

    Experiment 21. And finally, something even more charming: the panorama shows the smooth change in the color of the sky in the evening. Here, too, everything is quite simple:


    The panorama was assembled from 0:01 to 0:38.


    We observe the sunrise from left to right.

    Now you can play around with this small and cool program.


    Controls in averaging mode: select the "averaging" mode. Move the mouse to the upper left corner of the video and hold down "[": the program remembers the upper left corner. The application showing the video does not have to be active at that moment; any application can be active. Move the mouse to the lower right corner of the video and press "]": the program remembers the lower right corner and, with it, the full coordinates of the video frame. Start the video. Begin averaging at any moment by holding down "/" on the numeric keypad. While averaging, the program reports the number of frames averaged so far. Identical neighboring frames are not averaged but ignored (so if the video freezes, the program will not spoil the result). Average for as long as you like. To finish averaging, hold down "*" on the numeric keypad. The result is written to the same folder where Ekspozzer resides.
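    The rule about identical neighboring frames can be illustrated like this. It is a sketch under assumptions rather than the program's real logic: how Ekspozzer actually compares frames is not described, so exact pixel equality is used here, and the `Averager` class and its methods are invented for the example.

```python
import numpy as np


class Averager:
    """Running frame average that ignores a frame identical to the previous one."""

    def __init__(self):
        self.total = None   # floating-point sum of accepted frames
        self.count = 0      # number of accepted frames
        self.last = None    # previous captured frame, for duplicate detection

    def add(self, frame):
        """Add a frame; return False if it was skipped as a duplicate."""
        arr = np.asarray(frame)
        if self.last is not None and np.array_equal(arr, self.last):
            return False
        self.last = arr.copy()
        if self.total is None:
            self.total = np.zeros(arr.shape, dtype=np.float64)
        self.total += arr
        self.count += 1
        return True

    def result(self):
        if self.count == 0:
            raise ValueError("no frames were averaged")
        return np.clip(np.rint(self.total / self.count), 0, 255).astype(np.uint8)


# The player "freezes" on a dark frame for three captures, then shows a bright one.
dark = np.zeros((1, 2), dtype=np.uint8)
bright = np.full((1, 2), 100, dtype=np.uint8)

avg = Averager()
for captured in (dark, dark, dark, bright):
    avg.add(captured)

print(avg.count, avg.result())
```

    Without the duplicate check the frozen dark frame would be counted three times and pull the average down; with it, each distinct frame contributes once.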

    Controls in panorama mode: select the "panorama" mode. Choose the left arrow if the camera in the video "flies" to the left (that is, the picture moves to the right), or the right arrow if the camera "flies" to the right (that is, the picture moves to the left). Set the width of the panorama in pixels. Move the mouse to the upper edge of the video (approximately in the center) and hold down "[": the program remembers the upper left corner. Move the mouse to the bottom edge of the video, a little further to the right, and press "]": the program remembers the lower right corner and, with it, the coordinates of the "slit" through which the panorama will be collected from the video. Start the video. Begin collecting the panorama at any moment by holding down "/" on the numeric keypad. The result is written to the same folder where Ekspozzer resides.

    Please don't judge harshly: the program was originally created "on the knee" and for my own use, so it has no usability to speak of. All of the above is a purely entertaining popular-science experiment.

    Thank you for your attention!
