Sound and music based environment generation in Unity3D. Part 2. Creating a 2D track from music


Annotation


Hello! Not long ago I wrote the article Generating an environment based on sound and music in Unity3D, in which I gave several examples of games that generate content from music and covered the basic aspects of such games. There was practically no code in that article, and I promised a sequel. Here it is. This time we will try to build a track for a 2D racing game, in the style of Hill Climb, out of your music. Let's see what we get.



Introduction


A reminder: this series of articles is aimed at beginner developers and at those who have only recently started working with sound. If you can do a fast Fourier transform in your head, you will probably be bored.


Here is our Road Map for today:


  1. Look at what sampling is.
  2. Find out what data we can get from a Unity AudioClip.
  3. Understand how we can work with this data.
  4. Find out what we can generate from this data.
  5. Learn how to turn all of this into a game (well, or something resembling a game).

So let's go!


Sampling an analog signal


As many people know, in order to use a signal in digital systems we have to convert it. One of the conversion steps is sampling, in which the analog signal is split into parts (time samples), and each sample is then assigned the amplitude value the signal had at that moment.


The letter T denotes the sampling period. The shorter the period, the more accurate the conversion. More often, though, people talk about the inverse value, the sample rate (logically, F = 1 / T). 8,000 Hz is enough for a telephone signal, while, for example, one of the variants of the DVD-Audio format requires a sample rate of 192,000 Hz. The standard in digital recording (in game engines and music editors) is 44,100 Hz, the sample rate of CD Audio.


The numerical values of the amplitude are stored in so-called samples, and it is with them that we will work. A sample is a float value ranging from -1 to 1. Simplified, it looks like this.
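

To make the numbers concrete, here is a tiny sketch (my own illustration, not code from the project) showing how the sample rate, the sampling period T, and the timestamp of an individual sample relate to each other:


// Illustrative values only: a hypothetical 44,100 Hz clip
int sampleRate = 44100;            // F, samples per second (CD Audio)
float period = 1f / sampleRate;    // T = 1 / F, about 0.0000227 seconds
// The n-th sample corresponds to the moment n * T
int n = 22050;
float timeOfSample = n * period;   // 0.5 seconds into the clip
// Each sample itself is a float amplitude in the range [-1, 1]
float sampleValue = 0.37f;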



Sound wave rendering (static)


Basic information


The waveform (also called an audio form, or colloquially a "fish") is a visual representation of the sound signal over time. The waveform shows us where the active, loud sections of the sound are and where it decays. Often the waveform is shown for each channel separately, for example like this:



Imagine that we already have an AudioSource and a script to work in. Let's see what Unity can give us.


// Get the AudioSource from our object
AudioSource myAudio = gameObject.GetComponent<AudioSource>();
// Store the sample rate of our audio file. By default it is 44100.
int freq = myAudio.clip.frequency;

Choosing the number of samples


Before we go any further, we need to talk a little about the resolution at which we render our sound. At a sample rate of 44,100 Hz we get 44,100 samples per second. Say we need to render a track 10 seconds long and we draw each sample as a line one pixel wide. Our waveform would then be 441,000 pixels long. You would get a very long, stretched-out and barely readable sound wave. True, you would be able to see every individual sample in it, but you would also put a terrible load on the system, no matter how you draw it.



Unless you are making professional audio software, you do not need that kind of accuracy. For a general picture of the audio we can split the samples into larger groups and take, for example, the average of every 100 samples. Then our wave takes on a very distinct shape:



Of course, this is not entirely accurate, because you may lose volume peaks that you need, so instead of the average you can try taking the maximum of each segment. This gives a slightly different picture, but your peaks will not disappear.
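

As a sketch of the difference (these helpers are my own, not part of the article's scripts), averaging a group of samples versus keeping its peak could look like this:


// Reduce one group of raw samples to a single averaged value (smooth, dilutes peaks)
float AverageOfGroup(float[] samples, int start, int count)
{
    float sum = 0f;
    for (int i = start; i < start + count; i++)
        sum += Mathf.Abs(samples[i]);   // Abs: we only care about magnitude
    return sum / count;
}
// Reduce the same group to its loudest value (rougher, but no peak is lost)
float PeakOfGroup(float[] samples, int start, int count)
{
    float max = 0f;
    for (int i = start; i < start + count; i++)
        max = Mathf.Max(max, Mathf.Abs(samples[i]));
    return max;
}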


Preparing to receive audio


Let's call the detail of our rendering quality (the number of points per second of audio), and the number of raw samples that go into each point sampleCount.


int quality = 100;
int sampleCount = 0;
sampleCount = freq / quality;

An example of calculating all the numbers will be below.


Next, we need to get the samples themselves. This can be done from the audio clip using the GetData method.


public bool GetData(float[] data, int offsetSamples); 

This method takes an array and fills it with samples. The offsetSamples parameter sets the position (in samples) from which reading starts; if you want to read the clip from the beginning, pass zero.
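

For example, a minimal sketch (assuming myAudio is already assigned, as above) that reads just one second of audio starting five seconds into the clip might look like this:


AudioClip clip = myAudio.clip;
// As far as I know, the offset counts samples per channel, so multiply seconds by the sample rate
int startSample = clip.frequency * 5;
// The target array still has to hold interleaved data for all channels
float[] oneSecond = new float[clip.frequency * clip.channels];
clip.GetData(oneSecond, startSample);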


To record samples, we need to prepare an array for them. For example, like this:


float[] samples;
float[] waveFormArray; // This is where the averaged data will go
samples = new float[myAudio.clip.samples * myAudio.clip.channels];

Why did we multiply the length by the number of channels? I will explain now.


Audio Channel Information in Unity


Most people know that sound usually uses two channels: left and right. Some know that there are also 2.1 systems, as well as 5.1 and 7.1 systems, in which sound sources surround the listener from all sides. The topic of audio channels is covered well on the wiki. How does this work in Unity?


When you import a file and open the clip, you can see a picture like this:



Here you can see that we have two channels, and you can even notice that they differ from each other. Unity stores the samples of these channels interleaved, one after the other, so the array looks like this:
[L1, R1, L2, R2, L3, R3, L4, R4, L5, R5, L6, R6, L7, R7, L8, R8, ...]


That is why we need an array twice as long as the number of samples per channel.
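

If you ever need a single mono array instead of the interleaved data, a small helper like this (my own sketch, not part of the article's scripts) folds the channels together by averaging each left/right pair:


// Convert interleaved [L1, R1, L2, R2, ...] data into a mono array
float[] ToMono(float[] interleaved, int channels)
{
    float[] mono = new float[interleaved.Length / channels];
    for (int i = 0; i < mono.Length; i++)
    {
        float sum = 0f;
        for (int c = 0; c < channels; c++)
            sum += interleaved[i * channels + c];
        mono[i] = sum / channels;   // the average of all channels at this moment
    }
    return mono;
}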


If you enable the Force To Mono option on the clip, there will be only one channel and all the sound will be mixed down to the center. The preview of your wave changes immediately.




Receiving the audio data


Here's what we get:


private int quality = 100;
private int sampleCount = 0;
private float[] waveFormArray;
private float[] samples;
private AudioSource myAudio;

void Start()
{
    myAudio = gameObject.GetComponent<AudioSource>();
    int freq = myAudio.clip.frequency;
    sampleCount = freq / quality;
    samples = new float[myAudio.clip.samples * myAudio.clip.channels];
    myAudio.clip.GetData(samples, 0);
    // Create the array of averaged samples. We will draw the wave from it
    waveFormArray = new float[(samples.Length / sampleCount)];
    // Walk through the array and find the average value of each group of samples
    for (int i = 0; i < waveFormArray.Length; i++)
    {
        waveFormArray[i] = 0;
        for (int j = 0; j < sampleCount; j++)
        {
            // Abs is used here to get a "pretty", mirrored wave. See below
            waveFormArray[i] += Mathf.Abs(samples[(i * sampleCount) + j]);
        }
        waveFormArray[i] /= sampleCount;
    }
}

In total, if the track is 10 seconds long and has two channels, we get the following:


  • Number of samples in the clip (myAudio.clip.samples) = 44100 * 10 = 441,000
  • Length of the samples array for two channels (samples.Length) = 441,000 * 2 = 882,000
  • Samples per group (sampleCount) = 44100 / 100 = 441
  • Length of the final array = samples.Length / sampleCount = 2000

As a result, we will work with 2000 points, which is quite enough to draw the wave. Now we need to use some imagination and think about how to use this data. If you want to double-check the arithmetic for your own clip, a small sketch follows.
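

Here is that small sanity-check sketch (my own addition, not from the original article); it prints the same values for whatever clip you actually use:


AudioClip clip = myAudio.clip;
int samplesPerChannel = clip.samples;              // 441,000 for a 10 s, 44,100 Hz clip
int arrayLength = clip.samples * clip.channels;    // 882,000 for stereo
int groupSize = clip.frequency / 100;              // 441 with quality = 100
Debug.Log($"samples: {samplesPerChannel}, array: {arrayLength}, " +
          $"group: {groupSize}, points: {arrayLength / groupSize}");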


Rendering Audio Information


Drawing a simple waveform with Debug tools


As many people know, Unity has convenient tools for displaying all kinds of debug information. A clever developer can build, for example, very powerful editor extensions on top of them. Our case is a rather atypical use of the Debug methods.


To draw, we need lines. We can build them from vectors created from the values of our array. Note that to get a nice mirrored audio form, we need to "glue" together the two halves of our visualization.


for (int i = 0; i < waveFormArray.Length - 1; i++)
{
    // Vector for the upper half of the audio form
    Vector3 upLine = new Vector3(i * .01f, waveFormArray[i] * 10, 0);
    // Vector for the lower (mirrored) half of the audio form
    Vector3 downLine = new Vector3(i * .01f, -waveFormArray[i] * 10, 0);
}

Next, we simply use Debug.DrawLine to draw our vectors, in any color you like. All of this has to be called in Update, so the information is redrawn every frame.


Debug.DrawLine(upLine, downLine, Color.green);

If you want, you can add a "playhead" that shows the current position of the track being played. This information can be obtained from the AudioSource.timeSamples field.


private float debugLineWidth = 5;
// Create a "playhead" on the audio form. Its position is tied to the current time sample
// (timeSamples is a per-channel position, so we map it to a waveform index through sampleCount)
int currentPosition = (myAudio.timeSamples / sampleCount) * 2;
Vector3 drawVector = new Vector3(currentPosition * 0.01f, 0, 0);
Debug.DrawLine(drawVector - Vector3.up * debugLineWidth, drawVector + Vector3.up * debugLineWidth, Color.white);

Here is the complete script:


using UnityEngine;

public class WaveFormDebug : MonoBehaviour
{
    private readonly int quality = 100;
    private int sampleCount = 0;
    private int freq;
    private readonly float debugLineWidth = 5;
    private float[] waveFormArray;
    private float[] samples;
    private AudioSource myAudio;

    private void Start()
    {
        myAudio = gameObject.GetComponent<AudioSource>();
        // Basic calculations
        freq = myAudio.clip.frequency;
        sampleCount = freq / quality;
        // Get the audio data
        samples = new float[myAudio.clip.samples * myAudio.clip.channels];
        myAudio.clip.GetData(samples, 0);
        // Create the array of averaged data used to draw the audio form
        waveFormArray = new float[(samples.Length / sampleCount)];
        for (int i = 0; i < waveFormArray.Length; i++)
        {
            waveFormArray[i] = 0;
            for (int j = 0; j < sampleCount; j++)
            {
                waveFormArray[i] += Mathf.Abs(samples[(i * sampleCount) + j]);
            }
            waveFormArray[i] /= sampleCount;
        }
    }

    private void Update()
    {
        for (int i = 0; i < waveFormArray.Length - 1; i++)
        {
            // Vector for the upper half of the audio form
            Vector3 upLine = new Vector3(i * 0.01f, waveFormArray[i] * 10, 0);
            // Vector for the lower (mirrored) half of the audio form
            Vector3 downLine = new Vector3(i * 0.01f, -waveFormArray[i] * 10, 0);
            // Draw the debug line
            Debug.DrawLine(upLine, downLine, Color.green);
        }
        // Create a "playhead" on the audio form. Its position is tied to the current time sample
        // (timeSamples is a per-channel position, so we map it to a waveform index through sampleCount)
        int currentPosition = (myAudio.timeSamples / sampleCount) * 2;
        Vector3 drawVector = new Vector3(currentPosition * 0.01f, 0, 0);
        Debug.DrawLine(drawVector - Vector3.up * debugLineWidth, drawVector + Vector3.up * debugLineWidth, Color.white);
    }
}

And here is the result:



Creating a smooth audio form with PolygonCollider2D


Before moving on to this section, I want to note the following: driving along a track generated from music is fun, of course, but from a gameplay point of view it is practically useless. Here is why:


  1. For the track to be drivable, we have to smooth the data heavily. The peaks disappear and you practically stop "feeling" your music.
  2. Music tracks are usually heavily compressed and look like a solid "brick" of sound, which is poorly suited to a 2D game.
  3. The speed of the vehicle has to match the pace of the track, and that issue is still open. I want to look at it in the next article.

So, as an experiment, this kind of generation is quite fun, but it is hard to build a real gameplay feature on top of it. In any case, let's continue.


So, we need to build a PolygonCollider2D from our data. This is easy to do: PolygonCollider2D has a public points field that accepts a Vector2[]. First we need to convert our data points into vectors of the required form. Let's write a function that turns our array of samples into an array of vectors:


private Vector2[] CreatePath(float[] src)
{
    // Don't read past the end of the source array if size is larger than it
    int length = Mathf.Min(size, src.Length);
    Vector2[] result = new Vector2[length];
    for (int i = 0; i < length; i++)
    {
        result[i] = new Vector2(i * 0.01f, Mathf.Abs(src[i] * lineScale));
    }
    return result;
}

After that, just pass our resulting array of vectors to the collider:


path = CreatePath(waveFormArray);
poly.points = path;

Let's look at the result. Here is the beginning of our track... hmm... it doesn't look very drivable (don't worry about the visualization yet, I'll get to that later).



Our audio form is too jagged, so the track comes out strange. We need to smooth it. Here we use a moving average. You can read more about it on Habr, in the article on the Simple Moving Average algorithm.


In Unity, the algorithm is implemented as follows:


private float[] MovingAverage(int frameSize, float[] data)
{
    float sum = 0;
    // The output is shorter than the input: one point per window position
    float[] avgPoints = new float[data.Length - frameSize + 1];
    for (int counter = 0; counter <= data.Length - frameSize; counter++)
    {
        int innerLoopCounter = 0;
        int index = counter;
        // Sum the values inside the current window
        while (innerLoopCounter < frameSize)
        {
            sum = sum + data[index];
            innerLoopCounter += 1;
            index += 1;
        }
        avgPoints[counter] = sum / frameSize;
        sum = 0;
    }
    return avgPoints;
}

We modify our path creation:


float[] avgArray = MovingAverage(frameSize, waveFormArray);
path = CreatePath(avgArray);
poly.points = path;

Checking ...



Now our track looks quite decent. I used a window width of 10; you can tweak this parameter to get the amount of smoothing you need.
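

If you want to experiment with the smoothing without recompiling, one option (my own suggestion, not part of the original script) is to expose the window width in the Inspector instead of hard-coding it. In the script below that would mean replacing the private const int frameSize = 10; declaration with something like:


[Range(2, 100)]                               // a reasonable range to experiment with
[SerializeField] private int frameSize = 10;  // window width of the moving average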


Here is the full script for this section:


using UnityEngine;

public class WaveFormTest : MonoBehaviour
{
    private const int frameSize = 10;
    public int size = 2048;
    public PolygonCollider2D poly;
    private readonly int lineScale = 5;
    private readonly int quality = 100;
    private int sampleCount = 0;
    private float[] waveFormArray;
    private float[] samples;
    private Vector2[] path;
    private AudioSource myAudio;

    private void Start()
    {
        myAudio = gameObject.GetComponent<AudioSource>();
        int freq = myAudio.clip.frequency;
        sampleCount = freq / quality;
        samples = new float[myAudio.clip.samples * myAudio.clip.channels];
        myAudio.clip.GetData(samples, 0);
        waveFormArray = new float[(samples.Length / sampleCount)];
        for (int i = 0; i < waveFormArray.Length; i++)
        {
            waveFormArray[i] = 0;
            for (int j = 0; j < sampleCount; j++)
            {
                waveFormArray[i] += Mathf.Abs(samples[(i * sampleCount) + j]);
            }
            waveFormArray[i] /= sampleCount * 2;
        }
        // Get the smoothed array with a window width of frameSize
        float[] avgArray = MovingAverage(frameSize, waveFormArray);
        path = CreatePath(avgArray);
        poly.points = path;
    }

    private Vector2[] CreatePath(float[] src)
    {
        // Don't read past the end of the source array if size is larger than it
        int length = Mathf.Min(size, src.Length);
        Vector2[] result = new Vector2[length];
        for (int i = 0; i < length; i++)
        {
            result[i] = new Vector2(i * 0.01f, Mathf.Abs(src[i] * lineScale));
        }
        return result;
    }

    private float[] MovingAverage(int frameSize, float[] data)
    {
        float sum = 0;
        // The output is shorter than the input: one point per window position
        float[] avgPoints = new float[data.Length - frameSize + 1];
        for (int counter = 0; counter <= data.Length - frameSize; counter++)
        {
            int innerLoopCounter = 0;
            int index = counter;
            while (innerLoopCounter < frameSize)
            {
                sum = sum + data[index];
                innerLoopCounter += 1;
                index += 1;
            }
            avgPoints[counter] = sum / frameSize;
            sum = 0;
        }
        return avgPoints;
    }
}

As I said at the start of the section, with this much smoothing we stop feeling the track, and the speed of the car is not tied to the tempo of the music (BPM). We will tackle that problem in the next part of this series, where we will also touch on special effects triggered by the beat. By the way, I took the little car from this free asset.


Many of you, looking at the screenshots, probably wondered how I drew the track itself, since colliders are not visible.


I drew on the wisdom of the Internet and found a way to turn a polygon collider into a mesh, to which you can assign any material, while a LineRenderer draws a stylish outline. The method is described in detail here. You can get the Triangulator from Unity Community.
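

As a rough illustration of the outline part only (a sketch under my own assumptions: the object already has the PolygonCollider2D built above and a configured LineRenderer, and the code runs in a component on that object; the mesh triangulation itself is done with the Triangulator linked above and is not shown here), copying the collider points into the LineRenderer looks roughly like this:


PolygonCollider2D poly = GetComponent<PolygonCollider2D>();
LineRenderer line = GetComponent<LineRenderer>();
// Convert the 2D collider points into 3D positions for the line
Vector2[] points = poly.points;
Vector3[] positions = new Vector3[points.Length];
for (int i = 0; i < points.Length; i++)
    positions[i] = new Vector3(points[i].x, points[i].y, 0f);
line.positionCount = positions.Length;
line.SetPositions(positions);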


Conclusion


What we have covered in this article is a basic sketch for music games. Yes, in this form it is still a little rough, but you can already proudly say: "Guys, I made a car drive along an audio track!" Turning this into a real game will take a lot more effort. Here is what could be done:


  1. Tie the speed of the car to the track's BPM. The player only controls the tilt of the car, not its speed; then the music is felt much more strongly while driving.
  2. Make a beat detector and add special effects that trigger on the beat. You could also animate the car body so it bounces in time with the music. It all depends on your imagination.
  3. Instead of a moving average, process the track more intelligently and produce data in which the peaks do not disappear but the track is still easy to build.
  4. And, of course, make the gameplay interesting: place a coin on every beat, add danger zones, and so on.

We will study all this and much more in the remaining parts of this series of articles. Thank you all for reading!

