Optimize game performance with Unity sound import options

Developers often do not fully understand Unity's sound import options, and at the time of writing I could not find a single detailed guide on how to use them. The Unity documentation describes well what each option does, but I would like to unpack those descriptions for a wider audience and explain in more detail how to use these settings to squeeze maximum performance out of a game.

This document is divided into five parts:

How sound affects performance
Understanding Import Options
My recommended options for PC and consoles
My recommended options for mobile platforms
Cautions and notes

Optimizing Unity's audio import settings is one of the easiest optimizations you can make. On a small project, it can take less than an hour to achieve significant improvements in load time, RAM usage, and other aspects of performance. I hope this guide will be useful to you. The information applies to Unity version 2018.3.

1. How sound affects performance

Sound data is bulky. In many games, audio takes up the lion's share of storage space (the disk, cartridge, or optical disc where game data is stored) and of RAM (the system's working memory). On top of that, audio can place a serious load on the CPU, especially if you use DSP effects (real-time audio processing), and it can greatly increase load times.

Optimizing these three areas (storage space, RAM usage, CPU load) is a three-way tug of war, similar to the “good, cheap, fast” problem. If one of them is hurting your game the most, you can improve it by sacrificing another. For example, if an uncompressed sound takes up too much RAM, you can keep it in memory in compressed Vorbis form. This saves RAM at the cost of CPU load, because playing a compressed file requires extra processing power to decode it. Below is a diagram with different parameters and their effect on these three areas:

Note that this diagram says nothing about the bandwidth of data read from disk or RAM.

In reality there are a few more nuances, but the diagram should give you a general idea of how these problems are interconnected. To understand how to use these parameters (and cope with problems such as excessive load times), you need to take a closer look at each of the sound import options.

2. Understanding import options

When you select an AudioClip in the Unity editor, the following panel appears in the Inspector window:

Below is a list of the sound parameters, from top to bottom, with a description of what each one does:

Force to mono

Yes: if AudioClip is recorded in stereo (or with a different number of channels), then this parameter reduces all channels to one mono channel.
No: the number of channels does not change.

In my work as a sound designer, I never reduce audio to mono, because I create each sound for a specific purpose. But if you use stock sounds and want to enable this parameter, make sure that the mono file does not sound flat or strange because of phase interaction between the left and right channels. You can preview the processed sound by clicking the Play button in the lower right corner of the Inspector window. If you hear phase distortion, you can try splitting the sound in a third-party audio editor and exporting the left and right channels separately as mono files.

Normalize (available only with Force to Mono enabled)

Yes: adjusts the gain of the AudioClip so that the mono-converted sound has the same volume as the original stereo file.
No: does not adjust the gain.

If you use Force to Mono, it is usually worth turning on normalization. A loud stereo file, when mixed down to mono, can become even louder, exceed the maximum amplitude, and produce harsh digital distortion (clipping), which is undesirable.
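
To illustrate the two problems described above (phase cancellation and clipping after a mono mix-down), here is a minimal Python sketch. It assumes the simplest possible downmix, adding the left and right samples together; this is an assumption made for illustration and not a description of how Unity actually mixes channels. All helper names are hypothetical.

```python
def downmix_sum(left, right):
    """Naive mono downmix (assumed for illustration): add the channels per sample."""
    return [l + r for l, r in zip(left, right)]


def peak(samples):
    """Largest absolute sample value, where 1.0 is digital full scale."""
    return max(abs(s) for s in samples)


def normalize_to(samples, target_peak):
    """Scale the samples so that their peak equals target_peak."""
    current = peak(samples)
    if current == 0:
        return list(samples)
    gain = target_peak / current
    return [s * gain for s in samples]


left = [0.0, 0.8, -0.8, 0.4]
right = [0.0, 0.8, -0.8, 0.4]        # identical channels: worst case for summing

mono = downmix_sum(left, right)
print("peak after summing:", peak(mono))          # above 1.0, so it would clip

safe = normalize_to(mono, peak(left))
print("peak after normalizing:", peak(safe))      # back at the original level

inverted = [-s for s in left]                     # right channel in anti-phase
print("peak of anti-phase mix:", peak(downmix_sum(left, inverted)))
```

The last line shows the extreme case of phase cancellation: two channels that are exact inverses of each other sum to silence, which is why a mono mix-down of some stereo material can sound thin.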

Load in Background / Preload Audio Data

Load in Background	Preload Audio Data	Result
Enabled	Enabled	When a scene loads, audio clips with these settings start loading but do not block the main thread. If they have not all finished loading by the time the scene is ready, loading continues in the background while the scene is running. If a sound is triggered before it has loaded, it behaves as it would with Preload disabled (see the next row).
Enabled	Disabled	The first time the sound is triggered, it loads in the background and plays when it is ready. If the file is large, there may be a noticeable delay between triggering and playback, but subsequent playbacks will be fine.
Disabled	Enabled	The sound is loaded while the scene loads. The scene will not start until all sounds with these settings are loaded into memory.
Disabled	Disabled	The first time the sound is triggered, it is loaded into memory on the main thread. If the file is large, this may cause a frame rate drop, but subsequent playbacks will be fine. I recommend this configuration only for very small files, and even then it is worth measuring its impact in the profiler and considering whether many such sounds could be triggered at once, multiplying the performance cost.

These parameters have a direct impact on each other, so I combined them.
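
The four combinations can also be written down as a small lookup. The Python sketch below only restates the table for quick reference; the function name and the wording of the returned strings are hypothetical and do not correspond to any Unity API.

```python
def loading_behaviour(load_in_background: bool, preload_audio_data: bool) -> str:
    """Summarize when and where a clip is loaded for a given pair of settings."""
    if preload_audio_data and load_in_background:
        return "loads during scene load without blocking; may finish after the scene starts"
    if preload_audio_data:
        return "loads during scene load; the scene waits until the clip is in memory"
    if load_in_background:
        return "loads in the background on first play; first playback may be delayed"
    return "loads on the main thread on first play; may cause a frame rate drop"


for background in (True, False):
    for preload in (True, False):
        print(f"background={background}, preload={preload}: "
              f"{loading_behaviour(background, preload)}")
```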

Ambisonic

Check this box if the sound is Ambisonic-encoded. Ambisonic sounds are useful for VR, AR, panoramic video, etc., but this parameter is outside the scope of this guide.

Platform-Specific Parameters

These tabs let you set default parameters and platform-specific overrides for the settings listed below. Some platforms offer compression formats that are not available on others, and some have different hardware that calls for different optimization. See the platform notes for details on platform-specific compression.
Always check the platform-specific parameters, even if you intend to use the defaults everywhere, because Unity sometimes sets platform-specific parameters automatically. For example, in iOS builds the sample rate may default to an override of 22 kHz, which can lead to aliasing (an audio defect caused by reducing the sampling rate incorrectly).

Load type

Decompress on Load: the sound is stored on disk in the specified Compression Format, but is decompressed and loaded into RAM as uncompressed PCM. This takes a lot of RAM and slightly increases load time, but playback is very cheap for the CPU and access is very fast.
Compressed in Memory: the sound is stored in the specified compression format both on disk and in RAM. RAM usage and load time decrease, but CPU load increases during playback, because the sound must be decompressed every time it is played.
Streaming: the audio is streamed directly from disk, using only a small buffer in RAM. This takes up some disk bandwidth and CPU resources, but on PCs and consoles it has little effect on performance, provided that no more than two streamed sounds are playing simultaneously. On mobile platforms (especially on cheap and old devices), streaming several stereo files simultaneously is very processor-intensive (see the “Cautions” section below).
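
To get a feel for the trade-off between these load types, the following Python sketch estimates the RAM footprint of a single clip. It assumes 16-bit PCM after decompression, a hypothetical compressed size of 10% of the PCM size, and a hypothetical fixed streaming buffer of 200 KB. None of these numbers come from Unity, so treat the output as a rough guide and substitute values measured in the profiler.

```python
def pcm_bytes(duration_s, sample_rate_hz=44100, channels=2, bytes_per_sample=2):
    """Size of uncompressed PCM audio in bytes (16-bit stereo assumed by default)."""
    return round(duration_s * sample_rate_hz * channels * bytes_per_sample)


def estimated_ram_bytes(load_type, duration_s, compressed_fraction=0.10,
                        stream_buffer_bytes=200_000):
    """Rough RAM estimate for one clip; the fraction and buffer size are assumptions."""
    raw = pcm_bytes(duration_s)
    if load_type == "Decompress on Load":
        return raw                                 # full PCM copy is held in RAM
    if load_type == "Compressed in Memory":
        return round(raw * compressed_fraction)    # compressed data is held in RAM
    if load_type == "Streaming":
        return stream_buffer_bytes                 # assumed small playback buffer
    raise ValueError(f"unknown load type: {load_type}")


for load_type in ("Decompress on Load", "Compressed in Memory", "Streaming"):
    size_mb = estimated_ram_bytes(load_type, duration_s=60) / 1_000_000
    print(f"{load_type}: about {size_mb:.1f} MB for a 60-second stereo clip")
```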

Compression format

PCM: raw, uncompressed audio data. It takes up a lot of disk space and RAM, but costs almost nothing to play, because no decompression is needed.
ADPCM: a very old compression format with a 3.5:1 compression ratio. Compression and decompression are quite cheap compared to Vorbis and other compressed formats, but the format introduces digital noise artifacts, so it should only be used for “noisy” sounds where this will not be noticeable. If you are not sure whether ADPCM suits a particular sound, preview it in both PCM and ADPCM. If you can hear the difference, I recommend choosing PCM.
Vorbis: a compressed format supported on most popular platforms. It can achieve quite high compression ratios while maintaining decent sound quality, but it is fairly expensive to decompress on the fly.

I have listed only the editor's standard formats here. For more information about platform-specific compression types, see the Remarks section below.

Here is a brief comparison of CPU usage for different formats on my PC in the Unity editor:

Compression format	CPU load with 1 voice	CPU load with 6 voices
PCM	~0.05%	~0.3%
ADPCM (compressed in memory)	~0.2%	~1.0%
Vorbis (compressed in memory)	~0.5%	~3.2%

Quality (not applicable to PCM / ADPCM)

70-100%: almost indistinguishable from full-quality PCM for everyone except audiophiles with expensive audio equipment.
1-69%: varying levels of quality. Lower values create strong, unpleasant noise artifacts, reduce dynamics, and make the sound flat and lifeless. You can click the preview button in the Inspector to hear how noticeably the quality of each particular sound drops.

These quality settings assume that the sound will be played at 100% speed, so they discard some of the higher frequencies that are normally outside the audible range. When the sound is played at reduced speed, however, that part of the spectrum is shifted down into the audible range. If you plan to play sounds at a lowered pitch or speed, it is best to choose PCM encoding.
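
The effect of playback speed on an encoder's frequency cut-off is simple arithmetic: every frequency in the sound is multiplied by the playback speed. The Python sketch below uses a hypothetical 16 kHz cut-off purely as an example; the real cut-off depends on the encoder and the quality setting.

```python
ASSUMED_CUTOFF_HZ = 16_000  # hypothetical encoder cut-off, for illustration only


def frequency_after_speed_change(frequency_hz, playback_speed):
    """Frequency heard when a sound is played at the given speed (1.0 = 100%)."""
    return frequency_hz * playback_speed


for speed in (1.0, 0.75, 0.5):
    heard = frequency_after_speed_change(ASSUMED_CUTOFF_HZ, speed)
    print(f"at {speed:.0%} speed, no content above {heard:.0f} Hz remains")
```

With these assumed numbers, halving the speed moves the missing part of the spectrum down to 8 kHz, well inside the audible range, which is why pitched-down sounds benefit from PCM.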

The lower the quality, the more heavily the files are compressed:

Vorbis quality	% of original size	Compression ratio
100	~20%	~5:1
75	~10%	~10:1
50	~7%	~14:1
25	~4%	~25:1
1	~2%	~50:1
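
The table can be turned into a quick size estimate. The following Python sketch looks up the nearest listed quality value and applies the approximate percentage from the table; the function name is hypothetical, and picking the nearest row (rather than interpolating) is a simplification chosen for this example.

```python
# Approximate fraction of the original size, taken from the table above.
VORBIS_SIZE_FRACTION = {100: 0.20, 75: 0.10, 50: 0.07, 25: 0.04, 1: 0.02}


def estimated_vorbis_bytes(original_bytes, quality):
    """Estimate the compressed size using the nearest quality listed in the table."""
    nearest = min(VORBIS_SIZE_FRACTION, key=lambda q: abs(q - quality))
    return round(original_bytes * VORBIS_SIZE_FRACTION[nearest])


original = 10_000_000  # a 10 MB uncompressed file
for quality in (100, 70, 50, 35, 1):
    size_mb = estimated_vorbis_bytes(original, quality) / 1_000_000
    print(f"quality {quality}: about {size_mb:.1f} MB")
```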

Sample Rate Setting

Preserve: uses the sampling rate at which the sound was recorded.
Optimize: Unity analyzes the sound to find its highest frequency, then applies the Nyquist theorem to determine the lowest sampling rate that can be used without losing that frequency content. For example, if the highest frequency in the sound is 10 kHz, the sampling rate can be reduced to 20 kHz without losing any content. This parameter can only be used with PCM / ADPCM.
Override: lets you manually set a new sampling rate for the AudioClip. In general, I do not recommend this unless you understand the consequences.
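
The Nyquist rule behind the Optimize setting can be sketched in a few lines of Python. The minimum rate is simply twice the highest frequency in the sound. The second function, which snaps that value to a list of common sample rates, is an assumption added for illustration; how Unity actually chooses the final rate is not described in this guide.

```python
# Common audio sample rates; this list is an assumption, not Unity's actual set.
COMMON_RATES_HZ = [8000, 11025, 16000, 22050, 32000, 44100, 48000]


def minimum_sample_rate(max_frequency_hz):
    """Nyquist: the sampling rate must be at least twice the highest frequency."""
    return 2 * max_frequency_hz


def lowest_common_rate(max_frequency_hz, original_rate_hz):
    """Pick the lowest common rate that keeps all content, never above the original."""
    needed = minimum_sample_rate(max_frequency_hz)
    for rate in COMMON_RATES_HZ:
        if needed <= rate <= original_rate_hz:
            return rate
    return original_rate_hz  # nothing lower is safe, so keep the original rate


print(minimum_sample_rate(10_000))           # the 10 kHz example from the text
print(lowest_common_rate(10_000, 44_100))    # nearest common rate at or above 20 kHz
print(lowest_common_rate(20_000, 44_100))    # full-range content keeps its rate
```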

3. Recommended options for PC and consoles

Sound type	Load in Background	Load type	Preload Audio Data	Compression format	Quality	Sample rate setting	Remarks
Dialogue	Y	Compressed in Memory	Y	Vorbis	70	Preserve
Long ambient loops	n/a	Streaming	n/a	Vorbis	70	Preserve
Ambient one-shots	Y	Decompress on Load	Y	Vorbis	70	Preserve
Noise effects	N	Compressed in Memory	Y	PCM	n/a	Optimize	If these sounds will ever be pitched down, use Preserve for the sample rate instead
Footsteps	N	Compressed in Memory	Y	PCM	n/a	Optimize
Music (long tracks)	n/a	Streaming	n/a	Vorbis	85	Preserve	Streaming may not be appropriate if several music tracks are played simultaneously.
Music (short fragments)	Y	Compressed in Memory	Y	Vorbis	85	Preserve
Direct Voices	Y	Decompress on Load	Y	Vorbis	70	Preserve
Special effects (SFX, short)	N	Compressed in Memory	Y	PCM	n/a	Optimize	If these sounds will ever be pitched down, use Preserve for the sample rate instead
Special effects (SFX, long)	N	Decompress on Load	Y	Vorbis	70	Preserve
UI sounds (long)	Y	Decompress on Load	Y	Vorbis	70	Preserve
UI sounds (short)	N	Compressed in Memory	Y	PCM	n/a	Optimize

These recommendations are suitable for games with up to about 10 thousand sound clips. For most sounds I recommend Decompress on Load, which means they are stored in RAM as uncompressed audio data. If the total size of the uncompressed sounds exceeds your RAM budget, you can select Compressed in Memory for the longest files. Note, however, that CPU load will increase slightly each time such a sound is played. A PDF version of these tables can be downloaded from here.
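
The budgeting approach described above (start with Decompress on Load everywhere, then move the longest files to Compressed in Memory until the total fits) can be sketched as a simple greedy pass. This Python example is only an illustration: the clip names and sizes are made up, uncompressed size is used as a stand-in for length, and the 10% compressed size is an assumed figure.

```python
def plan_load_types(pcm_sizes, ram_budget_bytes, compressed_fraction=0.10):
    """Switch the largest clips to Compressed in Memory until the budget is met.

    pcm_sizes maps a clip name to its uncompressed size in bytes.
    compressed_fraction is an assumed compressed/uncompressed size ratio.
    Returns the plan and the estimated total RAM use in bytes.
    """
    plan = {name: "Decompress on Load" for name in pcm_sizes}
    total = sum(pcm_sizes.values())
    # Visit clips from largest to smallest.
    for name in sorted(pcm_sizes, key=pcm_sizes.get, reverse=True):
        if total <= ram_budget_bytes:
            break
        plan[name] = "Compressed in Memory"
        total -= pcm_sizes[name] - round(pcm_sizes[name] * compressed_fraction)
    return plan, total


# Hypothetical clips, sizes in bytes of uncompressed PCM.
clips = {"music_theme": 50_000_000, "dialogue_intro": 20_000_000, "ui_click": 100_000}
plan, total = plan_load_types(clips, ram_budget_bytes=30_000_000)
for name, load_type in plan.items():
    print(f"{name}: {load_type}")
print(f"estimated RAM use: {total / 1_000_000:.1f} MB")
```

A real project would also weigh how often each clip is played, since every playback of a compressed clip costs CPU time.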

4. Recommended options for mobile platforms

Sound type	Load in Background	Load type	Preload Audio Data	Compression format	Quality	Sample rate setting	Remarks
Dialogue	Y	Compressed in Memory	Y	Vorbis / MP3	50	Preserve
Long ambient loops	Y	Compressed in Memory	Y	Vorbis	35	Preserve	For sounds that are not noisy, use a higher quality.
Ambient one-shots	Y	Decompress on Load	Y	Vorbis / MP3	50	Preserve
Noise effects	N	Compressed in Memory	Y	PCM / ADPCM *	n/a	Preserve
Footsteps	N	Compressed in Memory	Y	PCM / ADPCM *	n/a	Optimize
Music (long tracks)	n/a	Streaming	n/a	Vorbis	70	Preserve	See below for streaming warnings.
Music (short fragments)	Y	Compressed in Memory	Y	Vorbis / MP3	70	Preserve
Direct Voices	Y	Decompress on Load	Y	Vorbis / MP3	50	Preserve
Special effects (SFX, short)	N	Compressed in Memory	Y	PCM / ADPCM *	n/a	Optimize	If these sounds will ever be pitched down, use Preserve for the sample rate instead
Special effects (SFX, long)	N	Decompress on Load	Y	Vorbis / MP3	50	Preserve
UI sounds (long)	Y	Decompress on Load	Y	Vorbis / MP3	50	Preserve
UI sounds (short)	N	Compressed in Memory	Y	PCM / ADPCM *	n/a	Optimize

* If you are not sure whether to choose PCM or ADPCM, see the description of the ADPCM format in the “Understanding Import Options” section above. Unless you desperately need to save disk space, I recommend leaning towards PCM.

These recommendations will work well for most mobile games, or can at least serve as a starting point. If you think these parameters may not suit your game, read the full description of the parameters above. A PDF version of these tables can be downloaded here.

5. Cautions and notes

Cautions

Streaming several audio files simultaneously places only a small load on the processors of PCs and consoles, but it can be a serious problem on mobile platforms (especially on cheap or old devices). Below is a graph of measurements I made in the Unity profiler. It shows the effect of streaming multiple audio files on various Samsung Galaxy phones and on my PC. The first graph shows 1-12 audio sources playing simultaneously; the second shows 1-3 sources on an enlarged scale.

If a sound is set to Decompress on Load, Vorbis compression will reduce its size on disk by a factor of ten, but not its size in RAM, where it is still stored as raw PCM data. Setting Compressed in Memory saves RAM, but at the cost of the CPU time spent decompressing the sound on the fly.
Every load type except Streaming loads the sound into RAM by default and keeps it there until the scene is unloaded. If the whole game takes place in one big scene, sounds can fill all available RAM. Worse, manually unloading audio clips from RAM is extremely inefficient and can cause the frame rate to drop, and the same can happen if you leave the task to the garbage collector. If you have a lot of audio data, you may need to reduce RAM usage by other means, such as Unity AssetBundles. The Preload Audio Data parameter cannot change this either, because it only determines when the data is loaded into RAM, not what happens to it afterwards.
If the target platform supports MP3, note that MP3 files do not loop seamlessly out of the box, so I do not recommend MP3 compression for ambient and music loops. Because of the way MP3 encoding works, a stretch of silence is often added to the end of the file so that the total number of samples divides evenly into “frames” of 1,152 samples. There are ways to create seamless loops from MP3s, but that is a topic for another tutorial.
When Preload Audio Data is disabled and Load in Background is enabled, large files will not play immediately, but processor load will not spike. Loading takes time, but the main thread is not held up in the meantime.
When both Preload Audio Data and Load in Background are disabled, large files block the main thread the first time they are triggered. However, this is not a problem when using FMOD, which performs decoding on a separate thread.
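
The MP3 looping problem mentioned above comes down to a remainder calculation. The Python sketch below computes how many samples of silence would be needed to fill the last 1,152-sample frame of a loop. It is a simplified model: it only accounts for padding at the end of the file, and real encoders may add further delay that is not modelled here. The function name is hypothetical.

```python
MP3_FRAME_SAMPLES = 1152  # samples per MP3 frame, as stated in the text above


def padding_samples(total_samples):
    """Samples of silence needed to fill the final, partially used frame."""
    remainder = total_samples % MP3_FRAME_SAMPLES
    return 0 if remainder == 0 else MP3_FRAME_SAMPLES - remainder


sample_rate_hz = 44_100
loop_samples = 2 * sample_rate_hz        # a 2-second loop
pad = padding_samples(loop_samples)
gap_ms = 1000 * pad / sample_rate_hz
print(f"{pad} samples of padding, a gap of about {gap_ms:.1f} ms at the loop point")
```

A gap of even a few milliseconds is audible as a click or hiccup in a loop that is supposed to be seamless.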

Remarks

Compression formats:

Files imported into Unity should always be in an uncompressed format such as WAVE (.wav) or AIFF (.aiff). Many compressed formats are lossy, meaning that information is discarded when they are encoded. If you import a compressed file, such as MP3 or Vorbis, Unity first decodes it to an uncompressed format and then re-encodes it to the format you chose, even if that is the same format you started with. This can add new compression artifacts, which is generally undesirable.
The Audiokinetic Wwise middleware documentation has a great article on the various audio compression formats, their pros and cons, and the platforms that support them.
If you use FMOD, you have access to its FADPCM format, which is significantly better than the old ADPCM format. However, it is not built into Unity.
On iPhone, for example, you may want to use MP3 instead of Vorbis, because the device has a hardware MP3 decoder that spares the processor from decompressing MP3 files stored in RAM or on disk. But be careful: this may not be suitable for looped sounds (see the “Cautions” section above), and the hardware can decode only one MP3 at a time. If several MP3s need to be decoded simultaneously, the rest are decoded in software, as with Vorbis. This should not cause serious problems, but it is worth noting that MP3 decoding loads the processor slightly more than Vorbis decoding.
If the target platform is PlayStation 4, the ATRAC9 format provides a fairly high compression ratio with less processor overhead than Vorbis or MP3.
For Xbox One, a good replacement for Vorbis or MP3 is Microsoft's XMA format. For best performance and quality, Microsoft recommends a compression ratio of 8:1 to 15:1.

Miscellanea:

In the tables of recommended parameters, some sounds are listed with PCM as the compression format (in effect, no compression) and Compressed in Memory as the load type. Decompress on Load would work just as well in these cases; I only wanted to show that you should not choose Streaming.
Some cross-platform middleware, such as FMOD and Wwise, handles audio import in its own way, which makes the Unity parameters redundant (unless for some reason you want to play some sounds without the middleware).
The tables of recommended parameters include a category called “Noise effects” (Foley), which may be controversial, because Foley is a film term with a very specific meaning and arguably should not be used for games. However, I believe it is the most appropriate term for the various sounds of physical interaction between characters and objects in a game.
Although many target platforms have a native sample rate, Unity freely mixes sounds of different sample rates. This means that if a sound is not played at 100% speed, or was imported with the Optimize sample rate setting, the console or device has to resample it on the fly so that everything is output at the native sample rate. This is usually a negligible cost, even on mobile, but it could potentially cause problems on some platforms.
