A little story about the `yes` command in Unix
- Translation
What is the simplest Unix command you know? There is
`echo`, which prints a string to stdout, and there is `true`, which does nothing except exit with status zero. Hidden among the many simple Unix commands is `yes`. If you run it without arguments, you get an endless stream of "y" characters, each on its own line:

    y
    y
    y
    y
    (...you get the idea)

At first glance the command seems pointless, but sometimes it is useful:

    yes | sh boring_installation.sh

Ever installed a program that requires you to type "y" and press Enter to proceed? The `yes` command comes to the rescue! It will dutifully handle this task for you, so you don't have to be distracted from watching Pootie Tang.

Writing yes

Here is a basic version in... hmm... BASIC.

    10 PRINT "y"
    20 GOTO 10

And here is the same thing in Python:

    while True:
        print("y")

Seems simple? Not so fast!
It turns out that such a program runs quite slowly.

    python yes.py | pv -r > /dev/null
    [4.17MiB/s]

Compare that with the built-in version on my Mac:

    yes | pv -r > /dev/null
    [34.2MiB/s]

So I tried to write a faster version in Rust. Here is my first attempt:

    use std::env;

    fn main() {
        let expletive = env::args().nth(1).unwrap_or("y".into());
        loop {
            println!("{}", expletive);
        }
    }

Some explanations:
- The string we print in the loop is the first command-line argument, called expletive. I learned that word from the `yes` man page.
- I use `unwrap_or` to get the expletive from the arguments. If no argument is given, it defaults to "y".
- The default value is converted from a string slice (`&str`) into an owned `String` on the heap using `into()`.
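
The defaulting logic described in these bullets can be pulled into a small function that is easier to test than `main`. This is only a sketch: the helper name `expletive_or_default` is made up for illustration, and it uses `unwrap_or_else` rather than `unwrap_or`, so the default `String` is only allocated when no argument was given.

```rust
use std::env;

/// Hypothetical helper: returns the first real argument (index 1, after the
/// program name) or falls back to "y" when none was given.
fn expletive_or_default<I>(mut args: I) -> String
where
    I: Iterator<Item = String>,
{
    // `nth(1)` skips the program name at index 0.
    // `unwrap_or_else` builds the default lazily; `"y".into()` turns the
    // `&str` literal into an owned, heap-allocated `String`.
    args.nth(1).unwrap_or_else(|| "y".into())
}

fn main() {
    let expletive = expletive_or_default(env::args());
    // Print a few lines instead of looping forever, so the sketch terminates.
    for _ in 0..3 {
        println!("{}", expletive);
    }
}
```

Passing the iterator in as a parameter, instead of calling `env::args()` inside the helper, is what makes it possible to exercise both the "argument given" and "no argument" cases without touching the real command line.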

Let's test it.

    cargo run --release | pv -r > /dev/null
    Compiling yes v0.1.0
    Finished release [optimized] target(s) in 1.0 secs
    Running `target/release/yes`
    [2.35MiB/s]

Oops, that didn't really improve anything. It's even slower than the Python version! That got me curious, so I went looking for the source code of a C implementation.

Here is the very first version of the program, released as part of Version 7 Unix by Ken Thompson on January 10, 1979:

    main(argc, argv)
    char **argv;
    {
        for (;;)
            printf("%s\n", argc>1? argv[1]: "y");
    }

No magic here.

Compare that with the 128-line version from GNU coreutils, which is mirrored on GitHub. After 25 years, the program is still under active development! The last code change happened about a year ago. It is pretty fast:

    # brew install coreutils
    gyes | pv -r > /dev/null
    [854MiB/s]

The important part is at the end:

    /* Repeatedly output the buffer until there is a write error; then fail. */
    while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
      continue;

Aha! So it simply uses a buffer to make write operations faster. The buffer size is defined by the constant `BUFSIZ`, which is chosen on each system to make I/O efficient (see here). On my system it was defined as 1024 bytes. In practice, I got the best performance with 8192 bytes. I extended my Rust program:

    use std::env;
    use std::io::{self, BufWriter, Write};

    const BUFSIZE: usize = 8192;

    fn main() {
        let expletive = env::args().nth(1).unwrap_or("y".into());
        // Collect output in an 8 KiB buffer instead of writing line by line.
        let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
        loop {
            writeln!(writer, "{}", expletive).unwrap();
        }
    }

It is important here that the buffer size is a multiple of four, which ensures memory alignment.

This program reaches 51.3 MiB/s. That is faster than the version installed on my system, but much slower than the version from the author of a post I found on Reddit. He says he reached 10.2 GiB/s.
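
To get a feel for why buffering helps, it can be useful to count how often the underlying writer is actually hit. The sketch below is not from the original post: `CountingWriter`, `line_buffered_writes` and `block_buffered_writes` are hypothetical names, and an in-memory sink stands in for stdout. It assumes that stdout behaves like a `LineWriter`, which is how the Rust standard library currently implements it, so every short line printed with `println!` ends up as its own write.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical in-memory sink that counts how many times `write` is called,
/// standing in for the write system calls a real stdout would make.
struct CountingWriter {
    writes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len()) // pretend everything was written
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `lines` copies of "y\n" through a line-buffered writer and returns
/// the number of writes that reached the sink.
fn line_buffered_writes(lines: usize) -> usize {
    let mut writer = LineWriter::new(CountingWriter { writes: 0 });
    for _ in 0..lines {
        writeln!(writer, "y").unwrap();
    }
    writer.flush().unwrap();
    writer.get_ref().writes
}

/// Same workload, but through a `BufWriter` with the given capacity.
fn block_buffered_writes(lines: usize, capacity: usize) -> usize {
    let mut writer = BufWriter::with_capacity(capacity, CountingWriter { writes: 0 });
    for _ in 0..lines {
        writeln!(writer, "y").unwrap();
    }
    writer.flush().unwrap();
    writer.get_ref().writes
}

fn main() {
    let lines = 10_000;
    println!("line-buffered:  {} writes", line_buffered_writes(lines));
    println!("8 KiB buffered: {} writes", block_buffered_writes(lines, 8192));
}
```

The line-buffered writer reaches the sink at least once per line, while the block-buffered one only does so when its buffer fills up or is flushed. With a real stdout each of those writes is a system call, which is presumably where most of the time goes.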

Addendum

As usual, the Rust community did not disappoint. As soon as this article landed on the Rust subreddit, the user nwydo pointed to a previous discussion on this topic. Here is their optimized code, which breaks 3 GB/s on my machine:

    use std::env;
    use std::io::{self, Write};
    use std::process;
    use std::borrow::Cow;
    use std::ffi::OsString;

    pub const BUFFER_CAPACITY: usize = 64 * 1024;

    pub fn to_bytes(os_str: OsString) -> Vec<u8> {
        use std::os::unix::ffi::OsStringExt;
        os_str.into_vec()
    }

    fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
        // Too long to repeat even once: write the output as is.
        if output.len() > buffer.len() / 2 {
            return output;
        }

        let mut buffer_size = output.len();
        buffer[..buffer_size].clone_from_slice(output);

        // Double the filled region until at least half the buffer is used.
        while buffer_size < buffer.len() / 2 {
            let (left, right) = buffer.split_at_mut(buffer_size);
            right[..buffer_size].clone_from_slice(left);
            buffer_size *= 2;
        }

        &buffer[..buffer_size]
    }

    fn write(output: &[u8]) {
        let stdout = io::stdout();
        // Take the stdout lock once and hold it for the whole run.
        let mut locked = stdout.lock();
        let mut buffer = [0u8; BUFFER_CAPACITY];

        let filled = fill_up_buffer(&mut buffer, output);
        // Keep writing the prefilled buffer until a write fails.
        while locked.write_all(filled).is_ok() {}
    }

    fn main() {
        write(&env::args_os().nth(1).map(to_bytes).map_or(
            Cow::Borrowed(
                &b"y\n"[..],
            ),
            |mut arg| {
                arg.push(b'\n');
                Cow::Owned(arg)
            },
        ));
        process::exit(1);
    }

Now that is a whole different ballgame!
- We prepare a filled string buffer once and reuse it in every loop iteration.
- The standard output stream (stdout) is protected by a lock. So instead of constantly acquiring and releasing it, we hold it the whole time.
- We use the platform-native `std::ffi::OsString` and `std::borrow::Cow` to avoid unnecessary memory allocations.
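
The first bullet is easiest to follow on a tiny buffer. The sketch below reuses the `fill_up_buffer` function from the listing above and runs it on a 16-byte buffer instead of 64 KiB; the buffer size and the `main` function are made up for illustration. Each pass doubles the filled region by copying it onto the space right after it, and the loop stops once at least half of the buffer is filled.

```rust
/// Same doubling strategy as `fill_up_buffer` in the listing above.
fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
    // A pattern longer than half the buffer cannot be doubled even once,
    // so it is returned unchanged.
    if output.len() > buffer.len() / 2 {
        return output;
    }

    let mut buffer_size = output.len();
    buffer[..buffer_size].clone_from_slice(output);

    // Copy the filled prefix onto the region right after it until at least
    // half of the buffer holds repetitions of the pattern.
    while buffer_size < buffer.len() / 2 {
        let (left, right) = buffer.split_at_mut(buffer_size);
        right[..buffer_size].clone_from_slice(left);
        buffer_size *= 2;
    }

    &buffer[..buffer_size]
}

fn main() {
    // 16 bytes is an illustrative size; the real program uses 64 KiB.
    let mut buffer = [0u8; 16];
    let filled = fill_up_buffer(&mut buffer, b"y\n");
    println!("{} bytes: {:?}", filled.len(), String::from_utf8_lossy(filled));
}
```

With the default "y\n" and a 16-byte buffer, the filled slice is eight bytes long, that is, four repetitions of the pattern. The payoff is that a single `write_all` call then emits many lines at once instead of one.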

The only thing I could add to their code was removing an unnecessary `mut`.

Lessons learned

The trivial `yes` program turned out to be not so trivial after all. To improve performance, it uses output buffering and memory alignment. Reimplementing standard Unix tools is a fun exercise, and it makes you appreciate the nifty tricks that make our computers fast.