Ideal OS: rethinking desktop operating systems

Original author: Josh Marinacci
TL;DR: By the end of this essay, I hope to convince you of the following. First, that modern desktop operating systems are anything but modern. They are bloated, sluggish and stuffed with legacy cruft, and they work at all only thanks to Moore's law. Second, that innovation in desktop operating systems stopped about 15 years ago, and the major players are unlikely to invest heavily in them again. And finally, I hope to convince you that we can and should start over from scratch, learning from the lessons of the past.

"Modern" desktop OSes are bloated


Take the Raspberry Pi. For $35, I can buy a great computer with four processor cores, each running at more than a gigahertz. It also has a 3D accelerator, a gigabyte of RAM, built-in WiFi, Bluetooth and Ethernet. For 35 bucks! Nevertheless, for many of the tasks I want to run on it, the Raspberry Pi is no better than the 66 megahertz computer I had in college.



In fact, in some cases it does even worse. Getting Doom to run with 3D acceleration under X Windows took tremendous effort in the mid-2000s, yet the same thing was trivial on Microsoft Windows in the mid-1990s.

Below is a screenshot of the Processing environment running on the Raspberry Pi with hardware acceleration for the first time, just a couple of years ago. It was possible only thanks to a very special X Windows video driver. That driver is still experimental and has not been officially released, five years after the Raspberry Pi first shipped.



Despite the problems with X Windows, the Raspberry Pi has a surprisingly powerful GPU that can produce results like the screenshot below, but only if you take X out of the picture (the actual screenshot below was taken on OS X, but the same code runs on a Pi 3 at 60 fps).



Or take another example. Atom is one of the most popular editors today. Developers love it for its many plugins, but let's look at how it is written. Atom uses Electron, which is essentially a whole web browser bundled with a NodeJS runtime. That is two JavaScript engines built into one application. Electron applications use the browser's graphics APIs, which call the native APIs, which then call the GPU (if you're lucky) to actually put an image on the screen. So many layers.



For a long time, Atom could not open files larger than two megabytes because scrolling was too slow. The problem was solved by writing the buffer implementation in C++, essentially removing one extra layer.



Even the simplest applications are very complex these days. An email client like the one in the screenshot above is conceptually simple. It should be a few database queries, a text editor and a module that talks to IMAP and SMTP servers. Yet building a new email client is hard and takes up many megabytes of disk space, so few people attempt it. And if you want to modify your mail client, or at least the one in the screenshot (Mail.app, the default client on the Mac), there is no clear way to extend its functionality. No plugins. No extension API. This is the result of layers of cruft and bloat.

No innovation


Innovation in desktop operating systems has essentially stopped. You could argue that it ended somewhere in the mid-90s, or even in the 80s with the release of the Mac, but progress definitely stopped after the smartphone revolution.

Mac OS


Mac OS X once dazzled with a fireworks display of new features, and every release brought significant progress and invention. Quartz 2D! Exposé! System-wide device syncing! Widgets! But now Apple puts minimal effort into its desktop OS, beyond changing the theme and tightening the ties to its mobile devices.



The latest version of Mac OS X (now renamed macOS, a throwback to the name from twenty years ago) is called High Sierra. What major innovations are we looking forward to this fall? A new file system and a new video encoding format. Is that really all? Oh, and they also added editing to Photos, a feature that already existed in iPhoto but was removed in the upgrade, and Safari will now block auto-playing video.

Apple is the most valuable company in the world, and this is the best it can come up with? Desktop UX is simply not a priority for them.

Microsoft Windows


Things were hectic in the Windows camp as Microsoft tried to reinvent the desktop as a touchscreen operating system for tablets and phones. It was a disaster from which they are still recovering. During this transition they added no features that were really useful to desktop users, although they did spend an absurd amount of money creating a custom background image.



Instead of improving the desktop UX, they focused on adding new application models with more and more layers on top of the old code. By the way, Windows can still run applications from the early 90s.

The terminal program CMD.exe, which essentially lets you run DOS applications, was replaced only in 2016. And the most significant innovation in the latest version of Windows 10? They added a Linux subsystem. Yet more layers piled on top.

X Windows




X Windows has seen even fewer improvements than the other two desktop OSes. In fact, it is practically a model of lack of change. People were already complaining about this back in the early 90s. I'm glad that you can re-skin the GUI, but what about a system-wide clipboard that holds more than one item at a time? That has not changed since the 80s!

In the mid-2000s, compositing window managers were added, but because of legacy problems they cannot be used for anything other than sliding windows around.



Wayland was supposed to fix this, but after ten years of development it is still not ready. Keeping compatibility with old code really is hard. I think Apple made the right decision when it moved the old Mac OS into an emulator called Classic, isolating it from the new code.

Workstations?


Fundamentally, desktop operating systems became easier to use as they were adopted by the mass market, but then that mass market moved to smartphones and companies lost all interest in improving desktop OSes.

I cannot blame Apple and Microsoft (and now Google) for this. The three billion smartphones that are replaced every two years are a much larger market than the several hundred million desktops and laptops that are replaced every five years.



I think we need to get back the feeling of working with a desktop operating system. These things used to be called workstations. If the desktop has been freed from the bonds of the mass market, then it can go back to being an operating system for getting work done.

What we do not have in 2017


Now is the year 2017. Let's see what should exist by now, but for some reason does not exist.

Why can I drag tabs around in my browser and file manager, but not between two different applications? There are no technical limitations. Application windows are ultimately just rectangles of bits, but OS developers have not implemented the feature because it is not considered a priority.

Why can't I have a file in two places at the same time in my file system? Why is it fundamentally hierarchical? Why can't I sort files by tags and metadata? Database file systems have been around for decades. Microsoft tried to implement this in WinFS, but because of internal conflicts removed it from Vista before release. BeOS did this twenty years ago. Why is this feature not available in modern OSes?



Any web application can be zoomed. I just press Command and + and the text gets bigger. Everything in the window scales automatically. Why can't my native applications do this? Why can't I have one window with enlarged text and another with small text? Or even have them scale automatically as I switch between windows? These are all trivial things for a compositing window manager, a technology that has been commonplace for more than ten years.

Limited interaction


My computer has a mouse, a keyboard, tilt sensors, light sensors, two cameras, three microphones and a ton of Bluetooth accessories, but only the first two are used as general input devices. Why can't I give the computer commands with my voice or with gestures in the air? Better yet, why can't it watch me work and tell me when I am tired and should take a break?

Why can't my computer follow my eyes to see what I am reading, or scan objects that I hold in my hands using any of the cool augmented reality technologies that will soon appear on smartphones? Some of these features exist in individual applications, but they are not system-wide and they are not programmable.

Why can't my MacBook Pro use Bluetooth to talk to interesting HID devices instead of just syncing with my Apple Watch? Wait a minute, the Mac can't sync with the Apple Watch. This is another place where the desktop plays second fiddle to my phone.

Why can't my computer use anything other than the display to show information? The new Razer laptop has a colored backlight under every key, but it is used only for rippling color waves. What about using those LEDs for something useful? (Bjorn Stahl's idea, I think.)



Application silos


Almost every application on my computer is a silo. Each application has its own part of the file system, its own configuration system, its own settings, database, file formats and search algorithms. Even its own keyboard shortcut assignments. This is an incredible amount of duplicated work.

More importantly, the lack of communication between applications makes it very hard to get them to work together. The founding principle of Unix was small tools that work together, but that principle is not implemented at all in X Windows.

Created for 1984


So why are our computers so clumsy? The bottom line is that they were built for 1984. The desktop GUI was invented when most users created a document from scratch, saved it and printed it. If you were lucky, you could save the document to a shared file system or email it to someone. That was it. The GUI was created to handle tasks that had previously been done on paper.

The problem is that we live in 2017. We no longer work the way we did in 1984. On a typical day, I pull code from several remote sites, build some tests and generate a data structure that renders an output, which is then sent out to the Internet for other people to use. Import, remix, export.

I create VR content. I process images. I post to dozens of social networks. My perfect playlist is drawn from 30,000 songs. I process orders of magnitude more data, from more sources, than I did just 20 years ago, let alone 40 years ago when these concepts were invented. The desktop metaphor simply does not scale to modern tasks. I need a computer that helps me do modern work.

We need a modern workstation




So now we move to the theoretical. Suppose we really had the resources, and a way to provide (or ignore) backward compatibility. Suppose we could actually build a different kind of desktop, designed for modern ways of working. How would we do it?

First you need to get rid of everything that does not cope with its tasks.

  • Traditional file systems. They are hierarchical, slow to search, and do not store all the metadata we need by default.
  • All interprocess communication. There are too many ways for programs to talk to each other: pipes, sockets, shared memory, RPC, kernel calls, drag-and-drop, copy-paste.
  • Command line interfaces. They have not kept up with how we use applications. We simply cannot do everything in plain text. I would like to route my Skype video call through a video analysis service while I am on the call, but I cannot really pipe a video stream through awk or sed.
  • Window managers. On traditional desktops they are not aware of context or content, and they cannot be controlled by other programs.
  • Native applications. They are too heavy, take a long time to develop, and live in silos.

So what are we left with? Not much. We still have the kernel and device drivers. We can keep a reliable file system, but it will not be exposed to end users or applications. Now let's add some pieces back.

Document database


Let's start with a common, system-wide document database. Wouldn't it be easier to build a new mail client if the database were already there? The UI would be just a few lines of code. In reality, many common applications are just text editors combined with data queries. Take iTunes, the address book, calendar, notifications, messages, Evernote, the to-do list, bookmarks, browser history, the password database and the photo manager. Each of these programs ships its own unique data store. So much wasted effort, and so many barriers to interoperability!

BeOS proved that a database file system can really work and provides incredible benefits. We need to bring it back.



A file system backed by a document database has many advantages over a traditional file system. Not only can “files” exist in more than one place and be easily searched, but the guaranteed availability of a high-performance database makes building applications much easier.

Take iTunes, for example. It stores mp3 files on disk, but all the metadata lives in a closed database. Having two "sources of truth" creates a lot of problems. If you add a new song to the disk, you must manually tell iTunes to rescan it. If you want to write a program that works with the song database, you have to reverse engineer the iTunes DB format and pray that Apple does not change it. All these problems disappear with a single system database.
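To make the idea concrete, here is a minimal sketch of what "one system database" could look like from an application's point of view. It is only an illustration under stated assumptions: the DocumentDB class, its insert and find methods, and the field names (type, artist, title) are all hypothetical and are not taken from BeOS or any real system.

```python
from typing import Any


class DocumentDB:
    """Toy in-memory stand-in for a system-wide document database (hypothetical API)."""

    def __init__(self) -> None:
        self._docs: dict[int, dict[str, Any]] = {}
        self._next_id = 1

    def insert(self, doc: dict[str, Any]) -> int:
        """Store a copy of the document and return its new id."""
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        """Return every document whose fields equal all of the given criteria."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


db = DocumentDB()
db.insert({"type": "song", "artist": "The Beatles", "title": "Help!"})
db.insert({"type": "song", "artist": "Nina Simone", "title": "Sinnerman"})
db.insert({"type": "email", "subject": "Lunch?"})

# Any application can ask the same question of the same data,
# so there is no private song database to reverse engineer.
beatles = db.find(type="song", artist="The Beatles")
print([song["title"] for song in beatles])
```

A music player, a tagger and a backup tool would all call find against the same store, which is the point of having a single source of truth.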

Message bus


The message bus becomes the single mechanism for interprocess communication. We get rid of sockets, files, pipes, ioctls, shared memory, semaphores and everything else. All communication goes over the message bus. That gives us one place to manage security, and one place to build many interesting features through clever proxying.

In reality, some of the types of communication will still remain as options for applications that need them, such as sockets for the browser, but all communications with the system and between applications go through a common bus.
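A rough sketch of such a bus, assuming a simple topic-based publish/subscribe model, might look like this. The MessageBus class, the policy callback and the topic name "audio.play" are invented for illustration; the essay does not specify a wire format or an API. The policy hook shows why a single bus is a convenient place to enforce security.

```python
from collections import defaultdict
from typing import Any, Callable

Message = dict[str, Any]
Handler = Callable[[Message], None]


class MessageBus:
    """Toy single-process message bus with one central permission check (hypothetical API)."""

    def __init__(self, policy: Callable[[str, str], bool] = lambda sender, topic: True) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)
        self._policy = policy

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler that is called for every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, sender: str, topic: str, payload: Any) -> bool:
        """Deliver a message to all subscribers; return False if the policy rejects it."""
        if not self._policy(sender, topic):
            return False
        message = {"sender": sender, "topic": topic, "payload": payload}
        for handler in list(self._subscribers[topic]):
            handler(message)
        return True


def only_trusted(sender: str, topic: str) -> bool:
    # Assumed policy for the example: one named application is blocked everywhere.
    return sender != "untrusted-app"


played: list[str] = []
bus = MessageBus(policy=only_trusted)
bus.subscribe("audio.play", lambda message: played.append(message["payload"]))

accepted = bus.publish("mail-bot", "audio.play", "beatles")
rejected = bus.publish("untrusted-app", "audio.play", "malware.mp3")
print(played, accepted, rejected)
```

Because every message passes through publish, logging, throttling or proxying could be added in that one method without touching any application.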

Compositor


Now we can add a compositor: a window manager that really works with 3D surfaces, transforms coordinates and is controlled through messages on the bus. Most of what a typical window manager does, such as placing windows, overlaying notifications and deciding which window has focus, can actually be done by other programs that simply send messages to the compositor, which does the real work.

This means the compositor will be tightly integrated with the graphics driver, which is essential for high performance. Below is a diagram of Wayland, the compositor that will someday be the default on Linux.



Applications draw to the screen by requesting a surface from the compositor. When they have finished drawing, or whenever they need to update, they simply send a message: please redraw me. In practice we will probably have several kinds of surfaces for 2D and 3D graphics, and maybe one for raw video buffers. The important thing is that the compositor ultimately controls everything that appears on the screen, and when. If one application goes crazy, the compositor can suppress its screen output and make sure the rest of the system keeps working.
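The handshake described above (request a surface, draw, ask to be redrawn, and let the system decide what is shown) can be sketched in a few lines. This is a toy model, not Wayland's protocol: the Compositor and Surface classes, the method names, and the use of strings in place of pixel buffers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    owner: str            # name of the application that requested the surface
    pending: str = ""     # what the app has drawn since the last frame
    shown: str = ""       # what was last put on screen for this surface
    dirty: bool = False   # set when the app asks to be redrawn


class Compositor:
    """Toy model of a window system that owns everything shown on screen (hypothetical API)."""

    def __init__(self) -> None:
        self._surfaces: dict[int, Surface] = {}
        self._suppressed: set[str] = set()
        self._next_id = 1

    def request_surface(self, app: str) -> int:
        """Hand the application a new, empty surface and return its id."""
        surface_id = self._next_id
        self._next_id += 1
        self._surfaces[surface_id] = Surface(owner=app)
        return surface_id

    def draw(self, surface_id: int, content: str) -> None:
        """The app draws into its surface; nothing reaches the screen yet."""
        self._surfaces[surface_id].pending = content

    def request_redraw(self, surface_id: int) -> None:
        """The 'please redraw me' message."""
        self._surfaces[surface_id].dirty = True

    def suppress(self, app: str) -> None:
        """Stop showing any output from a misbehaving application."""
        self._suppressed.add(app)

    def compose(self) -> list[str]:
        """Build the next frame from every surface that is allowed on screen."""
        frame = []
        for surface_id in sorted(self._surfaces):
            surface = self._surfaces[surface_id]
            if surface.owner in self._suppressed:
                continue
            if surface.dirty:
                surface.shown = surface.pending
                surface.dirty = False
            frame.append(surface.shown)
        return frame


compositor = Compositor()
editor = compositor.request_surface("editor")
rogue = compositor.request_surface("rogue-app")

compositor.draw(editor, "hello")
compositor.request_redraw(editor)
compositor.draw(rogue, "FLASHING ADS")
compositor.request_redraw(rogue)

compositor.suppress("rogue-app")
frame = compositor.compose()
print(frame)
```

In the full design these method calls would arrive as messages on the bus rather than as direct calls.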

Applications become modules


All applications turn into small modules, with all communication going over the message bus. All of it. No more access to the file system. No access to hardware. Everything is done only through messages.

If you want to play an mp3 file, send a play message to the mp3 service. To draw on the screen, go through the compositor. This separation keeps the system secure. In Linux terms, each application would be completely isolated through user permissions and chroot, possibly all the way down to Docker containers or virtual machines. There are a lot of details to work out here, but all of them are solvable today.

Modular applications will be much easier to develop. If the database is the only source of truth, there is no need to do a lot of work copying data in and out of memory. In the audio player example, the search field does not load data and filter it to display a list; it simply defines a query. The list is then bound to that query, and the data appears automatically. If another application adds a song to the database that matches the query, the player's UI updates automatically. All of this happens without any extra effort from the developer. "Live" queries that update automatically make life much easier, and they are more reliable.
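Here is one way a "live" query could be sketched: the UI registers a query plus a callback, and the database re-runs the query whenever a matching document is added. The LiveDB class, its watch method and the render callback are hypothetical names chosen for this example, and a real system would need to handle updates and deletions as well.

```python
from typing import Any, Callable

Doc = dict[str, Any]
Listener = Callable[[list[Doc]], None]


class LiveDB:
    """Toy database whose queries push fresh results to subscribers (hypothetical API)."""

    def __init__(self) -> None:
        self._docs: list[Doc] = []
        self._watchers: list[tuple[Doc, Listener]] = []

    @staticmethod
    def _matches(doc: Doc, criteria: Doc) -> bool:
        return all(doc.get(key) == value for key, value in criteria.items())

    def query(self, criteria: Doc) -> list[Doc]:
        """Run the query once and return the current results."""
        return [doc for doc in self._docs if self._matches(doc, criteria)]

    def watch(self, criteria: Doc, on_change: Listener) -> None:
        """Bind a listener to a query; it receives the current results immediately."""
        self._watchers.append((criteria, on_change))
        on_change(self.query(criteria))

    def insert(self, doc: Doc) -> None:
        """Add a document and notify only the queries it matches."""
        stored = dict(doc)
        self._docs.append(stored)
        for criteria, on_change in self._watchers:
            if self._matches(stored, criteria):
                on_change(self.query(criteria))


visible_titles: list[str] = []


def render(results: list[Doc]) -> None:
    # Stand-in for the player's list widget: it just mirrors the query results.
    visible_titles[:] = [doc["title"] for doc in results]


live_db = LiveDB()
live_db.watch({"type": "song", "artist": "The Beatles"}, render)

# Another application adds songs; the player never polls or rescans.
live_db.insert({"type": "song", "artist": "The Beatles", "title": "Help!"})
live_db.insert({"type": "song", "artist": "Nina Simone", "title": "Sinnerman"})
live_db.insert({"type": "song", "artist": "The Beatles", "title": "Yesterday"})
print(visible_titles)
```

The player's code contains no loading or filtering logic at all, only the query and the rendering callback.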

Rebuilding the applications


On such a basis, we can create everything we need. However, this also means that we have to redo everything from scratch. High-level structures on top of the database greatly simplify this process. Let's look at a few examples.

Email. If you split the standard mail client into a GUI and network modules that communicate exclusively through messages on the bus, development becomes much easier. The GUI does not need to know anything about Gmail or Yahoo mail, or how to handle SMTP error messages. It just queries the database for documents of type "email". When the GUI wants to send a message, it sets the property outgoing = true on it. A simple module collects all outgoing mail messages and sends them via SMTP.

Splitting the mail client into components makes it much easier to replace individual parts. You can build a new front end in half a day without rewriting the network modules. You can build a spam filter with no user interface at all: it simply scans incoming messages, processes them and marks suspicious ones with the spam tag. It neither knows nor cares how spam is displayed in the GUI. It just does one thing well.
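The two background modules described above can be sketched as plain functions over shared documents. Everything specific here is an assumption made for illustration: documents are plain dicts, the folder, sent and tags fields are invented, the word list is arbitrary, and the transport is a stub callback rather than a real SMTP client.

```python
from typing import Any, Callable

Doc = dict[str, Any]


def send_outgoing(docs: list[Doc], transport: Callable[[Doc], None]) -> int:
    """Network module: send every email flagged outgoing, then clear the flag."""
    sent = 0
    for doc in docs:
        if doc.get("type") == "email" and doc.get("outgoing") is True:
            transport(doc)           # a real module would talk SMTP here
            doc["outgoing"] = False
            doc["sent"] = True
            sent += 1
    return sent


def tag_spam(docs: list[Doc], suspicious: tuple[str, ...] = ("lottery", "prince")) -> int:
    """Spam module: tag inbox emails whose subject contains a suspicious word."""
    tagged = 0
    for doc in docs:
        if doc.get("type") != "email" or doc.get("folder") != "inbox":
            continue
        subject = doc.get("subject", "").lower()
        tags = doc.setdefault("tags", [])
        if any(word in subject for word in suspicious) and "spam" not in tags:
            tags.append("spam")
            tagged += 1
    return tagged


mailbox: list[Doc] = [
    {"type": "email", "folder": "inbox", "subject": "You won the LOTTERY"},
    {"type": "email", "folder": "inbox", "subject": "Lunch tomorrow?"},
    {"type": "email", "folder": "drafts", "subject": "Re: lunch", "outgoing": True},
    {"type": "song", "title": "Help!"},
]

delivered: list[str] = []
sent_count = send_outgoing(mailbox, lambda doc: delivered.append(doc["subject"]))
spam_count = tag_spam(mailbox)
print(sent_count, delivered, spam_count)
```

Neither function knows anything about the GUI, and the GUI only ever sets or reads fields on the documents.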

Mail filters can do other interesting things. For example, suppose you email your bot the command play beatles. A tiny module scans incoming mail, sends a message to the mp3 module to play the music, and then marks the email as deleted.

When everything turns into database queries, the whole system becomes more flexible and customizable.

Command line


I know, I said earlier that we would get rid of the command line. I take it back. I really do like the command line as an interface sometimes; my only concern is its purely textual nature. Instead of chaining CLI applications together with text streams, we need something richer, like serialized streams of objects (like JSON, but more efficient). Then we will have real power.

Consider the following tasks:

  • I want to use my laptop as an amplified microphone. I speak into it, and my voice comes out of the Bluetooth speakers at the other end of the room.
  • Whenever I post a tweet with the hashtag #mom, a copy of it should be emailed to my mom.
  • I want to use my iPhone as a microscope mounted on a stand made of Lego. It streams the picture to my laptop, where I have the controls: buttons for recording, pausing, zooming and live streaming to YouTube.
  • I want to make a simple Bayesian filter that reacts to emails from Energosbyt, adds the tag “utilities”, logs into the website, extracts the amount and due date of the payment and adds an entry to my calendar.

Each of these tasks is conceptually simple, but just think how much code you have to write to implement it today. With the command line interface on object streams, each of these examples fits into a script of one or two lines.
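As a hedged illustration of the object-stream idea, the #mom example might reduce to a short chain of stages that pass structured objects instead of text. The helpers where, transform and pipe are invented for this sketch, tweets are plain dicts, and the email address is a placeholder; no real Twitter or mail API is involved.

```python
from typing import Any, Callable, Iterable, Iterator

Stage = Callable[[Iterable[Any]], Iterator[Any]]


def where(predicate: Callable[[Any], bool]) -> Stage:
    """Stage that keeps only the objects for which predicate is true."""
    def stage(stream: Iterable[Any]) -> Iterator[Any]:
        return (item for item in stream if predicate(item))
    return stage


def transform(fn: Callable[[Any], Any]) -> Stage:
    """Stage that turns each object into another object."""
    def stage(stream: Iterable[Any]) -> Iterator[Any]:
        return (fn(item) for item in stream)
    return stage


def pipe(source: Iterable[Any], *stages: Stage) -> Iterator[Any]:
    """Connect a source to a series of stages, like `a | b | c` in a shell."""
    stream: Iterator[Any] = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


tweets = [
    {"text": "Landed safely! #mom", "hashtags": ["mom"]},
    {"text": "New blog post is up", "hashtags": ["blog"]},
    {"text": "Happy birthday #mom", "hashtags": ["mom"]},
]

emails = list(pipe(
    tweets,
    where(lambda tweet: "mom" in tweet["hashtags"]),
    transform(lambda tweet: {"type": "email", "to": "mom@example.com",
                             "body": tweet["text"], "outgoing": True}),
))
print([email["body"] for email in emails])
```

Because each stage receives real objects with named fields, there is no parsing step between stages, which is what makes one or two line scripts plausible.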

We can carry out more complex operations too, such as: “Find all photos taken over the past four years within 80 km of Yosemite National Park with a rating of 3 stars or higher, resize them to 1000px on the long side, upload them to a Flickr album called ‘The Best of Yosemite’ and post a link to the album on Facebook.” This could be done with built-in tools, without any extra programming, simply by connecting a few primitives.



In fact, Apple built a system like this. It is called Automator. It lets you create powerful workflows in a graphical interface. Apple never promoted it, and now they are removing the bindings to AppleScript that everything runs on. Recently, all of the Automator staff were moved to other teams. Oh well...

Semantic keyboard shortcuts throughout the system


Now, after remaking the world, what do we do?

Services are available throughout the system. This means we can run a single service where the user assigns keybindings. It also means that keyboard shortcuts can carry deeper meaning. Instead of pointing at a function in a particular program, they map to a command message. Every application that works with documents might have “Create a new document” or “Save” commands. The keyboard shortcut service is responsible for turning Control-S into a Save command. I call this semantic keybindings.

Semantic keyboard shortcuts will make it much easier to support alternative input methods. Say you build a fancy button on an Arduino that speaks a phrase when you press it. You do not need to write special code for it. Just have the Arduino send a button press event, then attach an audio file to that event in the keyboard shortcut editor. Turn a digital potentiometer into a custom scroll wheel. The UI can now be changed however you like.
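A small sketch of the idea, under stated assumptions: a single service maps raw input events to named commands, and applications only ever see the commands. The KeybindingService class, the event names ("ctrl+s", "arduino:button1") and the command names are all hypothetical.

```python
from typing import Callable, Optional


class KeybindingService:
    """Toy system-wide service mapping raw input events to semantic commands."""

    def __init__(self) -> None:
        self._bindings: dict[str, str] = {}

    def bind(self, event: str, command: str) -> None:
        """Attach any input event (key chord, hardware button, ...) to a command."""
        self._bindings[event] = command

    def translate(self, event: str) -> Optional[str]:
        """Return the command bound to an event, or None if it is unbound."""
        return self._bindings.get(event)

    def dispatch(self, event: str, app_commands: dict[str, Callable[[], str]]) -> Optional[str]:
        """Send the event's command to the focused app, if that app supports it."""
        command = self.translate(event)
        if command is None or command not in app_commands:
            return None
        return app_commands[command]()


keys = KeybindingService()
keys.bind("ctrl+s", "save")
keys.bind("arduino:button1", "save")       # a custom hardware button, same meaning
keys.bind("ctrl+n", "new-document")

# Two unrelated applications expose the same semantic commands.
text_editor = {"save": lambda: "editor saved", "new-document": lambda: "editor: new file"}
paint_app = {"save": lambda: "painting saved"}

print(keys.dispatch("ctrl+s", text_editor))
print(keys.dispatch("arduino:button1", paint_app))
print(keys.dispatch("ctrl+n", paint_app))
```

The applications never learn which device produced the command, so new input hardware only needs a new binding.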

Some research is still needed in this area, but it seems to me that semantic keyboard shortcuts would also simplify the development of screen readers and other accessibility software.

Windows


In our new OS, any window can be docked as a tab in another window. Or in a sidebar. Or somewhere else. Regardless of the application. There is a lot of room for experimentation here.



The old Mac OS 8 had a kind of tabbed window, at least in the Finder, that could dock at the bottom of the screen for quick access. Another cool thing that was dropped in the move to Mac OS X.

In the screenshot below, the user lifts the border of the window to see what's down there. It is very cool!



This example comes from the research paper “Ametista: a mini-toolkit for exploring new window management techniques” by Nicolas Roussel.

Since the system completely controls the environment of all applications, it can enforce security restrictions and demonstrate this to the user. For example, trusted applications may have green borders. A new application just downloaded from the Internet will have a red frame. An application of unknown origin has a black frame, or it is not displayed at all. Many types of spoofing will become impossible.

Smart copy and paste


When you copy text from one window and switch to another, the computer knows that you copied something. It can use this knowledge to do something useful, such as automatically moving the first window to the side, keeping it in view, and highlighting the selected text in green. This helps the user stay focused on the current task. When the user pastes the text into the new window, the system can show the green fragment jumping from one window to the other.

But why stop there? Let's make a clipboard that holds more than one item. We have gigabytes of memory. Let's use it. When I copy something, why should I have to remember exactly what I copied before pasting it into another window? The clipboard is nowhere to be seen. Let's fix that.

The clipboard should be displayed on the screen as a kind of shelf on which all copied fragments are stored. I can go to three web pages, copy their addresses to the clipboard, and then return to the document and paste all three at once.

The clipboard viewer lets you scroll through the entire history of the clipboard. I can search it and filter it by tags. I can “pin” my favorite clips for later use.
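The clipboard shelf described above (history, search, tags and pinned clips) could be sketched like this. The ClipboardShelf and Clip classes, the eviction rule (drop the oldest unpinned clip once a limit is reached) and the tag names are assumptions made for the example, not a description of any existing clipboard manager.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Clip:
    text: str
    tags: set[str] = field(default_factory=set)
    pinned: bool = False


class ClipboardShelf:
    """Toy multi-item clipboard with history, search and pinning (hypothetical API)."""

    def __init__(self, limit: int = 100) -> None:
        self._limit = limit
        self._clips: list[Clip] = []

    def copy(self, text: str, *tags: str) -> Clip:
        """Add a new clip to the shelf, evicting old unpinned clips if it is full."""
        clip = Clip(text, set(tags))
        self._clips.append(clip)
        self._evict()
        return clip

    def pin(self, clip: Clip) -> None:
        """Protect a clip from eviction."""
        clip.pinned = True

    def _evict(self) -> None:
        while len(self._clips) > self._limit:
            for index, clip in enumerate(self._clips):
                if not clip.pinned:
                    del self._clips[index]   # oldest unpinned clip goes first
                    break
            else:
                break                        # everything is pinned; keep it all

    def recent(self, count: int) -> list[str]:
        """Return the texts of the last `count` clips, oldest first."""
        if count <= 0:
            return []
        return [clip.text for clip in self._clips[-count:]]

    def search(self, needle: str = "", tag: Optional[str] = None) -> list[str]:
        """Filter the history by substring and, optionally, by tag."""
        return [
            clip.text for clip in self._clips
            if needle.lower() in clip.text.lower() and (tag is None or tag in clip.tags)
        ]


shelf = ClipboardShelf()
shelf.copy("https://example.com/a", "url")
shelf.copy("https://example.com/b", "url")
shelf.copy("a stray sentence")
shelf.copy("https://example.com/c", "url")

# Paste all three addresses at once, as in the example above.
print("\n".join(shelf.search(tag="url")))
```

A viewer window would simply render the same list, so the shelf is visible instead of hidden state.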

Classic Mac OS actually had a great built-in tool called [name], but it was abandoned in the move to OS X. We had the future decades ago! Let's bring it back.

Working sets


And finally, we come to what I consider the most powerful paradigm shift in our new Ideal OS. In the new system, all applications are tiny, isolated modules that know only what the system tells them. If the database is the only source of truth, the database itself is versioned, and our window manager is fully customizable... then really interesting things become possible.

I usually keep personal and work files separate. That means separate folders, separate accounts, sometimes different computers. In an Ideal OS, my files can be separated by the OS itself. I can have one screen with home mail and another screen with work mail. It is the same application, just initialized with different query settings.

When I open the file manager on the home screen, it shows only the files that belong to home projects. If I create a document on the work screen, it is automatically tagged as a work document. Managing all of this is trivial: just a few extra fields in the database.
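The "few extra fields" can be illustrated with a short sketch in which the same application code runs against two screens that differ only in a workspace tag. The Workspace class and the workspace field name are hypothetical, and the shared list stands in for the system database.

```python
from typing import Any

Doc = dict[str, Any]


class Workspace:
    """Toy view of a shared store, scoped to one screen by a single extra field."""

    def __init__(self, name: str, store: list[Doc]) -> None:
        self.name = name
        self._store = store   # the same store is shared by every workspace

    def create(self, doc: Doc) -> Doc:
        """Save a document, automatically tagged with this workspace."""
        tagged = {**doc, "workspace": self.name}
        self._store.append(tagged)
        return tagged

    def find(self, **criteria: Any) -> list[Doc]:
        """Query as usual, but only ever see this workspace's documents."""
        return [
            doc for doc in self._store
            if doc.get("workspace") == self.name
            and all(doc.get(key) == value for key, value in criteria.items())
        ]


shared_store: list[Doc] = []
home = Workspace("home", shared_store)
work = Workspace("work", shared_store)

home.create({"type": "email", "subject": "Dinner on Friday"})
work.create({"type": "email", "subject": "Quarterly report"})
work.create({"type": "file", "name": "budget.ods"})

# The same "mail client" query, initialized with different settings.
print([mail["subject"] for mail in home.find(type="email")])
print([mail["subject"] for mail in work.find(type="email")])
```

Nothing in the mail client changes between screens; only the query context it is started with.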

Researchers at the Georgia Institute of Technology actually described such a system in their research paper, “Giornata: Re-Envisioning the Desktop Metaphor to Support Activities in Knowledge Work”.



Now let's take one more step. If everything is versioned, even the GUI settings and window layout (since everything is stored in the database), I can save the state of the screen. The snapshot stores the current state of every setting, even my keyboard shortcuts. I can keep working, knowing I can always return to that state. Or I can look at an old state and restore it on a new screen. I have essentially created a “template” that can be reused every time I start a new project. The template contains everything I need: email client settings, chat history, to-do lists, code, bug tracker windows, even the relevant Github pages.

Now the entire state of the computer is essentially considered as a Github repository, with the ability to fork the state of the whole system. I think it will be just magical. People will exchange useful workspaces online, like Docker images. You can customize your workflows, add useful scripts to the workspace. The opportunities here are truly amazing.
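A minimal sketch of snapshot, restore and fork over a state dictionary, assuming the whole workspace state lives in one versioned structure. The VersionedState class and the keys used below are invented for illustration; a real implementation would presumably store deltas rather than full deep copies.

```python
import copy
from typing import Any, Optional


class VersionedState:
    """Toy versioned store: every snapshot is a full deep copy of the state."""

    def __init__(self, state: Optional[dict[str, Any]] = None) -> None:
        self.state: dict[str, Any] = copy.deepcopy(state) if state is not None else {}
        self._history: list[dict[str, Any]] = []

    def snapshot(self) -> int:
        """Record the current state and return its version number."""
        self._history.append(copy.deepcopy(self.state))
        return len(self._history) - 1

    def restore(self, version: int) -> None:
        """Return to an earlier snapshot; later snapshots stay available."""
        self.state = copy.deepcopy(self._history[version])

    def fork(self, version: int) -> "VersionedState":
        """Start an independent workspace from an earlier snapshot."""
        return VersionedState(self._history[version])


desktop = VersionedState({
    "windows": ["mail", "editor"],
    "keybindings": {"ctrl+s": "save"},
})
template = desktop.snapshot()

# Keep working: open another window and change a shortcut.
desktop.state["windows"].append("bug-tracker")
desktop.state["keybindings"]["ctrl+s"] = "save-all"

# Reuse the earlier state as a template for a new project.
new_project = desktop.fork(template)
new_project.state["windows"].append("chat")

print(desktop.state["windows"], new_project.state["windows"])
```

Sharing a workspace online would then amount to publishing one of these snapshots.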

None of this is new


So there you go. That's the dream. All of the above rests on three principles: a real-time, system-wide versioned database; a real-time, system-wide message bus; and a programmable compositor.

I want to emphasize that absolutely nothing I have talked about is new. I did not come up with any of it. All these ideas are years or decades old. File databases first appeared in BeOS. A single mechanism for interprocess communication appeared in Plan 9. Configuring the environment from an editable document was implemented in Oberon. And of course there are plenty of research papers on all of this.

Why don't we have this?


Nothing here is new. And yet we still don't have it. Why is that?

I suspect that the main reason is simply the complexity of developing a successful operating system. It is much more convenient to expand an existing system than to create something new; but expansion also means that you are limited by choices made in the past.

Can we really create the Ideal OS? I suspect not. Nobody has done it yet because, to be honest, there is no money in it. And without money, you simply won't find the resources for development.

However, if someone still sets the goal to create such an OS, or at least a working prototype, then I would start with a specific limited set of hardware with existing device drivers. Lack of driver support has always been the Achilles heel of desktop Linux. For example, the Raspberry Pi 3 would be a great option.

So my question to you is this: do you think the idea is worth the effort, at least to the point of building a working prototype? Would you take part in such a project? How much of the functionality would have to work before you would agree to try the system? And of course, what should we call it?

If you are interested in discussing the future of desktop UX, subscribe to our new Ideal OS Design group.
