 August 12, 2012 at 19:11
A short digression into Linux on ARM and ArchLinux on Mele A1000 / A2000
Hello. 
After watching YouTube videos on how to run Ubuntu on single-board computers such as the Mele A1000 or MK802, I decided to do something about performance and, at the same time, get ArchLinux running on this device, because for some reason nobody had done it yet.
Performance issue
Why are ARM and MIPS devices so slow as desktop systems? There are many reasons:
- All devices are different. Some support one instruction set, some another. Some have an FPU, some don't. Some are ARMv5, others ARMv6, others ARMv7.
- Poorly optimizing compilers.
- Sluggishness or inaction on the part of manufacturers.
- The complexity of support and the sluggishness of distribution maintainers.
On x86 the first point hardly matters, because every processor has an FPU and optimizing for a specific processor gives no more than 5-10% extra performance. On ARM the gain can be huge: for the Mele it is somewhere between 30% and 150% on heavy floating-point tasks such as video decoding.
Since the mainline kernel has practically no full support for real devices (as opposed to development boards), we are forced to use the manufacturer's kernel, which is from the 3.x branch if we are lucky. Moreover, vendors often make their kernel changes in a hacky way, so we end up with missing dependencies between options in the configuration menu, and enthusiasts cannot port the changes to more recent kernels (this does not apply to every vendor, of course).
Distribution maintainers do not want to spend effort, computing power and disk space on additional repositories, so for a very long time they compiled everything either with FPU emulation or with softfp (which lets code use the FPU while staying ABI-compatible with emulation), and with optimizations for ARMv5. A little over a year ago, when the Cortex-A8 reached the mass market, the maintainers reconsidered and decided to try building everything with hardware floating point. Ubuntu 12.04 became the first mainstream distribution with an armhf repository. This is great progress: it alone gives the Cortex-A8 a 20-40% gain over softfp, and all applications are now built for ARMv7. But it is not enough.
Today, there are 3 distributions that have repositories with hardware floating point: ArchLinux-ARM, Ubuntu and Fedora. Because I love ArchLinux, the choice is obvious to me.
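To see which float ABI a particular binary was built for, you can look at the flags in its ELF header. Below is a rough sketch, not a definitive tool: it assumes a 32-bit little-endian ARM ELF file and a toolchain recent enough to set the EABI v5 float bits in `e_flags` (older toolchains leave both bits clear, in which case the function reports "unknown"). The function names are my own.

```python
import struct

EM_ARM = 40                     # e_machine value for 32-bit ARM
EF_ARM_ABI_FLOAT_SOFT = 0x200   # soft-float calling convention (softfp uses it too)
EF_ARM_ABI_FLOAT_HARD = 0x400   # hard-float calling convention (armhf)


def arm_float_abi(header: bytes) -> str:
    """Classify the float ABI from the first 40 bytes of an ELF file."""
    if len(header) < 40 or header[:4] != b"\x7fELF":
        return "not-elf"
    # EI_CLASS == 1 means 32-bit, EI_DATA == 1 means little-endian.
    if header[4] != 1 or header[5] != 1:
        return "unsupported"
    e_machine = struct.unpack_from("<H", header, 18)[0]
    if e_machine != EM_ARM:
        return "unsupported"
    e_flags = struct.unpack_from("<I", header, 36)[0]
    if e_flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard"
    if e_flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft-or-softfp"
    return "unknown"


def float_abi_of_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return arm_float_abi(f.read(40))
```

Note that softfp and pure soft-float binaries share a calling convention, so the header cannot tell them apart; that is exactly why softfp packages could be mixed with emulation-only ones, and why armhf needed a separate repository.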
Devices
You may ask: “Why do maintainers build packages without NEON?” Because ARMv7 devices themselves differ:
- ARMv7 without the NEON instruction set (for example, the Marvell Armada in the CuBox)
- Cortex-A8 (ARMv7 + NEON; Chinese devices with the Allwinner A10: Mele, MK802, MiniX)
- Cortex-A9 without NEON (nVidia Tegra 2 in the Toshiba AC100)
- Cortex-A9 with NEON
Note that on the Cortex-A9 VFP is almost as fast as NEON, so NEON optimization there is more a matter of energy saving than of performance.
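The device classes above can be told apart on a running system by the "Features" line that the ARM kernel prints in /proc/cpuinfo. Here is a small sketch that maps that line to a GCC `-mfpu` value. The function name and the sample strings are made up for illustration; real output varies between kernels and boards.

```python
from typing import Optional


def pick_mfpu(cpuinfo_text: str) -> Optional[str]:
    """Suggest a GCC -mfpu flag from the text of /proc/cpuinfo on ARM."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Features":
            features.update(value.split())
    if "neon" in features:
        return "-mfpu=neon"
    # Check the 16-register variant first: such CPUs may list vfpv3 as well.
    if "vfpv3d16" in features:
        return "-mfpu=vfpv3-d16"
    if "vfpv3" in features:
        return "-mfpu=vfpv3"
    if "vfp" in features:
        return "-mfpu=vfp"
    return None  # no FPU reported: software floating point only


# Illustrative samples, not captured from real hardware.
SAMPLE_WITH_NEON = "Features\t: swp half thumb fastmult vfp edsp neon vfpv3\n"
SAMPLE_WITHOUT_NEON = "Features\t: swp half thumb fastmult vfp edsp vfpv3 vfpv3d16\n"

print(pick_mfpu(SAMPLE_WITH_NEON))
print(pick_mfpu(SAMPLE_WITHOUT_NEON))
```

On a real board you would pass in the contents of /proc/cpuinfo instead of the sample strings. A distribution cannot do this check at build time, which is why a generic ARMv7 repository has to target the lowest common denominator.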
What was done
- The latest U-Boot and kernel from dl.linux-sunxi.org/nightly/latest
- Compiled important packages with NEON and optimizations for Cortex-A8 (glibc, xz, bzip2, gzip, bash, openssl, zlib)
- Recompiled the packages I got around to (mplayer2)
- Added the video driver and GLES libraries (not sure about performance)
- Everything was compiled with Linaro GCC, because it is the most optimized for ARM.
CFLAGS:
-march=armv7-a -mfloat-abi=hard -mfpu=neon -ftree-vectorize -mvectorize-with-neon-quad -mcpu=cortex-a8 -mtune=cortex-a8 -mthumb -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2
Hardware video acceleration does not work.
It is not clear whether GLES works: glxinfo reports "direct rendering: yes" and glxgears spins, but not quite the way it should. The device is still usable as a server.
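For anyone who wants to reuse the CFLAGS listed above, here they are split into groups with a note on what each group does, each flag written with its leading dash. This is only an annotated sketch: the grouping and the function names are mine, and putting the result into /etc/makepkg.conf is an assumption about how you would normally apply it on ArchLinux, not a description of my exact build setup.

```python
# Each entry: (what the group is for, flags in the order they appear in CFLAGS).
CFLAG_GROUPS = [
    ("target architecture: ARMv7-A", ["-march=armv7-a"]),
    ("hard-float ABI, NEON as the FPU", ["-mfloat-abi=hard", "-mfpu=neon"]),
    ("auto-vectorize loops using 128-bit NEON registers",
     ["-ftree-vectorize", "-mvectorize-with-neon-quad"]),
    ("tune for Cortex-A8 and emit Thumb code",
     ["-mcpu=cortex-a8", "-mtune=cortex-a8", "-mthumb"]),
    ("general optimization and build speed", ["-O2", "-pipe"]),
    ("hardening",
     ["-fstack-protector", "--param=ssp-buffer-size=4", "-D_FORTIFY_SOURCE=2"]),
]


def cflags() -> str:
    """Join all groups back into a single CFLAGS string."""
    return " ".join(flag for _, flags in CFLAG_GROUPS for flag in flags)


def makepkg_line() -> str:
    """Format the flags as a CFLAGS assignment (assumed makepkg.conf usage)."""
    return 'CFLAGS="{}"'.format(cflags())


for purpose, flags in CFLAG_GROUPS:
    print("# " + purpose)
    print(" ".join(flags))
print(makepkg_line())
```

On a board without NEON, the second and third groups are the ones that would have to change.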
Instructions for writing all of this to a USB flash drive are at:
www.cnx-software.com/2012/07/20/nightly-builds-for-allwinner-a10-u-boot-linux-kernel-and-hardware-packs
To start the graphical interface, log in over ssh as root (password root) and run startx.
If you have the desire and the ability to help, love ArchLinux and want to see it on Chinese Allwinner-based devices, please contact me.
One more piece of information: Allwinner is working with XBMC on video acceleration in XBMC for Android. That should eventually make these boxes a real STB, but for now there is essentially nothing.
Download: rghost.ru/39743296