
OptimizeLab Overlay Developer Guide

Aug 25, 2021

The OptimizeLab Overlay software repository provides pre-compiled, performance-optimized builds of key applications and function libraries for the AArch64 architecture. It is delivered as Launchpad personal package archives (PPAs) built against Ubuntu 18.04 LTS and Ubuntu 20.04 LTS. This guide describes the techniques and conventions needed to participate in repository development so that you can get started quickly.

Repository Architecture

Currently, the repository consists of four components: base, database, media, and science. More components will be added over time based on user feedback and development plans.

You can install each component as required. However, most components depend on the base component, which carries modified versions of basic system development tools such as the compilation toolchain. The database component covers database management systems such as MySQL and PostgreSQL, the media component covers audio and video encoding and decoding software, and the science component targets core scientific-computing packages and is mainly used in scenarios such as scientific computing and high-performance computing (HPC).
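
As an illustration, enabling a component on a development machine follows the usual Launchpad PPA workflow. The archive name below is a placeholder, not the real PPA path; this is only a sketch of the mechanism.

 # Hypothetical archive name; substitute the actual OptimizeLab Overlay PPA.
 $ sudo add-apt-repository ppa:<owner>/<component>
 $ sudo apt update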

The following sub-repositories are currently planned: a general library sub-repository (common), a deep learning sub-repository (deeplearning), and a robotics sub-repository (ros).

[Figure: repository components]

General Principles

A repository should not simply contain as many applications as possible. More software packages mean more maintenance work, and once the number of packages passes a certain point, overall maintenance quality deteriorates. The OptimizeLab Overlay repository is oriented towards performance optimization and aims to provide software versions that are maintained to a high standard. Packages are therefore carefully selected against the following criteria:

  1. The package offers improved performance or significant new functionality.
  2. The package does not conflict with software already in the system.
  3. The package does not disturb ordinary developers who use it. For example, dynamically compiled applications usually remain compatible with the original system through their application binary interfaces (ABIs), so applications compiled in a development environment that uses this repository can still run in an environment of the same release that does not use it.
  4. The package provides a smooth upgrade path so that users can safely and smoothly upgrade to the next Ubuntu LTS release.
  5. Changes are introduced only where necessary, limiting potential maintenance work.

Testing Environment

During the development, comply with the following principles:

  1. The development and test environment must have the same major version as the target environment.
  2. The compilation test environment is a minimal base system: only build-essential and the packages explicitly declared as build dependencies are installed.
  3. Each compilation test uses a fresh environment, so that earlier compilation tests cannot change the build environment and affect later results.
  4. During compilation, dependencies are resolved on a first-choice-only basis: if a dependency declaration lists multiple alternatives, only the first one is used to satisfy the dependency; if the first alternative cannot satisfy it, the build does not continue.
  5. You are advised to use sbuild or pbuilder to set up the development and test environment (see the sketch below).
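
As a sketch, an sbuild-based environment for an Ubuntu 20.04 (focal) AArch64 target could be created and used as follows. The chroot location and the source package name are hypothetical; adapt the suite and architecture to your target environment.

 # One-time setup: create a clean focal build chroot for arm64.
 $ sudo sbuild-createchroot --arch=arm64 focal /srv/chroot/focal-arm64-sbuild \
       http://ports.ubuntu.com/ubuntu-ports
 # Build a (hypothetical) source package in a fresh session of that chroot.
 $ sbuild --arch=arm64 -d focal foo_1.0-1.dsc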

Compatibility

  1. When modifying library functions, keep APIs compatible with the original version. Typically, ABIs must also remain compatible.
  2. For function libraries that cannot maintain ABI compatibility, all reverse dependencies must be reviewed. Such a library can only be considered when it has fewer than 10 reverse dependencies and all of them are leaf packages without deep dependency chains (see the sketch after this list).
  3. For toolchain software, the system's default major version and compilation parameters must not be silently changed.
  4. Software for different scenarios and categories is stored in independent sub-repositories. Apart from base and common, there are no dependencies between sub-repositories.
  5. For end-user application software, all reverse dependencies should be reviewed to evaluate and verify behavioral compatibility when the application is invoked as a process.
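
One way to sanity-check ABI compatibility between two builds of a library is the abidiff tool from the abigail-tools package. The file names below are hypothetical; this is only a sketch, not a required step of the process.

 # Compare the exported ABI of the currently shipped library with the freshly built one.
 $ abidiff /usr/lib/aarch64-linux-gnu/libfoo.so.1.2.3 \
       debian/tmp/usr/lib/aarch64-linux-gnu/libfoo.so.1.2.3
 # Removed or changed symbols reported here indicate an ABI break, which triggers
 # the reverse-dependency review described in item 2 above.

The number of reverse dependencies can be counted with apt rdepends, as shown later in this guide.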

Development and Maintenance

Initial Release

The overall development and release process consists of seven steps. To keep the process as lightweight as possible, individual steps are described only where necessary.

[Figure: seven-step development and release process]

Functions that have been automated are as follows:

  • Automatic compilation and release to the Staging repository, and one-click release to the formal repository after manual recheck

The following functions are planned to be automated:

  • Direct push to the compilation phase through the Git repository
  • Automatic running of test cases for the Staging repository through the debci platform

The following functions are not planned for automation:

  • When a new software package is introduced, or the major version of a package is updated, the software's license must be manually reviewed according to Debian requirements (a tooling-assisted sketch follows this list).
  • During solution selection and evaluation, where necessary, use tools to scan and manually review the upstream code to estimate the difficulty of subsequent maintenance.
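
For the license review, a scanner such as licensecheck can provide a first pass. The directory below is a hypothetical unpacked source tree, and the summary is only an aid, not a replacement for the manual review required above.

 # Summarize the licenses detected in the upstream source; confirm the result by hand
 # against debian/copyright before accepting the package.
 $ licensecheck --recursive foo-1.0/ | cut -d: -f2 | sort | uniq -c | sort -rn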

Maintenance and Update Policies

The maintenance and upgrade policies are closely tied to the software itself. Packages in the repository need to continuously track security issues and, where possible, receive quality updates:

  • Maintenance and upgrades follow the minimal-modification principle: where conditions permit, only the changes necessary to fix the target issue are made.
  • 100% of the security and quality issues that have been fixed in the official Ubuntu release are identified and fixed in the repository (a sketch for comparing versions against the Ubuntu archive follows this list).
  • For issues not covered by the original release, track the upstream security and quality issue lists and respond with maintenance in a timely manner.
  • Do not modify security models that affect system security. Comply with the security design of the original release, covering repository integrity, automatic upgrades, boot security, default network security, and application isolation.
  • Take a prudent attitude towards cryptographic software and changes that may affect FIPS compatibility. Do not introduce such software by default.
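
A lightweight way to check whether the overlay is still in sync with Ubuntu's fixed versions is to compare versions across archives. The package name below is purely illustrative.

 # List the versions published in the Ubuntu archive, including the -security and
 # -updates pockets, then compare with what the overlay currently ships.
 $ rmadison -u ubuntu zstd
 $ apt policy zstd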

[Figure: maintenance and update process]

QA Pipeline in the Release Process

The release pipeline consists of two phases: Build and Test. Details are as follows:

  • Build:

    • build: builds the binary package.
    • build-source: builds the source code package.
  • Test:

    • autopkgtest: runs the debci automated test cases.
    • blhc: analyzes the build log, for example for missing compiler and hardening flags.
    • lintian: scans for accumulated known packaging issues.
    • piuparts: performs installation and upgrade tests.
    • reprotest: tests the reproducible build of the binary packages.
    • test-build-all: tests the independent build of architecture-independent packages.
    • test-build-any: tests the independent build of architecture-dependent packages.

[Figure: QA pipeline]

In addition to the preceding steps, manual inspection is required before packages enter the formal repository, to ensure that no step's verification results have regressed and that no new issues have been introduced.
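
Most of these checks can also be run locally on the build artifacts before submission. The file names below are hypothetical and the invocations are only a sketch.

 # Static analysis of the built packages and of the build log.
 $ lintian foo_1.0-1ubuntu1_arm64.changes
 $ blhc --all foo_1.0-1ubuntu1_arm64.build
 # Installation and upgrade test of a built binary package (requires root).
 $ sudo piuparts foo_1.0-1ubuntu1_arm64.deb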

Basic Knowledge and Skills

Debian Build Guide

The Ubuntu system is derived from Debian and uses the .deb software package format, so mastering Debian packaging is a basic skill for participating in OptimizeLab Overlay. To learn the basics of Debian packaging, read the following documents:

In addition to the preceding tutorials, you are also advised to read the following documents, which describe Debian's technical specifications, its free-software values, and best practices summarized by many developers over the years. They are must-read material for developers.

How to Confirm the Reverse Dependency

You can run the apt rdepends <package> command to list the reverse dependencies of a software package. For example, to query all software packages that depend on the software package julia, run the following command:

 $ apt rdepends julia
 julia
 Reverse Depends:
   Breaks: julia-common (<< 0.4.1-1~)
   Replaces: libjulia1 (<< 0.5.0~)
   Breaks: libjulia1 (<< 0.5.0~)
   Suggests: julia-doc
   Replaces: julia-common (<< 0.4.1-1~)
   Recommends: julia-common
   Recommends: science-mathematics
   Suggests: elpa-ess
   Recommends: science-numericalcomputation

What Are API and ABI?

API is short for application programming interface, and ABI is short for application binary interface. API compatibility means that, as long as the API is unchanged, a piece of code can be compiled against any version of the function library and still produce a working program. ABI compatibility means that, once compiled against any compatible version, a binary can dynamically load the binary library files (such as .so and .dll) of other ABI-compatible versions without being recompiled.

Typically, an application program ABI includes the following aspects:

  • Calling conventions
  • Data types
  • Parameter passing methods
  • How return values are obtained
  • Library functions
  • Binary object file format
  • Exception handling process
  • Bytecode serialization format
  • Register usage
  • ...

On the Linux platform, except for bytecode applications, most of these aspects (such as the object file format and the function call procedure) are compatible by default. In practice, the analysis mainly focuses on the compatibility of function data types, parameter passing, and return values for the symbols in a program's symbol table.

How to View the Symbol Table of a Dynamically Linked Binary File?

You can use the objdump tool, which is shipped in the binutils package. In this example, the symbol table of libzstd.so.1.3.8 is analyzed. To clearly show the symbols provided by the library, entries with the GLIBC tag are excluded and only the first 30 lines are shown for brevity.

 $ objdump -T libzstd.so.1.3.8 | grep -v GLIBC | head -30

 libzstd.so.1.3.8:     file format elf64-x86-64

 DYNAMIC SYMBOL TABLE:
 0000000000000000  w   D  *UND*  0000000000000000              _ITM_deregisterTMCloneTable
 0000000000000000  w   D  *UND*  0000000000000000              __gmon_start__
 0000000000000000  w   D  *UND*  0000000000000000              _ITM_registerTMCloneTable
 000000000006c990 g    DF .text  0000000000000005  Base        ZBUFF_decompressInit
 000000000006c740 g    DF .text  000000000000000a  Base        ZBUFF_isError
 000000000000d890 g    DF .text  00000000000001f8  Base        ZSTD_CCtxParam_getParameter
 00000000000670d0 g    DF .text  000000000000006e  Base        ZSTD_DCtx_setMaxWindowSize
 000000000000db30 g    DF .text  0000000000000031  Base        ZSTD_CCtx_refCDict
 0000000000067090 g    DF .text  000000000000003d  Base        ZSTD_dParam_getBounds
 0000000000013850 g    DF .text  000000000000007f  Base        ZSTD_compressStream2_simpleArgs
 0000000000065860 g    DF .text  0000000000000007  Base        ZSTD_getFrameHeader
 0000000000065550 g    DF .text  000000000000026e  Base        ZSTD_getFrameHeader_advanced
 0000000000012a90 g    DF .text  000000000000013b  Base        ZSTD_CCtx_loadDictionary_advanced
 0000000000077be0 g    DF .text  000000000000000a  Base        ZDICT_isError
 000000000005bcb0 g    DF .text  0000000000000025  Base        ZSTDMT_endStream
 00000000000113a0 g    DF .text  0000000000000006  Base        ZSTD_CStreamInSize
 0000000000010cf0 g    DF .text  000000000000007b  Base        ZSTD_freeCDict
 0000000000013600 g    DF .text  0000000000000219  Base        ZSTD_compressStream2
 0000000000013be0 g    DF .text  000000000000006e  Base        ZSTD_estimateCDictSize
 000000000000dbb0 g    DF .text  000000000000000a  Base        ZSTD_CCtx_refPrefix
 0000000000010fe0 g    DF .text  000000000000011c  Base        ZSTD_initStaticCDict
 00000000000672a0 g    DF .text  000000000000001d  Base        ZSTD_decodingBufferSize_min
 000000000000dbc0 g    DF .text  0000000000000064  Base        ZSTD_CCtx_reset
 00000000000118c0 g    DF .text  0000000000000006  Base        ZSTD_minCLevel
 0000000000067370 g    DF .text  0000000000000cec  Base        ZSTD_decompressStream
 000000000006c770 g    DF .text  0000000000000005  Base        ZBUFF_createCCtx
 ......
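
Building on the objdump output above, the exported symbols of two versions of the same library can be extracted and diffed; the second file name is hypothetical. Symbols that disappear between versions usually signal an ABI break, while purely added symbols are normally harmless.

 # Extract and sort the dynamic symbol names of each version, then show what changed.
 $ objdump -T libzstd.so.1.3.8 | grep -v GLIBC | awk '/\.text|\.data/ {print $NF}' | sort > old.syms
 $ objdump -T libzstd.so.1.4.4 | grep -v GLIBC | awk '/\.text|\.data/ {print $NF}' | sort > new.syms
 $ diff old.syms new.syms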

How to Obtain a Readable Symbol Table for a C++ Program?

The compiler encodes extra information into the symbol names of a C++ program (name mangling), which makes manual analysis harder. In this case, you can pipe the output through the c++filt command to decode the names. For example, to parse libboost_system.so.1.67.0, keep only the first 30 lines and exclude the GLIBC and C++ standard library symbols.

 $ objdump -T libboost_system.so.1.67.0 | grep -E -v "(GLIBC|CXX)" | head -30 | c++filt

 libboost_system.so.1.67.0:     file format elf64-x86-64

 DYNAMIC SYMBOL TABLE:
 0000000000000000  w   D  *UND*  0000000000000000              _ITM_deregisterTMCloneTable
 0000000000000000  w   D  *UND*  0000000000000000              __gmon_start__
 0000000000000000  w   D  *UND*  0000000000000000              _ITM_registerTMCloneTable
 0000000000002430 g    DF .text  00000000000000af  Base        boost::system::system_category()
 0000000000002cd0 g    DF .text  000000000000004d  Base        boost::system::detail::system_error_category::message[abi:cxx11](int) const
 0000000000002e70  w   DF .text  0000000000000052  Base        boost::system::detail::system_error_category::~system_error_category()
 0000000000005b28  w   DO .data.rel.ro   0000000000000018  Base        typeinfo for boost::system::error_category::std_category
 0000000000005b18  w   DO .data.rel.ro   0000000000000010  Base        typeinfo for boost::noncopyable_::noncopyable
 0000000000002dc0  w   DF .text  0000000000000019  Base        boost::system::error_category::std_category::default_error_condition(int) const
 0000000000002590 g    DF .text  0000000000000737  Base        boost::system::detail::system_error_category::default_error_condition(int) const
 0000000000005b70  w   DO .data.rel.ro   0000000000000018  Base        typeinfo for boost::system::detail::system_error_category
 0000000000004260  w   DO .rodata        0000000000000024  Base        typeinfo name for boost::noncopyable_::noncopyable
 0000000000002d20  w   DF .text  000000000000000a  Base        boost::system::error_category::std_category::name() const
 0000000000002de0  w   DF .text  0000000000000013  Base        boost::system::error_category::std_category::~std_category()
 0000000000002ed0  w   DF .text  0000000000000038  Base        boost::system::detail::generic_error_category::~generic_error_category()
 0000000000002f70  w   DF .text  000000000000016a  Base        boost::system::error_category::std_category::equivalent(std::error_code const&, int) const
 00000000000030e0  w   DF .text  0000000000000179  Base        boost::system::error_category::std_category::equivalent(int, std::error_condition const&) const
 0000000000002210 g    DF .text  000000000000021a  Base        boost::system::detail::generic_error_category::message[abi:cxx11](int) const
 0000000000002e00  w   DF .text  0000000000000025  Base        boost::system::error_category::std_category::~std_category()
 0000000000002d70  w   DF .text  0000000000000006  Base        boost::system::error_category::default_error_condition(int) const
 00000000000024e0 g    DF .text  00000000000000af  Base        boost::system::generic_category()
 0000000000002e30  w   DF .text  0000000000000038  Base        boost::system::detail::system_error_category::~system_error_category()
 00000000000021f0 g    DF .text  0000000000000008  Base        boost::system::detail::generic_error_category::name() const
 0000000000002200 g    DF .text  0000000000000008  Base        boost::system::detail::system_error_category::name() const
 0000000000005b40  w   DO .data.rel.ro   0000000000000018  Base        typeinfo for boost::system::error_category
 0000000000002d80  w   DF .text  000000000000001a  Base        boost::system::error_category::equivalent(int, boost::system::error_condition const&) const
 ......

In some cases, the same source-level symbol is encoded into different mangled names, for example because different compiler versions or different C++ standard versions were used. The resulting symbol tables then differ and must be treated as distinct.

FAQs

How to Commit a Patch

You can commit a patch as an issue attachment. All supported software packages are planned to be managed in Git; once that is in place, you can also submit the patch as a pull request.
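
If you prepare the change in a local Git clone, a standard patch file suitable for attaching to an issue can be produced as follows; the commit itself is illustrative.

 # Turn the most recent local commit into a patch file, named 0001-<commit-subject>.patch,
 # which can be attached to an issue or pushed as a pull-request branch.
 $ git format-patch -1 HEAD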

Which Methods Can Be Used to Improve the Performance of Software in the Repository?

  1. Adjust compilation parameters to improve efficiency. These may be architecture-specific options that generate better binary code, or architecture-independent options whose overall benefit to the running program is clear (see the sketch after this list).
  2. Introduce a performance-improvement patch while maintaining API and ABI compatibility with the original version.
  3. Upgrade the software to a newer, more efficient upstream version that introduces new features. This method is suitable when backporting patches would take too much work and the package has few reverse dependencies.
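
As an illustration of the first method, extra compiler options can be injected through the standard dpkg-buildflags mechanism, assuming the package's debian/rules consumes those flags. The flags and invocation below are examples, not recommendations for any particular package.

 # Quick local experiment: append flags through the environment
 # (-moutline-atomics requires a sufficiently new GCC on AArch64).
 $ DEB_CFLAGS_APPEND="-O3 -moutline-atomics" dpkg-buildpackage -b -uc -us
 # A maintained package would normally set the equivalent DEB_CFLAGS_MAINT_APPEND
 # in debian/rules so that the change is recorded in the packaging itself.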

For more information about performance optimization, see OptimizeLab Documents.

Can I Connect to the Internet During Software Compilation?

Do not connect to the Internet during software compilation, so that the compilation result is deterministic. The formal build platform takes technical measures to prevent network access. During local development and testing, check that the build does not rely on a network connection, to keep compilation results consistent.

What Is autopkgtest and How to Use It?

autopkgtest is a test-case format based on the DEP-8 standard defined by the Debian community. Unlike the unit tests shipped with the software, autopkgtest mainly tests the result of integrating a package into the release, and it allows an automated CI system to run cross-combination tests on different versions of software, so that the impact of updating one package on other packages can be detected. In the Debian project, the debci platform is used to run all of these tests.
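
For reference, a minimal DEP-8 setup and a local test run might look like the following sketch; the test name, package file, and schroot name are hypothetical.

 # A minimal debian/tests/control entry (illustrative):
 #     Tests: smoke
 #     Depends: @
 # The matching executable lives at debian/tests/smoke and exits non-zero on failure.
 # Run the package's DEP-8 tests locally in a throwaway schroot session:
 $ autopkgtest ./foo_1.0-1ubuntu1_arm64.changes -- schroot focal-arm64-sbuild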

Footnotes

The repository itself is distributed under the Apache-2.0 license. For details, see the LICENSE file. All software in the repository retains its original license.

If you have any questions or requests, please submit an issue to us.