LLVM’s TableGen tool, as well as the domain-specific language that goes with it, is exceptionally powerful. TableGen is intended to take an extremely compact representation of almost everything a target machine does and to spew out tons of helpful C++ support code, which makes writing a compiler easier.
However, the documentation for TableGen is dreadful. It’s not written for developers who are trying to port LLVM to a new target.
Here are some things I wish the documentation had explained going in.
- At its core, LLVM is fundamentally a set of base classes. The base classes can be used to build language-related tools, such as a compiler, an assembler, a linker, and so on. LLVM is not a single toy; it’s a bunch of Lego blocks that can be pulled apart and reassembled. Being productive in LLVM development depends heavily on understanding the existing LLVM class hierarchy. This, in turn, means that you must first understand what the LLVM class libraries take care of for you, so that you don’t reinvent the proverbial wheel.
- The standard LLVM distribution, for Windows anyway, doesn’t have all the built-in tools that you will need to work on LLVM. You will need to bootstrap your own build of the LLVM compiler, with all the bells and whistles enabled, to do productive work. This will probably be via a two-stage bootstrap, where you use Visual C++ to generate a stage-one LLVM compiler, and then use the stage-one compiler to generate a complete, optimized stage-two LLVM compiler.
- On Windows, make the new Visual Studio Code your primary development environment. Use CMake and Ninja as your build system. Although it is possible to generate a working build with classic Visual Studio, do not do so unless you absolutely must for some reason. The projects that CMake generates for LLVM on Visual Studio are bloated and slow. You will eventually hate life as you wait for your Visual Studio solution to check every single project in order to compile a single file.
- The “Porting LLVM” guide suggests that you start with the SPARC target implementation, and copy it, in order to implement your own target support. This might have been a good idea twelve years ago, but it’s a bad idea now. A far more workable solution is to start from an implementation that already resembles your target in some way. For example
- Although the TableGen language is a general-purpose templating language, a great deal of convention already exists regarding what kinds of things can be done with the language. You must observe these conventions, as the base classes in LLVM
Bootstrapping the LLVM compiler on Windows
Here’s the formula I used to bootstrap a complete release build of LLVM on Windows.
- Open a Visual Studio 2019 command prompt window
git clone https://github.com/llvm/llvm-project
cd llvm-project
mkdir build\stage1
cd build\stage1
cmake ../../llvm -G "Visual Studio 16 2019" -A x64 -Thost=x64 -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;libcxx;libcxxabi;lldb;compiler-rt;lld" -DCMAKE_BUILD_TYPE=Release -DLLVM_OPTIMIZED_TABLEGEN=1 -DCLANG_ENABLE_BOOTSTRAP=On -DCMAKE_INSTALL_PREFIX=[ a complete path to a stage-one compiler output directory ]
cd ..
mkdir stage2
cd stage2
cmake ../../llvm -G Ninja -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR -DLLVM_TARGETS_TO_BUILD="AArch64;X86" -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;libcxx;libcxxabi;lldb;compiler-rt;lld" -DCMAKE_BUILD_TYPE=Release -DLLVM_OPTIMIZED_TABLEGEN=1 -DCLANG_ENABLE_BOOTSTRAP=On -DCMAKE_INSTALL_PREFIX=c:/git/llvm-project/build/ninja-clang-release/install -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_LINKER=C:/git/llvm-project/build/ninja-msvc-release/bin/lld-link.exe
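Those two configure lines share most of their flags, and at that length it is easy to repeat one or lose track of what differs between the stages. The following is a minimal sketch, not part of the original recipe, that assembles both commands from one shared flag table. The function names and the example paths are hypothetical, and the build step is an assumption: the commands above only configure each stage, and `cmake --build` is simply the generic CMake way to build and install a configured directory.

```python
"""Sketch: build the two configure commands from one shared flag table."""

# Flags that appear in both stages of the commands above.
COMMON_FLAGS = {
    "LLVM_ENABLE_PROJECTS": "clang;clang-tools-extra;libcxx;libcxxabi;lldb;compiler-rt;lld",
    "CMAKE_BUILD_TYPE": "Release",
    "LLVM_OPTIMIZED_TABLEGEN": "1",
    "CLANG_ENABLE_BOOTSTRAP": "On",
}


def configure_command(generator, extra_args=(), **flags):
    """Return the cmake configure command as an argument list.

    Stage-specific flags are merged over COMMON_FLAGS, so each -D option
    appears exactly once.
    """
    merged = {**COMMON_FLAGS, **flags}
    command = ["cmake", "../../llvm", "-G", generator, *extra_args]
    command += [f"-D{name}={value}" for name, value in merged.items()]
    return command


def stage1_command(install_prefix):
    """Stage one: configured with the Visual Studio generator, X86 only."""
    return configure_command(
        "Visual Studio 16 2019",
        extra_args=["-A", "x64", "-Thost=x64"],
        LLVM_TARGETS_TO_BUILD="X86",
        CMAKE_INSTALL_PREFIX=install_prefix,
    )


def stage2_command(install_prefix, linker):
    """Stage two: configured with Ninja, compiled by the stage-one clang-cl."""
    return configure_command(
        "Ninja",
        LLVM_EXPERIMENTAL_TARGETS_TO_BUILD="AVR",
        LLVM_TARGETS_TO_BUILD="AArch64;X86",
        CMAKE_INSTALL_PREFIX=install_prefix,
        CMAKE_C_COMPILER="clang-cl",
        CMAKE_CXX_COMPILER="clang-cl",
        CMAKE_LINKER=linker,
    )


def build_command(config="Release", target="install"):
    # Assumption: the recipe above lists only the configure steps; this is
    # the generic CMake invocation to build and install a build directory.
    return ["cmake", "--build", ".", "--config", config, "--target", target]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    stage1_prefix = "C:/example/stage1-install"
    steps = [
        stage1_command(stage1_prefix),
        build_command(),
        stage2_command("C:/example/stage2-install", stage1_prefix + "/bin/lld-link.exe"),
        build_command(),
    ]
    for step in steps:
        print(step)
```

Each function returns a plain argument list, so a step can be inspected first and then handed to `subprocess.run(step, check=True)` from inside the matching build directory. Because the flags live in a dictionary, a stage can override a shared value without the option appearing twice on the command line.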