Wednesday, April 22, 2026

This page runs on coffee, please consider supporting it.

Tuesday, April 21, 2026

CapyPDF is approaching feature sufficiencyIn the past I have written many blog posts on implementing various PDF features in CapyPDF . Typically they explain the feature being implemented, how confusing the documentation is, what perverse undocumented quirks one has to work around to get things working and so on. To save the effort of me writing and you reading yet another post of the same type, let me just say that you can now use CapyPDF to generate PDF forms that have widgets like text fields and radio buttons. What makes this post special is that forms and widget annotations were pretty much the last major missing PDF feature Does that mean that it supports everything? No. Of course not. There is a whole bunch of subtlety to consider. Let's start with the fact that the PDF spec is massive , close to 1000 pages. Among its pages are features that are either not used or have been replaced by other features and deprecated. The implementation principle of CapyPDF thus far has been "implement everything that needs special tracking, but only to the minimal level needed". This seems complicated but is in fact quite simple. As an example the PDF spec defines over 20 different kinds of annotations. Specifying them requires tracking each one and writing out appropriate entries in the document metadata structures. However once you have implemented that for one annotation type, the same code will work for all annotation types. Thus CapyPDF has only implemented a few of the most common annotations and the rest can be added later when someone actually needs them. Many objects have lots of configuration options which are defined by adding keys and values to existing dictionaries. Again, only the most common ones are implemented, the rest are mostly a matter of adding functions to set those keys. There is no cross-referencing code that needs to be updated or so on. If nobody ever needs to specify the color with which a trim box should be drawn in a prepress preview application, there's no point in spending effort to make it happen. The API should be mostly done, especially for drawing operations. The API for widgets probably needs to change. Especially since form submission actions are not done. I don't know if anything actually uses those, though. That work can be done based on user feedback.📝Nibble Stew
Meeting C++ 2025I have been meaning to write up this conference for a very long time. The call for papers for this year (2026) runs until 4th June. Here's the link: https://meetingcpp.com/mcpp/submittalk/ . So, last year's conference was great, as ever. Anthony Williams gave the opening keynote, called "Software and Safety". I had to look that up, because my notes say "Think Harder/Think outside the box". His talk encouraged us to do just that. He discussed UB, pointing out that it often comes from unexpected events and talked through how to avoid the unexpected. He pointed out validation is a design problem, and used the analogy of Swiss cheese: you can have several holes, but with a thick enough slice of cheese, you won't be able to see right through it. So, having several layers of testing helps. For example, in addition to unit tests, using sanitizers and fuzz testing is useful. To continue this theme, I went to see Anders Schau Knatten talking about "Real-time Safety - Guaranteed by the Compiler!" next. He went through Clang's non-blocking and non-allocating attributes, which I've not used before. He talked about real-time sanitizers (RTSan) too. He always finds neat, short examples that stick in my head for a while afterwards. I haven't got around to trying these out yet - I really should. After lunch I went to "Using std::generator<> in Practice" by Nicolai Josuttis. He talked about using them to write state machines and showed how much less boilerplate code you need to swap from states. As ever, he gave simple examples and pointed out various important things to be aware of, like getting in trouble when using references. Later I went to see Anders again, talking about "The Two Memory Models". He explained atomics and memory ordering. My notes make no sense - I'll have to rewatch the talk. I do remember coming away and feeling like this had made more sense than usual though. Memory ordering is a deep topic. I gave the "Center keynote" the next day. I used the title "Stoopid questions", and dug into why people are often shy about speaking up when they don't understand something. There is, after all, no such thing as a stupid question. I suspect many people end up mentoring or training others and haven't been taught how to teach. I am a trained secondary school maths teacher, so have had some input on this. Teaching is hard work, and can be emotionally draining, But learning is draining too. If you can find engaging or fun examples that helps. I got loads of questions afterwards, so I clearly got people thinking. I went to "The Missing Step: Making Data Oriented Design One Million Times Faster" by Andrew Drakeford next. I've seen a variant of this before, but he had added more details and things to think about. The punchline is optimising code might require a bit of thinking and some maths :-) I don't have notes after that. My head was still spinning after giving my talk But I did take photos. I went to "(Don't) use coroutines for this" by Ivan Cukic on the last day. He talked about "prog" C++ a while ago, so talked about "punk" C++ this time. For example, trying to get a stack based, rather than heap based, coroutine., and various other ways to subvert the original intentions of this relatively new language feature. Lots of fun, and lots of C++ covered, beyond just coroutines. If you can make people laugh by being slightly subversive, you are winning. Steve Love gave a talk called "CMake for the impatient" later. I can use CMake but don't have it 100% clear in my head. Many people give in depth talks, so it was nice to see a more beginner-level one for a change. We've all got gaps in our knowledge, after all. If you don't know CMake, this talk is a good place to start. I did go to several other talks, but I'll leave you to watch the videos. One last talk I'll mention is a lightning talk, by Rahel Natalie Engel. It was called "Let them eat cake", wherein she introduced a language based on C++ designed for teaching, called "Cat Pie". The talk is here: https://www.youtube.com/watch?v=gQ6grpbhW8k and you can try it out online here: https://catpie.compscicomp.de/ . The idea is to get a cat to move around a maze and eat some pie. The "Help" button takes you to a cheat sheet, and you can read more details here: https://dl.gi.de/server/api/core/bitstreams/d10b2056-9d7b-4552-bac7-a510dd2522e3/content . You can find the conference videos on YouTube . If you can't think of a talk to propose, you could volunteer instead. Or get someone to buy a ticket for you for this year. If you need help persuading work to let you go, reach out to someone to get help explaining why going to a conference like this is such a great learning opportunity. I really should get on with my proposal for this year now.📝BuontempoConsulting
Boost.URL: Audited, Constexpr, and PolishedWe had been putting off the Boost.URL security review for a while. There was always something more urgent. When the review finally happened, it confirmed what we hoped: the core parsing logic held up well. Around the same time, a constexpr feature request that we had been dismissing suddenly became a cross-library collaboration when other Boost maintainers started applying changes to their own libraries. And while working on Boost.Beast2 integration, we noticed friction in common URL operations that led us to clear a backlog of usability improvements. Security Review Round 1: 1,207 Findings (February 2, 2026) Round 2: 27 Findings (February 17, 2026) Round 3: 15 Findings (April 2, 2026) Compile-Time URL Parsing The Conversation That Changed Everything Error Handling at Compile Time The -Wmaybe-uninitialized Problem The Shared Library Problem The Result Usability Improvements Convenience Functions C++20 Integration Performance Acknowledgments and Reflections Security Review The C++ Alliance arranges professional security audits for the libraries we maintain. The results for Boost.Beast (2020) and Boost.JSON (2021) are publicly available. For Boost.URL, we always had the plan but kept delaying because there was so much other work to do first. That delay turned out to be a good thing: we found and fixed issues ourselves first, so the reviewers could focus on the subtle problems. Laurel Lye Systems Engineering conducted three rounds of assessment. Each finding was manually reviewed against the source code and categorized as a confirmed bug (fixed), a false positive, or a deliberate design choice. For every confirmed bug, we also proposed new test cases to prevent regressions. Round 1: 1,207 Findings (February 2, 2026) The first assessment was the broadest. Of 1,207 findings, 15 were confirmed bugs resulting in fix commits. The vast majority were false positives or by-design patterns: Verdict CRITICAL HIGH MEDIUM LOW INFO Total FIXED 1 9 0 2 3 15 FALSE POSITIVE 3 47 46 186 110 392 BY DESIGN 0 129 445 170 56 800 Total 4 185 491 358 169 1,207 The single CRITICAL fix was a loop condition in url_base that dereferenced *it before checking it != end. Three other CRITICAL findings were false positives: the audit flagged raw-pointer writes in the format engine, but these use a two-phase measure/format design that guarantees the buffer is pre-sized correctly. Most false positives fell into recognizable themes: BOOST_ASSERT as sole bounds check (29 HIGH findings): internal _unsafe functions rely on preconditions validated by the public API. The _unsafe suffix signals the contract. This is the standard Boost/STL pattern (std::vector::operator[] vs at()). Non-owning view lifetime safety (27 HIGH findings): string_view and url_view types do not own their data. The audit flagged potential use-after-free, but lifetime management is the caller’s responsibility by design. Atomic reference counting (multiple findings across all rounds): the audit tool did not recognize the #ifdef BOOST_URL_DISABLE_THREADS guard that switches between std::atomic and plain std::size_t. Round 1 fix commits bcdc891 CRITICAL: url_base loop condition order ec15fce HIGH: encode() UB pointer arithmetic for small buffers 81fcb95 HIGH: LLONG_MIN negation UB in format 42c8fe7 HIGH: ci_less::operator() return type 76279f5 HIGH: incorrect noexcept in segments_base::front() and back() d4ae92d HIGH: recycled_ptr::get() nullptr when empty 8d98fe6 LOW: decode() noexcept on throwing template The proportion of false positives to confirmed bugs was large enough that we discussed a second round with Laurel Lye, where we shared the false positive categories we had identified so they could be more targeted. Round 2: 27 Findings (February 17, 2026) The second assessment was more targeted. The reviewers had learned from our Round 1 triage and produced fewer false positives: Verdict HIGH MEDIUM LOW INFO Total FIXED 7 3 1 1 12 FALSE POSITIVE 2 2 0 0 4 BY DESIGN 0 0 1 1 2 ALREADY FIXED 0 5 4 0 9 Total 9 10 6 2 27 9 of the 27 findings had already been fixed in Round 1 commits. The new confirmed bugs included a heap overflow in format center-alignment padding (lpad = w / 2 used total width instead of padding amount), an infinite loop in decode_view::ends_with with empty strings, and an OOB read in ci_is_less on mismatched-length strings. Both rounds are tracked in PR #982. Round 2 fix commits d06df88 HIGH: format center-alignment padding 4fe2438 HIGH: decode_view::ends_with with empty string f5727ed HIGH: stale pattern n.path after colon-encoding d045d71 HIGH: ci_is_less OOB read 88efbae HIGH: recycled_ptr copy self-assignment fe4bdf6 MEDIUM: url move self-assignment ab5d812 MEDIUM: encode_one signed char right-shift b662a8f MEDIUM: encode() noexcept on throwing template 5bc52ed LOW: port_rule has_number for port zero at end of input 9c9850f INFO: ci_equal arguments by const reference 4f466ce test: public interface boundary and fuzz tests Round 3: 15 Findings (April 2, 2026) The third round was the shortest and the most precise. Of 15 findings, 4 were confirmed bugs and 11 were false positives. No CRITICAL findings. The false positives were the same recurring themes (atomic refcounting, pre-validated format strings, preconditions guaranteed by callers). Verdict HIGH MEDIUM LOW Total FIXED 0 1 3 4 FALSE POSITIVE 4 6 1 11 Total 4 7 4 15 The confirmed bugs were more subtle: a decoded-length calculation error in segments_iter_impl::decrement() that only manifested during backward iteration over percent-encoded paths, two noexcept specifications on functions that allocate std::string (which can throw bad_alloc), and a memcpy with null source when size is zero (undefined behavior per the C standard, even though it copies nothing). This round is tracked in PR #988. Round 3 fix commits 7cd6702 MEDIUM: segments_iter_impl decoded-length in decrement() b55336c LOW: param noexcept on throwing constructor 7c0665d LOW: string_view_base noexcept on throwing operator std::string() 003696d LOW: url_view memcpy with null source when size is zero The progression from 1,207 findings to 27 to 15 shows the reviewers learning the peculiarities of our codebase. The ratio of false positives dropped with each round, and the confirmed bugs got more subtle. %%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e4eee8", "primaryBorderColor": "#affbd6", "primaryTextColor": "#000000", "lineColor": "#baf9d9", "secondaryColor": "#f0eae4", "tertiaryColor": "#ebeaf4", "fontSize": "14px"}}}%% mindmap root((Confirmed Bugs)) UB in edge cases encode_one right-shift LLONG_MIN negation pointer arithmetic Self-assignment url move recycled_ptr copy OOB reads ci_is_less decode_view ends_with Incorrect noexcept encode / decode segments_base front/back param constructor string_view_base operator Iterator bugs segments decoded-length Null pointer recycled_ptr get url_view memcpy Compile-Time URL Parsing constexpr URL parsing has been one of the most recurring requests since the library’s inception. Every few months someone would ask about it, and every few months we would decide the refactoring cost was too high. The parsing engine is heavily buffer-oriented, and moving enough code into headers for constexpr evaluation required careful refactoring without breaking the shared library build. When we finally prototyped it, the diff touched thousands of lines, but most of those were code being moved from source files to headers rather than new logic. The actual new code was limited to alternative code paths to bypass non-literal types and refactoring url_view_base to eliminate a self-referencing pointer that prevented constexpr evaluation. Still, given the size of the change, we initially marked it as unactionable and moved on to the security review. Beyond the refactoring cost, we had blockers beyond our control. Our parsing code depended on boost::optional (not a literal type, no constexpr constructors), boost::variant2 (not literal when containing optional), and boost::system::result (could not be constructed with a custom error_code in constexpr because error_category::failed() is virtual). Without changes to those libraries, constexpr URL parsing was not possible regardless of how much we refactored our own code. The Conversation That Changed Everything Then Peter Dimov, the maintainer of Boost.System and Boost.Variant2, joined the conversation. We had assumed that system::result could not be constexpr in C++14 because it wraps error_code, which uses virtual functions. Peter pointed out that system::result is already a literal type in C++14 when T is literal and the error code is not custom. Boost.URL uses a custom error code category, and constructing a result from a custom error_code requires calling error_category::failed(), which is virtual and therefore not constexpr before C++20. Peter offered to fix this in Boost.System (#141, af53f17) for C++20 so that custom error codes would also work at compile time. Allowing constexpr virtual functions in C++20 Peter Dimov is also one of the authors of P1064: “Allowing Virtual Function Calls in Constant Expressions”, the C++ committee proposal that made constexpr virtual functions possible in C++20. The paper uses error_code and error_category as the motivating example. That shifted the problem. Instead of building our own constexpr_result type to bypass the entire error handling system, we could use system::result directly in C++20. The scope of the refactoring shrank, and we focused on C++20 as the initial target. The remaining blocker was that system::result requires T to be a literal type, and we use boost::optional heavily in our parsing code. boost::optional was not a literal type. Andrzej Krzemieński, the Boost.Optional maintainer, started working on it. The conversation went back and forth on the C++14 constraints: std::addressof is not constexpr until C++17, mandatory copy elision is only available in C++17, and there were questions about what subset of constructors could realistically become constexpr in C++14. After several iterations (including a feature/constexpr branch), the constexpr implementation landed on develop. With optional becoming literal, boost::variant2 containing optional could also become literal. All three blockers were now resolved. Peter had fixed Boost.System, Andrzej had fixed Boost.Optional, and we contributed fixes to Boost.Variant2. There was no going back: we could no longer dismiss the constexpr feature after three library maintainers had already done their part. %%{init: {"theme": "base", "themeVariables": {"primaryColor": "#f7f9ff", "primaryBorderColor": "#9aa7e8", "primaryTextColor": "#1f2a44", "lineColor": "#b4bef2", "secondaryColor": "#fbf8ff", "tertiaryColor": "#ffffff", "fontSize": "14px"}}}%% flowchart TD A[Boost.URL constexpr parsing] --> B[Boost.Optional] A --> C[Boost.Variant2] A --> D[Boost.System] B --> E[boost::optional constexpr] C --> F[boost::variant2::variant constexpr] D --> G[boost::system::result constexpr] D --> H[boost::system::error_code constexpr] Cross-library commits for constexpr support Boost.URL (PR #976, PR #981) 0a2c39f feat: constexpr URL parsing for C++20 b9db439 build: remove -Wno-maybe-uninitialized from GCC flags (see below) 59b4540 fix: suppress GCC false-positive -Wmaybe-uninitialized in tuple_rule (see below) Boost.Optional (issue #143, PR #145) 3df2337 make optional constexpr in C++14 046357c add more robust constexpr support 88e2378 add -Wmaybe-uninitialized pragma (see below) Boost.Variant2 (PR #57) b6ce8ac add missing -Wmaybe-uninitialized pragma (see below) Boost.System (issue #141) af53f17 add constexpr to virtual functions on C++20 or later Error Handling at Compile Time Boost.URL attaches source location information to error codes for better diagnostics at runtime. In a constexpr context, BOOST_CURRENT_LOCATION is not available, so the BOOST_URL_CONSTEXPR_RETURN_EC macro branches on __builtin_is_constant_evaluated(): at compile time it returns the error enum directly, at runtime it attaches the source location. #if defined(BOOST_URL_HAS_CXX20_CONSTEXPR) # define BOOST_URL_CONSTEXPR_RETURN_EC(ev) \ do { \ if (__builtin_is_constant_evaluated()) { \ return (ev); \ } \ return [](auto e) { \ BOOST_URL_RETURN_EC(e); \ }(ev); \ } while(0) #endif The -Wmaybe-uninitialized Problem GCC’s -Wmaybe-uninitialized flagged code inside boost::optional and boost::variant2 union storage constructors. The root cause was neither library. The inlining chain: Boost.URL’s parsing code constructs a variant2::variant that contains an optional alternative. At -O3, GCC inlines the entire chain: Parse function Variant construction variant2 storage optional storage Union constructor After inlining, GCC sees a union with a dummy_ member and a value_ member, and it cannot prove which member is active. It conflates the “uninitialized dummy” path with the “initialized value” path. The in_place_index_t dispatch guarantees which member is initialized, but GCC’s data flow analysis loses track across the nested layers. -fsanitize=address makes it worse by changing inlining thresholds. The compiler blames the wrong library. The root cause is in variant2’s union storage, but when variant2 contains an optional, GCC reports the warning in optional’s code. The pragma has to go where GCC reports it, not where the issue originates. We contributed pragmas to both Boost.Optional and Boost.Variant2, and replaced Boost.URL’s blanket -Wno-maybe-uninitialized flag with targeted pragmas. This particular false positive requires GCC 14+, -O3, ASan, on x86_64 Linux, with a variant2::variant containing a boost::optional, constructed through a system::result dereference. Change any one of those conditions and the warning disappears. This leaves an open question for the Boost ecosystem: when a false positive surfaces because library A’s optimizer behavior interacts with library B’s union storage and gets reported in library C’s code, who is responsible for the pragma? For now, we placed pragmas where GCC reports the issue, but the underlying problem recurs every time a new combination of types triggers the same inlining pattern. The Shared Library Problem Making URL parsing constexpr means the parsing functions must be available in headers. But Boost.URL is a compiled library, and on MSVC, __declspec(dllexport) on a class exports all members, including inline and constexpr ones. This causes LNK2005 (duplicate symbol) errors for any class that mixes compiled and header-only members. Each class must follow exactly one of two policies: (a) Fully compiled: class BOOST_URL_DECL C. All members in .cpp files. No inline or constexpr members. (b) Fully header-only: class BOOST_SYMBOL_VISIBLE C. All inline/constexpr/template. No .cpp file. We documented the full rationale in config.hpp. We suspect other C++ libraries have not encountered this because they either do not test shared library builds as extensively as we do, or they are header-only. The Result Boost.URL can now parse URLs at compile time under C++20 (PR #976). All parse functions (parse_uri, parse_uri_reference, parse_relative_ref, parse_absolute_uri, and parse_origin_form) are fully constexpr. A malformed URL literal becomes a compile error rather than a runtime failure: // Parsed and validated at compile time. // A malformed literal would fail to compile. constexpr url_view api_base = parse_uri("https://api.example.com/v2").value(); Pre-parsed constexpr URL views also serve as zero-cost constants: because all parsing happens during compilation, components like scheme, host, and port are available at runtime with no parsing overhead. This is useful for applications that compare against well-known endpoints, pre-populate configuration defaults, or build routing tables without paying for string parsing at startup. The constexpr feature taught us that dismissing a request because the cost seems too high for one library misses the bigger picture. Once Peter Dimov and the other maintainers got involved, the cost was shared and the scope shrank. In the Boost ecosystem, a feature that seems expensive in isolation can become practical when the dependencies cooperate. Usability Improvements While integrating Boost.URL into Boost.Beast2, the Beast2 authors noticed friction in common operations that worked correctly but required more code than they should. At the same time, several community issues had been open for a while. We used this as an opportunity to address both. Convenience Functions The most requested feature was get_or for query containers: look up a query parameter by key and return a default value if it is not present. Before: auto it = url.params().find("page"); auto page = it != url.params().end() ? (*it).value : "1"; After: auto page = url.params().get_or("page", "1"); We also added standalone decode functions for working with individual URL components without constructing a full URL object: auto plain = decode("My%20Stuff"); assert(plain && *plain == "My Stuff"); auto n = decoded_size("Program%20Files"); assert(n && *n == 13); C++20 Integration enable_borrowed_range is now specialized for 10 Boost.URL view types (segments_view, params_view, decode_view, and others). Unlike a std::vector, which owns its data, Boost.URL views point into the URL’s buffer without owning it. When a temporary view is destroyed, its iterators still point to valid memory. enable_borrowed_range tells the compiler this is safe, so algorithms like std::ranges::find can return iterators from temporary views without the compiler rejecting the code: segments_view::iterator it; { segments_view ps("/path/to/file.txt"); it = ps.begin(); } // iterator is still valid (points to external buffer) assert(*it == "path"); The grammar system gained user-provided RangeRule support. Custom grammar rules for parsing URL components satisfy a concept requiring first() and next() methods returning system::result : struct my_range_rule { using value_type = core::string_view; system::result first(char const*& it, char const* end) const noexcept; system::result next(char const*& it, char const* end) const noexcept; }; The motivation was performance and API clarity (#943). Previously, grammar::range always type-erased the rule through a recycled_ptr with string storage. Stateless rules were paying for storage they did not need. With user-provided RangeRule, range detects empty rules and avoids the type-erasure overhead entirely. Performance Component offsets in url_impl changed from size_t to uint32_t, reducing the size of every URL object on 64-bit platforms. The maximum URL size is capped at UINT32_MAX - 1 (enforced by a static_assert). Constructing a segments_view or segments_encoded_view from a URL is now a constant-time operation: offsets are computed directly from iterator indices without scanning the path. Other improvements Fixes a87998a params_iter_impl::decrement() computed incorrect decoded key/value sizes when a query parameter’s value contains literal = characters (PR #978, #972) 60c281a decode_view::remove_prefix/remove_suffix asserted n <= size() instead of preventing undefined behavior (PR #978, #973) 01e0571 decode_view was forward-declared but not complete when pct_string_view::operator*() was declared (PR #963) Refactors e809ee4 token_rule_t now uses the empty base optimization via empty_value and provides conditional default construction (PR #964) Documentation 32c3ddc new design rationale page Legacy QuickBook documentation removed in favor of Antora-based docs 8c7c4c7 plus scheme convention documented 888cd8c MrDocs-generated tagfiles for cross-referencing with other Boost libraries Tests e946887 URL with ? in query string (PR #978, #926) Most of these improvements came from real usage. The Beast2 integration exposed friction that we would not have found from inside the library, and the community issues represented patterns that multiple users had independently hit. The best usability feedback comes from people who are actually building something with the library. Acknowledgments and Reflections The constexpr work benefited from the contributions of Peter Dimov (Boost.System, Boost.Variant2) and Andrzej Krzemieński (Boost.Optional), who applied fixes to their libraries so that Boost.URL could proceed. The Beast2 usability feedback came from the Beast2 authors as they integrated Boost.URL into the new design. The work on Boost.URL has shifted. The problems we are solving now (edge cases found by professional auditors, compiler limitations for constexpr, usability friction from real integrations) are different from the problems we used to solve. They are smaller and more specific, but they matter more because real people hit them. The complete set of changes is available in the Boost.URL repository.📝The C++ Alliance

Monday, April 20, 2026

My BeCPP talk video posted: “C++ — Growing in a world of competition, safety, and AI”BeCPP just posted this video of my talk at their March 30 Symposium. This is the first time I’ve given this material on camera — it’s extension of themes in my New Year’s Eve blog post, with major updates because some big industry changes happened in the first quarter of 2026. This talk is different … Continue reading My BeCPP talk video posted: “C++ — Growing in a world of competition, safety, and AI” →📝Sutter’s Mill

Sunday, April 19, 2026

Saturday, April 18, 2026

Friday, April 17, 2026

Multi merge sort, or when optimizations aren'tIn our previous episode we wrote a merge sort implementation that runs a bit faster than the one in stdlibc++. The question then becomes, could it be made even faster. If you go through the relevant literature one potential improvement is to do a multiway merge. That is, instead of merging two arrays into one, you merge four into one using, for example, a priority queue. This seems like a slam dunk for performance. Doubling the number of arrays to merge at a time halves the number of total passes needed The priority queue has a known static maximum size, so it can be put on the stack, which is guaranteed to be in the cache all the time Processing an element takes only log(#lists) comparisons Implementing multimerge was conceptually straightforward but getting all the gritty details right took a fair bit of time. Once I got it working the end result was slower. And not by a little, either, but more than 30% slower. Trying some optimizations made it a bit faster but not noticeably so. Why is this so? Maybe there are bugs that cause it to do extra work? Assuming that is not the case, what actually is? Measuring seems to indicate that a notable fraction of the runtime is spent in the priority queue code. Beyond that I got very little to nothing. The best hypotheses I could come up with has to with the number of comparisons made. A classical merge sort does two if statements per output elements. One to determine which of the two lists has a smaller element at the front and one to see whether removing the element exhausted the list. The former is basically random and the latter is always false except when the last element is processed. This amounts to 0.5 mispredicted branches per element per round. A priority queue has to do a bunch more work to preserve the heap property. The first iteration needs to check the root and its two children. That's three comparisons for value and two checks whether the children actually exist. Those are much less predictable than the comparisons in merge sort. Computers are really efficient at doing simple things, so it may be that the additional bookkeeping is so expensive that it negates the advantage of fewer rounds. Or maybe it's something else. Who's to say? Certainly not me. If someone wants to play with the code, the implementation is here . I'll probably delete it at some point as it does not have really any advantage over the regular merge sort.📝Nibble Stew

Thursday, April 16, 2026

No imageProposal: faster and smaller MC/DCThis post describes a proposal and is a call for clients, if you will. I have an idea for an optimization in the GCC MC/DC instrumentation ( -fcondition-coverage ) that I estimate would reduce the overhead in both compile time, object size, and run time, roughly 2-3 times, and I am looking for clients to fund this work. GCC has supported MC/DC since version 14 (released May 2024) and works quite well. I recently fixed a performance bug that made a massive difference for (very) large expressions, which went unnoticed for a while because such large expressions are quite uncommon and the performance has been adequate. I do believe we can improve both the compile time and the quality of instrumentation further. I did some more experiments and compared the compile times for a couple of real-world programs. The baseline is built with no flags, coverage with -ftest-coverage -fprofile-arcs , and MC/DC with -ftest-coverage -fcondition-coverage on GCC 14.2. This table shows that compile times are very sensitive to the program structure and that MC/DC slows down compiles significantly, sometimes up to 3 times. All compile times are in seconds. Baseline Coverage MC/DC SQLite 2.3 3.4 5.2 TagLib 2.2 3.2 2.9 FreeType 1.3 1.7 2.5 JUCE 1.4 1.9 1.9 Object sizes too are greatly affected by instrumentation, albeit slightly less consistent, but does support that the generated instrumentation (and not the analysis) is the main driver for compile times. The object sizes are in megabytes (MB). Baseline Coverage MC/DC SQLite 1.4 2.5 3.9 TagLib 3.0 6.5 4.5 FreeType 1.0 2.3 3.8 JUCE 1.5 3.4 2.8 Faster compiles are important and goes beyond simple ergonomics. Increased latency in the edit-compile-test cycle is very detrimental to both efficiency and effectiveness; reduced latency empowers engineers to solve harder problems and, in my experience, in a better way. The faster compile does not just save time (which is already precious and expensive), but reduces context switches and makes for a stronger feedback loop. The importance of fast and solid feedback loops is well understood, and has been covered by countless books, articles, blog posts, and talks. I would argue it is vital to optimize the edit-compile-test cycle when working with MC/DC since reaching (and understanding) coverage often is the most expensive and time consuming phase of system development. There’s efficiency to be gained, too, just by reducing the time wasted waiting for the compiler to finish. When developing the test suite for MC/DC the program will be recompiled many times which will effectively be idle, so just a few seconds here and there add up fast. The improvements in size- and runtime overhead means even larger programs can fit on small embedded system, which is particularly relevant the context of MC/DC as safety critical systems often run on small computers, ECUs, microcontrollers. This goes beyond just edit-compile-test friction; if instrumented programs cannot run on the device it is not just a mere inconvenience, it is a barrier to progress. The compile time and size tests show that the overhead of coverage instrumentation greatly depends on the structure of the program. Note that compilation is a complicated process and the 2-3x reduction in overhead is an estimate. To make this improvement in GCC happen, send an email to j@patch.no and I will send you a formal proposal and discuss the terms. After we sign the contract I will implement these optimizations and take care of upstreaming and integrating the changes into GCC.📝patch – Blog