Releases: ARM-software/astc-encoder

4.1.0

17 Aug 20:46

Status: August 2022

The 4.1.0 release is a maintenance release. There is no performance or image quality change in this release.

  • General:
    • Change: Command line decompressor no longer uses the legacy GL_LUMINANCE or GL_LUMINANCE_ALPHA format enums when writing KTX output files. Luminance textures now use the GL_RED format and luminance_alpha textures now use the GL_RG format.
    • Change: Command line tool gains a new -dimage option to generate diagnostic images showing aspects of the compression encoding. The output file name with its extension stripped is used as the stem of the diagnostic image file names.
    • Bug-fix: Library decompressor builds for SSE no longer use the masked store instruction maskmovdqu, as it can generate faults on masked lanes.
    • Bug-fix: Command line decompressor now correctly uses sized type enums for the internal format when writing output KTX files.
    • Bug-fix: Command line compressor now correctly loads 16 and 32-bit per component input KTX files.
    • Bug-fix: Fixed GCC9 compiler warnings on Arm aarch64.
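
The format enum change described above amounts to a two-entry lookup. A minimal sketch, assuming the standard OpenGL enum values from the Khronos headers; the table and function names are hypothetical and are not taken from the astcenc sources:

```python
# OpenGL enum values as defined in the Khronos headers.
GL_RED = 0x1903
GL_LUMINANCE = 0x1909
GL_LUMINANCE_ALPHA = 0x190A
GL_RG = 0x8227

# Hypothetical lookup table mirroring the change in this release.
LEGACY_TO_MODERN_FORMAT = {
    GL_LUMINANCE: GL_RED,
    GL_LUMINANCE_ALPHA: GL_RG,
}


def modern_format(gl_format):
    """Return the replacement for a legacy luminance enum, else the input unchanged."""
    return LEGACY_TO_MODERN_FORMAT.get(gl_format, gl_format)


print(hex(modern_format(GL_LUMINANCE)))        # 0x1903
print(hex(modern_format(GL_LUMINANCE_ALPHA)))  # 0x8227
```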

Binary release sha256 checksums

31ffdd64c9a8fc21313bc039e5a7a3c29350ce6e24004f3649dcd1b128d2d6a2  astcenc-4.1.0-linux-x64.zip
8ed8bea302542d28b37a1026f80e54afeada30b1876b00cdf9034eaf384b491c  astcenc-4.1.0-macos-aarch64.zip
99c174c7f8f7660fc265d13ce6fe97379130ada05f1fbbcf5a82da646439ba97  astcenc-4.1.0-macos-x64.zip
3dbc32f146e00999e48c9dc29be483ec70ec7adf13dc879775f99bebaf8a2627  astcenc-4.1.0-windows-x64.zip
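
A downloaded archive can be checked against the checksums listed above. A minimal sketch using only the Python standard library; the helper names are hypothetical, and it assumes the archive is in the current directory:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file's sha256 matches the published checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("astcenc-4.1.0-linux-x64.zip")
    expected = "31ffdd64c9a8fc21313bc039e5a7a3c29350ce6e24004f3649dcd1b128d2d6a2"
    if archive.exists():
        print("OK" if verify_download(archive, expected) else "MISMATCH")
    else:
        print("archive not found; download it first")
```

The same helper works for any release on this page by swapping in the matching file name and checksum.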

4.0.0

22 Jul 08:25

Status: July 2022

The 4.0.0 release introduces some major performance enhancements, and a number of larger changes to the heuristics used in the codec to find a more effective cost:quality trade off. The core compressor library is between 1.2x (4x4 blocks) and 1.6x (6x6 blocks) faster than the previous 3.7 release. The core decompressor library is 1.25x faster than the 3.7 release.

  • General:
    • Change: The -array option for specifying the number of image planes for ASTC 3D volumetric block compression has been renamed to -zdim.
    • Change: The build root package directory is now bin instead of astcenc, allowing the CMake install step to write binaries into /usr/local/bin if the user wishes to do so.
    • Feature: A new -ssw option for specifying the shader sampling swizzle has been added as a convenience alternative to the -cw option. This is needed to correct error weighting during compression if not all components are read in the shader. For example, to extract and compress two components from an RGBA input image, weighting the two components equally when sampling through texture().ra in the shader, use -esw ggga -ssw ra. In this example -ssw ra is equivalent to the alternative -cw 1 0 0 1 encoding.
    • Feature: The -a alpha weighting option has been re-enabled in the backend, and now again applies alpha scaling to the RGB error metrics when encoding. This is based on the maximum alpha in each block, not the individual texel alpha values used in the earlier implementation.
    • Feature: The command line tool now has -repeats <count> for testing, which will iterate around compression and decompression count times. Reported performance metrics also now separate compression and decompression scores.
    • Feature: The core codec is now warning clean up to /W4 for both MSVC cl.exe and clangcl.exe compilers.
    • Feature: The core codec now supports arm64 for both MSVC cl.exe and clangcl.exe compilers.
    • Feature: NO_INVARIANCE builds will enable the -ffp-contract=fast option for all targets when using Clang or GCC. In addition AVX2 targets will also set the -mfma option. This reduces image quality by up to 0.2dB (normally much less), but improves performance by 5-20%.
    • Optimization: Angular endpoint min/max weight selection is restricted to weight QUANT_11 or lower. Higher quantization levels assume default 0-1 range, which is less accurate but much faster.
    • Optimization: Maximum weight quantization for later trials is selected based on the weight quantization of the best encoding from the 1 plane 1 partition trial. This significantly reduces the search space for the later trials with more planes or partitions.
    • Optimization: Small data tables now use in-register SIMD permutes rather than gathers (AVX2) or unrolled scalar lookups (SSE/NEON). This can be a significant optimization for paths that are load unit limited.
    • Optimization: Decompressed image block writes in the decompressor now use a vectorized approach to writing each row of texels in the block, including the ability to exploit masked stores if the target supports them.
    • Optimization: Weight scrambling has been moved into the physical layer; the rest of the codec now uses linear order weights.
    • Optimization: Weight packing has been moved into the physical layer; the rest of the codec now uses unpacked weights in the 0-64 range.
    • Optimization: Consistently vectorize the creation of unquantized weight grids when they are needed.
    • Optimization: Remove redundant per-decimation mode copies of endpoint and weight structures, which were really read-only duplicates.
    • Optimization: Early-out the same endpoint mode color calculation if it cannot be applied.
    • Optimization: Numerous type size reductions applied to arrays to reduce both context working buffer size usage and stack usage.
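
The relationship between -ssw and -cw described in the list above can be sketched as a small conversion. This is an illustration only: the function name is hypothetical, and generalizing from the single documented case (-ssw ra giving -cw 1 0 0 1) to other swizzles is an assumption, not the tool's actual parser.

```python
def shader_swizzle_to_weights(swizzle):
    """Return (r, g, b, a) weights: 1 for each component the shader reads, else 0.

    Assumption: every sampled component gets equal weight, as in the
    documented "-ssw ra" == "-cw 1 0 0 1" example.
    """
    channels = "rgba"
    if not swizzle or any(c not in channels for c in swizzle):
        raise ValueError("swizzle must be a non-empty string of r, g, b, a")
    return tuple(1 if c in swizzle else 0 for c in channels)


print(shader_swizzle_to_weights("ra"))  # (1, 0, 0, 1)
```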

Binary release sha256 checksums

7899f73ee2820b0e7d73b38fda6feb58948a30b511dd123ec4b7c89f3aae531a  astcenc-4.0.0-linux-x64.zip
9697d0ad63910cd4f1fd8be51203e9c1873aa6d219ac70540ad0f37e19dd3e63  astcenc-4.0.0-macos-aarch64.zip
26866ad79d80a6153f1746516004e5a95df07c55c6866e44ed63bb7953c7532e  astcenc-4.0.0-macos-x64.zip
b83a7d0c88d7e720828bf91655ad4acccee95b2d8086835ef26879b905777172  astcenc-4.0.0-windows-x64.zip

3.7

28 Apr 22:01

The 3.7 release contains another round of performance optimizations, including significant improvements to the command line front-end (faster PNG loader) and the arm64 build of the codec (faster NEON implementation).

Change log

General:

  • Feature: The command line tool PNG loader has been switched to use the Wuffs library, which is significantly faster than the previous implementation.
  • Feature: Support for non-invariant builds returns. Opt-in to slightly faster, but not bit-exact, builds by setting -DNO_INVARIANCE=ON for the CMake configuration. This improves performance by around 2%.
  • Optimization: Changed SIMD select() so that it matches the default NEON behavior (bitwise select), rather than the default x86-64 behavior (lane select on MSB). Specialization select_msb() added for the one case we want to select on a sign-bit, where NEON needs a different implementation. This provides a significant (>25%) performance uplift on NEON implementations.
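
The difference between the two select behaviors can be modelled on plain integers. A minimal sketch that treats each list element as one 32-bit lane; the argument order (lanes come from b where the condition is set, else from a) is an assumption made for this illustration, and the functions are Python models, not the codec's SIMD wrappers.

```python
LANE_MASK = 0xFFFFFFFF  # each list element models one 32-bit SIMD lane


def select_bitwise(a, b, cond):
    """NEON-style select: every set bit in cond takes the bit from b, else from a."""
    return [((x & ~c) | (y & c)) & LANE_MASK for x, y, c in zip(a, b, cond)]


def select_msb(a, b, cond):
    """x86-style select: the whole lane comes from b if cond's top bit is set."""
    return [y if (c >> 31) & 1 else x for x, y, c in zip(a, b, cond)]


a = [0x11111111, 0x22222222]
b = [0xAAAAAAAA, 0xBBBBBBBB]
cond = [0xFFFFFFFF, 0x80000000]  # lane 0: full mask, lane 1: sign bit only

print([hex(v) for v in select_bitwise(a, b, cond)])  # ['0xaaaaaaaa', '0xa2222222']
print([hex(v) for v in select_msb(a, b, cond)])      # ['0xaaaaaaaa', '0xbbbbbbbb']
```

The two variants agree whenever the condition lane is all ones or all zeros, which is what comparison operations normally produce. They differ only when the condition is a raw sign bit, which is the case the note above says select_msb() exists for.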

Binary release sha256 checksums

f69c2acbb3b07386cc95001c253cddfa567e71b9618682856f0ff600955cc2ba  astcenc-3.7-linux-x64.zip
41c691613e15d844bac56e97a042cc8aea7bc7e8e76fc767ffe875a4d8c5e995  astcenc-3.7-macos-aarch64.zip
5608f4c0b3e1d56a30070cdf61aff42050576833ac7e2719031cec461aa4d102  astcenc-3.7-macos-x64.zip
ecb0e1a5dcbfbaca8a38630e427638380b9d337c266660b39738260e1df5244a  astcenc-3.7-windows-x64.zip

3.6

10 Apr 18:21

The 3.6 release contains another round of performance optimizations. There are no interface changes in this release, but in general the API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.

Change log

General:

  • Feature: Compressor configurations no longer require the SELF_DECOMPRESS_ONLY flag to use defragmented data tables. Tables are always sorted to store active data table entries at the start of the list, with inactive entries stored at the end. If the SELF_DECOMPRESS_ONLY flag is specified the inactive entries are not created, reducing context creation time and memory footprint.
  • Feature: Image quality for 4x4 -fastest compression has been improved.
  • Optimization: Decimation modes are reliably excluded from processing when they are only partially selected in the compressor configuration (e.g. if used for single plane, but not dual plane modes). This is a significant performance optimization for all quality levels.
  • Optimization: Fast-path block load function variant added for 2D LDR images with no swizzle. This is a moderate performance optimization for the -fast and -fastest quality levels.
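
The defragmented table layout described in the list above can be illustrated with a simple partition of entries. A rough sketch only: the function name, the is_active predicate, and the returned active count are hypothetical and do not reflect the codec's internal data structures.

```python
def build_table(entries, is_active, self_decompress_only=False):
    """Return (table, active_count) with active entries packed at the front.

    Assumption for illustration: when self_decompress_only is set, inactive
    entries are never created, so the table holds the active entries only.
    """
    active = [e for e in entries if is_active(e)]
    if self_decompress_only:
        return active, len(active)
    inactive = [e for e in entries if not is_active(e)]
    return active + inactive, len(active)


table, count = build_table(range(6), lambda e: e % 2 == 0)
print(table, count)  # [0, 2, 4, 1, 3, 5] 3
```

A compressor loop then only needs to walk the first active_count entries, which is why packing them at the front helps even when the inactive entries are still present.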

Binary release sha256 checksums

cd0392bccb64da8177f3e3869c5ac37aea6d096cbbf804ecc0b62b3581a2b56b  astcenc-3.6-linux-x64.zip
a04c3e930ba7397fb0c33fa6a70bb64ff798cd42a031124164a164f81634930c  astcenc-3.6-macos-aarch64.zip
b51f7f53c7c113f708666dca11dcaec545c535c081bc9b27030cf2eee73a279a  astcenc-3.6-macos-x64.zip
2a9c1edf98e1d28d542fb728c88f9afc03c9de18e5662b348e44be3f1a227059  astcenc-3.6-windows-x64.zip

3.5

18 Mar 09:24

The 3.5 release contains another round of performance optimizations. There are no interface changes in this release, but in general the API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.

Change log

General:

  • Feature: Compressor configurations using the SELF_DECOMPRESS_ONLY flag store compacted partition tables, which significantly improves both context creation time and compressor runtime performance.
  • Feature: Bilinear infill for decimated weight grids supports a new variant for half-decimated grids which are only decimated in one axis, which gives a small performance boost across all quality levels.

Binary release sha256 checksums

08e4dc8e263c0dcb0dc8d9d63b194ff881b8737de8f554458b2d66935da9c4fe  astcenc-3.5-linux-x64.zip
ced2d434cbade742ef1f3879f90918e983381ddec1cea794ea30e6d4b2fa9573  astcenc-3.5-macos-aarch64.zip
806d86f69d84d4c30d0e6f8ca7cfe222005e51ca86684e265dd5aa4b68f9ddcf  astcenc-3.5-macos-x64.zip
b531b89a677a41535f2704d50677904ae7ffefe83b403dfbbf6bfe6b028d573d  astcenc-3.5-windows-x64.zip

3.4

27 Feb 09:33

Status: Released

The 3.4 release introduces another round of optimizations, removing a number of power-user configuration options to simplify the core compressor data path. It is expected that there is some minor loss of image quality at the same compressor effort setting, but the improved performance allows a higher compressor effort level to be used for the same runtime as earlier releases.

Reminder for users of the library interface - the API is not designed to be binary compatible across versions, and this release is not compatible with earlier releases. Please update and rebuild your client-side code using the updated astcenc.h header.

  • General:
    • Feature: Many memory allocations have been moved off the stack into dynamically allocated working memory. This significantly reduces the peak stack usage, allowing the compressor to run in systems with 128KB stack limits.
    • Feature: Builds now support -DBLOCK_MAX_TEXELS=<count> to allow a compressor to support a subset of block sizes. This can reduce binary size and runtime memory footprint, and improve performance.
    • Feature: The -v and -va options to set a per-texel error weight function are no longer supported.
    • Feature: The -b option to set a per-texel error weight boost for block border texels is no longer supported.
    • Feature: The -a option to set a per-texel error weight based on texel alpha value is no longer supported as an error weighting tool, but is still supported for providing sprite-sheet RDO.
    • Feature: The -mask option to set an error metric for mask map textures is still supported, but is currently a no-op in the compressor.
    • Feature: The -perceptual option to set a perceptual error metric is still supported, but is currently a no-op in the compressor for mask map and normal map textures.
    • Bug-fix: Corrected decompression of error blocks in some cases, so now returning the expected error color (magenta for LDR, NaN for HDR). Note that astcenc determines the error color to use based on the output image data type not the decoder profile.
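
The error color rule in the bug-fix above depends on the output data type rather than the decoder profile. A minimal sketch, assuming 8-bit output is treated as LDR and float output as HDR; the type names, the function name, and the opaque alpha of 255 for magenta are assumptions for illustration.

```python
import math


def error_texel(output_type):
    """Return the RGBA texel written for an error block, keyed on output type only."""
    if output_type == "u8":
        # LDR output: magenta (alpha value is an assumption).
        return (255, 0, 255, 255)
    if output_type in ("f16", "f32"):
        # HDR output: NaN in every component.
        nan = float("nan")
        return (nan, nan, nan, nan)
    raise ValueError(f"unknown output type: {output_type}")


print(error_texel("u8"))                                  # (255, 0, 255, 255)
print(all(math.isnan(v) for v in error_texel("f32")))     # True
```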

Binary release sha256 checksums

Note: Due to a delay publishing the binaries, caused by a DevOps pipeline upgrade, these binaries have actually been built from a slightly newer git hash (c82259f) than the 3.4 tag. This includes a number of performance optimizations which didn't land in time for the 3.4 release tag, as well as a change to use ClangCL to compile the Windows builds.

ceb268d8eb393281bf5646450c17f64c7fbfeddad7f02c52c28ab1b74f129fce  astcenc-3.4-linux-x64.zip
0619c92b1ba3c24f209c7affb21b56d29aeaef6bdbb1a73793cad560853e53c9  astcenc-3.4-macos-aarch64.zip
faf389fa734855fdf5c070920ad682c083d354d0b8e5310a6d4134cea1921485  astcenc-3.4-macos-x64.zip
d85c89f6a6251ccda71b3aace66502a7b54668b8bc23632180bfc48cebeb74de  astcenc-3.4-windows-x64.zip

3.3

08 Nov 14:59

Status: Released

The 3.3 release improves image quality for normal maps, and two component textures. Normal maps are expected to compress 25% slower than the 3.2 release, although it should be noted that they are still faster to compress in 3.3 than when using the 2.5 series. This release also fixes one reported stability issue.

  • General:
    • Feature: Normal map image quality has been improved.
    • Feature: Two component image quality has been improved, provided that unused components are correctly zero-weighted using e.g. -cw on the command line.
    • Bug-fix: Improved stability when trying to compress complex blocks that could not beat even the starting quality threshold. These will now always compress into constant color blocks.

Binary release sha256 checksums

d686330476b104b94ff74eadc79b1778bf36aa42eb6f3904dca36af05a4c8c87  astcenc-3.3-linux-x64.zip
0b9eb51e75a433e68b5f8659d95b6e315b12a2263617beb2bca4cc5fbd396c00  astcenc-3.3-macos-aarch64.zip
c7f150dc54bd26f177cc94dd9531c2f9ec94f419047d943be8822d9d62e2e65e  astcenc-3.3-macos-x64.zip
71644196c3e8a9205844fbf407e9f41df1911c787ac055b664307b60c3e9155d  astcenc-3.3-windows-x64.zip

3.2

22 Aug 14:54

Status: Released

The 3.2 release is the third release in the 3.x series. This release is a maintenance release, fixing two issues in the 3.1 release.

  • General:
    • Bug fix: Multi-context usage of the library could result in poor quality block encodings or decompressed images if a new context was allocated while an image was being compressed or decompressed using another context.
    • Bug fix: Invalid block encodings that could not be encoded in the available bitrate are more consistently rejected during decompression.

Binary release sha256 checksums

a5de27ffd291bb2ae4d1ecd68e563ddfd60d1ead6022abe46d34ef141968ce6b  astcenc-3.2-linuxx64.zip
2f4dee7aa61e9b0728a5f558520c117cd9107753ef5d8a2bbab502761dc21417  astcenc-3.2-macosaarch64.zip
0a76024c92e6205cad661481551051a84673e5dda1d0d83c6d0541d964b06a95  astcenc-3.2-macosx64.zip
1e6b526841f5f721a6c154df18ed5e0493e34539b3149fe1ad46f54e08d6bdf7  astcenc-3.2-windowsx64.zip

3.1

22 Jul 19:49
c616ca4

Status: Released

The 3.1 release is the second release in the 3.x series. This release gives another performance boost, typically between 5 and 25% faster than the 3.0 release, as well as further incremental improvements to image quality. A number of build system improvements make astcenc easier and faster to integrate into other projects as a library, including support for building universal binaries on macOS. The full change list is shown below.

Reminder for users of the library interface - the API is not designed to be binary compatible across versions, and this release is not compatible with earlier releases. Please update and rebuild your client-side code using the updated astcenc.h header.

  • General:
    • Feature: RGB color data now supports -perceptual operation. The current implementation is simple, weighting color channel errors by their contribution to perceived luminance. This mimics the behavior of the human visual system, which is most sensitive to green, then red, then blue.
    • Feature: Codec supports a new low weight search mode, which is a simpler weight assignment for encodings with a low number of weights in the weight grid. The weight threshold can be overridden using the new -lowweightmodelimit command line option.
    • Feature: All platform builds now support building a native binary. Native binaries automatically select the SIMD level based on the default configuration of the compiler in use. Native binaries built on one machine may use different SIMD options than native binaries built on another.
    • Feature: macOS platform builds now support building universal binaries containing both x86_64 and arm64 target support.
    • Feature: Building the command line can be disabled when using as a library in another project. Set -DCLI=OFF during the CMake configure step.
    • Feature: A standalone minimal example of the core codec API usage has been added in the ./Utils/Example/ directory.
  • Core API:
    • Feature: Config flag ASTCENC_FLG_USE_PERCEPTUAL works for color data.
    • Feature: Config option tune_low_weight_count_limit added.
    • Feature: New heuristic added which prunes dual weight plane searches if they are unlikely to help. This heuristic is not user controllable.
    • Feature: Image quality has been improved. In general we see significant improvements (up to 0.2dB) for high bitrate encodings (4x4, 5x4), and a smaller improvement (up to 0.1dB) for lower bitrate encodings.
    • Bug fix: Arm "none" SIMD builds could fail to be invariant with other builds. This fix has also been back-ported to the 2.x LTS branch.
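
The -perceptual weighting for RGB data described in the list above can be sketched as a luminance-weighted squared error. The release notes do not state the weights used, so the Rec. 709 luma coefficients below are an assumption chosen only because they follow the same green, then red, then blue ordering; the function name is hypothetical.

```python
# Assumption: Rec. 709 luma coefficients stand in for the codec's real weights.
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)  # red, green, blue


def perceptual_squared_error(original, encoded, weights=LUMA_WEIGHTS):
    """Sum per-channel squared errors, each scaled by its luminance contribution."""
    return sum(w * (o - e) ** 2 for w, o, e in zip(weights, original, encoded))


base = (100, 100, 100)
err_r = perceptual_squared_error(base, (110, 100, 100))
err_g = perceptual_squared_error(base, (100, 110, 100))
err_b = perceptual_squared_error(base, (100, 100, 110))

# The same numeric error costs most in green, then red, then blue.
print(err_g > err_r > err_b)  # True
```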

Performance:

This release includes further performance optimizations which improve performance vs the 3.0 release by between 5% and 25%, depending on image and quality search preset used. High bitrate (e.g. 4x4) block sizes benefit the most for -fast searches, medium bitrate (e.g. 6x6) block sizes benefit the most for -medium and -thorough searches.

Image quality:

There are small image quality improvements of between 0.05-0.1dB for most bit rates, but larger improvements of up to 0.2dB for high-bitrate encodings (e.g. 4x4) for -fast and -medium searches.

Binary release sha256 checksums

101d013f2c5f7304be45654222ad7b31e616d1805cfb5a81fac291382f62f361  astcenc-3.1-linuxx64.zip
72c9a9c5fd2fa8eb2875e07900d7fc8fbb1ac1c657ca83705fa51262a918a19d  astcenc-3.1-macosaarch64.zip
8710c23bf71d48077cb84706166fd75402b4114d0f767e637ab36499149a42e1  astcenc-3.1-macosx64.zip
d453fc8af1d971da0a375268fff6fed53c174bd154030a7188eb9046e72340f5  astcenc-3.1-windowsx64.zip

3.0

03 Jun 07:47

Status: Released

The 3.0 release is the first in a series of updates to the compressor that are making more radical changes than we felt we could make with the 2.x series. The primary goals of the 3.x series are to keep the image quality the same or better compared to the 2.5 release, but continue to improve performance.

Reminder for users of the library interface - the API is not designed to be binary compatible across versions, and this release is not compatible with earlier releases. Please update and rebuild your client-side code using the updated astcenc.h header.

  • General:
    • Feature: The code has been significantly cleaned up, with improved comments, API documentation, function naming, and variable naming.
  • Core API:
    • API Change: The core APIs for astcenc_compress_image() and for astcenc_decompress_image() now accept swizzle structures by const pointer, instead of pass-by-value.
    • API Change: Calling the astcenc_compress_reset() and the astcenc_decompress_reset() functions between images is no longer required if the context was created for use by a single thread.
    • Feature: New heuristics have been added for controlling when to search beyond 2 partitions and 1 plane, and when to search beyond 3 partitions and 1 plane. The previous tune_partition_early_out_limit config option has been removed, and replaced with two new options tune_2_partition_early_out_limit_factor and tune_3_partition_early_out_limit_factor. See command line help for more detailed documentation.
    • Feature: New heuristics have been added for controlling when to use dual weight planes. The previous tune_two_plane_early_out_limit has been renamed to tune_2_plane_early_out_limit_correlation. See command line help for more detailed documentation.
    • Feature: Support for using dual weight planes has been restricted to single partition blocks; it rarely helps blocks with 2 or more partitions and takes considerable compression search time.

Performance:

This release includes further performance optimizations which improve performance vs the 2.5 release by between 25% and 75%, depending on image and quality search preset used. Smaller block sizes and higher search qualities benefit the most.

Image quality:

The -medium and -fast presets have been tuned to give measurably better image quality. Despite this they are still faster than the equivalent in the 2.5 release.

Binary release sha256 checksums

663f67a2eb85c4eb539857534f32d828aecff770dfb2fe35f2355996cbdf2bdd  astcenc-3.0-linux-x64.zip
97ee6fc61a2c203132ad91c5f065a9ead39b6cf38e5530bebba463492a05449b  astcenc-3.0-macos-aarch64.zip
006d4b14c9914b9793a1843683f29b42fb22cfc17fb74a5bc8450bba09ff119b  astcenc-3.0-macos-x64.zip
40e4f87920c722e5ddd59635a91b651c7e58c352b62864518f52d7e71556b051  astcenc-3.0-windows-x64.zip