Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
[unreleased] - YYYY-MM-DD¶
[unreleased] - Added¶
[unreleased] - Removed¶
[unreleased] - Changed¶
Set _DeviceDtypeModuleMixin._device from torch’s default device function (#21164)
[unreleased] - Fixed¶
[2.5.5] - 2025-09-05¶
[2.5.5] - Changed¶
[2.5.5] - Fixed¶
[2.5.4] - 2025-08-29¶
[2.5.4] - Changed¶
Added support for NVIDIA H200 GPUs in get_available_flops (#21119)
[2.5.3] - 2025-08-13¶
[2.5.3] - Changed¶
[2.5.3] - Fixed¶
[2.5.2] - 2025-03-20¶
[2.5.2] - Changed¶
Ensure correct device is used for autocast when mps is selected as Fabric accelerator (#20876)
[2.5.2] - Fixed¶
Fixed TransformerEnginePrecision conversion for layers with bias=False (#20805)
[2.5.1] - 2025-03-18¶
[2.5.1] - Changed¶
Added logging support for list of dicts without collapsing to a single key (#19957)
[2.5.1] - Removed¶
Removed legacy support for lightning run model; use fabric run instead (#20588)
[2.5.0] - 2024-12-19¶
[2.5.0] - Added¶
Added step parameter to TensorBoardLogger.log_hyperparams to visualize changes during training (#20176)
Added timeout to DeepSpeedStrategy (#20474)
Added FP8 + FSDP2 + torch.compile examples for Fabric (#20440)
Added RTX 4080 super to chips dictionary (#20285)
Added device property to lazy load functionality (#20183)
Added ddp_find_unused_parameters_true alias in Fabric’s DDPStrategy (#20125)
[2.5.0] - Changed¶
[2.5.0] - Fixed¶
Fixed use of convert_module in FSDP to avoid using more memory than necessary during initialization (#20323)
[2.4.0] - 2024-08-06¶
[2.4.0] - Added¶
[2.4.0] - Changed¶
[2.4.0] - Removed¶
[2.4.0] - Fixed¶
[2.3.0] - 2024-06-13¶
[2.3.0] - Added¶
Added sanitization for classes before logging them as hyperparameters (#19771)
Enabled consolidating distributed checkpoints through fabric consolidate in the new CLI (#19560)
Added the ability to explicitly mark forward methods in Fabric via _FabricModule.mark_forward_method() (#19690)
Added support for PyTorch 2.3 (#19708)
Added ModelParallelStrategy to support 2D parallelism (#19846, #19852, #19870, #19872)
Added a call to torch.distributed.destroy_process_group in atexit handler if process group needs destruction (#19931)
Added support for configuring hybrid-sharding by passing a tuple for the FSDPStrategy(device_mesh=...) argument (#19504)
[2.3.0] - Changed¶
The Fabric.rank_zero_first context manager now uses a barrier without timeout to avoid interrupting long-running tasks (#19448)
Fabric now raises an error if you forget to call fabric.backward() when it is needed by the strategy or precision selection (#19447, #19493)
_BackwardSyncControl can now control what to do when gradient accumulation is disabled (#19577)
[2.3.0] - Removed¶
Removed support for PyTorch 1.13 (#19706)
[2.3.0] - Fixed¶
Fixed a matrix shape mismatch issue when running a model loaded from a quantized checkpoint (bitsandbytes) (#19886)
[2.2.2] - 2024-04-11¶
[2.2.2] - Fixed¶
[2.2.1] - 2024-03-04¶
[2.2.1] - Fixed¶
Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually (#19446)
[2.2.0] - 2024-02-08¶
[2.2.0] - Added¶
Added lightning.fabric.utilities.ThroughputMonitor and lightning.fabric.utilities.Throughput to track throughput and log it (#18848)
Added lightning.fabric.utilities.AttributeDict for convenient dict-attribute access to represent state in script (#18943)
Added support for meta-device initialization and materialization of 4-bit Bitsandbytes layers (#19150)
Added TransformerEnginePrecision(fallback_compute_dtype=) to control the dtype of operations that don’t support fp8 (#19082)
Added support for clipping gradients by value with FSDP (#19236)
Added a utility function and CLI to consolidate FSDP sharded checkpoints into a single file (#19213)
Added support for re-compiling the model inside Fabric.setup() over the FSDP/DDP wrappers (#19280)
[2.2.0] - Changed¶
seed_everything() without passing in a seed no longer randomly selects a seed, and now defaults to 0 (#18846)
Changed the TransformerEnginePrecision(dtype=) argument to weights_dtype and made it required (#19082)
The columns in the metrics.csv file produced by CSVLogger are now sorted alphabetically (#19159)
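To illustrate what alphabetically sorted columns mean for a metrics file, here is a minimal standard-library sketch. The helper write_metrics_csv and the sample metric names are hypothetical and are not taken from CSVLogger's implementation; the sketch only assumes that the column set is the union of all logged metric names.

```python
import csv
import io


def write_metrics_csv(rows):
    """Return CSV text for the given metric rows, with columns sorted alphabetically.

    Hypothetical helper for illustration only; not CSVLogger's actual code.
    """
    # Collect every metric name that appears in any row, then sort the names.
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Metrics missing from a row are written as empty cells.
        writer.writerow(row)
    return buffer.getvalue()


rows = [{"step": 0, "loss": 0.9}, {"step": 1, "loss": 0.7, "acc": 0.5}]
csv_text = write_metrics_csv(rows)
header = csv_text.splitlines()[0]
print(header)
```

With this ordering, a metric that first appears later in training (acc above) takes its alphabetical position in the header instead of being appended at the end, so the column order does not depend on when each metric was first logged.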
[2.2.0] - Removed¶
Removed support for PyTorch 1.12 (#19300)
[2.2.0] - Fixed¶
[2.1.4] - 2024-01-31¶
[2.1.4] - Fixed¶
[2.1.3] - 2023-12-21¶
[2.1.3] - Fixed¶
[2.1.2] - 2023-11-15¶
[2.1.2] - Fixed¶
Fixed precision default from environment (#18928)
[2.1.1] - 2023-11-06¶
[2.1.1] - Changed¶
Calling a method other than forward that invokes submodules is now an error when the model is wrapped (e.g., with DDP) (#18819)
[2.1.1] - Fixed¶
[2.1.0] - 2023-10-11¶
[2.1.0] - Added¶
Added support for the TPU-v4 architecture (#17227)
Added support for XLA’s new PJRT runtime (#17352)
Added support for Fully Sharded Data Parallel (FSDP) training with XLA (#18126, #18424, #18430)
Check for invalid TPU device inputs (#17227)
Added XLAStrategy(sync_module_states=bool) to control whether to broadcast the parameters to all devices (#17522)
Added support for joint setup of model and optimizer with FSDP (#17305)
Added support for handling multiple parameter groups in optimizers set up with FSDP (#17305)
Added support for saving and loading sharded model and optimizer state with FSDPStrategy (#17323)
Added a warning when calling methods on _FabricModule that bypass the strategy-specific wrappers (#17424)
Added Fabric.init_tensor() context manager to instantiate tensors efficiently directly on device and dtype (#17488)
Added Fabric.init_module() context manager to instantiate large models efficiently directly on device, dtype, and with sharding support (#17462)
    Creates the model parameters in the desired dtype (torch.float32, torch.float64, torch.float16, or torch.bfloat16) depending on the ‘true’ precision choice in Fabric(precision='32-true'|'64-true'|'16-true'|'bf16-true')
    Handles initialization for FSDP models before wrapping and the Zero stage 3 initialization for DeepSpeed before sharding
Added support for empty weight initialization with Fabric.init_module(empty_init=True) for checkpoint loading (#17627)
Added support for meta-device initialization with Fabric.init_module(empty_init=True) in FSDP (#18122)
Added lightning.fabric.plugins.Precision.module_init_context() and lightning.fabric.strategies.Strategy.module_init_context() context managers to control model and tensor instantiation (#17462)
Added lightning.fabric.strategies.Strategy.tensor_init_context() context manager to instantiate tensors efficiently directly on device and dtype (#17607)
Run the DDP wrapper in a CUDA stream (#17334)
Added support for true half-precision as Fabric(precision="16-true"|"bf16-true") (#17287)
Added support for mixed 8-bit precision as Fabric(precision="transformer-engine") using NVIDIA’s Transformer Engine (#17597)
Added support for linear layer quantization with Fabric(plugins=BitsandbytesPrecision()) using bitsandbytes (#18655)
Added error messaging for missed .launch() when it is required (#17570)
Added support for saving checkpoints with either full state-dict or sharded state dict via FSDPStrategy(state_dict_type="full"|"sharded") (#17526)
Added support for loading a full-state checkpoint file into a sharded model (#17623)
Added support for calling hooks on a LightningModule via Fabric.call (#17874)
Added the parameter Fabric.load(..., strict=True|False) to enable non-strict loading of partial checkpoint state (#17645)
Added the parameter Fabric.save(..., filter=...) to enable saving a partial checkpoint state (#17845)
Added support for loading optimizer states from a full-state checkpoint file (#17747)
Automatically call xla_model.mark_step() before saving checkpoints with XLA (#17882)
Automatically call xla_model.mark_step() after optimizer.step() with XLA (#17883)
Added support for all half-precision modes in FSDP precision plugin (#17807)
Added FSDPStrategy(activation_checkpointing_policy=...) to customize the layer policy for automatic activation checkpointing (requires torch>=2.1) (#18045)
Added a callback for spike-detection (#18014)
Added the ability to set the torch.distributed.fsdp.ShardingStrategy via string in FSDPStrategy (#18087)
Improved error messages when attempting to load a DeepSpeed checkpoint at an invalid path (#17795)
Added Fabric.load_raw() for loading raw PyTorch state dict checkpoints for model or optimizer objects (#18049)
Allowed accessing rank information in the main process before processes are launched when using the XLAStrategy (#18194)
Added automatic process cleanup to avoid zombie child processes and stalls when exceptions are raised (#18218)
Added validation of user input for devices and num_nodes when running with SLURM or TorchElastic (#18292)
Improved the error messaging and instructions when handling custom batch samplers in distributed settings (#18402)
Added support for saving and loading stateful objects other than modules and optimizers (#18513)
Enabled the default process group configuration for FSDP’s hybrid sharding (#18583)
Added lightning.fabric.utilities.suggested_max_num_workers to assist with setting a good value in distributed settings (#18591)
Added lightning.fabric.utilities.is_shared_filesystem utility function to automatically check whether the filesystem is shared between machines (#18586)
Removed support for PyTorch 1.11 (#18691)
Added support for passing the argument .load_state_dict(..., assign=True|False) on Fabric-wrapped modules in PyTorch 2.1 or newer (#18690)
[2.1.0] - Changed¶
Allow using iterable-style datasets with TPUs (#17331)
Increased the minimum XLA requirement to 1.13 (#17368)
Fabric argument validation now only raises an error if conflicting settings are set through the CLI (#17679)
DataLoader re-instantiation is now only performed when a distributed sampler is required (#18191)
Improved the formatting of emitted warnings (#18288)
Broadcast and reduction of tensors with XLA-based strategies now preserve the input’s device (#18275)
Due to lack of reliability, Fabric now only runs on one GPU instead of all GPUs in a Jupyter notebook if devices="auto" (default) (#18291)
Enabled launching via torchrun in a SLURM environment; the TorchElasticEnvironment now gets chosen over the SLURMEnvironment if both are detected (#18618)
If not set by the user, Lightning will set OMP_NUM_THREADS to num_cpus / num_processes when launching subprocesses (e.g. when DDP is used) to avoid system overload for CPU-intensive tasks (#18677)
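The OMP_NUM_THREADS default described above can be sketched as follows. This is an illustration under stated assumptions, not Lightning's implementation: the helper names are hypothetical, and the rounding (integer division with a floor of one thread) is an assumption, since the entry only gives the ratio num_cpus / num_processes.

```python
import os


def default_omp_num_threads(num_cpus, num_processes):
    """Split the available CPUs evenly across the launched processes."""
    # Assumption: integer division, with at least one thread per process.
    return max(1, num_cpus // num_processes)


def apply_omp_default(env, num_cpus, num_processes):
    """Set OMP_NUM_THREADS in the given environment mapping only if it is unset."""
    if "OMP_NUM_THREADS" not in env:
        env["OMP_NUM_THREADS"] = str(default_omp_num_threads(num_cpus, num_processes))
    return env


# No user setting: 16 CPUs shared by 4 processes.
env = apply_omp_default({}, num_cpus=16, num_processes=4)

# A value set by the user is left untouched.
user_env = apply_omp_default({"OMP_NUM_THREADS": "2"}, num_cpus=16, num_processes=4)

# Building an environment for a child process from the current one.
child_env = apply_omp_default(dict(os.environ), os.cpu_count() or 1, 4)
```

Users who want a different value can export OMP_NUM_THREADS before launching, which takes precedence over the computed default.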
[2.1.0] - Deprecated¶
Deprecated the DDPStrategy.is_distributed property. This strategy is distributed by definition (#17381)
Deprecated the SingleTPUStrategy (strategy="single_tpu") in favor of SingleDeviceXLAStrategy (strategy="single_xla") (#17383)
Deprecated the TPUAccelerator in favor of XLAAccelerator (#17383)
Deprecated the TPUPrecision in favor of XLAPrecision (#17383)
Deprecated the TPUBf16Precision in favor of XLABf16Precision (#17383)
[2.1.0] - Removed¶
Removed automatic sharding support with Fabric.run or using fabric.launch(fn). This only impacts FSDP and DeepSpeed strategy users. Please instantiate your module under the newly added fabric.init_module context manager (#17832)
Removed the unsupported checkpoint_io argument from the FSDPStrategy (#18192)
[2.1.0] - Fixed¶
Fixed issue where running on TPUs would select the wrong device index (#17227)
Removed the need to call .launch() when using the DP-strategy (strategy="dp") (#17931)
Fixed FSDP re-applying activation checkpointing when the user had manually applied it already (#18006)
Fixed FSDP re-wrapping the module root when the user had manually wrapped the model (#18054)
Fixed issue where unexpected exceptions would leave the default torch dtype modified when using true precision settings (#18500)
Fixed redundant input-type casting in FSDP precision (#18630)
Fixed an issue with find_usable_cuda_devices(0) incorrectly returning a list of devices (#18722)
Fixed redundant file writes in CSVLogger (#18567)
[2.0.9] - 2023-09-14¶
[2.0.9] - Fixed¶
Fixed an issue causing the _FabricOptimizer.state to remain outdated after loading with load_state_dict (#18488)
[2.0.8] - 2023-08-29¶
[2.0.8] - Changed¶
On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
[2.0.8] - Fixed¶
Fixed model parameters getting shared between processes when running with strategy="ddp_spawn" and accelerator="cpu"; this has a necessary memory impact, as parameters are replicated for each process now (#18238)
Removed false positive warning when using fabric.no_backward_sync with XLA strategies (#17761)
Fixed issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)
Fixed FSDP full-precision param_dtype training (16-mixed, bf16-mixed and 32-true configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
[2.0.7] - 2023-08-14¶
[2.0.7] - Changed¶
Disabled the auto-detection of the Kubeflow environment (#18137)
[2.0.7] - Fixed¶
Fixed issue where DDP subprocesses that used Hydra would set hydra’s working directory to current directory (#18145)
Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177)
Fixed an issue with Fabric.all_reduce() not performing an in-place operation for all backends consistently (#18235)
[2.0.6] - 2023-07-20¶
[2.0.6] - Fixed¶
Fixed TensorBoardLogger.log_graph not unwrapping the _FabricModule (#17844)
[2.0.5] - 2023-07-07¶
[2.0.5] - Added¶
Added validation against misconfigured device selection when using the DeepSpeed strategy (#17952)
[2.0.5] - Changed¶
Avoid info message when loading 0 entry point callbacks (#17990)
[2.0.5] - Fixed¶
Fixed the emission of a false-positive warning when calling a method on the Fabric-wrapped module that accepts no arguments (#17875)
Fixed check for FSDP’s flat parameters in all parameter groups (#17914)
Fixed automatic step tracking in Fabric’s CSVLogger (#17942)
Fixed an issue causing the torch.set_float32_matmul_precision info message to show multiple times (#17960)
Fixed loading model state when Fabric.load() is called after Fabric.setup() (#17997)
[2.0.4] - 2023-06-22¶
[2.0.4] - Fixed¶
[2.0.3] - 2023-06-07¶
Added support for Callback registration through entry points (#17756)
[2.0.3] - Changed¶
[2.0.3] - Fixed¶
[2.0.2] - 2023-04-24¶
[2.0.2] - Changed¶
Enabled precision autocast for LightningModule step methods in Fabric (#17439)
[2.0.2] - Fixed¶
[2.0.1] - 2023-03-30¶
[2.0.1] - Changed¶
Generalized Optimizer validation to accommodate both FSDP 1.x and 2.x (#16733)
[2.0.0] - 2023-03-15¶
[2.0.0] - Added¶
Added Fabric.all_reduce (#16459)
Added support for saving and loading DeepSpeed checkpoints through Fabric.save/load() (#16452)
Added support for automatically calling set_epoch on the dataloader.batch_sampler.sampler (#16841)
Added support for writing logs to remote file systems with the CSVLogger (#16880)
Added support for frozen dataclasses in the optimizer state (#16656)
Added lightning.fabric.is_wrapped to check whether a module, optimizer, or dataloader was already wrapped by Fabric (#16953)
[2.0.0] - Changed¶
Fabric now chooses accelerator="auto", strategy="auto", devices="auto" as defaults (#16842)
Checkpoint saving and loading redesign (#16434)
    Changed the method signature of Fabric.save and Fabric.load
    Changed the method signature of Strategy.save_checkpoint and Fabric.load_checkpoint
    Fabric.save accepts a state that can contain model and optimizer references
    Fabric.load can now load state in-place onto models and optimizers
    Fabric.load returns a dictionary of objects that weren’t loaded into the state
    Strategy.save_checkpoint and Fabric.load_checkpoint are now responsible for accessing the state of the model and optimizers
DataParallelStrategy.get_module_state_dict() and DDPStrategy.get_module_state_dict() now correctly extract the state dict without keys prefixed with ‘module’ (#16487)
“Native” suffix removal (#16490)
    strategy="fsdp_full_shard_offload" is now strategy="fsdp_cpu_offload"
    lightning.fabric.plugins.precision.native_amp is now lightning.fabric.plugins.precision.amp
Enabled all shorthand strategy names that can be supported in the CLI (#16485)
Renamed strategy='tpu_spawn' to strategy='xla' and strategy='tpu_spawn_debug' to strategy='xla_debug' (#16781)
Changed arguments for precision settings (from [64|32|16|bf16] to ["64-true"|"32-true"|"16-mixed"|"bf16-mixed"]) (#16767)
The selection Fabric(strategy="ddp_spawn", ...) no longer falls back to “ddp” when a cluster environment gets detected (#16780)
Renamed setup_dataloaders(replace_sampler=...) to setup_dataloaders(use_distributed_sampler=...) (#16829)
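For scripts written against 1.9, the renames in this section can be collected into a small lookup. The following is a hypothetical migration sketch, not part of Lightning: the helper migrate_fabric_kwargs is invented for illustration, and the old-to-new precision pairing assumes the two lists in the entry above correspond position by position.

```python
# Old -> new precision values, paired in the order the changelog lists them (assumption).
PRECISION_RENAMES = {
    "64": "64-true",
    "32": "32-true",
    "16": "16-mixed",
    "bf16": "bf16-mixed",
}

# Old -> new strategy names, as stated in the 2.0.0 "Changed" entries.
STRATEGY_RENAMES = {
    "tpu_spawn": "xla",
    "tpu_spawn_debug": "xla_debug",
    "fsdp_full_shard_offload": "fsdp_cpu_offload",
}


def migrate_fabric_kwargs(kwargs):
    """Return a copy of 1.9-style Fabric keyword arguments using the 2.0 names."""
    migrated = dict(kwargs)
    if "precision" in migrated:
        # Old values could be ints (16) or strings ("bf16"); compare as strings.
        old = str(migrated["precision"])
        migrated["precision"] = PRECISION_RENAMES.get(old, migrated["precision"])
    strategy = migrated.get("strategy")
    if isinstance(strategy, str):
        # Strategy objects and already-valid names pass through unchanged.
        migrated["strategy"] = STRATEGY_RENAMES.get(strategy, strategy)
    return migrated


old_kwargs = {"precision": 16, "strategy": "tpu_spawn", "devices": 8}
new_kwargs = migrate_fabric_kwargs(old_kwargs)
print(new_kwargs)
```

Values that already use the 2.0 spelling, such as precision="bf16-mixed", are returned unchanged, so the helper can be applied more than once without side effects.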
[2.0.0] - Removed¶
[2.0.0] - Fixed¶
[1.9.4] - 2023-03-01¶
[1.9.4] - Added¶
Added Fabric(strategy="auto") support (#16916)
[1.9.4] - Fixed¶
[1.9.3] - 2023-02-21¶
[1.9.3] - Fixed¶
[1.9.2] - 2023-02-15¶
[1.9.2] - Fixed¶
Fixed an attribute error and improved input validation for invalid strategy types being passed to Trainer (#16693)
[1.9.1] - 2023-02-10¶
[1.9.1] - Fixed¶
Fixed error handling for accelerator="mps" and ddp strategy pairing (#16455)
Fixed strict availability check for torch_xla requirement (#16476)
Fixed an issue where PL would wrap DataLoaders with XLA’s MpDeviceLoader more than once (#16571)
Fixed the batch_sampler reference for DataLoaders wrapped with XLA’s MpDeviceLoader (#16571)
Fixed an import error when torch.distributed is not available (#16658)
[1.9.0] - 2023-01-17¶
[1.9.0] - Added¶
Added Fabric.launch() to programmatically launch processes (e.g. in Jupyter notebook) (#14992)
Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the run method (#14992)
Added Fabric.setup_module() and Fabric.setup_optimizers() to support strategies that need to set up the model before an optimizer can be created (#15185)
Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)
Added lightning.fabric.accelerators.find_usable_cuda_devices utility function (#16147)
Added basic support for LightningModules (#16048)
Added support for managing callbacks via Fabric(callbacks=...) and emitting events through Fabric.call() (#16074)
Added Logger support (#16121)
    Added Fabric(loggers=...) to support different Logger frameworks in Fabric
    Added Fabric.log for logging scalars using multiple loggers
    Added Fabric.log_dict for logging a dictionary of multiple metrics at once
    Added Fabric.loggers and Fabric.logger attributes to access the individual logger instances
    Added support for calling self.log and self.log_dict in a LightningModule when using Fabric
    Added access to self.logger and self.loggers in a LightningModule when using Fabric
Added lightning.fabric.loggers.TensorBoardLogger (#16121)
Added lightning.fabric.loggers.CSVLogger (#16346)
Added support for a consistent .zero_grad(set_to_none=...) on the wrapped optimizer regardless of which strategy is used (#16275)
[1.9.0] - Changed¶
The Fabric.run() method is no longer abstract (#14992)
The XLAStrategy now inherits from ParallelStrategy instead of DDPSpawnStrategy (#15838)
Merged the implementation of DDPSpawnStrategy into DDPStrategy and removed DDPSpawnStrategy (#14952)
The dataloader wrapper returned from .setup_dataloaders() now calls .set_epoch() on the distributed sampler if one is used (#16101)
Renamed Strategy.reduce to Strategy.all_reduce in all strategies (#16370)
When using multiple devices, the strategy now defaults to “ddp” instead of “ddp_spawn” when none is set (#16388)
[1.9.0] - Removed¶
Removed support for FairScale’s sharded training (strategy='ddp_sharded'|'ddp_sharded_spawn'). Use Fully-Sharded Data Parallel instead (strategy='fsdp') (#16329)
[1.9.0] - Fixed¶
[1.8.6] - 2022-12-21¶
minor cleaning
[1.8.5] - 2022-12-15¶
minor cleaning
[1.8.4] - 2022-12-08¶
[1.8.4] - Fixed¶
Fixed shuffle=False having no effect when using DDP/DistributedSampler (#15931)
[1.8.3] - 2022-11-22¶
[1.8.3] - Changed¶
Temporarily removed support for Hydra multi-run (#15737)
[1.8.2] - 2022-11-17¶
[1.8.2] - Fixed¶
Fixed the automatic fallback from LightningLite(strategy="ddp_spawn", ...) to LightningLite(strategy="ddp", ...) when on an LSF cluster (#15103)