
Unet for Grad TTS and pipeline #9

Merged
patil-suraj merged 2 commits into main from grad-tts
Jun 15, 2022

Conversation

@patil-suraj

Contributor

No description provided.

@patil-suraj patil-suraj merged commit f7d91f8 into main Jun 15, 2022
PhaneeshB pushed a commit to nod-ai/diffusers that referenced this pull request Mar 1, 2023
williamberman pushed a commit to williamberman/diffusers that referenced this pull request Sep 18, 2023
yiyixuxu pushed a commit that referenced this pull request Jan 21, 2024
sayakpaul added a commit that referenced this pull request Nov 25, 2025
* add vae

* Initial commit for Flux 2 Transformer implementation

* add pipeline part

* small edits to the pipeline and conversion

* update conversion script

* fix

* up up

* finish pipeline

* Remove Flux IP Adapter logic for now

* Remove deprecated 3D id logic

* Remove ControlNet logic for now

* Add link to ViT-22B paper as reference for parallel transformer blocks such as the Flux 2 single stream block

* update pipeline

* Don't use biases for input projs and output AdaNorm

* up

* Remove bias for double stream block text QKV projections

* Add script to convert Flux 2 transformer to diffusers

* make style and make quality

* fix a few things.

* allow sft files to go.

* fix image processor

* fix batch

* style a bit

* Fix some bugs in Flux 2 transformer implementation

* Fix dummy input preparation and fix some test bugs

* fix dtype casting in timestep guidance module.

* resolve conflicts.

* remove ip adapter stuff.

* Fix Flux 2 transformer consistency test

* Fix bug in Flux2TransformerBlock (double stream block)

* Get remaining Flux 2 transformer tests passing

* make style; make quality; make fix-copies

* remove stuff.

* fix type annotation.

* remove unneeded stuff from tests

* tests

* up

* up

* add sf support

* Remove unused IP Adapter and ControlNet logic from transformer (#9)

* copied from

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>

* up

* up

* up

* up

* up

* Refactor Flux2Attention into separate classes for double stream and single stream attention

* Add _supports_qkv_fusion to AttentionModuleMixin to allow subclasses to disable QKV fusion

* Have Flux2ParallelSelfAttention inherit from AttentionModuleMixin with _supports_qkv_fusion=False

* Log debug message when calling fuse_projections on an AttentionModuleMixin subclass that does not support QKV fusion

* Address review comments

* Update src/diffusers/pipelines/flux2/pipeline_flux2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* up

* Remove maybe_allow_in_graph decorators for Flux 2 transformer blocks (#12)

* up

* support ostris loras. (#13)

* up

* update schedule

* up

* up (#17)

* add training scripts (#16)

* add training scripts

Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>

* model cpu offload in validation.

* add flux.2 readme

* add img2img and tests

* cpu offload in log validation

* Apply suggestions from code review

* fix

* up

* fixes

* remove i2i training tests for now.

---------

Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
Co-authored-by: linoytsaban <linoy@huggingface.co>

* up

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-10-53-87-203.ec2.internal>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
Co-authored-by: linoytsaban <linoy@huggingface.co>
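
The `_supports_qkv_fusion` commits above describe an opt-out flag on `AttentionModuleMixin`. A minimal sketch of that pattern in plain Python follows; it is not the diffusers implementation. Only the names `AttentionModuleMixin`, `_supports_qkv_fusion`, `fuse_projections`, and `Flux2ParallelSelfAttention` come from the commit messages. The class bodies, the `fused_projections` attribute, the `DoubleStreamAttention` class, and the log text are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("qkv_fusion_sketch")


class AttentionModuleMixin:
    # Subclasses set this to False to opt out of QKV fusion.
    _supports_qkv_fusion = True

    def __init__(self):
        self.fused_projections = False

    def fuse_projections(self):
        if not self._supports_qkv_fusion:
            # Skip quietly and leave only a debug-level trace.
            logger.debug("%s does not support QKV fusion; skipping.", type(self).__name__)
            return
        # A real module would concatenate the Q/K/V weights here (omitted).
        self.fused_projections = True


class DoubleStreamAttention(AttentionModuleMixin):  # hypothetical class name
    pass


class Flux2ParallelSelfAttention(AttentionModuleMixin):
    _supports_qkv_fusion = False


double, parallel = DoubleStreamAttention(), Flux2ParallelSelfAttention()
double.fuse_projections()
parallel.fuse_projections()
```

With a class-level flag, callers can invoke `fuse_projections` on every attention module without special-casing the ones that opt out.
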
lavinal712 pushed a commit to lavinal712/diffusers that referenced this pull request Dec 23, 2025
dg845 added a commit that referenced this pull request Jan 6, 2026
* remove resolve causality axes stuff.

* remove a bunch of helpers.

* remove adjust output shape helper.

* remove the use of audiolatentshape.

* move normalization and patchify out of pipeline.

* fix

* up

* up

* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
dg845 added a commit that referenced this pull request Jan 8, 2026
* Initial LTX 2.0 transformer implementation

* Add tests for LTX 2 transformer model

* Get LTX 2 transformer tests working

* Rename LTX 2 compile test class to have LTX2

* Remove RoPE debug print statements

* Get LTX 2 transformer compile tests passing

* Fix LTX 2 transformer shape errors

* Initial script to convert LTX 2 transformer to diffusers

* Add more LTX 2 transformer audio arguments

* Allow LTX 2 transformer to be loaded from local path for conversion

* Improve dummy inputs and add test for LTX 2 transformer consistency

* Fix LTX 2 transformer bugs so consistency test passes

* Initial implementation of LTX 2.0 video VAE

* Explicitly specify temporal and spatial VAE scale factors when converting

* Add initial LTX 2.0 video VAE tests

* Add initial LTX 2.0 video VAE tests (part 2)

* Get diffusers implementation on par with official LTX 2.0 video VAE implementation

* Initial LTX 2.0 vocoder implementation

* Use RMSNorm implementation closer to original for LTX 2.0 video VAE

* start audio decoder.

* init registration.

* up

* simplify and clean up

* up

* Initial LTX 2.0 text encoder implementation

* Rough initial LTX 2.0 pipeline implementation

* up

* up

* up

* up

* Add imports for LTX 2.0 Audio VAE

* Conversion script for LTX 2.0 Audio VAE Decoder

* Add Audio VAE logic to T2V pipeline

* Duplicate scheduler for audio latents

* Support num_videos_per_prompt for prompt embeddings

* LTX 2.0 scheduler and full pipeline conversion

* Add script to test full LTX2Pipeline T2V inference

* Fix pipeline return bugs

* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__

* Fix more bugs in LTX2Pipeline.__call__

* Improve CPU offload support

* Fix pipeline audio VAE decoding dtype bug

* Fix video shape error in full pipeline test script

* Get LTX 2 T2V pipeline to produce reasonable outputs

* Make LTX 2.0 scheduler more consistent with original code

* Fix typo when applying scheduler fix in T2V inference script

* Refactor Audio VAE to be simpler and remove helpers (#7)

* remove resolve causality axes stuff.

* remove a bunch of helpers.

* remove adjust output shape helper.

* remove the use of audiolatentshape.

* move normalization and patchify out of pipeline.

* fix

* up

* up

* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Add support for I2V (#8)

* start i2v.

* up

* up

* up

* up

* up

* remove uniform strategy code.

* remove unneeded code.

* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)

* test i2v.

* Move Video and Audio Text Encoder Connectors to Transformer (#12)

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* precompute run_connectors.

* fixes

* Address review comments

* Calculate RoPE double precision freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* Make connectors a separate module (#18)

* remove text_encoder.py

* address yiyi's comments.

* up

* up

* up

* up

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

* up (#19)

* address initial feedback from lightricks team (#16)

* cross_attn_timestep_scale_multiplier to 1000

* implement split rope type.

* up

* propagate rope_type to rope embed classes as well.

* up

* When using split RoPE, make sure that the output dtype is same as input dtype

* Fix apply split RoPE shape error when reshaping x to 4D

* Add export_utils file for exporting LTX 2.0 videos with audio

* Tests for T2V and I2V (#6)

* add ltx2 pipeline tests.

* up

* up

* up

* up

* remove content

* style

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* up

* up

* i2v tests.

* up

* Address review comments

* Calculate RoPE double precision freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* revert unneeded changes.

* up

* up

* update to split style rope.

* up

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* up

* use export util funcs.

* Point original checkpoint to LTX 2.0 official checkpoint

* Allow the I2V pipeline to accept image URLs

* make style and make quality

* remove function map.

* remove args.

* update docs.

* update doc entries.

* disable ltx2_consistency test

* Simplify LTX 2 RoPE forward by removing coords is None logic

* make style and make quality

* Support LTX 2.0 audio VAE encoder

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Remove print statement in audio VAE

* up

* Fix bug when calculating audio RoPE coords

* Ltx 2 latent upsample pipeline (#12922)

* Initial implementation of LTX 2.0 latent upsampling pipeline

* Add new LTX 2.0 spatial latent upsampler logic

* Add test script for LTX 2.0 latent upsampling

* Add option to enable VAE tiling in upsampling test script

* Get latent upsampler working with video latents

* Fix typo in BlurDownsample

* Add latent upsample pipeline docstring and example

* Remove deprecated pipeline VAE slicing/tiling methods

* make style and make quality

* When returning latents, return unpacked and denormalized latents for T2V and I2V

* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* Fix latent upsampler filename in LTX 2 conversion script

* Add latent upsample pipeline to LTX 2 docs

* Add dummy objects for LTX 2 latent upsample pipeline

* Set default FPS to official LTX 2 ckpt default of 24.0

* Set default CFG scale to official LTX 2 ckpt default of 4.0

* Update LTX 2 pipeline example docstrings

* make style and make quality

* Remove LTX 2 test scripts

* Fix LTX 2 upsample pipeline example docstring

* Add logic to convert and save a LTX 2 upsampling pipeline

* Document LTX2VideoTransformer3DModel forward pass

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
@github-actions github-actions Bot mentioned this pull request Apr 22, 2026
6 tasks
Carlofkl added a commit to Carlofkl/diffusers that referenced this pull request Jun 4, 2026
- Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review huggingface#1, huggingface#2)
- Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review huggingface#3, huggingface#4)
- Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review huggingface#5)
- Wire max_sequence_length into the tokenizer call for generate mode (review huggingface#6)
- Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review huggingface#7, huggingface#8)
- Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review huggingface#9)
- Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review huggingface#10, huggingface#11)
- Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review huggingface#12)
- Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review huggingface#13)
Carlofkl added a commit to Carlofkl/diffusers that referenced this pull request Jun 5, 2026
- Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite)
  into unet_dreamlite.py and delete the standalone module (review huggingface#1)
- Move DreamLiteAttnProcessor2_0 from attention_processor.py into
  unet_dreamlite.py to keep all DreamLite-specific code in one place;
  update docs autodoc reference accordingly (review huggingface#2)
- Drop the PyTorch 2.0 hasattr/ImportError check in
  DreamLiteAttnProcessor2_0.__init__ (diffusers already requires
  torch>=2.0; matches Wan deprecation) (review huggingface#3)
- Drop the deprecated `scale` argument handling from
  DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers)
  (review huggingface#4)
- Switch SDPA call to dispatch_attention_fn so all diffusers attention
  backends (FlashAttention, FlashAttention-3, sageattention, etc.) are
  selectable (review huggingface#5)
- Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to
  match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D /
  DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D /
  DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D);
  default down/up/mid block_types in DreamLiteUNetModel and the test
  fixtures are updated to the new keys (review huggingface#6, huggingface#7); the
  carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are
  being updated in lock-step
- Localize retrieve_latents inside pipeline_dreamlite.py with a
  `# Copied from` marker, removing the cross-pipeline import; mirrors
  the mobile pipeline (review huggingface#8)
- Add a check_inputs() method to both DreamLitePipeline and
  DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from
  __call__; pulls the image-type validation out of prepare_image_latents
  and adds prompt-type and h/w-divisibility checks (review huggingface#9)
dg845 added a commit that referenced this pull request Jun 12, 2026

* feat(pipelines): add DreamLite text-to-image and image-edit pipelines

Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:

* DreamLitePipeline           - full 3-branch dual CFG (negative,
                                reference, prompt); supports T2I and
                                I2I editing at 1024x1024.
* DreamLiteMobilePipeline     - distilled single-branch variant for
                                on-device inference; no CFG.

New model code (all isolated under *_dreamlite.py / unet_dreamlite.py
to avoid touching shared upstream files):

* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D
  transformer block.
* models/unets/unet_dreamlite.py                  - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py        - DreamLite-specific
  down/up/mid blocks.
* models/resnet_dreamlite.py                      - DreamLite ResNet
  variants.
* models/attention_processor.py                   - add
  DreamLiteAttnProcessor2_0 (pure addition, no existing processor
  modified).

Pipeline + tests + docs:

* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
  pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
  test_pipeline_dreamlite_mobile.py} with the standard
  PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt
  with a fake so MagicMock text encoders work without per-test
  boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
  serialisation, custom attention processor, encode_prompt return
  shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
  (alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.

Two real bugs surfaced by the mixin test suite are fixed in this
commit:

* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
  are now repeated along the batch dimension in both pipelines'
  T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
  lookup so encode_prompt can be tested in isolation per
  PipelineTesterMixin convention.
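
The `num_images_per_prompt` fix can be pictured with plain lists standing in for tensors. This is a sketch under assumptions, not the pipeline code: the helper name `repeat_per_prompt` is hypothetical, and the interleaved ordering shown here is one reasonable choice (with tensors, `repeat_interleave(n, dim=0)` would give the same ordering). The point is that embeddings and mask must be repeated the same way so their rows stay aligned.

```python
def repeat_per_prompt(rows, num_images_per_prompt):
    """Repeat each batch entry consecutively (interleaved order)."""
    return [row for row in rows for _ in range(num_images_per_prompt)]


# Two prompts, each represented by a placeholder embedding and a mask row.
prompt_embeds = [["emb_a"], ["emb_b"]]
text_attention_mask = [[1, 1, 0], [1, 0, 0]]

embeds = repeat_per_prompt(prompt_embeds, 2)
mask = repeat_per_prompt(text_attention_mask, 2)
```
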

SlowTests real-checkpoint resolution is set to 1024x1024 (the only
size DreamLite is trained for).

Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.

* docs+tests(pipelines/dreamlite): pin Hub repos to `diffusers` branch

The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:

* `main` branch      - keeps `model_index.json` pointing at ByteDance's
                       internal package path so the original (non-diffusers)
                       reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to
                       `["diffusers", "DreamLiteUNetModel"]` so this
                       integration loads correctly from `diffusers`.

This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
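
A rough sketch of how a revision pin with a local override might be resolved. The helper `resolve_checkpoint` and its return shape are hypothetical; the env var name, the repo id, and `revision="diffusers"` are taken from the commit message above. The returned pair is meant to be passed on to a `from_pretrained(...)` call.

```python
import os


def resolve_checkpoint(repo_id, env_var, environ=None):
    """Return (path_or_repo, kwargs) for a from_pretrained-style call.

    Hypothetical helper: a local path from the environment wins and
    bypasses the revision pin; otherwise the Hub repo is pinned to the
    `diffusers` branch.
    """
    environ = os.environ if environ is None else environ
    local_path = environ.get(env_var)
    if local_path:
        return local_path, {}
    return repo_id, {"revision": "diffusers"}


# No override set: load from the Hub, pinned to the `diffusers` branch.
pinned = resolve_checkpoint("carlofkl/DreamLite-base", "DREAMLITE_BASE_PATH", environ={})

# Override set: load from the local directory, no revision argument.
local = resolve_checkpoint(
    "carlofkl/DreamLite-base",
    "DREAMLITE_BASE_PATH",
    environ={"DREAMLITE_BASE_PATH": "/tmp/dreamlite-base"},
)
```
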

* chore(pipelines/dreamlite): sync `# Copied from` blocks + dummy objects after rebase

Mechanical changes after rebasing onto current `main`:

* `pipeline_dreamlite.py::retrieve_timesteps` — re-synced from
  `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604
  type hints, expanded docstring, plus the new
  `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's
  default code path uses `num_inference_steps` (uniform schedule) and never
  passes custom `timesteps` / `sigmas`, so the added guards are dead-code
  for this pipeline — behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` —
  registered the dummy classes auto-generated by `make fix-copies` for
  `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`,
  `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`.

Generated by `make fix-copies`. No hand edits.

* docs(dreamlite): register attention processor + split combined docstring entries

- Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md
  (fixes check_support_list.py).
- Split combined 'height / width' and 'guidance_scale / image_guidance_scale'
  entries in the two pipeline docstrings; add a complete Args block to
  DreamLiteTransformer2DModel.forward
  (fixes check_forward_call_docstrings.py).

No behavioral change.

* refactor(dreamlite): address review feedback from #13815

- Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review #1, #2)
- Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review #3, #4)
- Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review #5)
- Wire max_sequence_length into the tokenizer call for generate mode (review #6)
- Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review #7, #8)
- Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review #9)
- Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review #10, #11)
- Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review #12)
- Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review #13)

* refactor(dreamlite): address dg845 follow-up review

- Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite)
  into unet_dreamlite.py and delete the standalone module (review #1)
- Move DreamLiteAttnProcessor2_0 from attention_processor.py into
  unet_dreamlite.py to keep all DreamLite-specific code in one place;
  update docs autodoc reference accordingly (review #2)
- Drop the PyTorch 2.0 hasattr/ImportError check in
  DreamLiteAttnProcessor2_0.__init__ (diffusers already requires
  torch>=2.0; matches Wan deprecation) (review #3)
- Drop the deprecated `scale` argument handling from
  DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers)
  (review #4)
- Switch SDPA call to dispatch_attention_fn so all diffusers attention
  backends (FlashAttention, FlashAttention-3, sageattention, etc.) are
  selectable (review #5)
- Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to
  match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D /
  DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D /
  DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D);
  default down/up/mid block_types in DreamLiteUNetModel and the test
  fixtures are updated to the new keys (review #6, #7); the
  carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are
  being updated in lock-step
- Localize retrieve_latents inside pipeline_dreamlite.py with a
  `# Copied from` marker, removing the cross-pipeline import; mirrors
  the mobile pipeline (review #8)
- Add a check_inputs() method to both DreamLitePipeline and
  DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from
  __call__; pulls the image-type validation out of prepare_image_latents
  and adds prompt-type and h/w-divisibility checks (review #9)
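
The `check_inputs()` item above could be sketched as follows. This is an illustration only: the `divisor` value of 16 is a placeholder (the thread does not state the value DreamLite requires), the image-type validation is left out, and the `fails` helper exists just to exercise the function.

```python
def check_inputs(prompt, height, width, divisor=16):
    """Validate call arguments before any heavy work starts.

    `divisor` is a placeholder value chosen for this sketch.
    """
    if not isinstance(prompt, (str, list)):
        raise ValueError(
            f"`prompt` must be a str or a list of str, got {type(prompt).__name__}."
        )
    if isinstance(prompt, list) and not all(isinstance(p, str) for p in prompt):
        raise ValueError("Every entry of `prompt` must be a str.")
    if (height % divisor != 0) or (width % divisor != 0):
        raise ValueError(
            f"`height` and `width` must be divisible by {divisor}, got {height}x{width}."
        )


def fails(*args):
    """Return True if check_inputs rejects the given arguments."""
    try:
        check_inputs(*args)
    except ValueError:
        return True
    return False
```

Calling such a method first in `__call__` surfaces bad arguments before any model runs.
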

* fix(dreamlite): correct Q/K/V layout for dispatch_attention_fn

dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).

* test(dreamlite): swap MagicMock for tiny real Qwen3-VL fixture

Address dg845's review: rebuild the DreamLite fast-test fixture around a
real (tiny) Qwen3VLForConditionalGeneration + Qwen3VLProcessor so the
standard PipelineTesterMixin save/load, dtype, and offload tests run
end-to-end against the actual encode_prompt code path. Override
DreamLiteUNetModel.set_default_attn_processor to reinstall the GQA
processor so mixin utilities that round-trip through it keep working.

* Apply style fixes

* fix(dreamlite): address blocking review issues from #13815

- Override _no_split_modules / _repeated_blocks on DreamLiteUNetModel
  with the actual DreamLite class names (BasicTransformerBlockDreamLite,
  ResnetBlock2DDreamLite, DreamLiteCrossAttnUpBlock2D,
  DreamLiteUpBlock2D) so device_map="auto" and compile_repeated_blocks()
  match correctly.

- Keep attention masks as bool tensors in DreamLiteTransformer2DModel
  instead of converting them to dense additive float biases. The dense
  format hard-raises on flash / _flash_3 / _sage backends in
  dispatch_attention_fn (which requires dtype == torch.bool).

- Add explicit parentheses around each clause in check_inputs's mixed
  and/or condition (both pipelines) for readability.

- Replace nn.Module.__init__(self) with ModelMixin.__init__(self) in
  DreamLiteUNetModel.__init__ so mixin state (e.g.
  _gradient_checkpointing_func) is properly initialised. ConfigMixin /
  PushToHubMixin don't define their own __init__, so this covers the
  full chain without re-running UNet2DConditionModel.__init__.
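
On the attention-mask item above: a boolean mask and a dense additive float bias encode the same information, so keeping the bool form loses nothing numerically. The pure-Python sketch below shows the equivalence for a single row of attention scores. It assumes `True` means "may attend" (the convention of PyTorch's `scaled_dot_product_attention`); all function names here are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bool_to_additive(mask):
    """Dense float bias: 0.0 where attention is allowed, -inf where it is not."""
    return [0.0 if keep else float("-inf") for keep in mask]


def attend_with_bool(scores, mask):
    # Masked-out positions are replaced by -inf before the softmax.
    masked = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    return softmax(masked)


def attend_with_bias(scores, bias):
    # The bias is simply added to the scores before the softmax.
    return softmax([s + b for s, b in zip(scores, bias)])


scores = [0.5, 1.5, -0.3]
mask = [True, True, False]
weights_bool = attend_with_bool(scores, mask)
weights_bias = attend_with_bias(scores, bool_to_additive(mask))
```

Both calls give the same weights, with zero weight on the masked position. The practical difference is which format a backend accepts, which is why the commit keeps the bool tensors.
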

* fix(dreamlite): forward all processor outputs to Qwen3VL text encoder

Recent versions of Qwen3VLProcessor add an mm_token_type_ids output, and
Qwen3VLModel.compute_3d_position_ids raises ValueError whenever
multimodal inputs are present (image_grid_thw is not None) but
mm_token_type_ids is None.

encode_prompt previously forwarded only input_ids / attention_mask /
pixel_values / image_grid_thw, dropping the new field and breaking the
fast pipeline tests against transformers main.

Switch to ``self.text_encoder(**tk_out, output_hidden_states=True)``
(matching NucleusMoEImagePipeline) so all processor outputs are
forwarded automatically and future additions don't regress this path.
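
The failure mode and the fix can be reproduced with stand-ins. In the sketch below, `fake_text_encoder` is a hypothetical function that mimics only the check described above, and `tk_out` is a plain dict playing the role of the processor output; neither is the real transformers API.

```python
def fake_text_encoder(input_ids, attention_mask, pixel_values=None,
                      image_grid_thw=None, mm_token_type_ids=None,
                      output_hidden_states=False):
    """Stand-in encoder: rejects multimodal input without mm_token_type_ids."""
    if image_grid_thw is not None and mm_token_type_ids is None:
        raise ValueError("mm_token_type_ids is required for multimodal inputs")
    return {"hidden_states": ["h0", "h1"]} if output_hidden_states else {}


# Stand-in for the processor output, including the newer field.
tk_out = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "pixel_values": [[0.0]],
    "image_grid_thw": [[1, 2, 2]],
    "mm_token_type_ids": [[0, 1, 1]],
}


def encode_explicit(tk_out):
    # Hand-picked subset: any field added later is silently dropped.
    return fake_text_encoder(
        input_ids=tk_out["input_ids"],
        attention_mask=tk_out["attention_mask"],
        pixel_values=tk_out["pixel_values"],
        image_grid_thw=tk_out["image_grid_thw"],
        output_hidden_states=True,
    )


def encode_forward_all(tk_out):
    # Forward everything the processor produced.
    return fake_text_encoder(**tk_out, output_hidden_states=True)


try:
    encode_explicit(tk_out)
    explicit_ok = True
except ValueError:
    explicit_ok = False

forwarded = encode_forward_all(tk_out)
```
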

* Apply style fixes

* docs(dreamlite): address final review nits from #13815

- Replace broken cat.png URL in editing examples (both base and mobile)
  with the standard `huggingface/documentation-images` source used
  elsewhere in the diffusers docs.
- Promote the recommended guidance_scale=3.5 / image_guidance_scale=1.5
  to the default values of DreamLitePipeline.__call__, and drop the
  now-redundant explicit args from the docs examples.
- Switch the EXAMPLE_DOC_STRING examples in both pipelines from
  torch.float16 to torch.bfloat16 for consistency with the rest of the
  docs.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>