2.9.0

@xenova xenova released this 21 Nov 14:00
· 115 commits to main since this release

What's new?

😍 Exciting new tasks!

Transformers.js v2.9.0 adds support for three new tasks: (1) Depth estimation, (2) Zero-shot object detection, and (3) Optical document understanding.

🕵️‍♂️ Depth Estimation

The task of predicting the depth of objects present in an image. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create depth estimation pipeline
let depth_estimator = await pipeline('depth-estimation', 'Xenova/dpt-hybrid-midas');

// Predict depth for image
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let output = await depth_estimator(url);

[Figure: input image and output depth map]

Raw output:
// {
//   predicted_depth: Tensor {
//     dims: [ 384, 384 ],
//     type: 'float32',
//     data: Float32Array(147456) [ 542.859130859375, 545.2833862304688, 546.1649169921875, ... ],
//     size: 147456
//   },
//   depth: RawImage {
//     data: Uint8Array(307200) [ 86, 86, 86, ... ],
//     width: 640,
//     height: 480,
//     channels: 1
//   }
// }
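
The `depth` field above is a single-channel 8-bit image, while `predicted_depth` holds raw float32 values. As a rough illustration of how float depth values can be mapped into the 0-255 range, here is a minimal sketch using min-max normalization. The helper name `toUint8Depth` is hypothetical and the normalization scheme is an assumption; the library may scale and resize its output differently.

```javascript
// Hypothetical helper (not part of Transformers.js): map float depth values
// to 8-bit grayscale using min-max normalization.
function toUint8Depth(values) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min;
  const out = new Uint8Array(values.length);
  for (let i = 0; i < values.length; ++i) {
    // A constant depth map has no range; leave it as all zeros.
    out[i] = range === 0 ? 0 : Math.round(((values[i] - min) / range) * 255);
  }
  return out;
}

// Small made-up sample standing in for `output.predicted_depth.data`.
const sample = new Float32Array([0, 1, 3]);
const gray = toUint8Depth(sample);
```

With a real pipeline result, `output.predicted_depth.data` would take the place of `sample`.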

🎯 Zero-shot Object Detection

The task of identifying objects of classes that are unseen during training. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create zero-shot object detection pipeline
let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

// Predict bounding boxes
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
let output = await detector(url, candidate_labels);

Raw output
// [
//   {
//     score: 0.24392342567443848,
//     label: 'human face',
//     box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 }
//   },
//   {
//     score: 0.15129457414150238,
//     label: 'american flag',
//     box: { xmin: 0, ymin: 4, xmax: 106, ymax: 513 }
//   },
//   {
//     score: 0.13649864494800568,
//     label: 'helmet',
//     box: { xmin: 277, ymin: 337, xmax: 511, ymax: 511 }
//   },
//   {
//     score: 0.10262022167444229,
//     label: 'rocket',
//     box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 }
//   }
// ]
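
The scores above are fairly low, so downstream code will often want to drop weak detections and work with box sizes. The following is a minimal post-processing sketch over the raw output shown above. The helper names `filterByScore` and `boxSize` are hypothetical (they are not part of the library), and the 0.15 cutoff is an arbitrary value chosen for illustration.

```javascript
// Detections copied from the raw output above.
const detections = [
  { score: 0.24392342567443848, label: 'human face', box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 } },
  { score: 0.15129457414150238, label: 'american flag', box: { xmin: 0, ymin: 4, xmax: 106, ymax: 513 } },
  { score: 0.13649864494800568, label: 'helmet', box: { xmin: 277, ymin: 337, xmax: 511, ymax: 511 } },
  { score: 0.10262022167444229, label: 'rocket', box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 } },
];

// Hypothetical helper: keep detections at or above a minimum score.
function filterByScore(items, minScore) {
  return items.filter((d) => d.score >= minScore);
}

// Hypothetical helper: box width and height in pixels. The top-left corner is
// clamped to 0 so coordinates slightly outside the image (e.g. ymin: -1)
// do not inflate the size.
function boxSize({ xmin, ymin, xmax, ymax }) {
  return {
    width: xmax - Math.max(xmin, 0),
    height: ymax - Math.max(ymin, 0),
  };
}

const confident = filterByScore(detections, 0.15);
const labels = confident.map((d) => d.label); // ['human face', 'american flag']
const faceSize = boxSize(confident[0].box);   // { width: 94, height: 108 }
```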

πŸ“ Optical Document Understanding (image-to-text)

This task involves translating images of scientific PDFs to markdown, enabling easier access to them. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create image-to-text pipeline
let pipe = await pipeline('image-to-text', 'Xenova/nougat-small');

// Generate markdown
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
let output = await pipe(url, {
  min_length: 1,
  max_new_tokens: 40,
  bad_words_ids: [[pipe.tokenizer.unk_token_id]],
});
// [{ generated_text: "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: lblecher@meta.com\n\nGuillem Cucur" }]
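
Since the pipeline returns markdown as a single string, a small amount of post-processing can make it easier to work with. The sketch below splits the generated text shown above into a title and paragraph blocks. The helper name `splitMarkdown` is hypothetical, and treating a leading `# ` line as the title is an assumption about the output, not documented library behavior.

```javascript
// Result copied from the example above (the text appears cut off,
// since generation was limited to 40 new tokens).
const output = [{
  generated_text: "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: lblecher@meta.com\n\nGuillem Cucur",
}];

// Hypothetical helper: split markdown into a title (if the first block is a
// level-1 heading) and the remaining blank-line-separated blocks.
function splitMarkdown(text) {
  const blocks = text
    .split(/\n{2,}/)
    .map((b) => b.trim())
    .filter((b) => b.length > 0);
  const first = blocks.length > 0 ? blocks[0] : '';
  const isHeading = first.startsWith('# ');
  return {
    title: isHeading ? first.slice(2) : null,
    blocks: isHeading ? blocks.slice(1) : blocks,
  };
}

const parsed = splitMarkdown(output[0].generated_text);
// parsed.title  -> 'Nougat: Neural Optical Understanding for Academic Documents'
// parsed.blocks -> ['Lukas Blecher', 'Correspondence to: lblecher@meta.com', 'Guillem Cucur']
```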

💻 New architectures: Nougat, DPT, GLPN, OwlViT

We added support for 4 new architectures, bringing the total up to 61!

  • DPT for depth estimation. See here for the list of available models.
  • GLPN for depth estimation. See here for the list of available models.
  • OwlViT for zero-shot object detection. See here for the list of available models.
  • Nougat for optical understanding of academic documents (image-to-text). See here for the list of available models.

🔨 Other improvements

  • Add support for Grouped Query Attention on Llama Model by @felladrin in #393
  • Implement max character check by @samlhuillier in #398
  • Add CLIPFeatureExtractor (and tests) in #387
  • Add jsDelivr stats to README in #395
  • Update sharp dependency version in #400

πŸ› Bug fixes

  • Move tensor clone to fix Worker ownership NaN issue by @kungfooman in #404
  • Add default token_type_ids for multilingual-e5-* models by @do-me in #403
  • Ensure WASM fallback does not crash in GH actions in #402

🤗 New contributors

Full Changelog: 2.8.0...2.9.0