nlp_zero_shot_classify

BETA

This component is mostly stable but breaking changes could still be made outside of major version releases if a fundamental problem with the component is found.

Performs zero-shot text classification using a Hugging Face 🤗 NLP pipeline with an ONNX Runtime model.

Introduced in version v1.11.0.

```yaml
# Common config fields, showing default values
label: ""
nlp_zero_shot_classify:
  name: "" # No default (optional)
  path: /path/to/models/my_model.onnx # No default (required)
  labels: [] # No default (required)
  multi_label: false
  hypothesis_template: This example is {}.
```

```yaml
# All config fields, showing default values
label: ""
nlp_zero_shot_classify:
  name: "" # No default (optional)
  path: /path/to/models/my_model.onnx # No default (required)
  enable_download: false
  download_options:
    repository: KnightsAnalytics/distilbert-NER # No default (required)
    onnx_filepath: model.onnx
  labels: [] # No default (required)
  multi_label: false
  hypothesis_template: This example is {}.
```

Zero Shot Text Classification

Zero-shot text classification assigns text to arbitrary labels without training a model on those specific labels. It uses a Natural Language Inference (NLI) model to determine whether the text entails a hypothesis built from each candidate label. This is more flexible than regular text classification because labels can be chosen at runtime, but it may be slower, since the model evaluates one hypothesis per candidate label. Common use cases include sentiment analysis, topic classification, intent detection, and content moderation. This component uses Hugot, a library that provides an interface for running Open Neural Network Exchange (ONNX) models and transformer pipelines, with a focus on NLP tasks.

Currently, Bento only implements:

What is a pipeline?

From the Hugging Face docs:

A pipeline in 🤗 Transformers is an abstraction referring to a series of steps that are executed in a specific order to preprocess and transform data and return a prediction from a model. Some example stages found in a pipeline might be data preprocessing, feature extraction, and normalization.

warning

Only models in ONNX format are supported. However, exporting a model from another format to ONNX is possible and straightforward in most standard ML libraries. For more on this, check out the ONNX conversion docs. Alternatively, Hugging Face Optimum offers easy model conversion.

Examples

Emotion Classification

Classify text emotions using a zero-shot approach with any custom labels.

```yaml
pipeline:
  processors:
    - nlp_zero_shot_classify:
        path: "KnightsAnalytics/deberta-v3-base-zeroshot-v1"
        labels: ["fun", "dangerous", "boring"]
        multi_label: false
# In: "I am going to the park"
# Out: {"sequence": "I am going to the park", "labels": ["fun", "boring", "dangerous"], "scores": [0.77, 0.15, 0.08]}
```

Multi-Label State Classification

Classify emotional states with multiple labels enabled.

```yaml
pipeline:
  processors:
    - nlp_zero_shot_classify:
        path: "KnightsAnalytics/deberta-v3-base-zeroshot-v1"
        labels: ["busy", "relaxed", "stressed"]
        multi_label: true
        hypothesis_template: "This person is {}."
# In: "Please don't bother me, I'm in a rush"
# Out: {"sequence": "Please don't bother me, I'm in a rush", "labels": ["stressed", "busy", "relaxed"], "scores": [0.89, 0.11, 0.007]}
```
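
The outputs shown in these examples list labels in descending score order. Assuming the processor emits that result as a structured document, a follow-up `mapping` processor could keep only the best match. This is a sketch rather than a prescribed pattern; `top_label` and `top_score` are hypothetical field names.

```yaml
pipeline:
  processors:
    - nlp_zero_shot_classify:
        path: "KnightsAnalytics/deberta-v3-base-zeroshot-v1"
        labels: ["fun", "dangerous", "boring"]
    - mapping: |
        # Assumes labels and scores are sorted with the highest score first.
        root.top_label = this.labels.index(0)
        root.top_score = this.scores.index(0)
```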

Fields

`name`

Name of the Hugot pipeline. Defaults to a random UUID if not set.

Type: string

`path`

Path to the ONNX model file, or to the directory containing the model. When downloading (`enable_download: true`), this becomes the download destination and must be a directory.

Type: string

```yaml
# Examples

path: /path/to/models/my_model.onnx

path: /path/to/models/
```

`enable_download`

When enabled, attempts to download an ONNX Runtime compatible model from the Hugging Face repository specified in `download_options.repository`.

Type: bool
Default: false

`download_options`

Options used to download a model directly from Hugging Face. Before the model is downloaded, validation occurs to ensure the remote repository contains both an `.onnx` file and a `tokenizers.json` file.

Type: object

`download_options.repository`

The name of the Hugging Face model repository.

Type: string

```yaml
# Examples

repository: KnightsAnalytics/distilbert-NER

repository: KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english

repository: sentence-transformers/all-MiniLM-L6-v2
```

`download_options.onnx_filepath`

Filepath of the ONNX model within the repository. Only needed when multiple .onnx files exist.

Type: string
Default: "model.onnx"

```yaml
# Examples

onnx_filepath: onnx/model.onnx

onnx_filepath: onnx/model_quantized.onnx

onnx_filepath: onnx/model_fp16.onnx
```
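
As a minimal sketch of how the download fields fit together, the config below fetches a model into a local directory before classifying. It assumes that `KnightsAnalytics/deberta-v3-base-zeroshot-v1` (the model name used in the examples on this page) is a Hugging Face repository containing an ONNX export. `/path/to/models/` is a placeholder directory.

```yaml
pipeline:
  processors:
    - nlp_zero_shot_classify:
        # When downloading, path is the destination and must be a directory.
        path: /path/to/models/
        enable_download: true
        download_options:
          # Assumed repository name, taken from the examples on this page.
          repository: KnightsAnalytics/deberta-v3-base-zeroshot-v1
          # Only needed when the repository holds several .onnx files.
          onnx_filepath: model.onnx
        labels: ["fun", "dangerous", "boring"]
```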

`labels`

The set of possible class labels to classify each sequence into.

Type: array

```yaml
# Examples

labels:
  - positive
  - negative
  - neutral
```

`multi_label`

Whether multiple labels can be true. If false, scores sum to 1. If true, each label is scored independently.

Type: bool
Default: false

`hypothesis_template`

Template used to turn each label into an NLI-style hypothesis. Must include `{}` where the label will be inserted.

Type: string
Default: "This example is {}."
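
To illustrate, the sketch below uses a custom template for topic routing. The labels and the template wording are hypothetical. The comments show the hypotheses that would result from substituting each label for `{}`.

```yaml
pipeline:
  processors:
    - nlp_zero_shot_classify:
        path: /path/to/models/my_model.onnx
        labels: ["billing", "technical support"]
        hypothesis_template: "This message is about {}."
        # Hypotheses evaluated against each input text:
        #   "This message is about billing."
        #   "This message is about technical support."
```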
