segmentador.optimize.quantize

Apply quantization and hardware-specific optimizations to segmenter models.

Module Contents

Functions

quantize_bert_model_as_onnx(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX

Create a quantized BERTSegmenter model in ONNX format.

quantize_lstm_model_as_onnx(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX

Create a quantized LSTMSegmenter model in ONNX format.

quantize_bert_model_as_torch(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], Ellipsis]] = (torch.nn.Embedding, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch

Create a quantized BERTSegmenter model in Torch format.

quantize_lstm_model_as_torch(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], Ellipsis]] = (torch.nn.Embedding, torch.nn.LSTM, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch

Create a quantized LSTMSegmenter model in Torch format.

quantize_model(model: Union[segmentador.segmenter.BERTSegmenter, segmentador.segmenter.LSTMSegmenter], quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', model_output_format: str = 'onnx', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False, **kwargs: Any) → QuantizationOutput

Generate a quantized segmenter model from an existing segmenter model.

segmentador.optimize.quantize.quantize_bert_model_as_onnx(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX

Create a quantized BERTSegmenter model in ONNX format.

Models saved in this format can be loaded for inference as follows:

>>> optimize.ONNXBERTSegmenter(  
...     uri_model='<quantized_model_uri>',
...     uri_tokenizer=...,
...     ...,
... )
Parameters
  • model (segmenter.BERTSegmenter) – BERTSegmenter model to be quantized.

  • quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.

  • quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any by-products generated during the quantization procedure.

  • intermediary_onnx_model_name (str or None, default=None) – Name under which the intermediary ONNX model is saved in quantized_model_dirpath. This conversion step is required before optimization and quantization can be applied. If None, a name is derived from quantized_model_filename.

  • check_cached (bool, default=True) – If True, check whether a model with the same name already exists before quantizing. If it does, this function does not produce any new models.

  • verbose (bool, default=False) – If True, print information regarding the results.

Returns

paths – URIs of the files generated during the quantization procedure. The final model URI can be accessed through the output_uri attribute.

Return type

QuantizationOutputONNX (a named t.Tuple[str, …])
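The return value is described as a tuple of strings that also exposes an output_uri attribute, which is how a named tuple behaves. The following is a minimal sketch of that shape, not the package's actual definition: the class name QuantizationOutputSketch and the onnx_base_uri field are hypothetical, and only output_uri comes from the description above.

```python
from typing import NamedTuple


class QuantizationOutputSketch(NamedTuple):
    """Hypothetical stand-in for the named tuple returned by the quantizer."""

    onnx_base_uri: str  # assumed field: intermediary (unquantized) ONNX model
    output_uri: str  # documented attribute: final quantized model


paths = QuantizationOutputSketch(
    onnx_base_uri="./quantized_models/example_base.onnx",
    output_uri="./quantized_models/example_quantized.onnx",
)

# Attribute access and plain tuple behaviour are both available.
final_model_uri = paths.output_uri
all_uris = tuple(paths)
```

Under this assumption, final_model_uri is the value one would pass as uri_model when loading the quantized model.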

References

[1] Graph Optimizations in ONNX Runtime. Available at: https://onnxruntime.ai/docs/performance/graph-optimizations.html

[2] ONNX Operator Schemas. Available at: https://github.com/onnx/onnx/blob/main/docs/Operators.md

segmentador.optimize.quantize.quantize_lstm_model_as_onnx(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX

Create a quantized LSTMSegmenter model in ONNX format.

Models saved in this format can be loaded for inference as follows:

>>> optimize.ONNXLSTMSegmenter(  
...     uri_model='<quantized_model_uri>',
...     uri_tokenizer=...,
...     ...,
... )
Parameters
  • model (segmenter.LSTMSegmenter) – LSTMSegmenter model to be quantized.

  • quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.

  • quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any by-products generated during the quantization procedure.

  • intermediary_onnx_model_name (str or None, default=None) – Name under which the intermediary ONNX model is saved in quantized_model_dirpath. This conversion step is required before optimization and quantization can be applied. If None, a name is derived from quantized_model_filename.

  • onnx_opset_version (int, default=17) – ONNX operator set version. See [2] for more information.

  • check_cached (bool, default=True) – If True, check whether a model with the same name already exists before quantizing. If it does, this function does not produce any new models.

  • verbose (bool, default=False) – If True, print information regarding the results.

Returns

paths – URIs of the files generated during the quantization procedure. The final model URI can be accessed through the output_uri attribute.

Return type

QuantizationOutputONNX (a named t.Tuple[str, …])
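The check_cached parameter described above amounts to looking for the output file before doing any expensive work. The following sketch illustrates the idea under stated assumptions: quantize_if_missing and fake_build are hypothetical names, and the actual function may inspect more than one file or compare different attributes.

```python
import os
import tempfile
from typing import Callable


def quantize_if_missing(
    output_uri: str,
    build_model: Callable[[str], None],
    check_cached: bool = True,
) -> bool:
    """Run ``build_model`` unless a cached output file already exists.

    Returns True if a new file was produced, False if the cached one was kept.
    """
    if check_cached and os.path.exists(output_uri):
        return False
    os.makedirs(os.path.dirname(output_uri) or ".", exist_ok=True)
    build_model(output_uri)
    return True


def fake_build(uri: str) -> None:
    # Placeholder for the real (expensive) quantization step.
    with open(uri, "wb") as f_out:
        f_out.write(b"quantized")


with tempfile.TemporaryDirectory() as dirpath:
    uri = os.path.join(dirpath, "quantized_models", "model.onnx")
    first_run = quantize_if_missing(uri, fake_build)  # nothing cached yet
    second_run = quantize_if_missing(uri, fake_build)  # cached file is reused
    forced_run = quantize_if_missing(uri, fake_build, check_cached=False)
```

Passing check_cached=False is the way to force regeneration when a stale file with the same name is present.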

References

[1] Graph Optimizations in ONNX Runtime. Available at: https://onnxruntime.ai/docs/performance/graph-optimizations.html

[2] ONNX Operator Schemas. Available at: https://github.com/onnx/onnx/blob/main/docs/Operators.md

segmentador.optimize.quantize.quantize_bert_model_as_torch(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], Ellipsis]] = (torch.nn.Embedding, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch

Create a quantized BERTSegmenter model in Torch format.

Models saved in this format can be loaded for inference as follows:

>>> optimize.TorchJITBERTSegmenter(  
...     uri_model='<quantized_model_uri>',
...     ...,
... )
Parameters
  • model (segmenter.BERTSegmenter) – BERTSegmenter model to be quantized.

  • quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.

  • quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any by-products generated during the quantization procedure.

  • modules_to_quantize (set or tuple of t.Type[torch.nn.Module], default=(torch.nn.Embedding, torch.nn.Linear)) – Module types to be quantized.

  • check_cached (bool, default=True) – If True, check whether a model with the same name already exists before quantizing. If it does, this function does not produce any new models.

  • verbose (bool, default=False) – If True, print information regarding the results.

Returns

paths – URIs of the files generated during the quantization procedure. The final model URI can be accessed through the output_uri attribute.

Return type

QuantizationOutputTorch (a named t.Tuple[str, …])
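Per the signature, modules_to_quantize may be either a set or a tuple of module types. The sketch below shows one way such a collection could be used to pick submodules by type. It is an illustration only: Module, Embedding, Linear and Dropout are plain stand-in classes rather than the torch.nn ones, select_modules is a hypothetical helper, and none of this reflects the package's internals.

```python
from typing import Iterable, List, Set, Tuple, Type, Union


class Module:
    """Stand-in base class (plays the role of torch.nn.Module)."""


class Embedding(Module):
    """Stand-in for an embedding layer."""


class Linear(Module):
    """Stand-in for a linear layer."""


class Dropout(Module):
    """Stand-in for a layer that is not meant to be quantized."""


ModuleTypes = Union[Set[Type[Module]], Tuple[Type[Module], ...]]


def select_modules(
    modules: Iterable[Module], modules_to_quantize: ModuleTypes
) -> List[Module]:
    """Return the modules whose type is listed in ``modules_to_quantize``."""
    # isinstance() accepts a tuple of types but not a set, so normalize first.
    type_tuple = tuple(modules_to_quantize)
    return [module for module in modules if isinstance(module, type_tuple)]


layers = [Embedding(), Linear(), Dropout(), Linear()]
selected_from_tuple = select_modules(layers, (Embedding, Linear))
selected_from_set = select_modules(layers, {Linear})
```

Normalizing to a tuple lets the same code path handle both accepted container types.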

segmentador.optimize.quantize.quantize_lstm_model_as_torch(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], Ellipsis]] = (torch.nn.Embedding, torch.nn.LSTM, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch

Create a quantized LSTMSegmenter model in Torch format.

Models saved in this format can be loaded for inference as follows:

>>> optimize.TorchJITLSTMSegmenter(  
...     uri_model='<quantized_model_uri>',
...     ...,
... )
Parameters
  • model (segmenter.LSTMSegmenter) – LSTMSegmenter model to be quantized.

  • quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.

  • quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any by-products generated during the quantization procedure.

  • modules_to_quantize (set or tuple of t.Type[torch.nn.Module], default=(torch.nn.Embedding, torch.nn.LSTM, torch.nn.Linear)) – Module types to be quantized.

  • check_cached (bool, default=True) – If True, check whether a model with the same name already exists before quantizing. If it does, this function does not produce any new models.

  • verbose (bool, default=False) – If True, print information regarding the results.

Returns

paths – URIs of the files generated during the quantization procedure. The final model URI can be accessed through the output_uri attribute.

Return type

QuantizationOutputTorch (a named t.Tuple[str, …])

segmentador.optimize.quantize.quantize_model(model: Union[segmentador.segmenter.BERTSegmenter, segmentador.segmenter.LSTMSegmenter], quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', model_output_format: str = 'onnx', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False, **kwargs: Any) → QuantizationOutput

Generate a quantized segmenter model from an existing segmenter model.

This function selects the appropriate quantization function based on the provided model type (BERT or LSTM) and the value of model_output_format. See the See also section for the list of specific quantization functions.

Parameters
  • model (segmenter.BERTSegmenter or segmenter.LSTMSegmenter) – Segmenter model to quantize.

  • quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.

  • quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any by-products generated during the quantization procedure.

  • model_output_format ({'onnx', 'torch_jit'}, default='onnx') – Output format of the quantized model. This option also determines how inference with the quantized model is performed. See the See also section for the specific combinations of model types and output formats.

  • onnx_opset_version (int, default=17) – ONNX operator set version. Used only if model_output_format='onnx'. See [2] for more information.

  • check_cached (bool, default=True) – If True, check whether a model with the same name already exists before quantizing. If it does, this function does not produce any new models.

  • verbose (bool, default=False) – If True, print information regarding the results.

  • **kwargs (dict) – Additional parameters passed to quantization function.

Returns

paths – Named tuple with all paths of files generated during the full quantization procedure.

Return type

QuantizationOutput (a named t.Tuple[str, …])

See also

quantize_lstm_model_as_onnx

Quantize LSTMSegmenter with model_output_format='onnx'.

quantize_lstm_model_as_torch

Quantize LSTMSegmenter with model_output_format='torch_jit'.

quantize_bert_model_as_onnx

Quantize BERTSegmenter with model_output_format='onnx'.

quantize_bert_model_as_torch

Quantize BERTSegmenter with model_output_format='torch_jit'.
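The mapping listed above can be read as a lookup keyed by model type and output format. The sketch below expresses it that way. The two classes are empty stand-ins for the real segmenter classes, resolve_quantizer_name is a hypothetical helper, and the error handling is an assumption, so this is not a description of how quantize_model is actually implemented.

```python
from typing import Dict, Tuple


class BERTSegmenter:
    """Stand-in for segmentador.segmenter.BERTSegmenter (illustration only)."""


class LSTMSegmenter:
    """Stand-in for segmentador.segmenter.LSTMSegmenter (illustration only)."""


# (model family, model_output_format) -> function name listed under "See also".
QUANTIZER_NAMES: Dict[Tuple[str, str], str] = {
    ("bert", "onnx"): "quantize_bert_model_as_onnx",
    ("bert", "torch_jit"): "quantize_bert_model_as_torch",
    ("lstm", "onnx"): "quantize_lstm_model_as_onnx",
    ("lstm", "torch_jit"): "quantize_lstm_model_as_torch",
}


def resolve_quantizer_name(model: object, model_output_format: str = "onnx") -> str:
    """Hypothetical helper: pick the specific quantization function by name."""
    if isinstance(model, BERTSegmenter):
        family = "bert"
    elif isinstance(model, LSTMSegmenter):
        family = "lstm"
    else:
        raise TypeError(f"Unsupported model type: {type(model).__name__}")

    key = (family, model_output_format)
    if key not in QUANTIZER_NAMES:
        raise ValueError(f"Unsupported model_output_format: {model_output_format!r}")
    return QUANTIZER_NAMES[key]


bert_default = resolve_quantizer_name(BERTSegmenter())
lstm_torch = resolve_quantizer_name(LSTMSegmenter(), model_output_format="torch_jit")
```

Calling one of the specific functions directly is equivalent in spirit and exposes format-specific parameters such as modules_to_quantize without going through **kwargs.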

References

[1] Graph Optimizations in ONNX Runtime. Available at: https://onnxruntime.ai/docs/performance/graph-optimizations.html

[2] ONNX Operator Schemas. Available at: https://github.com/onnx/onnx/blob/main/docs/Operators.md