`segmentador.optimize.quantize`

Apply quantization and hardware-specific optimizations to segmenter models.
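For background on what quantization does to a model's weights, the sketch below shows generic 8-bit asymmetric (affine) quantization in plain Python: each float is mapped to an unsigned integer code through a scale and a zero point, then mapped back with a bounded rounding error. It illustrates the technique only. The exact schemes used by the ONNX Runtime and PyTorch backends behind this module (per-tensor or per-channel, signed or unsigned types) may differ, and all function names here are hypothetical.

```python
from typing import List, Tuple


def affine_qparams(values: List[float], num_bits: int = 8) -> Tuple[float, int]:
    """Compute (scale, zero_point) for asymmetric unsigned quantization."""
    qmax = 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int, num_bits: int = 8) -> int:
    """Map a float to its integer code, clamped to [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    q = int(round(x / scale)) + zero_point
    return max(0, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer code back to an approximate float value."""
    return (q - zero_point) * scale


weights = [-1.0, 0.0, 0.5, 2.0]
scale, zp = affine_qparams(weights)
codes = [quantize(w, scale, zp) for w in weights]
restored = [dequantize(q, scale, zp) for q in codes]
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded for 8-bit storage.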

Module Contents

Functions

`quantize_bert_model_as_onnx`(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX	Create a quantized BERTSegmenter model in ONNX format.
`quantize_lstm_model_as_onnx`(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX	Create a quantized LSTMSegmenter model in ONNX format.
`quantize_bert_model_as_torch`(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], ...]] = (torch.nn.Embedding, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch	Create a quantized BERTSegmenter model in Torch JIT format.
`quantize_lstm_model_as_torch`(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], ...]] = (torch.nn.Embedding, torch.nn.LSTM, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch	Create a quantized LSTMSegmenter model in Torch JIT format.
`quantize_model`(model: Union[segmentador.segmenter.BERTSegmenter, segmentador.segmenter.LSTMSegmenter], quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', model_output_format: str = 'onnx', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False, **kwargs: Any) → QuantizationOutput	Generate a quantized segmenter model from an existing segmenter model.

segmentador.optimize.quantize.quantize_bert_model_as_onnx(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX

Create a quantized BERTSegmenter model in ONNX format.

Models created in this format can be loaded for inference as follows:

>>> optimize.ONNXBERTSegmenter(  
...     uri_model='<quantized_model_uri>',
...     uri_tokenizer=...,
...     ...,
... )

Parameters

model (segmenter.BERTSegmenter) – BERTSegmenter model to be quantized.
quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any byproducts generated during the quantization procedure.
intermediary_onnx_model_name (str or None, default=None) – Filename under which the intermediary ONNX model is saved in quantized_model_dirpath. The model must first be converted to ONNX before the optimizations and quantization can be applied. If None, a name is derived from quantized_model_filename.
check_cached (bool, default=True) – If True, check whether a model with the same output filename already exists before quantizing. If it does, no new model is produced.
verbose (bool, default=False) – If True, print information about the results.
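The interaction between the filename defaults and check_cached can be sketched as follows. This is one plausible reading, not segmentador's code: the helpers resolve_output_paths and should_quantize, the '.onnx' suffix handling, and the '_intermediary' naming pattern are all hypothetical, and the cache check assumes that "cached" means the output file already exists on disk.

```python
import os
import tempfile
from typing import Optional, Tuple


def resolve_output_paths(
    default_name: str,
    quantized_model_filename: Optional[str] = None,
    intermediary_onnx_model_name: Optional[str] = None,
    quantized_model_dirpath: str = "./quantized_models",
) -> Tuple[str, str]:
    """Hypothetical helper: build the final and intermediary ONNX file paths."""
    # Fall back to a name derived from the model when none is given.
    filename = quantized_model_filename or default_name
    if not filename.endswith(".onnx"):
        filename += ".onnx"
    # Derive the intermediary name from the final filename when none is given.
    if intermediary_onnx_model_name is None:
        stem = filename[: -len(".onnx")]
        intermediary_onnx_model_name = f"{stem}_intermediary.onnx"
    output_uri = os.path.join(quantized_model_dirpath, filename)
    onnx_uri = os.path.join(quantized_model_dirpath, intermediary_onnx_model_name)
    return output_uri, onnx_uri


def should_quantize(output_uri: str, check_cached: bool = True) -> bool:
    """Hypothetical helper: skip work when caching is on and the output exists."""
    return not (check_cached and os.path.isfile(output_uri))


with tempfile.TemporaryDirectory() as tmpdir:
    output_uri, _ = resolve_output_paths("demo_model", quantized_model_dirpath=tmpdir)
    first_run = should_quantize(output_uri)  # True: nothing cached yet
    open(output_uri, "w").close()  # simulate a finished quantization
    second_run = should_quantize(output_uri)  # False: cached file found
    forced_run = should_quantize(output_uri, check_cached=False)  # True
```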

Returns

paths – URIs of the files generated during the quantization procedure. The final model URI is available through the output_uri attribute.

Return type

QuantizationOutputONNX
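The return value is described as exposing an output_uri attribute, and quantize_model describes its output as a named tuple, so it presumably behaves like a typing.NamedTuple. The stand-in below is a sketch under that assumption: QuantizationOutputSketch and every field other than output_uri are hypothetical.

```python
import os
from typing import NamedTuple


class QuantizationOutputSketch(NamedTuple):
    """Hypothetical stand-in for QuantizationOutputONNX (only output_uri is documented)."""

    onnx_base_uri: str  # hypothetical: intermediary ONNX export of the model
    onnx_optimized_uri: str  # hypothetical: graph-optimized ONNX model
    output_uri: str  # documented: URI of the final quantized model


dirpath = "./quantized_models"
paths = QuantizationOutputSketch(
    onnx_base_uri=os.path.join(dirpath, "model.onnx"),
    onnx_optimized_uri=os.path.join(dirpath, "model_optimized.onnx"),
    output_uri=os.path.join(dirpath, "model_quantized.onnx"),
)

# The final model is reached by attribute access.
final_uri = paths.output_uri

# As a tuple it is also iterable, e.g. to list (or clean up) byproduct files.
byproducts = [uri for uri in paths if uri != paths.output_uri]
```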

References

[1] Graph Optimizations in ONNX Runtime. Available at: https://onnxruntime.ai/docs/performance/graph-optimizations.html
[2] ONNX Operator Schemas. Available at: https://github.com/onnx/onnx/blob/main/docs/Operators.md

segmentador.optimize.quantize.quantize_lstm_model_as_onnx(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX

Create a quantized LSTMSegmenter model in ONNX format.

Models created in this format can be loaded for inference as follows:

>>> optimize.ONNXLSTMSegmenter(  
...     uri_model='<quantized_model_uri>',
...     uri_tokenizer=...,
...     ...,
... )

Parameters

model (segmenter.LSTMSegmenter) – LSTMSegmenter model to be quantized.
quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any byproducts generated during the quantization procedure.
intermediary_onnx_model_name (str or None, default=None) – Filename under which the intermediary ONNX model is saved in quantized_model_dirpath. The model must first be converted to ONNX before the optimizations and quantization can be applied. If None, a name is derived from quantized_model_filename.
onnx_opset_version (int, default=17) – ONNX operator set version used when exporting the model to ONNX. Check [2] for more information.
check_cached (bool, default=True) – If True, check whether a model with the same output filename already exists before quantizing. If it does, no new model is produced.
verbose (bool, default=False) – If True, print information about the results.

Returns

paths – URIs of the files generated during the quantization procedure. The final model URI is available through the output_uri attribute.

Return type

QuantizationOutputONNX

References

[1] Graph Optimizations in ONNX Runtime. Available at: https://onnxruntime.ai/docs/performance/graph-optimizations.html
[2] ONNX Operator Schemas. Available at: https://github.com/onnx/onnx/blob/main/docs/Operators.md

segmentador.optimize.quantize.quantize_bert_model_as_torch(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], ...]] = (torch.nn.Embedding, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch

Create a quantized BERTSegmenter model in Torch JIT format.

Models created in this format can be loaded for inference as follows:

>>> optimize.TorchJITBERTSegmenter(  
...     uri_model='<quantized_model_uri>',
...     ...,
... )

Parameters

model (segmenter.BERTSegmenter) – BERTSegmenter model to be quantized.
quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any byproducts generated during the quantization procedure.
modules_to_quantize (set or tuple of torch.nn.Module subclasses, default=(torch.nn.Embedding, torch.nn.Linear)) – Types of submodules to quantize.
check_cached (bool, default=True) – If True, check whether a model with the same output filename already exists before quantizing. If it does, no new model is produced.
verbose (bool, default=False) – If True, print information about the results.
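Since modules_to_quantize accepts either a set or a tuple of module classes, it presumably acts as a filter over the model's submodules. The sketch below illustrates that selection step with plain-Python stand-ins for the torch.nn classes so that it runs without PyTorch. The stand-in classes and select_modules are hypothetical, and matching on exact type (rather than isinstance) is an assumption about how the real function filters modules.

```python
from typing import Iterator, List, Set, Tuple, Type, Union


class Module:
    """Stand-in for torch.nn.Module holding child modules."""

    def __init__(self, *children: "Module") -> None:
        self.children = list(children)

    def modules(self) -> Iterator["Module"]:
        """Yield this module and all descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.modules()


class Embedding(Module):  # stand-in for torch.nn.Embedding
    pass


class Linear(Module):  # stand-in for torch.nn.Linear
    pass


class LayerNorm(Module):  # stand-in for torch.nn.LayerNorm
    pass


ModuleTypes = Union[Set[Type[Module]], Tuple[Type[Module], ...]]


def select_modules(root: Module, modules_to_quantize: ModuleTypes) -> List[Module]:
    """Hypothetical helper: collect submodules whose exact type was requested."""
    wanted = set(modules_to_quantize)  # normalize a set or a tuple to a set
    return [m for m in root.modules() if type(m) in wanted]


model = Module(Embedding(), Module(Linear(), LayerNorm()), Linear())
selected_from_tuple = select_modules(model, (Embedding, Linear))
selected_from_set = select_modules(model, {Embedding, Linear})
```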

Returns

paths – URIs of the files generated during the quantization procedure. The final model URI is available through the output_uri attribute.

Return type

QuantizationOutputTorch

segmentador.optimize.quantize.quantize_lstm_model_as_torch(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], ...]] = (torch.nn.Embedding, torch.nn.LSTM, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch

Create a quantized LSTMSegmenter model in Torch JIT format.

Models created in this format can be loaded for inference as follows:

>>> optimize.TorchJITLSTMSegmenter(  
...     uri_model='<quantized_model_uri>',
...     ...,
... )

Parameters

model (segmenter.LSTMSegmenter) – LSTMSegmenter model to be quantized.
quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any byproducts generated during the quantization procedure.
modules_to_quantize (set or tuple of torch.nn.Module subclasses, default=(torch.nn.Embedding, torch.nn.LSTM, torch.nn.Linear)) – Types of submodules to quantize.
check_cached (bool, default=True) – If True, check whether a model with the same output filename already exists before quantizing. If it does, no new model is produced.
verbose (bool, default=False) – If True, print information about the results.

Returns

paths – URIs of the files generated during the quantization procedure. The final model URI is available through the output_uri attribute.

Return type

QuantizationOutputTorch

segmentador.optimize.quantize.quantize_model(model: Union[segmentador.segmenter.BERTSegmenter, segmentador.segmenter.LSTMSegmenter], quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', model_output_format: str = 'onnx', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False, **kwargs: Any) → QuantizationOutput

Generate a quantized segmenter model from an existing segmenter model.

This function selects the appropriate quantization function based on the provided model type (BERT or LSTM) and the value of model_output_format. The specific quantization functions are quantize_bert_model_as_onnx, quantize_lstm_model_as_onnx, quantize_bert_model_as_torch and quantize_lstm_model_as_torch.
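The dispatch described above can be pictured as a lookup keyed on the model type and model_output_format, with extra keyword arguments forwarded to the selected function. This is a sketch, not the library's code: the stand-in classes, the _stub placeholders, DISPATCH and quantize_model_sketch are hypothetical, while the four target function names and the {'onnx', 'torch_jit'} values come from this page.

```python
from typing import Any, Callable, Dict, Tuple


class BERTSegmenter:  # stand-in for segmenter.BERTSegmenter
    pass


class LSTMSegmenter:  # stand-in for segmenter.LSTMSegmenter
    pass


Result = Tuple[str, Dict[str, Any]]


def _stub(name: str) -> Callable[..., Result]:
    """Placeholder reporting which quantization function would run, and with what kwargs."""

    def fn(model: Any, **kwargs: Any) -> Result:
        return name, kwargs

    return fn


DISPATCH: Dict[Tuple[type, str], Callable[..., Result]] = {
    (BERTSegmenter, "onnx"): _stub("quantize_bert_model_as_onnx"),
    (LSTMSegmenter, "onnx"): _stub("quantize_lstm_model_as_onnx"),
    (BERTSegmenter, "torch_jit"): _stub("quantize_bert_model_as_torch"),
    (LSTMSegmenter, "torch_jit"): _stub("quantize_lstm_model_as_torch"),
}


def quantize_model_sketch(model: Any, model_output_format: str = "onnx", **kwargs: Any) -> Result:
    """Pick the specific function for (model type, output format) and forward kwargs."""
    for (model_type, fmt), fn in DISPATCH.items():
        if isinstance(model, model_type) and fmt == model_output_format:
            return fn(model, **kwargs)
    raise ValueError(
        f"Unsupported combination: {type(model).__name__!r} with {model_output_format!r}."
    )
```

A table keyed on (type, format) keeps each supported combination explicit, so an unsupported pairing fails loudly instead of silently falling back to a default.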

Parameters

model (segmenter.BERTSegmenter or segmenter.LSTMSegmenter) – Segmenter model to quantize.
quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory in which the resulting quantized model is stored, alongside any byproducts generated during the quantization procedure.
model_output_format ({'onnx', 'torch_jit'}, default='onnx') – Output format of the quantized model. This option also determines how inference with the quantized model is done: 'onnx' models are loaded with optimize.ONNXBERTSegmenter or optimize.ONNXLSTMSegmenter, and 'torch_jit' models with optimize.TorchJITBERTSegmenter or optimize.TorchJITLSTMSegmenter.
onnx_opset_version (int, default=17) – ONNX operator set version. Used only if model_output_format='onnx'. See ONNX Operator Schemas (https://github.com/onnx/onnx/blob/main/docs/Operators.md) for more information.
check_cached (bool, default=True) – If True, check whether a model with the same output filename already exists before quantizing. If it does, no new model is produced.
verbose (bool, default=False) – If True, print information about the results.
**kwargs (dict) – Additional parameters passed to the selected quantization function.

Returns

paths – Named tuple with the paths of all files generated during the full quantization procedure.

Return type

QuantizationOutput
