segmentador.optimize.quantize
Apply quantization and hardware-specific optimizations to segmenter models.
Module Contents
Functions
| quantize_bert_model_as_onnx | Create a quantized BERTSegmenter model in ONNX format. |
| quantize_lstm_model_as_onnx | Create a quantized LSTMSegmenter model in ONNX format. |
| quantize_bert_model_as_torch | Create a quantized BERTSegmenter model in Torch format. |
| quantize_lstm_model_as_torch | Create a quantized LSTMSegmenter model in Torch format. |
| quantize_model | Generate a quantized segmenter model from an existing segmenter model. |
- segmentador.optimize.quantize.quantize_bert_model_as_onnx(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX
Create a quantized BERTSegmenter model in ONNX format.
Models created in this format can be loaded for inference as:
>>> optimize.ONNXBERTSegmenter(
...     uri_model='<quantized_model_uri>',
...     uri_tokenizer=...,
...     ...,
... )
- Parameters
model (segmenter.BERTSegmenter) – BERTSegmenter model to be quantized.
quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory where the resulting quantized model will be stored, alongside any by-products generated during the quantization procedure.
intermediary_onnx_model_name (str or None, default=None) – Filename under which the intermediary ONNX model is saved in quantized_model_dirpath. Converting the model to ONNX is required to apply the subsequent optimization and quantization steps. If None, a name is derived from quantized_model_filename.
check_cached (bool, default=True) – If True, check whether a quantized model with the same output path already exists before quantization. If it does, this function does not produce any new models.
verbose (bool, default=False) – If True, print information regarding the results.
- Returns
paths – URIs of the files generated during the quantization procedure. The final model URI is available in the output_uri attribute.
- Return type
t.Tuple[str, …]
References
- [1] Graph Optimizations in ONNX Runtime. Available at: https://onnxruntime.ai/docs/performance/graph-optimizations.html
- [2] ONNX Operator Schemas. Available at: https://github.com/onnx/onnx/blob/main/docs/Operators.md
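The way quantized_model_dirpath, intermediary_onnx_model_name and check_cached interact in quantize_bert_model_as_onnx can be illustrated with a small sketch. Everything here is an assumption: plan_onnx_quantization is a hypothetical helper (not part of segmentador), and the '<stem>.intermediary.onnx' naming rule merely stands in for whatever name the real function derives from quantized_model_filename.

```python
import os
import tempfile
import typing as t


def plan_onnx_quantization(
    quantized_model_filename: str,
    quantized_model_dirpath: str = "./quantized_models",
    intermediary_onnx_model_name: t.Optional[str] = None,
    check_cached: bool = True,
) -> t.Tuple[str, str, bool]:
    """Hypothetical helper returning (intermediary_uri, output_uri, skip)."""
    if intermediary_onnx_model_name is None:
        # Assumed naming rule: reuse the stem of the final model filename.
        stem, _ = os.path.splitext(quantized_model_filename)
        intermediary_onnx_model_name = f"{stem}.intermediary.onnx"

    intermediary_uri = os.path.join(quantized_model_dirpath, intermediary_onnx_model_name)
    output_uri = os.path.join(quantized_model_dirpath, quantized_model_filename)

    # With check_cached=True, an existing output file means no new model is produced.
    skip = check_cached and os.path.isfile(output_uri)
    return intermediary_uri, output_uri, skip


with tempfile.TemporaryDirectory() as dirpath:
    _, output_uri, skip_first_run = plan_onnx_quantization("q_bert.onnx", dirpath)
    open(output_uri, "w").close()  # simulate the file left by a previous run
    _, _, skip_second_run = plan_onnx_quantization("q_bert.onnx", dirpath)

print(skip_first_run, skip_second_run)  # False True
```

Deriving every path up front keeps the cache check cheap: it only needs the final output path, not the intermediary ONNX file.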
- segmentador.optimize.quantize.quantize_lstm_model_as_onnx(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, intermediary_onnx_model_name: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False) → QuantizationOutputONNX
Create a quantized LSTMSegmenter model in ONNX format.
Models created in this format can be loaded for inference as:
>>> optimize.ONNXLSTMSegmenter(
...     uri_model='<quantized_model_uri>',
...     uri_tokenizer=...,
...     ...,
... )
- Parameters
model (segmenter.LSTMSegmenter) – LSTMSegmenter model to be quantized.
quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory where the resulting quantized model will be stored, alongside any by-products generated during the quantization procedure.
intermediary_onnx_model_name (str or None, default=None) – Filename under which the intermediary ONNX model is saved in quantized_model_dirpath. Converting the model to ONNX is required to apply the subsequent optimization and quantization steps. If None, a name is derived from quantized_model_filename.
onnx_opset_version (int, default=17) – ONNX operator set version used when exporting the model to ONNX. Check [2] for more information.
check_cached (bool, default=True) – If True, check whether a quantized model with the same output path already exists before quantization. If it does, this function does not produce any new models.
verbose (bool, default=False) – If True, print information regarding the results.
- Returns
paths – URIs of the files generated during the quantization procedure. The final model URI is available in the output_uri attribute.
- Return type
t.Tuple[str, …]
References
- [1] Graph Optimizations in ONNX Runtime. Available at: https://onnxruntime.ai/docs/performance/graph-optimizations.html
- [2] ONNX Operator Schemas. Available at: https://github.com/onnx/onnx/blob/main/docs/Operators.md
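The ONNX quantization functions are annotated as returning QuantizationOutputONNX, while their Return type entry reads t.Tuple[str, …]. Both descriptions are consistent if QuantizationOutputONNX is a typing.NamedTuple of strings. The sketch below assumes exactly that; apart from output_uri, which is documented above, the class and field names are hypothetical.

```python
import typing as t


class QuantizationOutputSketch(t.NamedTuple):
    """Hypothetical stand-in for QuantizationOutputONNX (field names assumed)."""

    onnx_base_uri: str  # assumed: intermediary ONNX export
    onnx_optimized_uri: str  # assumed: graph-optimized ONNX model
    output_uri: str  # documented: final quantized model URI


paths = QuantizationOutputSketch(
    onnx_base_uri="./quantized_models/model.onnx",
    onnx_optimized_uri="./quantized_models/model.opt.onnx",
    output_uri="./quantized_models/q_model.onnx",
)

# Access the final model by attribute name...
final_uri = paths.output_uri
# ...or treat the result as a plain tuple of strings.
all_uris = list(paths)
```

A named tuple gives callers readable attribute access while staying compatible with code that only expects a tuple of paths.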
- segmentador.optimize.quantize.quantize_bert_model_as_torch(model: segmentador.segmenter.BERTSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], ...]] = (torch.nn.Embedding, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch
Create a quantized BERTSegmenter model in Torch format.
Models created in this format can be loaded for inference as:
>>> optimize.TorchJITBERTSegmenter(
...     uri_model='<quantized_model_uri>',
...     ...,
... )
- Parameters
model (segmenter.BERTSegmenter) – BERTSegmenter model to be quantized.
quantized_model_filename (t.Optional[str], default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory where the resulting quantized model will be stored, alongside any by-products generated during the quantization procedure.
modules_to_quantize (set or tuple of t.Type[torch.nn.Module], default=(torch.nn.Embedding, torch.nn.Linear)) – Types of torch.nn.Module submodules to be quantized.
check_cached (bool, default=True) – If True, check whether a quantized model with the same output path already exists before quantization. If it does, this function does not produce any new models.
verbose (bool, default=False) – If True, print information regarding the results.
- Returns
paths – URIs of the files generated during the quantization procedure. The final model URI is available in the output_uri attribute.
- Return type
t.Tuple[str, …]
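To give a sense of what quantizing modules such as torch.nn.Linear or torch.nn.Embedding does to their weights, here is a minimal pure-Python sketch of asymmetric 8-bit affine quantization (a scale and a zero point). It illustrates the general idea only: PyTorch and ONNX Runtime use their own quantization schemes and observers, and the helper names below are hypothetical.

```python
import typing as t


def quantize_uint8(values: t.Sequence[float]) -> t.Tuple[t.List[int], float, int]:
    """Map floats to uint8 codes with q = round(v / scale) + zero_point."""
    # Include 0.0 in the range so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(-lo / scale)
    codes = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_uint8(codes: t.Sequence[int], scale: float, zero_point: int) -> t.List[float]:
    """Approximately recover the original floats from their uint8 codes."""
    return [(q - zero_point) * scale for q in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, zero_point = quantize_uint8(weights)
restored = dequantize_uint8(codes, scale, zero_point)
# Each value now takes 1 byte instead of 4 (float32), at the cost of a
# rounding error of roughly scale / 2 per value.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The 4x reduction in weight storage, plus integer arithmetic on supporting hardware, is what makes quantized models smaller and usually faster at inference time.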
- segmentador.optimize.quantize.quantize_lstm_model_as_torch(model: segmentador.segmenter.LSTMSegmenter, quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', modules_to_quantize: Union[Set[Type[torch.nn.Module]], Tuple[Type[torch.nn.Module], ...]] = (torch.nn.Embedding, torch.nn.LSTM, torch.nn.Linear), check_cached: bool = True, verbose: bool = False) → QuantizationOutputTorch
Create a quantized LSTMSegmenter model in Torch format.
Models created in this format can be loaded for inference as:
>>> optimize.TorchJITLSTMSegmenter(
...     uri_model='<quantized_model_uri>',
...     ...,
... )
- Parameters
model (segmenter.LSTMSegmenter) – LSTMSegmenter model to be quantized.
quantized_model_filename (t.Optional[str], default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory where the resulting quantized model will be stored, alongside any by-products generated during the quantization procedure.
modules_to_quantize (set or tuple of t.Type[torch.nn.Module], default=(torch.nn.Embedding, torch.nn.LSTM, torch.nn.Linear)) – Types of torch.nn.Module submodules to be quantized.
check_cached (bool, default=True) – If True, check whether a quantized model with the same output path already exists before quantization. If it does, this function does not produce any new models.
verbose (bool, default=False) – If True, print information regarding the results.
- Returns
paths – URIs of the files generated during the quantization procedure. The final model URI is available in the output_uri attribute.
- Return type
t.Tuple[str, …]
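Both Torch quantization functions accept modules_to_quantize as either a set or a tuple of module types. The sketch below shows one way such an argument can be used to pick the submodules to quantize. It uses stand-in classes instead of torch.nn so that it runs without PyTorch, and select_modules_to_quantize is a hypothetical helper, not part of segmentador.

```python
import typing as t


# Stand-ins for torch.nn module classes, so the sketch runs without PyTorch.
class Module:
    pass


class Embedding(Module):
    pass


class LSTM(Module):
    pass


class Linear(Module):
    pass


class Dropout(Module):
    pass


ModuleTypes = t.Union[t.Set[t.Type[Module]], t.Tuple[t.Type[Module], ...]]


def select_modules_to_quantize(
    submodules: t.Iterable[Module], modules_to_quantize: ModuleTypes
) -> t.List[Module]:
    """Hypothetical helper: keep submodules whose type is listed for quantization."""
    # isinstance() accepts a type or a tuple of types, but not a set,
    # so normalize the argument to a tuple first.
    wanted = tuple(modules_to_quantize)
    return [module for module in submodules if isinstance(module, wanted)]


layers = [Embedding(), LSTM(), Dropout(), Linear()]
# Mirrors the LSTM default (Embedding, LSTM, Linear), passed here as a set.
lstm_selection = select_modules_to_quantize(layers, {Embedding, LSTM, Linear})
# Mirrors the BERT default (Embedding, Linear), passed as a tuple.
bert_selection = select_modules_to_quantize(layers, (Embedding, Linear))
```

Layers that are not listed, such as the Dropout stand-in here, are left untouched.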
- segmentador.optimize.quantize.quantize_model(model: Union[segmentador.segmenter.BERTSegmenter, segmentador.segmenter.LSTMSegmenter], quantized_model_filename: Optional[str] = None, quantized_model_dirpath: str = './quantized_models', model_output_format: str = 'onnx', onnx_opset_version: int = 17, check_cached: bool = True, verbose: bool = False, **kwargs: Any) → QuantizationOutput
Generate a quantized segmenter model from an existing segmenter model.
This function selects the appropriate quantization function based on the provided model type (BERT or LSTM) and the model_output_format value. Check the See also section for a list of the specific quantization functions.
- Parameters
model (segmenter.BERTSegmenter or segmenter.LSTMSegmenter) – Segmenter model to quantize.
quantized_model_filename (str or None, default=None) – Output filename. If None, a long, descriptive name is derived from the model’s parameters.
quantized_model_dirpath (str, default='./quantized_models') – Path to the output directory where the resulting quantized model will be stored, alongside any by-products generated during the quantization procedure.
model_output_format ({'onnx', 'torch_jit'}, default='onnx') – Output format of the quantized model. This option also determines how inference with the quantized model is performed. See the See also section for the function used for each combination of model type and output format.
onnx_opset_version (int, default=17) – ONNX operator set version. Used only if model_output_format='onnx'. Check [2] for more information.
check_cached (bool, default=True) – If True, check whether a quantized model with the same output path already exists before quantization. If it does, this function does not produce any new models.
verbose (bool, default=False) – If True, print information regarding the results.
**kwargs (dict) – Additional parameters passed to the selected quantization function.
- Returns
paths – Named tuple with all paths of files generated during the full quantization procedure.
- Return type
t.Tuple[str, …]
See also
quantize_lstm_model_as_onnx – Quantize an LSTMSegmenter with model_output_format='onnx'.
quantize_lstm_model_as_torch – Quantize an LSTMSegmenter with model_output_format='torch_jit'.
quantize_bert_model_as_onnx – Quantize a BERTSegmenter with model_output_format='onnx'.
quantize_bert_model_as_torch – Quantize a BERTSegmenter with model_output_format='torch_jit'.
References
- [1] Graph Optimizations in ONNX Runtime. Available at: https://onnxruntime.ai/docs/performance/graph-optimizations.html
- [2] ONNX Operator Schemas. Available at: https://github.com/onnx/onnx/blob/main/docs/Operators.md
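The dispatch performed by quantize_model, where the model type plus model_output_format selects one of the four specific quantization functions, can be sketched as a lookup table. The stand-in classes and the helper pick_quantization_function are hypothetical. To keep the sketch self-contained, it returns the name of the function that would be used instead of calling the real segmentador functions.

```python
import typing as t


# Stand-ins for segmentador.segmenter.BERTSegmenter and LSTMSegmenter.
class BERTSegmenter:
    pass


class LSTMSegmenter:
    pass


# (model type, model_output_format) -> name of the specific quantization function.
_DISPATCH: t.Dict[t.Tuple[type, str], str] = {
    (BERTSegmenter, "onnx"): "quantize_bert_model_as_onnx",
    (BERTSegmenter, "torch_jit"): "quantize_bert_model_as_torch",
    (LSTMSegmenter, "onnx"): "quantize_lstm_model_as_onnx",
    (LSTMSegmenter, "torch_jit"): "quantize_lstm_model_as_torch",
}


def pick_quantization_function(model: object, model_output_format: str = "onnx") -> str:
    """Hypothetical helper: name the function quantize_model would delegate to."""
    for (model_type, output_format), function_name in _DISPATCH.items():
        if isinstance(model, model_type) and output_format == model_output_format:
            return function_name
    raise ValueError(
        f"Unsupported combination: {type(model).__name__!r} with "
        f"model_output_format={model_output_format!r}."
    )


print(pick_quantization_function(LSTMSegmenter(), "torch_jit"))  # quantize_lstm_model_as_torch
```

Keeping the mapping in a single table makes an unsupported model type or format fail loudly in one place rather than falling through silently.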