Inference API Playground

Optional parameters

do_sample

If set to False (the default), the generation method is greedy search, which selects the most probable continuation of the prompt you provide. Greedy search is deterministic, so the same input always returns the same result. When do_sample is True, tokens are sampled from a probability distribution, so the output varies across invocations.
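
To illustrate, here is a minimal sketch using InferenceClient from huggingface_hub; the model name is an arbitrary placeholder, not one prescribed by this page:

    from huggingface_hub import InferenceClient

    client = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.2")  # placeholder model

    prompt = "The capital of France is"

    # do_sample=False (the default): greedy search, same output on every call.
    greedy = client.text_generation(prompt, max_new_tokens=10)

    # do_sample=True: tokens are sampled, so repeated calls can differ.
    sampled = client.text_generation(prompt, do_sample=True, max_new_tokens=10)

    print(greedy)
    print(sampled)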

max_new_tokens

The maximum number of tokens to generate. Generation stops earlier if the model emits an end-of-sequence token or one of the stop sequences.
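
A small sketch, reusing the client from above; passing details=True asks InferenceClient for generation metadata, which should report why generation stopped:

    # Cap generation at 30 new tokens; details=True also returns metadata.
    out = client.text_generation(
        "Write a haiku about the sea:",
        max_new_tokens=30,
        details=True,
    )
    print(out.generated_text)
    print(out.details.finish_reason)  # e.g. "length" when the token cap was reached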

temperature

Controls the amount of variation we desire from the generation. A temperature of 0 is equivalent to greedy search. If we set a value for temperature, do_sample is automatically enabled; the same happens for top_k and top_p. For code-related tasks we want less variability and hence recommend a low temperature; for other tasks, such as open-ended text generation, we recommend a higher one.
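
As a sketch of that recommendation (note that passing temperature implicitly enables sampling, so do_sample does not need to be set explicitly):

    # Low temperature for code: nearly deterministic, close to greedy search.
    code = client.text_generation("def fibonacci(n):", temperature=0.2, max_new_tokens=60)

    # Higher temperature for open-ended text: more varied continuations.
    story = client.text_generation("Once upon a time", temperature=0.9, max_new_tokens=60)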

return_full_text

Whether to include the input sequence in the output returned by the endpoint. The default used by InferenceClient is False, but the endpoint itself uses True by default.
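
A quick sketch of the two behaviors, again reusing the client from the first example:

    prompt = "The quick brown fox"

    # InferenceClient default: only the newly generated tokens are returned.
    completion_only = client.text_generation(prompt, max_new_tokens=10)

    # return_full_text=True: the prompt is prepended to the generated text.
    full_text = client.text_generation(prompt, max_new_tokens=10, return_full_text=True)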

Enables "Top-K" sampling: the model will choose from the K most probable tokens that may occur after the input sequence. Typical values are between 10 to 50.

Top K:
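
For example, restricting sampling to the 40 most probable tokens at each step (40 is an arbitrary value within the typical range):

    # top_k implies sampling, so repeated calls can differ.
    text = client.text_generation("My favorite season is", top_k=40, max_new_tokens=30)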

stop_sequences

A list of sequences that cause generation to stop when encountered in the output.
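
A sketch using the stop_sequences parameter of InferenceClient.text_generation; generation halts as soon as any of the listed strings appears in the output:

    text = client.text_generation(
        "Q: What is the tallest mountain on Earth?\nA:",
        max_new_tokens=50,
        stop_sequences=["\nQ:", "\n\n"],
    )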

