Rule Configuration Guide
Rule Granularity
There are two levels of rule configuration available for every customer.
- Default Rules: often described as global rules. Default rules are automatically applied to all of your Tasks.
- Task Rules: rules that apply to a single Task. A Task is a specific LLM application or use case.
Advanced Access Control Configuration - User Permissions
Teams looking for more advanced access control around user permissions can set up these permission structures by mapping users to different Tasks, based on the way Shield is called in your application.
Default Rules
Default Rules exist globally within an instance of Arthur Shield. They are applied universally across existing Tasks and any Tasks created later. However, default rules can be enabled or disabled for a given Task as needed.
Using the Shield APIs, you can view existing default rules or create a new default rule. Once a default rule is created, it is immutable.
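As a minimal sketch, viewing the existing default rules might look like the following. The endpoint path, authentication header, and response shape here are illustrative assumptions; check https://<your-shield-instance>/docs for the exact routes your instance exposes.

import requests

BASE_URL = "https://<your-shield-instance>"  # replace with your host
HEADERS = {"Authorization": "Bearer <your-api-key>"}  # assumed auth scheme

# Hypothetical default-rules route; confirm the real path under /docs.
resp = requests.get(f"{BASE_URL}/api/v2/default_rules", headers=HEADERS)
resp.raise_for_status()
for rule in resp.json():  # assumed: response is a list of rule objects
    print(rule["name"], rule["type"])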
Task Rules
Task Rules apply to a single Task. A Task is a specific LLM application or use case. When you create a Task, all existing default rules are automatically applied to the new Task. Once a Task is created, you can create rules that apply only to that Task. You can also enable or disable any rule for a given Task as needed.
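Toggling a rule on or off for a Task might look like the sketch below. The PATCH route and the "enabled" field are assumptions for illustration; consult https://<your-shield-instance>/docs for the exact API.

import requests

BASE_URL = "https://<your-shield-instance>"  # replace with your host
HEADERS = {"Authorization": "Bearer <your-api-key>"}  # assumed auth scheme
task_id = "<task-id>"
rule_id = "<rule-id>"

# Hypothetical route for enabling/disabling a rule on a Task;
# verify the real path and payload in your instance's API docs.
resp = requests.patch(
    f"{BASE_URL}/api/v2/tasks/{task_id}/rules/{rule_id}",
    headers=HEADERS,
    json={"enabled": False},  # assumed field name
)
resp.raise_for_status()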
Rule Configurations
To create a new Task Rule or a new Default Rule via the corresponding endpoint, you will need to include the following fields in the request body:
- name: a human-readable string used to identify what this rule is for
- type: the type of rule you want to create. Type should be one of KeywordRule, RegexRule, ModelSensitiveDataRule, ModelHallucinationRule, ModelHallucinationRuleV2, PromptInjectionRule, ToxicityRule, PIIDataRule
- apply_to_prompt: boolean indicating whether the rule should be applied to prompts when the validate_prompt endpoint is called
- apply_to_response: boolean indicating whether the rule should be applied to responses when the validate_response endpoint is called
- config: configuration object, which depends on the type of rule
See the examples below to understand how each rule type should be configured.
For API documentation, please head to https://<your-shield-instance>/docs.
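Putting those fields together, creating a rule might look like the following sketch. The request body fields are the ones documented above; the endpoint path is an assumption, so confirm it under /docs on your instance.

import requests

BASE_URL = "https://<your-shield-instance>"  # replace with your host
HEADERS = {"Authorization": "Bearer <your-api-key>"}  # assumed auth scheme

# Request body built from the fields documented above.
body = {
    "name": "Example Regex Rule",
    "type": "RegexRule",
    "apply_to_prompt": True,
    "apply_to_response": True,
    "config": {"regex_patterns": ["\\d{3}-\\d{2}-\\d{4}"]},
}

# Hypothetical Task-rule route; a default rule would be a similar POST
# to a default-rules route. Confirm both paths under /docs.
resp = requests.post(
    f"{BASE_URL}/api/v2/tasks/<task-id>/rules",
    headers=HEADERS,
    json=body,
)
resp.raise_for_status()
print(resp.json())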
Model Hallucination Rule
Hallucination rules only apply to responses. When creating a Hallucination rule, the following fields should be set as:
- type:
ModelHallucinationRule
- apply_to_prompt: should always be false
- apply_to_response: should always be true
- config: No additional configuration is necessary. This field should be omitted or left blank.
Example
{
"name": "Example Hallucination Rule",
"type": "ModelHallucinationRule",
"apply_to_prompt": false,
"apply_to_response": true
}
Model Hallucination Rule V2
Hallucination rules only apply to responses. V2 differs from V1 by evaluating the claims made in the response verbatim: any claim extracted by V2 can be found in the response text, whereas claims extracted by V1 may not. V1 also returns as soon as it finds any hallucination, while V2 evaluates every claim in the response and can therefore return multiple flagged claims. Additionally, V2 is less expensive to run from a token-consumption perspective. When creating a Hallucination V2 rule, the following fields should be set as:
- type:
ModelHallucinationRuleV2
- apply_to_prompt: should always be false
- apply_to_response: should always be true
- config: No additional configuration is necessary. This field should be omitted or left blank.
Example
{
"name": "Example HallucinationV2 Rule",
"type": "ModelHallucinationRuleV2",
"apply_to_prompt": false,
"apply_to_response": true
}
An example rule result for a Hallucination V2 rule:
{
"id": "00000000-0000-0000-0000-000000000000",
"name": "",
"scope": "task",
"result": "Pass",
"details":
{
"score": true,
"message": "All claims were valid!",
"claims":
[
{
"claim": "The Milky Way, our home galaxy, is estimated to contain over 100 billion stars and is just one of billions of galaxies in the observable universe.",
"valid": true,
"reason": ""
},
{
"claim": "Astronomers use a unit of measurement called a \"light-year\" to describe vast cosmic distances, which is the distance that light travels in one year, roughly 5.88 trillion miles (9.46 trillion kilometers).",
"valid": true,
"reason": ""
},
{
"claim": "Black holes are incredibly dense regions in space where gravity is so strong that nothing, not even light, can escape their grasp, making them invisible to direct observation.",
"valid": true,
"reason": ""
},
{
"claim": "The study of cosmic microwave background radiation, leftover from the Big Bang, provides crucial evidence supporting the Big Bang theory as the origin of the universe.",
"valid": true,
"reason": ""
},
{
"claim": "Exoplanets are planets located outside our solar system, and thousands have been discovered to date, sparking interest in the search for extraterrestrial life.",
"valid": true,
"reason": ""
}
]
}
}
Prompt Injection
Prompt Injection rules only apply to prompts. When creating a Prompt Injection rule, the following fields should be set as:
- type:
PromptInjectionRule
- apply_to_prompt: should always be true
- apply_to_response: should always be false
- config: No additional configuration is necessary. This field should be omitted or left blank.
Example
{
"name": "Example Prompt Injection Rule",
"type": "PromptInjectionRule",
"apply_to_prompt": true,
"apply_to_response": false
}
Toxicity Rule
Toxicity rules identify toxic text in prompt or response data. They can be applied to prompts and/or responses. When creating a Toxicity rule, the following fields should be set as:
- type:
ToxicityRule
- apply_to_prompt: can be true or false
- apply_to_response: can be true or false
Example
{
"name": "Example Toxicity Rule",
"type": "ToxicityRule",
"apply_to_prompt": true,
"apply_to_response": true
}
An example rule result for a Toxicity rule:
{
"id": "00000000-0000-0000-0000-000000000000",
"name": "",
"scope": "task",
"result": "Pass",
"details":
{
"score": null,
"toxicity_score": 0.05,
"message": "No toxicity detected!"
}
}
PII Rule
PII rules identify personally identifiable information (PII) in prompt or response data. They can be applied to prompts and/or responses. When creating a PII rule, the following fields should be set as:
- type:
PIIDataRule
- apply_to_prompt: can be true or false
- apply_to_response: can be true or false
- config: no additional configuration is necessary, but the following optional fields are accepted:
  - disabled_pii_entities: a list of PII entities you wish to disable for the PII rule you are creating. Entity types listed here will not be evaluated and will not return PII failures. Accepted values: CREDIT_CARD, CRYPTO, DATE_TIME, EMAIL_ADDRESS, IBAN_CODE, IP_ADDRESS, NRP, LOCATION, PERSON, PHONE_NUMBER, MEDICAL_LICENSE, URL, US_BANK_NUMBER, US_DRIVER_LICENSE, US_ITIN, US_PASSPORT, US_SSN
  - confidence_threshold: DEPRECATED; this field is now a no-op. A value between 0 (low confidence) and 1 (high confidence); any PII with a confidence score below the threshold would not be returned.
  - allow_list: string values you want to explicitly allow; these will not be flagged as PII even if the entity type they fall under is enabled.
Example
{
"name": "PII Rule",
"type": "PIIDataRule",
"apply_to_prompt": true,
"apply_to_response": true,
"config": {
"disabled_pii_entities": [
"EMAIL_ADDRESS",
"PHONE_NUMBER"
],
"confidence_threshold": "0.5",
"allow_list": [
"arthur.ai",
"Arthur"
]
}
}
An example rule result for a PII rule:
{
"id": "00000000-0000-0000-0000-000000000000",
"name": "",
"scope": "task",
"result": "Fail",
"details":
{
"score": null,
"message": "PII found in data: IP_ADDRESS,PHONE_NUMBER",
"pii_results":
[
"IP_ADDRESS",
"PHONE_NUMBER"
]
}
}
Model Sensitive Data Rule
Because sensitive data is use-case specific, teams must provide examples of sensitive data leakage to inform Arthur Shield's Sensitive Data rule. Sensitive Data rules can be applied to prompts and/or responses. When creating a Sensitive Data rule, the following fields should be set as:
- type:
ModelSensitiveDataRule
- apply_to_prompt: can be true or false
- apply_to_response: can be true or false
- config:
  - examples: a list of examples that define what is or isn't sensitive for this use case. Each item in this list should follow this schema:
    - example: example text
    - result: true if this example text contains sensitive data, false if it does not
  - hint: (optional) a string descriptor of the type of sensitive data to be caught
Example
{
"name":"Example Sensitive Data Rule",
"type": "ModelSensitiveDataRule",
"apply_to_prompt": true,
"apply_to_response": true,
"config": {
"examples": [
{
"example": "John has O negative blood type.",
"result": true
},
{
"example": "67% of users have O negative blood type.",
"result": false
}
],
"hint": "specific individual's medical information"
}
}
Since these configurations are use-case specific and depend on working with teams to provide examples, we recommend setting up Sensitive Data rules at the Task level for your Arthur Shield instance.
Regex Rule
Regex rules are used to create pattern-based PII rules or other custom rules. Regex rules can be applied to prompts and/or responses. When creating a Regex rule, the following fields should be set as:
- type:
RegexRule
- apply_to_prompt: can be true or false
- apply_to_response: can be true or false
- config:
- regex_patterns: list of regex patterns that will be run against the input
Example
{
"name": "Number Pattern Regex Rule",
"type": "RegexRule",
"apply_to_prompt": true,
"apply_to_response": true,
"config": {
"regex_patterns": [
"\\d{3}-\\d{2}-\\d{4}",
"\\d{5}-\\d{6}-\\d{7}"
]
}
}
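To make the patterns above concrete, here is a small standalone Python check, independent of Shield, showing which sample strings each pattern matches:

import re

patterns = [r"\d{3}-\d{2}-\d{4}", r"\d{5}-\d{6}-\d{7}"]

samples = [
    "My SSN is 123-45-6789",         # matches the first pattern
    "Account 12345-678901-2345678",  # matches the second pattern
    "No sensitive numbers here",     # matches neither
]

for text in samples:
    hits = [p for p in patterns if re.search(p, text)]
    print(f"{text!r} -> {'flagged: ' + str(hits) if hits else 'clean'}")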
Keyword Rule
Keyword rules create custom rules that detect specific keywords or key phrases. Keyword rules can be applied to prompts and/or responses. When creating a Keyword rule, the following fields should be set as:
- type:
KeywordRule
- apply_to_prompt: can be true or false
- apply_to_response: can be true or false
- config:
- keywords: List of keywords or key phrases
Example
{
"name": "Custom Keyword Rule",
"type": "KeywordRule",
"apply_to_prompt": true,
"apply_to_response": true,
"config": {
"keywords": [
"Blocked_Keyword_1",
"Blocked_Keyword_2"
]
}
}
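As a rough conceptual illustration only (this is not Shield's actual matching logic, whose case handling and tokenization may differ), a keyword check behaves roughly like this:

# Conceptual illustration of keyword matching; Shield's actual
# matching semantics (case handling, tokenization) may differ.
def contains_keyword(text: str, keywords: list[str]) -> bool:
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)

blocked = ["Blocked_Keyword_1", "Blocked_Keyword_2"]
print(contains_keyword("this mentions Blocked_Keyword_1", blocked))  # True
print(contains_keyword("harmless text", blocked))                    # False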
Understanding Rule Results
Currently, all rules return a binary result (meaning the rule either passes or fails). Both the validate_prompt and validate_response endpoints return the outcome of each rule in the rule_results field.
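As a sketch (the endpoint path and request body are assumptions; see https://<your-shield-instance>/docs for the exact API), calling validate_prompt and reading rule_results might look like:

import requests

BASE_URL = "https://<your-shield-instance>"  # replace with your host
HEADERS = {"Authorization": "Bearer <your-api-key>"}  # assumed auth scheme

# Hypothetical validate_prompt route; confirm the real path under /docs.
resp = requests.post(
    f"{BASE_URL}/api/v2/tasks/<task-id>/validate_prompt",
    headers=HEADERS,
    json={"prompt": "What is our refund policy?"},  # assumed body shape
)
resp.raise_for_status()

for result in resp.json()["rule_results"]:
    print(f"{result['name']}: {result['result']}")

A response body looks like the following: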
{
"inference_id": "4dd1fae1-34b9-4aec-8abe-fe7bf12af31d",
"rule_results": [
{
"id": "90f18c69-d793-4913-9bde-a0c7f3643de0",
"name": "Example Hallucination Rule",
"scope": "default",
"result": "Pass"
},
{
"id": "946c4a44-b367-4229-84d4-1a8e461cb132",
"name": "Example Sensitive Data Rule",
"scope": "default",
"result": "Fail"
},
{
"id": "123c4a44-b367-4229-84d4-1a8e461cb132",
"name": "Example Regex Rule",
"scope": "default",
"result": "Pass"
}
]
}