Sensitive Data Leakage

Sensitive data leakage is the inadvertent exposure or disclosure of confidential or private information. There are two scenarios in which you may want to detect sensitive data to protect your company and users:

  1. Detecting Sensitive Data in the Prompt - If your application is using a public or externally-hosted LLM, all data that is sent to that model may be fed back into the training data or shared across contexts with other users. In this case, you want to ensure that your user is not accidentally including sensitive company information in the prompts to the LLM.
  2. Detecting Sensitive Data in the Response - Teams often add private data as context to LLMs (through fine-tuning or retrieval) to broaden the range of questions the LLMs can answer. However, not every end user has permission to view all of that private data. In this case, your application should not return information that is considered sensitive for that end user.

Types of Sensitive Data

Broadly defined, sensitive data is confidential information stored, processed, or managed by an organization that should only be accessible by authorized users. The types of data that fall under this categorization are use-case specific.

For example, imagine we have augmented an LLM with client investment portfolio data. We want our internal users to be able to take advantage of the productivity increase by utilizing our application to ask broad questions across portfolios. However, we would want to block responses that return specific user information or client holdings they should not have access to.

Another common type of sensitive data is personally identifiable information (PII). This can be common PII (such as social security numbers) or use-case-specific identifiers (e.g., API-key tokens). Please refer to our PII Leakage documentation or our Custom Rules documentation for more information on those Shield rules.

The Shield Approach

Arthur's approach uses LLM prompting with few-shot examples as a general-purpose tool for detecting sensitive data that would otherwise be hard to capture with fixed regex patterns.

To use this rule, prepare a set of examples that you expect to PASS or FAIL for the category of sensitive data you need to detect. Once the example set is prepared, configure your sensitive data rule with it; from then on, the rule detects that category of sensitive data.
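To make the few-shot idea concrete, here is a minimal sketch of how labeled examples could be assembled into a classification prompt. The prompt wording, the `Example` class, and `build_few_shot_prompt` are hypothetical illustrations, not Shield's actual prompt or API; the sketch also assumes that FAIL marks text containing the sensitive data.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    contains_sensitive_data: bool  # assumption: True means the rule should FAIL this text


def build_few_shot_prompt(examples: list[Example], candidate: str) -> str:
    """Assemble a classification prompt from labeled examples (hypothetical format)."""
    lines = [
        "Decide whether the final text contains the kind of sensitive data "
        "illustrated by the examples. Answer FAIL if it does, PASS if it does not.",
        "",
    ]
    for ex in examples:
        verdict = "FAIL" if ex.contains_sensitive_data else "PASS"
        lines.append(f"Text: {ex.text}")
        lines.append(f"Answer: {verdict}")
        lines.append("")
    # The candidate text goes last, with the answer left open for the LLM.
    lines.append(f"Text: {candidate}")
    lines.append("Answer:")
    return "\n".join(lines)


examples = [
    Example("Patient was prescribed 20mg of lisinopril daily.", True),
    Example("The quarterly report is due on Friday.", False),
]
prompt = build_few_shot_prompt(examples, "MRI results show a torn ligament.")
print(prompt)
```

The returned string would then be sent to whichever LLM backs the rule; that call is left out because it depends on your deployment.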

Preparing examples to configure your sensitive data rule

Positive and negative examples

It is helpful to collect both positive and negative examples to configure your rule. For example, if you wanted to detect medical data with your sensitive data rule, positive examples would be instances of medical text and negative examples would be instances of non-medical text. Pairing negative examples with your positive examples reduces the false positive rate of your rule by showing the LLM what not to flag.

Sufficient examples

It is helpful to collect roughly 2 to 20 examples of the type of sensitive data you need to detect when you are configuring your sensitive data rule.

Performance typically improves as you add examples, but we occasionally observed a drop-off when too many were added; the drop-off tended to begin at 10 to 15 examples.

Be sure to test your rule on a validation set before productionizing your rule to determine which set of examples works best for your use case!
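One way to compare candidate example sets on a validation set is to score each with precision, recall, and F1 and keep the best. The sketch below assumes you have already collected the rule's boolean outputs for each candidate; the names `rule_outputs`, `4_examples`, and `8_examples` and all values are made up for illustration.

```python
def precision_recall_f1(predictions: list[bool], labels: list[bool]) -> tuple[float, float, float]:
    """Score boolean predictions against boolean ground-truth labels."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum(l and not p for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Ground truth for a tiny, hypothetical validation set.
labels = [True, True, True, False, False, False]

# Hypothetical rule outputs on that set, one list per candidate example set.
rule_outputs = {
    "4_examples": [True, True, False, True, False, False],
    "8_examples": [True, True, True, True, False, False],
}

scores = {name: precision_recall_f1(preds, labels) for name, preds in rule_outputs.items()}
best = max(scores, key=lambda name: scores[name][2])  # pick the highest F1
print(best, scores[best])
```

Selecting on F1 is a design choice; if false negatives are costlier for your use case, selecting on recall may fit better.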

Adding Hints

It is helpful to add a description of the type of sensitive data you are looking for as a hint when you are creating the sensitive data rule.

Adding a hint performs well on our benchmarks overall: in the Benchmark Data table, the F1 score with a hint is higher than the score for the matching configuration without one in all but two cases, and the hint adds relatively few tokens.
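The guidance in this section (both positive and negative examples, roughly 2 to 20 examples, and a hint) can be turned into a quick pre-flight check. The configuration shape below, including the `examples`, `sensitive`, and `hint` keys, is a hypothetical stand-in and not the Shield rule schema.

```python
def check_rule_config(config: dict) -> list[str]:
    """Return warnings for a hypothetical sensitive data rule configuration."""
    warnings = []
    examples = config.get("examples", [])
    positives = [e for e in examples if e["sensitive"]]
    negatives = [e for e in examples if not e["sensitive"]]
    if not 2 <= len(examples) <= 20:
        warnings.append("expected roughly 2 to 20 examples")
    if not positives or not negatives:
        warnings.append("include both positive and negative examples")
    if not config.get("hint", "").strip():
        warnings.append("consider adding a hint describing the sensitive data")
    return warnings


config = {
    "hint": "Medical information such as diagnoses, prescriptions, or test results.",
    "examples": [
        {"text": "Patient was prescribed 20mg of lisinopril daily.", "sensitive": True},
        {"text": "The quarterly report is due on Friday.", "sensitive": False},
    ],
}
print(check_rule_config(config))                  # no warnings for this config
print(check_rule_config({**config, "hint": ""}))  # flags the missing hint
```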

Benchmarks

The benchmark datasets we assembled contain samples from the AI4Privacy PII-Masking dataset. Each dataset is focused on one category of sensitive data from the types included in the AI4Privacy dataset: Passwords, Usernames, Company Names, and Job Titles. (We are benchmarking on categories of sensitive data not covered by the Shield PII Leakage rule.)

When running the benchmarks, we sample N = 30 rows from each dataset.

Each row from our benchmark datasets contains a text and a label, which is a boolean. If label == True, the text contains the type of sensitive data the dataset is focused on. For example, if text contains a username in the Username dataset, we label it True; if text contains a username but is in the Company Names dataset (and text doesn't contain a company name), we label it False.
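The labeling rule and the N = 30 sampling can be sketched as follows. The row format, the category annotations, and the helper names are assumptions made for illustration; they are not the AI4Privacy schema or our benchmark code.

```python
import random

# Hypothetical rows: each text is annotated with the categories it contains.
rows = [
    {"text": "Log in as jdoe_84 to view the report.", "categories": {"username"}},
    {"text": "Acme Corp hired jdoe_84 last spring.", "categories": {"username", "companyname"}},
    {"text": "The meeting starts at noon.", "categories": set()},
]


def label_row(categories: set[str], dataset_focus: str) -> bool:
    """True only when the text contains the category the dataset focuses on."""
    return dataset_focus in categories


def build_benchmark(rows: list[dict], dataset_focus: str, n: int, seed: int = 0) -> list[tuple[str, bool]]:
    """Sample up to n rows and attach the boolean label for this dataset."""
    sampled = random.Random(seed).sample(rows, k=min(n, len(rows)))
    return [(r["text"], label_row(r["categories"], dataset_focus)) for r in sampled]


company_benchmark = build_benchmark(rows, "companyname", n=30)
for text, label in company_benchmark:
    print(label, text)
```

Note that the first row is labeled False in the company-name benchmark even though it contains a username, which mirrors the example in the paragraph above.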

Our Recommendation for Your On-Premises Shield Deployment

Based on these benchmarks, we recommend using gpt-3.5-turbo-0125 with ~8 examples and a hint - this seems to be a high performing and relatively low-cost and low-latency configuration.
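As a rough illustration of the trade-off behind this recommendation, the sketch below aggregates the rows for 8 examples with a hint across the four datasets for two of the benchmarked models. The figures are read from the Benchmark Data table in this document; the aggregation (mean F1, summed cost, mean latency) is our own simplification rather than an official scoring method.

```python
from statistics import mean

# (dataset, cost_usd, latency_s, f1) for 8 examples with a hint,
# read from the Benchmark Data table.
results = {
    "gpt-3.5-turbo-0125": [
        ("username", 0.013, 12.5, 0.81),
        ("password", 0.014, 14.03, 0.94),
        ("jobtitle", 0.013, 14.51, 0.79),
        ("companyname", 0.014, 12.94, 0.88),
    ],
    "gpt-4-0125-preview": [
        ("username", 0.264, 20.73, 0.83),
        ("password", 0.283, 20.43, 0.97),
        ("jobtitle", 0.283, 22.52, 0.9),
        ("companyname", 0.277, 23.74, 0.95),
    ],
}

summary = {
    model: {
        "mean_f1": mean(r[3] for r in rows),
        "total_cost": sum(r[1] for r in rows),
        "mean_latency": mean(r[2] for r in rows),
    }
    for model, rows in results.items()
}

for model, stats in summary.items():
    print(f"{model}: F1={stats['mean_f1']:.3f} "
          f"cost=${stats['total_cost']:.3f} latency={stats['mean_latency']:.2f}s")
```

On these rows, gpt-4-0125-preview averages about 0.91 F1 against about 0.86 for gpt-3.5-turbo-0125, at roughly 20 times the total cost (1.107 vs. 0.054 USD) and noticeably higher latency.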

Consult the Azure OpenAI Model documentation to see which models work best for your deployment.

Benchmark Data


Note on Cost & Latency: Numbers represent the total calculated for an entire benchmark set of N = 30. Cost is computed as the # of prompt & completion tokens used, multiplied by their respective prices.
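The cost calculation described in the note can be sketched as below. The per-1K-token prices and the token counts are placeholders chosen for illustration; substitute the prices for your model and the token usage reported by your provider.

```python
def call_cost(prompt_tokens: int, completion_tokens: int,
              prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Cost of one LLM call: each token type multiplied by its own price."""
    return (prompt_tokens / 1000 * prompt_price_per_1k
            + completion_tokens / 1000 * completion_price_per_1k)


# Placeholder prices (USD per 1K tokens) and token counts, for illustration only.
PROMPT_PRICE_PER_1K = 0.0005
COMPLETION_PRICE_PER_1K = 0.0015
calls = [(800, 5)] * 30  # (prompt_tokens, completion_tokens) for N = 30 rows

total_cost = sum(
    call_cost(p, c, PROMPT_PRICE_PER_1K, COMPLETION_PRICE_PER_1K) for p, c in calls
)
print(f"Total cost for the benchmark set: ${total_cost:.4f}")
```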

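| Dataset | Model | # Examples | Hint | Cost (USD) | Latency (s) | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|
| username | gpt-4-1106-preview | 14 | yes | 0.477 | 28.23 | 0.68 | 1.0 | 0.81 |
| username | gpt-4-1106-preview | 14 | no | 0.341 | 26.76 | 0.45 | 0.77 | 0.57 |
| username | gpt-4-1106-preview | 8 | yes | 0.289 | 34.31 | 0.82 | 1.0 | 0.9 |
| username | gpt-4-1106-preview | 8 | no | 0.201 | 22.99 | 0.62 | 0.56 | 0.59 |
| username | gpt-4-1106-preview | 2 | yes | 0.114 | 19.99 | 0.57 | 1.0 | 0.72 |
| username | gpt-4-1106-preview | 2 | no | 0.087 | 26.79 | 0.46 | 0.92 | 0.62 |
| username | gpt-4-0125-preview | 14 | yes | 0.477 | 34.08 | 0.65 | 1.0 | 0.79 |
| username | gpt-4-0125-preview | 14 | no | 0.341 | 20.5 | 0.43 | 0.92 | 0.59 |
| username | gpt-4-0125-preview | 8 | yes | 0.264 | 20.73 | 0.7 | 1.0 | 0.83 |
| username | gpt-4-0125-preview | 8 | no | 0.237 | 20.16 | 0.62 | 0.88 | 0.73 |
| username | gpt-4-0125-preview | 2 | yes | 0.114 | 25.77 | 0.65 | 1.0 | 0.79 |
| username | gpt-4-0125-preview | 2 | no | 0.087 | 17.74 | 0.41 | 0.92 | 0.57 |
| username | gpt-3.5-turbo-16k-0613 | 14 | yes | 0.143 | 14.24 | 0.61 | 0.85 | 0.71 |
| username | gpt-3.5-turbo-16k-0613 | 14 | no | 0.102 | 12.77 | 0.44 | 0.85 | 0.58 |
| username | gpt-3.5-turbo-16k-0613 | 8 | yes | 0.079 | 10.21 | 0.67 | 0.92 | 0.77 |
| username | gpt-3.5-turbo-16k-0613 | 8 | no | 0.056 | 10.32 | 0.48 | 0.85 | 0.61 |
| username | gpt-3.5-turbo-16k-0613 | 2 | yes | 0.034 | 9.63 | 0.5 | 0.77 | 0.61 |
| username | gpt-3.5-turbo-16k-0613 | 2 | no | 0.026 | 8.62 | 0.43 | 0.92 | 0.59 |
| username | gpt-3.5-turbo-1106 | 14 | yes | 0.048 | 16.32 | 0.55 | 0.85 | 0.67 |
| username | gpt-3.5-turbo-1106 | 14 | no | 0.034 | 15.26 | 0.42 | 0.77 | 0.54 |
| username | gpt-3.5-turbo-1106 | 8 | yes | 0.026 | 12.64 | 0.59 | 1.0 | 0.74 |
| username | gpt-3.5-turbo-1106 | 8 | no | 0.035 | 12.28 | 0.64 | 0.74 | 0.68 |
| username | gpt-3.5-turbo-1106 | 2 | yes | 0.011 | 10.22 | 0.48 | 1.0 | 0.65 |
| username | gpt-3.5-turbo-1106 | 2 | no | 0.009 | 10.11 | 0.44 | 0.92 | 0.6 |
| username | gpt-3.5-turbo-0125 | 14 | yes | 0.024 | 19.48 | 0.61 | 0.85 | 0.71 |
| username | gpt-3.5-turbo-0125 | 14 | no | 0.017 | 17.84 | 0.41 | 0.85 | 0.55 |
| username | gpt-3.5-turbo-0125 | 8 | yes | 0.013 | 12.5 | 0.68 | 1.0 | 0.81 |
| username | gpt-3.5-turbo-0125 | 8 | no | 0.009 | 12.24 | 0.67 | 0.9 | 0.77 |
| username | gpt-3.5-turbo-0125 | 2 | yes | 0.006 | 11.14 | 0.48 | 1.0 | 0.65 |
| username | gpt-3.5-turbo-0125 | 2 | no | 0.004 | 11.87 | 0.43 | 0.92 | 0.59 |
| password | gpt-4-1106-preview | 14 | yes | 0.431 | 34.36 | 0.9 | 0.82 | 0.86 |
| password | gpt-4-1106-preview | 14 | no | 0.302 | 31.26 | 0.6 | 0.86 | 0.71 |
| password | gpt-4-1106-preview | 8 | yes | 0.283 | 23.88 | 0.89 | 1.0 | 0.94 |
| password | gpt-4-1106-preview | 8 | no | 0.202 | 22.47 | 0.82 | 0.88 | 0.85 |
| password | gpt-4-1106-preview | 2 | yes | 0.112 | 22.54 | 0.8 | 1.0 | 0.89 |
| password | gpt-4-1106-preview | 2 | no | 0.085 | 24.59 | 0.65 | 0.94 | 0.77 |
| password | gpt-4-0125-preview | 14 | yes | 0.48 | 21.38 | 0.9 | 1.0 | 0.95 |
| password | gpt-4-0125-preview | 14 | no | 0.295 | 21.19 | 0.7 | 0.78 | 0.74 |
| password | gpt-4-0125-preview | 8 | yes | 0.283 | 20.43 | 0.94 | 1.0 | 0.97 |
| password | gpt-4-0125-preview | 8 | no | 0.202 | 19.19 | 0.83 | 0.94 | 0.88 |
| password | gpt-4-0125-preview | 2 | yes | 0.112 | 20.11 | 0.89 | 1.0 | 0.94 |
| password | gpt-4-0125-preview | 2 | no | 0.085 | 19.1 | 0.68 | 0.94 | 0.79 |
| password | gpt-3.5-turbo-16k-0613 | 14 | yes | 0.136 | 14.57 | 0.5 | 1.0 | 0.67 |
| password | gpt-3.5-turbo-16k-0613 | 14 | no | 0.108 | 13.01 | 0.65 | 0.73 | 0.69 |
| password | gpt-3.5-turbo-16k-0613 | 8 | yes | 0.085 | 13.87 | 0.88 | 0.94 | 0.91 |
| password | gpt-3.5-turbo-16k-0613 | 8 | no | 0.06 | 10.6 | 0.73 | 0.69 | 0.71 |
| password | gpt-3.5-turbo-16k-0613 | 2 | yes | 0.034 | 9.25 | 0.85 | 0.69 | 0.76 |
| password | gpt-3.5-turbo-16k-0613 | 2 | no | 0.025 | 9.33 | 0.6 | 0.56 | 0.58 |
| password | gpt-3.5-turbo-1106 | 14 | yes | 0.046 | 15.62 | 0.59 | 0.93 | 0.72 |
| password | gpt-3.5-turbo-1106 | 14 | no | 0.031 | 9.4 | 0.52 | 0.86 | 0.65 |
| password | gpt-3.5-turbo-1106 | 8 | yes | 0.028 | 11.0 | 0.76 | 1.0 | 0.86 |
| password | gpt-3.5-turbo-1106 | 8 | no | 0.021 | 15.6 | 0.64 | 0.88 | 0.74 |
| password | gpt-3.5-turbo-1106 | 2 | yes | 0.011 | 9.9 | 0.79 | 0.94 | 0.86 |
| password | gpt-3.5-turbo-1106 | 2 | no | 0.009 | 11.93 | 0.6 | 0.56 | 0.58 |
| password | gpt-3.5-turbo-0125 | 14 | yes | 0.02 | 20.3 | 0.76 | 1.0 | 0.87 |
| password | gpt-3.5-turbo-0125 | 14 | no | 0.017 | 18.96 | 0.6 | 0.69 | 0.64 |
| password | gpt-3.5-turbo-0125 | 8 | yes | 0.014 | 14.03 | 0.89 | 1.0 | 0.94 |
| password | gpt-3.5-turbo-0125 | 8 | no | 0.01 | 14.89 | 0.63 | 0.75 | 0.69 |
| password | gpt-3.5-turbo-0125 | 2 | yes | 0.006 | 16.6 | 0.89 | 1.0 | 0.94 |
| password | gpt-3.5-turbo-0125 | 2 | no | 0.004 | 13.08 | 0.6 | 0.56 | 0.58 |
| jobtitle | gpt-4-1106-preview | 14 | yes | 0.401 | 25.2 | 0.88 | 1.0 | 0.93 |
| jobtitle | gpt-4-1106-preview | 14 | no | 0.278 | 26.07 | 0.62 | 0.5 | 0.55 |
| jobtitle | gpt-4-1106-preview | 8 | yes | 0.258 | 26.94 | 0.88 | 1.0 | 0.94 |
| jobtitle | gpt-4-1106-preview | 8 | no | 0.181 | 24.9 | 0.31 | 0.31 | 0.31 |
| jobtitle | gpt-4-1106-preview | 2 | yes | 0.261 | 28.32 | 0.84 | 1.0 | 0.91 |
| jobtitle | gpt-4-1106-preview | 2 | no | 0.232 | 25.68 | 0.64 | 0.43 | 0.51 |
| jobtitle | gpt-4-0125-preview | 14 | yes | 0.452 | 24.01 | 0.91 | 1.0 | 0.95 |
| jobtitle | gpt-4-0125-preview | 14 | no | 0.316 | 22.5 | 0.27 | 0.2 | 0.23 |
| jobtitle | gpt-4-0125-preview | 8 | yes | 0.283 | 22.52 | 0.81 | 1.0 | 0.9 |
| jobtitle | gpt-4-0125-preview | 8 | no | 0.178 | 20.89 | 0.5 | 0.8 | 0.62 |
| jobtitle | gpt-4-0125-preview | 2 | yes | 0.261 | 24.19 | 0.81 | 1.0 | 0.89 |
| jobtitle | gpt-4-0125-preview | 2 | no | 0.232 | 26.82 | 0.65 | 0.52 | 0.58 |
| jobtitle | gpt-3.5-turbo-16k-0613 | 14 | yes | 0.136 | 16.88 | 0.86 | 0.71 | 0.77 |
| jobtitle | gpt-3.5-turbo-16k-0613 | 14 | no | 0.096 | 14.73 | 0.22 | 0.17 | 0.19 |
| jobtitle | gpt-3.5-turbo-16k-0613 | 8 | yes | 0.101 | 10.67 | 0.87 | 0.87 | 0.87 |
| jobtitle | gpt-3.5-turbo-16k-0613 | 8 | no | 0.049 | 9.48 | 0.56 | 0.6 | 0.58 |
| jobtitle | gpt-3.5-turbo-16k-0613 | 2 | yes | 0.078 | 10.1 | 1.0 | 0.62 | 0.76 |
| jobtitle | gpt-3.5-turbo-16k-0613 | 2 | no | 0.07 | 10.48 | 0.67 | 0.19 | 0.3 |
| jobtitle | gpt-3.5-turbo-1106 | 14 | yes | 0.042 | 18.52 | 0.65 | 0.87 | 0.74 |
| jobtitle | gpt-3.5-turbo-1106 | 14 | no | 0.027 | 17.73 | 0.58 | 0.74 | 0.65 |
| jobtitle | gpt-3.5-turbo-1106 | 8 | yes | 0.028 | 14.65 | 0.7 | 1.0 | 0.82 |
| jobtitle | gpt-3.5-turbo-1106 | 8 | no | 0.021 | 16.53 | 0.46 | 0.35 | 0.4 |
| jobtitle | gpt-3.5-turbo-1106 | 2 | yes | 0.026 | 10.63 | 0.85 | 0.81 | 0.83 |
| jobtitle | gpt-3.5-turbo-1106 | 2 | no | 0.023 | 11.46 | 0.62 | 0.24 | 0.34 |
| jobtitle | gpt-3.5-turbo-0125 | 14 | yes | 0.021 | 18.73 | 0.75 | 0.8 | 0.77 |
| jobtitle | gpt-3.5-turbo-0125 | 14 | no | 0.014 | 20.71 | 0.45 | 0.6 | 0.51 |
| jobtitle | gpt-3.5-turbo-0125 | 8 | yes | 0.013 | 14.51 | 0.69 | 0.92 | 0.79 |
| jobtitle | gpt-3.5-turbo-0125 | 8 | no | 0.008 | 14.51 | 0.58 | 0.61 | 0.59 |
| jobtitle | gpt-3.5-turbo-0125 | 2 | yes | 0.013 | 14.52 | 0.86 | 0.9 | 0.88 |
| jobtitle | gpt-3.5-turbo-0125 | 2 | no | 0.012 | 14.56 | 0.57 | 0.19 | 0.29 |
| companyname | gpt-4-1106-preview | 14 | yes | 0.432 | 22.95 | 0.9 | 1.0 | 0.95 |
| companyname | gpt-4-1106-preview | 14 | no | 0.287 | 25.03 | 0.52 | 0.61 | 0.56 |
| companyname | gpt-4-1106-preview | 8 | yes | 0.277 | 22.14 | 0.95 | 1.0 | 0.97 |
| companyname | gpt-4-1106-preview | 8 | no | 0.191 | 29.61 | 0.5 | 0.61 | 0.55 |
| companyname | gpt-4-1106-preview | 2 | yes | 0.114 | 26.14 | 0.86 | 1.0 | 0.92 |
| companyname | gpt-4-1106-preview | 2 | no | 0.085 | 19.79 | 0.52 | 0.61 | 0.56 |
| companyname | gpt-4-0125-preview | 14 | yes | 0.432 | 21.87 | 0.95 | 1.0 | 0.97 |
| companyname | gpt-4-0125-preview | 14 | no | 0.287 | 29.41 | 0.54 | 0.72 | 0.62 |
| companyname | gpt-4-0125-preview | 8 | yes | 0.277 | 23.74 | 0.9 | 1.0 | 0.95 |
| companyname | gpt-4-0125-preview | 8 | no | 0.191 | 19.74 | 0.5 | 0.61 | 0.55 |
| companyname | gpt-4-0125-preview | 2 | yes | 0.114 | 19.3 | 0.52 | 0.67 | 0.59 |
| companyname | gpt-4-0125-preview | 2 | no | 0.085 | 17.4 | 0.54 | 0.72 | 0.62 |
| companyname | gpt-3.5-turbo-16k-0613 | 14 | yes | 0.129 | 12.9 | 0.89 | 0.94 | 0.92 |
| companyname | gpt-3.5-turbo-16k-0613 | 14 | no | 0.086 | 10.31 | 0.6 | 0.67 | 0.63 |
| companyname | gpt-3.5-turbo-16k-0613 | 8 | yes | 0.083 | 10.43 | 1.0 | 0.94 | 0.97 |
| companyname | gpt-3.5-turbo-16k-0613 | 8 | no | 0.057 | 11.15 | 0.65 | 0.61 | 0.63 |
| companyname | gpt-3.5-turbo-16k-0613 | 2 | yes | 0.034 | 9.38 | 0.78 | 1.0 | 0.88 |
| companyname | gpt-3.5-turbo-16k-0613 | 2 | no | 0.025 | 9.52 | 0.56 | 0.78 | 0.65 |
| companyname | gpt-3.5-turbo-1106 | 14 | yes | 0.043 | 13.61 | 0.75 | 1.0 | 0.86 |
| companyname | gpt-3.5-turbo-1106 | 14 | no | 0.029 | 15.51 | 0.54 | 0.78 | 0.64 |
| companyname | gpt-3.5-turbo-1106 | 8 | yes | 0.028 | 12.93 | 0.78 | 1.0 | 0.88 |
| companyname | gpt-3.5-turbo-1106 | 8 | no | 0.019 | 12.04 | 0.55 | 0.67 | 0.6 |
| companyname | gpt-3.5-turbo-1106 | 2 | yes | 0.011 | 10.59 | 0.64 | 1.0 | 0.78 |
| companyname | gpt-3.5-turbo-1106 | 2 | no | 0.009 | 10.81 | 0.5 | 0.67 | 0.57 |
| companyname | gpt-3.5-turbo-0125 | 14 | yes | 0.022 | 19.13 | 0.9 | 1.0 | 0.95 |
| companyname | gpt-3.5-turbo-0125 | 14 | no | 0.014 | 14.31 | 0.57 | 0.67 | 0.62 |
| companyname | gpt-3.5-turbo-0125 | 8 | yes | 0.014 | 12.94 | 0.78 | 1.0 | 0.88 |
| companyname | gpt-3.5-turbo-0125 | 8 | no | 0.01 | 11.93 | 0.63 | 0.67 | 0.65 |
| companyname | gpt-3.5-turbo-0125 | 2 | yes | 0.006 | 11.13 | 0.72 | 1.0 | 0.84 |
| companyname | gpt-3.5-turbo-0125 | 2 | no | 0.004 | 10.28 | 0.59 | 0.94 | 0.72 |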