# Distribution

Distributions are sub-resources of Features and are used to define how data is distributed across multiple examples when test data is generated.

Each Distribution is a JSON object with a type attribute that defines the distribution family, plus distribution-specific attributes, which are outlined below.

# Gaussian

Sample from a Gaussian (normal) distribution.

{"type": "GAUSSIAN", "mu": 10, "sigma": 5}

# Attributes

  • mu: Mean of distribution
  • sigma: Standard deviation of distribution
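
How the generator draws values internally is not specified here. As a rough sketch, assuming mu and sigma carry their usual meaning, a spec like the one above could be sampled with Python's standard random module. The helper name sample_gaussian and its n and seed parameters are hypothetical and not part of the API.

```python
import random

def sample_gaussian(spec, n, seed=None):
    """Draw n values from a GAUSSIAN spec (hypothetical helper, not the real generator)."""
    if spec["type"] != "GAUSSIAN":
        raise ValueError("expected a GAUSSIAN distribution")
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    return [rng.gauss(spec["mu"], spec["sigma"]) for _ in range(n)]

gaussian_spec = {"type": "GAUSSIAN", "mu": 10, "sigma": 5}
values = sample_gaussian(gaussian_spec, n=10_000, seed=0)

# With many samples, the sample mean should land close to mu.
sample_mean = sum(values) / len(values)
```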

# Mixture

Sample from a mixture of multiple Gaussian (normal) distributions.

{
  "type": "MIXTURE",
  "components": [
    {"mu": 1, "sigma": 3, "weight": 1},
    {"mu": 8, "sigma": 1.2, "weight": 2},
    {"mu": 15, "sigma": 5, "weight": 1}
  ]
}

# Attributes

  • components: List of JSON objects defining each mixture component
    • mu: Mean of distribution
    • sigma: Standard deviation of distribution
    • weight: Relative weight factor determining the share of rows generated for this mixture component
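
To illustrate how weights might translate into rows, the sketch below assumes that weights are relative (each component's share is its weight divided by the sum of all weights) and that a component is picked at random for each row. The actual generator may allocate rows differently, for example deterministically. The helper names component_shares and sample_mixture are hypothetical.

```python
import random

def component_shares(spec):
    """Return each component's expected share of rows, assuming relative weights."""
    weights = [c["weight"] for c in spec["components"]]
    total = sum(weights)
    return [w / total for w in weights]

def sample_mixture(spec, n, seed=None):
    """Draw n values from a MIXTURE spec (hypothetical helper)."""
    rng = random.Random(seed)
    components = spec["components"]
    weights = [c["weight"] for c in components]
    # Pick one component per row in proportion to its weight,
    # then draw from that component's Gaussian.
    chosen = rng.choices(components, weights=weights, k=n)
    return [rng.gauss(c["mu"], c["sigma"]) for c in chosen]

mixture_spec = {
    "type": "MIXTURE",
    "components": [
        {"mu": 1, "sigma": 3, "weight": 1},
        {"mu": 8, "sigma": 1.2, "weight": 2},
        {"mu": 15, "sigma": 5, "weight": 1},
    ],
}

shares = component_shares(mixture_spec)
rows = sample_mixture(mixture_spec, n=1000, seed=0)
```

Under this assumption, the weights 1, 2, 1 in the example give the middle component half of the rows and the other two a quarter each.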

# Uniform

Sample from a uniform distribution (i.e., all values have equal probability).

{"type": "UNIFORM", "min": 5, "max": 10}

# Attributes

  • min: Minimum value (inclusive)
  • max: Maximum value (inclusive)
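
As a minimal sketch, assuming the distribution yields continuous (floating-point) values, a UNIFORM spec could be sampled as follows. The helper name sample_uniform is hypothetical. Note that Python's random.uniform may or may not return the upper end-point because of floating-point rounding, so integer-valued features would more likely call for something like random.randint, which includes both bounds.

```python
import random

def sample_uniform(spec, n, seed=None):
    """Draw n floats from a UNIFORM spec (hypothetical helper)."""
    low, high = spec["min"], spec["max"]
    if low > high:
        raise ValueError("min must not exceed max")
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]

uniform_spec = {"type": "UNIFORM", "min": 5, "max": 10}
uniform_values = sample_uniform(uniform_spec, n=1000, seed=0)
```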

# Constant

Return a constant value for every case.

{"type": "CONSTANT", "value": 5}

# Categorical

Sample from a set of discrete values, each with an associated probability.

{
  "type": "CATEGORICAL",
  "categories": [
    {"value": "True", "probability": 0.7},
    {"value": "False", "probability": 0.3}
  ]
}

# Attributes

  • categories: List of JSON objects defining categories
    • value: Value returned
    • probability: Probability, in the range [0, 1], of this value being returned

Note that the probabilities of all categories must sum to 1; otherwise the distribution is invalid.
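
The sum-to-1 rule can be checked on the client side before a spec is submitted. The sketch below is one way to do that. The helper names validate_categorical and sample_categorical are hypothetical, and the tolerance used for the floating-point comparison is an assumption; the service may apply a different check.

```python
import math
import random

def validate_categorical(spec, tol=1e-9):
    """Raise ValueError if a CATEGORICAL spec breaks the probability rules."""
    probs = [c["probability"] for c in spec["categories"]]
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError("each probability must lie in [0, 1]")
    # Compare with a tolerance because float sums are rarely exact.
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError(f"probabilities sum to {sum(probs)}, expected 1")

def sample_categorical(spec, n, seed=None):
    """Draw n values from a validated CATEGORICAL spec (hypothetical helper)."""
    validate_categorical(spec)
    rng = random.Random(seed)
    values = [c["value"] for c in spec["categories"]]
    probs = [c["probability"] for c in spec["categories"]]
    return rng.choices(values, weights=probs, k=n)

categorical_spec = {
    "type": "CATEGORICAL",
    "categories": [
        {"value": "True", "probability": 0.7},
        {"value": "False", "probability": 0.3},
    ],
}

draws = sample_categorical(categorical_spec, n=1000, seed=0)
```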