About DataHub Properties
DataHub Custom Properties and Structured Properties are powerful tools for collecting meaningful metadata about Assets that might not fit neatly into other Aspects within DataHub, such as Glossary Terms or Tags. Both types can be found in an Asset's Properties tab.
This guide will explain the differences and use cases of each property type.
What are Custom Properties and Structured Properties?
Here are the differences between the two property types at a glance:
Custom Properties | Structured Properties |
---|---|
Map of key-value pairs stored as strings | Validated namespaces and data types |
Added to assets during ingestion and via API | Defined via YAML; created and added to assets via CLI |
No support for UI-based edits | Support for UI-based edits |
Custom Properties are key-value pairs of strings that capture additional information about assets that is not readily available in standard metadata fields. They can be added to assets automatically during ingestion or programmatically via API, but they cannot be edited via the UI.
Example of Custom Properties assigned to a Dataset
Structured Properties are an extension of Custom Properties, providing a structured and validated way to attach metadata to DataHub Assets. Available as of v0.13.1, Structured Properties have a pre-defined type (Date, Integer, URN, String, etc.). They can be configured to only accept a specific set of allowed values, making it easier to ensure high levels of data quality and consistency. Structured Properties are defined via YAML, added to assets via CLI, and can be edited via the UI.
Example of Structured Properties assigned to a Dataset
Use Cases for Custom Properties and Structured Properties
Custom Properties are useful for capturing raw metadata from source systems during ingestion or programmatically via API. Some examples include:
- GitHub file location of code which generated a dataset
- Data encoding type
- Account ID, cluster size, and region where a dataset is stored
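
Because Custom Properties are stored as a map of string keys to string values, source metadata of other types has to be converted to strings before it is attached. The following is a minimal sketch of that conversion for the examples above. The helper `to_custom_properties` and every key and value in `raw_metadata` are hypothetical and are not part of any DataHub API.

```python
from typing import Any, Dict


def to_custom_properties(raw: Dict[str, Any]) -> Dict[str, str]:
    """Coerce every key and value to str, skipping entries whose value is None."""
    return {str(key): str(value) for key, value in raw.items() if value is not None}


# Hypothetical raw metadata collected from a source system.
raw_metadata = {
    "github_file_location": "https://github.com/example-org/example-repo/blob/main/jobs/build_orders.py",
    "encoding": "utf-8",
    "account_id": 123456789012,  # integer in the source system
    "cluster_size": 8,           # integer in the source system
    "region": "us-west-2",
}

custom_properties = to_custom_properties(raw_metadata)
print(custom_properties)
```

Note that once the values are strings, nothing enforces their format or type, which is the gap Structured Properties are meant to close.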
Structured Properties are useful for setting and enforcing standards of metadata collection, particularly in support of compliance and governance initiatives. Values can be added programmatically via API and then adjusted manually via the DataHub UI as necessary. Some examples include:
- Deprecation Date
  - Type: Date, Single Select
  - Validation: Must be formatted as 'YYYY-MM-DD'
- Data Retention Period
  - Type: String, Single Select
  - Validation: Adheres to allowed values "30 Days", "90 Days", "365 Days", or "Indefinite"
- Consulted Compliance Officer, chosen from a list of DataHub users
  - Type: DataHub User, Multi-Select
  - Validation: Must be valid DataHub User URN
By using Structured Properties, compliance and governance officers can ensure consistency in data collection across assets.
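
To make the three validation rules above concrete, here is a rough sketch of equivalent checks in plain Python. It only illustrates the rules and is not how DataHub implements validation. The function names are hypothetical, and the URN check assumes the `urn:li:corpuser:<id>` form and tests only the shape of the string, not whether the user exists.

```python
import re
from datetime import datetime
from typing import List

# Allowed values taken from the Data Retention Period example.
ALLOWED_RETENTION_PERIODS = {"30 Days", "90 Days", "365 Days", "Indefinite"}

# Assumed shape of a DataHub User URN: urn:li:corpuser:<id>
USER_URN_PATTERN = re.compile(r"urn:li:corpuser:.+")

# Strict zero-padded YYYY-MM-DD shape.
DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_deprecation_date(value: str) -> bool:
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if DATE_SHAPE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # right shape, but not a real date (for example, February 30)
    return True


def is_valid_retention_period(value: str) -> bool:
    """True if value is one of the allowed retention periods."""
    return value in ALLOWED_RETENTION_PERIODS


def are_valid_compliance_officers(values: List[str]) -> bool:
    """True if at least one value is given and each one looks like a user URN."""
    return len(values) > 0 and all(
        USER_URN_PATTERN.fullmatch(v) is not None for v in values
    )


print(is_valid_deprecation_date("2025-12-31"))
print(is_valid_retention_period("60 Days"))
print(are_valid_compliance_officers(["urn:li:corpuser:jdoe"]))
```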
Creating, Assigning, and Editing Structured Properties
Structured Properties are defined via YAML, then created and assigned to DataHub Assets via the DataHub CLI.
Here's how we would define the above examples in YAML:
- Deprecation Date
- Data Retention Period
- Consulted Compliance Officer(s)
- id: deprecation_date
  qualified_name: deprecation_date
  type: date # Supported types: date, string, number, urn, rich_text
  cardinality: SINGLE # Supported options: SINGLE, MULTIPLE
  display_name: Deprecation Date
  description: "Scheduled date when resource will be deprecated in the source system"
  entity_types: # Define which types of DataHub Assets the Property can be assigned to
    - dataset

- id: retention_period
  qualified_name: retention_period
  type: string # Supported types: date, string, number, urn, rich_text
  cardinality: SINGLE # Supported options: SINGLE, MULTIPLE
  display_name: Data Retention Period
  description: "Predetermined storage duration before being deleted or archived
    based on legal, regulatory, or organizational requirements"
  entity_types: # Define which types of DataHub Assets the Property can be assigned to
    - dataset
  allowed_values:
    - value: "30 Days"
      description: "Use this for datasets that are ephemeral and contain PII"
    - value: "90 Days"
      description: "Use this for datasets that drive monthly reporting but contain PII"
    - value: "365 Days"
      description: "Use this for non-sensitive data that can be retained for longer"
    - value: "Indefinite"
      description: "Use this for non-sensitive data that can be retained indefinitely"

- id: compliance_officer
  qualified_name: compliance_officer
  type: urn # Supported types: date, string, number, urn, rich_text
  cardinality: MULTIPLE # Supported options: SINGLE, MULTIPLE
  display_name: Consulted Compliance Officer(s)
  description: "Member(s) of the Compliance Team consulted/informed during audit"
  type_qualifier: # Define the type of Asset URNs to allow
    - corpuser
    - corpGroup
  entity_types: # Define which types of DataHub Assets the Property can be assigned to
    - dataset
To learn more about creating and assigning Structured Properties via CLI, please see the Create Structured Properties tutorial.
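
Before handing a definitions file to the CLI, a quick local check can catch typos in the `type` and `cardinality` fields. The sketch below works on a definition that has already been loaded into a Python dictionary. The function `check_definition` is hypothetical, and it assumes both fields are set explicitly, as they are in the YAML examples above. The supported values are taken from the comments in those examples.

```python
from typing import Any, Dict, List

# Options listed in the comments of the YAML examples.
SUPPORTED_TYPES = {"date", "string", "number", "urn", "rich_text"}
SUPPORTED_CARDINALITIES = {"SINGLE", "MULTIPLE"}


def check_definition(definition: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    if definition.get("type") not in SUPPORTED_TYPES:
        problems.append(f"unsupported type: {definition.get('type')!r}")
    if definition.get("cardinality") not in SUPPORTED_CARDINALITIES:
        problems.append(f"unsupported cardinality: {definition.get('cardinality')!r}")
    for index, entry in enumerate(definition.get("allowed_values", [])):
        if "value" not in entry:
            problems.append(f"allowed_values[{index}] has no 'value' key")
    return problems


# Dictionary form of the Data Retention Period definition (descriptions omitted).
retention_period = {
    "id": "retention_period",
    "qualified_name": "retention_period",
    "type": "string",
    "cardinality": "SINGLE",
    "display_name": "Data Retention Period",
    "entity_types": ["dataset"],
    "allowed_values": [
        {"value": "30 Days"},
        {"value": "90 Days"},
        {"value": "365 Days"},
        {"value": "Indefinite"},
    ],
}

print(check_definition(retention_period))
```

This check covers only the fields shown in this guide. The CLI and the DataHub server remain the authority on what counts as a valid definition.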
Once a Structured Property is assigned to an Asset, Users with the Edit Properties Metadata Privilege will be able to change Structured Property values via the DataHub UI.
Example of editing the value of a Structured Property via the UI
Videos
Deep Dive: UI-Editable Properties
API
Please see the following API guides related to Custom and Structured Properties:
FAQ and Troubleshooting
Why can't I edit the value of a Structured Property from the DataHub UI?
- Your version of DataHub does not support UI-based edits of Structured Properties. Confirm you are running DataHub v0.13.1 or later.
- You are attempting to edit a Custom Property, not a Structured Property. Confirm you are trying to edit a Structured Property, which will have an "Edit" button visible. Please note that Custom Properties are not eligible for UI-based edits to minimize overwrites during recurring ingestion.
- You do not have the necessary privileges. Confirm with your Admin that you have the Edit Properties Metadata Privilege.