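PlyDB CLI skill
The plydb CLI queries across heterogeneous data sources.
Dependencies
The plydb binary must be available on the system.
If it is not, install it by following the PlyDB installation instructions.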
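Before running any of the commands below, it can help to confirm that the binary is actually on `PATH`. The following is a minimal sketch using only the POSIX `command -v` builtin; the function name `have_plydb` is a hypothetical helper, not part of PlyDB.

```sh
# Hypothetical helper (not part of PlyDB): succeed only if an
# executable named plydb can be resolved by the shell.
have_plydb() {
  command -v plydb >/dev/null 2>&1
}

if have_plydb; then
  echo "plydb found at: $(command -v plydb)"
else
  echo "plydb not found on PATH; install it before continuing" >&2
fi
```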
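Instructions
Configure data sources
First, configure the data sources to make available to PlyDB in a config file, following the specification in references\config_schema.md.
Query with SQL
Once you have a data source config file, PlyDB can query across all of the configured data sources. Use fully qualified table names: catalog.schema.table.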
```sh
# Quote the SQL so the shell passes it as one argument and does not expand "*".
plydb query \
  --config path/to/config/file/config.json \
  "SELECT * FROM customers.default.customers c
   JOIN orders.default.orders o
   ON c.id = o.customer_id"
```
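Longer queries are easier to pass without quoting mistakes when the SQL is read from standard input. The sketch below wraps the same `plydb query --config ... "<sql>"` form shown above; `plydb_sql` is a hypothetical helper name, and the wrapper assumes the SQL is accepted as a single positional argument, as in the example.

```sh
# Hypothetical helper (not part of PlyDB): read one SQL statement from
# stdin and pass it to plydb as a single, safely quoted argument.
plydb_sql() {
  config=$1
  sql=$(cat)   # whole statement from stdin; trailing newlines are dropped
  plydb query --config "$config" "$sql"
}

# Example call, with a quoted heredoc so the shell expands nothing:
#
#   plydb_sql path/to/config/file/config.json <<'SQL'
#   SELECT c.id, o.customer_id
#   FROM customers.default.customers c
#   JOIN orders.default.orders o ON c.id = o.customer_id
#   SQL
```

The quoted heredoc delimiter (`'SQL'`) keeps characters such as `*` and `$` in the query from being interpreted by the shell.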
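Fetching semantic context of the data
To give you the context needed to understand the domain and write correct SQL, PlyDB can build semantic context from database COMMENT metadata, column types, and foreign keys, and return it as structured YAML that follows the Open Semantic Interchange (OSI) specification.
```sh
plydb semantic-context --config path/to/config/file/config.json
```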
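If the context is consulted repeatedly during a session, it may be convenient to keep a copy on disk. The sketch below assumes that `plydb semantic-context` writes the YAML to standard output, which the text above implies but does not state; `save_semantic_context` is a hypothetical helper name.

```sh
# Hypothetical helper (not part of PlyDB): write the semantic context to
# OUT_PATH, replacing the file only when plydb exits successfully.
# Assumes the YAML is printed on stdout.
save_semantic_context() {
  config=$1
  out=$2
  tmp="$out.tmp.$$"
  if plydb semantic-context --config "$config" >"$tmp"; then
    mv "$tmp" "$out"
  else
    status=$?        # exit status of the failed plydb call
    rm -f "$tmp"
    return "$status"
  fi
}

# Example call:
#   save_semantic_context path/to/config/file/config.json context.yaml
```

Writing to a temporary file first means a failed run leaves any previously saved context untouched.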
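Enriching auto-scanned context with overlays
When the database lacks comments, or you need to add relationships and metrics not captured from source metadata, use --semantic-context-overlay to supply one or more OSI YAML files that are merged on top of the auto-scanned model:
```sh
plydb semantic-context \
  --config path/to/config/file/config.json \
  --semantic-context-overlay path/to/overlay.yaml
```
The flag is repeatable; overlays are applied in the order given:
```sh
plydb semantic-context \
  --config path/to/config/file/config.json \
  --semantic-context-overlay base_overlay.yaml \
  --semantic-context-overlay team_overlay.yaml
```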
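When the list of overlays is held in a script rather than typed by hand, the repeated flag can be generated from the file names while preserving their order. This is a minimal POSIX sh sketch; `semantic_context_with_overlays` is a hypothetical helper name, and it uses only the `--config` and `--semantic-context-overlay` flags shown above.

```sh
# Hypothetical helper (not part of PlyDB).
# Usage: semantic_context_with_overlays CONFIG [OVERLAY...]
semantic_context_with_overlays() {
  config=$1
  shift
  # Rewrite the argument list in place: drop each overlay path from the
  # front and append it again preceded by the flag, keeping the order.
  for overlay in "$@"; do
    shift
    set -- "$@" --semantic-context-overlay "$overlay"
  done
  plydb semantic-context --config "$config" "$@"
}

# Example call (overlays are applied in the order listed):
#   semantic_context_with_overlays path/to/config/file/config.json \
#     base_overlay.yaml team_overlay.yaml
```

Rebuilding `"$@"` rather than concatenating a string keeps paths that contain spaces intact.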
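Overlay files must be valid Open Semantic Interchange (OSI) YAML.
Overlays can add descriptions to existing datasets and fields, define relationships between existing datasets, and add or update metrics. They cannot introduce new datasets or fields; they only enrich what the auto-scanner has already discovered.
Good times to create or edit an overlay file are when you encounter a new dataset or after a data analysis session with the user. Use them to distill what you learned about the data's semantics and record it in an overlay file for future sessions. Ask the user first.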
Embedding overlays in the config file
Overlays can also be specified in the config file under semantic_context.overlays instead of (or in addition to) the CLI flag:
```json
{
  "databases": { ... },
  "semantic_context": {
    "overlays": [
      "path/to/base_overlay.yaml",
      "path/to/team_overlay.yaml"
    ]
  }
}
```
With overlays in the config, no extra flags are needed:
```sh
plydb semantic-context --config path/to/config.json
```
Config-file overlays are applied before any --semantic-context-overlay flags.
Troubleshooting