Document Extraction
This skill allows users to extract and recognize text from documents, including PDF and DOCX files, using an external GITEE AI API.
Usage
Ensure you have installed the required dependencies (pip install requests requests-toolbelt). Use the bundled script to perform document extraction.
```bash
python {baseDir}/scripts/performdocextraction.py --file /path/to/document.pdf --api-key YOUR_API
```
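When the bundled script is driven from another Python program rather than a shell, the invocation can be assembled as an argument list. The following is a minimal sketch, not part of the skill itself: `build_command` is a hypothetical helper, `base_dir` stands in for the `{baseDir}` placeholder, and the `--file` and `--api-key` flags are the ones this document uses. Leaving out `--api-key` when no key is passed is an assumption.

```python
import sys
from pathlib import Path
from typing import List, Optional


def build_command(base_dir: str, file_path: str, api_key: Optional[str] = None) -> List[str]:
    """Assemble the argument list for the bundled extraction script (hypothetical helper)."""
    script = Path(base_dir) / "scripts" / "performdocextraction.py"
    cmd = [sys.executable, str(script), "--file", file_path]
    if api_key:
        # Assumption: the flag is simply left out when no key is supplied.
        cmd += ["--api-key", api_key]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(build_command("/path/to/skill", "/path/to/document.pdf", "YOUR_API"))
```

Passing a list instead of a single shell string avoids quoting problems when the document path contains spaces.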
Options
No additional parameters are required for this skill.
Workflow
1. Execute the performdocextraction.py script with the parameters provided by the user.
2. Parse the script output and find the line that starts with the EXTRACTION_RESULT: prefix.
3. Extract the OCR result from that line (format: EXTRACTION_RESULT: ...).
4. Display the OCR result to the user using markdown syntax: 📖[EXTRACTION_RESULT Result].
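The parsing steps above can be sketched as a small function. This is an illustration under stated assumptions, not the skill's own code: `extract_result` is a hypothetical name, the result is assumed to fit on a single line, and only the first matching line is used.

```python
from typing import Optional

RESULT_PREFIX = "EXTRACTION_RESULT:"


def extract_result(script_output: str) -> Optional[str]:
    """Return the text after the first EXTRACTION_RESULT: prefix, or None if absent."""
    for line in script_output.splitlines():
        if line.startswith(RESULT_PREFIX):
            # Drop the prefix and any surrounding whitespace.
            return line[len(RESULT_PREFIX):].strip()
    return None


if __name__ == "__main__":
    sample = "Uploading document...\nEXTRACTION_RESULT: Hello, world\nDone.\n"
    print(extract_result(sample))  # prints "Hello, world"
```

Returning `None` when no matching line exists lets the caller distinguish a failed run from an empty result.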
Notes
- If GITEEAIAPIKEY is not set, remind the user to provide the --api-key argument.
- Please handle the output of the script carefully, ensuring that you only extract and display the relevant information without adding any extra commentary or interpretation.
- You should optimize the output format to make it more concise and user-friendly, but do not change or ignore the content of the result.
- The script prints an EXTRACTION_RESULT: line in its output. Extract this result and display it using the markdown syntax 📖[EXTRACTION_RESULT Result].
- Always look for the line starting with EXTRACTION_RESULT: in the script output.
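The API key note can be sketched as a small pre-flight check. This is a hedged illustration: it assumes GITEEAIAPIKEY is an environment variable (the document does not say where the value comes from), and both `resolve_api_key` and the reminder wording are hypothetical.

```python
import os
from typing import Mapping, Optional

# Name as written in this document; assumed to be an environment variable.
ENV_VAR = "GITEEAIAPIKEY"


def resolve_api_key(cli_key: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the --api-key value if given, else the environment value, else None."""
    # Assumption: an explicit --api-key value takes precedence over the environment.
    return cli_key or env.get(ENV_VAR) or None


if __name__ == "__main__":
    key = resolve_api_key(None)
    if key is None:
        print("No API key found. Please provide one with the --api-key argument.")
```

Treating an empty string the same as a missing value means a blank variable also triggers the reminder.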
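Document Extraction
This skill allows users to extract and recognize text from documents, including PDF and DOCX files, using the external GITEE AI API.
Usage
Ensure the required dependencies are installed (pip install requests requests-toolbelt). Use the bundled script to perform document extraction.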
```bash
python {baseDir}/scripts/performdocextraction.py --file /path/to/document.pdf --api-key YOUR_API
```
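Options
This skill requires no additional parameters.
Workflow
1. Execute the performdocextraction.py script with the parameters provided by the user.
2. Parse the script output and find the line that starts with the EXTRACTION_RESULT: prefix.
3. Extract the OCR result from that line (format: EXTRACTION_RESULT: ...).
4. Display the OCR result to the user using markdown syntax: 📖[EXTRACTION_RESULT Result].
Notes
- If GITEEAIAPIKEY is empty, remind the user to provide the --api-key argument.
- Handle the script output carefully: extract and display only the relevant information, without adding any extra commentary or interpretation.
- Optimize the output format to make it more concise and readable, but do not change or omit any content of the result.
- The script output contains an EXTRACTION_RESULT: line. Extract this result and display it using the markdown syntax 📖[EXTRACTION_RESULT Result].
- Always look for the line starting with EXTRACTION_RESULT: in the script output.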