willsonlincake posted on 2022-4-7 20:52:48

PyTesseract Explained

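from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'
Sets the absolute path to the Tesseract executable (only needed if tesseract is not on your PATH).
print(pytesseract.get_languages(config=''))
Lists the language packs installed for Tesseract.
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
Returns the recognized text for a French image. lang can be any language whose language pack is installed.
print(pytesseract.get_tesseract_version())
Returns the installed Tesseract version.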
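
If you are not sure where the executable lives, here is a minimal sketch that looks it up on PATH before setting tesseract_cmd. The helper names find_tesseract and configure_pytesseract are my own, not part of pytesseract, and the sketch assumes the binary is called "tesseract" on your system.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the located executable and return its version."""
    # Imported here so the lookup helper also works without pytesseract installed.
    import pytesseract

    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError("tesseract executable not found on PATH")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return pytesseract.get_tesseract_version()
```

Calling configure_pytesseract() once at startup should be enough. If the lookup fails, fall back to setting the full path by hand as shown above.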

willsonlincake posted on 2022-4-7 20:54:06

To recognize several languages at once, the lang argument above can also be a combination such as lang='eng+fra'.
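
A small sketch of building that combined string from a list of language codes. build_lang is a hypothetical helper of mine, not a pytesseract function. It only joins the codes with "+" and does not check whether the language packs are actually installed.

```python
def build_lang(codes):
    """Join Tesseract language codes into the 'eng+fra' form."""
    cleaned = [code.strip() for code in codes if code and code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


lang = build_lang(["eng", "fra"])
print(lang)  # eng+fra

# Usage with pytesseract (needs Tesseract plus both language packs):
# text = pytesseract.image_to_string(Image.open('test-european.jpg'), lang=lang)
```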

willsonlincake posted on 2022-4-7 20:55:35

image_to_alto_xml() returns the recognition result in Tesseract's ALTO XML format.
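
To get the words back out of that XML, something like the following should work. This is only a sketch: alto_words is my own helper, and SAMPLE_ALTO is a hand-written fragment in the ALTO layout (String elements with a CONTENT attribute), not real Tesseract output. As far as I know, image_to_alto_xml() needs Tesseract 4.1.0 or newer and returns bytes.

```python
import xml.etree.ElementTree as ET


def alto_words(alto_xml):
    """Collect the CONTENT attribute of every String element in an ALTO document."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        # Drop the '{namespace}' prefix so the check works for any ALTO version.
        if element.tag.rsplit("}", 1)[-1] == "String":
            words.append(element.get("CONTENT", ""))
    return words


# Hand-written sample for illustration only.
SAMPLE_ALTO = b"""<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello" HPOS="10" VPOS="10" WIDTH="50" HEIGHT="12"/>
            <SP/>
            <String CONTENT="World" HPOS="70" VPOS="10" WIDTH="52" HEIGHT="12"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

print(alto_words(SAMPLE_ALTO))  # ['Hello', 'World']

# With a real image (needs Tesseract installed):
# xml_bytes = pytesseract.image_to_alto_xml(Image.open('test-european.jpg'))
# print(alto_words(xml_bytes))
```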

willsonlincake posted on 2022-4-7 20:57:18

OpenCV images (NumPy arrays) are supported as well:
import cv2
import pytesseract

img_cv = cv2.imread(r'/<path_to_image>/digits.png')

# By default OpenCV stores images in BGR format and since pytesseract assumes RGB format,
# we need to convert from BGR to RGB format/mode:
img_rgb = cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB)
print(pytesseract.image_to_string(img_rgb))