Abstract: The paper analyzes modern methods and tools for extracting text from documents in DOCX, PPTX, and PDF formats, as well as from images containing text, which require OCR ...