Python Script To Convert A Docx File To Text File


In this article, we will learn how to convert a docx file into plain text and then save the content to a .txt file.

For the conversion, we are going to use a third-party package named docx2txt.

The docx2txt package on PyPI is a pure-Python utility that extracts plain text from Microsoft .docx documents, including text found in headers, footers, and hyperlinks. It can also extract the images embedded in a document.

It should not be confused with the older Perl tool of the same name, which consists of a core Perl script, Unix/Windows shell wrappers, and a configuration file. Because the Python package has no such external dependencies, it is platform independent and can conveniently be used to build a web-based docx conversion service.

First, install the package on your machine using pip or another package manager.

pip install docx2txt

Convert A Docx File To Text

import docx2txt

# Replace the path below with the location of your .docx file
MY_TEXT = docx2txt.process("test.docx")

print(MY_TEXT)

Note that test.docx is a test file on my desktop that contains just one line, for the sake of this tutorial.

Running this script prints the content of the file in the terminal.

Output

A line from my awesome docx file
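
If you are curious what such a conversion involves, a .docx file is a ZIP archive whose main text lives in an XML part named word/document.xml. The snippet below is a rough standard-library sketch of that idea, not docx2txt's actual implementation: the function name docx_body_text is made up for this example, and it only reads body paragraphs, so headers, footers, and images are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_body_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Hypothetical helper for illustration only.
    """
    # A .docx file is a ZIP archive; the body is stored as XML inside it
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each w:t element holds one run of text within the paragraph
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling docx_body_text("test.docx") returns a string that you can print just like MY_TEXT above. For real documents, docx2txt remains the more convenient choice.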

Convert A Docx File To Text File

import docx2txt

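# Replace the path below with the location of your .docx file
MY_TEXT = docx2txt.process("test.docx")

# An explicit encoding avoids errors with non-ASCII characters on some platforms
with open("Output.txt", "w", encoding="utf-8") as text_file:
    print(MY_TEXT, file=text_file)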

This script converts the docx file's content into text and then writes it to a file named Output.txt. The with statement is a context manager, so the file is closed automatically once the block finishes.
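
The same idea extends to a whole folder of documents. The sketch below is one possible way to do it, under a few assumptions: the function name convert_folder and its extract parameter are invented for this example, and docx2txt.process is used as the default extractor when none is passed.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, extract=None):
    """Write a .txt file into out_dir for every .docx file in src_dir.

    Hypothetical helper: `extract` is any function that takes a file path
    and returns text. It defaults to docx2txt.process.
    """
    if extract is None:
        # Imported lazily so a custom extractor can be passed instead
        import docx2txt
        extract = docx2txt.process

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for docx_file in sorted(Path(src_dir).glob("*.docx")):
        # report.docx becomes report.txt in the output folder
        target = out_path / (docx_file.stem + ".txt")
        target.write_text(extract(str(docx_file)), encoding="utf-8")
        written.append(target)
    return written
```

For example, convert_folder("documents", "text_output") would convert every .docx file in a folder named documents and return the list of files it wrote. Both folder names are placeholders.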


Abhijeet Pal Author and Editor in Chief @djangocentral

Abhijeet is a full-stack software developer from India with a strong focus on backend and system design. He is driven by the need to create impactful solutions that add value to the internet in any way possible.
