Python Script To Convert A Docx File To Text File

In this article, we will learn how to convert a .docx file into plain text and then save the content to a .txt file.

For the conversion, we are going to use a third-party package named docx2txt.

docx2txt is a pure-Python utility for extracting text from Microsoft .docx documents. Besides the main body, it also picks up text from headers, footers and hyperlinks, and if you pass a directory as a second argument to docx2txt.process(), it saves the images embedded in the document to that directory.

It should not be confused with the older Perl tool of the same name, which is driven by Unix/Windows shell scripts and a configuration file. The package installed with pip below is the Python one.
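Under the hood, a .docx file is a ZIP archive, and the main body text lives in the XML part word/document.xml. The standard-library sketch below only illustrates that general idea; it is not docx2txt's actual source. It assumes the text sits in w:t elements inside runs (w:r) and deliberately ignores headers, footers, footnotes and images. The function name docx_to_text is our own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML "w:" tags used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_file):
    """Return the body text of a .docx (a path or a binary file object).

    Simplified sketch: only word/document.xml is read, so headers, footers,
    footnotes and images are ignored. Paragraphs are separated by newlines.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for paragraph in root.iter(W + "p"):
        parts = []
        # Runs (w:r) hold the actual characters; they may also sit inside hyperlinks
        for run in paragraph.iter(W + "r"):
            for node in run:
                if node.tag == W + "t":
                    parts.append(node.text or "")
                elif node.tag == W + "tab":
                    parts.append("\t")
                elif node.tag in (W + "br", W + "cr"):
                    parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)


# Demo: build a tiny in-memory archive containing only word/document.xml.
# A real Word file has more parts, but this sketch needs only this one.
DOCUMENT_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    "<w:p><w:r><w:t>A line from my awesome docx file</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second</w:t><w:tab/><w:t>paragraph</w:t></w:r></w:p>"
    "</w:body>"
    "</w:document>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", DOCUMENT_XML)
buffer.seek(0)

result = docx_to_text(buffer)
print(result)
```

A dedicated package such as docx2txt handles more of the format than this sketch, which is why the rest of the tutorial relies on it.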

First, install the package on your machine using pip (or any other package manager you prefer):

pip install docx2txt
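To confirm that the installation worked in the environment you will run the script from, you can ask Python for the installed version through importlib.metadata (available since Python 3.8). The helper name installed_version is our own, not part of docx2txt:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string, or a hint if pip install did not succeed here
version = installed_version("docx2txt")
print(version if version else "docx2txt is not installed in this environment")
```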

Convert A Docx File To Text

import docx2txt

# Replace the path below with the location of your .docx file
text = docx2txt.process("test.docx")
print(text)

Because the path is relative, Python looks for test.docx in the folder you run the script from. For the sake of this tutorial, test.docx is a test file on my desktop that contains just one line.

Running this script will simply print the content of the file in the terminal.

Output

A line from my awesome docx file

Convert A Docx File To Text File

import docx2txt

# Replace the path below with the location of your .docx file
text = docx2txt.process("test.docx")

# Write the text as UTF-8 so non-ASCII characters are saved correctly on every OS
with open("Output.txt", "w", encoding="utf-8") as text_file:
    print(text, file=text_file)

This script converts the .docx file's content into text and then writes it to a file named Output.txt. The with statement (Python's context manager) makes sure the file is closed once writing is done, even if an error occurs.
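If you have a whole folder of documents, the same two steps can be wrapped in a loop. The sketch below is our own helper (convert_folder is a hypothetical name, not part of docx2txt); it names each output after its input, so report.docx becomes report.txt. It only looks at the top level of the folder.

```python
from pathlib import Path


def convert_folder(folder, extract):
    """Write a .txt file next to every .docx file in folder.

    extract is any callable that takes a .docx path (str) and returns its
    text; with the package from this article that is docx2txt.process.
    Returns the list of .txt paths that were written.
    """
    written = []
    for docx_path in sorted(Path(folder).glob("*.docx")):
        # Skip the "~$name.docx" lock files Word creates while a document is open
        if docx_path.name.startswith("~$"):
            continue
        text = extract(str(docx_path))
        txt_path = docx_path.with_suffix(".txt")
        # Explicit UTF-8 so non-ASCII characters are written the same way on every OS
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written


# Usage (assumes docx2txt is installed and a folder named "documents" exists):
#   import docx2txt
#   for path in convert_folder("documents", docx2txt.process):
#       print("wrote", path)
```

Passing the extractor in as a parameter keeps the loop easy to test with a stand-in function, and lets you swap docx2txt for another converter without touching the rest of the code.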

