Reading all Files in a Directory with Python
In this Python tutorial you will learn about reading all files in a directory using Python.
If you have many files (e.g., .txt files), it is often useful to be able to read all files in a directory into Python. In this post, you will learn 1) how to list all the files in a directory with Python, and 2) how to read all the files in the directory into a list or a dictionary. Finally, you will also learn how to read all the .csv files in a directory with Python and the Pandas read_csv function.
How to Read all Files in a Folder with the Pathlib module
There are generally two steps for reading all files in a directory. First, we need to list all files in the directory:
1. List all Files in the Directory
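To get all files in a directory we can use pathlib. Note that rglob searches the directory recursively, so matching files in subdirectories are included as well: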
from pathlib import Path
txt_folder = Path('C:/PyDad/Reading/TXT/').rglob('*.txt')
files = [x for x in txt_folder]
There are other methods for listing files as well. However, using the pathlib module makes things much easier, especially if you’re working with paths across operating systems. See this excellent post about why you should use Pathlib, for more information.
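To make the listing step concrete, here is a minimal sketch that compares glob and rglob. It builds a throwaway folder with tempfile so that it runs anywhere; the file names (a.txt, b.txt, sub/c.txt, notes.md) are made up for illustration and are not from the example folder above.

```python
import tempfile
from pathlib import Path

# Build a throwaway folder so the example runs anywhere (hypothetical file names).
root = Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "a.txt").write_text("first file\n", encoding="utf-8")
(root / "b.txt").write_text("second file\n", encoding="utf-8")
(root / "sub" / "c.txt").write_text("nested file\n", encoding="utf-8")
(root / "notes.md").write_text("not a .txt file\n", encoding="utf-8")

# glob matches only the folder itself; rglob also descends into subfolders.
top_level = sorted(p.name for p in root.glob("*.txt"))
recursive = sorted(p.name for p in root.rglob("*.txt"))

print(top_level)
print(recursive)
```

If you only want the files that sit directly in the folder, glob is probably the better fit; rglob is useful when the files are spread over subfolders.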
2. Reading the Files in the Directory
To read all the files in the directory you can now use a for loop together with the open function and the readlines method:
for name in files:
    f = open(name, 'r')
    content = f.readlines()
    print('Content of %s:\n %s' % (name, content))
    f.close()
Let me explain: here you are looping through each file in the list (i.e., files). You open each file with open and read it with readlines. Then, on the next line, the code prints the content of the file. Finally, you close the file using the close method. Just printing the results, as we did above, is not convenient if you plan to use the content of all the text files you have read with Python. You will learn how to read all files into a list in the next section of this blog post.
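As an alternative to calling close yourself, the same loop can be sketched with a with statement, which closes the file automatically. This is a sketch rather than the method used above; the temporary folder and the file names one.txt and two.txt are assumptions made so that the example is self-contained.

```python
import tempfile
from pathlib import Path

# Hypothetical sample files in a temporary folder.
folder = Path(tempfile.mkdtemp())
(folder / "one.txt").write_text("alpha\nbeta\n", encoding="utf-8")
(folder / "two.txt").write_text("gamma\n", encoding="utf-8")

files = sorted(folder.glob("*.txt"))

printed = []
for path in files:
    # The with block closes the file even if reading raises an error.
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    message = f"Content of {path.name}: {lines}"
    printed.append(message)
    print(message)
```

Passing encoding explicitly is a design choice here: without it, open falls back to a platform-dependent default, which can differ between operating systems.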
Reading all Files to a Python List
Here is how to read all the files to a list using Python:
content = []
for name in files:
    f = open(name, 'r')
    # readlines()[0] keeps only the first line of each file
    content.append(f.readlines()[0])
    f.close()
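Note how you first created a Python list and then used the append method to add the content from each file to the list. Because of the [0] index, only the first line of each file is stored, and an empty file would raise an IndexError.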
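If you want the whole content of each file rather than just its first line, one possible variation uses Path.read_text in a list comprehension. The sketch below assumes a temporary folder with two made-up files, one of them empty, to show how the empty case can be handled.

```python
import tempfile
from pathlib import Path

# Hypothetical sample files; b.txt is deliberately empty.
folder = Path(tempfile.mkdtemp())
(folder / "a.txt").write_text("line one\nline two\n", encoding="utf-8")
(folder / "b.txt").write_text("", encoding="utf-8")

files = sorted(folder.glob("*.txt"))

# read_text opens, reads, and closes the file in one call.
content = [path.read_text(encoding="utf-8") for path in files]

# First line only (without the trailing newline), using "" for empty files.
first_lines = [text.splitlines()[0] if text else "" for text in content]

print(content)
print(first_lines)
```

Unlike readlines()[0], splitlines drops the trailing newline character, so the first lines come back as clean strings.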
Reading all Files to a Python Dictionary
Here’s how to read all the files in a directory to a Python dictionary:
content = {}
for name in files:
    f = open(name, 'r')
    # stem is the file name without the directory or the extension
    name = name.stem
    content[name] = f.readlines()[0]
    f.close()
Now, let me explain what we did in the code chunk above. First, a dictionary was created. Second, you used the same code as in the previous examples of reading all files in a directory with Python, except for two lines. You get the filename without the extension (or the path) by using the stem attribute. Then, before closing the file, you store the first line of the file in the dictionary. Here the file name (without the file extension) is the key.
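One thing to watch out for when the files were collected recursively: two files in different subfolders can share the same stem, and the later one then silently overwrites the earlier one in the dictionary. The sketch below illustrates this with two made-up files that are both called report.txt, and shows one possible workaround (using the relative path as the key).

```python
import tempfile
from pathlib import Path

# Hypothetical folder with two files that share the same name.
folder = Path(tempfile.mkdtemp())
(folder / "sub").mkdir()
(folder / "report.txt").write_text("top level\n", encoding="utf-8")
(folder / "sub" / "report.txt").write_text("nested\n", encoding="utf-8")

files = sorted(folder.rglob("*.txt"))

# Keyed by stem: both files are called "report", so one overwrites the other.
by_stem = {path.stem: path.read_text(encoding="utf-8") for path in files}

# Keyed by the path relative to the folder: every file keeps its own entry.
by_path = {
    path.relative_to(folder).as_posix(): path.read_text(encoding="utf-8")
    for path in files
}

print(len(by_stem), len(by_path))
```

Whether the stem or the relative path is the better key depends on your data; if all files live in a single folder, the stem is usually enough.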
Reading All .csv Files in a Directory using Pandas
In this final example, you will learn how to read all the .csv files in a folder using Python and the Pandas read_csv function:
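import pandas as pd

# files must hold the paths of the .csv files here,
# e.g. collected with rglob('*.csv') instead of rglob('*.txt')
col_names = ['word1', 'word2', 'word3', 'word4',
             'word5', 'word6', 'word7', 'word8']
dfs = [pd.read_csv(csv_file, names=col_names) for csv_file in files]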
First, you imported pandas. Next, you created a list with column names (only do this if your .csv files do not contain this information). Finally, using a Python list comprehension, you read all the files using pd.read_csv. Note that you again get a list, this time containing one DataFrame per .csv file. You can access the data from each file using list indices (e.g., dfs[0] will get you the first item in the list).
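If you would rather work with one DataFrame than with a list of them, a possible next step is to combine the results with pd.concat. The following is a minimal sketch that assumes pandas is installed; the two small .csv files, the column names x and y, and the extra source column are all made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical .csv files without a header row.
folder = Path(tempfile.mkdtemp())
(folder / "first.csv").write_text("1,2\n3,4\n", encoding="utf-8")
(folder / "second.csv").write_text("5,6\n", encoding="utf-8")

csv_files = sorted(folder.glob("*.csv"))
col_names = ["x", "y"]

# Read each file and record which file every row came from.
frames = [
    pd.read_csv(csv_file, names=col_names).assign(source=csv_file.stem)
    for csv_file in csv_files
]

# Stack all rows into a single DataFrame with a fresh 0..n-1 index.
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Keeping the file name in a source column means you can still tell the rows apart after they have been stacked.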
Summary
In this post, you have learned about reading all the files in a folder with Python. Specifically, you learned how to read and print all files, and how to add the content of the files to a list and to a dictionary. Finally, you have also learned about reading all the .csv files in a directory with Pandas.
As a final note: it’s also possible to use the glob module to list, and then read, all files in a folder in Python.
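For completeness, here is a rough sketch of what the glob-based approach might look like, using only the standard library. The temporary folder and the file names a.txt and b.txt are assumptions made so that the example can run on its own.

```python
import glob
import os
import tempfile

# Hypothetical sample files in a temporary folder.
folder = tempfile.mkdtemp()
for file_name, text in [("a.txt", "alpha\n"), ("b.txt", "beta\n")]:
    with open(os.path.join(folder, file_name), "w", encoding="utf-8") as f:
        f.write(text)

# glob.glob returns plain strings (not Path objects) in arbitrary order,
# so the result is sorted to make the order predictable.
paths = sorted(glob.glob(os.path.join(folder, "*.txt")))

content = {}
for path in paths:
    # File name without the directory or the extension, similar to Path.stem.
    key = os.path.splitext(os.path.basename(path))[0]
    with open(path, "r", encoding="utf-8") as f:
        content[key] = f.read()

print(content)
```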