If you have been programming in Python, you may have encountered the need to list the files in a directory or to iterate over the files in a directory. This can be confusing for beginners, as there are many different methods discussed online. If you are unsure which library to use, you have come to the right place. By following this tutorial to the end, you will have a clear understanding of why each library is used.
Modules to Work with Files
There are multiple ways to list files in a directory. We will discuss two of the main modules:
- OS module,
- Glob module.
In the code examples below, we are going to work with a Windows Desktop folder that contains a handful of documents and images plus one subfolder named setup.
List Files Using the OS Module
The os module offers three functions for listing the contents of a directory:
- os.listdir(),
- os.walk(),
- os.scandir().
1. os.listdir():
The os.listdir() method returns a list of the names of all the files and folders in the specified directory; if no path is given, it uses the current directory. This method does not descend into subfolders, so if you also need the contents of the subdirectories of the current directory, os.listdir() alone will not help you.
Syntax
os.listdir(path='.')
where:
- path – is the path of the directory to list (the current directory by default).
Return type
It returns a list with the names of all files and directories in the directory given in the path, in arbitrary order. A list in Python is an ordered, resizable sequence, similar to what other languages call an array.
Example:
from os import listdir
from os.path import isfile, join

mypath = r"C:\Users\Misbah Shoukat\OneDrive\Desktop"

# All files and folders in the directory
allelements = listdir(mypath)
print("Files and directories in path: ")
print(allelements)

# Files only, no folders
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
print("\nOnly files in a directory: ")
print(onlyfiles)

# Only the entries ending with a specific extension, e.g. .docx
ls = [x for x in listdir(mypath) if x.endswith(".docx")]
print("\nOnly files ending with .docx")
print(ls)
Output:
Files and directories in path: 
['desktop.ini', 'Doc1.pdf', 'Fjwu id and password.pdf', 'imag3.PNG', 'Internship letter.jpg', 'Internship letterbw.jpg', 'internship report.pdf', 'internship.docx', 'journeyy.pdf', 'scientific editor of thesis.PNG', 'setup', 'skype id pass.txt']

Only files in a directory: 
['desktop.ini', 'Doc1.pdf', 'Fjwu id and password.pdf', 'imag3.PNG', 'Internship letter.jpg', 'Internship letterbw.jpg', 'internship report.pdf', 'internship.docx', 'journeyy.pdf', 'scientific editor of thesis.PNG', 'skype id pass.txt']

Only files ending with .docx
['internship.docx']
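The example above depends on a folder that exists only on the author's machine. As a portable illustration of the same idea, here is a minimal sketch that builds a throwaway directory with tempfile and lists it. The helper names list_entries and list_files_only, and the file names, are made up for this illustration.

```python
import os
import tempfile


def list_entries(path="."):
    """Return the sorted names of everything directly inside path."""
    # os.listdir() returns names in arbitrary order, so sort for stable output.
    return sorted(os.listdir(path))


def list_files_only(path="."):
    """Return the sorted names of regular files directly inside path."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )


# Build a small throwaway tree: two files plus one subfolder holding a file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("notes.txt", "report.docx"):
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, "setup"))
    open(os.path.join(demo_dir, "setup", "installer.exe"), "w").close()

    # 'setup' is listed, but its contents are not.
    print(list_entries(demo_dir))     # ['notes.txt', 'report.docx', 'setup']
    # Only the two top-level files remain.
    print(list_files_only(demo_dir))  # ['notes.txt', 'report.docx']
```

Note that installer.exe never appears: os.listdir() only looks at the top level of the directory it is given.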
2. os.walk():
The os.walk() function in Python generates the file names in a directory tree by recursively traversing all subdirectories. When this function is called, it returns a generator that yields, for each directory it visits, a tuple containing the current path being visited, a list of subdirectories in the current path, and a list of files in the current path. The generator keeps going through the directories in the tree until there are no more subdirectories to visit.
Syntax
os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])
Refer to the official Python documentation for a description of the os.walk() parameters.
Return Type
Returns a generator that yields one 3-tuple (dirpath, dirnames, filenames) for every directory in the tree.
Example:
from os import walk

mypath = r"C:\Users\Misbah Shoukat\OneDrive\Desktop"

# The first item the generator yields is the tuple for the top directory:
# (current_path, directories in current_path, files in current_path)
tuple_walk = next(walk(mypath))
print("The tuple that os.walk returns")
print(tuple_walk)

# Collect the files of the current directory plus all its subdirectories
f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
print("\nAll the files including the files of subdirectory")
print(f)
Output:
The tuple that os.walk returns
('C:\\Users\\Misbah Shoukat\\OneDrive\\Desktop', ['setup'], ['desktop.ini', 'Doc1.pdf', 'Fjwu id and password.pdf', 'imag3.PNG', 'Internship letter.jpg', 'Internship letterbw.jpg', 'internship report.pdf', 'internship.docx', 'journeyy.pdf', 'scientific editor of thesis.PNG', 'skype id pass.txt'])

All the files including the files of subdirectory
['desktop.ini', 'Doc1.pdf', 'Fjwu id and password.pdf', 'imag3.PNG', 'Internship letter.jpg', 'Internship letterbw.jpg', 'internship report.pdf', 'internship.docx', 'journeyy.pdf', 'scientific editor of thesis.PNG', 'skype id pass.txt', 'audacity-win-3.1.3-64bit.exe', 'BraveBrowserSetup.exe', 'ChromeSetup.exe', 'GrammarlyAddInSetup.exe', 'GrammarlyInstaller.cdpERIb6vsng6thmp2a90og2.exe', 'LSBSetup.exe', 'MBSetup-8D6037DB-37335.37335.exe', 'n1fww12w.exe', 'OfficeSetup.exe', 'ProtonVPN_win_v2.0.1.exe', 'telegram-for-desktop-4-0-2.exe', 'UpworkSetup64 (1).exe', 'UpworkSetup64.exe', 'VSCodeUserSetup-x64-1.68.1.exe', 'WacomTablet_6.3.45-1.exe', 'ZoomInstallerFull.exe']
All the .exe files were in the folder named setup.
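The list above holds bare file names, so it no longer tells you which folder each file came from. One possible way to keep that information is to join each name with the dirpath that os.walk() yields alongside it. The sketch below does this and filters by extension; the helper find_files and the demo file names are hypothetical and serve only this illustration.

```python
import os
import tempfile


def find_files(top, extension=""):
    """Return sorted full paths of files under top whose names end with extension."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.lower().endswith(extension.lower()):
                # dirpath is the folder currently being visited, so joining it
                # with the bare file name gives a usable path.
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Throwaway tree: one document at the top level, one installer in a subfolder.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "setup"))
    for relative in ("internship.docx", os.path.join("setup", "ChromeSetup.exe")):
        open(os.path.join(demo_dir, relative), "w").close()

    # Print each match relative to the top folder; only the .exe file matches.
    for path in find_files(demo_dir, ".exe"):
        print(os.path.relpath(path, demo_dir))
```

Calling find_files(folder) without an extension returns every file in the tree, because every name ends with the empty string.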
3. os.scandir():
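Python 3.5 and later versions support the os.scandir() method. Since that version, os.walk() uses os.scandir() internally, which can make os.walk() 2-20 times faster, depending on the operating system and file system. Each entry that os.scandir() yields caches file-type information, so checks such as is_dir() and is_file() usually do not need an extra system call. It is recommended to use os.scandir() if you have a large number of files to process.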
Syntax
os.scandir(path='.')
Return Type
Returns an iterator of os.DirEntry objects, one for each entry in the directory.
Example:
from os import scandir

mypath = r"C:\Users\Misbah Shoukat\OneDrive\Desktop"

# List all files and directories in the specified path.
# The with statement (Python 3.6+) closes the iterator when the loop is done.
ls = []
with scandir(mypath) as obj:
    for entry in obj:
        if entry.is_dir() or entry.is_file():
            ls.append(entry.name)
print("Files and Directories: ")
print(ls)
Output:
Files and Directories: 
['desktop.ini', 'Doc1.pdf', 'Fjwu id and password.pdf', 'imag3.PNG', 'Internship letter.jpg', 'Internship letterbw.jpg', 'internship report.pdf', 'internship.docx', 'journeyy.pdf', 'scientific editor of thesis.PNG', 'setup', 'skype id pass.txt']
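The example above only reads entry.name, but an os.DirEntry object carries more than the name. As a rough sketch of what else is available, the snippet below records whether each entry is a file or a folder and, for files, the size reported by entry.stat(). The helper describe_entries and the demo file names are invented for this illustration, and using os.scandir() as a context manager assumes Python 3.6 or later.

```python
import os
import tempfile


def describe_entries(path="."):
    """Return (name, kind, size_in_bytes) tuples for the entries in path, sorted by name."""
    rows = []
    # The with statement closes the iterator as soon as the loop is done.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                rows.append((entry.name, "file", entry.stat().st_size))
            else:
                # Folder sizes are platform dependent, so no size is recorded.
                rows.append((entry.name, "dir", None))
    return sorted(rows, key=lambda row: row[0])


# Throwaway tree: one 5-byte text file and one empty subfolder.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "notes.txt"), "w") as handle:
        handle.write("hello")
    os.mkdir(os.path.join(demo_dir, "setup"))

    print(describe_entries(demo_dir))
    # [('notes.txt', 'file', 5), ('setup', 'dir', None)]
```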
List Files Using Glob Module
The glob module is used to retrieve file and path names matching a specified pattern. With glob, we can use wildcards (*, ?, and [ranges]). So if you need to retrieve only the file names that follow a specific pattern, then glob is the library to use.
With glob, you can describe both the name and the type of the files with a pattern. For example, say you want to pick only those files with the word "pass" in their names. You specify the pattern *pass*.*, which matches every file whose name contains "pass", whatever its extension. If you also want it to be a ".pdf" file, use the pattern *pass*.pdf instead. It's an interesting library for those who deal with directories and files. If you are one of them, then do play around with this library.
Example:
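from glob import glob
from os.path import join

mypath = r"C:\Users\Misbah Shoukat\OneDrive\Desktop"

# join() adds the path separator, so the pattern needs no hand-written backslash
print("Files having word pass in their names: ")
ls = glob(join(mypath, "*pass*.*"))
print(ls)

print("\nOnly pdf files having word pass in their names: ")
ls1 = glob(join(mypath, "*pass*.pdf"))
print(ls1)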
Output:
Files having word pass in their names: 
['C:\\Users\\Misbah Shoukat\\OneDrive\\Desktop\\Fjwu id and password.pdf', 'C:\\Users\\Misbah Shoukat\\OneDrive\\Desktop\\skype id pass.txt']

Only pdf files having word pass in their names: 
['C:\\Users\\Misbah Shoukat\\OneDrive\\Desktop\\Fjwu id and password.pdf']
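The example above uses only the * wildcard. To illustrate the other two wildcards mentioned earlier, ? and [ranges], here is a small sketch on a throwaway folder. It also shows the ** pattern, which matches files in subfolders as well when glob() is called with recursive=True. The helper match_names and the file names are hypothetical and exist only for this illustration.

```python
import glob
import os
import tempfile


def match_names(folder, pattern, recursive=False):
    """Return sorted paths (relative to folder) that match a glob pattern."""
    # glob.escape() protects any special characters in the folder path itself.
    full_pattern = os.path.join(glob.escape(folder), pattern)
    hits = glob.glob(full_pattern, recursive=recursive)
    return sorted(os.path.relpath(hit, folder) for hit in hits)


# Throwaway tree: four images at the top level and one inside a subfolder.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "sub"))
    names = ("img1.png", "img2.png", "imgA.png", "img10.png",
             os.path.join("sub", "img3.png"))
    for relative in names:
        open(os.path.join(demo_dir, relative), "w").close()

    # ? matches exactly one character, so img10.png is left out.
    print(match_names(demo_dir, "img?.png"))      # ['img1.png', 'img2.png', 'imgA.png']
    # [0-9] matches exactly one digit, so imgA.png is left out as well.
    print(match_names(demo_dir, "img[0-9].png"))  # ['img1.png', 'img2.png']
    # ** with recursive=True also descends into sub, so all five files match.
    print(match_names(demo_dir, os.path.join("**", "*.png"), recursive=True))
```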
I hope this tutorial was helpful to you. If you have any questions, please leave them in the comments section. Please let me know if you would like a more detailed tutorial on each of these libraries, and I would be happy to assist. Please also consider visiting the Python tutorials page, where we regularly post content for both beginner and advanced developers. You are sure to find something of interest.