File Descriptors

In UNIX/Linux everything is considered a file. Generally there are seven types of file:

Regular Files
Directories
Character Device Files
Block Device Files
Unix Domain Sockets (local sockets)
Named Pipes (FIFOs)
Symbolic Links

By convention, each process, when created (usually from a shell running within a terminal), is associated by default with three open files, which have the File Descriptor (FD) numbers 0, 1 and 2. These three files are:

Standard Input (stdin), FD 0: by default the keyboard, which the process reads from
Standard Output (stdout), FD 1: by default the monitor screen or terminal, which the process writes normal output to
Standard Error (stderr), FD 2: by default the monitor screen or terminal, which the process writes error messages to

Redirection01

Note: When xterm or any other terminal application runs, it first initializes itself. Before running the user’s shell process (such as Bash), xterm opens the terminal device (/dev/pts/N or something similar) three times. When Bash or any other shell is then started as a child process, it inherits the three file descriptors, and each command (child process) run by the shell inherits them in turn, except where you redirect the command.
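The inheritance described in the note can be observed from any parent process, not only from a shell. The following is a minimal sketch in Python (chosen here only for illustration; CHILD_CODE and run_child_with_pipes are made-up names). It starts a child that writes to FD 1 and FD 2, but first redirects those two descriptors to pipes, so the parent receives the output instead of the terminal.

```python
import subprocess
import sys

# Child program: writes one message to FD 1 (stdout) and one to FD 2 (stderr).
CHILD_CODE = "import os; os.write(1, b'to stdout\\n'); os.write(2, b'to stderr\\n')"


def run_child_with_pipes():
    """Run the child with FD 1 and FD 2 redirected to pipes; return what it wrote."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,  # the child's FD 1 refers to a pipe, not the terminal
        stderr=subprocess.PIPE,  # the child's FD 2 likewise
        check=True,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    # No redirection: the child simply inherits the parent's FDs 0, 1 and 2.
    subprocess.run([sys.executable, "-c", CHILD_CODE], check=True)
    # With redirection: the same child code, but its output lands in the parent.
    print(run_child_with_pipes())
```

The child code is identical in both runs; only the files behind FD 1 and FD 2 differ, which is exactly why redirection works without the command knowing about it.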

You may already know that each process is uniquely identified by a Process Identification Number (PID) in the Process Table maintained by the kernel. What you must also know is that file descriptors are bound to a process: the same FD number in two different processes can refer to two different files. The Process Table stores information about all currently running processes. It contains the process IDs and, for each one, relevant data such as memory usage, the file descriptor numbers of its open files, process state, process priority, etc.

Whenever a process successfully opens an existing file or creates a new one using a system call such as open() or creat(), the kernel returns a non-negative integer called a file descriptor number. This number is an index into the process’s own table of open files, and each entry in that table refers to an entry in the kernel’s global file table. The file table entry contains information such as the inode of the file, the byte offset, and the access restrictions for that data stream (read-only, write-only, etc.). The process then passes the file descriptor number as an argument to functions like read(), write(), close() etc. to perform different operations on the file. FDs are allocated in sequential order, i.e. the lowest unallocated integer value is used first. If a file open operation fails, the call returns -1 instead of an FD. When a file is closed, its FD is freed and becomes available for further allotment.
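The life cycle described above (open, use, close, reuse) can be sketched with Python’s os module, whose os.open(), os.read(), os.write() and os.close() functions are thin wrappers around the system calls of the same names. The function name fd_lifecycle_demo and the file names are made up for this illustration, and the sketch assumes a single-threaded process so that nothing else opens files in between.

```python
import os
import tempfile


def fd_lifecycle_demo():
    """Open, use, close and reopen files; return the descriptor numbers seen."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.txt")
        path_b = os.path.join(tmp, "b.txt")

        # os.open() returns the raw file descriptor number.
        fd_a = os.open(path_a, os.O_RDWR | os.O_CREAT, 0o600)
        fd_b = os.open(path_b, os.O_RDWR | os.O_CREAT, 0o600)

        # The descriptor is the handle passed to every later operation.
        os.write(fd_a, b"hello")
        os.lseek(fd_a, 0, os.SEEK_SET)  # move the byte offset back to the start
        data = os.read(fd_a, 5)

        os.close(fd_a)  # the number held by fd_a is free again
        fd_c = os.open(path_a, os.O_RDONLY)  # gets the lowest free number

        os.close(fd_b)
        os.close(fd_c)
        return fd_a, fd_b, fd_c, data


if __name__ == "__main__":
    print(fd_lifecycle_demo())
```

The third descriptor takes the number that was freed by closing the first, since that is the lowest unallocated value at that moment.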

FD01

In layman’s terms, whenever a process opens a file, the operating system (kernel) creates an entry representing that file in the Process Table, alongside that process’s PID. This entry is identified by a non-negative integer (0, 1, 2 and so on) called the file descriptor number, which uniquely identifies an open file within that process. If a process opens 10 files, the Process Table will hold 10 entries in the file descriptors field of that process.
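To watch those entries appear and disappear, a process can list its own descriptors. The sketch below assumes a Linux system where /proc/self/fd is available; list_open_fds and open_many are made-up helper names. It opens a given number of files and compares the set of descriptors before, during and after.

```python
import os
import tempfile


def list_open_fds():
    """Return the descriptor numbers currently open in this process (Linux /proc)."""
    fds = []
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            continue  # the descriptor listdir() itself used; already closed
        fds.append(fd)
    return sorted(fds)


def open_many(count):
    """Open `count` files and report which descriptors appeared."""
    before = set(list_open_fds())
    with tempfile.TemporaryDirectory() as tmp:
        fds = [
            os.open(os.path.join(tmp, f"file{i}"), os.O_WRONLY | os.O_CREAT, 0o600)
            for i in range(count)
        ]
        during = set(list_open_fds())
        for fd in fds:
            os.close(fd)
    after = set(list_open_fds())
    # New entries, the descriptors os.open() returned, and whether closing
    # the files brought the table back to its earlier state.
    return sorted(during - before), sorted(fds), before == after


if __name__ == "__main__":
    print(open_many(10))
```

Opening 10 files adds exactly 10 entries, and closing them removes those entries again.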

As per the open(2) man page:

The open() system call opens the file specified by pathname. If the specified file does not exist, it may optionally (if O_CREAT is specified in flags) be created by open().

The return value of open() is a file descriptor, a small, nonnegative integer that is used in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.) to refer to the open file. The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.

open(), openat(), and creat() return the new file descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
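The effect of O_CREAT and the error case can be tried out as follows. Note one difference that is specific to Python: os.open() does not return -1 on failure; it raises OSError, and the errno value is carried in the exception. The function names below are made up for this sketch.

```python
import errno
import os
import tempfile


def open_missing_then_create(path):
    """Try to open `path` without O_CREAT, then again with it."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # In C, open() would have returned -1 and set errno.
        first_error = exc.errno
    else:
        os.close(fd)
        first_error = None

    # With O_CREAT the missing file is created and a descriptor is returned.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    created = fd >= 0
    os.close(fd)
    return first_error, created


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "does-not-exist-yet")
        first_error, created = open_missing_then_create(path)
        return first_error, created, os.path.exists(path)


if __name__ == "__main__":
    first_error, created, exists = demo()
    print(errno.errorcode.get(first_error), created, exists)
```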

Computers are good with numbers, whereas we human beings are good with names. We refer to each file by its name, whereas a process refers to each open file by its file descriptor number. The same pattern appears throughout Linux systems: domain names are mapped to IP addresses, and users and groups are mapped to UIDs and GIDs.

Similarly, when a network socket is opened successfully using the socket() system call, creating an endpoint for communication, a file descriptor that refers to that endpoint is returned. On error, -1 is returned.
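A quick way to convince yourself that a socket is "just another file descriptor" is to ask the kernel about it with fstat(). The sketch below uses Python’s socket module; socket_fd_info is a made-up name, and no connection is made, so no network access is needed.

```python
import os
import socket
import stat


def socket_fd_info():
    """Create a TCP socket and return (descriptor number, is it a socket?)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        fd = sock.fileno()  # the descriptor returned by socket()
        is_socket = stat.S_ISSOCK(os.fstat(fd).st_mode)
    finally:
        sock.close()  # frees the descriptor, like close() on a file
    return fd, is_socket


if __name__ == "__main__":
    print(socket_fd_info())
```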

Let’s do a small practical exercise:

Open a file using the vim editor in one terminal:
$ vim newFile
Fetch the PID of the vim process in a second terminal:
$ ps -ef | grep vim
or
$ pgrep -f "vim newFile"
Change directory to /proc/PID/fd, where PID is the value fetched in the previous step:
$ cd /proc/$(pgrep -f "vim newFile")/fd
or
$ cd /proc/PID/fd
List the contents of the directory:
$ ls -l

FD02

As evident from the image above, the vim process has a total of 4 files open: three are the defaults, i.e. stdin (FD 0), stdout (FD 1) and stderr (FD 2), and the fourth (FD 3) points to newFile.
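The entries that ls -l shows under /proc/PID/fd are symbolic links, so a program can resolve them too. Here is a small sketch that does this for the current process through /proc/self/fd (Linux only). The names path_of_fd and demo are made up, and the file is created in a temporary directory rather than by vim.

```python
import os
import tempfile


def path_of_fd(fd):
    """Return what /proc says the given descriptor of this process refers to."""
    return os.readlink(f"/proc/self/fd/{fd}")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "newFile")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            # realpath() resolves symlinks in the path, as the kernel's view does.
            return fd, path_of_fd(fd), os.path.realpath(path)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```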

Now list the files opened by the vim process:
$ lsof -p $(pgrep -f "vim newFile") | less
or
$ lsof -p PID | less

FD03

As evident from the image above, the FD column lists the file descriptors and the TYPE column lists the type of each file opened by the vim process.

Some of the values for the FD are:
cwd – current working directory
rtd – root directory
mem – memory-mapped file
txt – program text (code and data)

FD04

But as evident from the image above, the actual FD numbers are listed further down. You may find that the FD number is followed by one of the following characters, describing the mode in which the file is open:
r for read access mode
w for write access mode
u for read and write access mode
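A process can query the access mode of one of its own descriptors with fcntl() and the F_GETFL command. The sketch below maps the result onto the same three letters; lsof_mode_letter is a made-up helper name, and this is only an illustration of the idea, not a description of how lsof itself is implemented.

```python
import fcntl
import os
import tempfile


def lsof_mode_letter(fd):
    """Return 'r', 'w' or 'u' for the access mode the descriptor was opened with."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    access = flags & os.O_ACCMODE  # keep only the access-mode bits
    return {os.O_RDONLY: "r", os.O_WRONLY: "w", os.O_RDWR: "u"}[access]


def demo():
    """Open the same file in the three access modes and collect the letters."""
    letters = []
    with tempfile.NamedTemporaryFile() as tmp:
        for flags in (os.O_RDONLY, os.O_WRONLY, os.O_RDWR):
            fd = os.open(tmp.name, flags)
            try:
                letters.append(lsof_mode_letter(fd))
            finally:
                os.close(fd)
    return letters


if __name__ == "__main__":
    print(demo())
```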

The default FDs (0, 1, 2) point to file type CHR, i.e. Character Device File, because the pseudo-terminal /dev/pts/0 is a character device file. You may wonder how I can be sure that it is a character device file. To check, execute the following command:
$ ls -l /dev/pts/0
The leading ‘c’ in crw--w---- represents a Character Device File.

FD(3) points to file type REG, because the file opened (newFile) is a regular text file.
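The same classification can be made for any descriptor with fstat() and the S_IS* helpers from Python’s stat module. In the sketch below, fd_type and demo are made-up names; the labels follow the ones seen in the TYPE column where I know them (REG, DIR, CHR, BLK, FIFO), while "SOCK" is simply a label chosen for this example.

```python
import os
import stat
import tempfile


def fd_type(fd):
    """Classify an open descriptor by the file type reported by fstat()."""
    mode = os.fstat(fd).st_mode
    checks = [
        (stat.S_ISREG, "REG"),    # regular file
        (stat.S_ISDIR, "DIR"),    # directory
        (stat.S_ISCHR, "CHR"),    # character device
        (stat.S_ISBLK, "BLK"),    # block device
        (stat.S_ISFIFO, "FIFO"),  # pipe or named pipe
        (stat.S_ISSOCK, "SOCK"),  # socket
    ]
    for predicate, label in checks:
        if predicate(mode):
            return label
    return "unknown"


def demo():
    null_fd = os.open("/dev/null", os.O_RDONLY)
    dir_fd = os.open("/", os.O_RDONLY)
    read_end, write_end = os.pipe()
    try:
        with tempfile.TemporaryFile() as regular:
            return [
                fd_type(null_fd),
                fd_type(regular.fileno()),
                fd_type(dir_fd),
                fd_type(read_end),
            ]
    finally:
        for fd in (null_fd, dir_fd, read_end, write_end):
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

Symbolic links do not show up here, because opening a path normally follows the link and the descriptor refers to the file it points to.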

Note (see the image below): to list only the relevant file descriptors, you may execute the following command:
$ lsof -a -p PID -d0,1,2,3
where
-a ANDs the selection options together
-p selects the process by PID
-d specifies a list of file descriptors

FD05

So, in Linux/Unix Systems, there are two ways to list File Descriptors corresponding to a process:

$ ls -l /proc/PID/fd
$ lsof -p PID

Let’s do another small practical exercise:

In this one we’ll see that the lsof command can also be used to figure out which process is keeping your removable device (e.g. a USB flash drive) busy. We already know that Linux treats devices as files, so we have to figure out which process is keeping files on your device open (or busy).

Let’s take an example with a USB stick that is mounted at /mnt.

First open a terminal with root privileges and change into your USB stick:
~# cd /mnt
Now create a text file on your USB stick, say pdFile:
/mnt# touch pdFile
Open the file using the vim text editor:
/mnt# vim pdFile
Now open another terminal and try to unmount your USB stick by executing the command shown below:

$ sudo umount /mnt
umount: /mnt: target is busy.

If you see the message shown above while unmounting, your device is being used by some process (that is, some process is keeping a file on the device open).

Now, to find the process responsible for keeping your device busy, execute the following command:

$ lsof | grep "/mnt"
COMMAND   PID                           USER  FD        TYPE              DEVICE    SIZE/OFF   NODE NAME
bash      2850                          root  cwd       DIR               8,16      4096          1 /mnt
vim       3355                          root  cwd       DIR               8,16      4096          1 /mnt
vim       3355                          root   6u       REG               8,16     12288        181 /mnt/.pdFile.swp

The output shown above implies that the bash shell with PID 2850 and the vim editor with PID 3355 are the processes keeping your device busy. By simply closing or killing these processes, you release their references to your USB storage device (open file descriptors and current working directories), thereby freeing the device.
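For the curious, the same question ("who is using something under /mnt?") can be answered by walking /proc directly. The following is a rough sketch, assuming Linux; processes_using and demo are made-up names, and this is not how lsof is implemented. It checks each process’s current working directory and open file descriptors, and silently skips processes it is not permitted to inspect, so run it as root for a complete picture.

```python
import os
import tempfile


def processes_using(mount_point):
    """Return {pid: [paths]} for processes with a cwd or open file under mount_point."""
    prefix = os.path.realpath(mount_point).rstrip("/") + "/"
    users = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only the numeric entries are processes
        proc_dir = os.path.join("/proc", entry)
        links = [os.path.join(proc_dir, "cwd")]
        try:
            fd_dir = os.path.join(proc_dir, "fd")
            links += [os.path.join(fd_dir, name) for name in os.listdir(fd_dir)]
        except OSError:
            pass  # the process exited, or we lack permission
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            # Adding "/" lets the mount point itself match as well.
            if (target + "/").startswith(prefix):
                users.setdefault(int(entry), []).append(target)
    return users


def demo():
    """Hold a file open in a temporary directory and look for ourselves."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pdFile")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            return processes_using(tmp), os.path.realpath(path)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(processes_using("/mnt"))
```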

To understand how file descriptors are used in practice, move on to the topic:
I/O Redirection – the process of capturing output from a file, command, program, script, or even a code block within a script, and sending that output as input to another file, command, program, or script with the help of file descriptors and redirection operators.
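As a small preview of that topic, here is a sketch of the idea behind a redirection such as command > file: make FD 1 refer to a different file, and everything written to FD 1 ends up there. It uses the dup() and dup2() system calls, which are not covered in this article, through Python’s os module. The function name is made up, and the original stdout is restored afterwards.

```python
import os
import tempfile


def write_with_stdout_redirected(path, payload):
    """Temporarily point FD 1 at `path`, write `payload` to FD 1, then restore it."""
    saved_stdout = os.dup(1)  # keep a second descriptor for the original stdout
    target = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.dup2(target, 1)    # FD 1 now refers to the file
        os.write(1, payload)  # the writer still just uses FD 1
    finally:
        os.dup2(saved_stdout, 1)  # put the original stdout back
        os.close(saved_stdout)
        os.close(target)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        write_with_stdout_redirected(path, b"captured\n")
        with open(path, "rb") as handle:
            return handle.read()


if __name__ == "__main__":
    print(demo())
```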