file extension (file format)
What is a file extension (file format)?
In computing, a file extension is a suffix added to the name of a file to indicate the file's layout, in terms of how the data within the file is organized. A file's data must be organized in the correct format to ensure that it can be accessed by the software program associated with the specific file type. File extensions also provide users with quick insight into the types of files they're working with.
A file extension comes after the period in a filename and is typically made up of three or four alphanumeric characters that identify the file's format. For example, in a file named testfile1.txt, the extension is txt, which indicates that the underlying file is a plain text document. Similarly, in a file named testfile2.jpeg, the extension is jpeg, which indicates that this is a graphic file that conforms to the Joint Photographic Experts Group (JPEG) format.
A filename can include multiple periods, as in testfile.3.2.csv. In most cases, the extension includes only the characters after the final period. There are exceptions, however, such as the extension tar.gz, which is used for a certain type of compressed archive file. Sometimes, a file might appear to have a two-part extension, as in testfile4.xlsx.exe, but this is often a ploy used by hackers to send what appears to be a legitimate file that is actually an executable file whose purpose is to damage or infiltrate a system.
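The rules in this paragraph can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular OS parses filenames: the function names get_extension and looks_disguised, and the three small extension sets, are assumptions made up for this example and are far from complete.

```python
from pathlib import PurePath

# Assumption: a hand-picked set of compound extensions; real tools recognize more.
COMPOUND_EXTENSIONS = {".tar.gz"}
# Assumption: illustrative lists only, not exhaustive.
EXECUTABLE_EXTENSIONS = {".exe", ".bat", ".cmd", ".com", ".vbs"}
DOCUMENT_EXTENSIONS = {".xlsx", ".docx", ".pdf", ".txt", ".jpg"}


def get_extension(filename: str) -> str:
    """Return the extension without its leading period, or '' if there is none."""
    name = PurePath(filename).name.lower()
    # Handle the exceptions first: a compound extension counts as one unit.
    for compound in COMPOUND_EXTENSIONS:
        if name.endswith(compound) and len(name) > len(compound):
            return compound[1:]
    # Otherwise, only the characters after the final period count.
    return PurePath(name).suffix[1:]


def looks_disguised(filename: str) -> bool:
    """Flag names such as testfile4.xlsx.exe: a document-style extension
    followed by an executable one."""
    suffixes = PurePath(filename.lower()).suffixes
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE_EXTENSIONS
        and suffixes[-2] in DOCUMENT_EXTENSIONS
    )
```

With these definitions, get_extension("testfile.3.2.csv") yields csv, get_extension("backup.tar.gz") yields tar.gz, and looks_disguised("testfile4.xlsx.exe") is true. The check is only a naming heuristic and says nothing about what the file actually contains.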
A file extension can be as short as one or two characters, or it can be much longer than average, such as the .catproduct extension. Whatever the extension, an operating system must be able to recognize it in order to associate it with the correct program. If the OS cannot determine the correct program, the user must specify which one to use.
Operating systems and file extensions
An operating system might rely solely on the file extension to determine which application to use, or it might also rely on file metadata. Each OS varies in terms of how it uses extensions when matching files to applications and the degree to which it uses them. Windows, for example, relies heavily on file extensions: if a file has no extension, or one that Windows does not recognize, the OS cannot choose an application on its own and asks the user to pick one. Linux relies on extensions when they're available, but it can also use the Multipurpose Internet Mail Extensions (MIME) identifier that is associated with each file.
MIME provides a system for identifying different file formats so the files can be exchanged across the internet and opened on different systems. For instance, when a web browser accesses a document, it can tell from the MIME type how to display that document even if the file was created by an application running on another OS.
The MIME identifiers make it possible for Linux to open a file in the appropriate application even if the filename lacks an extension. For example, the MIME identifier for a text document is text/plain. If Linux comes across this identifier in a file without an extension, the OS knows to open the file in the default text editor. If an extension is provided, however, Linux will use that when determining which application to use, rather than the MIME type.
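To make the MIME identifier format concrete, the sketch below uses Python's standard mimetypes module to look up the identifier conventionally associated with a filename. It only consults a name-to-type table and does not inspect file contents, so it is not a model of how a Linux desktop resolves files that lack an extension. The helper name mime_from_name is an assumption for this example.

```python
import mimetypes
from typing import Optional


def mime_from_name(filename: str) -> Optional[str]:
    """Return the MIME type suggested by the filename, or None if unknown."""
    # guess_type returns a (type, encoding) pair; only the type is needed here.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


if __name__ == "__main__":
    for name in ("testfile1.txt", "testfile2.jpeg", "notes"):
        print(name, "->", mime_from_name(name))
```

A name with no extension, such as notes, returns None here, which is the case where a system has to fall back on some other source of type information.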
The macOS operating system takes a similar approach to Linux but adds another layer: the Uniform Type Identifiers (UTI) framework. UTI provides a system for uniquely identifying file types and mapping them to MIME identifiers. It also helps to address issues that come with handling files created under legacy file-tagging systems. Like Linux, macOS still relies on file extensions to a certain degree, but not to the extent that Windows does, which means that macOS can also open files without extensions.
Regardless of how an OS handles file extensions, the extensions themselves do nothing more than indicate what a file's underlying format is supposed to be. An extension does not guarantee a file's actual format, nor does changing the extension affect that format. If the name of a PDF file contains a .pdf extension, the OS will open that file in the default PDF viewer. If the file's extension is then changed to .txt, the OS will instead try to open the file in a text editor. Even if it succeeds, however, most of the file's content will be displayed as gibberish.
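The point that renaming a file leaves its format untouched can be checked directly. The following sketch relies on the fact that PDF files begin with the bytes %PDF-. It writes a placeholder file (not a complete, viewable PDF), renames it from .pdf to .txt, and tests the leading bytes before and after. The function names and filenames are assumptions chosen for this illustration.

```python
import os
import tempfile
from typing import Tuple

PDF_SIGNATURE = b"%PDF-"  # PDF files start with these bytes


def starts_with_pdf_signature(path: str) -> bool:
    """Check the file's leading bytes, ignoring its name entirely."""
    with open(path, "rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE


def demo() -> Tuple[bool, bool]:
    """Return the signature check result before and after renaming."""
    with tempfile.TemporaryDirectory() as tmp:
        pdf_path = os.path.join(tmp, "report.pdf")
        txt_path = os.path.join(tmp, "report.txt")
        # Placeholder content: a PDF header only, not a valid document.
        with open(pdf_path, "wb") as f:
            f.write(b"%PDF-1.7\n% placeholder bytes, not a complete PDF\n")
        before = starts_with_pdf_signature(pdf_path)
        os.rename(pdf_path, txt_path)  # change only the extension
        after = starts_with_pdf_signature(txt_path)
        return before, after
```

Both checks return True: the bytes on disk are the same after the rename, and only the hint given to the OS has changed.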
Types of file extensions
The world of computing is full of file extensions, too many to list in a single article. Each one attempts to telegraph the format of the underlying file so the OS knows how to handle that file. Here is just a small sampling of some of the more common file extensions:
- Text and word processing files. doc, docx, odt, pages, rtf, txt, wpd, wps.
- Spreadsheet files. csv, numbers, ods, xls, xlsx.
- Web-related files. asp, aspx, css, htm, html, jsp, php, xml.
- Image files. bmp, gif, ico, jpeg, jpg, png, raw, tif, tiff.
- Audio and video files. aif, mov, mp3, mp4, mpg, wav, wma, wmv.
- Draw program files. afdesign, ai, cad, cdr, drw, dwg, eps, odg, svg, vsdx.
- Page layout files. afpub, indd, pdf, pdfxml, pmd, pub, qxp.
- Programming files. c, cpp, cs, java, js, json, py, sql, swift, vb.
- Compression and archive files. 7z, rar, tar, tar.gz, zip.
- System files. bak, cfg, conf, ini, msi, sys, tmp.
- Executable program files. app, bat, bin, cmd, com, exe, vbs, x86.
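A grouping like the one above is often represented in software as a simple lookup table. The sketch below assumes a small, partial table drawn from a few entries in the list; the names CATEGORY_BY_EXTENSION and category_of are hypothetical, and a real table would be far larger.

```python
from pathlib import PurePath

# Assumption: a partial, illustrative table based on a few entries from the list above.
CATEGORY_BY_EXTENSION = {
    "docx": "Text and word processing",
    "txt": "Text and word processing",
    "csv": "Spreadsheet",
    "xlsx": "Spreadsheet",
    "html": "Web-related",
    "png": "Image",
    "jpeg": "Image",
    "mp3": "Audio and video",
    "py": "Programming",
    "zip": "Compression and archive",
    "exe": "Executable program",
}


def category_of(filename: str) -> str:
    """Look up a category from the characters after the final period."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return CATEGORY_BY_EXTENSION.get(extension, "Unknown")
```

Extensions that are missing from the table, such as catproduct, fall through to "Unknown", which mirrors the situation where a system cannot match a file to a known type.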
There are thousands of other file extensions as well. They're used for databases, vector images, disk images, presentation software, email programs, virtual environments, file encoding, GPS software and a variety of other purposes. FileInfo.com maintains a searchable database that contains over 10,000 file extensions. Developers can register their file extensions on this site if they're building applications that require unique file formats.
There are also thousands of software programs, so it's not surprising that some file extensions are associated with multiple file formats and applications. For instance, the .prf extension might be used for Microsoft Outlook, Windows system files, QuarkXPress, Apple ClarisWorks, IBM FileNet eForms or another type of software.