What's the REAL difference between .h and .cpp files?

2020-03-26 c++

This question was posted several times on StackOverflow, but most of the answers stated something similar to ".h files are supposed to contain declarations whereas .cpp files are supposed to contain their definitions/implementation". I've noticed that simply defining functions in .h files works just fine. What's the purpose of declaring functions in .h files but defining and implementing them in .cpp files? Does it really reduce compile time? What else?

Answers

Practically: the conventions around .h files are in place so that you can safely include that file in multiple other files in your project. Header files are designed to be shared, while code files are not.

Let's take your example of defining functions or variables. Suppose your header file contains the following line:

header.h:

int x = 10;

code.cpp:

#include "header.h"

Now, if you only have one code file and one header file this probably works just fine:

g++ code.cpp -o outputFile

However, if you have two code files this breaks:

header.h:

int x = 10;

code1.cpp:

#include "header.h"

code2.cpp:

#include "header.h"

And then:

g++ code1.cpp -c  (produces code1.o)
g++ code2.cpp -c  (produces code2.o)
g++ code1.o code2.o -o outputFile

This breaks, specifically at the linker step, because now you have two symbols in the same executable that have the same symbol, and the linker doesn't know what's it's supposed to do with that. When you include your header in code1 you get a symbol "x" and when you include your header in code2 you get another symbol "x". The linker doesn't know your intention here, so it throws an error:

code2.o:(.data+0x0): multiple definition of `x'
code1.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status

Which again is just the linker saying that it can't resolve the fact that you now have two symbols with the same name in the same executable.

Keep in mind that the C++ compiler is a fairly simple beast as far as file-handling goes. All it's allowed to do is a read in a single source-code file (and, via the pre-processor, logically insert into that incoming text-stream the contents of any files that the file #includes, recursively), parse the contents, and spit out the resulting .o file.

For small programs, keeping the entire codebase in a single .cpp file (or even a single .h file) works fine, because number of lines of code that the compiler needs to load into memory are small (relative to the computer's RAM).

But imagine you are working on a monster program, with tens of millions of lines of code -- yes, such things do exist. Loading that much code into RAM at once would likely stress the capabilities of all but the most powerful computers, leading to exceedingly long compile times or even outright failure.

And even worse than that, touching any of the code in a .h file requires the recompilation of any other files that #include that .h file, either directly or indirectly -- so if all your code is in .h files, then your compiler is likely to spend a lot of time unnecessarily recompiling a lot of code that didn't actually change.

To avoid those problems, C++ lets you place your code into multiple .cpp files. Since .cpp files are (at least traditionally) never #include'd by anything, the only time your Makefile or IDE will need to recompile any given .cpp file is after you've actually modified that exact file, or a .h file it #include's.

So when you've modified a function in the 375th .cpp file out of 700 .cpp files in your program, and now you want to test your modification, the compiler only has to recompile that one .cpp file and then re-link the .o files into an executable. If OTOH you've modified a .h file, compilation might be much longer, because now the build system will have to recompile every other file that includes that .h file, directly or indirectly, just in case you changed the meaning of something those files depend on.

.cpp files also make link-time issues much easier to deal with. For example, if you want to have a global variable, defining that global variable in a .cpp file (and maybe declaring an extern for it in a .h file) is straightforward; if OTOH you want to do that in a .h file, you'll have to be very careful or you'll end up with duplicate-symbol errors from your linker, and/or subtle violations of the One Definition Rule that will come back to bite you later on.

What's the REAL difference between .h and .cpp files?

They are both fundamentally just text files. From certain perspective, their only difference is the filename.

However, many programming related tools treat the files differently depending on their name. For example, some tools will detect programming language: .c is compiled as C language, .cpp is compiled as C++ and .h is not compiled at all.

For header files, the name does not matter at all to the compiler. The name could be .h or .header or anything else, it doesn't affect how the pre processor includes it. It is however good practice to conform to a common convention in order to avoid confusion.

I've noticed that simply defining functions in .h files works just fine.

Are the functions declared non-inline? Have you ever included the header file into more than one translation unit? If you answered yes to both, then your program has been ill formed. If you didn't, then that would explain why you didn't encounter any problems.

Does it really reduce compile time?

Yes. Dividing function definitions into smaller translation units can indeed reduce the time to compile said translation units compared to compiling larger translation units.

This is because doing less work takes less time. What is important to realise is that other translation units do not need to be recompiled when only one is modified. If you only have one translation unit, then you have to compile it i.e. the program in its entirety.

Multiple translation units are also better because they can be compiled in parallel, which allows taking advantage of modern multi core hardware.

What else?

Does there need to be anything else? Having to wait a few minutes to compile your program instead of a day improves development speed drastically.

There are some other advantages too regarding organisation of files. In particular, it is quite convenient to be able to define different implementations for same function for different target systems on order to be able to support multiple platforms. With header files, you must do tricks with macros while with source files, you simply choose which files to compile.

Another use case where implementing functions in header is not an option is distributing a library without source, as some middleware providers do. You must give the headers or else your functions cannot be called, but if all your source is in the headers, then you've given up your trade secrets. Compiled sources have to be at least reverse engineered.

The REAL difference is that your programming environment lists .h and .cpp files separately. And/or populates file-browser-dialogs appropriately. And/or tries to compile .cpp files into object form (but doesn't do that to .h files). And whatever, depending on which IDE / environment you use.

The second difference is that people assume that your .h files are header files, and that your .cpp files are code source files.

If you don't care about people or development environments, you can put any damn thing you want in a .h or .cpp file, and call them any thing you want. You can put your declarations in a .cpp file and call it an "include file", and your definitions in a .pas file and call it a "source file".

I have to do this kind of thing when working in a constrained environment.

Header files weren't part of the original definition of c. The world got on perfectly well without them. Opening and closing lots of header files did slow down the compilation of c, which is why we got pre-compiled header files. Pre-compiled header files do speed up the compilation and linking of source code, but not any faster than just writing assembler, or machine code, or any other thing that didn't take advantage of the co-operation of other people or a design environment.

It is useful to put declarations in a header file, and definitions in a code source file. That's why you should do that. There isn't a requirement.

Whenever you see an #include <header.h> directive, pretend that the contents of header.h is being copied and pasted right where the #include directive appears.

.cpp files get compiled to become .obj files. They have no knowledge of the existence of any other .cpp file, and are compiled individually. That's why we need to declare things before we use them - otherwise the compiler won't know whether the function we're trying to invoke exists within a different .cpp file.

We use header files to share declarations amongst multiple .cpp files to avoid having to write the same code over and over for every single .cpp file.

Related