To elaborate on Joeri's answer a little:
The International Organisation for Standardization defines encapsulation as, 'The property that the information contained in an object is accessible only through interactions at the interfaces supported by the object.'
Thus, as some information is accessible via these interfaces, some information must be hidden and inaccessible within the object. The property such information exhibits is called information hiding, which Parnas defined by arguing that modules should be designed to hide both difficult decisions and decisions that are likely to change.
Note that word: change. Information hiding concerns potential events, such as the changing of difficult design decisions in the future.
Consider a class with two methods: method a() which is information hidden within the class, and method b() which is public and thus accessible directly by other classes.
There is a certain probability that a future change to method a() will require changes in methods in other classes. There is also a certain probability that a future change to method b() will require changes in methods in other classes. The probability that such ripple changes will occur for method a(), however, will usually be lower than that for method b() simply because method b() may be depended upon by more classes.
This reduced probability of ripple impacts is a key benefit of encapsulation.
Consider the maximum potential number of source code dependencies (MPE - the acronym is from graph theory) in any program. Extrapolating from the definitions above, we can say that, given two programs delivering identical functionality to users, the program with the lowest MPE is better encapsulated, and that statistically the more well-encapsulated program will be cheaper to maintain and develop, because the cost of the maximum potential change to it will be lower than the maximum potential change to the less well-encapsulated system.
Consider, furthermore, a language with just methods and no classes and hence no means of information hiding methods from one another. Let's say our program has 1000 methods. What is the MPE of this program?
Encapsulation theory tells us that, given a system of n public nodes, the MPE of this system is n(n-1). Thus the MPE of our 1000 public methods is 999,000.
Now let's break that system into two classes, each having 500 methods. As we now have classes, we can choose to have some methods public and some methods private. This will be the case unless every method is actually dependent on every other method (which is unlikely). Let's say that 50 methods in each class is public. What would the MPE of the system be?
Encapsulation theory tells us it's: n((n/r) -1 + (r-1)p) where r is the number of classes, and p is the number of public methods per class. This would give our two-class system an MPE of 499,000. Thus the maximum potential cost of a change in this two-class system is already substantially lower than that of the unencapsulated system.
Let's say you break your system into 3 classes, each having 333 classes (well, one will have 334), and again each with 50 public methods. What's the MPE? Using the above equation again, the MPE would be approximately 482,000.
If the system is broken into 4 classes of 250 methods each, the MPE will would be 449,000.
If may seem that increasing the number of classes in our system will always decrease its MPE, but this is not so. Encapsulation theory shows that the number of classes into which the system should be decomposed to minimise MPE is: r = sqrt(n/p), which for our system is actually 4. A system with 6 classes, for example, would have an MPE of 465,666.
The Principle of Burden takes two forms.
The strong form states that the burden of transforming a collection of entities is a function of the number of entities transformed. The weak form states that the maximum potential burden of transforming a collection of entities is a function of the maximum potential number of entities transformed.
In slightly more detail, the burden of creating or modifying any software system is a function of the number of program units created or modified.
Program units that depend on a particular, modified program unit have a higher probability of being impacted than program units that do not depend on the modified program unit.
The maximum potential burden an modified program unit can impose is the impacting of all program units that depend on it.
Reducing the dependencies on an modified program unit therefore reduces the probability that its update will impact other program units and so reduces the maximum potential burden that that program unit can impose.
Reducing the maximum potential number of dependencies between all program units in a system therefore reduces the probability that an impact to a particular program unit will cause updates to other program units, and thus reduces the maximum potential burden of all updates.
Thus, encapsulation is a foundation stone of object orientation and encapsulation helps us to reduce the maximum potential number of dependencies between all program units and to mitigate the weak form of the Principle of Burden.