"Persistence" versus data management
In the very names "JPA" and "Hibernate" there is a lot of subtext. The subtext is that the problem of managing data in a Java application is one of "persistence": that storing and retrieving data is the chief technical problem, and that a database's role in a Java application is to do only that. To the person who holds this view of software architecture it is very clear that the application domain and its business logic are best modeled in an object-oriented fashion and expressed in terms of classes and design patterns in the Java space, and that the job of the database is merely to support and store the entities described in the world of objects.
But current database technology has its origins in another viewpoint, one which clashes dramatically with this point of view.
The relational database model reviewed
In the classical relational database perspective a database management system isn't really just a thing that stores data. It is a foundation for your application which stores facts about data and their relationships to each other, and provides operations on those data and their relationships. Its base is the relation (unfortunately called a table in SQL), a set of tuples, manipulated using the relational algebra, itself founded on first-order predicate logic; and applied properly it is a reasoning system for your data, somewhat akin to declarative programming languages like Prolog.
A database properly formulated is a set of facts about the portion of the world you are interested in. By putting together your schema you have actually constructed a series of statements about the application domain; e.g. there are customers who have relationships with addresses, which have relationships with postal codes, and so on.
When data is properly normalized in this fashion the relational algebra can act on it like a sharp knife in the hands of a fine chef; it is capable of taking this "flat" model of sets and performing elaborate slices which dynamically reconstruct the data into specified forms. And given the right implementation technology it can do this reliably and with adequate performance.
The object-oriented model is entirely different.
In it, data (the objects, the nouns) is never far from its actions (the methods, the verbs) or its taxonomy (the inheritance structure). The object-oriented developer spends a lot of time on developing the taxonomic relationships of classes of objects to other classes; this is important because inheritance is the least-handicapped form of polymorphism in mainstream object-oriented languages, and polymorphism is their main form of code re-use.
This model is well suited to the application domains for which it was originally invented: simulation (Simula) and the development of graphical user interfaces (Smalltalk).
But it is not very good for general-purpose data management, and here are some reasons why:
Problem #1: Object identity through pointers
How do we find things in our heap of data? And once we've found them, how do we continue to identify them? And having identified them, how do we reference them from other places?
In the object-oriented world we do so through the use of references, or to use another term for the same thing, pointers. A given object reference always refers to the same object. To find something, we have to get a copy of this reference from somebody. The identity of an object is entirely its reference, not its attributes.
In the relational world, however, we can identify a given tuple ("row" in SQL) in a number of ways. The most common way is through its primary key, but there are often many possible candidate keys to identify a given tuple. Furthermore, a primary key can itself be a composite of a number of attributes. There are no pointers or references in the relational world; a key is not a pointer, address or reference. It is potentially meaningful on its own, it can even be partial, and it is not the only way to find something.
This is important for a number of reasons. It makes it easier to find things, and it means there need not be a broker to find a reference and hand it to us: we are free to compose a lookup of the tuple anywhere. And because the relational model only ever deals with sets, we always deal with groups of similar data, so even if we have only a partial match (because our criteria are incomplete) we are still quite able to find and manipulate our data.
Furthermore, the identity of a tuple is composed entirely of its attributes; in a properly constrained database two customers with identical attributes must be the same tuple in a relation ("row" in a "table").
Not so with the object-oriented model, where we can potentially have two objects with identical attributes but with separate identities, by virtue of the fact that there are two references, two areas of memory, that hold this data.
We can write functions to find data by its attributes or combinations of its attributes in the object-oriented model; but when we do so we find that the sets of operations we are creating start to look more and more like those already present in the relational model.
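The difference between reference identity and attribute-based identity can be sketched in a few lines of Java. This is only an illustration: the class names and sample values are made up for the example, and it assumes Java 16 or later, using a record as a rough stand-in for a tuple. It is not a claim about how any database implements relations.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical names for illustration; assumes Java 16+ for the record syntax.
final class IdentityDemo {

    // An ordinary class: identity is the reference, not the attributes.
    static final class CustomerObject {
        final String name;
        final String postalCode;

        CustomerObject(String name, String postalCode) {
            this.name = name;
            this.postalCode = postalCode;
        }
    }

    // A record compares by its components, which is closer to how a tuple behaves.
    record CustomerTuple(String name, String postalCode) {}

    // Two objects with identical attributes remain two distinct members of the set.
    static int distinctObjects() {
        Set<CustomerObject> customers = new HashSet<>();
        customers.add(new CustomerObject("Ada", "N1 9GU"));
        customers.add(new CustomerObject("Ada", "N1 9GU"));
        return customers.size();
    }

    // Two tuples with identical attributes collapse into a single member.
    static int distinctTuples() {
        Set<CustomerTuple> customers = new HashSet<>();
        customers.add(new CustomerTuple("Ada", "N1 9GU"));
        customers.add(new CustomerTuple("Ada", "N1 9GU"));
        return customers.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct objects: " + distinctObjects());
        System.out.println("distinct tuples:  " + distinctTuples());
    }
}
```

The plain class keeps both "Ada" entries because nothing ties identity to attributes; the record keeps one, but only because value equality was opted into for that one type.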
Problem #2: Fixed attributes, hierarchical addressing of attributes
A given class in the object-oriented model has a defined set of attributes, which are either values of some basic type such as a String, Integer, etc. or instances of some other composite class. For example, a given customer may have a name, and some address information, but also have directly attached to it a list of Orders, which are themselves composed of Products, shipping dates, quantities, etc.
This relationship is often a tree, sometimes a directed graph.
The problem here is that once this relationship is encoded the "path" to the data is set in stone; in order to find a list of orders, we must go through one of the pre-conceived paths, and (as mentioned above) we must do it starting from some object reference we have already acquired.
In the relational model we are free to find a tuple through any set of attributes, and using projection operations we can recompose that tuple into any structure we want. Furthermore through joins we can take two sets of tuples and turn them into one set. In our example above, if I need for some purpose a mixture of data from both the customer and the order, I do not have to define a new class; I just write a new query.
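To make the contrast concrete, here is a rough Java sketch of the object-oriented side of that example. All class and method names are hypothetical. It assumes orders are reachable only through the customer that holds them, so a question that starts from the product has to walk every customer, and the mixed customer/order result needs a class of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration; none of these names come from a real library.
final class FixedPathDemo {

    static final class Order {
        final String product;
        final int quantity;

        Order(String product, int quantity) {
            this.product = product;
            this.quantity = quantity;
        }
    }

    static final class Customer {
        final String name;
        final List<Order> orders = new ArrayList<>();

        Customer(String name) {
            this.name = name;
        }
    }

    // A new class is needed just to hold a mixture of customer and order data.
    static final class CustomerOrderLine {
        final String customerName;
        final int quantity;

        CustomerOrderLine(String customerName, int quantity) {
            this.customerName = customerName;
            this.quantity = quantity;
        }
    }

    // The only path to an order is Customer -> orders, so a product-centred
    // question still has to start from the customers and walk downwards.
    static List<CustomerOrderLine> linesForProduct(List<Customer> customers, String product) {
        List<CustomerOrderLine> lines = new ArrayList<>();
        for (Customer customer : customers) {
            for (Order order : customer.orders) {
                if (order.product.equals(product)) {
                    lines.add(new CustomerOrderLine(customer.name, order.quantity));
                }
            }
        }
        return lines;
    }

    // Made-up sample data.
    static List<Customer> sampleCustomers() {
        Customer ada = new Customer("Ada");
        ada.orders.add(new Order("widget", 3));
        ada.orders.add(new Order("gadget", 1));
        Customer grace = new Customer("Grace");
        grace.orders.add(new Order("widget", 5));
        return List.of(ada, grace);
    }

    public static void main(String[] args) {
        for (CustomerOrderLine line : linesForProduct(sampleCustomers(), "widget")) {
            System.out.println(line.customerName + " ordered " + line.quantity);
        }
    }
}
```

In the relational version the same question would be a join of the two relations restricted on the product and projected onto the name and quantity, with no new type declared.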
Problem #3: Lack of constraints
In the example given above, assume the tree of relationships of attributes is Customer->Order->Product.
While it is possible to construct this relationship bidirectionally and turn this into a directed graph, there is typically nothing in the implementation language (such as Java) to either enforce or automatically manage this relationship.
So, in our example, while Product may itself have an "order" attribute, the object-oriented model in Java lacks the ability to automatically update the products list on the order when the order attribute on the product is changed. Furthermore it lacks (at the language level) the ability to declare any constraints on the validity of this data or to prevent dangling references.
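A small Java sketch of that failure mode, with hypothetical Order and Product classes: the back-reference on the product is reassigned, and nothing in the language brings the order's list back into agreement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration only.
final class BidirectionalDemo {

    static final class Order {
        final List<Product> products = new ArrayList<>();
    }

    static final class Product {
        Order order; // back-reference, kept in sync by hand or not at all
    }

    // Moves a product to a second order by changing only the product's side,
    // then reports whether both sides of the relationship still agree.
    static boolean sidesAgreeAfterReassign() {
        Order first = new Order();
        Order second = new Order();
        Product product = new Product();

        // Set up both directions manually.
        first.products.add(product);
        product.order = first;

        // Reassign the back-reference; neither products list is touched.
        product.order = second;

        boolean listedOnNewOrder = second.products.contains(product);
        boolean stillListedOnOldOrder = first.products.contains(product);
        return listedOnNewOrder && !stillListedOnOldOrder;
    }

    public static void main(String[] args) {
        System.out.println("sides agree: " + sidesAgreeAfterReassign());
    }
}
```

The two sides disagree after the reassignment, and the program compiles and runs without complaint. Keeping them consistent is left to conventions or helper methods that every caller has to remember to use.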
Problem #4: Poverty of high-level operations
Java is an imperative language with fairly low-level constructs for dealing with data; it has a fairly handicapped data structures library, fairly poor date and calendar handling, etc.
It was not designed as a data manipulation language, but as a general-purpose language for set-top TV boxes, and then later for a vaguely defined category of "Internet applications" and so on.
Take a simple operation like this: given a list of customers, find all their orders, sort them by the price of the product in the order, and then group them by quantity and the customer that made the order.
Given lists of all these pieces of data, and using Java's collection library, try to do this in under a page of code.
In the relational algebra this is a single expression, and is possible with only 5 operations. It's one SQL query.
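For a sense of what the hand-written version involves, here is one possible Java sketch. Everything in it is an assumption made for the example: the flat record shapes, the id fields, the string group key, and the reading of the task as "sort orders by product price, then group them by customer and quantity". It assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record shapes; assumes Java 16+.
final class HandRolledQueryDemo {

    record Customer(int id, String name) {}
    record Product(int id, int priceCents) {}
    record Order(int id, int customerId, int productId, int quantity) {}

    // Returns order ids grouped under a "customerName/quantity" key, with each
    // group listed in ascending product price. The string key is a shortcut
    // for the sketch, not a recommendation.
    static Map<String, List<Integer>> ordersByCustomerAndQuantity(
            List<Customer> customers, List<Product> products, List<Order> orders) {

        // Hand-built indexes stand in for the two joins.
        Map<Integer, String> nameById = new HashMap<>();
        for (Customer c : customers) {
            nameById.put(c.id(), c.name());
        }
        Map<Integer, Integer> priceById = new HashMap<>();
        for (Product p : products) {
            priceById.put(p.id(), p.priceCents());
        }

        // Keep only orders whose customer and product both exist (an inner join).
        List<Order> joined = new ArrayList<>();
        for (Order o : orders) {
            if (nameById.containsKey(o.customerId()) && priceById.containsKey(o.productId())) {
                joined.add(o);
            }
        }

        // Sort by the price of the ordered product.
        joined.sort(Comparator.comparingInt((Order o) -> priceById.get(o.productId())));

        // Group, keeping the sorted order inside each group.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Order o : joined) {
            String key = nameById.get(o.customerId()) + "/" + o.quantity();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(o.id());
        }
        return groups;
    }

    // Made-up sample data.
    static Map<String, List<Integer>> sample() {
        List<Customer> customers = List.of(new Customer(1, "Ada"), new Customer(2, "Grace"));
        List<Product> products = List.of(new Product(10, 500), new Product(11, 200));
        List<Order> orders = List.of(
                new Order(100, 1, 10, 2),
                new Order(101, 1, 11, 2),
                new Order(102, 2, 10, 1));
        return ordersByCustomerAndQuantity(customers, products, orders);
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

Each step (index, join, sort, group) is spelled out by hand here, and the indexes and the group key are artifacts of the implementation rather than part of the question being asked.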
Conclusion
Basically I'd argue that Java and languages like it are not yet at the right abstraction level for dealing with so-called business logic and data management.
In my ideal world, we stop having a separate language and system for querying; the client-server interface to our central data store system becomes invisible; but the system stays relational. Schema manipulation and inspection, and all of the relational algebraic query operators become available to the applications programmer as first-class citizens.
In this world, the DBA/programmer dichotomy (a very unconstructive relationship) breaks down, as the 'data modeler' hat has to be worn by all developers. The goal: to think about data in a declarative way, to reduce the amount of drudge work, and to make sure our data is clean, intact, and well expressed.
