Persistence Context by Example

Context

While working on a project using Spring Data JPA + Hibernate, I came across a section of code that looked something like this:

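// SomeClass.java file
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class SomeClass {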
	private final CoolJpaRepository coolJpaRepository;

	private final AnotherClass anotherClass;

	public SomeClass(
		final CoolJpaRepository coolJpaRepository,
		final AnotherClass anotherClass
	) {
		this.coolJpaRepository = coolJpaRepository;
		this.anotherClass = anotherClass;
	}

	@Transactional
	public void doSomething() {
		// ...
		coolJpaRepository.findByCustomField();
		anotherClass.doAnotherThing();
		// ...
	}
}

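// AnotherClass.java file
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Component
public class AnotherClass {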
	private final CoolJpaRepository coolJpaRepository;

	public AnotherClass(final CoolJpaRepository coolJpaRepository) {
		this.coolJpaRepository = coolJpaRepository;
	}

	@Transactional(propagation = Propagation.REQUIRES_NEW)
	public void doAnotherThing() {
		// ...
		coolJpaRepository.findByCustomField();
		// ...
	}
}

Breaking down the example

First, we have a class called SomeClass that depends on both a JPA repository called CoolJpaRepository and another Spring-managed class called AnotherClass.

@Component
public class SomeClass {
	private final CoolJpaRepository coolJpaRepository;

	private final AnotherClass anotherClass;

	public SomeClass(
		final CoolJpaRepository coolJpaRepository,
		final AnotherClass anotherClass
	) {
		this.coolJpaRepository = coolJpaRepository;
		this.anotherClass = anotherClass;
	}

	// ...
}

This class has a transactional method, doSomething, that calls findByCustomField on the JPA repository and doAnotherThing on AnotherClass.

@Transactional
public void doSomething() {
	// ...
	coolJpaRepository.findByCustomField();
	anotherClass.doAnotherThing();
	// ...
}

Looking at AnotherClass, we see that it too depends on CoolJpaRepository.

@Component
public class AnotherClass {
	private final CoolJpaRepository coolJpaRepository;

	public AnotherClass(final CoolJpaRepository coolJpaRepository) {
		this.coolJpaRepository = coolJpaRepository;
	}

	// ...
}

It also calls findByCustomField, but in a new transaction, because of @Transactional(propagation = Propagation.REQUIRES_NEW).

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void doAnotherThing() {
	// ...
	coolJpaRepository.findByCustomField();
	// ...
}

In essence, the same repository method (findByCustomField) is called from two different classes that, even though they run in separate transactions, are part of the same flow.

The problem is that every time this flow executes, the query behind findByCustomField runs twice. That is, of course, assuming JPA/Hibernate has no performance optimization in place…

Enter the Persistence Context

The Persistence Context acts as a cache between the application code and the database, which is why it is also known as the first-level cache.

This cache is created per transaction boundary and guarantees that, for a given identifier, it holds one and only one entity instance. This keeps entity changes consistent and makes repeatable reads (that is, reading from the cache) possible.
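
To picture the "one instance per identifier" guarantee, here is a minimal sketch in plain Java. It is a toy model and not Hibernate's implementation: the IdentityMapSketch class, the Book entity, and the in-memory map standing in for a database table are all hypothetical names made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a real persistence context does far more than this.
class IdentityMapSketch {
    // Hypothetical entity used only for this illustration.
    static final class Book {
        final long id;
        String title;

        Book(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Stand-in for a database table: id -> title.
    private final Map<Long, String> table = new HashMap<>();
    // The "first-level cache": at most one Book instance per id.
    private final Map<Long, Book> managed = new HashMap<>();
    // Counts how often the fake database was consulted.
    int databaseReads = 0;

    IdentityMapSketch() {
        table.put(1L, "Dune");
    }

    Book find(long id) {
        Book cached = managed.get(id);
        if (cached != null) {
            return cached; // repeatable read: served from the cache
        }
        databaseReads++;
        String title = table.get(id);
        if (title == null) {
            return null; // no such row
        }
        Book loaded = new Book(id, title);
        managed.put(id, loaded);
        return loaded;
    }

    public static void main(String[] args) {
        IdentityMapSketch context = new IdentityMapSketch();
        Book first = context.find(1L);
        Book second = context.find(1L);
        first.title = "Dune Messiah"; // a change made through one reference...
        System.out.println(first == second);       // true
        System.out.println(second.title);          // Dune Messiah (...is visible through the other)
        System.out.println(context.databaseReads); // 1
    }
}
```

Because both lookups return the very same object, there is never a second, stale copy of the entity floating around inside the context.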

It implements a cache strategy known as write-behind, which basically means that entity changes are first stored in the cache and only later translated into write operations, which are sent together to the database when the context is flushed.
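
The write-behind idea can be sketched in a few lines of plain Java. This is a simplified illustration under stated assumptions, not how Hibernate tracks changes internally: the WriteBehindSketch class, its change and flush methods, and the map playing the role of a table are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: changes are queued in memory and written on flush.
class WriteBehindSketch {
    // Stand-in for a database table: id -> value.
    private final Map<Long, String> table = new LinkedHashMap<>();
    // Changes recorded in the cache but not yet written to the table.
    private final Map<Long, String> pending = new LinkedHashMap<>();
    // Size of each group of writes sent to the fake database.
    final List<Integer> batchSizes = new ArrayList<>();

    // Records a change in the cache only; the table is untouched.
    void change(long id, String value) {
        pending.put(id, value);
    }

    // Translates all pending changes into writes, sent in one go.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        table.putAll(pending);
        batchSizes.add(pending.size());
        pending.clear();
    }

    String readFromDatabase(long id) {
        return table.get(id);
    }

    public static void main(String[] args) {
        WriteBehindSketch context = new WriteBehindSketch();
        context.change(1L, "draft");
        context.change(2L, "other");
        context.change(1L, "final"); // replaces the pending change for id 1
        System.out.println(context.readFromDatabase(1L)); // null: nothing written yet
        context.flush();
        System.out.println(context.readFromDatabase(1L)); // final
        System.out.println(context.batchSizes);           // [2]
    }
}
```

Note how the two changes to id 1 collapse into a single write: only the final state reaches the table.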

When using the JPA specification, the EntityManager manages the Persistence Context; when working with Hibernate directly, that role is played by the Session object.

Does this help with our initial problem?

Given that repeatable reads are a feature of the Persistence Context, we can use them to solve our problem, right? Sadly, not yet.

As explained above, we get a new Persistence Context per transaction boundary, and in our example we have two different transactions in the same flow thanks to @Transactional(propagation = Propagation.REQUIRES_NEW). So let's get rid of it:

@Component
public class SomeClass {
	private final CoolJpaRepository coolJpaRepository;

	private final AnotherClass anotherClass;

	public SomeClass(
		final CoolJpaRepository coolJpaRepository,
		final AnotherClass anotherClass
	) {
		this.coolJpaRepository = coolJpaRepository;
		this.anotherClass = anotherClass;
	}

	@Transactional
	public void doSomething() {
		// ...
		coolJpaRepository.findByCustomField();
		anotherClass.doAnotherThing();
		// ...
	}
}

@Component
public class AnotherClass {
	private final CoolJpaRepository coolJpaRepository;

	public AnotherClass(final CoolJpaRepository coolJpaRepository) {
		this.coolJpaRepository = coolJpaRepository;
	}

	@Transactional // REMOVED THE PROPAGATION
	public void doAnotherThing() {
		// ...
		coolJpaRepository.findByCustomField();
		// ...
	}
}

With the code above, doAnotherThing uses the default propagation (REQUIRED), which joins the transaction already started by doSomething. Both methods therefore share the same transaction boundary and, as a consequence, the same Persistence Context.
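
To visualize the difference between joining a transaction and starting a new one, here is a toy model in plain Java. It does not reflect how Spring implements propagation; the PropagationSketch class, its required and requiresNew methods, and the empty Context class are hypothetical stand-ins that only track which context is active.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: tracks which "persistence context" is active.
class PropagationSketch {
    // Stands in for a persistence context bound to a transaction.
    static final class Context {}

    private final Deque<Context> active = new ArrayDeque<>();

    Context current() {
        return active.peek();
    }

    // REQUIRED-like behavior: join the active context, or open one if none exists.
    void required(Runnable work) {
        if (active.isEmpty()) {
            inNewContext(work);
        } else {
            work.run();
        }
    }

    // REQUIRES_NEW-like behavior: always open a fresh context.
    void requiresNew(Runnable work) {
        inNewContext(work);
    }

    private void inNewContext(Runnable work) {
        active.push(new Context());
        try {
            work.run();
        } finally {
            active.pop(); // the context ends with its transaction
        }
    }

    // Runs an outer REQUIRED call with a nested call and reports
    // whether both saw the same context.
    static boolean innerSharesOuterContext(boolean innerRequiresNew) {
        PropagationSketch tx = new PropagationSketch();
        Context[] seen = new Context[2];
        tx.required(() -> {
            seen[0] = tx.current();
            Runnable inner = () -> seen[1] = tx.current();
            if (innerRequiresNew) {
                tx.requiresNew(inner);
            } else {
                tx.required(inner);
            }
        });
        return seen[0] == seen[1];
    }

    public static void main(String[] args) {
        System.out.println(innerSharesOuterContext(true));  // false
        System.out.println(innerSharesOuterContext(false)); // true
    }
}
```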

With these changes, we might expect the following behavior: the first findByCustomField call would populate the cache, and the second call would trigger a repeatable read instead of hitting the database. In actuality, we will still see two queries being issued to the database when running the example above.

It’s never that simple…

Custom queries and the first-level cache

When dealing with custom JPQL/HQL or native SQL queries, Hibernate does not look up the first-level cache to answer them. Instead, the query goes to the database (or to the query cache, if one is enabled alongside the second-level cache).

This explains why, in our example, findByCustomField() issues two queries even when both calls happen within the same transaction boundary. Under the hood, Spring Data JPA generates a JPQL query from the method, so the call does not benefit from the Persistence Context, even if the entity is already loaded!

The exception is when we fetch entities not through queries but through methods like EntityManager.find or Session.get. Both of these methods interact directly with the Persistence Context and look the entity up by its id. In Spring Data JPA we can use the findById method to achieve the same thing, since it’s built on top of EntityManager.find.

To make it more concrete, if we had something like:

@Component
public class SomeClass {
	private final CoolJpaRepository coolJpaRepository;

	private final AnotherClass anotherClass;

	public SomeClass(
		final CoolJpaRepository coolJpaRepository,
		final AnotherClass anotherClass
	) {
		this.coolJpaRepository = coolJpaRepository;
		this.anotherClass = anotherClass;
	}

	@Transactional
	public void doSomething() {
		// ...
		coolJpaRepository.findById(id); // CHANGED TO findById (id is assumed to be defined in the elided code above)
		anotherClass.doAnotherThing();
		// ...
	}
}

@Component
public class AnotherClass {
	private final CoolJpaRepository coolJpaRepository;

	public AnotherClass(final CoolJpaRepository coolJpaRepository) {
		this.coolJpaRepository = coolJpaRepository;
	}

	@Transactional
	public void doAnotherThing() {
		// ...
		coolJpaRepository.findById(id); // CHANGED TO findById (id is assumed to be defined in the elided code above)
		// ...
	}
}

The first findById call would send a query to the database and store the entity in the first-level cache. On the second findById call, the Persistence Context would be checked first and, since it already holds the entity we just loaded, no query would be issued to the database.
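
The contrast between the two kinds of lookup can be reproduced with a small plain-Java simulation that counts trips to a fake database. It is only a sketch of the behavior described in this article, not Hibernate code: the QueryVersusFindSketch class, the Item entity, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: counts how many lookups reach the fake database.
class QueryVersusFindSketch {
    // Hypothetical entity used only for this illustration.
    static final class Item {
        final long id;
        final String customField;

        Item(long id, String customField) {
            this.id = id;
            this.customField = customField;
        }
    }

    // Stand-in for a database table: id -> customField.
    private final Map<Long, String> table = new HashMap<>();
    // The first-level cache: id -> managed instance.
    private final Map<Long, Item> managed = new HashMap<>();
    int databaseQueries = 0;

    QueryVersusFindSketch() {
        table.put(1L, "cool");
        table.put(2L, "boring");
    }

    // Lookup by id: the cache is checked before the database.
    Item findById(long id) {
        Item cached = managed.get(id);
        if (cached != null) {
            return cached;
        }
        databaseQueries++;
        String value = table.get(id);
        if (value == null) {
            return null;
        }
        Item loaded = new Item(id, value);
        managed.put(id, loaded);
        return loaded;
    }

    // Custom query: always goes to the database, because the cache
    // cannot tell which rows match without asking.
    List<Item> findByCustomField(String value) {
        databaseQueries++;
        List<Item> results = new ArrayList<>();
        for (Map.Entry<Long, String> row : table.entrySet()) {
            if (row.getValue().equals(value)) {
                // Entities that are already managed are kept as-is.
                results.add(managed.computeIfAbsent(row.getKey(), key -> new Item(key, value)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        QueryVersusFindSketch byQuery = new QueryVersusFindSketch();
        byQuery.findByCustomField("cool");
        byQuery.findByCustomField("cool");
        System.out.println(byQuery.databaseQueries); // 2

        QueryVersusFindSketch byId = new QueryVersusFindSketch();
        byId.findById(1L);
        byId.findById(1L);
        System.out.println(byId.databaseQueries); // 1
    }
}
```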

Conclusion

As we can see, the Persistence Context can reduce read times by avoiding queries, but only in the specific cases where we look entities up directly in the first-level cache.

If we need custom queries, the already loaded entities do not spare us the query, and we need another cache layer (Hibernate's second-level cache with its query cache enabled) to avoid requesting data from the database.

So why do we have repeatable reads? In the next article I intend to discuss this question, how entities are actually managed in the first-level cache, and the possible risks of repeatable reads.

Stay tuned 👀.
