Applied AI for Enterprise Java Development
How to Successfully Leverage Generative AI,
Large Language Models, and Machine Learning
in the Java Enterprise
Editors: Melissa Potter and Brian Guerin Cover Designer: Karen Montgomery
Production Editor: Katherine Tozer Illustrator: Kate Dullea
Interior Designer: David Futato
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Applied AI for Enterprise Java Development, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors and do not represent the publisher’s views.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use
of or reliance on this work. Use of the information and instructions contained in this work is at your
own risk. If any code samples or other technology this work contains or describes is subject to open
source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
This work is part of a collaboration between O’Reilly and Red Hat. See our statement of editorial
independence.
978-1-098-17444-6
[FILL IN]
Table of Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
2. Inference API. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
What is an Inference API? 28
Examples of Inference APIs 29
Deploying Inference Models in Java 33
Inferencing Models with DJL 34
Under the Hood 42
Inferencing Models with gRPC 43
Next Steps 49
3. Accessing the Inference Model with Java. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Connecting to an Inference API with Quarkus 51
Architecture 52
The Fraud Inference API 53
Creating the Quarkus project 53
REST Client interface 53
REST Resource 54
Testing the example 55
Connecting to an inference API with Spring Boot WebClient 56
Adding WebClient Dependency 56
Using the WebClient 56
Connecting to the Inference API with Quarkus gRPC client 57
Adding gRPC Dependencies 58
Implementing the gRPC Client 58
Going Beyond 61
Brief Table of Contents (Not Yet Final)
Preface
• Java developers looking to expand their skill set into AI and machine learning
• IT professionals seeking to understand the practical implementation of the business value that AI promises to deliver
As the title already implies, we intend to keep this book practical and development-centric. This book isn’t a perfect fit for, but will still benefit:
Chapter 3: Models: Serving, Inference, and Architectures - Architectural Concepts for
AI-Infused Applications
Now that we have the basics in place, we move into the architectural aspects of
AI applications. This chapter walks you through best practices for integrating
AI into existing systems, focusing on modern enterprise architectures like APIs,
microservices, and cloud-native applications. We’ll start with a simple scenario
and build out more complex solutions, adding one conceptual building block at a
time.
Chapter 4: Public Models - Exploring AI Models and Model Serving Infrastructure
This chapter talks about the most prominent AI models and their unique specialties. We help you understand the available models and you’ll learn how to choose
the right model for your use case. We also cover model serving infrastructure—
how to deploy, scale, and manage AI models in both cloud and local environments. This chapter equips you with the knowledge to serve models efficiently in
production.
Chapter 5: Inference API - Inference and Querying AI Models with Java
We take a closer look at the process of “querying” AI models, often referred to
as inference or asking a model to make a prediction. We introduce the standard
APIs that allow you to perform inference and walk through practical Java examples that show how to integrate AI models seamlessly into your applications. By
the end of this chapter, you’ll be proficient in writing Java code that interacts with
AI models to deliver real-time results.
Chapter 6: Accessing the Inference Model with Java - Building a Full Quarkus-Based AI
Application
This hands-on chapter walks you through the creation of a full AI-infused application using Quarkus, a lightweight Java framework. You’ll learn how to integrate
a trained model into your application using both REST and gRPC protocols and
explore testing strategies to ensure your AI components work as expected. By the
end, you’ll have your first functional AI-powered Java application.
Chapter 7: Introduction to LangChain4J
LangChain4J is a powerful library that simplifies the integration of large language
models (LLMs) into Java applications. In this chapter, we introduce the core
concepts of LangChain4J and explain its key abstractions.
Chapter 8: Image Processing - Stream-Based Processing for Images and Video
This chapter takes you through stream-based data processing, where you’ll learn
to work with complex data types like images and videos. We’ll walk you through
image manipulation algorithms and cover video processing techniques, including
optical character recognition (OCR).
Chapter 9: Enterprise Use Cases
This chapter discusses real-life examples that we have seen and how they make use of either generative or predictive AI. It is a selection of experiences you can use to extend your problem-solving toolbox with the help of AI.
Chapter 10: Architecture AI Patterns
In this final chapter, we shift focus from foundational concepts and basic implementations to the patterns and best practices you’ll need for building AI applications that are robust, efficient, and production-ready. While the previous chapters provided clear, easy-to-follow examples, real-world AI deployments often require more sophisticated approaches to address the unique challenges that arise at scale, a selection of which you will encounter in this chapter.
We are assuming that you’ll run the examples from this book on your laptop and that you already have a solid understanding of Java. The models we are going to work with are publicly accessible, and we will help you download, install, and access them when we get to the later chapters. If you have a GPU at hand, perfect. But it won’t be necessary for this book. Just make sure you have a reasonable amount of disk space available on your machine.
CHAPTER 1
The Enterprise AI Conundrum
Artificial Intelligence (AI) has rapidly become an essential part of modern enterprise
systems. We witness how it is reshaping industries and transforming the way businesses operate. And this includes the way developers work with code. However, understanding the landscape of AI and its various classifications can be overwhelming, especially when trying to identify how it fits into the enterprise Java ecosystem
and existing applications. In this chapter, we aim to provide you with a foundation by
introducing the core concepts, methodologies, and terminologies that are critical to
building AI-infused applications.
While the focus of this chapter is on setting the stage, it is not just about abstract
definitions or acronyms. The upcoming sections will cover:
A Technical Perspective All the Way to Generative AI
While large language models (LLMs) are getting most of the attention today, the field of artificial intelligence has a much longer history. Understanding how AI has developed over time is important when deciding how to use it in your projects. AI is not just about the latest trends—it’s about recognizing which technologies are reliable and ready for real-world applications. By learning about AI’s background and how different approaches have evolved, you will be able to separate what is just hype from what is actually useful in your daily work. This will help you make smarter decisions when it comes to choosing AI solutions for your enterprise projects.
Open-Source Models and Training Data
AI is only as good as the data it learns from. High-quality, relevant, and well-organized data is crucial to building AI systems that produce accurate and reliable results. In this chapter, you’ll learn why using open-source models and data is a great advantage for your AI projects. The open-source community shares tools and resources that help everyone, including smaller companies, access the latest advancements in AI.
Ethical and Sustainability Considerations
As AI becomes more common in business, it’s important to think about the ethical and environmental impacts of using these technologies. Building AI systems that respect privacy, avoid bias, and are transparent in how they make decisions is becoming more and more important. And training large models requires significant computing power, which has an environmental impact. We’ll introduce some of the key ethical principles you should keep in mind when building AI systems, along with the importance of designing AI in ways that are environmentally friendly.
The Lifecycle of LLMs and Ways to Influence Their Behavior
If you’ve used AI chatbots or other tools that respond to your questions, you’ve interacted with large language models (LLMs). But these models don’t just work by magic—they follow a lifecycle, from training to fine-tuning for specific tasks. In this chapter, we’ll explain how LLMs are created and how you can influence their behavior. You’ll learn the very basics about prompt tuning, prompt engineering, and alignment tuning, which are ways to guide a model’s responses. By understanding how these models work, you’ll be able to select the right technique for your projects.
DevOps vs. MLOps
As AI becomes part of everyday software development, it’s important to understand how traditional DevOps practices interact with machine learning operations (MLOps). DevOps focuses on the efficient development and deployment of software, while MLOps applies similar principles to the development and deployment of AI models. These two areas are increasingly connected, and development teams need to understand how they complement each other. We’ll briefly outline the key similarities and differences between DevOps and MLOps, and show how both are necessary and interconnected to successfully deliver AI-powered applications.
Fundamental Terms
AI comes with a lot of technical terms and abbreviations, and it can be easy to get lost in all the jargon. Throughout this book, we will introduce you to important AI terms in simple, clear language. From LLMs to MLOps, we’ll explain everything in a way that’s easy to understand and relevant to your projects. You’ll also find a glossary at the end of the book that you can refer to whenever you need a refresher.
Figure 1-1. What is Gen AI and how is it positioned within the AI Stack.
What initially sounds like individual disciplines can be summarized under the general term Artificial Intelligence (AI). And AI itself is a multidisciplinary field within computer science that boldly strives to create systems capable of emulating and surpassing human-level intelligence. While traditional AI can be looked at as a mostly rule-based system, the next evolutionary step is ML, which we’ll dig into next.
You’ve already heard about training data, so it should come as no surprise that at the heart of the lifecycle lies something called the training phase. This is where LLMs are fed unbelievable amounts of data to learn from and adapt to. Once an LLM has been trained, it is somewhat of a general-purpose model. Usually, those models are also referred to as foundation models. In particular, if we look at very large models like Llama 3, for example, their execution requires huge amounts of resources, and they are generally exceptionally good at general-purpose tasks. The next phase a model usually goes through is known as tuning. This is where we adjust the model’s parameters to optimize its performance on specific tasks or datasets. Through the process of hyperparameter tuning, model architects can fine-tune models for greater accuracy, efficiency, and scalability. This is generally called "hyperparameter optimization" and includes techniques like grid search, random search, and Bayesian methods. We do not dive deeper into this in this book, as both training and traditional fine-tuning are more a data scientist’s realm. You can learn more about this in Natural Language Processing with Transformers, Revised Edition. However, we do cover two very spe‐
MLOps vs DevOps
Two important terms have been coined during the last few years to describe the way modern software development and production work. The first is DevOps, a term coined in 2009 by Patrick Debois to refer to "development" and "operations". The second is Machine Learning Operations, or MLOps, initially used by David Aronchick in 2017. MLOps is a derived term and basically describes the application of DevOps principles to the machine learning field. The most obvious difference is the central artifact they are grouped around: the DevOps team is focused on business applications, while the MLOps team is focused on machine learning models. Both describe the process of developing an artifact and making it ready for consumption in production.
DevOps and MLOps share many similarities, as both are focused on streamlining and
automating workflows to ensure continuous integration (CI), continuous delivery
(CD), and reliable deployment in production environments. Figure 1-3 describes one
possible combination of DevOps and MLOps.
Conclusion
In conclusion, the development and deployment of Large Language Models (LLMs) require a solid understanding of the training, tuning, and inference processes involved. As the field of MLOps continues to evolve, it is essential to recognize the key differences between DevOps and MLOps, with the latter focusing on the specific needs of machine learning model development and deployment. By acknowledging the intersecting approaches required for cloud-native application and model development, teams can effectively collaborate across disciplines and bring AI-infused applications successfully into production.
Chapter Two will introduce you to various classifications of LLMs and unveil more of their inner workings. We’ll provide an overview of the most common taxonomies
used to describe these models. We will also dive into the mechanics of tuning these
models, breaking down the differences between alignment tuning, prompt tuning,
and prompt engineering.
CHAPTER 2
Inference API
You’ve already expanded your knowledge about AI and the many types of models. Moreover, you deployed these models locally (if possible) and tested them with some queries. But when it is time to use models, you need to expose them properly, follow your organization’s best practices, and provide developers with an easy way to consume the model.
An Inference API helps solve these problems, making models accessible to all devel‐
opers.
This chapter will explore how to expose an AI/ML model using an Inference API in
Java.
What is an Inference API?
An Inference API allows developers to send data (in any protocol, such as HTTP,
gRPC, Kafka, etc.) to a server with a machine learning model deployed and receive
the predictions or classifications as a result.
Practically, every time you access cloud models like OpenAI or Gemini or models
deployed locally using ollama, you do so through their Inference API.
Even though it is common these days to use big models trained by big corporations
like Google, IBM, or Meta, mostly for LLM purposes, you might need to use small
custom-trained models to solve one specific problem for your business.
Usually, these models are developed by your organization’s data scientists, and you must develop some code to run inference against them.
Let’s take a look at the following example:
Suppose you are working for a bank, and data scientists have trained a custom model
to detect whether a credit card transaction can be considered fraud.
The model is in ONNX format with five input parameters and one output parameter of type float.
As input parameters:
distance_from_last_transaction
The distance from the last transaction that happened. For example,
0.3111400080477545.
ratio_to_median_price
The ratio of the transaction’s purchase price to the median purchase price. For example, 1.9459399775518593.
used_chip
Was the transaction made through the chip. 1.0 if true, 0.0 if false.
used_pin_number
Was the transaction made using a PIN number. 1.0 if true, 0.0 if false.
online_order
Is the transaction an online order. 1.0 if true, 0.0 if false.
And the output parameter:
prediction
The probability the transaction is fraudulent. For example, 0.9625362.
A few things you might notice here are that the inputs are plain floats, that booleans are encoded as 1.0 or 0.0, and that callers need to know the exact order of the parameters. This is a typical use case for creating an Inference API for the model to add an abstraction layer that makes consuming the model easier.
Figure 2-1 shows the transformation between a JSON document and the model parameters done by the Inference API. Exposing the model behind such an API has several benefits:
• The models are easily scalable. The model has a standard API, and because of the stateless nature of models, you can scale it up and down like any other application in your portfolio.
• The models are easy to integrate with any service as they offer a well-known API
(REST, Kafka, gRPC, …)
• It offers an abstraction layer to add features like security, monitoring, logging, …
Now that we understand why having an Inference API for exposing a model is important, let’s explore some examples of Inference APIs.
OpenAI
OpenAI offers different Inference APIs, such as chat completions, embeddings, image generation, image manipulation, or fine-tuning.
To interact with those models, you create an HTTP request. In the case of chat completions, two fields are mandatory: the model to use and the messages to send to complete.
An example of body content sending a simple question is shown in the following
snippet:
{
"model": "gpt-4o",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the Capital of Japan?"
}
],
"temperature": 0.2
}
In this request, the model field selects the model to use, and the system role allows you to specify the way the model answers questions.
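As a concrete illustration, here is a minimal sketch of sending that request from plain Java with the JDK’s built-in HttpClient. The https://api.openai.com/v1/chat/completions URL is the public chat completions endpoint; reading the key from an OPENAI_API_KEY environment variable is an assumption of this sketch:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatCompletionExample {
    public static void main(String[] args) throws Exception {
        // The same JSON body shown above
        String body = """
            {
              "model": "gpt-4o",
              "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the Capital of Japan?"}
              ],
              "temperature": 0.2
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The model's answer is in choices[0].message.content of the returned JSON
        System.out.println(response.body());
    }
}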
For the embeddings API, the request body contains the string to vectorize and the model to use. The response contains an array of floats in the data field containing the vector data:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        0.0023064255,
        -0.009327292,
        .... (1536 floats total for ada-002)
        -0.0028842222
      ]
    }
  ]
}
Ollama
Ollama provides an Inference API to access LLMs that are running in Ollama. Ollama has taken a significant step forward by making itself compatible with the OpenAI Chat Completions API, making it possible to use more tooling and applications with Ollama. This effectively means interacting with models running in Ollama for chat completions can be done either with the OpenAI API or with the Ollama API.
The native Ollama API uses the POST HTTP method, and the body content of the request is a JSON document requiring two fields, model and prompt:
{
"model": "llama3",
"prompt": "Why is the sky blue?",
"stream": false
}
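As a minimal sketch, the same request can be sent from Java with the JDK HttpClient as well; http://localhost:11434 is Ollama’s default local address and is an assumption of this example:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaGenerateExample {
    public static void main(String[] args) throws Exception {
        String body = """
            {"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // With "stream": false, the full answer is returned in the "response" field
        System.out.println(response.body());
    }
}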
Figure 2-2 shows an overview of the DJL architecture. The bottom layer shows the integration between DJL and the CPU/GPU, the middle layer contains the native libraries that run the models, and these layers are controlled using plain Java.
Even though DJL provides a layer of abstraction, you still need a basic understanding of common machine learning concepts.
First, generate a simple Spring Boot application named fraud-detection with the Spring Web dependency. You can use Spring Initializr (https://start.spring.io/) to scaffold the project or start from scratch.
Figure 2-4 shows the Spring Initializr parameters for this example:
With the basic layout of the project, let’s work through the details, starting with
adding the DJL dependencies.
Dependencies
DJL offers multiple dependencies depending on the AI/ML framework used. The DJL project provides a Bill of Materials (BOM) dependency to manage the versions of the project’s dependencies, offering a centralized location to define and update these versions.
Since the model is in ONNX format, add the following dependency containing the ONNX Runtime engine, onnxruntime-engine:
<dependency>
<groupId>ai.djl.onnxruntime</groupId>
<artifactId>onnxruntime-engine</artifactId>
</dependency>
POJOs
The request is a simple Java record with all the transaction details.
public record TransactionDetails(String txId,
float distanceFromLastTransaction,
float ratioToMedianPrice, boolean usedChip,
boolean usedPinNumber, boolean onlineOrder) {}
The response is also a Java record, returning a boolean that indicates whether the transaction is fraudulent.
public record FraudResponse(String txId, boolean fraud) {
}
The next step is configuring and loading the model into memory.
Then, create two methods, one instantiating a Criteria and the other one a ZooModel.
The first method creates a Criteria object with the following parameters:
• The location of the model file, in this case, the model is stored at classpath.
• The data type that developers send to the model, for this example, the Java record
created previously with all the transaction information.
• The data type returned by the model, a boolean indicating whether the given
transaction is fraudulent.
• The transformer to adapt the data types from Java code (TransactionDetails,
Boolean) to the model parameters (ai.djl.ndarray.NDList).
• The engine of the model.
@Bean
public Criteria<TransactionDetails, Boolean> criteria() {
return Criteria.builder()
.setTypes(TransactionDetails.class, Boolean.class)
.optModelUrls(modelLocation)
.optTranslator(new TransactionTransformer(THRESHOLD))
.optEngine("OnnxRuntime")
.build();
}
The engine used to run the model, OnnxRuntime in this case. Setting it explicitly is especially useful when more than one engine is present in the classpath.
The second method creates the ZooModel instance from the Criteria object created
in the previous method:
@Bean
public ZooModel<TransactionDetails, Boolean> model(
@Qualifier("criteria") Criteria<TransactionDetails, Boolean> criteria)
throws Exception {
return criteria.loadModel();
}
Transformer
The transformer is a class implementing ai.djl.translate.NoBatchifyTranslator to adapt the model’s input and output parameters to Java business classes.
The model input and output classes are of type ai.djl.ndarray.NDList, which represents a list of arrays of floats.
For the fraud model, the input is an array in which the first position is the distanceFromLastTransaction parameter value, the second position is the value of ratioToMedianPrice, and so on. The output is an array with a single position containing the probability of fraud.
The transformer is responsible for encapsulating this knowledge and adapting the data accordingly.
Let’s implement one transformer for this use case:
@Override
public NDList processInput(TranslatorContext ctx, TransactionDetails input)
        throws Exception {
    NDArray array = ctx.getNDManager().create(toFloatRepresentation(input),
            new Shape(1, 5));
    return new NDList(array);
}

@Override
public Boolean processOutput(TranslatorContext ctx, NDList list)
        throws Exception {
    NDArray result = list.getFirst();
    float prediction = result.toFloatArray()[0];
    System.out.println("Prediction: " + prediction);
    // Compare against the threshold passed to the transformer's constructor
    return prediction > threshold;
}
Predict
The model is accessed through ai.djl.inference.Predictor, the main class that orchestrates the inference process.
The predictor is not thread-safe, so performing predictions in parallel requires one instance per thread. There are multiple ways to handle this problem. One option is creating a Predictor instance per request. Another option is to create a pool of Predictor instances that threads can borrow from (a minimal sketch of such a pool follows the Supplier example below).
Moreover, it is very important to close the predictor when it is no longer required in order to free memory.
Our advice here is to measure the performance of creating the Predictor instance per request and then decide whether it is acceptable to use the first or the second option.
To implement the per-request strategy in Spring Boot, return a java.util.function.Supplier instance, so you have control over when the object is created and closed.
@Bean
public Supplier<Predictor<TransactionDetails, Boolean>>
predictorProvider(ZooModel<TransactionDetails, Boolean> model) {
return model::newPredictor;
}
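If you go for the pool option instead, the sketch below shows one way to share a fixed number of predictors through a blocking queue. The PredictorPool class and its size handling are our own illustration, not a DJL API:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import ai.djl.inference.Predictor;
import ai.djl.repository.zoo.ZooModel;

public class PredictorPool implements AutoCloseable {

    private final BlockingQueue<Predictor<TransactionDetails, Boolean>> pool;

    public PredictorPool(ZooModel<TransactionDetails, Boolean> model, int size) {
        this.pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(model.newPredictor());
        }
    }

    // Borrow a predictor, run the prediction, and always return it to the pool
    public Boolean predict(TransactionDetails details) throws Exception {
        Predictor<TransactionDetails, Boolean> predictor = pool.take();
        try {
            return predictor.predict(details);
        } finally {
            pool.put(predictor);
        }
    }

    @Override
    public void close() {
        // Release the native resources held by each predictor
        pool.forEach(Predictor::close);
    }
}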
REST Controller
To create a REST API in Spring Boot, annotate a class with @org.springframework.web.bind.annotation.RestController.
Moreover, since the request to detect fraud should go through the POST HTTP method, annotate the method with the @org.springframework.web.bind.annotation.PostMapping annotation.
@Resource
private Supplier<Predictor<TransactionDetails, Boolean>> predictorSupplier;
@PostMapping("/inference")
FraudResponse detectFraud(@RequestBody TransactionDetails transactionDetails)
throws TranslateException {
try (var p = predictorSupplier.get()) {
boolean fraud = p.predict(transactionDetails);
return new FraudResponse(transactionDetails.txId(), fraud);
}
}
}
One of the best features of the DJL framework is its flexibility in not requiring a spe‐
cific protocol for model inferencing. You can opt for the Kafka protocol if you have
an event-driven system or the gRPC protocol for high-performance communication.
Let’s see how the current example changes when using gRPC.
Protocol Buffers
The initial step in using protocol buffers is to define the structure for the data you
want to serialize, along with the services, specifying the RPC method parameters and
return types as protocol buffer messages. This information is defined in a .proto file
used as the Interface Definition Language (IDL).
Let’s implement the gRPC Server in the Spring Boot project.
Create a fraud.proto file in src/main/proto with the following content expressing
the Fraud Detection contract.
syntax = "proto3";
package fraud;
service FraudDetection {
rpc Predict (TxDetails) returns (FraudRes) {}
}
message TxDetails {
string tx_id = 1;
float distance_from_last_transaction = 2;
float ratio_to_median_price = 3;
bool used_chip = 4;
bool used_pin_number = 5;
bool online_order = 6;
}
message FraudRes {
string tx_id = 1;
bool fraud = 2;
}
Let’s create the gRPC server by reusing the Spring Boot project, this time implementing the Inference API for the Fraud Detection model using gRPC and Protocol Buffers. Add the following dependencies to the project:
<dependency>
<groupId>net.devh</groupId>
<artifactId>grpc-server-spring-boot-starter</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
<dependency>
<groupId>javax.annotation</groupId>
<artifactId>javax.annotation-api</artifactId>
<version>1.3.2</version>
<scope>provided</scope>
<optional>true</optional>
</dependency>
<build>
...
<extensions>
<extension>
<groupId>kr.motd.maven</groupId>
<artifactId>os-maven-plugin</artifactId>
<version>1.7.1</version>
</extension>
</extensions>
...
<plugins>
<plugin>
<groupId>org.xolstice.maven.plugins</groupId>
<artifactId>protobuf-maven-plugin</artifactId>
<version>0.6.1</version>
<configuration>
<protocArtifact>
com.google.protobuf:protoc:3.25.1:exe:${os.detected.classifier}
</protocArtifact>
<pluginId>grpc-java</pluginId>
<pluginArtifact>
<!-- grpc-java codegen plugin; the version should match a current grpc-java release -->
io.grpc:protoc-gen-grpc-java:1.61.0:exe:${os.detected.classifier}
</pluginArtifact>
</configuration>
<executions>
<execution>
<id>protobuf-compile</id>
<goals>
<goal>compile</goal>
<!-- the *-custom goals invoke the grpc-java plugin configured above -->
<goal>compile-custom</goal>
<goal>test-compile</goal>
<goal>test-compile-custom</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
Finally, implement the rpc method defined in the fraud.proto file under the FraudDetection service. This method is the remote method invoked when the gRPC client makes the request to the Inference API.
Because of the streaming nature of gRPC, the response is sent using a reactive call through the io.grpc.stub.StreamObserver interface.
@Override
public void predict(TxDetails request,
        StreamObserver<FraudRes> responseObserver) {
    // FraudRes is the response message type generated from fraud.proto
    org.acme.TransactionDetails td =
        new org.acme.TransactionDetails(
            request.getTxId(),
            request.getDistanceFromLastTransaction(),
            request.getRatioToMedianPrice(),
            request.getUsedChip(),
            request.getUsedPinNumber(),
            request.getOnlineOrder()
        );
    // Run the inference with a fresh Predictor, as in the REST controller
    try (var p = predictorSupplier.get()) {
        boolean fraud = p.predict(td);
        FraudRes fraudResponse = FraudRes.newBuilder()
                .setTxId(request.getTxId())
                .setFraud(fraud)
                .build();
        responseObserver.onNext(fraudResponse);
        responseObserver.onCompleted();
    } catch (TranslateException e) {
        throw new RuntimeException(e);
    }
}
Next Steps
You’ve completed the first step in inferring models in Java. DJL has more advanced features, such as training models, automatic download of popular models (ResNet, YOLO, …), image manipulation utilities, and transformers.
This chapter’s example was simple, but depending on the model, things might be
more complicated, especially when images are involved.
In later chapters, we’ll explore more complex examples of inferencing models using
DJL and show you other useful enterprise use cases and models.
In the next chapter, you’ll learn how to consume the Inference APIs defined in this
chapter before diving deep into DJL.
CHAPTER 3
Accessing the Inference Model with Java
In the previous chapter, you learned how to develop and expose a model that produces data using an Inference API. That chapter covered only half of the development: you learned how to expose the model, but how about consuming this model from another service? Now it is time to cover the other half, which involves writing the code to consume the API.
In this chapter, we’ll complete the previous example: you’ll create Java clients to consume the Fraud Inference APIs to detect whether a given transaction can be considered fraudulent.
We’ll show you how to write clients for Spring Boot and Quarkus using both REST and gRPC.
• The Jakarta REST Client is the standard Jakarta EE approach for interacting with
RESTful services.
• The MicroProfile REST Client provides a type-safe approach to invoke RESTful
services over HTTP using as much of the Jakarta RESTful Web Services spec
as possible. The REST client is defined as a Java interface, making it type-safe
and providing the network configuration using Jakarta RESTful Web Services
annotations.
In this section, you’ll develop a Quarkus service consuming the Fraud Detection
model using the MicroProfile REST Client.
Architecture
Let’s create a Quarkus service sending requests to the Fraud Service Inference API
developed in the previous chapter.
This service contains a list of all transactions done and exposes an endpoint to
validate whether a given transaction ID can be considered fraudulent.
Figure 3-1 shows the architecture of what you’ll be implementing in this chapter. The Quarkus service receives an incoming request to validate whether a given transaction is fraudulent. The service gets the transaction information from the database and sends the data to the fraud detection service to validate whether the transaction is fraudulent. Finally, the result is stored in the database and returned to the caller.
@Path("/inference")
@RegisterRestClient(configKey = "fraud-model")
public interface FraudDetectionService {
@POST
FraudResponse isFraud(TransactionDetails transactionDetails);
}
REST Resource
The next step is to create the REST endpoint, which will call the REST client created earlier. The endpoint is set up to handle requests using the GET HTTP method, implemented with the @jakarta.ws.rs.GET annotation. The transaction ID is passed as a path parameter using the @jakarta.ws.rs.PathParam annotation.
To use the REST client, inject the interface using the @org.eclipse.microprofile.rest.client.inject.RestClient annotation.
Create a class called TransactionResource with the following content:
// ....
@RestClient
FraudDetectionService fraudDetectionService;

@GET
@Path("/{txId}")
public FraudResponse detectFraud(@PathParam("txId") String txId) {
    // Illustrative lookup of the transaction details (e.g., from the database);
    // findTransaction is not part of the original listing
    TransactionDetails transactionDetails = findTransaction(txId);
    return fraudDetectionService.isFraud(transactionDetails);
}
// ....
The REST client interface is injected with the @RestClient annotation.
With both services running, send the following request to the TransactionResource
endpoint:
curl localhost:8000/fraud/1234
{"txId":"1234","fraud":false}
You consumed an Inference API using Quarkus; in the next section, we’ll implement the same consumer using Spring Boot.
public TransactionController() {
webClient = WebClient.create("http://localhost:8080");
}
return fraudResponseResponseEntity.getBody();
}
The body content of the request is the transaction details.
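A minimal sketch of how the complete method inside TransactionController could look, assuming the same /inference endpoint and POJOs as in the Quarkus example; the request mapping path and the findTransaction helper are illustrative and not part of the original listing:

@GetMapping("/fraud/{txId}")
public FraudResponse detectFraud(@PathVariable String txId) {
    // Illustrative lookup of the transaction details (e.g., from the database)
    TransactionDetails transactionDetails = findTransaction(txId);

    ResponseEntity<FraudResponse> fraudResponseResponseEntity = webClient.post()
            .uri("/inference")
            .bodyValue(transactionDetails)   // the transaction details are the body content
            .retrieve()
            .toEntity(FraudResponse.class)
            .block();

    return fraudResponseResponseEntity.getBody();
}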
With the dependency registered, let’s implement the code to make gRPC calls.
syntax = "proto3";
package fraud;
service FraudDetection {
rpc Predict (TxDetails) returns (FraudRes) {}
}
message TxDetails {
string tx_id = 1;
float distance_from_last_transaction = 2;
float ratio_to_median_price = 3;
bool used_chip = 4;
bool used_pin_number = 5;
bool online_order = 6;
}
message FraudRes {
string tx_id = 1;
bool fraud = 2;
}
With this setup, you can place the protobuf file in the src/main/proto directory. The
quarkus-maven-plugin (already present in any Quarkus project) will then generate
Java files from the proto files.
Under the hood, the quarkus-maven-plugin fetches a compatible version of protoc
from Maven repositories based on your OS and CPU architecture.
At this point, every time you compile the project through Maven, the quarkus-
maven-plugin generates the required gRPC classes from the .proto file. These classes
are generated in the target/generated-sources/grpc directory, automatically added
to the classpath, and packaged in the final JAR file.
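With the classes generated, a client can inject the generated blocking stub and call the remote Predict method. The following is a minimal sketch; the client name "fraud" must match your quarkus.grpc.clients configuration, and the sample values are illustrative:

@Path("/fraud")
public class GrpcTransactionResource {

    // Injects the blocking stub generated from fraud.proto;
    // "fraud" refers to a client configured under quarkus.grpc.clients.fraud.*
    @GrpcClient("fraud")
    FraudDetectionGrpc.FraudDetectionBlockingStub fraudClient;

    @GET
    @Path("/{txId}")
    public boolean detectFraud(@PathParam("txId") String txId) {
        TxDetails txDetails = TxDetails.newBuilder()
                .setTxId(txId)
                .setDistanceFromLastTransaction(0.31f)   // sample values for illustration
                .setRatioToMedianPrice(1.94f)
                .setUsedChip(true)
                .setUsedPinNumber(true)
                .setOnlineOrder(false)
                .build();

        FraudRes res = fraudClient.predict(txDetails);
        return res.getFraud();
    }
}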
• Models are stateless, but in some scenarios, it’s crucial to know what was asked
before in order to generate a correct answer. Generic clients do not have memory
features.
• Using RAG is not directly supported by clients.
• There is no agent support.
• You need to implement the specific Inference API for each model you use.
For these reasons, there are some projects in the Java ecosystem to address these
limitations. The most popular one is LangChain4J.
In the next chapter, we’ll introduce you to the LangChain4J project and discuss how to use it when interacting with LLMs.
About the Authors
Markus Eisele is a technical marketing manager in the Red Hat Application Developer Business Unit. He has been working with Java EE servers from different vendors
for more than 14 years and gives presentations on his favorite topics at international
Java conferences. He is a Java Champion, former Java EE Expert Group member,
and founder of Germany’s number-one Java conference, JavaLand. He is excited to
educate developers about how microservices architectures can integrate and complement existing platforms, as well as how to successfully build resilient applications
with Java and containers. He is also the author of Modern Java EE Design Patterns and
Developing Reactive Microservices (O’Reilly). You can follow more frequent updates
on Twitter and connect with him on LinkedIn.
Alex Soto Bueno is a director of developer experience at Red Hat. He is passionate about the Java world and software automation, and he believes in the open source software model. Alex is the coauthor of Testing Java Microservices (Manning), Quarkus Cookbook (O’Reilly), and the forthcoming Kubernetes Secrets Management (Manning), and is a contributor to several open source projects. A Java Champion since 2017, he is also an international speaker and a teacher at Salle URL University. You can follow more frequent updates on his Twitter feed and connect with him on LinkedIn.
Natale Vinto is a software engineer with more than 10 years of expertise in IT and ICT technologies and a consolidated background in telecommunications and Linux operating systems. As a solution architect with a Java development background, he
spent some years as EMEA specialist solution architect for OpenShift at Red Hat.
Today, Natale is a developer advocate for OpenShift at Red Hat, helping people within
communities and customers have success with their Kubernetes and cloud native
strategy. You can follow more frequent updates on Twitter and connect with him on
LinkedIn.