Serverless Best Practices - Paul Johnston - Medium

The document discusses serverless best practices for building applications that can scale effectively. Some of the key best practices mentioned include: having each function do one thing only; avoiding functions calling other functions directly and instead using queues; using as few libraries as possible to avoid slow cold starts; avoiding connection-based services like RDBMS and instead using serverless-native services; having one function per route; and designing applications where data flows through functions rather than being stored in data lakes. The document emphasizes that applications need to be designed with scale in mind from the beginning.

Serverless Best Practices

Paul Johnston

Aug 21, 2018 · 7 min read

Within the community we’ve been debating the best practices for many
years, but there are a few that have been relatively accepted for most of
that time.

Most serverless practitioners who subscribe to these practices work at
scale. The promise of serverless plays out mostly at both high scale and
bursty workloads rather than at a relatively low level, so a lot of these
best practices come from the scale angle, e.g. Nordstrom in retail and
iRobot in IoT. If you’re not aiming to scale that far, then you can
probably get away without following these best practices anyway.

And remember that best practices are not “the only practices”. Best
practices rely on a set of underlying assumptions. If those assumptions
don’t fit your use case, then those best practices may not fit.

My main assumption is that everybody is building their application
to be able to run at scale (even if it never ends up being run at scale).

So these are my best practices as I see them.

Each function should do only one thing

It’s about function error and scaling isolation.

Putting it another way, if you use a switch statement in your function,
you’re probably doing it wrong.

A lot of tutorials and frameworks work on the basis of a big monolithic
function behind a single proxy route and use switch statements. I
dislike this pattern. It doesn’t scale well, and tends to make large and
complex functions.

The problem with one (or a few) functions running your entire app is
that when you scale, you end up scaling your entire application rather
than scaling the specific element.

If you have one part of your web application that gets 1 million calls,
and another that gets 1 thousand calls, you have to optimise your
function for the million, whilst including all the code for the thousand.
That’s a waste, and you can’t easily optimise for the thousand. Separate
them out. There’s so much value in that.
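
To make the contrast concrete, here is a minimal Python sketch, assuming the AWS Lambda convention of a handler(event, context) entry point. The first function is the monolith pattern described above (Python has no switch statement, so an if/elif chain plays that role); the two after it each do one thing and could be deployed, scaled and debugged separately. The routes, function names and response data are hypothetical.

```python
import json

# Anti-pattern: one function behind a proxy route, branching on the path.
def monolith_handler(event, context):
    path = event.get("path")
    if path == "/products":      # imagine this route gets 1 million calls
        return {"statusCode": 200, "body": json.dumps(["widget", "gadget"])}
    elif path == "/reports":     # ...and this one gets 1 thousand
        return {"statusCode": 200, "body": json.dumps({"total": 0})}
    return {"statusCode": 404, "body": json.dumps({"error": "not found"})}

# Preferred: one function per job. Each is deployed on its own, so the busy
# one can be tuned for its load without carrying the other's code.
def list_products_handler(event, context):
    return {"statusCode": 200, "body": json.dumps(["widget", "gadget"])}

def monthly_report_handler(event, context):
    return {"statusCode": 200, "body": json.dumps({"total": 0})}
```

The split functions return exactly what the matching branches of the monolith did; the difference is only in how they are packaged, scaled and isolated.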

Functions don’t call other functions

Functions calling other functions is an anti-pattern.

There are a very few edge cases where this is a valid pattern, but they
are not easily broken down.

Basically, don’t do it. You simply double your cost (a synchronous
caller is still running, and billed, while it waits), make debugging
more complex, and remove the value of isolating your functions.
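
A back-of-the-envelope sketch of where the extra cost comes from, assuming billing by invocation duration (as on AWS Lambda) and a synchronous call in which the caller stays billed while it waits. The durations are made-up illustrative numbers, and the sketch ignores memory size and any per-message charge for the queue.

```python
def billed_ms_sync(caller_own_ms: int, callee_ms: int) -> int:
    """Caller invokes the callee synchronously and waits for the result."""
    caller_billed = caller_own_ms + callee_ms  # caller is billed while it waits
    return caller_billed + callee_ms           # plus the callee's own duration

def billed_ms_async(caller_own_ms: int, callee_ms: int) -> int:
    """Caller writes to a queue and returns; the queue triggers the callee."""
    return caller_own_ms + callee_ms

# Illustrative numbers: the caller does 20 ms of its own work,
# the callee takes 300 ms.
sync_total = billed_ms_sync(20, 300)    # 20 + 300 (waiting) + 300 = 620 ms
async_total = billed_ms_async(20, 300)  # 20 + 300 = 320 ms
print(sync_total, async_total)
```

The smaller the caller's own work, the closer the synchronous version gets to exactly double.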

Functions should push data to a data store or queue, which should
trigger another function if more work is needed.
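
A minimal in-process sketch of that pattern. InMemoryQueue is a hypothetical stand-in for a managed queue such as Amazon SQS, and its deliver method plays the role of the platform, which invokes the consumer function once per message. The event shape is simplified (real SQS events wrap messages in a Records list), and the order-handling names are illustrative only. The point is that neither function ever calls the other.

```python
import json
from collections import deque

class InMemoryQueue:
    """Hypothetical stand-in for a managed queue (e.g. SQS)."""

    def __init__(self):
        self._messages = deque()

    def send(self, body: dict) -> None:
        self._messages.append(json.dumps(body))

    def deliver(self, consumer) -> int:
        """Play the platform's role: invoke `consumer` once per message."""
        delivered = 0
        while self._messages:
            consumer({"body": self._messages.popleft()}, None)
            delivered += 1
        return delivered

orders_queue = InMemoryQueue()
fulfilled = []

def place_order_handler(event, context):
    # Does its one job, then hands off by writing to the queue and returning.
    orders_queue.send({"order_id": event["order_id"]})
    return {"statusCode": 202}

def fulfil_order_handler(event, context):
    # Triggered by the queue; never called directly by place_order_handler.
    order = json.loads(event["body"])
    fulfilled.append(order["order_id"])

place_order_handler({"order_id": "A1"}, None)
place_order_handler({"order_id": "A2"}, None)
delivered_count = orders_queue.deliver(fulfil_order_handler)
```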

Use as few libraries in your functions as possible (preferably zero)

This one seems obvious to me.

Functions have cold starts (when a function is started for the first
time) and warm starts (it’s been started, and is ready to be executed
from the warm pool). Cold starts are impacted by a number of things, but
the size of the zip file (or however the code is uploaded) is a part of
it. So is the number of libraries that need to be instantiated.

The more code you have, the slower it is to cold start.

The more libraries that need instantiating, the slower it is to cold start.
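
One rough way to see this for yourself in Python: imports at module level run during a cold start, so timing them shows part of what each dependency costs. This is a crude sketch; the module names are just standard-library examples, and CPython's built-in python -X importtime flag gives a more detailed breakdown.

```python
import importlib
import sys
import time

def time_import(module_name: str) -> float:
    """Return roughly how many seconds importing `module_name` takes.

    Removing the module from sys.modules forces its top-level code to run
    again. Submodules it already loaded stay cached, so this undercounts
    compared with a genuinely cold process.
    """
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start

for name in ["json", "csv", "xml.dom.minidom"]:
    print(f"{name:>16}: {time_import(name) * 1000:.2f} ms")
```

Exact timings vary by machine and Python version, so treat the numbers as relative, not absolute.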

As an example, Java is a brilliantly performant language on a warm
start on some platforms. But if you use lots of libraries, you can find
it taking many, many seconds to cold start. You almost certainly don’t
need them, and slow cold starts will hurt you not just when starting up
but when scaling too.

As another point, I’m a big believer in developers only using libraries
when necessary, and that means starting with none, and ending with none
unless I can’t build what’s needed without one.

Things like Express are built for servers, and serverless applications
do not need all the elements in there. So why introduce all the code and
dependencies? Why bring in superfluous code? Not only will much of it
never get run, it could also introduce a security risk.

There are so many reasons for this being a best practice. Of course, if
there is a library that you have tested, know and trust, then absolutely
bring it in, but the key element there is testing, knowing and trusting
the code. Following a tutorial is not the same thing.
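
As an illustration in Python rather than Node, an HTTP function behind Amazon API Gateway does not need a web framework at all: with the Lambda proxy integration, the handler just returns a dictionary with statusCode, headers and body. A minimal standard-library-only sketch; the greeting logic and the name query parameter are hypothetical.

```python
import json

def hello_handler(event, context):
    # With the proxy integration, query parameters arrive in
    # event["queryStringParameters"], which is None when there are none.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

There is nothing here to instantiate beyond the standard library's json module, which keeps the deployment package small.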

Avoid using connection-based services, e.g. RDBMS

Just don’t unless you have to.

This one will get me into the most trouble. A lot of web application
people will jump on the “but RDBMS are what we know” bandwagon.

It’s not about RDBMS. It’s about the connections.

Serverless works best with services rather than connections.

Services are intended to return responses to requests really rapidly and
to handle the complexity of the data layer behind the service. This is of
huge value in the serverless space, and why something like DynamoDB
fits so well within the serverless paradigm.

To be honest, serverless people are not against RDBMS, they are against
connections. Connections take time, and if you imagine a function
scaling up, each function environment needs a connection, and you’re
introducing both a bottleneck and an I/O wait into the cold start of the
function. It is needless.

So if you have to use an RDBMS, put a service that handles connection
pooling in the middle. An auto-scaling container of some description,
there simply to handle that, would be great.
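
A minimal sketch of why the pooling service helps, under stated assumptions: PoolingService is a hypothetical in-process stand-in for that middle tier, threads stand in for concurrently running function environments, and time.sleep stands in for the database round trip. However many functions run at once, the database never sees more than max_connections.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class PoolingService:
    """Hypothetical middle tier that owns a fixed number of DB connections."""

    def __init__(self, max_connections: int):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    def query(self, sql: str, params: tuple = ()) -> str:
        with self._slots:  # waits here while every connection is busy
            with self._lock:
                self.in_use += 1
                self.peak = max(self.peak, self.in_use)
            time.sleep(0.01)  # stand-in for the real database round trip
            with self._lock:
                self.in_use -= 1
        return f"rows for {sql} {params}"

service = PoolingService(max_connections=5)

def report_handler(event, context):
    # The function asks the service for data instead of opening and holding
    # its own database connection for the lifetime of its environment.
    return service.query("SELECT * FROM orders WHERE id = %s", (event["id"],))

# Simulate a burst of 50 concurrent invocations.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda i: report_handler({"id": i}, None), range(50)))

print(len(results), "requests served, peak DB connections:", service.peak)
```

Requests beyond the pool size queue up briefly inside the service instead of each opening a fresh connection.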

The biggest point to make here is that serverless architecture may well
require you to rethink your data layer. That’s not the fault of serverless.
If you try to reuse your current data layer thinking and it doesn’t work,
then it’s probably a lack of understanding of serverless architectures.

One function per route (if using HTTP)

Avoid using the single function proxy where possible. It doesn’t scale
well and doesn’t help isolate issues. There are occasions where you can
get away with one function for several routes, e.g. where the
functionality of a series of routes is tied strictly to a single table
and is very much decoupled from the rest of the application, but that is
an edge case in most applications I’ve worked on.

This adds complexity in terms of management, but it really helps in
terms of isolation of errors and issues when your application scales.
Start as you mean to go on.
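
If routes and their functions are declared in configuration, the rule is easy to check in CI. A minimal sketch, assuming a hypothetical route table that maps (method, path) pairs to function names; deliberate exceptions, like the single-table case above, could be allow-listed.

```python
from collections import defaultdict

# Hypothetical route table, e.g. loaded from your deployment configuration.
ROUTES = {
    ("GET", "/users/{id}"): "get_user",
    ("POST", "/users"): "create_user",
    ("GET", "/orders"): "list_orders",
    ("DELETE", "/orders/{id}"): "list_orders",  # reuses a function: flagged
}

def shared_functions(routes: dict) -> dict:
    """Return {function_name: [routes]} for functions serving more than one route."""
    by_function = defaultdict(list)
    for (method, path), function_name in routes.items():
        by_function[function_name].append(f"{method} {path}")
    return {fn: rs for fn, rs in by_function.items() if len(rs) > 1}

problems = shared_functions(ROUTES)
for fn, rs in problems.items():
    print(f"{fn} serves {len(rs)} routes: {', '.join(rs)}")
```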

But then, you were using some sort of configuration management tool
anyway to run everything, weren’t you? And you already used CI and
CD tools of some sort, right? You still have to DevOps with serverless.

Learn to use messages and queues (async FTW)

Serverless applications tend to work best when the application is
asynchronous. This isn’t straightforward for web applications, where
the tendency is to do request-response and lots of querying.

Going back to the functions not calling other functions, it’s important to
point out that this is how you chain functions together. A queue acts as
a circuit breaker in the chaining scenario, so that if a function fails, you
can easily drain down a queue that has got backed up due to a failure,
or push messages that fail to a dead letter queue.
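
A minimal sketch of the dead letter queue side of this, assuming an in-memory deque in place of a real queue. MAX_RECEIVES is an assumed setting, analogous to the maxReceiveCount in an SQS redrive policy; real queues track retries with visibility timeouts, whereas this version simply re-appends the failed message.

```python
from collections import deque

MAX_RECEIVES = 3  # assumed retry limit before a message is parked

def drain(queue: deque, handler, dead_letters: list, max_receives: int = MAX_RECEIVES) -> None:
    """Process messages; park any that keep failing instead of blocking the rest."""
    receives = {}
    while queue:
        msg_id, body = queue.popleft()
        receives[msg_id] = receives.get(msg_id, 0) + 1
        try:
            handler(body)
        except Exception:
            if receives[msg_id] >= max_receives:
                dead_letters.append((msg_id, body))  # park it for inspection
            else:
                queue.append((msg_id, body))         # retry later

processed = []

def process_message(body: str) -> None:
    if body == "bad":
        raise ValueError("cannot process")
    processed.append(body)

queue = deque([(1, "a"), (2, "bad"), (3, "b")])
dlq = []
drain(queue, process_message, dlq)
print(processed, dlq)
```

The poison message is retried a bounded number of times and then moved aside, so the healthy messages behind it still flow.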

Basically, learn how distributed systems work.

For client applications with a serverless back end, the best approach
is to look into CQRS (Command Query Responsibility Segregation).
Separating out the point of retrieving data from the point of inputting
data is key to this kind of pattern.
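
A minimal CQRS sketch, with all names hypothetical: commands only record events, a projection builds a read model from those events, and queries only read that model. In a serverless deployment each of the three would typically be its own function, with the projection triggered by the event stream; here they are plain Python functions called by hand.

```python
from collections import defaultdict

# Write side: commands are validated and recorded; nothing is read back here.
events = []

def add_to_basket_command(user_id: str, sku: str, qty: int) -> None:
    if qty <= 0:
        raise ValueError("quantity must be positive")
    events.append({"type": "ItemAdded", "user": user_id, "sku": sku, "qty": qty})

# Projection: would be a separate function triggered by new events.
basket_view = defaultdict(dict)  # read model: user -> {sku: qty}

def project(event: dict) -> None:
    if event["type"] == "ItemAdded":
        items = basket_view[event["user"]]
        items[event["sku"]] = items.get(event["sku"], 0) + event["qty"]

# Read side: queries only ever touch the precomputed read model.
def get_basket_query(user_id: str) -> dict:
    return dict(basket_view.get(user_id, {}))

add_to_basket_command("u1", "apple", 2)
add_to_basket_command("u1", "apple", 1)
add_to_basket_command("u1", "pear", 1)
for e in events:
    project(e)
print(get_basket_query("u1"))
```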

Data flows, not data lakes

In a serverless system, your data flows through your system. It can end
up in a data lake, but the likelihood is that while it’s in your
serverless system it is in some sort of flow. So treat all data like it
is in motion, not at rest at any point.
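
A minimal sketch of treating data as a flow: each stage consumes records and passes them on without storing them. Python generators stand in here for what would, in a serverless deployment, be separate functions joined by queues or streams. The device readings and the alert threshold are made up.

```python
from typing import Iterable, Iterator

def parse(lines: Iterable[str]) -> Iterator[dict]:
    # Stage 1: turn raw "device,celsius" lines into records.
    for line in lines:
        device, reading = line.split(",")
        yield {"device": device, "celsius": float(reading)}

def to_fahrenheit(records: Iterable[dict]) -> Iterator[dict]:
    # Stage 2: enrich each record and pass it straight on.
    for r in records:
        yield {**r, "fahrenheit": r["celsius"] * 9 / 5 + 32}

def alerts(records: Iterable[dict], threshold_f: float) -> Iterator[str]:
    # Stage 3: emit an alert only for readings above the threshold.
    for r in records:
        if r["fahrenheit"] > threshold_f:
            yield f"{r['device']} is hot: {r['fahrenheit']:.1f}F"

raw = ["d1,20.0", "d2,40.0", "d3,35.5"]
hot = list(alerts(to_fahrenheit(parse(raw)), threshold_f=95.0))
print(hot)
```

Changing what happens to the data means inserting or redirecting a stage, not restructuring a store that everything queries.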

It’s not always possible, but try to avoid querying from a data lake
within a serverless environment.

Serverless requires you to rethink your data layer significantly. This is
the biggest gotcha with new people coming to serverless, who tend to
reach for the RDBMS and fall flat, not only because the scaling catches
them out, but because their data structures become too rigid too fast.

You will find that your flows will change as your application changes,
and scale will change all of it. If all you have to do is redirect a
flow, it’s easy. Damming a lake is far harder.

I know this point is a bit more “out there” than others, but it’s not a
straightforward one to make.

Just coding for scale is a mistake; you have to consider how it scales

It is very easy to create your first serverless application and watch
it scale. If you don’t understand what you’ve done, though, you can
easily fall into the same trap as with every other auto-scaling
solution.

If you don’t consider your application and how it will scale, then you
set yourself up for problems. If you make something with a slow cold
start (lots of libraries and using an RDBMS, for example) and then get a
spike in usage, you could end up significantly increasing the
concurrency of your function, then maxing out your connections, and
slowing your application down.
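
A worked sketch of that failure mode using Little's law (requests in flight = arrival rate × time each request takes). All numbers are hypothetical, and the sketch assumes each function environment holds one direct database connection, so connections track concurrency one-for-one.

```python
import math

def estimated_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    # Little's law: requests in flight = arrival rate x time each one takes.
    return math.ceil(requests_per_second * avg_duration_s)

DB_MAX_CONNECTIONS = 100  # hypothetical database connection limit

# Hypothetical spike of 400 requests/second.
warm = estimated_concurrency(400, 0.125)  # fast warm functions: 50 environments
cold = estimated_concurrency(400, 1.5)    # slow cold starts + connection setup: 600

for label, envs in [("warm", warm), ("cold", cold)]:
    # One direct DB connection per environment means connections == envs.
    status = "OK" if envs <= DB_MAX_CONNECTIONS else "exceeds connection limit"
    print(f"{label}: {envs} concurrent environments -> {status}")
```

The same traffic needs twelve times the concurrency once each request is slowed by cold starts, which is how a spike ends up exhausting connections.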

So, don’t just drop an application in, and then imagine that it will work
the same under load. Understanding your application under load is still
part of the job.

Conclusion
There are lots more things I could have put in here and this is my
opinion about the things that I have to explain most to people when I
talk to them.

I haven’t mentioned things like how to plan your application, or how to
consider costing out an application or anything like that as that’s
slightly out of scope.

Looking forward to hearing other people’s thoughts. Pretty sure I’m
going to get a flood of people telling me I’m wrong about RDBMS. As
with containers, I don’t hate RDBMS, but I like to use the right tools for
the right jobs. Know your tools!
