Goal
The goal is to implement a reverse proxy similar to Nginx. I’m not sure yet what I’ll cover; I’ll decide as I go.
Origin server
A reverse proxy needs one or more origin servers to forward requests to. For this I’m using a simple HTML page with two static resources:
```html
<title>Origin Server</title>
<link rel="stylesheet" type="text/css" href="styles.css" />
<p>Home of this cute cat.</p>
<img src="cute_cat.jpg" alt="Cute cat" />
```
I’m using npx serve origin-server/ to serve these files. By default it serves them on localhost:3000.
With all my UX brilliance this is the result:
When the page is accessed, the server also logs the requests it receives:
```text
HTTP 7/20/2024 10:53:08 AM 127.0.0.1 GET /
HTTP 7/20/2024 10:53:08 AM 127.0.0.1 Returned 304 in 51 ms
HTTP 7/20/2024 10:53:08 AM 127.0.0.1 GET /styles.css
HTTP 7/20/2024 10:53:08 AM 127.0.0.1 Returned 304 in 2 ms
HTTP 7/20/2024 10:53:08 AM 127.0.0.1 GET /cute_cat.jpg
HTTP 7/20/2024 10:53:08 AM 127.0.0.1 Returned 304 in 2 ms
```
Getting started
First, I’m going to define a variable for my upstream server:
```clojure
(def upstream-server {:hostname "localhost" :port 3000})
```
We can reuse what was done in Basic HTTP server in Clojure.
```clojure
;; The namespace name and the names parse-request and create-server are assumptions.
(ns reverse-proxy.core
  (:require [clojure.string :as str])
  (:import (java.net ServerSocket)
           (java.io BufferedReader InputStreamReader PrintWriter)))

(def origin-server {:hostname "localhost" :port 3000})

(defn parse-request [^BufferedReader in ^PrintWriter out]
  (loop [request {} step :request-line]
    (case step
      :request-line
      (if-let [line (.readLine in)]
        (let [[method path protocol] (str/split line #"\s+")]
          (recur {:method (-> method str/lower-case keyword) :path path :protocol protocol}
                 :headers))
        (println "Client disconnected"))
      :headers
      (if-let [line (.readLine in)]
        (if (str/blank? line)
          (recur request :body)
          (let [[key value] (str/split line #": " 2)]
            (recur (assoc-in request [:headers key] value) :headers)))
        (recur request :done))
      :body
      (if-let [content-length (get-in request [:headers "Content-Length"])]
        (let [size (parse-long content-length)
              body (char-array size)]
          (dotimes [i size]
            (aset body i (char (.read in))))
          (recur (assoc request :body (String. body)) :done))
        (recur request :done))
      :done
      (do (.println out "HTTP/1.1 200 OK\r\n\r\n")
          (.println out (str request))
          (println "Done parsing the request")))))

(defn client-handler [client-socket]
  (future
    (try
      (with-open [in (BufferedReader. (InputStreamReader. (.getInputStream client-socket)))
                  out (PrintWriter. (.getOutputStream client-socket) true)]
        (parse-request in out))
      (catch Exception e (println "Client error:" (.getMessage e)))
      (finally (println "Client disconnected")))))

(defn create-server [port]
  (let [state (atom {:server-socket nil :clients []})]
    {:start (fn []
              (when (nil? (:server-socket @state))
                (let [server-socket (ServerSocket. port)]
                  (reset! state {:server-socket server-socket :clients []})
                  (println "Server started on port" port)
                  (future
                    (try
                      (while true
                        (let [client-socket (.accept server-socket)]
                          (swap! state update :clients conj client-socket)
                          (client-handler client-socket)
                          (swap! state update :clients #(remove #{client-socket} %))))
                      (catch Exception e
                        (println "Server error:" (.getMessage e)))
                      (finally
                        ;; Use the local socket: :stop may already have reset the atom
                        (when-not (.isClosed server-socket)
                          (.close server-socket)
                          (println "Server stopped by error"))))))))
     :stop (fn []
             (when-let [server-socket (:server-socket @state)]
               (doseq [client-socket (:clients @state)]
                 (.close client-socket)
                 (println "Client disconnected by server"))
               (.close server-socket)
               (println "Server stopped by user")
               (reset! state {:server-socket nil :clients []})))}))
```
But remove this block:
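```clojure
(defn parse-request [^BufferedReader in ^PrintWriter out]
  (loop [request {} step :request-line]
    (case step
      :request-line
      (if-let [line (.readLine in)]
        (let [[method path protocol] (str/split line #"\s+")]
          (recur {:method (-> method str/lower-case keyword) :path path :protocol protocol}
                 :headers))
        (println "Client disconnected"))
      :headers
      (if-let [line (.readLine in)]
        (if (str/blank? line)
          (recur request :body)
          (let [[key value] (str/split line #": " 2)]
            (recur (assoc-in request [:headers key] value) :headers)))
        (recur request :done))
      :body
      (if-let [content-length (get-in request [:headers "Content-Length"])]
        (let [size (parse-long content-length)
              body (char-array size)]
          (dotimes [i size]
            (aset body i (char (.read in))))
          (recur (assoc request :body (String. body)) :done))
        (recur request :done))
      :done
      (do (.println out "HTTP/1.1 200 OK\r\n\r\n")
          (.println out (str request))
          (println "Done parsing the request")))))

(defn client-handler [client-socket]
  (future
    (try
      (with-open [in (BufferedReader. (InputStreamReader. (.getInputStream client-socket)))
                  out (PrintWriter. (.getOutputStream client-socket) true)]
        (parse-request in out))
      (catch Exception e (println "Client error:" (.getMessage e)))
      (finally (println "Client disconnected")))))
```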
Then replace client-handler with a version that:
1. Reads the client socket’s input
2. Pipes it to the upstream server’s output
3. Reads the upstream server’s input
4. Pipes that to the client socket’s output
```clojure
;; Extra classes used from here on
(import '(java.io DataInputStream DataOutputStream)
        '(java.net Socket))

(defn client-handler [client-socket]
  (future
    (try
      (with-open [in (DataInputStream. (.getInputStream client-socket))
                  out (DataOutputStream. (.getOutputStream client-socket))]
        (let [input-bytes (byte-array 4096)
              res (.read in input-bytes)]
          (case res
            -1 (println "End of stream reached")
            0 (println "No data received")
            (do
              (println (format "-> * %dB" res))
              (let [return-bytes (byte-array 4096)
                    upstream-socket (atom nil)
                    upstream-res (atom 0)]
                (try
                  (reset! upstream-socket (Socket. (:hostname upstream-server) (:port upstream-server)))
                  (let [upstream-in (DataInputStream. (.getInputStream @upstream-socket))
                        upstream-out (DataOutputStream. (.getOutputStream @upstream-socket))]
                    (println (format "Connected to %s:%d" (-> @upstream-socket .getInetAddress .getHostAddress) (.getPort @upstream-socket)))
                    (.write upstream-out input-bytes 0 res)
                    (println (format "* -> %dB" res))
                    (reset! upstream-res (.read upstream-in return-bytes))
                    (case @upstream-res
                      -1 (println "End of upstream stream reached")
                      0 (println "No data from upstream received")
                      (println (format "* <- %dB" @upstream-res))))
                  (catch Exception e
                    (println "Upstream error:" e))
                  (finally
                    ;; The socket is still nil if the connection attempt itself failed
                    (when-let [s @upstream-socket]
                      (.close s)
                      (println (format "Disconnected from %s:%d" (-> s .getInetAddress .getHostAddress) (.getPort s))))))
                ;; Only forward when upstream actually returned bytes
                (when (pos? @upstream-res)
                  (.write out return-bytes 0 @upstream-res)
                  (println (format "<- * %dB" @upstream-res))))))))
      (catch Exception e (println "Client error:" (.getMessage e)))
      (finally (println "Client disconnected")))))
```
DataInputStream and DataOutputStream are Java classes for reading and writing binary data over I/O streams.
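The four steps of the new client-handler can be tried out without opening any sockets. The sketch below is a minimal illustration, assuming in-memory byte streams stand in for the client and upstream sockets; relay-once is a hypothetical helper, not part of the proxy code. Like the handler, it does exactly one read in each direction.

```clojure
(ns relay-sketch
  (:import (java.io ByteArrayInputStream ByteArrayOutputStream
                    DataInputStream DataOutputStream)))

(defn relay-once
  "Hypothetical helper: one read from the client, written upstream, then one
  read from upstream, written back to the client. Returns the byte counts,
  or nil if the client sent nothing."
  [^DataInputStream client-in ^DataOutputStream client-out
   ^DataInputStream upstream-in ^DataOutputStream upstream-out]
  (let [buf (byte-array 4096)
        ;; 1. Read the client's input (at most 4096 bytes)
        n-in (.read client-in buf)]
    (when (pos? n-in)
      ;; 2. Pipe it to the upstream server's output
      (.write upstream-out buf 0 n-in)
      (.flush upstream-out)
      ;; 3. Read the upstream server's input (the buffer is reused)
      (let [n-out (.read upstream-in buf)]
        ;; 4. Pipe that to the client's output
        (when (pos? n-out)
          (.write client-out buf 0 n-out)
          (.flush client-out))
        {:sent n-in :received n-out}))))

;; Usage with in-memory streams instead of sockets:
(let [client-out (ByteArrayOutputStream.)
      upstream-out (ByteArrayOutputStream.)]
  (relay-once (DataInputStream. (ByteArrayInputStream. (.getBytes "GET / HTTP/1.1\r\n\r\n")))
              (DataOutputStream. client-out)
              (DataInputStream. (ByteArrayInputStream. (.getBytes "HTTP/1.1 200 OK\r\n\r\n")))
              (DataOutputStream. upstream-out)))
;; => {:sent 18, :received 19}
```

Because each direction gets a single .read into a 4096-byte buffer, the sketch has the same size limit as the handler.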
Can't the input be more than 4096 bytes long?
Well, it can! But we’re not dealing with that for now.
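To make the limit concrete, here is a small sketch that again uses an in-memory stream in place of the socket (an assumption for illustration). read-chunks is a hypothetical helper that keeps calling .read with a 4096-byte buffer and records how many bytes each call returned. A 10,000-byte payload takes three reads, so a handler that stops after the first one silently drops the rest.

```clojure
(ns read-limit-sketch
  (:import (java.io ByteArrayInputStream DataInputStream)))

;; 10,000 bytes of fake payload standing in for a large request or response
(def payload (byte-array 10000 (byte 65)))

(defn read-chunks
  "Hypothetical helper: reads `in` with a fixed 4096-byte buffer until end
  of stream and returns how many bytes each .read call produced."
  [^DataInputStream in]
  (let [buf (byte-array 4096)]
    (loop [sizes []]
      (let [n (.read in buf)]
        (if (neg? n)
          sizes
          (recur (conj sizes n)))))))

(read-chunks (DataInputStream. (ByteArrayInputStream. payload)))
;; => [4096 4096 1808]
```

On a real socket the split can be even less predictable: a single .read returns only what has arrived so far, which may be fewer bytes than the sender has written.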
When doing
we get back this response:
```html
<title>Origin Server</title>
<link rel="stylesheet" type="text/css" href="styles.css" />
<p>Home of this cute cat.</p>
<img src="cute_cat.jpg" alt="Cute cat" />
```
and the server shows these logs:
```text
New connection from 0:0:0:0:0:0:0:1:8080
Connected to 127.0.0.1:3000
Disconnected from 127.0.0.1:3000
```
When opening the page in the browser, we don’t see the image!
Ha! It turns out you need to handle more than 4096 bytes!
Indeed. But I’m only going to do that for the connection between the proxy and the upstream server.
```clojure
(defn client-handler [client-socket]
  (future
    (try
      (with-open [in (DataInputStream. (.getInputStream client-socket))
                  out (DataOutputStream. (.getOutputStream client-socket))]
        (let [input-bytes (byte-array 4096)
              res (.read in input-bytes)]
          (case res
            -1 (println "End of stream reached")
            0 (println "No data received")
            (do
              (println (format "-> * %dB" res))
              (with-open [upstream-socket (Socket. (:hostname upstream-server) (:port upstream-server))
                          upstream-in (DataInputStream. (.getInputStream upstream-socket))
                          upstream-out (DataOutputStream. (.getOutputStream upstream-socket))]
                (println (format "Connected to %s:%d" (-> upstream-socket .getInetAddress .getHostAddress) (.getPort upstream-socket)))
                (.write upstream-out input-bytes 0 res)
                (println (format "* -> %dB" res))
                (let [return-bytes (byte-array 4096)]
                  ;; Keep reading while upstream fills the whole buffer
                  (loop []
                    (let [upstream-res (.read upstream-in return-bytes)]
                      (case upstream-res
                        -1 (println "End of upstream stream reached")
                        0 (println "No data from upstream received")
                        (do
                          (println (format "* <- %dB" upstream-res))
                          (.write out return-bytes 0 upstream-res)
                          (println (format "<- * %dB" upstream-res))
                          (when (= upstream-res 4096)
                            (recur))))))))))))
      (catch Exception e (println "Client error:" (.getMessage e)))
      (finally (println "Client disconnected")))))
```
With the logs showing:
```text
New connection from 127.0.0.1:8080
Connected to 127.0.0.1:3000
New connection from 127.0.0.1:8080
Connected to 127.0.0.1:3000
New connection from 127.0.0.1:8080
Connected to 127.0.0.1:3000
New connection from 127.0.0.1:8080
Connected to 127.0.0.1:3000
```
And on the upstream server:
```text
HTTP 7/21/2024 10:07:57 PM 127.0.0.1 GET /
HTTP 7/21/2024 10:07:57 PM 127.0.0.1 Returned 200 in 69 ms
HTTP 7/21/2024 10:07:57 PM 127.0.0.1 GET /styles.css
HTTP 7/21/2024 10:07:57 PM 127.0.0.1 Returned 200 in 3 ms
HTTP 7/21/2024 10:07:57 PM 127.0.0.1 GET /cute_cat.jpg
HTTP 7/21/2024 10:07:57 PM 127.0.0.1 Returned 200 in 4 ms
HTTP 7/21/2024 10:07:57 PM 127.0.0.1 GET /favicon.ico
HTTP 7/21/2024 10:07:57 PM 127.0.0.1 Returned 404 in 7 ms
```
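The (= upstream-res 4096) check in the handler is a heuristic. On a socket, .read returns whatever has arrived so far, so a short read does not necessarily mean the response is finished, and a response whose size is an exact multiple of 4096 leaves the loop waiting on one more read. One alternative, sketched below under the assumption that the upstream closes the connection once its response is sent (which a keep-alive upstream will not do), is to keep copying until .read returns -1. pipe-until-eof is a hypothetical name, and in-memory streams stand in for the sockets.

```clojure
(ns pipe-sketch
  (:import (java.io InputStream OutputStream ByteArrayInputStream ByteArrayOutputStream)))

(defn pipe-until-eof
  "Hypothetical helper: copies everything from `in` to `out` in 4096-byte
  chunks until `in` reports end of stream (-1). Returns the total bytes
  copied. Only terminates once the sender closes its side."
  [^InputStream in ^OutputStream out]
  (let [buf (byte-array 4096)]
    (loop [total 0]
      (let [n (.read in buf)]
        (if (neg? n)
          (do (.flush out)
              total)
          (do (.write out buf 0 n)
              (recur (+ total n))))))))

;; Usage with in-memory streams: all 10,000 bytes arrive, not just the first 4096
(let [out (ByteArrayOutputStream.)]
  (pipe-until-eof (ByteArrayInputStream. (byte-array 10000 (byte 66))) out))
;; => 10000
```

clojure.java.io/copy runs the same read-until-end-of-stream loop for an InputStream and an OutputStream, so it has the same requirement that the upstream eventually closes the connection.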
Handling upstream server down
The correct thing to do when we can’t reach the upstream server is to return a 502 Bad Gateway.
```clojure
;; Needs (import '(java.net ConnectException)) alongside the other imports
(try
  (with-open [upstream-socket (Socket. (:hostname upstream-server) (:port upstream-server))
              upstream-in (DataInputStream. (.getInputStream upstream-socket))
              upstream-out (DataOutputStream. (.getOutputStream upstream-socket))]
    (println (format "Connected to %s:%d" (-> upstream-socket .getInetAddress .getHostAddress) (.getPort upstream-socket)))
    (.write upstream-out input-bytes 0 res)
    (println (format "* -> %dB" res))
    (let [return-bytes (byte-array 4096)]
      (loop []
        (let [upstream-res (.read upstream-in return-bytes)]
          (case upstream-res
            -1 (println "End of upstream stream reached")
            0 (println "No data from upstream received")
            (do
              (println (format "* <- %dB" upstream-res))
              (.write out return-bytes 0 upstream-res)
              (println (format "<- * %dB" upstream-res))
              (when (= upstream-res 4096)
                (recur))))))))
  ;; Nothing is listening on the upstream host/port: answer 502 instead
  (catch ConnectException _e
    (let [bytes (.getBytes "HTTP/1.1 502 Bad Gateway\r\n\r\n")]
      (.write out bytes 0 (count bytes)))
    (println "<- * 502 Bad Gateway"))
  (catch Exception e
    (println "Upstream error" e)))
```
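The 502 above is a bare status line with no headers. A slightly more complete response, sketched here with a hypothetical bad-gateway-bytes helper, adds a Content-Length so the client knows exactly how long the body is, and Connection: close since the proxy closes the socket right after. The header set is my choice for illustration, not something the code requires.

```clojure
(ns bad-gateway-sketch
  (:import (java.nio.charset StandardCharsets)))

(defn bad-gateway-bytes
  "Hypothetical helper: a 502 response with a small plain-text body,
  an explicit Content-Length and Connection: close."
  []
  (let [body "502 Bad Gateway\n"
        ;; Content-Length counts bytes, not characters
        body-length (count (.getBytes body StandardCharsets/UTF_8))
        response (str "HTTP/1.1 502 Bad Gateway\r\n"
                      "Content-Type: text/plain\r\n"
                      "Content-Length: " body-length "\r\n"
                      "Connection: close\r\n"
                      "\r\n"
                      body)]
    (.getBytes response StandardCharsets/UTF_8)))
```

In the catch ConnectException branch, (let [bytes (bad-gateway-bytes)] (.write out bytes 0 (count bytes))) would then replace the bare status line.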
Java virtual threads
Besides using Java 21 or higher, we just need to change client-handler from a future to Thread/startVirtualThread, wrapping the body in a fn since the method takes a Runnable:
```clojure
(defn client-handler [client-socket]
  ;; A Clojure fn is a Runnable, so it can be handed to the virtual thread directly
  (Thread/startVirtualThread
   (fn []
     (try
       (with-open [in (DataInputStream. (.getInputStream client-socket))
                   out (DataOutputStream. (.getOutputStream client-socket))]
         (let [input-bytes (byte-array 4096)
               res (.read in input-bytes)]
           (case res
             -1 (println "End of stream reached")
             0 (println "No data received")
             (do
               (println (format "-> * %dB" res))
               (try
                 (with-open [upstream-socket (Socket. (:hostname upstream-server) (:port upstream-server))
                             upstream-in (DataInputStream. (.getInputStream upstream-socket))
                             upstream-out (DataOutputStream. (.getOutputStream upstream-socket))]
                   (println (format "Connected to %s:%d" (-> upstream-socket .getInetAddress .getHostAddress) (.getPort upstream-socket)))
                   (.write upstream-out input-bytes 0 res)
                   (println (format "* -> %dB" res))
                   (let [return-bytes (byte-array 4096)]
                     (loop []
                       (let [upstream-res (.read upstream-in return-bytes)]
                         (case upstream-res
                           -1 (println "End of upstream stream reached")
                           0 (println "No data from upstream received")
                           (do
                             (println (format "* <- %dB" upstream-res))
                             (.write out return-bytes 0 upstream-res)
                             (println (format "<- * %dB" upstream-res))
                             (when (= upstream-res 4096)
                               (recur))))))))
                 (catch ConnectException _e
                   (let [bytes (.getBytes "HTTP/1.1 502 Bad Gateway\r\n\r\n")]
                     (.write out bytes 0 (count bytes)))
                   (println "<- * 502 Bad Gateway"))
                 (catch Exception e
                   (println "Upstream error" e)))))))
       (catch Exception e (println "Client error:" (.getMessage e)))
       (finally (println "Client disconnected"))))))
```
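Thread/startVirtualThread expects a java.lang.Runnable, and Clojure functions implement Runnable, so a (fn [] ...) can be passed to it directly. Here is a minimal sketch of just that mechanism, assuming Java 21 or later; run-on-virtual-thread is a hypothetical helper.

```clojure
(ns virtual-thread-sketch)

(defn run-on-virtual-thread
  "Hypothetical helper: runs f on a new virtual thread, waits for it to
  finish and returns f's result plus whether the thread was virtual.
  Requires Java 21 or later."
  [f]
  (let [result (atom nil)
        ;; The fn below is passed as the Runnable
        t (Thread/startVirtualThread (fn [] (reset! result (f))))]
    ;; join makes the thread's write to the atom visible here
    (.join t)
    {:result @result
     :virtual? (.isVirtual t)}))

(run-on-virtual-thread #(+ 1 2))
;; => {:result 3, :virtual? true}
```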
Persistent connections
When the page is opened in the browser, it sends the Connection header as keep-alive, but we close the connection after every request anyway. The right thing to do is to close the connection on HTTP/1.0 unless the Connection header is keep-alive, and to keep it open on HTTP/1.1 unless the client sends Connection: close.
```clojure
(defn parse-http-request
  "Parses a sequence of bytes into a map representing an HTTP request"
  [^bytes b]
  (let [lines (->> (String. b)
                   (str/split-lines))
        [method path version] (str/split (first lines) #" ")
        headers (->> (rest lines)
                     ;; Headers end at the first blank line; anything after is body
                     (take-while (complement str/blank?))
                     ;; Split on the first ": " only, since values may contain ": "
                     (map #(str/split % #": " 2))
                     (filter #(= 2 (count %)))
                     (into {}))]
    {:method (-> method str/lower-case keyword)
     :path path
     :version version
     :headers headers}))

(defn client-handler [client-socket]
  (Thread/startVirtualThread
   (fn []
     (try
       (with-open [in (DataInputStream. (.getInputStream client-socket))
                   out (DataOutputStream. (.getOutputStream client-socket))]
         (try
           ;; Opened inside this try so that a refused connection becomes a 502
           (with-open [upstream-socket (Socket. (:hostname upstream-server) (:port upstream-server))
                       upstream-in (DataInputStream. (.getInputStream upstream-socket))
                       upstream-out (DataOutputStream. (.getOutputStream upstream-socket))]
             ;; One iteration per request on this client connection
             (loop []
               (let [input-bytes (byte-array 4096)
                     res (.read in input-bytes)]
                 (case res
                   -1 (println "End of stream reached")
                   0 (println "No data received")
                   (let [parsed (parse-http-request input-bytes)
                         connection (get-in parsed [:headers "Connection"])]
                     (println (format "-> * %dB" res))
                     (println (format "Connected to %s:%d" (-> upstream-socket .getInetAddress .getHostAddress) (.getPort upstream-socket)))
                     (.write upstream-out input-bytes 0 res)
                     (println (format "* -> %dB" res))
                     (let [return-bytes (byte-array 4096)]
                       (loop []
                         (let [upstream-res (.read upstream-in return-bytes)]
                           (case upstream-res
                             -1 (println "End of upstream stream reached")
                             0 (println "No data from upstream received")
                             (do
                               (println (format "* <- %dB" upstream-res))
                               (.write out return-bytes 0 upstream-res)
                               (println (format "<- * %dB" upstream-res))
                               (when (= upstream-res 4096)
                                 (recur)))))))
                     ;; HTTP/1.0 closes unless keep-alive; HTTP/1.1 stays open unless close
                     (if (or (and (= (:version parsed) "HTTP/1.0")
                                  (not= "keep-alive" connection))
                             (= "close" connection))
                       (println "Upstream closed by client")
                       (do
                         (println "Keeping connection alive")
                         (recur))))))))
           (catch ConnectException _e
             (let [bytes (.getBytes "HTTP/1.1 502 Bad Gateway\r\n\r\n")]
               (.write out bytes 0 (count bytes)))
             (println "<- * 502 Bad Gateway"))
           (catch Exception e
             (println "Upstream error" e))))
       (catch Exception e (println "Client error:" (.getMessage e)))
       (finally (println "Client disconnected"))))))
```
With all that in place, the logs look quite different:
```text
New connection from 127.0.0.1:8080
Connected to 127.0.0.1:3000
Connected to 127.0.0.1:3000
Connected to 127.0.0.1:3000
```
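The keep-alive rule can also be written as a small pure function, which makes it easy to check in isolation. This is a sketch with a hypothetical keep-alive? helper. It lowercases the header value, since connection options are case-insensitive, but it still treats the header as a single token rather than a comma-separated list.

```clojure
(ns keep-alive-sketch
  (:require [clojure.string :as str]))

(defn keep-alive?
  "Hypothetical helper: should the connection stay open after this request?
  `version` is the request's HTTP version string, `connection` the value
  of its Connection header, or nil if the header is absent."
  [version connection]
  (let [conn (some-> connection str/trim str/lower-case)]
    (case version
      ;; HTTP/1.0: close unless the client asks for keep-alive
      "HTTP/1.0" (= conn "keep-alive")
      ;; HTTP/1.1: persistent unless the client asks to close
      "HTTP/1.1" (not= conn "close")
      ;; Anything else: close, to stay on the safe side
      false)))

(keep-alive? "HTTP/1.1" nil)          ;; => true
(keep-alive? "HTTP/1.0" "Keep-Alive") ;; => true
(keep-alive? "HTTP/1.1" "close")      ;; => false
```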
I appreciate much more now that there is time-tested software that does all of the above and handles the spec (and more!) correctly.