TLDR

Using a stale conn caused consecutive requests to cowboy to respond with a status code 505. Check that you are always correctly handling your conn.

The problem

I have a quite simple API and some users were reporting that I was returning status code 505 in certain conditions.

Grep’ing phoenix.log for “505” showed no requests that matched that status code. All I was seeing were requests with status code 200.

A better report from one of my users made me realize that there were some requests that were indeed returning a different status than 200 but that were not hitting phoenix.

By default, phoenix only logs the requests that its handling. If there is a request that cowboy rejects prior to passing it to plug/phoenix, you won’t see it in the logs.

This is exactly what PhoenixCowboyLogging helps you with. It “adds logging of request acceptance and parse errors that happen at the cowboy level to your Phoenix app”. Exactly what I needed (and probably something you need to use too).

After adding it to my mix file and restarting phoenix I started seeing the errors in my log:

2017-04-06 19:06:35.141 request_id=88qkjej0 [info] Sent 200 in 42ms
2017-04-06 19:06:35.141 request_id=88qkjej0 [info] Completed POST <http://192.168.17.106/> with 200 OK
2017-04-06 19:06:35.255 request_id=88qkjej0 [error] It looks like cowboy failed to accept or parse a request. Unfortunately, cowboys are notoriously non-communicative so that is pretty much all we can tell you about what went wrong. Sorry.
2017-04-06 19:06:35.255 request_id=88qkjej0 [error] Completed UNKNOWN REQUEST (cowboy failure) with 505

Its worth remembering that those are 2 succesive requests. The first one returned 200, the second one 505. Notice they share the same request_id.

(BTW, if I waited more than 1 second between succesive request I didn’t get the 505 error.)

Worth mentioning that you can also get “Completed UNKNOWN REQUEST (cowboy failure) with 400”, depending on how you do the requests. I got the 505 when doing HTTP directly to phoenix and 400 if accesing it behind a TLS enabled load balancer.

The solution

After confirming the presence of the 505 status code it was easier to find a few other cases of people experiencing simila problems. Most of them were in erlang and they were using cowboy directly and I was using it via Phoenix.

But it was one comment in one related issue that gave me the hint I needed:

This may occur if you mix conn structs on Phoenix as well.

Browsing my code I found the culprit. I had the following code:

{:ok, body, _} = read_body(conn)
assign(conn, :sptxn, ServiceProviderTxn.parse_xml(body))
|> assign(:original_body, body)

I don’t know why, but after calling read_body/1 I was ignoring the mutated conn, whiche gets used later.

Changing that to:

{:ok, body, conn} = read_body(conn)

fixed the problem.

Thanks

Many thanks to benwilson512 and OliverMT that helped me with this over IRC.