HTTP Anti-Virus Proxy
http://havp.hege.li/forum/

Problem with "Transfer-Encoding: chunked"
http://havp.hege.li/forum/viewtopic.php?f=3&t=217

Author:  pk [ 13 Mar 2007 18:30 ]
Post subject:  Problem with "Transfer-Encoding: chunked"

HAVP does not allow the "Transfer-Encoding" header in replies from a web server when HAVP is operating in HTTP/1.0 mode (which I think is all it ever does), and all you get is a HAVP error page.

This may be proper from the point of view of strict HTTP, but it seems that some web servers don't care. In particular, the blog servers at ZDnet.com like to give back "Transfer-Encoding: chunked" headers, so using HAVP makes a lot of items of interest to software people like me unavailable.

To get around this (temporarily, at least) I have modified the "ConnectionToHTTP::AnalyseHeaderLine" method to not disallow this header, and in fact let it go through the (unmodified) "ConnectionToHTTP::PrepareHeaderForBrowser" all the way to the browser.
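
For reference, the change amounts to something like this simplified sketch (not the verbatim HAVP source; the real method handles more headers and state):

    #include <string>
    using std::string;

    // Simplified sketch of the check I changed (not the verbatim HAVP
    // source). Return false to reject the server reply with an error
    // page, true to keep the header.
    static bool AllowResponseHeader( const string &line )
    {
        // Old behaviour: any Transfer-Encoding header made HAVP reject
        // the whole reply, so the browser only saw the error page.
        // if ( line.compare( 0, 18, "Transfer-Encoding:" ) == 0 )
        //     return false;

        // New behaviour: accept the header; the (unmodified)
        // PrepareHeaderForBrowser then passes it to the browser.
        return true;
    }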

I don't understand all the implications of "chunked" -- do multiple chunks need multiple HTTP GETs? -- but I gather it allows a Web page to be fragmented at the HTTP layer (just like oversize data may be fragmented at the IP layer). Presumably this means that HAVP wouldn't see the entire page at once, and therefore might not spot a virus that was split over two chunks.

So, is letting "Transfer-Encoding: chunked" through likely to cause problems, or should I strip it out like "ConnectionToHTTP::PrepareHeaderForBrowser" removes the "Keep-Alive" header? (I actually worry more about protocol state-machine confusion than viruses.)

How hard would it be for HAVP to do the "right thing" as suggested by section 19.4.6 of RFC 2616 (see for example www.w3.org/Protocols/rfc2616/rfc2616-se ... #sec19.4.6)? Would this introduce unbounded latency in retrieving Web pages?

Author:  hege [ 13 Mar 2007 20:05 ]
Post subject: 

Chunked is really just a different transfer encoding, nothing to do with multiple GETs. You can google for details..

There are some fixes, in order of preference:

1) Use Squid 2.6-STABLE10, it has chunked support. Then HAVP will never see it.

2) Let HAVP strip the Accept-Encoding header from clients, but that will result in wasted bandwidth (servers don't send gzip-compressed data then). I guess it could be made into a config option. But using Squid is a must in my opinion anyhow.
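
For 2) the check would be something like this sketch (the function name is illustrative, not an actual HAVP method):

    #include <string>
    using std::string;

    // Illustrative sketch only, not actual HAVP code: drop the client's
    // Accept-Encoding header so servers send plain bodies the scanner
    // can read, instead of gzip-compressed ones.
    static bool KeepRequestHeader( const string &line )
    {
        // "Accept-Encoding:" is 16 characters. Header names are
        // case-insensitive per RFC 2616, so a real check should
        // compare case-insensitively.
        return line.compare( 0, 16, "Accept-Encoding:" ) != 0;
    }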

Cheers,
Henrik

Author:  pk [ 16 Mar 2007 01:08 ]
Post subject:  Problem with "Transfer-Encoding: chunked"

hege wrote:
Chunked is really just a different transfer encoding, nothing to do with multiple GETs. You can google for details..

I read section 3.6.1 of RFC 2616 again: it appears that "chunked" simply allows an HTTP response to have multiple segments with their own length fields, making it easier to assemble separate streams into a single Web page/document. This would probably be easy to add to HAVP, but I wouldn't much like forking my own branch of the HAVP source to do so.
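
For reference, the wire format described there looks like this (chunk sizes are in hex, each chunk is followed by CRLF, and a zero-size chunk plus a final empty line ends the body):

    HTTP/1.1 200 OK
    Content-Type: text/html
    Transfer-Encoding: chunked

    1a
    abcdefghijklmnopqrstuvwxyz
    10
    1234567890abcdef
    0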

hege wrote:
There are some fixes, in order of preference:

1) Use Squid 2.6-STABLE10, it has chunked support. Then HAVP will never see it.

I thought of using Squid, but decided against it, as my SOHO LAN is too small to really need caching, and the fewer services to secure and support, the better.

hege wrote:
2) Let HAVP strip the Accept-Encoding header from clients, but that will result in wasted bandwidth (servers don't send gzip-compressed data then). I guess it could be made into a config option. But using Squid is a must in my opinion anyhow.

Cheers,
Henrik

I already use Privoxy between the browser/clients and HAVP, and Privoxy strips the "Accept-Encoding" headers (since it wants to filter the data, and isn't itself willing to do any gunzipping). Unfortunately, that doesn't seem to dissuade the zdnet.com blog server from using chunking. (I suppose that's because Accept-Encoding only governs the Content-Encoding, not the Transfer-Encoding, which the server picks on its own.)

Author:  hege [ 16 Mar 2007 08:34 ]
Post subject:  Re: Problem with "Transfer-Encoding: chunked"

pk wrote:
I read section 3.6.1 of RFC 2616 again: it appears that "chunked" simply allows an HTTP response to have multiple segments with their own length fields, making it easier to assemble separate streams into a single Web page/document. This would probably be easy to add to HAVP, but I wouldn't much like forking my own branch of the HAVP source to do so.


There are no "separate" streams. It's a single page/file sent in sized chunks. If the Content-Length is not known in advance, the transfer would otherwise be unreliable (the client couldn't tell a complete response from a truncated one).

Anyways, there's no need to make your own branch. If you code it, send a patch. At the moment I have other things to do than add a semi-complex HTTP/1.1 feature because of a few broken sites. :)

Why not just add it to the browser's no-proxy list then..

Author:  pk [ 16 Mar 2007 19:32 ]
Post subject:  Re: Problem with "Transfer-Encoding: chunked"

hege wrote:
There are no "separate" streams. It's a single page/file sent in sized chunks. If the Content-Length is not known in advance, the transfer would otherwise be unreliable (the client couldn't tell a complete response from a truncated one).

What I meant by "separate streams" is concatenating two or more dynamic data sources (e.g., servlet output) into one web page. If you couldn't use "chunked", then you'd have to buffer the sources in order to strip their individual "Content-Length" headers, add the lengths together and generate a single "Content-Length" header with the total length. That uses a lot of memory and adds latency. Or else it gets much more complicated if you use something like multiple simultaneous TCP connections to the data sources in order to read their "Content-Length" headers before their bodies (and that still adds some latency, since you have to wait for all the "Content-Length" headers).
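
To make that concrete: with chunking, the server can emit each piece as soon as it is ready, with no total length known up front. A sketch (not taken from any real server):

    #include <cstdio>
    #include <cstring>
    #include <string>

    // Emit one chunk: hex size line, the data, then CRLF
    // (RFC 2616 section 3.6.1).
    static void WriteChunk( std::string &out, const char *data )
    {
        char size[16];
        std::snprintf( size, sizeof size, "%zx\r\n", std::strlen( data ) );
        out += size;
        out += data;
        out += "\r\n";
    }

    int main()
    {
        std::string body;
        // Two "dynamic sources", streamed as they become available;
        // no buffering is needed to compute a total Content-Length.
        WriteChunk( body, "<p>output of source A</p>" );
        WriteChunk( body, "<p>output of source B</p>" );
        body += "0\r\n\r\n";   // zero-size chunk terminates the body
        std::fputs( body.c_str(), stdout );
        return 0;
    }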

hege wrote:
Anyways, there's no need to make your own branch. If you code it, send a patch. At the moment I have other things to do than add a semi-complex HTTP/1.1 feature because of a few broken sites. :)

Which methods should I modify to do it most cleanly -- I don't understand HAVP's class structure nearly as well as you do. (It *is* annoying that dealing with the Internet always eventually requires dealing with software that doesn't follow standards.)

hege wrote:
Why not just add it to the browser's no-proxy list then..

If the site weren't proxied, then it wouldn't have anti-virus/anti-phish protection. (The only sites I have bypassing HAVP are anti-virus download sites that expose EICAR etc.)

Author:  hege [ 16 Mar 2007 22:54 ]
Post subject: 

Actually, now that I've looked into it a bit and hacked up some quick ugly code, it seems to work. :) The next version will include very experimental support.. I'll give a test version link soon.
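
The core of it is just reading the hex size line in front of each chunk. Very roughly (this is not the actual code):

    #include <cstdlib>
    #include <string>

    // Rough sketch, not the actual HAVP code: each chunk begins with
    // its size as a hex number, optionally followed by ";extension".
    // A size of 0 marks the end of the body.
    static long ParseChunkSize( const std::string &line )
    {
        return std::strtol( line.c_str(), NULL, 16 );
    }

    // The reader loop then takes exactly that many bytes into the scan
    // buffer, skips the trailing CRLF, and repeats until the size is 0.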

Author:  hege [ 17 Mar 2007 18:13 ]
Post subject: 

Try this, the problem seems to be fixed. I didn't have many sites to test with though..

http://havp.hege.li/download/havp-0.86pre.tar.gz

Cheers,
Henrik

Author:  hescominsoon [ 14 Apr 2007 15:40 ]
Post subject: 

How do you make HAVP operate in HTTP/1.1 mode? Would that work around this issue?

Author:  hege [ 14 Apr 2007 16:57 ]
Post subject: 

hescominsoon wrote:
How do you make HAVP operate in HTTP/1.1 mode? Would that work around this issue?


The issue has already been worked around. There is no benefit from full HTTP/1.1 that I can think of right now.

Author:  hescominsoon [ 15 Apr 2007 14:49 ]
Post subject: 

hege wrote:
The issue has already been worked around. There is no benefit from full HTTP/1.1 that I can think of right now.

HTTP/1.1 is the de facto standard across the web. The pipelining abilities and other features are good to have.

Author:  hege [ 15 Apr 2007 15:00 ]
Post subject: 

hescominsoon wrote:
HTTP/1.1 is the de facto standard across the web. The pipelining abilities and other features are good to have.


Its being the "de facto standard" doesn't mean anything here; HAVP doesn't benefit from it. In my own unofficial opinion, HAVP is just a Squid filter, not a standalone application (it does work well enough for home users that way). Thus it gains nothing from HTTP/1.1, since Squid handles all that.

Author:  hescominsoon [ 15 Apr 2007 15:05 ]
Post subject: 

hege wrote:
Its being the "de facto standard" doesn't mean anything here; HAVP doesn't benefit from it. In my own unofficial opinion, HAVP is just a Squid filter, not a standalone application (it does work well enough for home users that way). Thus it gains nothing from HTTP/1.1, since Squid handles all that.

If Squid is in 1.1 and HAVP is in 1.0, doesn't that make Squid have to jump down to 1.0 then?

Author:  hege [ 15 Apr 2007 15:37 ]
Post subject: 

hescominsoon wrote:
If Squid is in 1.1 and HAVP is in 1.0, doesn't that make Squid have to jump down to 1.0 then?


That's probably true until Squid is fully 1.1 compliant. Why don't you go ask them why pipelining is not fully supported.. ;)

Then there is something like Polipo, which can upgrade 1.0 to 1.1. Haven't tried it though.

edit: I don't have any objections to someone patching HAVP to 1.1, but I really don't have time to spend on something I don't find that rewarding.
