This is an abridged (spam and irrelevant messages removed) version of a thread at Searchlores' phplab forum. Here you can see the initial ideas and inspirations behind this project, the early attempts, and the design decisions.
www.searchlores.org
phplab's forum
Original thread

phproxomitron (18/01/04 17:39:32)
    just to note it somewhere, and see if some may be interested.

    I think it was already suggested somewhere on this board: now that proxomitron is no longer supported (any changes on the ML?), it could be a nice idea to recode it, I mean most of its functionality, in php.

    Packaged with a small webserver (nanoweb for ex) in an easy-to-handle installer, it would be a great tool.
    The config could be done through a web interface that is easy to understand (proxo was a bit confusing imho). And the php language will make it easy to hack and add new modules, like plugging in a proxy retriever/tester. Perhaps it will provide a first 'kernel' to add new features to, and build a GPS - a 'General Portal for Seekers'.
    Add the scrolls and wands, add a mindweb-like utility, a bookmarks manager, and a bit of salt and pepper (ex: bookmarklets to easily inject information into its modules, a SOAP interface to interconnect our personal portals together, etc.), and it will be the tool I/we have dreamt of for years :)

    These are the first thoughts. I'd like to have your opinions and suggestions on this project. I know some guys who will help to code it once the draft and specifications are done.

loki

Re: phproxomitron (18/01/04 19:33:34)
    funny you mention this, I started playing with the same idea yesterday: a php/web based proxomitron-type application; I planned to start this project after I updated the scrolls (Attn: Laurent, my scroll visualiser tool works great, even with all the color options :-)

    I started from an idea like anonyweb, a web-based proxy filter you can configure (options stored locally in a cookie through a separate cookie file)
    and use on any computer without installation. The configuration has an option to load a config file from somewhere on the net (like wherever you save it) and create a config cookie on the pc in question that you are using (eg public library, university...)

    The config could be done with a web interface, through an easy to understand interface (proxo was a bit confusing imho).

    I planned the interface to be something like the opera preferences...

    more to come (sooner or later)
cinix

Re: Re: phproxomitron (19/01/04 01:46:22)
    Funny you mention this, I started playing with the same idea yesterday:

    not my fault, they were talking about synchronicities :)

    I started from the idea like anonyweb, a webbased proxy filter you can configure

    What will be the scope of the configuration? Do you plan to have all the features of proxomitron? More?

    Do you need or want some help, of any sort? I guess you have priority, since you started first :)
    We can share any part of the project, be it modelling/design, implementation, documentation, gfx. As you wish.

    What do you think about the GPS? I'd like to collect all the features seekers would like to have in such a tool.

Loki

Re: Re: Re: phproxomitron (19/01/04 10:54:16)
    I must say, I haven't thought about it much yet. I planned to have most if not all features of proxomitron and then see what else could happen.

    about priority: you don't have to wait for me. I'm currently involved in a "scrolls" project. It seems that you have it planned out a bit better. Don't hesitate and start (maybe with a good concept outline), and tell me where I could be of help.


cinix

I'd like to have your opinion... sorry I can't resist :-). (19/01/04 23:55:45)
    Some ideas here ( brainstorming? ):

    ( Anyone want to make a block-diagram? )

    1.) Should we start writing code from scratch, recycle existing code, or work from existing modules?
    ( Different proxies/filters - ever seen junkbuster's little child, privoxy? It has a web frontend. Or "two" at sourceforge :-)? It does heavy filtering ).

    2.) Header-filters: This is the warming-up part IMO. Once we know how to filter these, after checking the url against the lists, it is easier to move on to web-filters.

    3.) Two ways to use the proxomitron web-filters: converting them before use to something chewable ( like "s/match/replace/" ), which is quick work ( I have something like this ), or writing an inner parser. It would be interesting to measure this parser at work - as a proxy, maybe serving several clients.

    4.) My Proxo died with more than 32768 filters ( :-)). Should we make it robust enough
    to handle even natural-language translations? Or do it through some chaining/threading? It may work in two steps, with
    a.) word-pairs and
    b.) grammar-fixing
    ( Mine works with "a", and gives a good approximation, in languages I can't even read because of missing fonts. ) The lazier way is to reuse the SLS page and query through those servers. ( But if we value independence, it is better to have our own code ).

    5.) Normal places fish out your automated-looking queries ( 10000/s ), and block the "attacker". If it is going to be a public project, it needs some resource-saving part against abuse.

    6.) Is there a chance to make it more powerful? I mean some smarter javascript-code parsing instead of just filtering it. ( There is a javascript-interpreter written in Java ). I don't know if it is worth it.

    6.) IMO it needs a cache ( of the unfiltered copy ). When you test a filter you can't refresh the page 20 times - it's a waste of resources. ( Better, it needs a cached copy of the filtered page too ).

    7.) More fun:
    a.) Logging to file ( I love logs )! ( Then feed your web-path with 'walrus' :-) )
    b.) Filter-lists
    Semi-automatic updating of the block/bypass/and other lists. ( Here is an example: my proxy copies all the files to paths named after their origin. Now if I check them for size, the ones with one tiny gif jump out - they are candidates for the next blocklist. But they can be read from a logfile too. ) Also maybe we already know sites where there are blocklists to download :-).

    8.) More of our own url-commands ( beware: I put a button on every page with "bweb..'thispage'". If I click on it, it goes out with the referer string, so the correct way is to catch and filter them before they go out ). Like one to check the url through google's "inurl", and another to do the same in archiveweb.

    9.) Page-rewriter to wap-tools? Also, what to do with the things Proxo in itself doesn't touch, but php is able to do wonders with? Textfiles, pdfs, doc, xmls.
    There is that image-tinker tool someone mentioned, which catalogues/searches images by content? What about feeding all your incoming images to it? That would be a big fat proxy of yours/ours :-).

    10.) A keyword-extractor of some sort in a proxo-filter. Let's say you search a scientific territory from zero, and find the PAGE which looks like the perfect start. Now extract the "important" words from it and use them in the next turn?

    11.) Webarchive-like url-rewriting for archiving purposes? Or any useful archiving technique. ( Again: check out "two". It fills a db with the distilled stuff. )

have

my opinion (20/01/04 13:32:18)
    i will now tell you that proxo can already do a lot of these things ;)

    of course all of it in one program is nice. but remember cinix's latest essay,
    we can already do a LOT of the things you ask for here..

     3.) Two ways to use the proxomitron web-filters: converting them before use to something chewable ( like "s/match/replace/" ), which is quick work ( I have something like this ), or writing an inner parser. It would be interesting to measure this parser at work - as a proxy, maybe serving several clients.

    proxo has a special language (brrr parsers, studying that stuff right now,
    exam tomorrow - yuck - the most important thing i learned is that writing a
    compiler/parser is HARD). anyway, this language is a simplified variation
    of regexps, better readable, just as powerful, and with special commands to
    filter html-specific things.
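
    ( for comparison - not proxo syntax, just a rough sketch of what the same kind of
    html-specific job looks like in raw php regexes, which is what a php port would
    probably fall back on; the filter and patterns here are made up: )

    <?php
    // the kind of job a typical proxo web-filter does, done with preg_replace
    function filter_body($sBody)
    {
        // remove <blink> tags but keep the text inside them
        $sBody = preg_replace('#</?blink[^>]*>#i', '', $sBody);
        // strip target="..." from links so they stay in the same window
        $sBody = preg_replace('#(<a\b[^>]*?)\s+target="[^"]*"#i', '$1', $sBody);
        return $sBody;
    }
    ?>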

     5.) Normal places fish out your automated-looking queries ( 10000/s ), and block the "attacker". If it is going to be a public project, it needs some resource-saving part against abuse.

    i thought it was meant to run on a local php server?

     6.) Is there a chance to make it more powerful? I mean some smarter javascript-code parsing instead of just filtering it. ( There is a javascript-interpreter written in Java ). I don't know if it is worth it.

    proxo is already quite powerful like it is, but adding some real
    imperative programming power to transform webpages will give it some
    real interesting applications..

     7.) More fun:
    a.) Logging to file ( I love logs )! ( Then feed your web-path with 'walrus' :-) )

    proxo can already do this. for my referer-checking experiment a few months
    ago (which totally failed, as some of you may remember :) ) i made a proxo
    header filter that logged every page i visited, plus a special code that also
    appeared in my referer.

    if you wish i can post the source of the filter how i did it.

     b.) Filter-lists
    Semi-automatic updating of the block/bypass/and other lists. ( Here is an example: my proxy copies all the files to paths named after their origin. Now if I check them for size, the ones with one tiny gif jump out - they are candidates for the next blocklist. But they can be read from a logfile too. ) Also maybe we already know sites where there are blocklists to download :-).

    not sure what you exactly want to do here, but proxo also works with block-
    lists.. check the bannerblaster filters etc.

     8.) More of our own url-commands ( beware: I put a button on every page with "bweb..'thispage'". If I click on it, it goes out with the referer string, so the correct way is to catch and filter them before they go out ). Like one to check the url through google's "inurl", and another to do the same in archiveweb.

    other ways are: javascript-bookmarklets to put as buttons on your Personal Bar,
    or the custom protocol handler in cinix's essay sounds very promising as well :)

    your other suggestions are also nice..

    - ritz
ritz

Re: my opinion (20/01/04 23:36:19)
    Q:i thought it was meant to run on a local php server?
    A:If the architecture ( like a php-filter/proxyserver ) is already able to work remote/online, it can't be wrong to plan it against abuse.

    Q:if you wish i can post the source of the filter how I did it.
    A:Yes, thank you.

    Q:Semi-automatic updating of the block/bypass/and other lists.
    A:I mean, you know how an HTTP round looks:
    Client -- please gimme that page! - http://here/itis.htm ( host:here )
    Server -- ok here it is.
    Client( parsing ) -- and please - http://here/nice.gif - too ( host:here )
    Server -- ok here it is.
    And now from the same page:
    Client -- Hey gimme http://bloodyads/ugly.gif ( host:bloodyads ) <--- Now while we can catch them because they are "outer" links from "here" ( maybe Opera does this ), what I'm trying to say is that we can also catch them by "pattern" ( aaaaBaaaa :-) ), so maybe with a button/link click we can ask this proxo-thingy to:
    - parse its logfile
    - fish out the badguys
    - write/append the found strings to its blockfile ( even re-sort it )
    ( - maybe reread the blockfile ). So the deal is a living update from its log.
    But like you pointed out, maybe we are already able to do this.
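
    ( a rough php sketch of that button, just to make it concrete - the logfile/blockfile
    names and the "badguy" pattern are made up, and it assumes one fetched url per line
    in the log: )

    <?php
    // parse the logfile, fish out the badguys by pattern, append the new ones to the blockfile
    $aBlock = file_exists('blockfile.txt') ? array_map('trim', file('blockfile.txt')) : array();
    $aLog   = file('proxy.log');                       // one url per line ( assumption )
    foreach ($aLog as $sLine) {
        $aUrl  = parse_url(trim($sLine));
        $sHost = isset($aUrl['host']) ? $aUrl['host'] : '';
        // the "pattern" part: hostnames that smell like ad servers
        if ($sHost != '' && preg_match('/(^|\.)(ads?|banners?|count(er)?)\./i', $sHost)
            && !in_array($sHost, $aBlock)) {
            $aBlock[] = $sHost;                        // a candidate for the blocklist
        }
    }
    sort($aBlock);                                     // "even re-sort it"
    $fp = fopen('blockfile.txt', 'w');
    fwrite($fp, implode("\n", $aBlock)."\n");
    fclose($fp);
    ?>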

    Q:Proxo already knows it.
    A:Yes, but we move towards "platform-independence".

    Q:other ways are: javascript-bookmarklets to put as buttons on your Personal Bar
    A:They are not so independent from the platform ( and I myself have less luck with using them :-) ). What I'm dreaming about is something which understands screwed-up webpages like a smart human with a smart browser, but runs on its own from the command-line - let's say our "phproxy" cleans up javascript links like a browser does ( from a browser you can 100% follow an obfuscated url ?! ), then you can chain it with any bot, and it really 'acts' like a human.
have

Re: Re: my opinion (21/01/04 12:57:25)
     Q:if you wish i can post the source of the filter how I did it.
    A:Yes, thank you.

    ok maybe this evening.. i don't have much time :) i only have the
    filter at home.

     Q:Semi-automatic updating of the block/bypass/and other lists.
    A:I mean, you know how an HTTP round looks:
    Client -- please gimme that page! - http://here/itis.htm ( host:here )
    Server -- ok here it is.
    Client( parsing ) -- and please - http://here/nice.gif - too ( host:here )
    Server -- ok here it is.
    And now from the same page:
    Client -- Hey gimme http://bloodyads/ugly.gif ( host:bloodyads ) <--- Now while we can catch them because they are "outer" links from "here" ( maybe Opera does this ), what I'm trying to say is that we can also catch them by "pattern" ( aaaaBaaaa :-) ), so maybe with a button/link click we can ask this proxo-thingy to:
    - parse its logfile
    - fish out the badguys
    - write/append the found strings to its blockfile ( even re-sort it )
    ( - maybe reread the blockfile ). So the deal is a living update from its log.
    But like you pointed out, maybe we are already able to do this.

    so you mean logging the not-blocked
    maybe-ads to a file.. then filtering out (by hand/script i presume)
    the maybe-ads to produce new additions to the original blockfile?

    yea if you can combine a maybe-ad filter (that doesn't filter anything)
    together with the logging script (that i will post soon, but it's also
    in the proxo-manual) you can already do this with proxo.

     Q:Proxo already knows it.
    A:Yes, but we move towards "platform-independence".

    w00t! ;-)

    - ritz
ritz

proxo logging (21/01/04 16:25:31)
    to make a list, you need to create one in the default.cfg file,
    in the [Blocklists] section, add a line that looks like this:
    List.SiteLog = "d:\projects\webmastertrapper\sitelog.txt"

    in this case, the list is called SiteLog and it's
    located at d:\projects\...

    then to add something to the list, you make a filter that looks like this:
    In = FALSE
    
    Out = TRUE
    Key = "User-Agent: mastertrapper (out) "
    Match = "*"
    Replace = "$ADDLST(SiteLog,$DTM(mHc) : \u)Qspeedbot/0.$DTM(mHc) (compatible; http://qcrawl.4x2.net/botinfo.php?page=v$DTM(mHc) )"

    what's it do? everytime you load a page (or an image, anything),
    it adds a line in the Sitelog list that looks like this:
    xxxxxxxx http://whereever.you.was.surfing

    and changes your useragent to look like:
    Qspeedbot/0.xxxxxxxx (compatible; http://qcrawl.4x2.net/botinfo.php?page=xxxxxxxx )

    this was a little project of mine a while back, to see if i could trap
    referrals from webmasters that would click through their (hopefully) online
    logs. the xxxxxxxx is a unique code that changes with the time and the
    connection number, and is also saved in the botinfo.php script (which to
    the user just displays a very general explanation of what spiders do,
    which i copied from somewhere).

    so the idea was that i could see from what unique code someone clicked
    through on my 'explanation' page, and crosscheck that with my SiteLog.txt
    to see what site it came from.

    unfortunately it didn't work. the only hit i got was when i was surfing
    a friend's private apache server with the filter on. he was of course
    surprised how a 'bot' could have found his server ;)

    this was after surfing for about two months with this filter ;)

    it did prove to be useful to have a long list of all the urls i ever visited
    ready for grepping though :) [as i delete my Opera history from time to time]

    hey if anybody has any ideas where i might have gone wrong, or suggestions
    for a better project, i'd love to hear about it :)

    bye,

    - ritz
ritz

proxo logging - Thanks, its my fault (21/01/04 22:47:15)
    While your "project" interest in itself, the point I missed seems to be the line with the "$ADDLST" command which I knew about, just brainless to use it
    for logging. I wanna be a pensioner :-( . So thank you.

    If you have a little time maybe check out "txt2regex.sourceforge.net".

    "...With a simple interface, you just answer to questions and build your
    own RegEx for a large variety of programs, like awk, ed, emacs, grep, perl, php, procmail, python, sed and vim. There are more than 20 supported programs. It's bash so download and run, no compilation needed."
    It work on cygwin with bash >= 2.04.

    It may help in a proxo-php regex converter.
have

Re: proxo logging (23/01/04 13:11:03)
    you mean proxy logging
Phila

eh, no.. (23/01/04 14:14:31)
    proxo is short for the Proxomitron, a program that runs as a local proxy and
    filters your http-traffic according to regexp-like user-defined filters.

    it already implements a lot of the functionality discussed in this thread.

    but proxo is not open-source, and its development is discontinued.. :(

    - ritz
ritz

Re: phproxomitron (20/01/04 09:30:00)
    A nice name, especially to pronounce with a mouth full of dry biscuit :)
    (and no cheating with that lame "pee-eitch-pee" pronunciation, no vowels means no vowels :))

    Sorry, sorry, couldn't resist it :)

    On the topic, thoughts:

    1. We shouldn't be *writing* a proxy, just add filtering capabilities to an existing one! Research some good ready-made open-source php proxies. What are the differences between them? Which should we choose? If none is superior, and several are widely spread, consider writing a wrapper layer, so the code will work with any of the wrapped proxies.

    2. Consider combining multiproxy and proxomitron capabilities. Should this be done on the server side? Or should the client manually chain the phproxomitron with a standalone multiproxy (or another proxy of his choice).

    3. That reminds me - chaining capabilities are a must!

    4. Umm, is php the right tool/language? Does anyone have experience with how constantly running php scripts behave? Are they memory or processor heavy?

    5. We should start work by *designing* the software, then coding it, esp. if there's going to be more than one developer! Otherwise the work will get chaotic and slow, and if no consistent coding standards are followed, maintenance will be hard.

    6. sourceforge vs other shared development environments? What are the alternatives?
Mordred

Re: Re: phproxomitron: some resources (20/01/04 11:07:27)
    http://php.justinvincent.com/home/articles.php?articleID=15
    http://sourceforge.net/projects/php-proxy/

    I had these already local, not tested yet.
cinix

Re: Re: Re: phproxomitron: some resources (20/01/04 15:04:54)
    which article are you referring to for the first url, cinix?
    it seems to be invalid
edd

http://php.justinvincent.com/home/articles.php?articleId=15 (n/t) (20/01/04 16:18:27)

Mordred

Reason for php pop3 proxy (20/01/04 20:14:35)
    Someone asked about continuously running php applications: I thought it could be of interest.

    As for the other: just because it is somewhat older doesn't mean it is of no use here.

    cinix
cinix

php proxies (20/01/04 15:17:55)
    in addition to cinix:

    there were some threads in this forum about proxies and php.

    * php and proxy chaining project
    - http://www.2113.ch/phplab/mbs.php3?num=-1&thread=1040030839#1040030839
    Maxpayne proposed a project and even the guy who created nanoweb (the php web server) posted a message.


    This project deserves a look and it's abandoned (so it sounds good for this project)
    * ANONWEB
    - http://dividuum.de/p/anonweb/
    > in action: http://www.autistici.org/



xcx

Re: PHProxies (20/01/04 17:11:59)
    (I'm afraid the links up to now are pretty useless - one is a pop3 proxy, another is last released in 2002 :( )

    A quick search brought up these, which *seem* better, but someone has to d/l and try them (my RL job prevents me from fully cooperating on this right now :( )

    http://sbp.sufferingfools.net/

    It is a 'browser proxy', not a proxy server, but I guess it won't be hard to convert, what d'you think?
Mordred

Re: Re: PHProxies (27/01/04 15:23:20)
    http://sbp.sufferingfools.net/

    browser proxy : web layer (if there is such a layer)

    if we want to have the http proxy functionalities, we need to start one layer lower, with an http proxy. I had a look at the config file; i think it will be dirtier to work from there. we would need to add a lot of functionality to update the config files. I think it is not a good base to work on the http headers from (but i'm not a good designer/coder).
loki

Re: Re: Re: PHProxies (27/01/04 15:38:58)
    http://lwest.free.fr/doc/php/lib/index.php3?page=net_http_client&lang=en
    "Net_HTTP_Client is an almost complete HTTP Client"

    humm. Maybe it will be possible to build the phproxo on what i called the 'web layer' (adding services to an http client - how to run it locally and make the 'true' clients connect through it ?)
loki

Re: Re: phproxomitron (27/01/04 15:13:10)
     1. We shouldn't be *writing* a proxy, just add filtering capabilities to an existing one! Research some good ready-made open-source php proxies. What are the differences between them? Which should we choose? If none is superior, and several are widely spread, consider writing a wrapper layer, so the code will work with any of the wrapped proxies.

    Yeah, i agree on that. Now, time to test and review those scripts/libs

    questions :
    - usage : local / remote / both ?
    - layer : http / web / both
    - using proxomitron architecture and 'languages' ?

     2. Consider combining multiproxy and proxomitron capabilities. Should this be done on the server side? Or should the client manually chain the phproxomitron with a standalone multiproxy (or another proxy of his choice).

    i think multiproxy (management of proxy lists, tests, rotations etc..) could be a separated module, but using the same kernel, to easily plug them together.

     3. That reminds me - chaining capabilities are a must!

    create a protocol for 'inter-phproxo' communications ?
    managing lists of phproxos willing to chain. But then
    we have to handle trees, and loops :)

    I think this idea can be an objective of the 'multiproxy project'.

     4. Umm, is php the right tool/language? Does anyone have experience with how constantly running php scripts behave? Are they memory or processor heavy?

    Yup, i have a lot of friends running nanoweb as web server, and all seems to work good :)
    You can check the performance tests here : http://nanoweb.si.kz/?p=perf

loki

phproxomitron answers - and more... (28/01/04 03:32:11)
    ANSWERS:
    - usage : local / remote / both ? - BOTH
    - layer : http / web / both ? - MUST BE BOTH
    - using proxomitron architecture and 'languages' ? -
    Architecture:
    IF NOT 'USING' IT, IT IS WORTH CHECKING THE AMOUNT/KIND OF parameters it uses. I repeat: if it is doing the same thing, one can easily write a converter script between a proxo-filter and the new one. This is only one more step: integrating this converter into the new project. Proxo-scripts are both human- and machine-readable, and this is an advantage too.

    Language - regexes:
    Let's check what the differences are between proxo- and php regexes. Here we may need some work. ( I know there is more, but proxo is about perfect - if we want to use something like that, it is a MUST to implement the original's abilities. It doesn't matter what names we call the functions by, but we need about all of them. Maybe if it is too difficult, we can go back to Proxo Naoko3.* - that was a little simpler ).
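
    ( just to show the direction, a toy converter for the only two proxo constructs
    I will commit to here - the '*' and '?' wildcards; everything else is escaped
    literally, so treat this as a placeholder only, real proxo matching has much more
    ( ranges, \1 captures, commands ): )

    <?php
    // toy proxo-pattern -> preg converter: handles only '*' and '?'
    function proxo_to_preg($sProxo)
    {
        $sPreg = preg_quote($sProxo, '#');            // escape everything first
        $sPreg = str_replace('\*', '.*?', $sPreg);    // proxo '*' = any run of chars
        $sPreg = str_replace('\?', '.', $sPreg);      // proxo '?' = any single char
        return '#'.$sPreg.'#i';                       // case-insensitive, like proxo
    }
    // echo proxo_to_preg('<img * src="*banner*">');
    ?>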

    Consider combining multiproxy... I think it needs only an HTTP header, and parsing of a proxy-list. ( Using Ritz's idea we can update this kind of thing in about realtime. So find a new proxy - feed it to your list. Also, if someone cares, find some random idea for rotating them. )( Sorry, I also don't see so much profit in paranoid-proxying generally. )

    create a protocol for 'inter-phproxo' communications ? ( I think this can go through the SAME new url-commands I dreamed of, or better, header-commands. My new idea is stego: some kind of pseudo-random session-like strings, or better, just hide them in some of the headers, like an extra "Content-Length: :0121" <--- we catch the malformed header with the command, but no one else cares about it ( as I remember, even an extra whitespace is enough ). And headers maybe are not logged so thoroughly. ( Another: play with the headers' ORDER. I see 8 strings, so we have some chance to mix them - feel it? :-) )

    managing lists of phproxos willing to chain. But then
    we have to handle trees, and loops... ( My opinion - this one later )

    Umm, is php the right tool/language? - If you have some local cache you can chain nanoweb with, then query it with wget. I think for personal use it is good enough. For remote use I think there won't be too many users/filters ( like you set one up for yourself, then tell some friends about it - most likely 2-3 people use it at different times, and not with the most difficult filters or the largest number of them.
    Maybe it needs a spare mode for when it works with its default settings and doesn't want to WRITE/UPDATE files? )

    The web-admin thing - it may be switchable through the url/header commands. Even have some/the same commands available through the command-line ( and/or headers/urls ), and the "GUI".

    I think I'll start to work on this for fun. My choice is nanoweb, which according to the developers' page is ALREADY proxy-capable ( I think it only relays ). At the time of my post their site is down, but I found a mirror at www.arte.unipi.it/Public/misc/ filled with other interesting stuff ( a goldmine actually, with more scripts to work from/learn ).

    Also it is worth deciding what ( name/system ) to call this. phproxo is good for us, but if this project goes outside, maybe we wake some enemies against us/our program ( and they start to learn/block it ). Filtering for personal use is ok, but I saw places where they talked about copyright breaking ( you "modify" the content :-) ). So do we play this with shield&sword, or stealth? If stealth, let's call it a "module for nanoweb" and hide behind it. For me it is good either way :-)!

    And one more: in my Flash MX 2004 bloat-bag I found this gem:
    JavaScript_1_5.zip 884,358 JavaScript Interpreter ( c sourcecode )
    ..."JSRef builds a library or DLL containing the JavaScript runtime (compiler, interpreter, decompiler, garbage collector, atom manager, standard classes). It then compiles a small "shell" program and links that with the library to make an interpreter that can be used interactively and with test .js files to run scripts. The code has no dependencies on the Navigator code."
    So maybe this is a possible extension to any of the filtering projects?


have

Re: phproxomitron answers - and more... (28/01/04 09:14:09)
    hi
    i'm the author of the mod_proxy for nanoweb
    i'm also interested by this project :)
franck

maybe (28/01/04 22:32:52)
    is it maybe possible to hack proxo into doing what we want?

    [let's first not think about the ethics of this, i can make up an excuse
    anytime you want]

    on the other hand, maybe i still don't quite understand what this
    tool is exactly going to do..

    anybody care to draw up a design or something?

    so it's php, proxy, proxomitron, with a webserver..

    so euhm, the webserver is running - serving what kind of websites?
    or is the webserver implementing a proxy? do you need a webserver to run
    a proxy?

    then where does the php come in?

    i can spew all kinds of interesting ideas, but if i don't know what we're
    talking about they may not be very constructive :)

    please somebody be concrete and exact to me :)

    - ritz
ritz

Summary. ( about 9 bigger different task ) (29/01/04 02:26:22)
    So if you READ the thread you can summarize it this way:
    We can force Proxomitron to do a lot of things,
    but we would have to RE it, and program/hack in its native language ( C? ). I
    don't want to. I don't feel any moral problem with this one - Scott left
    his grown-up kid.
    What we need is:
    1.) A real proxy server which is:
    2.) Platform independent.
    3.) Open-source in the original and the phplab meaning.
    Its code is both reachable for all, and made for reading/learning.
    Also:
    4.) Easily extendable to be:
    a.) useful for remote proxy too
    b.) functioning not only like a filter, but have caching functions, then later
    c.) have web-archive like functions ( url-rewriting to YYYYMMDD style directories maybe )
    d.) possibly have an own communication form of some ( like proxo's
    url commands, but maybe a bit more "hidden". )
    e.) chainable!

    At present I can say that we may not need a separate 'web server' and a
    'proxy script', because we have a good candidate to tinker with. It is
    called nanoweb, and it is a php-script ( with some php-dependencies ),
    which IS a web server, and already a basic ( relaying ) proxy. You only
    need a php interpreter (see Note 1) to use it ( of course on top of the
    hardware requirements - net connection ) as BOTH a web server and a proxy.
    ( See APPENDIX also )
    What we need to do the job:

    1. IT WOULD BE BEST IF SOMEONE NICELY COMMENTED THE CODE,
    FOR BETTER LEARNING PURPOSES.

    2. Some coders for the http layer ( or maybe it is ready ).
    ( code to work from - nanoweb - http://nanoweb.si.kz/ )
    3. At this level we have the HEADER filters. A good start is to implement
    a.) the filter file
    b.) the filtering function

    4. Implementing the html parsing, filtering.
    Example code to work from:
    www.arte.unipi.it/Public/misc/phphtmlparser_1_0.tar.gz
    ( And also this phplab )

    5. Implement the filter-config file.
    Example code to work from:
    proxo's own primarily, if possible
    ( nanoweb already opens/reads its config-files so I think it would not be
    too difficult to open one more )
    6. Check the difference between proxo and php regexes ( and rules ).

    7. If it is needed, write converter script ( possibly bidirectional ).

    8. Go after general problems and write filters to them -( Proxo-filter writers )
    Like MIME fixing:
    Example code to work from:
    www.arte.unipi.it/Public/misc/mime_lookup.php.gz
    Or write/dig up and implement - html2wml ( your own wap-gateway ), xml2html, and
    of course natural language filters.


    9. Now comes another big part - caching, rewriting, organizing semi-dynamic content like mbs:


    http://www.2113.ch/phplab/mbs.php3/mb001?thread=1075106564&num=1075306410

    ...I was hoping that I could find the post with the forum's search engine, but since it failed, I had to devise another method of reaching it...

    ...The only solution I found was brute-force: download the entire forum (the OpenSwf section of FlashKit took 100M :) and grep from the local machine.

    The whole matter is nothing much really, just a demonstration of the idea that it's useful to download entire MBS-s. I had even started to design a script system for downloading known forums (i.e. any yahoo group, any phpbb, any vbulletin etc.), but later decided against it - it's not needed that often, and you can do it 'by hand' quickly enough (iirc, I used GetRight's browser, because of its great feature to sort links in the loaded html - by type, size, address, etc.).

    I've posted a script once at the php board, that could download Laurent's mbs (i.e. this one, php, ebmb), but it can be replaced by a single wget command. Well, the script behaved better - it decided whether to download a thread by the date of the last post, while wget has to make a request to the server, but that has meaning only for incremental updates. If someone's interested - it's at the php, no updates since.

    So there are bigger tasks than everyday browsing. What about
    rewriting this very messageboard's 'dynamic' urls, and remapping them to
    our local cache? So we have both the info-path to the original, and the
    old content in static form. Ritz? A proxo-filter for rewriting the urls -
    translating between the client and the boards :-)? And it may also be a
    time/space saving to put the local copies into a database - MySQL?

    Now go to two.sourceforge.net and download/read the documentation.


    Note 1
    Actually you need the "php_sockets.dll" extension also. And you must
    play with the setup :-) if you are not the default type.

    
    

    APPENDIX A. PROXIES

    WE CAN SAY THAT A PROXY IS A WEB SERVER WHICH SERVES CONTENT FROM OTHER MACHINES.

    FIGURE 1. RELAYING
     _________           _________           __________
    !         ! 1 QUERY !  PROXY  ! 2 QUERY !  REMOTE  !
    ! CLIENT  !---------!WEBSERVER!---------!  SERVER  !
    !_________!4 ANSWER !_________!3 ANSWER !__________!

    ( Original purpose - time saving. Good for us if remote - already implemented ! )

    A CACHING PROXY SAVES THE CONTENT OF A REMOTE ANSWER, AND IF IT IS RE-QUERIED, SERVES IT FROM ITS CACHE - SAVING BOTH TIME AND MONEY.

    FIGURE 2. THE CACHING PROXY HAS THE QUERIED DOCUMENT:

     _________           _________            __________
    !         ! 1 QUERY ! CACHING !  NOONE   !          !
    !  LOCAL  !---------!  PROXY  !----------!  REMOTE  !
    !_________!2 ANSWER !_________!CARES NOW !__________!

    ( Great time and bandwidth saving. We need flushing/refreshing for this )

    FIGURE 3. FILTERING PROXY

     _________             _________ 2.FILTERED __________
    !         !  1 QUERY  !FILTERING!  QUERY   !          !
    !  LOCAL  !-----------!  PROXY  !----------!  REMOTE  !
    !_________!4. FILTERED!_________!3. ANSWER !__________!
                  ANSWER


    Let me show you an example: you have a local copy of the rfc-s, so you
    don't even want to link to the remote ones. Now this is how I did it with
    Proxo. ( Badly written filter )
    
    
    Name = "RFC to local link"
    Active = FALSE
    URL = "*"
    Bounds = "rfc*"
    Limit = 256
    Match = "( rfc[#0:5000])\1(.txt|.html|.pdf)\2"
    Replace = "<img src="http://local.ptron/local.gif" width="45" height="16" alt="we have a local copy of \1\2 (html)">"


have

Re: phproxomitron (28/01/04 20:17:34)
    In order to support this very interesting project, i would be pleased to open a dedicated board (or pouche, if you think that's better) on phplab, where people interested in this project -i see there are many- could freely talk about it.

    Let me know if you think this is a good idea.
Laurent

a good idea... at least as memento mori (28/01/04 23:34:01)
    indeed, it would be a good idea to open an ad hoc pouche

    Let's hope this develops into solutions. Lately all our nice projects seem to go astray instead of snowballing :-(

    As we have already seen more than once, alas, when the time cometh to transform projects into deeds, people & friends & seekers & whatsnot suddenly (tend to) disappear "dans la nature".

    But a pouche would be a memento mori of sort, at least :-)

    Go ahead, if you find the time.
fravia+

Thank you - an idea/sketch (29/01/04 03:23:06)
    What about forcing people to do a bit more pre-classifying, for the good of all?
    What I thought of is maybe a preface-like header ( at the TOP OF THE PAGE ) with your:
    1. avoid to go off-topic, use the messageboards for that
    2. try to organize your comments by using ul / li tags. Highlight terms with u, b, i tags.
    and:
    3. Please try to classify, and sign your post by its nature too:

    1.a developing the client-server-server communication part
    1.b knows implementable code examples

    2.a developing HEADER filters
    2.b or know some about it

    3.a developing the content filter-parser part
    3.b knows code examples

    4.a developing filters what kind
    4.b or just find some

    5.a developing cacheing/mirroring functions
    Possible name-caching together with the blocklist(s)
    5.b knows code examples( alternative solutions )

    6.a big stuff ( MySQL - mbs, semi-local SE-s )
    6.b and its already exist code examples

    ( If we have a cleanly identifiable class for each thread, we can make nice different-colored rows which show which parts are moving and which are standing still, re-parse the pouche into links/tools/deadends, whatever, and more... )

    And please keep the name hanging for a little longer. Call this the "php-filtering board ( or pouche )". Read my boggling about it if you want. We/I can migrate the essence of this thread over there too, for a start.
have

pouche created (29/01/04 21:16:57)
Laurent

Design proposal 0.1 (29/01/04 16:17:03)
    ========================[ PHProxo Design document]============================

    Version 0.1

    -------[ Overview ]-

    PHProxo is an http proxy server implemented in php, whose basic purpose is the rewriting of input and output data in an http session, similar to the (now discontinued) Proxomitron local proxy. Its modular design allows other functionality to be plugged into the process of passing and receiving information.


    -------[ Structure ]-


    1. [SM] Server module -- Accepts HTTP requests from the clients, returns (possibly modified) responses from the target server

    2. [CM] Client module -- Forwards (possibly modified) HTTP requests from the client to the server and receives its response

    [Note: these are 'input' and 'output' from the client's viewpoint]
    3. [QM] Re_q_uest plugins - Operate on client's requests

    4. [PM] Res_p_onse plugins - Operate on server's responses





    Data flow of a normal transaction:


    ----[ User ]-------------------[ Proxy ]----------------------[ Remote server ]----

    1. Browser: HTTP request

    2. [SM] accept request
    3. [QM] Process request with output plugins
    4. [CM] issue modified request to remote server

    5. Modified HTTP request
    6. Answer

    7. [CM] accept answer from remote server
    8. [PM] Process answer with input plugins
    9. [SM] forward modified answer to client

    10. Client's browser receives html/error code


    ------[ Server module ]-

    Accepts HTTP requests from the clients, returns (possibly modified) responses from the target server.

    Features:

    1. Process multiple requests simultaneously
    2. Logging
    3. POST-ed data
    4. Password authentication
    5. IP Accept/Deny configuration system (only local, accept limited outside connections, or accept all outside connections)
    6. Cache system


    Possible codebase: nanoweb
    http://nanoweb.si.kz/

    ------[ Client module ]-

    Forwards (possibly modified) HTTP requests from the client to the server and receives its response.

    Features:

    1. Issue multiple requests simultaneously
    2. Persistent connections
    3. POST-ing data
    4. Connect through a proxy (thus we can chain ours to other proxies :)


    Possible codebase:
    http://nanoweb.si.kz/
    http://sbp.sufferingfools.net/
    http://sourceforge.net/projects/php-proxy/
    http://snoopy.sourceforge.net/
    http://anton.concord.ru/ ('street' html parser)

    ------[ Request/response plugins ]-

    The system provides a plugin interface to allow modification of the request headers and response headers/body. On each incoming request/response it enumerates the available plugins (possibly checking against some configuration file which of these are active)


    //example:
    class CRequestPlugin {
        //overload this to modify request headers
        function Process(&$sRequest) {
            return $sRequest;
        }
    };

    class CResponsePlugin {
        //overload this to modify response headers
        function ProcessHeaders(&$sResponseHeaders) {
            return $sResponseHeaders;
        }

        //overload this to modify response body
        function ProcessBody(&$sResponseBody) {
            return $sResponseBody;
        }
    };


    This flexible plugin system allows one to write for example:
    - i/o Plugins for Proxomitron-compatible RE filters
    - preg_replace()-based filters
    - ereg_replace()-based filters
    - custom code filters (i.e. remove all images with proportions 7:1)
    - parse HTML and work on the DOM tree

    It may be reasonable to allow the filtering process to issue more requests to the server (a stupid example: check if a file link in the response page is available for download and add its size to the response page)
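
    For illustration, a minimal sketch of one such plugin (the class name and pattern are invented, assuming only the interface above):

    //hypothetical example: a response plugin that strips <script> blocks
    class CScriptKillerPlugin extends CResponsePlugin {
        //headers pass through untouched
        function ProcessHeaders(&$sResponseHeaders) {
            return $sResponseHeaders;
        }

        //drop every <script>...</script> from the body
        function ProcessBody(&$sResponseBody) {
            return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $sResponseBody);
        }
    };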
Mordred

Server module: Proof of concept (30/01/04 12:54:57)
    warning: This is extremely ugly code; I copied it from
    http://www.zend.com/zend/tut/tutorial-staub3.php
    and then kicked it in the teeth until it started behaving like a proxy

    The purpose of it is to test if it works on different platforms, or what problems arise if not. Note that it outputs various notice and warning messages, ignore them for now. Under win32 you need php_sockets.dll module enabled.

    Usage:
    Tell your browser to use a proxy on localhost:9000, then run this code. I found it easier to run it from opera, while setting IE to use the proxy. the script should terminate after 30 seconds; if not - kill it manually.

    My platform - Xp, php 4.3.4
    (lotta source here, removed)
Mordred

Re: Server module: real code candidate (30/01/04 16:44:25)
    Rewrote this from scratch in a more as-god-intended-it-to-be manner.
    Again, test it at your place, and this time check the code as well:
    are the class interfaces okay, is the naming convention okay, are the right socket functions called with the right params?


    (lotta source here, removed)
Mordred

real code candidate - dumb notes ( nanoweb proxy too ) (31/01/04 02:29:15)
    Just because everybody is doin' somethin', I can tell you my ( thin/dumb ) findings.
    Tried your first code - it was able to write a 1.6M error-log in under 30 sec! Tried your second code - it did something, but please don't wait 'till I clearly understand the whole thing :-) - just keep working. As I see it, it is standalone code without any extra server-layer? Good for us.

    I successfully kicked nanoweb's mod_proxy into working; my url is http://localhost/http://www.2113.ch/phplab/mbs.php3?num=1075477465&thread=1074443972 !
    So its caching is plain nice, but it makes only root dirs, so it needs some recursion code ( this file is "%2Fphplab%2Fmbs.php3" instead of mbs.php3 in a /phplab/ subdir ) - while I'm the archiving type, I may even eliminate the cache-flushing code ( if I want to delete, I do it myself ).
    Dumb advice for dumb people: you need a different IP address (if not a specified port?) for your mod_proxy and client. Now my proxy allows connections from 127.0.0.1 and IT IS serving from 127.0.0.0 ).
have

etc. & nanoweb ;) (31/01/04 11:25:27)
    etc.:
    the first code is NOT *my* code (shiver), even code I write strictly for my own use is incomparably more beautiful :) I only wanted to see how (and if) it would work on other machines; I just made it compile and run (yes, it didn't even compile) and then behave like a proxy that can really be hooked to a browser (i.e. wait for \r\n\r\n and then answer).

    Then it happened that I had some free time, and I rewrote it in a more proper way, not to mention actually *working* - the original code, apart from generating errors like mad, barely checked return conditions, and where it DID check, it did so wrongly (if ($input == null), my ass!).

    nanoweb:
    I don't like the code (erm, why do I put this in first place ... okay forget it), and from what I've read and understood it is not well suited for our purposes. It relies on forking, and boldly makes blocking calls, which means that on Windows every url fetch will be a blocking one.

    *sigh* as it seems we WILL have to write our own proxy server :(

Mordred

Mordred, this stuff simply screams... (31/01/04 14:16:36)
    ...please transform me into an essay!

    Ya know, friends, history is unforgiving for little boys that do not leave crumbles when they venture into their dark forests (although, come to think of it, little birds will steal the crumbles anyway :-(

    But you (should) get a fuzzy warm feeling just coz you'r leaving your crumbles behind.

    So Mordred, Sir :-)
    Please be so kind and put everything down so that even Joe the young surfer, your future friend and helper (and historian?), will be able to follow your interesting, clever paths.
fravia+

ETC. & (nanoweb) ;-), libcurl? Cygwin? (31/01/04 14:25:54)
    *sigh* as it seems we WILL have to write our own proxy server :(
    Never had a bigger problem :-)!
    Please ignore the dumb questions, but answer the interesting ones!

    As I see it, we can still use the given codes ( Nanoweb, sbp ) as cheat-sheets for what to do, and what not. Also I want to extract the info from Proxomitron's what's-new file about its solved errors, to see if there is something to learn from them.

    NEED A LIST OF...
    We must know if our proxy needs any (special) config of php ( check out my table on the other side - we must declare case-insensitive regex support to be compatible with proxo ). Also whether there are version dependencies.
    Also worth mentioning: according to their board/manual, Nanoweb ( PHP ) is able to fork on MSWin with Cygwin - do we need Cygwin-PHP to do the same? Then declare it for the users/manual, to be platform-independently correct.
    "because current PHP.exe versions do not provide a wrapper to the POSIX fork() system call (which exists as a variant at least for NT). If you know how to do, you could compile PHP yourself using the Cygwin GCC to get a fully working version."( there are precompiled binaries out there of course )


    I saw in my PHP-extensions dir a php_curl.dll ( libcurl 7.10 ). Now curl is about as smart as wget or better, and well documented ( except they didn't build it to be so bot/massdownload-like, but this is not SO big a problem ). Build the code around it, using it for the outgoing requests - save a lot of coding?
    "2. [CM] Client module -- Forwards (possibly modified) HTTP requests from the client to the server and receives its response "
    "libcurl is a free and easy-to-use client-side URL transfer library, supporting FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. libcurl supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, kerberos, HTTP form based upload, proxies, cookies, user+password authentication, file transfer resume, http proxy tunneling and more!"



    I'm trying to collect the needed variables for later install/setup, and
    configuration utilities - foolproofing. Also, by looking at the nanoconfig script, I may be able to do something similar for our project ( the config maybe as a separate script ? ).
    I also want to make a table of Proxomitron's 'commands', and maybe possible directions to php equivalents.

    It looks like PHP is bloody strong, and has a lot of existing code to just apply ( like html2xml conversion - a wap gateway ) to do about every imaginable task ( and some we are even unable to imagine ). Like, PHP regexes can match against hexa-codes - binaries! On-the-fly patching :-)? Just wonderful.
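
    ( a two-line illustration of both points - the /i modifier gives the proxo-style
    case-insensitive matching, and \x escapes match raw bytes; the GIF magic is just an example: )

    <?php
    // case-insensitive matching, like proxo filters expect
    $sPage = '<META HTTP-EQUIV="Refresh" CONTENT="0;url=http://bloodyads/">';
    if (preg_match('#<meta[^>]*refresh#i', $sPage)) echo "meta refresh found\n";

    // matching binaries by hex code - here the "GIF8" magic at the start of a gif
    $sData = "\x47\x49\x46\x38\x39\x61";
    if (preg_match('#^\x47\x49\x46\x38#', $sData)) echo "looks like a gif\n";
    ?>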

    More about your code ( possibly dumb questions ):

    $sOutput is what our client gets by default.
    In the place of $sOutput we may have separate variables for each header-string ( or does it collect them from somewhere else? This is the collector of our Header filters' output. ).

    *Buffers - what I know about them is overflow.
    Ideas about exploits?
    Defenses - filter for content&length?
    ( On the other side, possible offensive/hostile functions towards
    really bad guys/spamservers - extra size/content headers/data? )

    Does it need to be changed to accept ranges(*) in the client IP address/port setting?
    Do we need this function, or do we stay more secure without it?
    $nMaxClients=10 - if we want to stay 'secure', wouldn't it be better to leave this around 1 by default? If our client downloads a page in 3-4 threads, does that mean 3-4 clients?

    Continuing broken downloads and/or saving broken stuff - if we are lazy / don't want too much coding, check out existing solutions ( my present cache DELETEs them, which we don't need - a half file is better than nothing ).
    So IMO it must write the data from the net / from its buffer to disk if the connection dies.
    Also ( if the cache and proxy are built together ) there must be a setting to put/plug any filter on the cache's outer side ( most headers and 100% trusted web-filters ) or inner side ( experimental filters? ) ( we don't want accidentally overfiltered trash in our cache - for those it is enough if we see them in our client's output ).


have

What do you plan about the non-blocking bug on win32? (n/t) (26/03/04 17:50:19)

Kriton

Re: What do you plan about the non-blocking bug on win32? (26/03/04 18:47:43)
    What exactly do you mean?

    phproxy (or Philtron or whatever) uses a polling approach:
    Everything runs in a single thread, and from time to time the sockets are polled to see if they are ready for reading or writing, and the corresponding action is taken.
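
    Roughly, the heart of it looks like this (a stripped-down sketch, not the actual phproxy code - names and numbers are made up):

    <?php
    // single-threaded polling loop: one listening socket, many client sockets
    $hListen = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_bind($hListen, '127.0.0.1', 9000);
    socket_listen($hListen);
    $aClients = array();

    while (true) {
        $aRead = array_merge(array($hListen), $aClients);
        $aWrite = null; $aExcept = null;
        // wait (up to 1 sec) until some socket is ready, then handle it
        if (socket_select($aRead, $aWrite, $aExcept, 1) < 1) continue;
        foreach ($aRead as $hSock) {
            if ($hSock === $hListen) {
                $aClients[] = socket_accept($hListen);   // new browser connection
            } else {
                $sData = socket_read($hSock, 4096);      // request data is ready
                // ... parse the request, run the filters, fetch, socket_write() the answer ...
            }
        }
    }
    ?>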

    HTH, if not feel free to ask more :)
Mordred

Re: Re: Server module: real code candidate (31/01/04 11:21:29)
    Works fine under linux (SuSE 8.3) / php 4.3.4 / Netscape browser configured with proxy = localhost:9000

    Where do you plan to insert the process for fetching the requested page (and the eventual filters)? At first I would say in CClient, but this doesn't fit with the socket_select() blocking call, so i believe it has to be handled by CServer::Loop() and use the same socket mechanism (but as a client this time, not a server).

    Maybe the name CClient is confusing, as it may be understood as the client side of the proxy (when the proxy connects to the remote server). May I suggest something like 'CRemoteClient'?

    Just my 2 cents. Keep up the great work !
Laurent

Some more code + answers (31/01/04 17:20:16)
    The source got somewhat big, so I will not post it here.
    Get it here:
    http://smrt.host.sk/server.php.txt

    What's new:
    A prototype interface for CFetcher-s (that's the [CM] client module in my silly design doc). I know it's not yet right, but one has to step on several hoes before getting it right :/

    The idea of having an abstract 'fetcher' is this: you fancy libcurl over Snoopy? No problem - write your CLibCurlFetcher wrapper and it plugs seamlessly into the system. Right now there are two fetchers - a CFileFetcher, that downloads files through the PHP fopen_wrappers (i.e. saying fopen('http://www.google.com','r')). I changed things a bit and it won't work right now (not without some persuasion at least). (Btw I don't know if libcurl can be used here - if it cannot be persuaded to run without blocking)

    The other is a CBlockingFetcher, which 'blocks' when ordered to download a file, i.e. it waits for the transfer to finish before returning execution to the script. Again, it is only for testing purposes; the real fetchers cannot behave like that, but will have to be able to download concurrently.
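
    For the curious, a blocking fetcher is conceptually just this (a sketch under my own naming and interface, the real class in the source may differ):

    <?php
    // conceptual blocking fetcher: grabs the whole document before returning,
    // using php's fopen wrappers - fine for tests, useless for concurrent downloads
    class CBlockingFetcher {
        function Fetch($sUrl) {
            $hFile = @fopen($sUrl, 'r');         // blocks here until connected
            if (!$hFile) return false;
            $sBody = '';
            while (!feof($hFile)) {              // ...and blocks here until the transfer ends
                $sBody .= fread($hFile, 8192);
            }
            fclose($hFile);
            return $sBody;
        }
    };
    ?>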


    have:
    As this code is written, it doesn't need forking anymore. It continually polls the sockets to see if any of them is ready to read or write, and then reads from or writes to it. They say that forking is very memory-expensive, etc. - stuff I don't fully understand. I have little experience with such low-level programming and more or less hack my way through the manual. Maybe we should talk to someone who has a better understanding of how sockets work and how they are best used for our purpose.

    (And hey, there are no dumb questions, although there are dumb answers :)
    About keeping headers in string or array - when I come to code that needs them in an array, I'll think of a proper interface for it. These are very early steps and I just ignore those things that are not needed immediately. I'm making a note of it for future work though :)

    Buffers: afaik in PHP there is no danger of buffer overflow, as it resizes its strings dynamically (as opposed to a "char pBuffer[1024];" in C).
    About kicking back on hostile behaviour - maybe we can, but that's a very high-level matter, which I have no experience with. Ah, no, correction, I have once tried to do a DNS zone transfer with searchlores.org, heehee :))

    We have yet to devise a system for accepting or rejecting remote connections - any ideas are very welcome. Maybe you could check how it's implemented in nanoweb and/or apache?

    About caching, and broken downloads - I hope that your experiments with nanoweb's mod_proxy will prove fruitful, and you'll write/adjust a proper caching module

    Laurent:
    Yes, the names were somewhat vague; that's why I called the things that go and fetch the page the client requested 'Fetchers'. The client connects to the server, which sends its fetchers to download the pages that the client requested.

    The fetchers are separate from the server, so one can easily change/try differently-working ones, and compare them. Maybe in the end we can settle on one fetcher, but in between we can test various ones. Right now they are linked to the clients (each client receives one fetcher to bring documents for him - nice doggie, bring the paper :), but this may easily change should the need arise.

    Fravia+:
    Aye, crumbles are important, esp. if one wants someone else to follow his zigzags through the dark forest ;) After things with the framework more or less settle down, I promise to sit down and document everything, but right now it's like writing on floating sands. And what d'you know, we might even write a nice post mortem one day, instead of ... ugh ... mores mementae(? ;) ).

    Too much foreign languages for today, I'm going to a friend's birthday party!
    Have a nice evening, everyone!



Mordred

will work only for www.google.com (hardcoded ip) (n/t) (31/01/04 17:24:43)

forgot

Question (02/02/04 20:43:52)
    Does anyone know, or can find, or can think of a way to find out, how big a socket is?
    That is - how much memory it takes, what kind of structure it is?
    Is it a mere integer-sized ID, or a larger structure?

    My idea: Use memory_get_usage() before and after creating, connecting, reading,writing and closing a socket and see how the memory changes.
    All I get is "Call to undefined function: memory_get_usage()" :(

    Please, someone help with this one, there are important design decisions that depend on this.
Mordred

Re: Question (02/02/04 21:17:28)
    A quick log based on your real code candidate script :

    memory use at point before creating new CServer MEMORY USAGE (% KB PID ): 0.3 1616 14585
    memory use at point after creating new CServer MEMORY USAGE (% KB PID ): 0.3 1640 14585
    memory use at point after Server->listen MEMORY USAGE (% KB PID ): 0.3 1652 14585
    memory use at point before add_client MEMORY USAGE (% KB PID ): 0.3 1652 14585
    memory use at point after add_client MEMORY USAGE (% KB PID ): 0.3 1656 14585
    memory use at point before pClient->Send MEMORY USAGE (% KB PID ): 0.3 1660 14585
    memory use at point after pClient->Send MEMORY USAGE (% KB PID ): 0.3 1660 14585
    memory use at point before pClient->Send MEMORY USAGE (% KB PID ): 0.3 1676 14585
    memory use at point after pClient->Send MEMORY USAGE (% KB PID ): 0.3 1676 14585

    made by calling


    function memory_use($st)
    {
        global $fLog;
        $st = "memory use at point $st ";
        $my_pid = getmypid();
        $st .= "MEMORY USAGE (% KB PID ): ".`ps -eo%mem,rss,pid | grep $my_pid`;
        $st .= "\n";
        echo($st);
        fwrite($fLog, $st);
    }


    at different locations. No more time for testing now, but you can send me a test script if you like. Just include calls to 'memory_use("whatever");' wherever you want the memory use to be traced.

    ps : code based on php manual "user entry note"
Laurent

Re: Re: Question (03/02/04 13:15:30)
    Umm, this output is in KBytes, too inaccurate for these purposes :(

    Still, I think I found what I needed here:
    http://lxr.php.net/source/php-src/ext/sockets/php_sockets.h#73
    it says:

    #ifndef PHP_WIN32
    typedef int PHP_SOCKET;
    #else
    typedef SOCKET PHP_SOCKET;
    #endif

    typedef struct {
        PHP_SOCKET bsd_socket;
        int        type;
        int        error;
    } php_socket;


    Which means a socket is 3 ints (and all socket functions
    http://lxr.php.net/source/php-src/ext/sockets/sockets.c
    use local buffers, which are NOT associated with the 'socket' object)

Mordred

It lives :) (04/02/04 21:27:06)
    http://smrt.host.sk/phproxy/phproxy.0.1.3.zip

    browsing is a bit slow, and there are unresolved issues with some sockets not closing properly (check the logs), but at least it works :)

    at the moment there are no hooks for plugins (but you're free to experiment where is the right place for it)

    should support GET-ting (google works :), but not POST-ing (posting here won't work). Timeout is set to 60 secs, you may want to increase it for heavier testing.

    I'll be reading for a couple of exams, so I may be off until 13th :/ Wish me luck, and test this script a bit
Mordred