Divert sockets

Hi all,

I went through some of the earlier discussions on divert sockets. I also looked at some sample code, such as http://www.loudhush.ro/files/divert.m

My intention is to do something similar to what is described there: set a rule such that I get only TCP packets, and of those only HTTP. Is that possible? The suggested rule is
Code:
00001 divert 8999 tcp from any to any 80 out

I assume the rule says that any packet going from my system to any website is diverted to port 8999, where my application will be listening.

This rule should ensure that I receive only TCP packets. My main interest is the HTTP payload. I want to log all the HTTP headers going out from my browser to the internet, modify them if needed, and reinject them. Since I am dealing only with outbound traffic from my laptop to the internet, using "out" in the rule is appropriate. Please correct me if I have misunderstood something.

The question I have is what is the best way to handle the traffic going out? I mean, assume I restart a browser with multiple tabs. Once the browser comes up, at least 20 to 30 odd connections (HTTP) to the internet are going to take place. Obviously each of these sessions will have a unique source port number, so in my code, the recvfrom() will get all these 20 to 30 connections while running in a loop.

Is it smart to spawn a thread for each individual connection that checks for an HTTP header, modifies it, reinjects the packet on the same port, and exits? Or would a better design be to maintain a hash table of all the connections and use a single thread to do the processing, modification, and write? I don't see how select() can help here.

Another aspect of this design: if I am not filtering on port 80, I will get all TCP packets (including SYN, SYN-ACK, etc.), which I need to reinject quickly without modification. I am only interested in TCP packets with an HTTP payload. Can you suggest the best approach?

Regards,
Varun
 
blazerguns said:
This rule should ensure that I receive only TCP packets. My main interest is the HTTP payload. I want to log all the HTTP headers going out from my browser to the internet, modify them if needed, and reinject them.

Modifying or dropping TCP payload and then reinjecting it is complicated to do correctly. But the fact that you only need to modify outgoing HTTP traffic makes it a bit easier: GET/HEAD and simple POST requests will almost always be sent in a single packet, so handling requests that are split across several packets is usually not necessary.
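
To make the "single packet" idea concrete, here is a minimal sketch in Python of how a diverted packet could be classified. It assumes the divert socket hands you a complete raw IPv4 packet per read; the names `tcp_payload` and `looks_like_http_request` are made up for illustration. Anything that is not a request start (SYN, bare ACK, other protocols) would simply be reinjected unchanged.

```python
# Prefixes that mark the start of the request types discussed above.
HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ")


def tcp_payload(packet: bytes):
    """Return the TCP payload of a raw IPv4 packet, or None if it is not TCP."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None  # too short, or not IPv4
    ip_header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if packet[9] != 6:  # IP protocol number 6 is TCP
        return None
    if len(packet) < ip_header_len + 20:
        return None  # truncated TCP header
    # The high nibble of TCP byte 12 is the data offset in 32-bit words.
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    return packet[ip_header_len + tcp_header_len:]


def looks_like_http_request(packet: bytes) -> bool:
    """True if the packet carries a payload that starts like an HTTP request."""
    payload = tcp_payload(packet)
    return bool(payload) and payload.startswith(HTTP_METHODS)
```

A SYN or a pure ACK has an empty payload, so `looks_like_http_request` returns False for it and the packet can go straight back out.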

blazerguns said:
The question I have is what is the best way to handle the traffic going out? I mean, assume I restart a browser with multiple tabs. Once the browser comes up, at least 20 to 30 odd connections (HTTP) to the internet are going to take place. Obviously each of these sessions will have a unique source port number, so in my code, the recvfrom() will get all these 20 to 30 connections while running in a loop.

Keep in mind that HTTP/1.1 uses persistent ('Keep-Alive') connections by default, so many requests will reuse an existing connection and therefore share its source port.

blazerguns said:
Is it smart to spawn a thread for each individual connection that checks for an HTTP header, modifies it, reinjects the packet on the same port, and exits? Or would a better design be to maintain a hash table of all the connections and use a single thread to do the processing, modification, and write? I don't see how select() can help here.

That would depend on your hardware. If you have only 1 or 2 CPU cores, single-threaded asynchronous handling will be fastest. If you have 4 or more cores, a combination of multithreading and asynchronous handling is fastest. For 4 cores I would go with 2 threads and a shared async pool. It's also a matter of how much time you want to invest in developing this.
 
expl said:
Modifying or dropping TCP payload and then reinjecting it is complicated to do correctly. But the fact that you only need to modify outgoing HTTP traffic makes it a bit easier: GET/HEAD and simple POST requests will almost always be sent in a single packet, so handling requests that are split across several packets is usually not necessary.

I understand, so most GET/HEAD/POST requests will fit in a single packet. OK, so I don't have to bother with requests split across packets.
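
Even with single-packet requests, two things still seem worth handling when a payload is modified: the TCP checksum has to match the new bytes, and the payload length should stay the same, because growing or shrinking it would shift the TCP sequence numbers for the rest of the connection. Below is a rough Python sketch that only allows same-length replacements and recomputes the checksum. The helper names are hypothetical, and whether the kernel or NIC would otherwise fill in the checksum for outgoing packets is platform-dependent and not assumed here.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum used by the IP/TCP checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return total


def tcp_checksum(packet: bytes) -> int:
    """Compute the TCP checksum of a raw IPv4 packet."""
    ihl = (packet[0] & 0x0F) * 4
    segment = bytearray(packet[ihl:])
    segment[16:18] = b"\x00\x00"  # checksum field counts as zero
    # Pseudo-header: source IP, destination IP, zero, protocol 6, TCP length.
    pseudo = packet[12:20] + struct.pack("!BBH", 0, 6, len(segment))
    return ~ones_complement_sum(pseudo + bytes(segment)) & 0xFFFF


def replace_same_length(packet: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new` in the TCP payload without changing its size."""
    if len(old) != len(new):
        raise ValueError("replacement must keep the payload length unchanged")
    ihl = (packet[0] & 0x0F) * 4
    data_off = ihl + (packet[ihl + 12] >> 4) * 4
    out = bytearray(packet[:data_off] + packet[data_off:].replace(old, new))
    out[ihl + 16:ihl + 18] = struct.pack("!H", tcp_checksum(bytes(out)))
    return bytes(out)
```

For example, swapping one User-Agent token for another of equal length leaves every header length field untouched, so only the checksum needs updating.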

expl said:
Keep in mind that HTTP/1.1 uses persistent ('Keep-Alive') connections by default, so many requests will reuse an existing connection and therefore share its source port.

If so, then it's possible that the packets might all come from the same source port and go to different destination addresses. So if I were to maintain a hash table, would it make sense to create a key based on source port and destination address? Maybe I can have a linked list of packets hanging off each hash table entry. Would that work?
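
For what it's worth, a TCP connection is identified by the full 4-tuple (source address, source port, destination address, destination port), so that is probably the safest key. A small Python sketch of such a table follows; a dict stands in for the hash table and a list for the linked list of packets, and the names `connection_key`, `connections` and `remember` are invented for this example.

```python
import socket
import struct
from collections import defaultdict


def connection_key(packet: bytes):
    """Build the 4-tuple that identifies the TCP connection of a raw IPv4 packet."""
    ihl = (packet[0] & 0x0F) * 4
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # The first four bytes of the TCP header are the source and destination ports.
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return (src_ip, src_port, dst_ip, dst_port)


# key -> packets seen on that connection, in arrival order
connections = defaultdict(list)


def remember(packet: bytes) -> None:
    """File a packet under its connection."""
    connections[connection_key(packet)].append(packet)
```

Entries would also need to be removed when a connection closes (FIN/RST) or goes idle, otherwise the table grows without bound; that part is left out here.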

expl said:
That would depend on your hardware. If you have only 1 or 2 CPU cores, single-threaded asynchronous handling will be fastest. If you have 4 or more cores, a combination of multithreading and asynchronous handling is fastest. For 4 cores I would go with 2 threads and a shared async pool. It's also a matter of how much time you want to invest in developing this.

I guess I will have multi-core CPUs. When you say 2 threads and a shared async pool, do you mean one thread that reads all the HTTP packets and puts them in the hash table, and another thread that reads the hash table, processes the HTTP data, and reinjects it? Is my understanding correct? Thanks for your help.

Regards,
Varun
 
blazerguns said:
I guess I will have multi-core CPUs. When you say 2 threads and a shared async pool, do you mean one thread that reads all the HTTP packets and puts them in the hash table, and another thread that reads the hash table, processes the HTTP data, and reinjects it? Is my understanding correct? Thanks for your help.

Regards,
Varun

There are a few ways to do it. For a single browser I don't think it's going to make a big difference, so you might as well go with a single-threaded solution. You only need multithreading once you get into hundreds of simultaneous connections.
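
A single-threaded version can be as small as the loop sketched below (Python, for illustration only). It assumes `sock` is an already opened divert socket bound to the port from the rule (8999 in this thread); how that socket is created is platform-specific and left out. `handle` is a hypothetical callback that returns the packet to reinject, modified or not. The address returned by recvfrom() is passed back to sendto() unchanged, which is how the kernel is normally told where the packet came from.

```python
def divert_loop(sock, handle, max_packets=None, bufsize=65535):
    """Read diverted packets one at a time, process them, and reinject them.

    sock        -- object with recvfrom()/sendto(), e.g. a bound divert socket
    handle      -- callable taking the raw packet and returning the bytes to send
    max_packets -- stop after this many packets (None means run forever)
    """
    seen = 0
    while max_packets is None or seen < max_packets:
        packet, addr = sock.recvfrom(bufsize)
        # Reinject using the same address structure the kernel handed us.
        sock.sendto(handle(packet), addr)
        seen += 1
    return seen
```

With 20 to 30 connections the loop just sees their packets interleaved; no select() is needed because there is only one socket to read from.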

If you still want to go with a multithreaded solution, you can either have a single master thread that distributes incoming packets to 'worker' threads, or just have several worker threads that take turns reading from the same divert socket.
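
As a rough illustration of the master/worker variant, the Python sketch below gives each worker its own queue and picks the queue from a hash of the packet's addresses and ports, so packets of one connection always reach the same worker and stay in order. All names here (`start_workers`, `dispatch`, `stop_workers`, `handle`, `reinject`) are invented for the example, and in CPython the threads mostly show the structure rather than real CPU parallelism.

```python
import queue
import threading
import zlib


def start_workers(n, handle, reinject):
    """Start n worker threads, each draining its own queue."""
    queues = [queue.Queue() for _ in range(n)]

    def run(q):
        while True:
            item = q.get()
            if item is None:  # sentinel: shut this worker down
                break
            packet, addr = item
            reinject(handle(packet), addr)

    threads = [threading.Thread(target=run, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    return queues, threads


def dispatch(queues, packet, addr):
    """Master side: route a packet to the worker that owns its connection."""
    ihl = (packet[0] & 0x0F) * 4
    flow = packet[12:20] + packet[ihl:ihl + 4]  # both addresses plus both ports
    queues[zlib.crc32(flow) % len(queues)].put((packet, addr))


def stop_workers(queues, threads):
    """Ask every worker to exit once its queue is drained, then wait for them."""
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
```

The master thread would call `dispatch` for every packet it reads from the divert socket, and `reinject` would wrap the sendto() call on that same socket.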
 