Having no luck with the Firefox dev team

Hello friend,
I have posted a question on Google Groups, but nobody has answered.

I want to modify Firefox a bit so I can use it as an automated client. When a new version is released, I want to be able to run an automated script against the new source that applies my modifications again; that is why I am looking for stable entry points, something like an API.

I will copy-paste from the Google Groups thread:

"I have not yet looked at the source code, so don't flame me on this.

I have to do this:
1. Request a page and have Firefox process it as normal: it should run all the scripts on the page, fetching and running any remote scripts as well.
2. I do not need to view the page, but I do need to save it to a file.

The functions should be as abstract as possible, something like an API, so that my program survives new releases of Firefox. I also want to be able to strip out everything my program does not need without much hassle, meaning I should not have to dig too deep into the inner layers to remove code; this part is optional.

I will take a look at the source sometime soon and will come back to add my findings to this thread.

Also, I want to be able to access all the variables that a script in a web page can access. I want to access the cookies. I want to modify the strings that Firefox sends when it makes an HTTP GET request.

Thank you for your time."

I have to apologize: I still have not had time to look at the source, since I considered it low priority, but the time to modify it draws close. So I am looking for an experienced coder to offer some insight and hints.

Thank you.
 
I have managed to solve 90% of my problem by using Selenium.
But now I have new problems:
1. I cannot modify the request headers with Selenium.
2. The same goes for the proxy settings.

I think I can work around this with Firefox extensions, though I will have to add a little more functionality to them.
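For the proxy setting, at least, preparing the Firefox profile before launch may be enough; a minimal sketch with the Selenium Python bindings (the preference names are standard Firefox prefs, the proxy address is a placeholder):

# Sketch: a Firefox profile configured before Selenium launches the browser.
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)            # 1 = manual proxy
profile.set_preference("network.proxy.http", "127.0.0.1")  # placeholder proxy
profile.set_preference("network.proxy.http_port", 8118)
# one request header Firefox will happily let you override:
profile.set_preference("general.useragent.override",
                       "Mozilla/5.0 (X11; Linux x86_64)")

driver = webdriver.Firefox(firefox_profile=profile)
driver.get("http://example.com/")

Headers beyond the User-Agent still seem to need an extension (or an intercepting proxy in front of Firefox), which is where the extra functionality would go.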
 
It sounds like you need to take the source, draw a line through it (a very jagged one), and separate it into a shared library and a GUI front-end. Functionally it should be identical to Firefox; however, this will allow you to discard the traditional UI and replace it with the non-interactive interface you have in mind. In my opinion, every program should be designed like that anyway. Easier said than done, though. I think the first step would be to squeeze a shared library into the Firefox source, then pull things into it incrementally (at each stage making sure it can be built and run like a real browser) until the GUI is just a superfluous shell linking to e.g. libfirefox.so.
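Concretely, the end state might let a headless client do something like this; every name below is hypothetical, since no such library exists:

# Hypothetical use of a carved-out libfirefox.so from Python via ctypes.
# Neither the library nor the function exists; this is just the target shape.
import ctypes

lib = ctypes.CDLL("./libfirefox.so")
lib.ff_render_url.argtypes = [ctypes.c_char_p, ctypes.c_char_p]

# "load this URL, run its scripts, write the resulting document to a file"
lib.ff_render_url(b"http://example.com/", b"page.html")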
Kevin Barry
 
It seems Mozilla applications are structured like this:
They are written as a set of XPCOM components that provide the functionality. On top of those components sits a layer of XUL, JavaScript and DTD files that provides the UI (user interface).

The XPCOM components are exposed through IDL interfaces that describe what each component does.

Now I will change the components to add the functionality I require, like changing the request headers for each new connection, and using a proxy from a list.

The UI will be controlled through Selenium.

The next step will be to understand the differences between the Firefox DOM and other browsers' DOMs, if there are any. From what I have seen, even Internet Explorer's request header claims the Mozilla platform, so maybe there are no differences if I am lucky.
 
Fetch?

You say you need to let the page process, but do you really?

If you just need the files, why not use fetch?
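Something like this (Python, just as an illustration; the URL and file name are placeholders) would grab the raw page:

# Fetch a page and save it to a file, no browser and no script execution.
import urllib.request

req = urllib.request.Request("http://example.com/",          # placeholder
                             headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req) as resp, open("page.html", "wb") as out:
    out.write(resp.read())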

I guess unless you really need the page to process whatever scripts it's executing.

Larry
 
What are you trying to accomplish?

Have a look at www/libwww and the corresponding Perl modules. I'm quite sure it can do everything you want.
 
Well, I will share a bit more of my thinking here.
I need Firefox to fool some scripts into believing I am a real person.

The story goes like this:
I want to build the next generation of ad-fraud tool.
(Just for study, or for the hell of it.)

It is composed of two parts:
A) One centralized tool for proxy discovery and testing.
I have seen many tools released with these functions, but none to my liking. My tool will be coded in C++, using the standard library, templates and whatever else makes it clean, portable and easy to update.

I have already built scripts to test my theories about proxies:
1. Asynchronous server coded in Python (Twisted), with command-line user interaction, that tests how safe a proxy is (see the first sketch after this list).
- It listens on port 80 and waits for a full request header from an IP.
- It analyzes the request and checks whether it is safe (e.g. no X-FORWARDED-FOR or similar headers).
- If there is something alien in the request header, it puts the request in a queue and leaves it for me to resolve. Of course, the script is threaded and does not block waiting on me.
...and much more (brands the proxy, organizes everything into files, makes reports, dynamically fixes problems, etc.).

2. Simple proxy checker, based on the producer-consumer pattern, coded in Python (httplib2 and socks); see the second sketch below.
- It reads proxies from a file and puts them in a queue.
- Worker threads pull proxies from the queue and test them by connecting through them to my asynchronous server.

3. Proxy extractor coded in Perl (the third sketch below shows the idea).
- It extracts proxies from known sites.
- It extracts proxies from progressive links: /proxy1, /proxy2, ...
- It extracts proxies from a page by walking the site as a tree, with the main page as the root. The real depth is the number of pages from the root at which I assert proxies appear; a force depth < real depth tells the script to go a number of pages further into the tree. The crawl never leaves the site and checks that it does not loop. And much more.
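For anyone curious, here are rough Python 3 sketches of the three ideas. They are not my actual scripts; the hosts, ports, file names and patterns are all placeholders.

First, the transparency test server from item 1. It assumes Twisted is installed and flags any header that would reveal the real client behind the proxy (port 80 needs root; use a high port for testing):

# Sketch 1: proxy transparency test server.
from twisted.internet import protocol, reactor

REVEALING = ("x-forwarded-for", "via", "forwarded", "client-ip")

class TransparencyCheck(protocol.Protocol):
    def connectionMade(self):
        self.buffer = b""

    def dataReceived(self, data):
        self.buffer += data
        if b"\r\n\r\n" not in self.buffer:
            return  # keep waiting for the full request header
        head = self.buffer.split(b"\r\n\r\n", 1)[0].decode("latin-1")
        leaks = [ln for ln in head.split("\r\n")[1:]
                 if ln.split(":", 1)[0].strip().lower() in REVEALING]
        verdict = "UNSAFE " + "; ".join(leaks) if leaks else "SAFE"
        print("%s -> %s" % (self.transport.getPeer().host, verdict))
        self.transport.write(b"HTTP/1.0 200 OK\r\n\r\n" + verdict.encode())
        self.transport.loseConnection()

factory = protocol.Factory()
factory.protocol = TransparencyCheck
reactor.listenTCP(80, factory)
reactor.run()

Second, the producer-consumer checker from item 2. It assumes httplib2 and the socks module, and pushes each proxy through a request to the test server:

# Sketch 2: threaded proxy checker (producer-consumer with a queue).
import queue
import threading
import httplib2
import socks

TEST_URL = "http://203.0.113.10/"   # wherever the test server lives
proxies = queue.Queue()

def producer(path):
    with open(path) as f:            # one "host:port" per line
        for line in f:
            line = line.strip()
            if line:
                host, port = line.split(":")
                proxies.put((host, int(port)))

def consumer():
    while True:
        try:
            host, port = proxies.get_nowait()
        except queue.Empty:
            return
        http = httplib2.Http(
            proxy_info=httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, host, port),
            timeout=15)
        try:
            _, body = http.request(TEST_URL)
            print("%s:%s -> %s" % (host, port, body.decode().strip()))
        except Exception as e:
            print("%s:%s -> dead (%s)" % (host, port, e))

producer("proxies.txt")
workers = [threading.Thread(target=consumer) for _ in range(10)]
for w in workers: w.start()
for w in workers: w.join()

Third, the depth-limited, same-site crawl from item 3, in Python rather than Perl. It collects anything that looks like ip:port while refusing to leave the site or revisit a page:

# Sketch 3: depth-limited crawl that stays on one site and avoids loops.
import re
import urllib.parse
import urllib.request

PROXY_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}:\d{2,5}\b")
HREF_RE = re.compile(r'href="([^"]+)"')

def crawl(root, max_depth):
    site = urllib.parse.urlparse(root).netloc
    seen, frontier, found = {root}, [(root, 0)], set()
    while frontier:
        url, depth = frontier.pop(0)
        try:
            html = urllib.request.urlopen(url).read().decode("latin-1")
        except Exception:
            continue
        found.update(PROXY_RE.findall(html))
        if depth < max_depth:
            for link in HREF_RE.findall(html):
                link = urllib.parse.urljoin(url, link)
                # stay on the same site and never revisit a page
                if urllib.parse.urlparse(link).netloc == site and link not in seen:
                    seen.add(link)
                    frontier.append((link, depth + 1))
    return found

print("\n".join(sorted(crawl("http://example.com/", 2))))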


B) The main program, which requests the ad like a real person would.
My first (good) idea for the program was this:
Modify Firefox to connect to a site through a proxy handed out by a local/remote server. The server is something I will code to rank the proxies by their probability of being in use right now; for example, if a proxy is in China, I would very much like it to be used while Chinese people are surfing the web.
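A toy sketch of that selection logic (the proxy data and UTC offsets below are made up; a real version would use GeoIP lookups and observed traffic patterns):

# Weight each proxy by whether it is currently daytime in its region.
import random
from datetime import datetime, timedelta, timezone

PROXIES = [
    {"host": "203.0.113.5:8080", "utc_offset": 8},    # e.g. China
    {"host": "198.51.100.7:3128", "utc_offset": -5},  # e.g. US east coast
]

def weight(proxy):
    local = datetime.now(timezone.utc) + timedelta(hours=proxy["utc_offset"])
    return 10 if 8 <= local.hour <= 23 else 1  # peak while locals are awake

def pick_proxy():
    return random.choices(PROXIES, weights=[weight(p) for p in PROXIES])[0]

print(pick_proxy()["host"])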

Why modify Firefox? I need a W3C-compatible DOM and a good JavaScript engine; Firefox has a W3C DOM and SpiderMonkey.

Why do I need those? Because ad companies use scripts to verify that you are a real person, to check your browser, and much, much more. I need to build some patches and/or add some components to Firefox to make it do what I want.

I solved those problems by studying how Torbutton works, picking up browser-anonymity tricks as I went, implementing a new protocol, etc. I planned on using Selenium RC, a testing tool for web pages, which I was driving from Java. The trouble with that tool is that it does not support working with multiple tabs in Firefox, and even if I modified it, I have no idea how to emulate mouse interaction in all tabs in parallel. This brings me to my second solution, which I think I will implement after I code the proxy tool in C++.

The second solution: I will use Gecko directly, add some components to handle the communication with the server, and hook every function that interacts with the DOM so I can feed my own data to the scripts. As for the interaction, I still have to think about it; I will probably use Perl to talk to a component inside Gecko.
Of course, I will have to analyze the scripts beforehand.

When they are finished, the tools will all be released without any goddamn license, and I don't give a damn. I can give my scripts to anyone who wants them, and so on.

I eagerly await your opinions.
Also, I would like to know whether anyone has experience with WebKit or Chrome/V8 (though I don't like Google :p).
 