Hello community!
I have been venturing into online dating and would like to optimise the process of finding a match, since the dating site does not offer a keyword search.
To put the time spent eyeballing profiles to better use, I thought I could get ftp/wget to do the work for me.
The plan is to have ftp/wget use my session cookie to scrape profiles while logged in with my ID, and then have a script go through the downloaded files and delete the pages that don't contain my keywords.
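A minimal sketch of the filtering step in that plan, assuming the scraped pages sit in one directory and are named the way wget names them (index.html, index.html.1, ...). The function name prune_nonmatching and the directory layout are hypothetical; grep -L is not in POSIX but is available in GNU and BSD grep.

```shell
#!/bin/sh
# prune_nonmatching DIR KEYWORD
# Delete every saved page under DIR that does NOT contain KEYWORD.
# Hypothetical helper for illustration; it is not part of wget.
prune_nonmatching() {
    dir=$1
    keyword=$2
    # grep -L lists files without a match, -i ignores case,
    # -F treats the keyword as a literal string, not a regex.
    # File names containing newlines are not handled.
    find "$dir" -type f -name '*.html*' \
        -exec grep -LiF -e "$keyword" {} + |
    while IFS= read -r file; do
        rm -- "$file"
    done
}
```

Called as `prune_nonmatching ./profiles hiking`, this would keep only the pages that mention "hiking". Matching is done on the raw HTML, so a keyword that only appears inside markup would also count as a hit.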
Steps taken:
1. Get the cookies for the login session of my ID
At first I tried logging in and saving cookies with wget, but that was unsuccessful (passing --user=foo and --password=bar, as well as --post-data 'user=foo&password=bar', may not work with this website).
I decided to have Firefox store the cookies for me and checked that they were valid with the addon "Cookie Manager". To be extra sure, I used another addon to export the cookies in the Netscape format that wget expects:
wget(1), --load-cookies file - "Load cookies from file before the first HTTP retrieval. file is a textual file in the format originally used by Netscape's cookies.txt file."
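Before handing the exported file to wget, it can help to check that it really looks like a Netscape cookies.txt: seven tab-separated fields per line (domain, subdomain flag, path, secure flag, expiry as a Unix timestamp, name, value). The sketch below is a hypothetical helper, check_cookies, that reports malformed and expired entries. It also flags lines starting with "#HttpOnly_", a prefix that some tools (curl's cookie jar, for one) put on HttpOnly cookies and that a reader treating every "#" line as a comment would skip. Whether that is what happens with this particular export is an assumption, not something the output above confirms.

```shell
#!/bin/sh
# check_cookies FILE [NOW]
# Print one status line per cookie in a Netscape-format cookies.txt.
# NOW is a Unix timestamp and defaults to the current time
# (date +%s is widely available but not POSIX).
check_cookies() {
    file=$1
    now=${2:-$(date +%s)}
    awk -F '\t' -v now="$now" '
        # HttpOnly cookies as written by some exporters
        /^#HttpOnly_/       { print "httponly-prefixed: " $6; next }
        # ordinary comments and blank lines
        /^#/ || NF == 0     { next }
        NF != 7             { print "malformed line " NR; next }
        # an expiry of 0 conventionally marks a session cookie
        $5 != 0 && $5 < now { print "expired: " $6; next }
                            { print "ok: " $6 " (domain " $1 ", path " $3 ")" }
    ' "$file"
}
```

Running `check_cookies allcookies_netscape.txt` would show at a glance whether the session cookie is present, unexpired, and scoped to the right domain and path.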
2. Use the cookies to scrape my own profile as a test, in two ways
For the first attempt I passed the session cookie directly with --no-cookies --header (because other methods have failed for me before):
Code:
wget -vk -e robots=off --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" --no-cookies --header "Cookie: <cookienamehere>=<cookievaluehere>" https://www.happypancake.com/min-sida/
--2018-07-09 17:31:04-- https://www.happypancake.com/min-sida/
Resolving www.happypancake.com (www.happypancake.com)... 104.24.14.10, 104.24.15.10, 2400:cb00:2048:1::6818:f0a, ...
Connecting to www.happypancake.com (www.happypancake.com)|104.24.14.10|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /Error404 [following]
--2018-07-09 17:31:05-- https://www.happypancake.com/Error404
Reusing existing connection to www.happypancake.com:443.
HTTP request sent, awaiting response... 302 Found
Location: /Error404 [following]
--2018-07-09 17:31:05-- https://www.happypancake.com/Error404
Reusing existing connection to www.happypancake.com:443.
HTTP request sent, awaiting response... 302 Found
Location: /Error404 [following]
repeating...
For the second attempt I loaded the exported cookie file with --load-cookies:
Code:
wget -vk -e robots=off --load-cookies allcookies_netscape.txt --user-agent "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" https://www.happypancake.com/min-sida/
--2018-07-09 17:24:03-- https://www.happypancake.com/min-sida/
Resolving www.happypancake.com (www.happypancake.com)... 104.24.15.10, 104.24.14.10, 2400:cb00:2048:1::6818:e0a, ...
Connecting to www.happypancake.com (www.happypancake.com)|104.24.15.10|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /login/ [following]
--2018-07-09 17:24:03-- https://www.happypancake.com/login/
Reusing existing connection to www.happypancake.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html.1’
index.html.1 [ <=> ] 38.31K --.-KB/s in 0.08s
2018-07-09 17:24:03 (506 KB/s) - ‘index.html.1’ saved [39229]
Converting links in index.html.1... 5-50
Converted links in 1 files in 0.002 seconds.
The saved index.html.1 is a page stating "You must be logged in to access this page", so this method didn't work either.
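To compare the two attempts side by side, the redirect chain can be pulled out of a saved wget log (for example one written with wget -o). The sketch below assumes the log has the same shape as the transcripts above: a "--date time-- URL" line per request, an "awaiting response... CODE" line, and an optional "Location:" line. The function name redirect_chain is hypothetical.

```shell
#!/bin/sh
# redirect_chain LOGFILE
# Print one line per request: STATUS URL [-> redirect target]
redirect_chain() {
    awk '
        # request line, e.g. "--2018-07-09 17:24:03-- https://..."
        /^--[0-9]/ { url = $NF }
        # status line, e.g. "HTTP request sent, awaiting response... 302 Found"
        /awaiting response\.\.\. [0-9]/ {
            if (line != "") print line
            s = $0
            sub(/.*awaiting response\.\.\. /, "", s)
            split(s, parts, " ")
            line = parts[1] " " url
        }
        # redirect target, e.g. "Location: /login/ [following]"
        /^Location: / { line = line " -> " $2 }
        END { if (line != "") print line }
    ' "$1"
}
```

For the second attempt this would print a 302 from /min-sida/ pointing to /login/, followed by a 200 for /login/. For the first attempt it would show /Error404 redirecting to itself.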
Conclusion and next steps
I notice that happypancake.com answers with a 302 response, which starts a redirect. If the cookie were accepted, I should end up on my profile instead of on /login/ or /Error404. I'm not sure which attempt came closer, and to go further I have to pinpoint where the problem is.
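One way to narrow this down might be to stop wget at the first response and look at its headers rather than following the redirect. The sketch below uses only standard wget options (--server-response, --max-redirect, --load-cookies, --output-document). The wrapper function probe_redirect and the WGET override for dry runs are hypothetical additions for illustration.

```shell
#!/bin/sh
# probe_redirect URL COOKIEFILE
# Send one request with the cookie file and show the server's headers
# (status, Location, Set-Cookie) without following any redirect.
probe_redirect() {
    url=$1
    cookies=$2
    # Setting WGET=echo prints the command line instead of running it.
    "${WGET:-wget}" --server-response --max-redirect=0 \
        --load-cookies "$cookies" \
        --output-document=/dev/null \
        "$url"
}
```

With `probe_redirect https://www.happypancake.com/min-sida/ allcookies_netscape.txt`, wget should report the first 302 together with its headers and then stop, which shows whether the server answers with new Set-Cookie headers or simply rejects the session.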
Questions
1. Why doesn't passing the name and value of the session cookie work in either attempt?
2. Is there a better way to make ftp/wget handle the redirection, and how can I use Firefox to inspect the requests and responses that are sent when I am redirected to my profile?
3. Are the wget parameters correct in this case?
Thanks!
- - - michael_hackson