Too-smart search engines

wblock@

Developer
Is there a web search engine that will not filter what I enter and actually let me search for things with backslashes in them?

For example, I see some suspicious stuff in the Apache logs, a request for a "file" that is pretty obviously some type of exploit starting with "\xd8\x9b".

Google filters out the backslash. Always. With quotes or +, too.
 
I do not know what filtering you are referring to, but Google does not filter backslashes.
Also, the combination of these two bytes, \xd8\x9b, is the UTF-8 encoding of the Unicode character ARABIC SEMICOLON (U+061B).
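
The claim about the two bytes can be checked with a short Python sketch. It assumes the bytes are meant to be read as UTF-8, which nothing in the log itself guarantees; the variable names are only illustrative.

```python
import unicodedata

# The two bytes from the start of the logged request.
raw = b"\xd8\x9b"

# Assumption: interpret them as UTF-8. Other encodings would give other results.
text = raw.decode("utf-8")
codepoint = ord(text)
name = unicodedata.name(text)

print(f"U+{codepoint:04X} {name}")  # prints "U+061B ARABIC SEMICOLON"
```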
 
wblock@ said:
For example, I see some suspicious stuff in the Apache logs, a request for a "file" that is pretty obviously some type of exploit starting with "\xd8\x9b".

Did you try enclosing your search in double quotes?

For example I entered "\xd8\x9b" including the quotes into the search field of Safari (Mac), and this created the following google URL:

http://www.google.com/search?client=safari&rls=en&q=%22%5Cxd8%5Cx9b%22&ie=UTF-8&oe=UTF-8

Guess what, Google found this thread as the first hit.

That said, double-quoting the search string does not always work the way I intend, either.

Best regards

Rolf
 
expl said:
I do not know what filtering you are referring to, but Google does not filter backslashes.

Try searching on mod_security \x. The backslash is ignored, whether in quotes or preceded by a + or both. As rolfheinrich points out, using a longer string helps. Look at the bolded entries in the results; the backslash is still being filtered out before searching.

Also, the combination of these two bytes, \xd8\x9b, is the UTF-8 encoding of the Unicode character ARABIC SEMICOLON (U+061B).

The complete request string is
Code:
"HEAD /]\xd8\x9b\x1b\xd8\xda\xcb\xd9\x1b\xd8\xdc\xcb\xdc\x19\x19\x8b\xdc\x1e\x19K\x9c\x19\x19 HTTP/1.1"
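
To inspect a request like this one, it can help to turn the logged escapes back into raw bytes. The following is a minimal sketch, assuming the log writes non-printable bytes as \xhh and that no other escapes (such as an escaped backslash or quote) occur in the string; `unescape_logged` is a hypothetical helper name, not part of Apache or any library.

```python
import re

def unescape_logged(logged: str) -> bytes:
    """Turn \\xhh escapes from a log line back into raw bytes.

    Assumption: only \\xhh escapes are present; other escape forms
    are left untouched by this sketch.
    """
    def to_byte(match: re.Match) -> bytes:
        return bytes([int(match.group(1), 16)])

    return re.sub(rb"\\x([0-9a-fA-F]{2})", to_byte, logged.encode("latin-1"))

# Request path exactly as it appears in the log (raw string, so the
# backslashes stay literal).
path = (r"/]\xd8\x9b\x1b\xd8\xda\xcb\xd9\x1b\xd8\xdc\xcb\xdc"
        r"\x19\x19\x8b\xdc\x1e\x19K\x9c\x19\x19")

raw = unescape_logged(path)
print(raw.hex(" "))

# The bytes as a whole are not valid UTF-8, so decode leniently for display.
print(raw.decode("utf-8", errors="replace"))
```

Only the first two escaped bytes form a valid UTF-8 sequence on their own, which is why the lenient decode is used for the full string.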
 
wblock@ said:
... The backslash is ignored, whether in quotes or preceded by a + or both ...

Sorry if I am missing your point, but the Google search engine does exactly what I would expect with backslashes: it URL-encodes them. So, if I enter a single backslash into the search field of http://www.google.com, it produces the following URL:

Code:
http://www.google.com/search?client=safari&rls=en&q=%5C&ie=UTF-8&oe=UTF-8

%5C is the URL-encoded backslash (ASCII 0x5C), and as a matter of fact, the first hit of the search is the Wikipedia page explaining BACKSLASH.

So the backslash is not filtered out but URL-encoded. IMHO, this is simply how it works.
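
The percent-encoding step described here can be reproduced outside the browser. This is a small sketch using Python's standard `urllib.parse`; it only shows how the query string is encoded and says nothing about what the search engine does with the backslash afterwards. The base URL is taken from the posts above, without the Safari-specific parameters.

```python
from urllib.parse import quote, unquote, urlencode

# The search string as typed, including the double quotes.
query = '"\\xd8\\x9b"'

# Percent-encode every reserved character: " becomes %22, \ becomes %5C.
encoded = quote(query, safe="")
print(encoded)  # prints %22%5Cxd8%5Cx9b%22

# Build a search URL with the query as the q parameter.
url = "http://www.google.com/search?" + urlencode({"q": query})
print(url)

# Decoding gives the original string back, so nothing is lost in transit.
roundtrip = unquote(encoded)
```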

Best regards

Rolf
 
rolfheinrich said:
Sorry if I am missing your point, but the Google search engine does exactly what I would expect with backslashes: it URL-encodes them. So, if I enter a single backslash into the search field of http://www.google.com, it produces the following URL:

Code:
http://www.google.com/search?client=safari&rls=en&q=%5C&ie=UTF-8&oe=UTF-8

%5C is the URL-encoded backslash (ASCII 0x5C), and as a matter of fact, the first hit of the search is the Wikipedia page explaining BACKSLASH.

So the backslash is not filtered out but URL-encoded. IMHO, this is simply how it works.

This also looks to be context-sensitive. I can't get a search for "\x" to search for those literal characters. The backslash is in the URL (although not encoded here), but it's not in the results, which are identical to just searching on "x". So rather than saying the backslash is filtered out, it just isn't included in the comparison.

Bing does that also, although it's not as smart about context (or anything, really) and treats a single backslash as nothing. (Like Silverlight, I suspect Bing is just a pretend implementation so somebody can say "See, we've got that too!")
 
I tried searching for the HEAD... string
Let's see: ask.com, dogpile, kvasir.no (Google-powered), metacrawler, search.com and WebCrawler show this thread; Alexa, altavista, gigablast, Lycos and search.yahoo.com don't find a thing; ixquick filters away anything after head...
The strangest result came from WolframAlpha.
What others?
 
fonz said:
Looks promising! It's perfectly possible to stay away from Google in many ways, but web searching is a toughie.

Fonz

DDG is usually great, but 1 search in 50-150 doesn't produce useful results. I resort to Bing then, and sometimes to Google too. Simply put, Google's database is the biggest around and DDG is near the other end of the spectrum. So I think it's hard to really leave the Google search engine, but reducing my traffic there to a fraction of a percent is OK for me.
 
DuckDuckGo works for >90% of my searches in English.

Searching at DDG in my native language (Spanish) is (sadly) a joke.
 
Those of you who need Google but do not want to give them a fingerprint and iris scan each time you look at the screen (well, a print from a different part of the anatomy would ... never mind) should look at Scroogle. They are a proxy for Google, do not keep access logs (their claim), do not set cookies (I have not seen them try), and do not serve ads as a side order.
 
Too bad that you can't create a URL for a particular search in Scroogle.
Right now when I want to use Google, I type "g search_term1 search_term2", which my browser translates to "www.google.com/search?q=search_term1+search_term2" or something like this. I wanted to change the meaning of g to use Scroogle instead, but it doesn't seem possible. :(
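
The keyword shortcut described above amounts to a simple string-to-URL expansion. Here is a rough sketch of that translation; the `KEYWORDS` table and `expand` function are hypothetical names for illustration, not how any particular browser implements it, and the scheme prefix is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical keyword table: shortcut -> search endpoint.
KEYWORDS = {"g": "http://www.google.com/search"}

def expand(typed: str) -> str:
    """Expand 'keyword term1 term2' into a search URL with a q parameter."""
    keyword, _, terms = typed.partition(" ")
    base = KEYWORDS[keyword]
    # urlencode joins the terms with + and percent-encodes reserved characters.
    return base + "?" + urlencode({"q": terms})

print(expand("g search_term1 search_term2"))
# prints http://www.google.com/search?q=search_term1+search_term2
```

This only works when the target site accepts the search terms in the URL itself, which is the limitation the post runs into.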
 
draco003 said:
It seems its search results are dated? I searched for FreeBSD there and saw this:
vYmZhcQ

We're living in 2007 now? :O
 