Browser bookmark files cleanup

I have a wild bunch of bookmark backups in various formats: JSON, HTML and even SQLite.

I'd like to text-search through all the files and pipe every unique link into a text file that a browser (i.e. Firefox) can import. I have NO concern for tags, timestamps or other metadata; only the URLs matter. Is something like
$ grep <start text> "http" OR "www" <end text> "whatever" > out.file
possible? Does grep work through places.sqlite files?

Another alternative, as advised previously in this forum, is something like: $ sort -u file1 file2
The problem here is that sort does not filter anything: it would keep every line, not just the ones starting with "http" or "www", so the result would be a mess. So maybe the solution is a combination of grep and sort?
 
Quote:
Does grep work through places.sqlite files?

Not directly, but you can use the sqlite3 command-line tool, e.g.:

Code:
[/data/code/nordavind]% sqlite3 db/db.sqlite3 'select name from albums order by name limit 5' | uniq
"...Famous Last Words..."
"...in Death of Steve Sylvester"
'Allelujah! Don't Bend! Ascend!
'Tage Mahal
(II)
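Applied to the Firefox backups themselves, a query along these lines should list only the bookmarked URLs. This is a sketch that assumes Firefox's usual schema: moz_places holds one row per URL, moz_bookmarks rows point at it through their fk column, and type = 1 marks a real bookmark rather than a folder or separator. The function name firefox_bookmark_urls is made up for this example. Since places.sqlite also stores history, the join through moz_bookmarks is what keeps visited-but-not-bookmarked pages out; a plain `SELECT url FROM moz_places` would give you everything. If you point it at the profile of a running Firefox you may get a "database is locked" error, so work on a copy in that case.

```shell
#!/bin/sh
# Sketch: print the bookmarked URLs stored in a Firefox places.sqlite file.
# Assumed schema: moz_bookmarks.fk -> moz_places.id, and type = 1 means
# "bookmark" (folders and separators use other type values).
# DISTINCT drops URLs that are bookmarked in more than one folder.
firefox_bookmark_urls() {
    sqlite3 "$1" 'SELECT DISTINCT p.url
                  FROM moz_bookmarks AS b
                  JOIN moz_places AS p ON b.fk = p.id
                  WHERE b.type = 1;'
}
```

Usage would be something like `firefox_bookmark_urls places.sqlite > urls-sqlite.txt`, one URL per line, ready to be merged with the output of the grep commands further down.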

To discover the database schema, one could use:
Code:
[/data/code/nordavind]% sqlite3 db/db.sqlite3 '.schema' 
CREATE TABLE artists (
                id integer primary key autoincrement,
                name text not null
        );
[...snip...]

Quote:
grep <start text> "http" OR "www" <end text>

You can use an `or' in grep like so:
grep -E '^(http|www)'

The ^ anchors the match to the start of the line (you may not want this). Inside a group, (...), you can use a pipe | as an `or' operator. Note that you need `extended' regular expressions for this (grep -E, or the older egrep).
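The ^ anchor only helps when each URL starts its own line, which is not the case in JSON or HTML backups, where the link sits in the middle of a line. A sketch that instead prints just the matching part with grep -o (plus -h, so file names are not prefixed when several files are given). The helper name extract_urls and the rule "a URL ends at the first quote, angle bracket or whitespace" are my own assumptions, not anything grep defines. It only works on plain-text files; compressed backups such as Firefox's .jsonlz4 files have to be decompressed first.

```shell
#!/bin/sh
# Sketch: print every URL-looking string found anywhere in the given files,
# one per line. -o prints only the matched text, -h drops the file names.
# Heuristic (an assumption): a URL starts with http://, https:// or www.
# and runs until a double quote, <, > or whitespace.
extract_urls() {
    grep -ohE '(https?://|www\.)[^"<>[:space:]]+' "$@"
}
```

Combined with sort, e.g. `extract_urls *.json *.html | sort -u > urls.txt`, this gives the grep + sort combination asked about above.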

Quote:
sort command would not start differentiating from "http" or "www", so the result would be a mess. Therefore, solution might be a combination of (grep + sort) ?

To make www.site, http://www.site and https://site compare equal, you could strip these prefixes with sed (a is your input file):
grep -E '(www|http)' a | sed -E 's#^(https?://)?(www\.)?##' | sort -u
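Stripping the prefixes is fine for comparing links, but the result is no longer something a browser can open, and as far as I know Firefox imports bookmarks from an HTML file (Import Bookmarks from HTML) rather than from a bare list of URLs. A minimal sketch, assuming the Netscape bookmark-file format that this importer reads: instead of stripping anything it gives bare www. links an http:// scheme, removes duplicates with sort -u, and wraps each URL in a <DT><A HREF> entry. The function name urls_to_bookmarks_html is made up for this example.

```shell
#!/bin/sh
# Sketch: read URLs (one per line) on stdin and write a minimal
# Netscape-format bookmarks file on stdout.
# Assumption: bare "www." links get an http:// scheme; other links are
# kept as they are, so http:// and https:// variants stay separate.
urls_to_bookmarks_html() {
    printf '%s\n' '<!DOCTYPE NETSCAPE-Bookmark-file-1>' \
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">' \
        '<TITLE>Bookmarks</TITLE>' '<H1>Bookmarks</H1>' '<DL><p>'
    sed -E 's#^www\.#http://www.#' | sort -u |
    # Escape & and " so each URL is safe inside the HREF attribute.
    sed -e 's/&/\&amp;/g' -e 's/"/\&quot;/g' |
    while IFS= read -r url; do
        printf '    <DT><A HREF="%s">%s</A>\n' "$url" "$url"
    done
    printf '%s\n' '</DL><p>'
}
```

Any list of URLs can be fed to it, for example `grep -ohE '(https?://|www\.)[^"<>[:space:]]+' *.json *.html | urls_to_bookmarks_html > bookmarks.html`, and the resulting bookmarks.html is what you would then import in Firefox.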
 