Help with rsync/unrar script

I am trying to come up with a scripted solution to this:


I have a server that functions as a seedbox. I've got it working perfectly, and I've set it to move finished downloads to a COMPLETE/ folder so that I can easily rsync that directory from the remote server to my local box. This is all well and good, but now I have a new problem I'd like to solve.

If I use rsync, I have no problem RETRIEVING the files, but I have to leave the local copy of the downloaded files on my box until I decide to stop seeding them (if I remove it locally, it will just download again as soon as the next rsync runs).

SOOO, what I'm looking to do is this: somehow keep a history of files I have already synced and only download the ones that are not in this history, while at the same time removing items from the history file when they no longer exist on the remote server. After that, I'd also like to be able to take these newly synced files, unrar them and delete everything except the rar'd copy, without having to worry about it all downloading and unraring again.


If anyone has ANY idea of where a good place to start with this is, please let me know.

(I reckon the rsync issue should be tackled first, but I'm kind of stuck.)
 
This unpacking script was posted on a forum I visit. I haven't used it myself, so I can't vouch for it in any way, but hopefully it's useful.

Code:
descene — Mass RAR unpacker

v1.1 (2010-02-21) https://pastee.org/2p6mz
 - Proper handling of *.partXX.rar files
 - Handle up to 1000 RAR parts (foo.r999)
 - Ignores are now case-insensitive

v1.0 (2010-02-21) https://pastee.org/vr6ut
 - Initial version


This script unpacks RARed releases and also copies or moves plain movie files, or
directories containing them, to a destination directory.
THIS SCRIPT COMES WITH NO WARRANTY. IF IT EATS YOUR FILES, THAT IS ENTIRELY
YOUR PROBLEM.

EXAMPLES
   # copy to a destination (add "-m" to move)
   descene -d ~/Videos/Movies ~/bt/completed/Blockbuster.XviD.AC3-NoGroup/

   # unpack in-place and keep the archive files
   descene -d. Blockbuster.XviD.AC3-NoGroup

   # unpack in-place and remove the archive files
   descene -md. Blockbuster.XviD.AC3-NoGroup

NOTES
 - save the script as "descene" and make it executable (chmod a+x descene)
 - you need to have Python >= 2.5 installed (aptitude install python-minimal)
 - "unrar" has to be available on your path
 - should work on any *nix and Windows, but only tested on Ubuntu
 - on Linux, use "easy_install --prefix /usr/local trash-cli" when you want
   files to be trashed rather than instantly deleted
 - use copy mode (no -m) until you feel comfortable with the options and
   things work like you wanted them to; also, use -v to see details of what
   happens


Usage: descene [options] <pathnames>...

Options:
 --version                             show program's version number and exit
 -h, --help                            show this help message and exit
 -q, --quiet                           omit informational logging
 -v, --verbose                         increase informational logging
 -n, --dry-run                         don't do anything, only show what would've been done
 --fix                                 edit this script
 -d DESTDIR, --destination=DESTDIR     destination directory, use '.' for in-place operation
 -m, --move                            move instead of copying
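Per the help text above, descene -md. <dir> unpacks in place and removes the archives. If you'd rather call unrar yourself on the directories that were just copied, the first step is picking out the first volume of each RAR set, since that is the file unrar gets pointed at. A minimal sketch, assuming only the two naming schemes mentioned in descene's changelog (foo.rar with foo.r00, foo.r01 and so on, or foo.partNN.rar); first_volumes is a hypothetical helper, not part of descene:

```sh
#!/bin/sh
# Print the first volume of every RAR set found under a directory.
# Hypothetical helper; assumed naming schemes:
#   foo.rar, foo.r00, foo.r01, ...      -> foo.rar is the first volume
#   foo.part01.rar, foo.part02.rar, ... -> only part 1 is printed
first_volumes() {
    find "$1" -type f -name '*.rar' | while IFS= read -r f; do
        case $f in
            *.part1.rar|*.part01.rar|*.part001.rar)
                printf '%s\n' "$f" ;;   # first part of a .partNN set
            *.part[0-9]*.rar)
                ;;                      # later parts: skip
            *)
                printf '%s\n' "$f" ;;   # old-style set or single archive
        esac
    done
}

# Example use (needs unrar on the PATH; -o- skips files that already exist):
#   first_volumes /tank/nas/dump/torrents/seedbox | while IFS= read -r v; do
#       unrar x -o- "$v" "$(dirname "$v")/"
#   done
```

Because only new releases get copied over in the first place, running this on the freshly copied directories avoids unpacking anything twice.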
 

That's cool, but I'm still stuck on the rsync issue.

My problem is this:

I want to rsync data from a remote server to a local server, but I only want to copy the data once. After it's been copied, I want rsync to ignore that data on later runs, because it won't be on the local machine anymore.

Currently I can get rsync working, but I can't figure out how to make it keep track of what it has already downloaded and avoid downloading it again, unless I leave all the data on the local machine untouched. And I don't want to keep 500 GB of data on my local machine just to keep from redownloading it over and over.
 
rsync compares the files you have, not the files you had ... That is simply not a function of rsync; it has no recollection, history, or been-there-done-that list. If you want something like that, you'll have to write some sort of history function.
 

I know that; that's the problem.

It should be possible to script it with rsync's filter/exclude rules.
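One way to do that with exclude rules: keep a history file of the top-level names already fetched and pass it to rsync with --exclude-from. A minimal sketch, assuming one release per top-level entry of COMPLETE/ and names without newlines; HISTORY, write_excludes and update_history are hypothetical names. Names containing *, ? or [ would be read by rsync as wildcards.

```sh
#!/bin/sh
# Sketch: remember which top-level entries of the remote COMPLETE/ folder
# were already fetched and pass them to rsync as excludes.
# HISTORY and both helper names are hypothetical.
HISTORY=${HISTORY:-$HOME/.seedbox_history}

# Turn "one name per line" into rsync exclude patterns. The leading /
# anchors each pattern to the root of the transfer, so "Foo" does not
# also exclude some nested .../Foo.
write_excludes() {
    [ -f "$HISTORY" ] || : > "$HISTORY"
    sed 's|^|/|' "$HISTORY"
}

# After a successful rsync, everything in the listing taken *before* the
# transfer is on the local side (fetched now or earlier), so the history
# simply becomes that listing: new names are added, and names that are
# gone from the seedbox drop out. Anything that appears between the
# listing and the rsync is fetched again on the next run.
update_history() {
    cp "$1" "$HISTORY"
}

# Rough outline of one run (not executed here):
#   ssh "$RMT" 'ls -1 Complete' > /tmp/listing
#   write_excludes > /tmp/excludes
#   rsync -av --exclude-from=/tmp/excludes "$RMT:Complete/" "$DST/" &&
#       update_history /tmp/listing
```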
 
Maybe a combination of 'touch', 'find' and 'rsync' would work. Every time you rsync, touch a file on the source system (somewhere in /var/tmp, for example), and use 'find files in source directory that are newer than the previously touched file' to rsync those specific files over (and then touch that file again to update the timestamp). This would work just as well with scp, because you're not really synchronising anything anymore, just copying over 'files newer than a certain timestamp'.
 
Actually, that's a good idea.

I never thought of using scp for this. I have no idea exactly how to make it work, but thanks for the idea.


So you're saying I could do something like... create a database using touch? So use find to get the new names, then scp them one at a time, and use touch at the same time (with a pipe) to create a touched file with the same name? I get that part, but how do I make it use those files to skip them the next time? That's what I'm confused about.
 
You may be overcomplicating the problem. Basically, what DD suggests is something like this:

Code:
#!/bin/sh
# SRC and DST are placeholders: the directory to scan and the scp target
for f in `find SRC -newer .timestamp`
do
scp "$f" DST
done
touch .timestamp
 
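Or, using a different approach, you could keep a history of the files copied:

Code:
#!/bin/sh
touch .history    # make sure the history file exists on the first run
for f in `find SRC -type f`
do
# -F: literal match, -x: whole line; record a path only once scp succeeded
grep -qxF -- "$f" .history || { scp "$f" DST && echo "$f" >> .history; }
done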
 
Yep, something like that. Just 'touch' a file right after copying over your files. It can be called anything, but .timestamp sounds functional. Now all you do is tell 'find' to locate files that are newer than the .timestamp file, which should give you all the files that were created or modified since .timestamp was last updated ... which was after you last copied files. The .timestamp file itself has no content; it's just an empty file with one basic function: it holds the timestamp of your last copy process to help 'find' locate newer files.
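One detail worth handling: if the reference file is touched after the copy, anything that lands in SRC between the find and the touch ends up older than .timestamp and is never picked up. A minimal sketch that takes the new reference time before listing and only promotes it once every copy has succeeded; cp stands in for scp so it can run locally, and copy_newer is a hypothetical name. The stamp must live outside SRC, otherwise the start-of-run marker would be listed too.

```sh
#!/bin/sh
# Sketch of the .timestamp scheme without the gap between "find" and
# "touch": the new reference time is taken *before* listing, and replaces
# the old one only after every copy succeeded. cp stands in for scp.
# Usage: copy_newer SRC DST STAMP   (STAMP must not be inside SRC)
copy_newer() {
    src=$1 dst=$2 stamp=$3
    touch "$stamp.new" || return 1
    # First run: no reference yet, so make one that everything is newer than
    [ -f "$stamp" ] || touch -t 197001020000 "$stamp" || return 1
    find "$src" -type f -newer "$stamp" | while IFS= read -r f; do
        cp -p "$f" "$dst/" || exit 1   # leaves the loop's subshell only
    done || return 1                   # keep the old stamp if a copy failed
    mv "$stamp.new" "$stamp"
}
```

Like the scp loop above, this flattens everything into DST; it only illustrates where the touch goes.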
 
Using a history would be possible, but you're opening a can of worms with odd filenames, spaces, non-ASCII characters, etc. You may find it difficult to get a proper match without using additional special regexps, escaped strings, quoted variables, etc. I think using a timestamped reference file is more KISS.
 
No, I think the timestamp idea is perfect.

I will work on that. Thanks so much.

Edit:

OK, I have it working, but in the opposite direction from what I'd like.

Basically, when I rsync now, I run it locally against the seedbox. This is preferred because it doesn't require me to punch a hole in my firewall.

The way I have it working now, I have to scp TO my local machine, like this:

scp somefile wonslung@my.local.address:/somedir

I'd rather run it like this:

scp wonslung@my.remote.address:/somedir/somefile somedir/

Know what I mean?

The problem is, I can't figure out how to run "find" over ssh correctly.

I've run other commands over ssh this way before, but this isn't working the way I'd expect.


EDIT2:

Damn, I feel dumb. I figured it out:

Code:
ssh wonslung@seedbox.example.com 'find /home/wonslung/test/ -newer /home/wonslung/test/.timestamp'
 
OK, I got it working, but I'm a little confused about how to get the variables to work the way I want.

Anyway, this is what works:

Code:
#! /bin/sh
RMT="wonslung@seedbox.example.com"
SRC="/home/wonslung/test/"
DST="/export/home/wonslung/test/"
TMST="'$SRC'/.timestamp"

for f in `ssh $RMT 'find /home/wonslung/test/ -newer /home/wonslung/test/.timestamp'`
do
scp -r "$RMT":$f $DST
done
ssh $RMT 'touch /home/wonslung/test/.timestamp'


This doesn't work:
Code:
#! /bin/sh
RMT="wonslung@seedbox.example.com"
SRC="/home/wonslung/test/"
DST="/export/home/wonslung/test/"
TMST="'$SRC'/.timestamp"

for f in `ssh $RMT 'find $SRC -newer $TMST'`
do
scp -r "$RMT":$f $DST
done
ssh $RMT 'touch $TMST'


I'm sure it has something to do with the fact that I'm running it remotely. Is there any way to pass variables like this?
 
Another question I have is this:

Will this have any problem with unfinished files?

Let's say a file is being moved into the directory I'm running find on, and the move isn't done yet. Will it try to copy the file? Or does the file's timestamp get updated AFTER the move?
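On the timestamp question: a move within one filesystem is a single rename, so find never sees a half-moved file (across filesystems, mv copies and then deletes, so a partial file can be visible). A rename does not change the file's modification time, which is what -newer compares, so a file whose data was finished before .timestamp was touched, but which was moved in afterwards, is skipped. GNU and BSD find also have -cnewer, which compares the inode change time instead; on common Linux filesystems a rename typically updates it (so do chmod and chown, so it may pick up a few extra files). A small local demonstration with made-up names:

```sh
#!/bin/sh
# A rename keeps the mtime, so -newer misses the file, while -cnewer
# (GNU/BSD find, not POSIX) still sees it.
demo=$(mktemp -d)
mkdir "$demo/incoming" "$demo/Complete"
: > "$demo/incoming/episode.rar"
touch -t 200001010000 "$demo/incoming/episode.rar"  # data finished long ago
touch -t 201001010000 "$demo/.timestamp"             # last sync
mv "$demo/incoming/episode.rar" "$demo/Complete/"    # moved in after the sync

by_mtime=$(find "$demo/Complete" -type f -newer "$demo/.timestamp")
by_ctime=$(find "$demo/Complete" -type f -cnewer "$demo/.timestamp")
echo "-newer:  ${by_mtime:-(nothing)}"
echo "-cnewer: ${by_ctime:-(nothing)}"
```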

Hrm... this doesn't seem to be working recursively.

I must have done something wrong.

Edit:

I must be doing something wrong with my variables. When I do it one way, it ends up scp'ing everything; when I do it another way, it doesn't work at all. If I just put the commands in directly, they work. So until I figure out why my variables don't work the way I think they should, I went with this:
Code:
#! /bin/sh
RMT="wonslung@seedbox.example.com"
SRC="/home/wonslung/Complete/"
DST="/tank/nas/dump/torrents/seedbox/"
TMST="'$SRC'/.timestamp"

for f in `ssh wonslung@seedbox.example.com 'find /home/wonslung/Complete/ -newer /home/wonslung/Complete/.timestamp'`
do
scp -r wonslung@seedbox.example.com:"$f" /tank/nas/dump/torrents/seedbox/
done
ssh $RMT 'touch /home/wonslung/Complete/.timestamp'

Of course my variables don't do anything in this script, but it works. I'd like to figure out how to use the variables correctly, though.
 
Also, it's working now, sort of, but when I have a couple of new directories full of files, it copies each directory with all its files and then copies the files inside the directory again, so everything ends up there twice.

Like this:
Code:
wonslung@wonslung-raidz:~$ cd /tank/nas/dump/torrents/seedbox/
wonslung@wonslung-raidz:/tank/nas/dump/torrents/seedbox$ ls
obj-tosh.0.s02e07.nfo           obj-tosh.0.s02e08.r00
obj-tosh.0.s02e07.r00           obj-tosh.0.s02e08.r01
obj-tosh.0.s02e07.r01           obj-tosh.0.s02e08.r02
obj-tosh.0.s02e07.r02           obj-tosh.0.s02e08.r03
obj-tosh.0.s02e07.r03           obj-tosh.0.s02e08.r04
obj-tosh.0.s02e07.r04           obj-tosh.0.s02e08.r05
obj-tosh.0.s02e07.r05           obj-tosh.0.s02e08.r06
obj-tosh.0.s02e07.r06           obj-tosh.0.s02e08.r07
obj-tosh.0.s02e07.r07           obj-tosh.0.s02e08.r08
obj-tosh.0.s02e07.r08           obj-tosh.0.s02e08.r09
obj-tosh.0.s02e07.r09           obj-tosh.0.s02e08.r10
obj-tosh.0.s02e07.r10           obj-tosh.0.s02e08.r11
obj-tosh.0.s02e07.r11           obj-tosh.0.s02e08.rar
obj-tosh.0.s02e07.rar           obj-tosh.0.s02e08.sfv
obj-tosh.0.s02e07.sfv           Tosh.0.S02E07.HDTV.XviD-OBjECT
obj-tosh.0.s02e08.nfo           Tosh.0.S02E08.HDTV.XviD-OBjECT
wonslung@wonslung-raidz:/tank/nas/dump/torrents/seedbox$ cd Tosh.0.S02E07.HDTV.XviD-OBjECT/
wonslung@wonslung-raidz:/tank/nas/dump/torrents/seedbox/Tosh.0.S02E07.HDTV.XviD-OBjECT$ ls
obj-tosh.0.s02e07.nfo  obj-tosh.0.s02e07.r04  obj-tosh.0.s02e07.r09
obj-tosh.0.s02e07.r00  obj-tosh.0.s02e07.r05  obj-tosh.0.s02e07.r10
obj-tosh.0.s02e07.r01  obj-tosh.0.s02e07.r06  obj-tosh.0.s02e07.r11
obj-tosh.0.s02e07.r02  obj-tosh.0.s02e07.r07  obj-tosh.0.s02e07.rar
obj-tosh.0.s02e07.r03  obj-tosh.0.s02e07.r08  obj-tosh.0.s02e07.sfv
wonslung@wonslung-raidz:/tank/nas/dump/torrents/seedbox/Tosh.0.S02E07.HDTV.XviD-OBjECT$

Any ideas how to fix this?

I can remove the -r from scp, but then I end up with just a bunch of files without the directories. I GUESS this is OK, but it isn't exactly what I'm looking for.


Edit:

I can get scp to work for directories like this. I'm wondering if this is going to cause problems, though; I don't know enough about how timestamps work to predict it.

Code:
#!/bin/sh

for f in `ssh wonslung@seedbox.example.com 'find /home/wonslung/Complete/ -type d -newer /home/wonslung/Complete/.timestamp'`
do
scp -r wonslung@seedbox.example.com:"$f" /tank/nas/dump/torrents/seedbox/
done
ssh wonslung@seedbox.example.com 'touch /home/wonslung/Complete/.timestamp'

My main worry is this: if a directory is added inside another directory, does that update the timestamp on the lower-level directory? If so, is this going to make me scp the entire contents of that directory? If so, I'm no better off than where I started.
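A directory's modification time changes when an entry directly inside it is created, removed or renamed; changes deeper in the tree don't propagate upward. So creating a new release directory in TV/ bumps the mtime of TV/ (and the new directory has a fresh mtime of its own), but not the mtime of Complete/. A quick local check with made-up names:

```sh
#!/bin/sh
# Which directories look "newer" after one release is added to TV/?
demo=$(mktemp -d)
mkdir -p "$demo/Complete/TV/Old.Show.S02E06"
# Pretend everything that exists so far is old, and the last sync was in 2010
touch -t 200001010000 "$demo/Complete" "$demo/Complete/TV" \
    "$demo/Complete/TV/Old.Show.S02E06"
touch -t 201001010000 "$demo/.timestamp"

mkdir "$demo/Complete/TV/New.Show.S02E07"   # adds one entry to TV/

# Print paths relative to the demo directory, sorted
newer_dirs=$(find "$demo/Complete" -type d -newer "$demo/.timestamp" |
    sed "s|^$demo/||" | sort)
printf '%s\n' "$newer_dirs"
```

Both Complete/TV and the new release show up, which is why a plain -type d -newer search ends up copying all of TV/.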

Also, this doesn't work for files that are added on their own rather than as part of a directory. That isn't often the case, but I'd like to figure out a way to handle those files too.
 
Yeah, neither of these works because I have a nested directory structure. As soon as a new directory is copied into /home/wonslung/Complete/TV/, it tries to scp the entire structure. Is there any way around this?

I guess I could use this method on the lowest level...

but then I'll need a separate script for each subdirectory of /home/wonslung/Complete/

(one for /home/wonslung/Complete/TV/, one for /home/wonslung/Complete/Movies/ and one for /home/wonslung/Complete/Music/)

Any ideas on this?
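Rather than one script per category, a single loop over the category directories can list just their immediate entries. With -mindepth 1 -maxdepth 1 (supported by GNU and BSD find, not POSIX), find returns new release directories and loose files alike, and nothing from inside a release, which also avoids the copy-twice problem. A minimal sketch; list_new and its argument order are hypothetical:

```sh
#!/bin/sh
# List new top-level entries (release directories *and* loose files) in
# each category directory, without descending into the releases.
# Usage: list_new ROOT STAMP CATEGORY...
list_new() {
    root=$1 stamp=$2
    shift 2
    for sub in "$@"; do
        [ -d "$root/$sub" ] || continue
        find "$root/$sub" -mindepth 1 -maxdepth 1 -newer "$stamp"
    done
}

# On the seedbox this would be called along the lines of
#   list_new /home/wonslung/Complete /home/wonslung/Complete/.timestamp TV Movies Music
# with each printed path then fetched using scp -r.
```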
 
wonslung said:
I'm sure it has something to do with the fact that I'm running it remotely. Is there any way to pass variables like this?

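You're passing the variables in single quotes, which makes them literal: your local shell doesn't expand them, so the remote shell receives the text $SRC, and that variable isn't set over there. Try using double quotes for the remote command.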

Code:
# SRC=something
# echo '$SRC' and "$SRC"
$SRC and something
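Double quotes fix the expansion, but the remote shell still splits the result at whitespace, which matters as soon as a path contains a space. That is what the extra single quotes in TMST="'$SRC'/.timestamp" were doing. A small local sketch; env -i sh -c stands in for the remote shell, and the path with a space is made up:

```sh
#!/bin/sh
# env -i sh -c stands in for the remote shell: like the shell ssh starts on
# the seedbox, it knows nothing about variables set in your local script.
SRC='/home/wonslung/Complete/TV Shows/'   # hypothetical path with a space

# Single quotes: the text $SRC is sent as-is; remotely it is unset/empty.
single=$(env -i sh -c 'printf "[%s]" $SRC')
# Double quotes: expanded locally, but the remote shell splits at the space.
double=$(env -i sh -c "printf '[%s]' $SRC")
# Double quotes plus inner single quotes: one word on the remote side.
# (This breaks if the path itself contains a single quote.)
inner=$(env -i sh -c "printf '[%s]' '$SRC'")

printf '%s\n' "$single" "$double" "$inner"
```

So something like ssh $RMT "find '$SRC' -newer '$TMST'" keeps each path as one word on the seedbox.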
 
Ahh, yeah, I forgot about that.

So this should work:

Code:
#! /bin/sh
RMT=wonslung@seedbox.example.com
SRC=/home/wonslung/Complete/
DST=/tank/nas/dump/torrents/seedbox/
TMST=$SRC.timestamp

for f in `ssh $RMT "find $SRC -type d -newer $TMST"`
do
scp -r $RMT:$f $DST
done
ssh $RMT "touch $TMST"

Is this going to cause me any issues, though? I guess I need to test. I don't want it to try to scp the entire /home/wonslung/Complete/TV directory just because a new file ends up in there.

Edit:

Just tested it, and it has the same problem: if a new directory is copied into /home/wonslung/Complete/TV/, that updates the timestamp on TV/ and I download the entire thing.

Changing it to
Code:
#! /bin/sh
RMT=wonslung@seedbox.wonslung.com
SRC=/home/wonslung/Complete/TV/
DST=/tank/nas/dump/torrents/seedbox/
TMST=$SRC.timestamp

for f in `ssh $RMT "find "$SRC"* -type d -newer $TMST"`
do
scp -r $RMT:$f $DST
done
ssh $RMT "touch $TMST"

This works, but it's not exactly what I was hoping for. This method means I'll need a script like this for each directory, and I'm not sure if there are going to be other unforeseen issues.
 
Also, I've got two other issues I need to figure out.

One: I need to figure out how to run multiple finds and scps on the same directory (one for files and one for directories),

because every so often a single file ends up in /home/wonslung/Complete/TV/ that needs to be scp'd as well. I think I've got the working command:
Code:
for g in `ssh $RMT "find "$SRC"* -type f -maxdepth 0 -newer $TMST"`
(Or is it -maxdepth 1?) Either way, I'm sure it's one of those. I want it to look for single files in /home/wonslung/Complete/TV/ but not in /home/wonslung/Complete/TV/Someotherdir/.
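On -maxdepth 0 versus 1: depth is counted from each starting point. With "$SRC"* the starting points are the entries of TV/ themselves, so -maxdepth 0 is the right one there; with plain "$SRC" as the starting point, TV/ is depth 0 and its entries are depth 1, so you'd use -mindepth 1 -maxdepth 1 (GNU and BSD find). GNU find also warns when -maxdepth comes after a test such as -type, so it's worth putting it first. One difference between the two: the * glob skips hidden entries such as .timestamp, and if TV/ is empty it stays a literal * and find complains. A local check with made-up names:

```sh
#!/bin/sh
d=$(mktemp -d)                    # $d plays the role of .../Complete/TV/
mkdir "$d/Show.S02E07"
: > "$d/Show.S02E07/show.rar"     # inside a release: should not be listed
: > "$d/single.mkv"               # loose file directly in TV/

# Starting points are TV/'s entries, so they sit at depth 0
globbed=$(find "$d"/* -maxdepth 0 -type f)
# Starting point is TV/ itself, so its entries sit at depth 1
direct=$(find "$d" -mindepth 1 -maxdepth 1 -type f)

printf '%s\n' "$globbed" "$direct"
```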


But what is the best way to add it to the script? Should I pipe it behind the first one, or should it go after the first done?
Code:
#! /bin/sh
RMT=wonslung@seedbox.example.com
SRC=/home/wonslung/Complete/TV/
DST=/tank/nas/dump/torrents/seedbox/
TMST=$SRC.timestamp

for f in `ssh $RMT "find "$SRC"* -type d -newer $TMST"`|for g in `ssh $RMT "find "$SRC"* -type f -maxdepth 0 -newer $TMST"`
do
scp -r $RMT:$f $DST
scp $RMT:$g $DST
done
ssh $RMT "touch $TMST"

or
Code:
#! /bin/sh
RMT=wonslung@seedbox.example.com
SRC=/home/wonslung/Complete/TV/
DST=/tank/nas/dump/torrents/seedbox/
TMST=$SRC.timestamp

for f in `ssh $RMT "find "$SRC"* -type d -newer $TMST"`
do
scp -r $RMT:$f $DST
for g in `ssh $RMT "find "$SRC"* -type f -maxdepth 0 -newer $TMST"`
scp $RMT:$g $DST
done
ssh $RMT "touch $TMST"

And the second issue: I need to make sure this script won't run if it's already running. I've handled this in the past with pgrep; I guess I could use the same idea for this.
 
OK, I got the two find commands working. I did it like this:

Code:
#! /bin/sh
RMT=wonslung@seedbox.example.com
SRC=/home/wonslung/Complete/TV/
DST=/tank/nas/dump/torrents/seedbox/
TMST=$SRC.timestamp

for f in `ssh $RMT "find "$SRC"* -type d -newer $TMST|find "$SRC"* -type f -maxdepth 0 -newer $TMST"`
do
scp -r $RMT:$f $DST
done
ssh $RMT "touch $TMST"

I'm not sure exactly how you mean to use the lockfile. I used something like this before:
Code:
if pgrep -u $USER $SERVICE
then
I'm sure it's something similar, but I'll have to read up on it.

Or do I use "while"?


Edit:

I figured it out:

Code:
LOCKFILE=/home/wonslung/.lockfile
if [ -f $LOCKFILE ]
then
echo "lockfile, exiting"
exit 0
else
touch $LOCKFILE
fi
 
Well done ;)

Don't forget to remove it again somewhere in the process, though.
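Two refinements to the lockfile, for what it's worth: testing with -f and then touching leaves a small window in which two copies can both pass the test, and if the script is interrupted the lockfile stays behind and blocks every later run. mkdir creates a directory or fails because it already exists in a single step, and a trap can remove the lock on the way out. A minimal sketch; LOCKDIR, acquire_lock and release_lock are hypothetical names:

```sh
#!/bin/sh
LOCKDIR=${LOCKDIR:-/tmp/seedbox.lock}

# mkdir either creates the directory or fails because it already exists,
# as one operation, so two copies of the script cannot both get the lock.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR"
}

# At the top of the sync script (not executed here):
#   acquire_lock || { echo "already running, exiting" >&2; exit 0; }
#   trap release_lock EXIT      # runs when the script exits
#   trap 'exit 1' INT TERM      # turn Ctrl-C and kill (not kill -9) into an exit
```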
 
I'm making progress =) Thanks so much for helping me with this.
Code:
#! /bin/bash
LOCKFILE="/tmp/seedbox.lockfile"
RMT=wonslung@seedbox.example.com
SRC=/home/wonslung/Complete/TV/
DST=/tank/nas/dump/torrents/seedbox/
TMST=$SRC.timestamp
if [ -f "$LOCKFILE" ]
then
    echo "lockfile, exiting"
    exit 0
else
    touch "$LOCKFILE"
fi
for f in `ssh $RMT "find "$SRC"* -type d -newer $TMST|find "$SRC"* -type f -maxdepth 0 -newer $TMST"`
do
    scp -r $RMT:$f $DST
done

rm -f "$LOCKFILE" ; ssh $RMT "touch $TMST"

I have another question: if I need to add more to a single line but don't want it to run on so long, how do I break it across lines? Do I use \ ?
 
Yep, end the line with \ and continue on the next line.

Code:
cd /usr/src && \
make cleanworld && make cleandir && \
make -j 4 buildworld && \
(etc.)
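A backslash works the same way inside a double-quoted string, which is handy for the long remote find commands: the backslash and the newline are both dropped, so the string stays on one line. The backslash has to be the very last character (a trailing space after it breaks the continuation), and any indentation on the next line ends up inside the string. A small check using a shortened version of the command from this thread:

```sh
#!/bin/sh
SRC=/home/wonslung/Complete/TV/
TMST=${SRC}.timestamp

# The backslash-newline pair disappears, so this is one single-line command
cmd="find ${SRC}* -maxdepth 0 -type f \
-newer $TMST"

printf '%s\n' "$cmd"
```

The resulting string can be handed to ssh $RMT "$cmd" unchanged.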
 
OK, question: why doesn't this work?

I know it has something to do with the quotes, but I can't pass all four commands. Two work fine, though.
Code:
#! /bin/bash
LOCKFILE="/tmp/seedbox.lockfile"
RMT=wonslung@seedbox.example.com
SRC=/home/wonslung/Complete/TV/
SRC2=/home/wonslung/Complete/Movies/
DST=/tank/nas/dump/torrents/seedbox/
TMST=$SRC.timestamp
if [ -f $LOCKFILE ]
        then
                echo "lockfile, exiting"
        exit 0
else
touch $LOCKFILE 
fi    
for f in `ssh $RMT "find "$SRC"* -type d -newer $TMST|find "$SRC"* -type f -maxdepth 0 -newer $TMST|find "$SRC2"* -type d -newer $TMST|find "$SRC2"* -type f -maxdepth 0 -newer $TMST"`
  do
scp -r $RMT:$f $DST
done

rm -f $LOCKFILE ; ssh $RMT "touch $TMST"
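The culprit here is probably not the quotes but the | between the find commands. A pipe sends the output of the left command to the input of the right one, and find ignores its input, so only the last find's output comes back; in the two-find version, the output that survived was the loose-files list. Separating the commands with ; runs them one after another and keeps all of their output. A local stand-in, with two hypothetical functions in place of the finds:

```sh
#!/bin/sh
# Stand-ins for the two remote find commands
list_dirs()  { echo "TV/Tosh.0.S02E07.HDTV.XviD-OBjECT"; }
list_files() { echo "TV/single-file.mkv"; }

# Pipe: list_files never reads its input, so list_dirs' line is discarded
piped=$(list_dirs | list_files)
# Semicolon: both run in turn and both outputs are captured
sequenced=$(list_dirs; list_files)

printf 'piped:\n%s\nsequenced:\n%s\n' "$piped" "$sequenced"
```

In the script, the remote command can stay the same, with each | between the find commands replaced by ;.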
 