[Solved] Fastest way to transfer files between servers

Hello all!

I need to transfer a huge number of files between servers. File sizes range from about 15 MB to 1100 MB; these are video fragments from surveillance cameras. The incoming video traffic is 2-4 Gbit/s. The incoming traffic is recorded into fragments, and the fragments are then spread out to the servers.
Have some questions:
  • What is the fastest way to do it? FTP? rsync/rcp/scp? Something else?
  • Is there something to be tuned for this amount of traffic?
 
How many video streams/cameras do you have? This is a huge amount of traffic. I'm using H.265 encoding in the camera itself, which is around 5 Mbit/s @ 30 fps for a 4K camera without sound.
 
That's about 500 Mbyte/second. Given that an individual disk drive cannot sustain reading or writing much above 100 Mbyte/second, this means a parallel-IO disk array. Given that spinning disks give the most bandwidth per dollar, I find it unlikely that SSDs are being used for storage, but perhaps they are.
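If you want to sanity-check that figure, the back-of-envelope arithmetic is simple (rounded numbers, plain shell):

    # 4 Gbit/s of ingest, 8 bits per byte
    echo $(( 4 * 1000 / 8 ))            # -> 500 (MByte/s sustained)
    # spinning disks sustain roughly 100 MByte/s each
    echo $(( (4 * 1000 / 8) / 100 ))    # -> 5 (drives needed just to absorb the stream, before any RAID overhead)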

With a good server, those kinds of speeds are easily reachable; our group was doing 18 GByte/second using a pair of servers (active/active failover) about 10 years ago, but we used several hundred disk drives per server pair (and a heck of a lot of InfiniBand and SAS hardware to get the data plumbed in and out).

From a server hardware viewpoint, 1/2 Gbyte/second doesn't stress the system much. The issues are things like the file system, locking, how to handle deletion, and how to get parallel and cache-coherent access to work efficiently.

At that kind of speed, I would look at the system architecture first; the question of what file transfer protocol to use (rsync versus FTP) is secondary. As a matter of fact, I would move towards a cluster file system model, where data is never moved once it has been stored, since moving data is inefficient.

Your question "is there something to be tuned" is somewhat humorous. Yes, to reach really high speeds, you will have a team of experts spend a year or two tuning the system, using labs with dozens of test systems.
 
The question was about the "fastest way to transfer files between servers". It wasn't about how to store the data on a server with several hundred disk drives.
 
I guess the data on the source server is already compressed, if it's something like mp4. If you can do any further compression before the transfer, that might be worth exploring. For example, can you reduce the video resolution during capture/encoding, or do you need the full resolution? Anything you can do to cut down the volume of data to be transferred must help, provided you have a sufficiently meaty CPU (and possibly a hardware compression coprocessor) on the video capture side. What is the absolute minimum video resolution you need? You can of course get video compression accelerator cards like this, although I don't know if FreeBSD supports them; you would need to explore what FreeBSD supports.
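If you did want to experiment with re-encoding at a lower resolution before the transfer, a rough sketch with ffmpeg (from ports) might look like this; the filenames, target width and CRF value are only placeholders to adjust:

    # downscale to 1280 px wide (height kept proportional), re-encode with H.265,
    # copy the audio stream unchanged; this trades CPU time for transfer volume
    ffmpeg -i fragment_in.mp4 -vf scale=1280:-2 -c:v libx265 -crf 28 -c:a copy fragment_out.mp4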


Once you've captured and compressed it... using ssh for the transfer usually has some overhead because the data has to be encrypted/decrypted. Using FTP, or rsync talking to an rsync daemon instead of tunnelling over ssh, would avoid that; rsync is probably the preferred option nowadays. If the data is already highly compressed then ssh -C may not buy you much additional compression.
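A minimal sketch of the rsync-daemon route; the module name, paths and hostname here are made up, and on FreeBSD the port's config usually lives under /usr/local/etc, so check your install:

    # /usr/local/etc/rsyncd.conf on the storage server (start rsync --daemon, or the port's rc service)
    [video]
        path = /storage/video
        read only = no
        use chroot = yes

    # then push fragments from the source server, without ssh encryption overhead
    rsync -a --whole-file /ramdisk/ready/ rsync://storage1/video/

--whole-file skips rsync's delta algorithm, which only costs CPU when the files being sent are always new.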

To increase the link bandwidth... investigate using the lagg(4) link aggregation driver to obtain a faster link between the source and destination machines. You can aggregate links to get a faster channel. It goes without saying that you want the fastest NICs and switch (if there is a switch and not a direct link) you can get. It all depends on where the bottlenecks are; you need to find that by experiment.
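A minimal lagg(4) sketch for /etc/rc.conf, assuming two ix(4) ports and LACP configured on the switch (interface names and the address are just examples):

    ifconfig_ix0="up"
    ifconfig_ix1="up"
    cloned_interfaces="lagg0"
    ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1 192.0.2.10 netmask 255.255.255.0"

Bear in mind that LACP hashes each flow onto a single member port, so one TCP connection won't exceed the speed of one link; you need several transfers in parallel (or simply a faster NIC) to use the aggregate bandwidth.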

And I guess optimise the ZFS and RAID configuration on the target machine, but ralphbsz has already discussed that. That's probably the bottleneck once you've got the network bandwidth up and the data size minimised. There's some discussion here that might be useful.
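On the ZFS side, a few dataset properties that are commonly worth testing for large, already-compressed video files (the pool/dataset name is hypothetical; benchmark before and after):

    zfs set recordsize=1M tank/video     # large records suit big sequential writes
    zfs set compression=off tank/video   # mp4 fragments won't compress further
    zfs set atime=off tank/video         # skip access-time updates on every read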


Perhaps it's worth experimenting with building a 2-node FreeBSD HAST cluster as the target, which should increase the speed at which you can dump data onto the target.
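If you do explore HAST, the /etc/hast.conf syntax looks roughly like this (hostnames, device and addresses are placeholders; see hastd(8) and the Handbook for the full setup):

    resource video0 {
        on storage-a {
            local /dev/da0
            remote 10.0.0.2
        }
        on storage-b {
            local /dev/da0
            remote 10.0.0.1
        }
    }

HAST is primarily block-level replication between two nodes for failover, so whether it actually helps ingest speed (rather than adding redundancy) is something you'd have to measure.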
 
There is one other question, which is WHY you need to transfer this huge volume of data in the first place. Do you really need to build an archive of the video data? Or is it possible to process the data at source in realtime and only archive the finished data products, and/or only archive the video segments that are interesting? Just a thought. :)
 

Thx for the reply.

1) Everything that could be done for compression or otherwise minimising the size of the files is already done. Nothing more can be done; it is what it is. The first stage is HLS translation (everything in memory, on a RAM disk); the next stage compresses the fragments into mp4 (also in memory) and only then sends them to the storage server, so there is only a single write to disk on the storage server. The fragments are then stored in the cloud with HTTPS access. For now the question is about how to speed up the transfer between servers (see the sketch after this list).

2) Thanks for pointing out the lagg(4) link aggregation driver; I will look into it. Same for HAST, I've never done anything like that before.

3) SSDs aren't the solution because they all have a write-endurance limit. We've calculated that each SSD would last only 3-5 months; after that, SSD degradation starts. For this amount of data it would be cheaper if the drives were made of gold. Classical HDDs are what we use. We really have a lot of cameras and we must store data for 10-20 days, sometimes with 3D part processing.
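Since a single TCP stream often can't saturate a fast (or aggregated) link on its own, one thing worth trying is pushing several fragments in parallel. A rough sketch, assuming the finished mp4 fragments sit under /ramdisk/ready and an rsync module named video exists on a host called storage1 (all made-up names):

    # push finished fragments with 4 parallel rsync streams, 16 files per batch,
    # removing each fragment from the RAM disk once it has been transferred
    find /ramdisk/ready -name '*.mp4' -print0 | \
        xargs -0 -n 16 -P 4 sh -c \
            'rsync -a --whole-file --remove-source-files "$@" rsync://storage1/video/' sh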
 
There is one other question, which is WHY you need to transfer this huge volume of data in the first place. Do you really need to build an archive of the video data? Or is it possible to process the data at source in near realtime and only archive the finished data products, and/or only archive the video segments that are interesting? Just a thought. :)
Handling realtime HLS, video-analysis processing and storage of this huge amount of data is impossible on one server, especially when you handle about 1000 cameras per server while also providing the ability to load archives from the cloud. There has to be a solution for scaling the storage.
 
It sounds pretty sophisticated already. Interesting technical problem. :)
 
Why??? It's just a data storage provider, with the ability to store video for any kind of civilian customer. HLS for drone warfare??? Really???
You claim "civilian" use. Interesting wording, as "commercial" would be the more common term.
Your project has a bad smell about it. Be it military or commercial, it is certainly not at the level of a private user.
But in either case I'd strongly reject supporting any government or corporate surveillance projects.

And speaking out clearly:

Folks, do not be such fools as to support war actors that are attacking the free and open world FreeBSD is part of.
 
You claim "civilian" use. Interesting wording, as "commercial" would be the more common term.
Your project has a bad smell about it. Be it military or commercial, it is certainly not at the level of a private user.
Possibly more of a language barrier (which also makes me suspect that some of the requirements might be a little off?).

I feel that someone doing something dodgy would be using Linux. It's more accessible, more popular and more commonly known. In a war or criminal (or both) setting, someone isn't going to go to the effort of using a niche platform like FreeBSD and focusing on correctness; they'd rather just duct-tape some crap together with Linux if it is just going to explode anyway.

Folks, do not be such fools as to support war actors that are attacking the free and open world FreeBSD is part of.
I don't disagree, though I will note that many tech breakthroughs tend to only come from war. Without conflict, humans tend to just stagnate and sell the same old crap mobile phones to each other like we have done for the last decade.
 