Server architecture for financial services application

Hi everyone.

I have a side project that could turn into a full-time project, so I wanted to ask what is the best way to go about it. The application needs to do the following:
- accept files via HTTP, FTP or SFTP
- accept files via email
- run a public-facing web server
- run a database (my preferred choice is PostgreSQL) that stores data from the files, does some post-processing, and is used to serve data to the website
I'm currently running PostgreSQL, Nginx and ProFTPd on a FreeBSD virtual instance. ProFTPd saves its activity log to PostgreSQL, which has triggers on the table to process the files. I'm also using another FreeBSD instance with Postfix/Dovecot to run my personal email server, and I've aliased an email address to a Python script that FTPs email attachments to the application server, so email hooks into the FTP workflow.
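For reference, the alias script is conceptually just a few lines, roughly like the sketch below (hostname, credentials and filenames are placeholders, and error handling/retries are omitted):

    #!/usr/bin/env python3
    # Sketch of the alias target: Postfix pipes the raw message to stdin,
    # we pull out any attachments and push them to the application server
    # over FTP. Host, credentials and filenames below are placeholders.
    import sys
    from email import policy
    from email.parser import BytesParser
    from ftplib import FTP
    from io import BytesIO

    FTP_HOST = "app-server.example.com"   # placeholder
    FTP_USER = "ingest"                   # placeholder
    FTP_PASS = "change-me"                # placeholder

    msg = BytesParser(policy=policy.default).parse(sys.stdin.buffer)
    attachments = [p for p in msg.walk() if p.get_filename()]
    if attachments:
        with FTP(FTP_HOST) as ftp:
            ftp.login(FTP_USER, FTP_PASS)
            for part in attachments:
                data = BytesIO(part.get_payload(decode=True))
                ftp.storbinary(f"STOR {part.get_filename()}", data)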

My main question is, if I wanted to turn this into a high availability, production grade architecture, what is the best way to go about it? Is it OK to put everything on a sufficiently large instance, and put each application in a jail? Or should it be split up into multiple servers, one (or multiple for replication) for the DB, one or more for Nginx/ProFTPd/Postfix?
And related to this, any recommendations on where to find FreeBSD sysadmins who could spec out a project like this?

Thanks!
 
Question 1: How high is the throughput? How many files do you expect to be uploaded per second? What is the web server traffic? How complex is the serving (only static pages versus running extensive back-end processes)? That tells us how many servers you need. What is the data volume? That tells us how much storage you need.

Question 2: You said "high availability". What do you mean by that? Or, to be precise, what is your SLA? It could for example be: web server latency P99 less than 1 s, and less than 10 s always, at 5 nines. Maximum outage duration 30 seconds, except in case of a government-declared disaster (hurricane, flood, fire ...), during which a 60-second outage is permissible.
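To make the "nines" concrete, it helps to translate them into permitted downtime per year; a quick back-of-envelope (generic numbers, nothing specific to your service):

    # Permitted downtime per year for N "nines" of availability,
    # assuming a 365-day year.
    SECONDS_PER_YEAR = 365 * 24 * 3600

    for nines in (3, 4, 5):
        downtime_s = SECONDS_PER_YEAR * 10 ** -nines
        print(f"{nines} nines: ~{downtime_s:,.0f} s/year (~{downtime_s / 60:.1f} min)")
    # 3 nines -> ~8.8 h/year, 4 nines -> ~53 min/year, 5 nines -> ~5.3 min/year

Five nines therefore means roughly five minutes of total outage per year, which is what drives most of the architecture and cost.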

Question 3: What are your durability requirements for the data? A typical SLA might be 10 nines per object or file.

Question 4: You said this is financial services. What regulatory and reporting requirements do you have? Has your company's VP of regulatory affairs cleared the answers above with the SEC (or whatever agency is in charge in your country)?

Depending on your answers to these questions, we might be able to architect it. Given that you said "high availability" and "financial", the answer will be that you will need to be located in multiple data centers, each equipped with redundant network feeds and backup power (diesels and batteries), and not in a common failure domain (* footnote about former customer below), such as the same flood plain of the Mississippi river. I would probably go with a major cloud provider, and use their services and storage architecture for it.

Footnote: In my previous job, we had a customer who needed high availability and durability for their financial services. They were headquartered in New York, in lower Manhattan. Their backup data center (to which data was synchronously copied) was in the OTHER tower of the World Trade Center. Obviously, the company went under after 9/11. Don't be like them: put your backup center far away, and by that I don't mean New Jersey.
 
Adding to Question 2 above: are you allowed to have maintenance windows during which the service is unavailable?
 
On top of what was said above...

I wanted to ask what is the best way to go about it.
Do you just need an SA (solution architect), or do you also require support on top of that? You didn't specify what financial application you are running (there may be local laws/standards that have to be followed).
In some cases, NDAs need to be signed by the parties invited to work on/with the data.

You should weigh the importance of the setup against all its implications. In the end it may be cheaper to have this designed by somebody local whom you can speak to face to face.
There are way too many variables and unknowns here to have this properly designed.
 
- run a public-facing web server
- run a database
I have never worked in a place that kept a database with financial information in the DMZ/Gateway.

Data that need to be kept confidential should be securely stored well away from the Internet.

If you collect confidential information from an Internet facing application, you should encrypt it immediately, you should obliterate the memory in which the plain text was stored, and you should pass the encrypted data item through a firewall to a secure internal server for processing and storage. Your encryption key management procedures need to be well designed.
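As a rough sketch of what "encrypt immediately" can look like at the edge (Python's cryptography package is used purely as an illustration; proper key management via a KMS/HSM is the hard part and is not shown, and a garbage-collected language cannot truly guarantee the plaintext is wiped from memory):

    # Illustration only: encrypt an uploaded payload before it touches disk,
    # and forward only ciphertext to the internal processing tier.
    from cryptography.fernet import Fernet

    # In practice this key comes from your key-management service,
    # not from a module-level constant.
    INGEST_KEY = Fernet.generate_key()

    def encrypt_upload(plaintext: bytes) -> bytes:
        # Returns a ciphertext token to be passed through the firewall
        # to the secure internal server.
        return Fernet(INGEST_KEY).encrypt(plaintext)

    token = encrypt_upload(b"account,amount\n12345,100.00\n")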
 
Q1: I suspect it's going to be less than 1 file per second on average. It will come in bursts, once in the morning and once in the afternoon. It'll probably be between 100,000 and 1,000,000 lines per day, split among up to 5,000 files per day (.csv format, around 15 columns). There will probably be fewer than 1,000 users total. The back end will be Python/Flask.
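Back-of-envelope on the volume, assuming very roughly 150 bytes per CSV line (just a guess for ~15 columns):

    # Rough sizing from the estimates above; bytes_per_line is an assumption.
    lines_per_day = 1_000_000          # upper end of the estimate
    bytes_per_line = 150               # guess for ~15 CSV columns
    files_per_day = 5_000
    burst_hours = 2                    # two bursts of roughly an hour each

    daily_mb = lines_per_day * bytes_per_line / 1e6
    yearly_gb = daily_mb * 365 / 1e3
    peak_files_per_s = files_per_day / (burst_hours * 3600)
    print(f"~{daily_mb:.0f} MB/day, ~{yearly_gb:.0f} GB/year raw, "
          f"~{peak_files_per_s:.1f} files/s at peak")
    # -> ~150 MB/day, ~55 GB/year, ~0.7 files/s

So raw data volume doesn't look like it will be the constraint.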

Q2: still to be determined; I am currently working on the proof of concept.

Q3/Q4: I am not sure yet. This service isn't a regulatory requirement; it is something that will allow large financial institutions to save a substantial amount of money.

I will definitely need redundant service in geographically separated regions.
 
I am definitely open to hiring someone to do the design. I'd be happy to get recommendations for where to find someone who knows FreeBSD; it wasn't immediately obvious from a cursory Google search. Upwork seems to have some people who could do the actual setup, but I'm not sure about the design.
 
Why do you want to use FreeBSD? Why don't you use AWS (*) instead?

Footnote: When I say "AWS", I actually mean: A commercial cloud service that operates at the API layer, for example having an API for storing objects, one for database updates, one for setting up files to be served by the web, and so on. If you go to that level, then issues like geo-replication and failure tolerance become someone else's problem, which is where they belong. Leave the difficult but generic things to highly paid experts, and work in the area where your expertise is (namely the problem domain).
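To make "operating at the API layer" concrete: storing an uploaded file becomes a single call against a replicated object store, rather than infrastructure you run yourself. A sketch with boto3/S3 (bucket and key names are made up):

    # Example of API-layer storage: durability and geo-replication are the
    # provider's problem. Bucket name and object key are placeholders.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file("daily_positions.csv",                     # local file from the ingest pipeline
                   "my-ingest-bucket",                        # placeholder bucket
                   "uploads/2024-01-15/daily_positions.csv")  # placeholder key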
 