Question
I have the following scenario:
A large database (PostgreSQL).
Many organizations that need to extract data from this database, each one with its own schedule and data requirements.
I want an application that:
Provides me a sort of calendar where I can schedule different query executions.
For each execution I can specify a simple SQL file, a format for the resulting file, and some sort of automatic upload, for example to an FTP server.
Has a web interface so I can use it from another computer over the internet.
Can manage several users.
Preferably "software libre" and Linux-compatible.
Explanation / Answer
I do something similar, but in reverse (ingesting large amounts of information from several different companies into an enterprise data warehouse).
For this job, I'd recommend that you look at an ETL tool. If you're using PostgreSQL, I'd recommend that you look at either Pentaho Data Integration (PDI) or Talend. I've chosen to use PDI where I work and I couldn't be happier.
The way it works is that you build out your jobs (which run several transformations and handle things like FTP/SFTP transfers, email notifications, syslog entries, etc.) and your transformations (the bits that do the actual data extracts). You can save them to the file system or, in my case, to a database repository so that they can be accessed from multiple computers.
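Whichever tool you pick, each scheduled export boils down to the same extract → format → upload loop. A rough standalone sketch of that loop in Python, in case you want to prototype it outside an ETL tool (the `psycopg2` connection string, SQL file path, and FTP details are all placeholders, not anything prescribed above):

```python
import csv
import io
from ftplib import FTP


def rows_to_csv(headers, rows):
    """Render query results as CSV text (one possible output format)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue()


def extract_and_upload(dsn, sql_path, remote_name, ftp_host, ftp_user, ftp_pass):
    """Run the SQL file against PostgreSQL and push the result to an FTP server."""
    import psycopg2  # third-party PostgreSQL driver; assumed installed

    with open(sql_path) as f:
        query = f.read()
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(query)
        headers = [col[0] for col in cur.description]  # DB-API column names
        data = rows_to_csv(headers, cur.fetchall())
    ftp = FTP(ftp_host)
    ftp.login(ftp_user, ftp_pass)
    ftp.storbinary("STOR " + remote_name, io.BytesIO(data.encode("utf-8")))
    ftp.quit()
```

A call like `extract_and_upload("dbname=warehouse user=etl", "nightly_report.sql", "report.csv", "ftp.example.com", "etl", "secret")` would then be one scheduled execution; what the ETL tools add on top is the calendar, the web interface, and multi-user management.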
You can schedule the jobs to execute using a simple cron job from any computer that can run Java (I'm using OpenJDK on Ubuntu 14.04 and it's working great). A nice side effect of this in production is that your DB and ETL servers are separate, so if the ETL server goes down for some reason it won't take your DB with it.
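For reference, the cron side really is just one line per schedule. A hypothetical crontab entry that runs a PDI job nightly at 02:00 via PDI's `kitchen.sh` job runner (the install path, job file, and log path are placeholders):

```
0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/etl/jobs/nightly_export.kjb -level=Basic >> /var/log/etl/nightly_export.log 2>&1
```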