Collect Spreadsheets Outlook data into your data warehouse or ours. The Matatika pipelines will take care of the data collection and preparation for your analytics and BI tools.
Sync spreadsheets data from an Outlook mailbox
In Outlook:
In Meltano Cloud:
A list of table definition objects
name: string (required)
The name to assign to the stream
path: string (required)
A path to a folder or set of emails in the format
imap://imap.outlook.com/path/to/folder-or-emails
Folders and emails are treated as directories and attachments as files
Supports glob pattern matching:
imap://imap.outlook.com/*/*/: all emails in all top-level folders
imap://imap.outlook.com/INBOX/*/: all emails in the inbox
imap://imap.outlook.com/INBOX/*/*: all attachments for emails in the inbox
imap://imap.outlook.com/INBOX/*/*.csv: all CSV file attachments for emails in the inbox
format: string (required)
The format of files to sync - one of csv, json, jsonl, excel or detect
pattern: string (required)
A regex pattern to filter resolved files by name - set to "" if path already applies some
kind of filtering (e.g. glob pattern matching) and no more granular filtering is
required
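As an illustration of how path and pattern can work together, the sketch below first resolves attachments with a glob and then filters the result by name with a regex. It is a minimal sketch, not the tap's implementation: the email and attachment names are hypothetical, glob_match is a helper written for this example, and it assumes that * never matches across a / boundary.

```python
import re
from fnmatch import fnmatchcase


def glob_match(path_glob: str, candidate: str) -> bool:
    """Compare segment by segment so that * stays within one path segment (assumption)."""
    glob_parts = path_glob.split("/")
    candidate_parts = candidate.split("/")
    if len(glob_parts) != len(candidate_parts):
        return False
    return all(fnmatchcase(c, g) for g, c in zip(glob_parts, candidate_parts))


# Hypothetical attachment paths, laid out as folder/email/attachment
candidates = [
    "imap://imap.outlook.com/INBOX/Monthly report/sales_2024.csv",
    "imap://imap.outlook.com/INBOX/Monthly report/notes.txt",
    "imap://imap.outlook.com/Archive/Old report/sales_2023.csv",
]

path_glob = "imap://imap.outlook.com/INBOX/*/*.csv"
pattern = r"sales_\d{4}"

# Step 1: the glob in path narrows the candidates to CSV attachments in the inbox
resolved = [c for c in candidates if glob_match(path_glob, c)]

# Step 2: the regex in pattern filters the resolved files by name
selected = [c for c in resolved if re.search(pattern, c)]

# An empty pattern matches every name, so "" leaves the resolved set unchanged
unfiltered = [c for c in resolved if re.search("", c)]
```

Because an empty regex matches any string, setting pattern to "" leaves the glob in path as the only filter.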
start_date: string (required)
An ISO-8601 date-time used to filter resolved files by last modified timestamp
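The following sketch shows one way a start_date filter of this kind can behave. The file names and timestamps are hypothetical, and the direction of the comparison (files modified before start_date are skipped) is an assumption rather than something stated here.

```python
from datetime import datetime, timezone

# An ISO-8601 date-time with an explicit UTC offset
start_date = datetime.fromisoformat("2024-01-01T00:00:00+00:00")

# Hypothetical (file name, last modified) pairs
files = [
    ("old.csv", datetime(2023, 12, 31, 23, 59, tzinfo=timezone.utc)),
    ("new.csv", datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)),
]

# Assumption: only files modified at or after start_date are kept
recent = [name for name, modified in files if modified >= start_date]
```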
key_properties: array of strings (required)
The stream primary keys - for files where a primary key cannot be clearly identified, you can reference meta-properties
[
"_smart_source_bucket",
"_smart_source_file",
"_smart_source_lineno"
]
or [] for append-only behaviour
If using the meta-property approach outlined above, be aware that changes to file locations or contents may result in unexpected duplicates or overwrites - therefore, it is safest to use this approach if the targeted files are, for all intents and purposes, immutable
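To make the trade-off concrete, here is a small sketch of how a loader that upserts on key_properties could treat records keyed on the meta-properties. The load function and the record values are hypothetical and only model the behaviour described above; actual behaviour depends on the target in use.

```python
def load(records, key_properties):
    """Upsert by key when key_properties is set, otherwise append every record."""
    if not key_properties:
        return list(records)  # [] means append-only
    table = {}
    for record in records:
        key = tuple(record[k] for k in key_properties)
        table[key] = record  # a later record with the same key overwrites the earlier one
    return list(table.values())


KEYS = ["_smart_source_bucket", "_smart_source_file", "_smart_source_lineno"]

# Hypothetical record and two variations of it
first_sync = {
    "_smart_source_bucket": "imap.outlook.com",
    "_smart_source_file": "INBOX/report/sales.csv",
    "_smart_source_lineno": 2,
    "amount": "10",
}
resync = dict(first_sync, amount="12")  # same file and line, changed content
moved = dict(first_sync, _smart_source_file="Archive/report/sales.csv")  # same row, new location

overwritten = load([first_sync, resync], KEYS)  # one row: the edit replaces the original
duplicated = load([first_sync, moved], KEYS)    # two rows: the move produces a duplicate
appended = load([first_sync, resync], [])       # two rows: append-only keeps both
```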
encoding: string (default: utf-8)
The encoding to use when reading files
state_based_discovery: boolean (default: false)
Whether or not to use state-based discovery (i.e. sample files starting from the bookmark
stored in state, rather than the initial start_date as defined in the tables
configuration) - we recommend setting this to true to capture any file schema changes and
optimise performance
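The choice that state_based_discovery controls can be sketched as below. The function name and the fallback to start_date when no bookmark exists yet are assumptions made for illustration.

```python
from typing import Optional


def sampling_start(start_date: str, bookmark: Optional[str], state_based_discovery: bool) -> str:
    """Return the timestamp from which files are sampled during discovery."""
    if state_based_discovery and bookmark is not None:
        return bookmark  # sample only files modified since the last sync
    return start_date  # sample from the configured start_date


first_run = sampling_start("2024-01-01T00:00:00+00:00", None, True)
later_run = sampling_start("2024-01-01T00:00:00+00:00", "2024-06-01T00:00:00+00:00", True)
disabled = sampling_start("2024-01-01T00:00:00+00:00", "2024-06-01T00:00:00+00:00", False)
```

Sampling from the bookmark means discovery looks at recently changed files only, which is where a schema change would show up and which avoids re-reading older files.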
skip_initial: integer (default: 0)
The number of lines to skip over when reading a file - mostly useful for excel format
files
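As a quick illustration of skip_initial, the sketch below drops leading lines before the header row. The file content is hypothetical.

```python
from itertools import islice

# Hypothetical export with two preamble lines above the header row
lines = ["Exported by Finance", "", "id,amount", "1,10", "2,12"]

skip_initial = 2

# Skip the first skip_initial lines and keep everything after them
rows = list(islice(lines, skip_initial, None))
```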
max_sampled_files: integer (default: 50)
The number of files to sample during dynamic catalog discovery
max_sampling_read: integer (default: 1000)
The number of lines to sample for each file during dynamic catalog discovery
sample_rate: integer (default: 5)
Controls how frequently lines are sampled (i.e. every nth line) during dynamic catalog
discovery
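The interaction between sample_rate and max_sampling_read can be sketched as follows. This is an illustrative model only: the function is hypothetical, and it assumes that max_sampling_read caps the number of sampled lines rather than the number of lines read.

```python
def sample_lines(lines, sample_rate=5, max_sampling_read=1000):
    """Keep every sample_rate-th line until max_sampling_read samples are collected."""
    samples = []
    for index, line in enumerate(lines):
        if index % sample_rate == 0:
            samples.append(line)
        if len(samples) >= max_sampling_read:
            break  # stop reading once enough samples are collected
    return samples


data = list(range(100))  # stand-in for 100 lines of a file

with_defaults = sample_lines(data)  # every 5th line, well below the cap
capped = sample_lines(data, sample_rate=5, max_sampling_read=3)
```

A higher sample_rate spreads the samples across more of the file, while a lower max_sampling_read shortens discovery at the cost of seeing fewer values.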
prefer_schema_as_string: boolean (default: false)
Whether or not to skip inferring property types during sampling - if using with CSV files,
you can set max_sampling_read to 1 alongside this to improve discovery performance
(only the header row needs to be sampled to resolve the property set, given that all values
will be treated as strings)
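The reasoning above can be sketched with the standard csv module: when every value is treated as a string, reading the header row is enough to build the property set. The schema shape shown is an assumption for illustration, not necessarily what the tap emits.

```python
import csv
import io


def string_schema(csv_text, delimiter=","):
    """Build an all-string schema from the header row only."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)  # only the first row is read
    return {
        "type": "object",
        "properties": {name: {"type": ["null", "string"]} for name in header},
    }


schema = string_schema("id,amount,booked_on\n1,10.5,2024-01-15\n")
```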
csv format only
delimiter: string (default: ,)
The value delimiter sequence used in the targeted files
quotechar: string (default: ")
The quote character used in the targeted files - set to detect to
auto-discover
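For a sense of what auto-discovery of these values can look like, the sketch below uses csv.Sniffer from the Python standard library on a hypothetical sample. This is one possible approach and is not claimed to be how the tap implements detect.

```python
import csv
import io

# Hypothetical file content using ; as the delimiter and ' as the quote character
sample = "'id';'name';'city'\n'1';'alpha';'Leeds'\n'2';'beta';'York'\n"

# Sniffer inspects the sample and guesses the dialect
dialect = csv.Sniffer().sniff(sample)

# Read the file with the detected delimiter and quote character
reader = csv.reader(
    io.StringIO(sample),
    delimiter=dialect.delimiter,
    quotechar=dialect.quotechar,
)
rows = list(reader)
```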
excel format only
worksheet_name: string
The specific worksheet name to pull out - defaults to the sheet with the most available data
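The default described above can be modelled with a small sketch. The workbook is represented as a plain dict of hypothetical sheet names to rows, and "most available data" is assumed here to mean the highest row count.

```python
from typing import Optional


def pick_worksheet(sheets: dict, worksheet_name: Optional[str] = None) -> str:
    """Return the configured worksheet, or the one with the most rows if none is set."""
    if worksheet_name is not None:
        return worksheet_name
    return max(sheets, key=lambda name: len(sheets[name]))


# Hypothetical workbook: sheet name -> list of rows
workbook = {
    "Summary": [["total", 22]],
    "Data": [["id", "amount"], [1, 10], [2, 12]],
}

default_sheet = pick_worksheet(workbook)
explicit_sheet = pick_worksheet(workbook, worksheet_name="Summary")
```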
See the tap README for further information and other miscellaneous settings
Mailbox email address