SharePoint Connect

SharePoint data into your data warehouse in minutes

Collect SharePoint data into your data warehouse or ours. The Matatika pipelines will take care of the data collection and preparation for your analytics and BI tools.

Automate SharePoint from a single space with no code

Sync spreadsheet data from SharePoint

Setup

In SharePoint:

  1. Identify the site you want to sync files from
  2. Identify the directory structure where the files you want to sync exist

In Meltano Cloud:

  1. Log in and connect with your Microsoft account
  2. Add a Tables configuration that references the identified site and directory structure

Settings

Tables

A list of table definition objects

Table definition

  • name: string (required)

    The name to assign to the stream

  • path: string (required)

    A path to a directory containing files in the format sharepoint://<site_name>/path/to/dir

  • format: string (required)

    The format of files to sync - one of csv, json, jsonl, excel or detect

  • pattern: string (required)

    A regex pattern used to filter resolved files by name - set to "" if path already provides a filtering mechanism (e.g. glob pattern matching) and no more granular filtering is required
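For illustration only (the file names and pattern below are hypothetical, not taken from the tap), this is the kind of name-based filtering a regex pattern performs over resolved files:

```python
import re

# Hypothetical pattern: match monthly sales exports such as "sales_2023-01.csv"
pattern = re.compile(r"^sales_\d{4}-\d{2}\.csv$")

resolved = ["sales_2023-01.csv", "sales_2023-02.csv", "inventory.csv", "notes.txt"]

# Keep only the file names the pattern matches; an empty pattern setting
# corresponds to applying no further filtering at this stage
matched = [name for name in resolved if pattern.match(name)]
print(matched)  # ['sales_2023-01.csv', 'sales_2023-02.csv']
```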

  • start_date: string (required)

    An ISO-8601 date-time used to filter resolved files by last-modified timestamp

  • key_properties: array of strings (required)

    The stream primary keys - for files where a primary key cannot be clearly identified, you can reference meta-properties

    [
    "_smart_source_bucket",
    "_smart_source_file",
    "_smart_source_lineno"
    ]

    or [] for append-only behaviour

    If using the meta-property approach outlined above, be aware that changes to file locations or contents may result in unexpected duplicates or overwrites. It is therefore safest to reserve this approach for files that are, for all intents and purposes, immutable

  • encoding: string (default: utf-8)

    The encoding to use when reading files

  • skip_initial: integer (default: 0)

    The number of lines to skip over when reading a file - mostly useful for excel format files

  • max_sampled_files: integer (default: 50)

    The number of files to sample during dynamic catalog discovery

  • max_sampling_read: integer (default: 1000)

    The number of lines to sample for each file during dynamic catalog discovery

  • sample_rate: integer (default: 5)

    Controls how frequently lines are sampled (i.e. every nth line) during dynamic catalog discovery

  • prefer_schema_as_string: boolean (default: false)

    Whether or not to skip inferring property types during sampling. With csv format files, you can set max_sampling_read to 1 alongside this to improve discovery performance - only the header row needs to be sampled to resolve the property set, since all values will be treated as strings
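As a sketch of the fast-discovery combination described above (values hypothetical, with the other required table definition fields omitted for brevity):

```json
{
  "format": "csv",
  "prefer_schema_as_string": true,
  "max_sampling_read": 1
}
```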

csv format only

  • delimiter: string (default: ,)

    The value delimiter sequence used in the targeted files

  • quotechar: string (default: ")

    The quote character used in the targeted files - set to detect to auto-discover

excel format only

  • worksheet_name: string

    The specific worksheet name to pull out - defaults to the sheet with the most available data
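Putting the settings above together, a complete csv table definition might look like the following sketch - the site name, directory, pattern, and key are hypothetical placeholders, not values the tap requires:

```json
{
  "name": "sales",
  "path": "sharepoint://MySite/Shared Documents/exports",
  "format": "csv",
  "pattern": "\\.csv$",
  "start_date": "2023-01-01T00:00:00Z",
  "key_properties": ["order_id"],
  "delimiter": ",",
  "quotechar": "\""
}
```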

See the tap README for further information and other miscellaneous settings


View source code

SharePoint data you can trust

Extract, Transform, and Load SharePoint data into your data warehouse or ours.

Interested in learning more?

Get in touch