Attribution raw data schema

Attribution collects a number of data points of which can later be exported. Data scientists, analysts and developers can use the schema below to design custom reports on top of this data.

Attribution exports provide the same data in the same format as used in your Attribution Dashboard to build all available reports, this data is in a raw form (as collected) and does not have any attribution model applied.

It is not possible to export data with any attribution model applied because your model is dependent on many factors you select on the dashboard, for example your current filters, date range, and model type etc. Also note that exports do not contain any pre-build reports.

The graph below displays the break down of Attribution's raw data. Some tables contain normalized data, which means the same data could be present in multiple tables but in different formats. For example the params table stores parsed URL parameters originally stored in events.uri column.

Events

The events table contains all events and pageviews data generated by track() and page() snippet calls and also server-side sent events. This table does not contain the raw form of event properties instead they are stored in properties table.

Column Name

Type

Description

id

BIGINT

Unique ID of associated pageview or track event. Primary key for this table, is referenced from other tables.

name

VARCHAR

Event name. For pageview Loaded a Page or Viewed * Page is used depending how you fire page() snippet method.

ip

VARCHAR

IP address of visitor.

created_at

TIMESTAMP

Timestamp when event was written into Attribution database.

user_id

BIGINT

References users.id. This is internal user ID generated by Attribution, not the USER_ID used for tracking users.

browser_id

BIGINT

References browsers.id.

project_id

BIGINT

Internal ID of your project.

time

TIMESTAMP

Timestamp when event was fired.

referring_url

VARCHAR

HTTP Referer URL of the pageview.

referring_host

VARCHAR

Hostname part of referring_url.

revenue

BIGINT

Revenue property value in cents.

visitor_id

BIGINT

Internal ID of visitor. References visitors.id.

updated_at

TIMESTAMP

Timestamp when event was last updated in database.

uri

VARCHAR

Pageview URL at capture time.

uri_path

VARCHAR

URL path of uri.

self_referral

BOOLEAN

TRUE if referring_host matches uri hostname, FALSE otherwise.

message_id

VARCHAR

Unique event ID generated by library which sent the event.

source

VARCHAR

Name of the source where event was captured from.

type

VARCHAR

p for pageview events, e.g. page() snippet command, t for custom events sent by track() snippet command.

Visits (sessions)

Visits table has variable name as it depends on our internal ID for your project, if your internal project ID is 1234 - you'd have visits_1234 table in your data export. The visits table contains all the information you see as Visits in Dashboard.

Visits are touchpoints, usually visit is a first pageview in the user session (learn more what is visit), we don't call it sessions because session is a group of pageviews and events, where visit is just the first pageview which contains information about the source of the session (it could be URL parameters, referring website or combination of both). The source of the visit (and the whole session following it) is called a Filter in Attribution system.

This table contains the most value information needed to build custom reports on top of Attribution data.

Also this table is NOT incrementally exportable as most of the other tables, this means you will have to drop existing visits table and load in full with every export. The reason for that is that visits depend on filters and filters are often added/updated/removed, especially by integration, which leads that visits table is re-created based on information from events and filters table. In other words visits are the result of applying filters to events and finding all the matches, for example if you have a filter which has rule "URL Paramter utm_source is partner" than any pageview which has "utm_source=partner" in it's URL parameters will be written into visits table.

Column Name

Type

Description

id

BIGINT

References events.id, unique for visits table. Can be used as Primary Key.

visitor_id

BIGINT

References events.visitor_id and visitors.id.

visit_time

TIMESTAMP

References events.time.

filter

BIGINT

References xNNNN_filters.id where xNNNN_filters.type is filter. Replace NNNN with your internal Project ID, e.g. 1234.

company_id

BIGINT

If the number is positive it matches visitor_id from above, if the number is negative it references company.id. If visitor belongs to a company (has company_id trait) this number would be negative, you need to apply modulus to reference company.id.

visit_type

VARCHAR

v is for regular visit (pageview); i is for influence touchpoint used for TV Attribution.

user_id

BIGINT

References events.user_id and users.id. Can be NULL.

original_created_at

TIMESTAMP

Matches visitor.original_created_at. This column contains value of createdAt trait that if passed to identify() call. Can be NULL.

path

VARCHAR

Matches events.uri_path but with the exception that NULL indicates for no path or /. Can be NULL.

Most of the columns are excessive in visits table but they are added to help optimize complex queries, you can skip loading selected columns and just load the two most important -id and filter. The rest of columns can be fetched by joining events and other linked tables.

Filters (xNNNN_filters)

Filters table has variable name as it depends on our internal ID for your project, if your internal project ID is 1234 - you'd have x1234_filters table in your data export. The filters table contains base information about the channels and filters in your Dashboard.

Dashboard has tree-like structure called "filter tree". Filters table contains information about both channels and filters from the Dashboard, since handling tree-like structures in SQL could be tricky we tried to flatten everything into filters table for easier use.

Important notice - this table is NOT incrementally exportable, because filters are added/updated/removed often by automated integrations - it's hard to keep track of all the changes, so it's easier just to truncate and load this table again. Also the original order of the channels and filters as in Dashboard is not preserved here. Primary Key of this table is composite (id, type), technically you have two

Column Name

Type

Description

id

BIGINT

ID of fliter of channel. Not unique.

parent_group_id

BIGINT

References id where type is channel.
ID of the parent channel to which this channel or filter belongs to.

top_parent_group_id

BIGINT

References id where type is channel.
ID of the TOP parent channel to which this channel or filter belongs to.

type

VARCHAR

Could be filter or channel.

name

VARCHAR

Name of the filter of channel.

label

VARCHAR

Attribution or integration qualification of filter or channel, could take different values to help identify what this entity is about. Can be NULL.

integration

VARCHAR

Contains technical identifier of the integration, e.g. bing is for "Microsoft Ads" and etc.

path

VARCHAR

JSON encoded Array of the full path (top to bottom) of channels to which filter or channel belongs to.

If you have any questions on this please feel free to contact [email protected]