Apache AVRO was added as an inbound data format in Cinchy v5.3. As an inbound format, AVRO integrates with the Kafka Schema Registry, which helps enforce data governance within a Kafka architecture.
Avro is an open-source data serialization system that helps with data exchange between systems, programming languages, and processing frameworks. Avro stores both the data definition and the data together in one message or file. Avro stores the data definition in JSON format, making it easy to read and interpret; the data itself is stored in binary format, making it compact and efficient.
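That split between a readable JSON schema and a compact binary payload can be sketched in pure Python. The snippet below is a hand-rolled encoder for one tiny record schema, written only to illustrate the idea; it is not the official Avro library, and the `Employee` schema and field values are invented for the example.

```python
import json

# The data definition: a minimal Avro record schema, stored as readable JSON.
schema = json.loads("""
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "employeeId", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
""")

def zigzag(n: int) -> int:
    """Avro zigzag-encodes longs so small negative numbers stay small."""
    return (n << 1) ^ (n >> 63)

def encode_long(n: int) -> bytes:
    """Variable-length base-128 encoding of a zigzag-encoded long."""
    z = zigzag(n)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """Avro strings are a length (encoded as a long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

def encode_record(record: dict) -> bytes:
    """Fields are concatenated in schema order -- no field names in the payload."""
    out = b""
    for field in schema["fields"]:
        value = record[field["name"]]
        out += encode_long(value) if field["type"] == "long" else encode_string(value)
    return out

payload = encode_record({"employeeId": 123, "name": "Alice"})
print(payload.hex())  # f6010a416c696365 -- 8 bytes, far smaller than the equivalent JSON
```

Because the field names live only in the schema, the binary message carries just the values, which is where Avro's compactness comes from.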
Some of the benefits of using AVRO as a data format are:
- It's compact.
- It maps directly to and from JSON.
- It's fast.
- It has bindings for a wide variety of programming languages.
For more about AVRO and Kafka, read the documentation here.
To set up the Apache AVRO connection to a Kafka Schema Registry, you will need to configure your Listener Configs table with the attributes specified below.
| Name | Description |
|---|---|
| "topicName" | Mandatory. The Kafka topic to listen for messages on. |
| "messageFormat" | Set to "AVRO" if your messages are serialized in AVRO. |
| "bootstrapServers" | Mandatory. A comma-separated list of Kafka bootstrap servers, each in the form host:port. |
| "url" | Required if your data follows a schema when serialized in AVRO. A comma-separated list of URLs for Schema Registry instances that are used to register or look up schemas. |
| "basicAuthCredentialsSource" | Specifies the Kafka configuration property "schema.registry.basic.auth.credentials.source" that provides the basic authentication credentials. This can be "UserInfo" or "SaslInherit". |
| "basicAuthUserInfo" | Basic Auth credentials specified in the form username:password. |
| "sslKeystorePassword" | The client keystore (PKCS#12) password. |
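Putting those attributes together, a Listener Config for an AVRO-serialized topic might look like the sketch below. The topic name, hosts, URL, and credentials are all placeholders, and the exact JSON shape should be confirmed against your Cinchy version.

```json
{
  "topicName": "employee-events",
  "messageFormat": "AVRO",
  "bootstrapServers": "broker1:9092,broker2:9092",
  "url": "https://schema-registry.example.com:8081",
  "basicAuthCredentialsSource": "UserInfo",
  "basicAuthUserInfo": "registry-user:registry-password",
  "sslKeystorePassword": "keystore-password"
}
```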
Apache Kafka is an end-to-end event streaming platform that:
Publishes (writes) and subscribes to (reads) streams of events from sources like databases, cloud services, and software applications.
Stores these events durably and reliably for as long as you want.
Processes and reacts to the event streams in real-time and retrospectively.
Those events are organized and durably stored in topics. These topics are then partitioned over a number of buckets located on different Kafka brokers.
Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time for your key use cases.
Example Use Case: You currently use Kafka to store the metrics for user logins, but being stuck in the Kafka silo means that you can't easily use this data across a range of business use cases or teams. You can use a batch sync in order to liberate your data into Cinchy.
The Kafka Topic source supports batch syncs.
You can review the parameters found on the Info tab below (Image 1). The following table outlines the mandatory and optional parameters you will find on the Source tab; these parameters define your data sync source and how it functions.
The Schema section is where you define which source columns you want to sync in your connection. You can repeat the values for multiple columns.
There are other options available for the Schema section if you click on Show Advanced.
You can choose to add a Transformation > String Replacement by inputting a Pattern and a Replacement. Note that you can have more than one String Replacement.
You have the option to add a source filter to your data sync. Please review the documentation here for more information on source filters.
Configure your Destination.
Define your Sync Behaviour.
Add in your Post Sync Scripts, if required.
Define your Permissions.
To run a real-time sync instead, set up your Listener Config and enable it to begin your sync.
In this example, we are syncing from a Kafka Topic source to a Cinchy Table target.
We want to sync the following data from Kafka and map it to the appropriate column in the "Sync Target 2" table in the "Kafka Sync" domain.
This is what the Connections UI will look like with the aforementioned example parameters and data.
Your source tab should be set to "Kafka Topic" and have the following information (Image 1):
Tip: Click on an image in this document to enlarge it.
Your destination tab should be set to "Cinchy Table", and have the following information (Image 2):
Domain: The domain where your destination table resides. In our example we are using the "Kafka Sync" domain.
Table: The name of your destination table. In our example we are using the "Sync Target 2" table.
Degree of Parallelism: This is the number of parallel batch inserts and updates that can be run. Set this to 1 for our example.
Under the Sync Behaviour tab, we want to use the following parameters:
Synchronization Pattern: Full File
Sync Key Column Reference Name: Employee Id
New Record Behaviour: Insert
Dropped Record Behaviour: Delete
Changed Record Behaviour: Update
The following code is what the XML for our example connection would look like:
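The listing below is a hedged reconstruction based on the parameters in this example; the element and attribute names are assumptions and may not match your Cinchy version exactly, so export the config from the Connections UI for the authoritative XML.

```xml
<!-- Hedged sketch of the example sync config; element names are assumptions. -->
<BatchDataSyncConfig name="Website Metrics" version="1.0.0" xmlns="http://www.cinchy.co">
  <KafkaTopicDataSource>
    <Schema>
      <Column name="$.employeeId" alias="Employee Id" dataType="Number" />
      <Column name="$.name" alias="Name" dataType="Text" trimWhitespace="true" />
    </Schema>
  </KafkaTopicDataSource>
  <CinchyTableTarget domain="Kafka Sync" table="Sync Target 2" degreeOfParallelism="1"
                     syncKeyColumnReference="Employee Id"
                     newRecordBehaviour="Insert"
                     changedRecordBehaviour="Update"
                     droppedRecordBehaviour="Delete">
    <ColumnMapping sourceColumn="Employee Id" targetColumn="Employee Id" />
    <ColumnMapping sourceColumn="Name" targetColumn="Name" />
  </CinchyTableTarget>
</BatchDataSyncConfig>
```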
Source tab parameters:

| Parameter | Description | Example |
|---|---|---|
| Title | Mandatory. Input a name for your data sync. | Website Metrics |
| Version | Mandatory. This is a pre-populated field containing a version number for your data sync. You can override it if you wish. | 1.0.0 |
| Parameters | Optional. Review our documentation on Parameters here for more information about this field. | |
| Source | Mandatory. Select your source from the drop-down menu. | Kafka Topic |

Schema section parameters:

| Parameter | Description | Example |
|---|---|---|
| Name | Mandatory. The name of your column as it appears in the source. | Name |
| Alias | Optional. You may choose to use an alias on your column so that it has a different name in the data sync. | |
| Data Type | Mandatory. The data type of the column values. | Text |
| Description | Optional. You may choose to add a description to your column. | |

Advanced schema parameters (Show Advanced):

| Parameter | Description | Example |
|---|---|---|
| Mandatory | If both Mandatory and Validated are checked on a column, rows where the column is empty are rejected. If just Mandatory is checked, all rows are synced with the execution log status of failed and the source error of "Mandatory Rule Violation". If just Validated is checked, all rows are synced. | |
| Validate Data | If both Mandatory and Validated are checked on a column, rows where the column is empty are rejected. If just Validated is checked, all rows are synced. | |
| Trim Whitespace | Optional if data type = Text. You can choose whether to trim the whitespace (that is, spaces and other non-printing characters). | |
| Max Length | Optional if data type = Text. A numerical value representing the maximum length of the data that can be synced in your column. If the value is exceeded, the row is rejected (you can find this error in the Execution Log). | |

String Replacement parameters:

| Parameter | Description | Example |
|---|---|---|
| Pattern | Mandatory if using a Transformation. The pattern for your string replacement, that is, the string that will be searched for and replaced. | |
| Replacement | What you want to replace your pattern with. | |

The example data maps from Kafka to Cinchy as follows:

| Kafka Source | Cinchy Column |
|---|---|
| $.employeeId | Employee Id |
| $.name | Name |

| Column 1 (Standard Column) Parameters | Example Data |
|---|---|
| Name | $.employeeId |
| Alias | Employee Id |
| Data Type | Number |

| Column 2 (Standard Column) Parameters | Example Data |
|---|---|
| Name | $.name |
| Alias | Name |
| Data Type | Text |
| Trim Whitespace | True |

| Source Column | Target Column |
|---|---|
| Employee Id | Employee Id |
| Name | Name |