Skip to Main Content

Setting up Contact Trace Records for Analysis with Amazon Athena

["[vc_row][vc_column][vc_column_text]Amazon Connect has excellent reporting features, both for real-time and historical analysis. Contact Trace Records (CTRs) are the primary source of data that Connect collects for every call that occurs within your contact center and are used for reporting and analysis.\u00a0 These records are only stored for 24 months after creation, so they don\u2019t last forever.\n\nCTRs contain a lot of details about the call and one of the most important parts are Contact Attributes.\u00a0 While these are not a necessary feature of CTRs, using them within your contact flows can yield a lot of useful data concerning the customer experience and record important events and details that occur during a call.\u00a0 Amazon Connect does not support searching and reporting based on these Contact Attributes, however.\u00a0 They can only be viewed when looking at the details of an individual CTR.\n\nSo how can we store permanently store CTRs and make them available for analysis with other AWS Services?\u00a0 Keep reading to understand what services are needed and how to set them up to use with Connect.[\/vc_column_text][vc_custom_heading text=\"AWS Services Overview\" font_container=\"tag:h4","text_align:left\" use_theme_fonts=\"yes\"][vc_single_image image=\"3041\" img_size=\"full\"][vc_column_text]Amazon Connect<\/strong>\n\n \tAmazon Connect will be our source of data by generating CTRs for all calls going in and out of your contact center.<\/li>\n \tCheck out https:\/\/voicefoundry.com\/amazon-connect-data-sources-part-1\/<\/a>for more information regarding CTRs and other Connect Data.<\/li>\n<\/ul>\nAmazon Kinesis Data Firehose<\/strong>\n\n \tAmazon Kinesis Data Firehose is a managed data streaming service that will transport our CTRs from Connect to Amazon S3.<\/li>\n<\/ul>\nAmazon S3<\/strong>\n\n \tWith unlimited capacity and scalability, Amazon S3 is the best choice for storing our CTRs and it easily integrates with all the other services we need.<\/li>\n<\/ul>\nAWS Glue<\/strong>\n\n \tAWS Glue is another managed service which stores the metadata and database definitions as a Data Catalog (database table schema) that we will use with Amazon Athena, based on our CTR data structure.<\/li>\n<\/ul>\nAmazon Athena<\/strong>\n\n \tAmazon Athena gives us the power to run SQL queries on our CTRs in S3, using the Data Catalog from AWS Glue.<\/li>\n<\/ul>\n[\/vc_column_text][vc_custom_heading text=\"Implementation Guide\" font_container=\"tag:h4","text_align:left\" use_theme_fonts=\"yes\"][vc_column_text]Note: Some configuration settings are not fully supported through CloudFormation so we'll be setting up everything manually through the AWS Console.\n\nAWS S3<\/strong>\n\nIf you already have an S3 bucket setup that you wish to use to store CTRs, just take note of the bucket name.\u00a0 You can also use the S3 bucket that is used by your Connect instance to store call recordings and exported reports.\u00a0 Otherwise, you will need to create a new S3 bucket.\n\n \tNavigate to the S3 Service Console in AWS.<\/li>\n \tClick on the Create Bucket button.<\/li>\n \tEnter a unique name for your bucket. Remember this must be a unique name world-wide.<\/li>\n \tClick Create as the default settings are enough, the bucket will already be private.<\/li>\n<\/ol>\nAWS Glue<\/strong>\n\nThe next service we are going to set up is AWS Glue.[\/vc_column_text][vc_single_image image=\"3042\" img_size=\"full\"][vc_column_text]\n\n \tNavigate to the AWS Glue Service Console in AWS.<\/li>\n \tStart by selecting Databases in the Data catalog section and Add database.<\/li>\n \tEnter the desired name for your database, and optionally, the location and description.<\/li>\n \tClick on your newly created database. From here you can update the optional information if needed.<\/li>\n \tWhen you click on Tables from the left-hand side, you will see all tables in that region. By clicking on your Database and then View tables, the console will automatically filter the tables to show only those for the selected database.<\/li>\n<\/ol>\nThere are a couple of ways to create definitions for your table: manually, or with a crawler.\u00a0 With the manual option, you can specify the table schema yourself.\u00a0 With a crawler, you can schedule or run an on demand a job that can go through your data to attempt to determine the schema for you.\u00a0 In this case, we will go through both options; we will setup the initial table manually, then add additional definitions using a crawler.\n\n \tClick on Add tables, then Add table manually.<\/li>\n \tEnter your table name and select the database you want it to belong to.<\/li>\n \tOn the next page, select the S3 bucket to use as your Data Store. This will be the bucket that you wish to use to store your CTRs.<\/li>\n \tFor the Data format, choose Parquet. You then need to Define a schema by clicking on the Add Column.\u00a0 Add the following columns and data types:<\/li>\n<\/ol>\n\n\n\n<\/td>\nColumn Name<\/td>\nType<\/td>\n<\/tr>\n\n1<\/td>\nawsaccountid<\/td>\nstring<\/td>\n<\/tr>\n\n2<\/td>\nawscontacttracerecordformatversion<\/td>\nstring<\/td>\n<\/tr>\n\n3<\/td>\nagent<\/td>\nstring<\/td>\n<\/tr>\n\n4<\/td>\nagentconnectionattempts<\/td>\nstring<\/td>\n<\/tr>\n\n5<\/td>\nattributes<\/td>\nstring<\/td>\n<\/tr>\n\n6<\/td>\nchannel<\/td>\nstring<\/td>\n<\/tr>\n\n7<\/td>\nconnectedtosystemtimestamp<\/td>\nstring<\/td>\n<\/tr>\n\n8<\/td>\ncontactid<\/td>\nstring<\/td>\n<\/tr>\n\n9<\/td>\ncustomerendpoint<\/td>\nstring<\/td>\n<\/tr>\n\n10<\/td>\ndisconnecttimestamp<\/td>\nstring<\/td>\n<\/tr>\n\n11<\/td>\ninitialcontactid<\/td>\nstring<\/td>\n<\/tr>\n\n12<\/td>\ninitiationmethod<\/td>\nstring<\/td>\n<\/tr>\n\n13<\/td>\ninitiationtimestamp<\/td>\nstring<\/td>\n<\/tr>\n\n14<\/td>\ninstancearn<\/td>\nstring<\/td>\n<\/tr>\n\n15<\/td>\nlastupdatetimestamp<\/td>\nstring<\/td>\n<\/tr>\n\n16<\/td>\nmediastreams<\/td>\nstring<\/td>\n<\/tr>\n\n17<\/td>\nnextcontactid<\/td>\nstring<\/td>\n<\/tr>\n\n18<\/td>\npreviouscontactid<\/td>\nstring<\/td>\n<\/tr>\n\n19<\/td>\nqueue<\/td>\nstring<\/td>\n<\/tr>\n\n20<\/td>\nrecording<\/td>\nstring<\/td>\n<\/tr>\n\n21<\/td>\nsystemendpoint<\/td>\nstring<\/td>\n<\/tr>\n\n22<\/td>\ntransfercompletedtimestamp<\/td>\nstring<\/td>\n<\/tr>\n\n23<\/td>\ntransferredtoendpoint<\/td>\nstring<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\nNow you have a table that resembles the data from our Contact Trace Records.\u00a0 Review and finalize your table configuration.\u00a0 We'll come back to Glue in a little bit.\n\nAmazon Kinesis Data Firehose<\/strong>\n\nThe next service we will create is an Amazon Kinesis Data Firehose.\u00a0 This will deliver the CTRs from Connect to S3.\n\n \tNavigate to the Amazon Kinesis Dashboard in AWS.<\/li>\n \tSelect Create delivery stream under Kinesis Firehose delivery streams.<\/li>\n \tEnter your desired stream name.<\/li>\n \tChoose the source of data, in this case, select Direct PUT or other sources. The other option is to get data from Kinesis Data Streams which is useful if you have other needs with the CTRs but in our case it's not necessary.\u00a0 Click Next.<\/li>\n \tUnder Process records, keep Record transformation disabled. Here you can invoke a lambda function to transform your data records prior to delivering to S3.<\/li>\n \tWe do want to enable Record format conversion. Select Apache Parquet.<\/li>\n \tNext, we will integrate our new Glue database. Select your AWS Glue region, the Glue database and table we created earlier.\u00a0 Select Latest for table version.<\/li>\n \tNext, choose Destination. Due to enabling format conversion, S3 is the default and only option.\u00a0 So, we'll need to select the bucket we created earlier.<\/li>\n \tFor the prefix, you can leave this blank.\u00a0 Firehose automatically distributes records using the following folder structure \"YYYY\/MM\/DD\/HH.\" You can alter this if needed but that structure format will work for us.<\/li>\n \tIt is good to set an error prefix, such as \"error\/\". With this option, any records with delivery errors will get sent to this folder, isolated from other records.<\/li>\n \tYou can also specify a S3 backup destination which will store untransformed records. We'll keep this disabled for now.<\/li>\n \tKeep the default buffer size of 128 MB and set the buffer interval to 60 seconds.<\/li>\n \tYou can set S3 compression, encryption, error logging, and tags if desired.<\/li>\n \tThe final step is to create or choose an IAM role for Firehose to access S3 and any other needed services. Clicking the link will open a new page to create or choose a role. The default role is good enough, just change the Role name as needed.\u00a0 Click Apply.<\/li>\n \tReview your Firehose configuration and finish creating. It will take a minute or two to create.<\/li>\n<\/ol>\nAmazon Connect<\/strong>\n\nNext, we need to configure Connect to send CTRs to your new Kinesis Data Firehose.\n\n \tNavigate to your Connect Instance in the AWS Console.<\/li>\n \tSelect Data streaming from the left-hand side.<\/li>\n \tClick the checkbox \"Enable data streaming\" if not already enabled. This will display options for Contact Trace Records and Agent Events.\u00a0 Under Contact Trace Records, Select Kinesis Firehose and select the Firehose we just created.<\/li>\n \tClick Save and AWS Connect will update the instance and its own IAM role to access the selected Firehose.<\/li>\n<\/ol>\n[\/vc_column_text][vc_single_image image=\"3043\" img_size=\"full\"][vc_column_text]We've now created all the Services we need; the only thing left is data.\u00a0 If your Connect instance is already live, just wait a bit for new CTR's to be generated otherwise, start making some test calls.\u00a0 After you've got some new CTRs, check out the Amazon Kinesis Firehose in the Console.\u00a0 Select your Firehose and click on the Monitoring tab.\u00a0 You should start seeing some metrics pop up.[\/vc_column_text][vc_single_image image=\"3044\" img_size=\"full\"][vc_column_text]If it's been more than five or so minutes with no metrics, you might want to check permissions as this is the most likely issue.\u00a0 If you see data in DeliveryToS3Records, you are good to go.\n\nAmazon Athena<\/strong>\n\nNavigate to the Amazon Athena Console.\u00a0 You don't need to set anything up in Athena.\u00a0 Athena automatically looks at your Glue Data Catalog and shows your available Databases you can query.\u00a0 You should see your database in the drop down and tables underneath.\u00a0 You can simply click the three vertical dots to open a small menu and select Preview table to run a simple query.\u00a0 Or enter in the query tab:\n
SELECT<\/span> * FROM<\/span> \"<your-database>\"<\/span>.\"<your-table>\"<\/span> limit<\/span> 10<\/span>;<\/pre>\nClick Run query.\u00a0 You should see your CTRs pop up in the results window below.\u00a0 In the Results window, you can export your results as CSV.\u00a0 Your CTR data can now be queried using SQL.\u00a0 You'll notice that currently, some columns have JSON strings in them.\u00a0 Athena allows you to query keys within the JSON using json_extract() like this:\n
SELECT<\/span> json_extract(attributes<\/span>, '$.<attribute-key>'<\/span>) AS<\/span> \"<attribute-key>\"<\/span>\nFROM<\/span> \"<your-database>\"<\/span>.\"<your-table>\"<\/span>;<\/pre>\nVisit