Demo API Latencies

The benchmark data shows that Deepgram's transcription API outperforms the other two components on latency. For brief prompts that elicit short responses, the latency of the agent (GPT-4) and the synthesizer (Azure) is roughly comparable, because very little text has to be synthesized. As the text volume grows, the two diverge, and the synthesizer's latency significantly exceeds that of the agent. This suggests that the synthesizer's performance is more sensitive to text length, which hurts its efficiency in time-sensitive applications. Speech synthesis can be complex and time-consuming, particularly for synthesizers that produce highly natural-sounding speech: it involves several stages, including text processing, linguistic analysis, and waveform generation.
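As a rough illustration of how synthesizer latency grows with response length, the sketch below relates Azure's end-to-end time to the number of characters in the response for three of the measurements reported later in this document. Joining the response bullets with single spaces and using character count as a proxy for synthesis workload are both assumptions made for illustration; the function name `synthesis_rate` is hypothetical.

```python
# (prompt, response text, Azure end-to-end seconds) taken from this report.
# Assumption: response bullets are joined with single spaces.
samples = [
    ("What is two plus two?",
     "Two plus two equals four.",
     1.951),
    ("Can AI create art?",
     "Yes, AI can create art. "
     "It does this through techniques like neural networks and deep learning. "
     "This involves generating new patterns and styles that humans may not "
     "have considered.",
     3.900),
    ("What is a vector database?",
     "A vector database is a type of data storage system that deals with "
     "data in the form of vectors. "
     "A vector in this context refers to an ordered set of numbers, which "
     "could often represent things like coordinates in space, attributes of "
     "an object, etc.",
     13.626),
]


def synthesis_rate(samples):
    """Return (prompt, chars, seconds, seconds_per_char), shortest response first."""
    rows = []
    for prompt, response, azure_seconds in samples:
        chars = len(response)
        rows.append((prompt, chars, azure_seconds, azure_seconds / chars))
    return sorted(rows, key=lambda row: row[1])


rows = synthesis_rate(samples)
for prompt, chars, seconds, per_char in rows:
    print(f"{chars:4d} chars  {seconds:7.3f} s  {per_char * 1000:6.1f} ms/char  {prompt}")
```

Character count is only a crude proxy, so the per-character figures are best read as a trend indicator rather than a property of the synthesizer.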

NOTES:

  • If a prompt is too long, the demo cuts it off partway through and starts responding. The demo still picks up a few parts of the prompt if the user continues to speak, but this needs to be fixed because it interrupts the user's prompt.
  • As a result, a single prompt could contain at most 2-3 questions in this test.

Actual End-to-End Latency

Deepgram (Transcriber): 2.35%

GPT-4 (Agent): 15.81%

Azure (Synthesizer): 81.84%

User Perceived Latency

Deepgram (Transcriber): 15.66% (end of user prompt to transcription completion)

GPT-4 (Agent): 45.72% (transcription completion to start of synthesis)

Azure (Synthesizer): 38.61% (start of synthesis to first voice chunk sent)
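The three intervals above can be derived from four event timestamps per conversational turn. The following is a minimal sketch under that assumption; the `TurnTimestamps` class, its field names, and the `perceived_latency` function are hypothetical and not part of the demo's code. The example timestamps are reconstructed from the "What is two plus two?" measurement, taking the end of the user prompt as time zero.

```python
from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Hypothetical event times, in seconds, for one conversational turn."""
    user_prompt_end: float
    transcription_done: float
    synthesis_start: float
    first_voice_chunk_sent: float


def perceived_latency(t):
    """Split user-perceived latency into the three stage intervals."""
    return {
        # End of user prompt to transcription completion
        "transcriber": t.transcription_done - t.user_prompt_end,
        # Transcription completion to start of synthesis
        "agent": t.synthesis_start - t.transcription_done,
        # Start of synthesis to first voice chunk sent
        "synthesizer": t.first_voice_chunk_sent - t.synthesis_start,
        # Start of AI response minus end of user prompt
        "total": t.first_voice_chunk_sent - t.user_prompt_end,
    }


# Timestamps reconstructed from the stage durations 0.373 s, 1.617 s, 1.951 s.
turn = TurnTimestamps(
    user_prompt_end=0.0,
    transcription_done=0.373,
    synthesis_start=1.990,
    first_voice_chunk_sent=3.941,
)
result = perceived_latency(turn)
for stage, seconds in result.items():
    print(f"{stage}: {seconds:.3f} seconds")
```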

End-to-End Flow

Deepgram (Transcriber): 2.35%

GPT-4 (Agent): 15.81%

Azure (Synthesizer): 81.84%

What is two plus two?

Response:

  • Two plus two equals four.

Total Latency = 3.941 seconds

  • Deepgram: 0.373 seconds (9.46%)
  • GPT-4: 1.617 seconds (41.03%)
  • Azure: 1.951 seconds (49.51%)
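Each percentage in these breakdowns is a stage's time divided by the total latency for that prompt. A minimal sketch of that arithmetic, using the figures for this prompt (the function name `latency_shares` is illustrative):

```python
def latency_shares(stage_seconds):
    """Return the total latency and each stage's share of it in percent."""
    total = sum(stage_seconds.values())
    shares = {
        stage: round(100 * seconds / total, 2)
        for stage, seconds in stage_seconds.items()
    }
    return total, shares


stages = {"Deepgram": 0.373, "GPT-4": 1.617, "Azure": 1.951}
total, shares = latency_shares(stages)

print(f"Total Latency = {total:.3f} seconds")
for stage, seconds in stages.items():
    print(f"  {stage}: {seconds:.3f} seconds ({shares[stage]:.2f}%)")
# Total Latency = 3.941 seconds
#   Deepgram: 0.373 seconds (9.46%)
#   GPT-4: 1.617 seconds (41.03%)
#   Azure: 1.951 seconds (49.51%)
```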

What is a vector database?

Response:

  • A vector database is a type of data storage system that deals with data in the form of vectors.
  • A vector in this context refers to an ordered set of numbers, which could often represent things like coordinates in space, attributes of an object, etc.

Total Latency = 15.401 seconds

  • Deepgram: 0.136 seconds (0.88%)
  • GPT-4: 1.639 seconds (10.64%)
  • Azure: 13.626 seconds (88.48%)

What is a blockchain?

Response:

  • Blockchain is like a set of building blocks where each block contains some information that everyone agrees is true.
  • Before adding a new block, everyone needs to agree it's correct. Once they do, it's added to the chain of previous blocks.
  • This creates a long, unchangeable record that everyone can see, ensuring the information is trustworthy without needing a central authority.

Total Latency = 60.691 seconds

  • Deepgram: 0.632 seconds (1.04%)
  • GPT-4: 4.998 seconds (8.23%)
  • Azure: 55.061 seconds (90.73%)

Explain blockchain to me like I’m five.

Response:

  • Sure, imagine you have a box of building blocks.
  • Each block represents some information and before you can add another block onto your tower, you need your friends to all agree that this block is true and correct.
  • Once they all agree, you can add your block to the tower.
  • This is the basic idea of how blockchain works.
  • Just like the block, each piece of information is recorded then checked by various computers around the world.

Total Latency = 24.691 seconds

  • Deepgram: 0.632 seconds (2.56%)
  • GPT-4: 1.000 seconds (4.05%)
  • Azure: 23.059 seconds (93.38%)

How do you build a robust program with multiple developers working on different features?

Response:

  • Building a robust program with multiple developers is a task that requires efficient teamwork, clear communication, and strict organization.
  • Here are some strategies for making this process smoother:
  • 1. Use Version Control Systems: Tools like Git allow developers to work on different parts of a project simultaneously without stepping on each other's toes. Each developer can work on their own branch and then merge their changes into the main codebase when they're ready.
  • 2. Implement a Code Review Process: Before any code gets merged into the main codebase, it should be reviewed by at least one other developer. This not only helps catch potential bugs, but it also promotes knowledge sharing and maintains code consistency.
  • 3. Devise a Clear Development Process: This could include sprints, milestones and designated roles within the team. Define who is responsible for what tasks, and who will make final decisions, to avoid confusion.

Total Latency = 62.439 seconds

  • Deepgram: 1.866 seconds (2.99%)
  • GPT-4: 3.083 seconds (4.94%)
  • Azure: 57.490 seconds (92.07%)

Can AI create art?

Response:

  • Yes, AI can create art.
  • It does this through techniques like neural networks and deep learning.
  • This involves generating new patterns and styles that humans may not have considered.

Total Latency = 6.281 seconds

  • Deepgram: 0.245 seconds (3.90%)
  • GPT-4: 2.136 seconds (34.01%)
  • Azure: 3.900 seconds (62.09%)

What are the benefits of renewable energy?

Response:

  • Renewable energy, like solar and wind, is beneficial.
  • These sources generate less pollution.
  • They reduce dependence on fossil fuels.
  • They provide inexhaustible energy sources.
  • This contributes to a safer and more sustainable environment.

Total Latency = 20.879 seconds

  • Deepgram: 0.158 seconds (0.76%)
  • GPT-4: 2.321 seconds (11.12%)
  • Azure: 18.400 seconds (88.12%)

How does quantum computing work?

Response:

  • Quantum computing uses the principles of quantum mechanics to process information.
  • Unlike classical computers that use bits as binary 0s or 1s, quantum computers use quantum bits or qubits.
  • Qubits can be both 0 and 1 at the same time.
  • This allows them to perform multiple calculations simultaneously.

Total Latency = 44.157 seconds

  • Deepgram: 0.399 seconds (0.90%)
  • GPT-4: 4.758 seconds (10.78%)
  • Azure: 39.000 seconds (88.32%)

Explain the theory of relativity.

Response:

  • Einstein's theory of relativity consists of two key ideas: special relativity and general relativity.
  • Special relativity posits that the laws of physics are the same for all non-accelerating observers.
  • It states that the speed of light in a vacuum is constant regardless of the motion of all observers.
  • General relativity suggests that massive objects cause a distortion in space-time.
  • This distortion is felt as gravity.

Total Latency = 53.002 seconds

  • Deepgram: 0.512 seconds (0.97%)
  • GPT-4: 5.240 seconds (9.89%)
  • Azure: 47.250 seconds (89.15%)

What is the difference between machine learning and deep learning?

Response:

  • Machine learning involves algorithms that learn from and make predictions on data.
  • Deep learning is a subset of machine learning.
  • It structures algorithms in layers to create an artificial neural network.
  • This network can learn and make intelligent decisions on its own.

Total Latency = 27.310 seconds

  • Deepgram: 0.273 seconds (1.00%)
  • GPT-4: 2.737 seconds (10.02%)
  • Azure: 24.300 seconds (88.98%)

What are some types of noodles you might recommend?

Response:

  • Some familiar types might include Italian pasta, Japanese ramen, Chinese lo mein, and Vietnamese pho.

Total Latency = 5.320 seconds

  • Deepgram: 0.250 seconds (4.70%)
  • GPT-4: 1.072 seconds (20.15%)
  • Azure: 3.998 seconds (75.15%)

What are Italian pastas like?

Response:

  • Italian pasta is diverse, coming in various shapes and sizes.
  • It's traditionally made from durum wheat semolina.
  • You may have heard of spaghetti, penne, and fettuccine, among others.

Total Latency = 8.541 seconds

  • Deepgram: 0.194 seconds (2.27%)
  • GPT-4: 1.533 seconds (17.94%)
  • Azure: 6.814 seconds (79.79%)

Interesting. How are they different from Japanese ramen?

Response:

  • Japanese ramen is distinct, primarily a noodle soup dish.
  • The broth, rich and flavorful, is a hallmark, along with diverse toppings.
  • Common toppings are slices of pork, boiled eggs, and seaweed.

Total Latency = 12.673 seconds

  • Deepgram: 0.278 seconds (2.19%)
  • GPT-4: 2.095 seconds (16.53%)
  • Azure: 10.300 seconds (81.28%)

Got it. What about Chinese lo mein then?

Response:

  • Chinese lo mein consists of wheat flour noodles, usually stir-fried.
  • They're combined with various vegetables and proteins.
  • It's a staple in both traditional Chinese and American-Chinese cuisine.

Total Latency = 9.327 seconds

  • Deepgram: 0.183 seconds (1.96%)
  • GPT-4: 1.774 seconds (19.01%)
  • Azure: 7.370 seconds (79.03%)

Nice. Lastly, tell me about Vietnamese pho.

Response:

  • Vietnamese pho is a beloved soup, featuring delicate rice noodles.
  • It's prepared with herbs and meat, often beef or chicken.
  • The aromatic broth and fresh ingredients are central to its appeal.

Total Latency = 7.498 seconds

  • Deepgram: 0.145 seconds (1.93%)
  • GPT-4: 1.503 seconds (20.04%)
  • Azure: 5.850 seconds (78.03%)

User Perceived Latency

Total Latency = Start of AI response - End of user prompt

Deepgram (Transcriber): 15.66%

GPT-4 (Agent): 45.72%

Azure (Synthesizer): 38.61%

What is two plus two?

Response:

  • Two plus two equals four.

User Perceived Latency = 3.941 seconds

  • Deepgram: 0.373 seconds (9.46%)
  • GPT-4: 1.617 seconds (41.03%)
  • Azure: 1.951 seconds (49.51%)

What is a vector database?

Response:

  • A vector database is a type of data storage system that deals with data in the form of vectors.
  • A vector in this context refers to an ordered set of numbers, which could often represent things like coordinates in space, attributes of an object, etc.

User Perceived Latency = 3.020 seconds

  • Deepgram: 0.136 seconds (4.50%)
  • GPT-4: 1.639 seconds (54.27%)
  • Azure: 1.245 seconds (41.23%)

What is a blockchain?

Response:

  • Blockchain is like a set of building blocks where each block contains some information that everyone agrees is true.
  • Before adding a new block, everyone needs to agree it's correct. Once they do, it's added to the chain of previous blocks.
  • This creates a long, unchangeable record that everyone can see, ensuring the information is trustworthy without needing a central authority.

User Perceived Latency = 2.924 seconds

  • Deepgram: 0.632 seconds (21.62%)
  • GPT-4: 1.000 seconds (34.21%)
  • Azure: 1.292 seconds (44.17%)

Explain blockchain to me like I’m five.

Response:

  • Sure, imagine you have a box of building blocks.
  • Each block represents some information and before you can add another block onto your tower, you need your friends to all agree that this block is true and correct.
  • Once they all agree, you can add your block to the tower.
  • This is the basic idea of how blockchain works.
  • Just like the block, each piece of information is recorded then checked by various computers around the world.

User Perceived Latency = 2.924 seconds

  • Deepgram: 0.632 seconds (21.62%)
  • GPT-4: 1.000 seconds (34.21%)
  • Azure: 1.292 seconds (44.17%)

How do you build a robust program with multiple developers working on different features?

Response:

  • Building a robust program with multiple developers is a task that requires efficient teamwork, clear communication, and strict organization.
  • Here are some strategies for making this process smoother:
  • 1. Use Version Control Systems: Tools like Git allow developers to work on different parts of a project simultaneously without stepping on each other's toes. Each developer can work on their own branch and then merge their changes into the main codebase when they're ready.
  • 2. Implement a Code Review Process: Before any code gets merged into the main codebase, it should be reviewed by at least one other developer. This not only helps catch potential bugs, but it also promotes knowledge sharing and maintains code consistency.
  • 3. Devise a Clear Development Process: This could include sprints, milestones and designated roles within the team. Define who is responsible for what tasks, and who will make final decisions, to avoid confusion.

User Perceived Latency = 5.424 seconds

  • Deepgram: 1.083 seconds (19.97%)
  • GPT-4: 3.083 seconds (56.84%)
  • Azure: 1.258 seconds (23.19%)
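One plausible way to arrive at summary shares like those at the top of this section is to pool the stage times across all five prompts and divide by the pooled total. This is an assumption about the method, not a confirmed description of it; pooling the per-prompt figures listed above gives values within a few hundredths of a percentage point of the reported 15.66%, 45.72%, and 38.61%. The function name `pooled_shares` is hypothetical.

```python
# Per-prompt user-perceived stage times in seconds, copied from the
# five breakdowns in this section.
turns = [
    {"Deepgram": 0.373, "GPT-4": 1.617, "Azure": 1.951},
    {"Deepgram": 0.136, "GPT-4": 1.639, "Azure": 1.245},
    {"Deepgram": 0.632, "GPT-4": 1.000, "Azure": 1.292},
    {"Deepgram": 0.632, "GPT-4": 1.000, "Azure": 1.292},
    {"Deepgram": 1.083, "GPT-4": 3.083, "Azure": 1.258},
]


def pooled_shares(turns):
    """Sum each stage across turns, then express it as a percent of the grand total."""
    stage_totals = {}
    for turn in turns:
        for stage, seconds in turn.items():
            stage_totals[stage] = stage_totals.get(stage, 0.0) + seconds
    grand_total = sum(stage_totals.values())
    return {
        stage: round(100 * seconds / grand_total, 2)
        for stage, seconds in stage_totals.items()
    }


pooled = pooled_shares(turns)
for stage, share in pooled.items():
    print(f"{stage}: {share:.2f}%")
```

Pooling weights long turns more heavily than averaging the per-prompt percentages would, which matters when a single slow turn dominates the sample.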