
Speech to Text Converter Using HTML, CSS and JavaScript

Introduction

The Speech to Text Converter project is a web-based application that transforms spoken language into written text. Built with HTML, CSS, and JavaScript, it uses the speech recognition part of the Web Speech API to transcribe speech directly in the browser. The interface consists of a text area that displays the transcription and a microphone button that starts and stops the speech recognition process.
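
The Web Speech API is not available in every browser, and some browsers expose it only under a webkit prefix, so a page usually checks for it before use. The following is a minimal sketch of such a check. The helper name getSpeechRecognitionConstructor is hypothetical and not part of the project, and it takes the global object as a parameter so that it can also be run outside a browser.

```javascript
// Hypothetical helper (not part of the original project): returns the
// SpeechRecognition constructor if the given global object exposes one,
// either unprefixed or with the webkit prefix, and null otherwise.
function getSpeechRecognitionConstructor(globalObject) {
	if (!globalObject) {
		return null;
	}
	return globalObject.SpeechRecognition ||
		globalObject.webkitSpeechRecognition ||
		null;
}

// In a browser the global object is `window`. Elsewhere (for example Node.js)
// `window` is undefined and the helper reports that the API is missing.
var Recognition = getSpeechRecognitionConstructor(
	typeof window !== 'undefined' ? window : null
);
console.log(Recognition
	? 'Speech recognition is available.'
	: 'Speech recognition is not available in this environment.');
```

When the helper returns null, the page can show a short notice in the text area instead of failing when the microphone button is clicked.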

Explanation

HTML Structure

The HTML structure of the Speech to Text Converter includes several key components:

  1. Title: The <title> element sets the title of the webpage, which appears on the browser tab.
  2. Heading: An <h1> element provides the main title of the page, “Speech to Text Converter,” making it clear to the user what the application does.
  3. Wrapper Div: A <div> element with the ID wrapper contains a <textarea> where the transcribed text will be displayed. This textarea is styled to appear as a large box suitable for displaying long text.
  4. Container Div: Another <div> element with the class container holds the interactive elements, such as the microphone button (<img> element) and an instructional text (<span> element). This container is centrally positioned and styled to provide a user-friendly interface.

CSS Styling

The CSS styles applied to the HTML elements enhance the visual appeal and usability of the application:

  1. Body Styling: The background color of the body is set to a light gray (#dbdbdb), providing a neutral and calming background.
  2. Container Positioning: The container is given top and left offsets and a z-index of 3 so that it stays above other elements. These properties only take effect on a positioned element, so the position property needs a valid value such as relative. Its opacity is set to 1, which leaves it fully opaque.
  3. Heading Styling: The heading uses a sans-serif font stack and an opacity of 0.8 so that it blends softly with the background.
  4. Play Button: The microphone button (.playButton) is 70px by 70px and gains a green box shadow on hover, making it visually responsive to user interaction.
  5. Textarea Styling: The textarea is styled with ample padding and a light background color to ensure text readability. It also has rounded corners and is made non-resizable to maintain consistent layout.
  6. Instruction Text: This text is styled with a clear, sans-serif font for easy readability.

JavaScript Logic

The JavaScript file contains the core functionality of the speech recognition process:

  1. Initialization: The speechToTextConversion function, which is meant to run once the page has loaded, creates a speech recognition object from the Web Speech API and configures it for continuous recognition in Indian English (lang set to 'en-IN'), with interim results enabled and one alternative per result.
  2. Microphone Button: The microphone button is set up with an onclick event handler that toggles the speech recognition on and off. When the button is clicked:
    • If recognition is not started, the button image changes to indicate recording, and the recognition service starts.
    • If recognition is already started, the button image reverts to its original state, and the recognition service stops.
  3. Handling Recognition Results: The onresult event handler processes the speech recognition results:
    • It reads the transcript of the most recent result from the speech recognition event and writes it to the textarea, replacing the previous content.
    • The confidence level of the recognition result is logged to the console for debugging purposes.
  4. Error Handling:
    • The onnomatch event handler displays a message in the textarea if the speech is not recognized.
    • The onerror event handler displays an error message in the textarea if an error occurs during recognition.
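
The configuration and toggle steps above can be sketched in isolation. This is a minimal sketch rather than the project's own code: attachToggle and its parameters are hypothetical names, and the recognition object and button are passed in so that the logic does not rely on global state.

```javascript
// Hypothetical sketch: wires a start/stop toggle to any object that offers
// start() and stop(), such as a SpeechRecognition instance.
function attachToggle(recognition, button, images) {
	var recording = false;

	// Same settings the article describes: continuous recognition in
	// English (India), interim results on, one alternative per result.
	recognition.continuous = true;
	recognition.lang = 'en-IN';
	recognition.interimResults = true;
	recognition.maxAlternatives = 1;

	button.onclick = function () {
		if (!recording) {
			button.src = images.recording; // show the "recording" image
			recognition.start();
		} else {
			button.src = images.idle; // back to the microphone image
			recognition.stop();
		}
		recording = !recording;
	};
}

// Browser usage, assuming the element id and image names from this article
// and a `recognition` object created from the Web Speech API:
// attachToggle(recognition, document.getElementById('playButton'),
//     { idle: 'mic.png', recording: 'record-button-thumb.png' });
```

Keeping the recording flag as a boolean inside the function avoids a numeric state variable that other code could change by accident.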

Purpose of Functions

  • speechToTextConversion Function: This function initializes the speech recognition settings and defines the behavior for starting and stopping the recognition process.
  • Microphone Button Event Handler: This function toggles the state of the microphone button, switching between starting and stopping the speech recognition process.
  • onresult Event Handler: This function handles the results from the speech recognition service, updating the textarea with the transcribed text and logging the confidence level.
  • onnomatch Event Handler: This function provides user feedback when the speech input is not recognized.
  • onerror Event Handler: This function provides user feedback in case of errors during the speech recognition process.
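
Because the onresult handler described above shows only the most recent result, earlier phrases disappear from the textarea during continuous recognition. One possible alternative is to join all results on every event. The sketch below assumes the shape of the results list defined by the Web Speech API (each result has an isFinal flag and its best alternative at index 0). The helper name collectTranscript is hypothetical.

```javascript
// Hypothetical helper: joins every result in a SpeechRecognitionResultList-like
// object into final and interim text. Each result is expected to have an
// `isFinal` flag and, at index 0, an alternative with a `transcript` string.
function collectTranscript(results) {
	var finalText = '';
	var interimText = '';
	for (var k = 0; k < results.length; k++) {
		var transcript = results[k][0].transcript;
		if (results[k].isFinal) {
			finalText += transcript;
		} else {
			interimText += transcript;
		}
	}
	return { finalText: finalText, interimText: interimText };
}

// Plain objects standing in for the results of a recognition event.
var sampleResults = [
	Object.assign([{ transcript: 'hello world', confidence: 0.9 }], { isFinal: true }),
	Object.assign([{ transcript: ' how are', confidence: 0.5 }], { isFinal: false })
];
var collected = collectTranscript(sampleResults);
console.log(collected.finalText + collected.interimText); // prints "hello world how are"
```

Inside an onresult handler, the same helper could be called with event.results and the combined text assigned to the textarea's value.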

Conclusion

The Speech to Text Converter project combines HTML, CSS, and JavaScript into a small application that converts spoken language into text. By using the Web Speech API, it makes speech recognition available through a simple web interface. The project can serve as a foundation for more advanced applications or be extended with features such as language selection, improved error handling, and integration with other services.
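
As one possible take on the improved error handling mentioned above, the error codes reported by the Web Speech API can be translated into friendlier messages. This is a sketch only: the helper name describeRecognitionError and the message wording are assumptions, while the codes themselves are ones the API defines for recognition errors.

```javascript
// Hypothetical helper: maps error codes defined by the Web Speech API for
// recognition errors to messages suitable for the textarea.
var ERROR_MESSAGES = {
	'no-speech': 'No speech was detected. Please try again.',
	'audio-capture': 'Audio capture failed. Check that a microphone is connected.',
	'not-allowed': 'Speech input was not allowed. Check the microphone permission.',
	'network': 'A network error interrupted recognition.',
	'language-not-supported': 'The selected language is not supported.'
};

function describeRecognitionError(code) {
	// Fall back to the generic message used in the project for unknown codes.
	return Object.prototype.hasOwnProperty.call(ERROR_MESSAGES, code)
		? ERROR_MESSAGES[code]
		: 'Error occurred in recognition: ' + code;
}

console.log(describeRecognitionError('no-speech'));
```

In the project, an onerror handler could pass event.error to this helper and show the returned string in the textarea.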

SOURCE CODE:

HTML (Index.html)

				
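<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<title>Speech to Text</title>
	<!-- Assumption: file names follow the headings of the CSS and JavaScript listings -->
	<link rel="stylesheet" href="style.css">
	<script src="app.js"></script>
	<!-- Assumption: the recognizer is initialized from the body's onload attribute -->
</head>
<body onload="speechToTextConversion()">
	<h1>Speech to Text Converter</h1>
	<div id="wrapper">
		<textarea id="text" name="text" rows="4" style="overflow: hidden;height: 160px"></textarea>
	</div>

	<div class="container">
		<br><br><br>
		<img src="mic.png" id="playButton" class="playButton" alt="Microphone">
		<br>
		<span class="instruction">
			Click on the mic to start recording
		</span>
	</div>
</body>
</html>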
				
			

CSS (style.css)

				
body {
	background-color: #dbdbdb;

}

.container {
	position: relative;
	z-index: 3;
	top: 40%;
	left: 5%;
	opacity: 1;

}

h1 {
	margin-top: 3rem;
	font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
	opacity: .8;
}

.playButton {
	display: inline-block;
	height: 70px;
	width: 70px;
}

.playButton:hover {
	box-shadow: 0 6px 20px 0 #00ff15;
}

.submitButton {
	position: relative;
	width: 30%;
	border: none;
	font-size: 28px;
	color: black;
	padding: 10px;
	width: 1rem;
	text-align: center;
	-webkit-transition-duration: 0.4s;
	transition-duration: 0.4s;
	text-decoration: none;
	overflow: hidden;
	cursor: pointer;
	border-radius: 10px 10px 10px 10px;

}

.submitButton:hover {
	box-shadow: 0px 7px 1px 0px rgba(0, 0, 0, 0.0), 0 6px 20px 0 rgba(0, 0, 0, 0.19);

}

.submitButton:after {
	content: "";
	display: block;
	position: absolute;
	padding-top: 300%;
	padding-left: 350%;
	margin-left: -20px !important;
	margin-top: -120%;
	opacity: 0;
	transition: all 0.8s
}

.submitButton:active:after {
	padding: 0;
	margin: 0;
	opacity: 1;
	transition: 0s;
}

.recogText {
	position: block;
	top: 70%;
	left: 5%;
	right: 5%;
	height: 100%;
	width: 99.3%;
	font-size: 25px;
}


#paper {
	font-size: 20px;
	z-index: 2;
}

#margin {
	margin-left: 12px;
	margin-bottom: 20px;
	-webkit-user-select: none;
	-moz-user-select: none;
	-ms-user-select: none;
	-o-user-select: none;
	user-select: none;
}

#text {
	width: 500px;
	height: 1000px;
	overflow: hidden;
	background-color: #cdcdcd;
	border: none;
	color: #222;
	font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
	font-weight: normal;
	font-size: 24px;
	resize: none;
	line-height: 40px;
	padding-left: 50px;
	padding-right: 50px;
	padding-top: 20px;
	padding-bottom: 120px;
	-webkit-border-radius: 12px;
	border-radius: 12px;
	margin-top: 5rem;
}

#title {
	background-color: transparent;
	border-bottom: 3px solid #FFF;
	color: #FFF;
	font-size: 20px;
	font-family: Courier, monospace;
	height: 28px;
	font-weight: bold;
	width: 220px;
}

.instruction {
	font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
}

#button {
	cursor: pointer;
	margin-top: 20px;
	float: right;
	height: 40px;
	padding-left: 24px;
	padding-right: 24px;
	font-family: Arial, Helvetica, sans-serif;
	font-weight: bold;
	font-size: 20px;
	color: red;
	text-shadow: 0px -1px 0px #000000;
	-webkit-border-radius: 8px;
	border-radius: 8px;
	border-top: 1px solid #FFF;
	-webkit-box-shadow: 0px 2px 14px #000;
	box-shadow: 0px 2px 14px #000;
	background-color: #62add6;
	background-image: url(https://static.tumblr.com/maopbtg/ZHLmgtok7/button.png);
	background-repeat: repeat-x;
}

#button:active {
	zoom: 1;
	filter: alpha(opacity=80);
	opacity: 0.8;
}

#button:focus {
	zoom: 1;
	filter: alpha(opacity=80);
	opacity: 0.8;
}

#wrapper {
	width: 700px;
	height: auto;
	margin-left: auto;
	margin-right: auto;
	margin-top: 24px;
	/* margin-bottom: 100px; */
}


.th {
	position: absolute;
	transition: transform .2s;
	top: 200px;
	font-family: Segoe Print;
	left: 50px;
	color: yellow;
	-ms-transform: rotate(278deg);
	transform: rotate(342deg);
	text-shadow: 1px 3px #8CF7F2;
	font-size: 25px;
	-webkit-animation: glow 1s ease-in-out infinite alternate;
	-moz-animation: glow 1s ease-in-out infinite alternate;
	animation: glow 1s ease-in-out infinite alternate;

}

.th:hover {

	transform: scale(1.1);
	-ms-transform: rotate(278deg);
	/* IE 9 */
	transform: rotate(350deg);
	font-size: 40px
}

.blink {
	animation: blinker 1s linear infinite;
	color: yellow;
	font-size: 26px;
	font-weight: bold;
	font-family: cursive;
}

@keyframes blinker {
	50% {
		opacity: 0;
	}
}
				
			

JavaScript (app.js)

				
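function speechToTextConversion() {
	// Read the constructor from window; some browsers expose it only
	// with the webkit prefix, and others do not provide it at all.
	var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

	if (!SpeechRecognition) {
		document.getElementById('text').value = 'Speech recognition is not supported in this browser.';
		return;
	}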

	var recognition = new SpeechRecognition();

	recognition.continuous = true;
	recognition.lang = 'en-IN';
	recognition.interimResults = true;
	recognition.maxAlternatives = 1;

	var diagnostic = document.getElementById('text');


	var i = 0; // 0 = idle, 1 = recording
	document.getElementById("playButton").onclick = function () {
		if (i == 0) {
			document.getElementById("playButton").src = "record-button-thumb.png";
			recognition.start();
			i = 1;
		}
		else {
			document.getElementById("playButton").src = "mic.png";
			recognition.stop();
			i = 0;
		}
	}
	recognition.onresult = function (event) {
		var last = event.results.length - 1;
		var convertedText = event.results[last][0].transcript;
		diagnostic.value = convertedText;
		console.log('Confidence: ' + event.results[last][0].confidence);
	}

	recognition.onnomatch = function (event) {
		diagnostic.value = "I didn't recognise that.";
	}
	recognition.onerror = function (event) {
		diagnostic.value = 'Error occurred in recognition: ' + event.error;
	}
};
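
A recognition session can also end without the user clicking the button, for example after an error or when the browser stops listening on its own. In that case the button image and the toggle state can fall out of sync. The sketch below shows one way to reset them through the recognition object's end event. The function name syncButtonOnEnd and the state object are hypothetical and not part of the listing above.

```javascript
// Hypothetical sketch: resets the UI whenever recognition ends, whether the
// user stopped it or the browser ended the session on its own.
function syncButtonOnEnd(recognition, button, state, idleImage) {
	recognition.onend = function () {
		state.recording = false; // next click should start a new session
		button.src = idleImage;  // show the microphone image again
	};
}

// Browser usage, assuming a `recognition` object, the element id and the
// image name from this article, and a shared state object:
// var state = { recording: false };
// syncButtonOnEnd(recognition, document.getElementById('playButton'), state, 'mic.png');
```

The click handler would then read and update the same state object instead of a separate counter.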
				
			
