In this codelab, you'll learn how to add voice interactions to your app with the Voice Interaction API. The Voice Interaction API allows users of your app to confirm actions and select from a list of options using only their voice.

What you’ll learn

What you’ll need

You can either download all the sample code to your computer...

Download Zip

...or clone the GitHub repository from the command line.

$ git clone

Let’s start off by trying the finished sample. This will help illustrate what we’re building and show you the interaction between the Google app and your own app.

  1. Select the voice-interaction-end directory from your sample code download (File >  Import Project… > voice-interaction-end).
  2. Click the Gradle sync button.
  3. Click the Run button.
  4. Open the Google app.
  5. Say “OK Google”.
  6. When the voice prompt appears, say “Take a selfie”. Remember: due to a known config issue, you may have to try this multiple times before it starts working.
  7. The Voice Camera app should open and you should be prompted to take a photo.
  8. Answer “cheese”.

That’s it! You’ve taken your first voice-powered selfie using the Voice Interaction API.


Now let’s open the starter app and learn how to add voice interaction to it so that it has all the features that you just saw in the finished sample app.

  1. Select the voice-interaction-start directory from your sample code download (File >  Import Project… > voice-interaction-start).
  2. Click the Gradle sync button.
  3. Click the Run button.

You should see a camera app appear after a few seconds. That’s all that our app does so far. In the following steps, we’ll be adding voice interaction to this starter app.

The first thing that you need to do to prepare your project for the Voice Interaction API is to update your Gradle build settings to use Android M.


compileSdkVersion "android-MNC"
buildToolsVersion "21.1.1"

defaultConfig {
    minSdkVersion "android-MNC"
    targetSdkVersion "android-MNC"
}

Once you’ve changed your build settings, make sure to do a Gradle sync.


Now it’s time to add some voice intents to your application. We’ll be using two system actions that are already supported in Android.


<activity
    android:name=".TakePictureActivity"
    android:label="@string/app_name" >
    <intent-filter>
        <action android:name="" />
        <category android:name="android.intent.category.DEFAULT" />
        <category android:name="android.intent.category.VOICE" />
    </intent-filter>
    <intent-filter>
        <action android:name="" />
        <category android:name="android.intent.category.DEFAULT" />
        <category android:name="android.intent.category.VOICE" />
    </intent-filter>
</activity>

Notice that we need to specify the android.intent.category.VOICE and android.intent.category.DEFAULT categories for both intents in order for them to be called by the Voice Interaction API.
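As an aside, Android only delivers an intent to a filter that declares every category attached to that intent, which is why both categories are required. Here’s a minimal plain-Java sketch of that rule (`VoiceCategoryCheck` and its method are illustrative helpers, not framework APIs):

```java
import java.util.Set;

// Illustrative model of Android's category test: an intent matches a filter
// only if the filter declares every category carried by the intent.
public class VoiceCategoryCheck {

    // Categories our manifest filter declares.
    static final Set<String> FILTER_CATEGORIES = Set.of(
            "android.intent.category.DEFAULT",
            "android.intent.category.VOICE");

    public static boolean matchesFilter(Set<String> intentCategories) {
        return FILTER_CATEGORIES.containsAll(intentCategories);
    }

    public static void main(String[] args) {
        // A voice-triggered launch carries both categories, so it matches.
        System.out.println(matchesFilter(Set.of(
                "android.intent.category.DEFAULT",
                "android.intent.category.VOICE")));
    }
}
```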

Once you have declared some voice intents for your application, you’ll need to add an activity to handle them.

Create a new class called TakePictureActivity in your app’s package and add the following code to it.

package;

import android.content.Intent;
import android.os.Bundle;
import android.util.Log;

public class TakePictureActivity extends Activity {

    private static final String TAG = "TakePictureActivity";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        Log.d(TAG, "onCreate: ");
        Intent intent = getIntent();
        if (intent == null) {
            finish();
            return;
        } else if (CameraActivity.needPermissions(this)) {
            startActivity(new Intent(this, CameraActivity.class)
                    .setFlags(Intent.FLAG_ACTIVITY_NEW_TASK));
            finish();
            return;
        } else if (!isVoiceInteraction()) {
            Log.e(TAG, "Not voice interaction");
            intent.setComponent(null);
            intent.setPackage("");
            intent.setFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
            startActivity(intent);
            finish();
            return;
        }
        setContentView(R.layout.activity_camera);
        CameraFragment fragment = CameraFragment.newInstance();
        fragment.setArguments(getIntent().getExtras());
        getFragmentManager().beginTransaction()
                .replace(, fragment)
                .commit();
    }
}

In this activity you can see that the onCreate() method is inspecting the intent that triggers the activity and looking for a voice interaction intent with the isVoiceInteraction() helper method. You will also notice that it checks whether camera permissions are needed. This is another new feature in Android M which allows applications to obtain new permissions at runtime.

If all of the checks pass, the activity displays a camera using the CameraFragment. In a few more steps, you’ll see how we add voice interaction to that fragment.

In the previous step, you saw how TakePictureActivity checked for camera permissions and, if they were missing, redirected the user to the CameraActivity.

You’ll need to make some changes to CameraActivity in order to allow the user to give the app permissions at runtime. First, add a couple of imports to the file:

import android.Manifest;
import android.provider.MediaStore;
import android.widget.Toast;

Next, add the following fields to CameraActivity:

private static final int PERMISSIONS_REQUEST_ALL_PERMISSIONS = 1;
private Bundle mSavedInstanceState;

Then update the onCreate() method so that it requests any missing permissions:

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    Intent intent = getIntent();
    if (needPermissions(this)) {
        requestPermissions();
    } else if (intent != null) {
        intent.setComponent(null);
        intent.setPackage("");
        intent.setFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
        startActivity(intent);
        finish();
    } else {
        finish();
    }
}

The needPermissions() helper returns true when either of the required permissions has not yet been granted:

public static boolean needPermissions(Activity activity) {
    Log.d(TAG, "needPermissions: ");
    return activity.checkSelfPermission(Manifest.permission.CAMERA)
            != PackageManager.PERMISSION_GRANTED
            || activity.checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE)
            != PackageManager.PERMISSION_GRANTED;
}

And requestPermissions() asks the user for both permissions in a single request:

private void requestPermissions() {
    Log.d(TAG, "requestPermissions: ");
    String[] permissions = new String[] {
            Manifest.permission.CAMERA,
            Manifest.permission.WRITE_EXTERNAL_STORAGE,
    };
    requestPermissions(permissions, PERMISSIONS_REQUEST_ALL_PERMISSIONS);
}

When requestPermissions() has completed, it will make a callback to the onRequestPermissionsResult method, which is where you'll need to record whether or not all of the permissions were granted.

@Override
public void onRequestPermissionsResult(int requestCode, String[] permissions,
        int[] grantResults) {
    switch (requestCode) {
        case PERMISSIONS_REQUEST_ALL_PERMISSIONS:
            boolean hasAllPermissions = true;
            for (int i = 0; i < grantResults.length; ++i) {
                if (grantResults[i] != PackageManager.PERMISSION_GRANTED) {
                    hasAllPermissions = false;
                    Log.e(TAG, "Unable to get permission " + permissions[i]);
                }
            }
            if (hasAllPermissions) {
                finish();
            } else {
                Toast.makeText(this, "Unable to get all required permissions",
                        Toast.LENGTH_LONG).show();
                finish();
            }
            break;
        default:
            Log.e(TAG, "Unexpected request code");
    }
}

Now, you’ve made sure that your app has access to the camera and, if it doesn’t, the user is prompted to decide whether they want to grant camera access to your app.
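The grant check above boils down to “every entry in grantResults equals PERMISSION_GRANTED”. As a standalone sketch of that loop (plain Java; PERMISSION_GRANTED is 0 in the framework, and `PermissionCheck` is an illustrative helper, not an Android API):

```java
public class PermissionCheck {

    // Mirrors PackageManager.PERMISSION_GRANTED, which is 0 in the framework.
    static final int PERMISSION_GRANTED = 0;

    // True only when every requested permission was granted. Note that, as in
    // onRequestPermissionsResult() above, an empty result array also passes.
    public static boolean hasAllPermissions(int[] grantResults) {
        for (int result : grantResults) {
            if (result != PERMISSION_GRANTED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasAllPermissions(new int[] {0, 0}));  // both granted
        System.out.println(hasAllPermissions(new int[] {0, -1})); // one denied
    }
}
```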

Let’s get to the fun part: defining the voice interaction between your app and the assistant. First we’ll add some imports.

import android.app.VoiceInteractor;
import android.app.VoiceInteractor.PickOptionRequest;
import android.app.VoiceInteractor.PickOptionRequest.Option;
import android.view.Gravity;
import android.widget.TextView;
import java.util.Timer;
import java.util.TimerTask;

After the assistant has captured the user’s response, the focus returns to your app and you can proceed to take a photo. In order to do this, add an if statement to the end of your onResume() method which checks for a voice interaction. If it finds one, it checks whether a timer has been specified and calls startVoiceTimer(), otherwise it calls the startVoiceTrigger() method.

@Override
public void onResume() {
    super.onResume();
    Log.d(TAG, "onResume: ");
    startBackgroundThread();

    // When the screen is turned off and turned back on, the SurfaceTexture is already
    // available, and "onSurfaceTextureAvailable" will not be called. In that case, we can open
    // a camera and start preview from here (otherwise, we wait until the surface is ready in
    // the SurfaceTextureListener).
    if (mTextureView.isAvailable()) {
        openCamera(mTextureView.getWidth(), mTextureView.getHeight());
    } else {
        mTextureView.setSurfaceTextureListener(mSurfaceTextureListener);
    }
    if (mOrientationListener.canDetectOrientation()) {
        mOrientationListener.enable();
    }
    if (getActivity().isVoiceInteraction()) {
        if (isTimerSpecified()) {
            startVoiceTimer();
        } else {
            startVoiceTrigger();
        }
    }
}

Now create the startVoiceTrigger() method. Create a voice interaction by getting the VoiceInteractor from the activity. As you can see, you can easily customize the Option that gets offered to the user. Add synonyms for “cheese”, “ready”, “go”, “take it”, and “ok”.

private void startVoiceTrigger() {
    Log.d(TAG, "startVoiceTrigger: ");
    Option option = new Option("cheese");
    option.addSynonym("ready");
    option.addSynonym("go");
    option.addSynonym("take it");
    option.addSynonym("ok");
    getActivity().getVoiceInteractor()
            .submitRequest(new PickOptionRequest("Say Cheese", new Option[]{option}, null) {
                @Override
                public void onPickOptionResult(boolean finished, Option[] selections,
                        Bundle result) {
                    if (finished && selections.length == 1) {
                        takePicture();
                    } else {
                        getActivity().finish();
                    }
                }

                @Override
                public void onCancel() {
                    getActivity().finish();
                }
            });
}

If the user chooses your Option using any one of those synonyms, the takePicture() method will be called; otherwise we exit the activity.
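Conceptually, the assistant treats the option label and each synonym as equally valid answers. Here is a plain-Java sketch of that matching (illustrative only — the real matching, including speech recognition, happens inside the assistant, and `OptionMatcher` is not a framework class):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative helper: the label and its synonyms are interchangeable answers.
public class OptionMatcher {

    static final String LABEL = "cheese";
    static final List<String> SYNONYMS = Arrays.asList("ready", "go", "take it", "ok");

    public static boolean matches(String spoken) {
        String normalized = spoken.trim().toLowerCase(Locale.US);
        return LABEL.equals(normalized) || SYNONYMS.contains(normalized);
    }

    public static void main(String[] args) {
        System.out.println(matches("Take it")); // synonym, case-insensitive
        System.out.println(matches("later"));   // no match
    }
}
```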

The next step is to add the voice interaction capabilities to your camera fragment. First, add the newInstance() method to help create new instances of this fragment.

public static CameraFragment newInstance() {
    Log.d(TAG, "newInstance: ");
    CameraFragment fragment = new CameraFragment();
    fragment.setRetainInstance(true);
    return fragment;
}

In the onCreateView() method, you should check for voice interactions and hide the camera controls if the user is interacting through voice.

@Override
public View onCreateView(LayoutInflater inflater, ViewGroup container,
        Bundle savedInstanceState) {
    Log.d(TAG, "onCreateView: ");
    View view = inflater.inflate(R.layout.fragment_camera2_basic, container, false);
    if (getActivity().isVoiceInteraction()) {
        View controls = view.findViewById(;
        if (controls != null) {
            controls.setVisibility(View.GONE);
        }
    }
    return view;
}

Finally, we’ll enhance the showToast() method so that it not only shows a visual confirmation that a photo was taken, but also confirms through voice by saying “Here it is”.

private void showToast(String text) {
    // We show a Toast by sending a request message to mMessageHandler. This makes sure that
    // the Toast is shown on the UI thread.
    Activity activity = getActivity();
    if (activity.isVoiceInteraction()) {
        Uri contextUri = Uri.fromFile(mFile);
        Log.d(TAG, "PHOTO URI: " + contextUri);
        Bundle extras = new Bundle();
        extras.putParcelable("context_uri", contextUri);
        activity.getVoiceInteractor().submitRequest(
                new VoiceInteractor.CompleteVoiceRequest("Here it is", extras) {
                    @Override
                    public void onCompleteResult(Bundle result) {
                        super.onCompleteResult(result);
                        Intent intent = new Intent();
                        intent.setAction(Intent.ACTION_VIEW);
                        intent.setDataAndType(
                                Uri.parse("file://" + mFile.getAbsolutePath()), "image/*");
                        intent.setFlags(Intent.FLAG_ACTIVITY_NEW_TASK
                                | Intent.FLAG_ACTIVITY_CLEAR_TOP);
                        getActivity().finish();
                        startActivity(intent);
                    }
                });
    } else {
        Message message = Message.obtain();
        message.obj = text;
        mMessageHandler.sendMessage(message);
    }
}

Now the camera fragment is ready to handle voice interactions.

When the camera is taking a photo in hands-free mode, it’s helpful to have a countdown so that you know when to smile. First, you need to add a couple of attributes to CameraFragment.

private static final String EXTRA_TIMER_DURATION_SECONDS =
        "android.intent.extra.TIMER_DURATION_SECONDS";
private TextView mTimerCountdownLabel = null;
private Toast mTimerCountdownToast = null;

Then add the startVoiceTimer() method, which creates the toast and starts the timer.

private void startVoiceTimer() {
    Log.d(TAG, "startVoiceTimer: ");
    final int countdown = getArguments().getInt(EXTRA_TIMER_DURATION_SECONDS);
    mTimerCountdownToast = new Toast(getActivity().getApplicationContext());
    mTimerCountdownToast.setGravity(Gravity.CENTER, 0, 0);
    mTimerCountdownToast.setDuration(Toast.LENGTH_SHORT);
    LayoutInflater inflater = getActivity().getLayoutInflater();
    View layout = inflater.inflate(R.layout.toast_timer,
            (ViewGroup) getActivity().findViewById(R.id.toast_layout_root));
    mTimerCountdownToast.setView(layout);
    final TextView label = (TextView) layout.findViewById(R.id.countdown_text);
    Timer timer = new Timer("camera_timer");
    timer.scheduleAtFixedRate(new TimerTask() {
        private int mCountdown = countdown;

        @Override
        public void run() {
            getActivity().runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    if (mCountdown < 0) {
                        Log.d(TAG, "Take photo: " + mCountdown);
                        mTimerCountdownToast.cancel();
                        takePicture();
                    } else {
                        Log.d(TAG, "Execute timer: " + mCountdown);
                        label.setText(String.format("Photo in %d", mCountdown));
              ;
                    }
                }
            });
            mCountdown--;
            if (mCountdown < 0) {
                cancel();
            }
        }
    }, 1000, 1000);
}
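Because the timer fires once per second starting at the requested duration and stops after zero, a countdown of N in effect shows the labels “Photo in N” down to “Photo in 0”. Here is a plain-Java sketch of just that label sequence (`CountdownLabels` is an illustrative helper, not part of the sample):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper that reproduces the "Photo in %d" label sequence
// emitted by startVoiceTimer() for a given countdown duration.
public class CountdownLabels {

    public static List<String> labels(int countdown) {
        List<String> out = new ArrayList<>();
        for (int i = countdown; i >= 0; i--) {
            out.add(String.format("Photo in %d", i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(labels(3)); // [Photo in 3, Photo in 2, Photo in 1, Photo in 0]
    }
}
```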

We also need to create a helper method that checks whether a timer duration was specified in the fragment’s arguments.

private boolean isTimerSpecified() {
    Log.d(TAG, "isTimerSpecified: ");
    return getArguments() != null
            && getArguments().containsKey(EXTRA_TIMER_DURATION_SECONDS);
}

That’s all the code that you need to add. There are a few layouts that you need to add and then you’re done!

We need to make some updates to the UI. First, let’s create a layout for the MainActivity. This is a very simple layout with a TextView that shows an intro message.


<LinearLayout xmlns:android=""
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical">

    <LinearLayout
        style="@style/Widget.SampleMessageTile"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:orientation="vertical">

        <TextView
            style="@style/Widget.SampleMessage"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:layout_marginLeft="@dimen/horizontal_page_margin"
            android:layout_marginRight="@dimen/horizontal_page_margin"
            android:layout_marginTop="@dimen/vertical_page_margin"
            android:layout_marginBottom="@dimen/vertical_page_margin"
            android:text="@string/intro_message" />
    </LinearLayout>
</LinearLayout>

We also need to create a layout for the toast that shows our countdown time. Here’s what that looks like:


<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android=""
    android:id="@+id/toast_layout_root"
    android:background="@drawable/bg_toast"
    android:orientation="vertical"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:padding="20dp">

    <TextView
        android:id="@+id/countdown_text"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:textAppearance="?android:attr/textAppearanceLarge" />
</LinearLayout>

And that’s it, we’re ready to try taking a photo with only our voice!

Now you can test your app to make sure that the voice interaction works properly.

  1. Click the Gradle sync button.
  2. Click the Run button.
  3. Open the Google app.
  4. Say “OK Google”.
  5. When the voice prompt appears, say “Take a selfie”.
  6. The Voice Camera app should open and you should be prompted to take a photo.
  7. Answer “cheese”.

That’s it! You’ve built your first voice-powered camera app using the Voice Interaction API, and taken a pretty sweet photo.

Your app is now ready to support voice interaction. As you can see, there are many areas where a hands-free interface like this could make your app easier to use and more fun for users.

What we've covered

Next Steps

If you would like to find out more about the Voice Interaction API, please see the full developer documentation.

You can post questions and find answers on Stack Overflow under the google-search tag.