
How to use Intel Perceptual Computing SDK for human-robot interface


1. Introduction

This article offers a short overview of Rover, then focuses on our implementation of the human-robot interface using the Intel® Perceptual Computing SDK for gesture and face detection. For a brief introduction to Rover's features, see the Intel® Developer Zone video from the Game Developers Conference 2014 in San Francisco.

In relatively modern times, robots have either been relegated behind the closed doors of large industrial manufacturing plants or demonized in movies such as Terminator, where they are depicted as destroyers of the human race. Both stereotypes contribute to an unfounded fear of self-operating machines losing control and harming the living. But now, vacuum-cleaning and lawn-mowing robots, among others, are starting a new trend: service robots as dedicated helpers in shared environments with humans. The miniaturization and cost-effective manufacturing of range and localization sensors on the one hand, and the ever-increasing compute power of modern processors on the other, enable the creation of smart, sensing robots for domestic use cases.

In the future, robots will require intelligent interactions with their environment, including adapting to human emotions. State-of-the-art hardware and software, such as the Intel Perceptual Computing SDK paired with the Creative* Interactive Gesture Camera, are paving the way for smarter, connected devices, toys, and domestic helpers.

2. Cubotix Rover

When Intel announced the Perceptual Computing Challenge in 2013, our team, Devy and Martin Wojtczyk, brainstormed possible use cases utilizing the Intel Perceptual Computing SDK. The combination of a USB-powered camera with an integrated depth sensor and an SDK that enables gesture recognition, face detection, and voice interaction resulted in us building an autonomous, mobile, gesture-controlled and sensing robot called Rover. We were very excited to be selected for an award. Since then, we launched the website with updates on Rover and are in the process of creating an open hardware community.

The Cubotix Rover is our attempt to use advanced robotic algorithms to transform off-the-shelf hardware into a smart home robot, capable of learning and understanding unknown environments without prior programming. Instead of unintuitive control panels, the robot is instructed via gestures, natural language, and even facial expressions. Advanced robotic algorithms make Rover location aware and enable it to plan collision-free paths.

2.1. Gesture Recognition


Figure 1: Showing a thumbs-up gesture makes Rover happy and mobilizes the robot. Photo courtesy California Academy of Sciences.

Hand gestures are a common form of communication among humans. Think of the police officer in the middle of a loud intersection in Times Square gesturing the stop sign with his open palm facing approaching traffic. Rover is equipped to recognize, respond to, and act on hand gestures captured via the 3D camera. You can mobilize this robot by gesturing thumbs-up, and in response it will also say "Let's go!" This robot frowns when you gesture a thumbs-down. Gesturing a high-five prompts Rover to crack jokes, such as "I would totally high-five you, if I had hands." Gesturing a peace sign prompts Rover to say "Peace." These hand gestures and the resulting robot vocal responses are fully customizable and programmable.


Figure 2: Showing a thumbs-down gesture stops the robot and makes it sad. Photo courtesy California Academy of Sciences.

2.2. Facial Recognition

Facial expression is perhaps the most revealing and honest of all the means of communication. Recognizing these expressions and being able to respond appropriately or inappropriately can mean the difference between forming a bond or a division with another human being. With artificial intelligence, the gap separating machines and humans can begin to close if robots are able to empathize. By capturing facial expressions through the camera, Rover can detect smiles or frowns and respond appropriately. Rover knows when a human has come near it through its face detection algorithms and can greet them by saying "Hello, my name is Rover. What's your name?", to which most people have responded just as they would with another human being by saying "Hello, I'm ______". After initiating the conversation, Rover uses the Perceptual Computing SDK's face analysis features to distinguish three possible states of the person in front of the camera: happy, sad, or neutral, and can reply with an appropriate empathetic expression: "Why are you sad today?" or "Glad to see you happy today!" Moreover, the SDK's face recognition allows Rover to learn and distinguish between individuals for a personalized experience.

3. Hardware Architecture


Figure 3: Rover's mobile LEGO* platform. Centrally located with glowing green buttons is the LEGO Mindstorms* EV3 microcontroller, which is connected to the servos that move the base. Also note the support structures and the locking mechanism to mount a laptop.

Rover uses widely available and affordable off-the-shelf hardware that many people may already own and can transform into a smart home robot. It consists of a mobile LEGO platform that carries a depth camera and a laptop for perception, image processing, path planning, and human-robot interaction. The LEGO Mindstorms* EV3 set is a great tool for rapid prototyping of customized robot models. It includes a microcontroller, sensors, and three servos with encoders, which allow for easy calculation of travelled distances.
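As a rough illustration of why wheel encoders make travelled distance easy to compute, the following sketch converts an encoder count into a distance. The wheel radius and counts-per-revolution values are assumptions for the example, not measurements of the actual Rover.

#include <cmath>

// Hypothetical example values, not the actual Rover parameters.
const double kPi = 3.14159265358979;
const double kWheelRadiusM = 0.03;        // assumed wheel radius in meters
const int    kCountsPerRevolution = 360;  // assumed encoder resolution

// Travelled distance of one wheel, derived from its encoder count.
double travelledDistance(int encoderCounts)
{
    const double wheelCircumference = 2.0 * kPi * kWheelRadiusM;
    return wheelCircumference * static_cast<double>(encoderCounts) / kCountsPerRevolution;
}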


Figure 4: Rover's mobile platform with an attached Creative* Interactive Gesture Camera for gesture recognition, face detection, and 3D perception.

The Creative Interactive Gesture Camera attached to the EV3 contains a QVGA depth sensor and an HD RGB image sensor. The 0.5 ft to 3.5 ft operating range of the depth sensor allows for 3D perception of objects and obstacles at close range. It is powered solely by the USB port and doesn't require an additional power supply, which makes it fit for mobile use on a robot. Rover's laptop, an Ultrabook™ with an Intel® Core i7 processor and a touch screen, is mounted on top of the mobile LEGO platform and interfaces the camera and the LEGO microcontroller. The laptop is powerful enough to perform face detection and gesture and speech recognition and to evaluate the depth images in soft real time to steer the robot and avoid obstacles. All depth images and encoder data from the servos are filtered and combined into a map, which serves the robot for indoor localization and collision-free path planning.


Figure 5: Full Rover assembly with the mobile LEGO* platform base, the Creative* Interactive Gesture Camera in the front, and the laptop attached and locked in place.

4. Software Architecture


Figure 6: Rover's software architecture with most components for perception, a few planners, and a few application use cases. All of these building blocks run concurrently in multiple threads and communicate with each other via messages. The green-tinted components utilize the Intel® Perceptual Computing SDK. All other modules are custom-built.

Rover's control software is a multi-threaded application integrating a graphical user interface implemented in the cross-platform application framework Qt, a perception layer utilizing the Intel Perceptual Computing SDK, and custom-built planning, sensing, and hardware interface components. CMake*, a popular open-source build system, is used to find all necessary dependencies, configure the project, and create a Visual Studio* solution on Windows*. The application runs on an Ultrabook laptop running the Windows operating system and mounted directly on the mobile LEGO platform.

As shown in Figure 6, the application layer has three different use case components: the visible and audible Human-Robot Interface, an Exploration use case that lets Rover explore a new and unknown environment, and a smartphone remote control of the robot. The planning layer includes a collision-free path planner based on a learned map and a task planner that decides when the robot moves, explores, and interacts with the user. A larger number of components form the perception layer, which is common for service robots, as they have to sense their often unknown environments and respond safely to unexpected changes. Simultaneous Localization and Mapping (SLAM) and Obstacle Detection are custom-built and based on the depth images from the Perceptual Computing SDK, which also provides the functionality for gesture recognition, face detection, and speech recognition.

The following sections briefly cover the Human-Robot Interface and describe in more detail the implementation of gesture recognition and face detection for the robot.

4.1. User Interface

The human-robot interface of Rover is implemented as a Qt 5 application. Qt includes tools for window and widget creation and commonly used features, such as threads and futures for concurrent computations. The main window depicts a stylized face consisting of two buttons for the robot's eyes and mouth. Depending on the robot's mood, the mouth forms a smile or a frown. When nobody interacts with the robot, it goes to sleep. When it detects a person in front of it, it wakes up and responds to gestures, which trigger actions. The robot's main program launches several different threads for the detection of different Intel Perceptual Computing features. It uses Qt's central signal/slot mechanism for communication between objects and threads. Qt's implementation of future classes is utilized whenever the robot speaks, for asynchronous speech output.
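The article does not show the UI update code itself. A minimal sketch of how the eye and mouth buttons could be refreshed from the mood and sleep state (introduced later as the HAPPY/SAD and AWAKE/ASLEEP states of the main control class) might look as follows; the method name updateFace(), the button members, and the icon resources are assumptions:

// Hypothetical sketch: refresh the stylized face whenever mood or awake state changes.
// Assumed members: QPushButton *m_eyesButton, *m_mouthButton; plus the mood and awake states.
void MainWindowCtrl::updateFace()
{
    // closed eyes while asleep, open eyes while awake
    m_eyesButton->setIcon(QIcon(awake == AWAKE ? ":/icons/eyes_open.png"
                                               : ":/icons/eyes_closed.png"));
    // the mouth forms a smile or a frown depending on the mood
    m_mouthButton->setIcon(QIcon(mood == HAPPY ? ":/icons/smile.png"
                                               : ":/icons/frown.png"));
}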

4.2. Perception

The robot's perception relies on the camera featuring a color and a depth sensor. The camera is interfaced through the SDK, which allows applications to easily integrate gesture, face, and speech recognition, as well as speech synthesis.

4.2.1. Gesture Recognition

Simple, easy-to-learn hand gestures, recognized using the SDK, trigger most of Rover's actions. When a person shows a thumbs-up gesture, the robot will look happy, say "Let's go!" and can start autonomous driving or another configured action. When the robot is shown a thumbs-down gesture, it will put on a sad face, vocalize its sadness, and stop mobile actions in its default configuration. When showing the robot a high-five, it will crack a joke. Rover responds to all of the SDK's default gestures, but here we will just focus on these three: thumbs-up, thumbs-down, and high-five.

Rover's gesture recognition is implemented in a class GesturePipeline, which runs in a separate thread and is based on the class UtilPipeline from the convenience library pxcutils in the SDK and on QObject from the Qt framework. GesturePipeline implements the two virtual UtilPipeline functions OnGesture() and OnNewFrame() and emits a signal for each recognized gesture. The class also implements the two slots work() and cleanup(), which are required to move the pipeline into its own QThread. Therefore, the declaration of GesturePipeline is very simple and similar to the provided gesture sample:

#ifndef GESTUREPIPELINE_H
#define GESTUREPIPELINE_H

#include <QObject>
#include "util_pipeline.h"

class GesturePipeline : public QObject, public UtilPipeline
{
    Q_OBJECT

public:
    GesturePipeline();
    virtual ~GesturePipeline();

    virtual void PXCAPI OnGesture(PXCGesture::Gesture *data);
    virtual bool OnNewFrame();

protected:
    PXCGesture::Gesture m_gdata;

signals:
    void gesturePoseThumbUp();
    void gesturePoseThumbDown();
    void gesturePoseBig5();
    // ... further gesture signals

public slots:
    void work();
    void cleanup();
};

#endif /* GESTUREPIPELINE_H */

Listing: GesturePipeline.h

Apart from the empty default constructor and destructor, the implementation in GesturePipeline.cpp is limited to the four methods mentioned above. The method work() is executed when the pipeline thread is started as a QThread object. It enables gesture processing within UtilPipeline and runs its LoopFrames() method to process the camera's images and recognize gestures in subsequent image frames. The implementation of work() is as follows:

void GesturePipeline::work()
{
    EnableGesture();
    if (!LoopFrames()) wprintf_s(L"Failed to initialize or stream data");
};

Listing: GesturePipeline.cpp – work()

The method cleanup() is called when the GesturePipeline thread is terminated. In this case it does nothing and is implemented as an empty function.

Once started via LoopFrames(), UtilPipeline calls OnNewFrame() for every acquired image frame. To continue processing and recognizing gestures, this function returns true on every call.

bool GesturePipeline::OnNewFrame()
{
    return true;
};

Listing: GesturePipeline.cpp – OnNewFrame()

OnGesture() is called from UtilPipeline when a gesture is recognized. It queries the data parameter for activated gesture labels and emits an appropriate Qt signal.

void PXCAPI GesturePipeline::OnGesture(PXCGesture::Gesture *data)
{
    if (data->active)
    {
        switch (data->label)
        {
        case PXCGesture::Gesture::LABEL_POSE_THUMB_UP:
            emit gesturePoseThumbUp();
            break;

        case PXCGesture::Gesture::LABEL_POSE_THUMB_DOWN:
            emit gesturePoseThumbDown();
            break;

        case PXCGesture::Gesture::LABEL_POSE_BIG5:
            emit gesturePoseBig5();
            break;
        // ... further gestures
        }
    }
};

Listing: GesturePipeline.cpp – OnGesture()

The emitted Qt signals would have little effect if they were not connected to appropriate slots of the application's main control thread MainWindowCtrl. Therefore, it declares slots for each signal and implements the robot's actions.

class MainWindowCtrl : public QObject
{
    Q_OBJECT

public slots:
    void gesturePoseThumbUp();
    void gesturePoseThumbDown();
    void gesturePoseBig5();
    // ... further gesture slots

Listing: MainWindowCtrl.h – declaration of gesture slots.

The implementation of the actions triggered by the abovementioned gestures is fairly simple. The robot's state variable is switched to RUNNING or STOPPED, and the robot's mood is switched between HAPPY and SAD. Voice feedback is assigned accordingly and spoken asynchronously via SpeakAsync, a method utilizing the QFuture class of the Qt framework for asynchronous computation.

void MainWindowCtrl::gesturePoseThumbUp()
{
    std::wstring sentence(L"Let's go!");
    SpeakAsync(sentence);

    mood = HAPPY;
    state = RUNNING;
    stateChange();
};

void MainWindowCtrl::gesturePoseThumbDown()
{
    std::wstring sentence(L"Aww");
    SpeakAsync(sentence);

    mood = SAD;
    state = STOPPED;
    stateChange();
};

void MainWindowCtrl::gesturePoseBig5()
{
    std::wstring sentence(L"I would totally high five you, if I had hands.");
    SpeakAsync(sentence);

    mood = HAPPY;
    state = STOPPED;
    stateChange();
};

Listing: MainWindowCtrl.cpp – gesture slot implementation.
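SpeakAsync itself is not listed in the article; it is only described as using a QFuture for asynchronous computation. A minimal sketch under that assumption could hand the sentence to Qt's global thread pool via QtConcurrent; the member m_speechFuture and the blocking helper SpeakBlocking(), which would wrap the actual speech synthesis call, are assumptions:

#include <QtConcurrent/QtConcurrent>
#include <string>

// Hypothetical sketch of asynchronous speech output.
// Assumed members of MainWindowCtrl:
//   QFuture<void> m_speechFuture;
//   void SpeakBlocking(const std::wstring &sentence);  // blocking text-to-speech call
void MainWindowCtrl::SpeakAsync(const std::wstring &sentence)
{
    // Run the blocking speech call on a worker thread from Qt's global pool,
    // so the gesture slot that triggered it returns immediately.
    m_speechFuture = QtConcurrent::run([this, sentence]() {
        SpeakBlocking(sentence);
    });
}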

The only missing piece between the signals of GesturePipeline and the slots of MainWindowCtrl is the setup procedure implemented in a QApplication object, which creates the GesturePipeline thread and the MainWindowCtrl object and connects the signals to the slots. The following listing shows how to create a QThread object, move the GesturePipeline to that thread, connect the thread's start/stop signals to the pipeline's work()/cleanup() methods, and connect the gesture signals to the appropriate slots of the main thread.

// create the gesture pipeline worker thread
gesturePipeline = new GesturePipeline;
gesturePipelineThread = new QThread(this);
// connect the signals from the thread to the worker
connect(gesturePipelineThread, SIGNAL(started()),
        gesturePipeline, SLOT(work()));
connect(gesturePipelineThread, SIGNAL(finished()),
        gesturePipeline, SLOT(cleanup()));
gesturePipeline->moveToThread(gesturePipelineThread);
// Start event loop and emit Thread->started()
gesturePipelineThread->start();
// connect gestures from pipeline to mainWindowCtrl
connect(gesturePipeline, SIGNAL(gesturePoseThumbUp()),
        mainWindowCtrl, SLOT(gesturePoseThumbUp()));
connect(gesturePipeline, SIGNAL(gesturePoseThumbDown()),
        mainWindowCtrl, SLOT(gesturePoseThumbDown()));
connect(gesturePipeline, SIGNAL(gesturePoseBig5()),
        mainWindowCtrl, SLOT(gesturePoseBig5()));
// ... further gestures

Listing: Application.cpp – gesture setup

4.2.2. Face Detection

When Rover stands still and nobody interacts with it, it closes its eyes and goes to sleep. However, when a person shows up in front of the robot, Rover wakes up and greets them. This functionality is realized using the SDK's face detector.

Face detection is implemented in a class FacePipeline that is structured just like GesturePipeline and is based on the Face Detection sample in the SDK's documentation. It runs in a separate thread and is derived from the classes UtilPipeline and QObject. FacePipeline implements the virtual UtilPipeline function OnNewFrame() and emits one signal when at least one face is detected in the frame and another signal if no face is detected in the frame. It also implements the two slots work() and cleanup(), which are required to move the pipeline into its own QThread. Following is the declaration of FacePipeline:

#ifndef FACEPIPELINE_H
#define FACEPIPELINE_H

#include <QObject>
#include "util_pipeline.h"

class FacePipeline : public QObject, public UtilPipeline
{
    Q_OBJECT

public:
    FacePipeline();
    virtual ~FacePipeline();

    virtual bool OnNewFrame();

signals:
    void faceDetected();
    void noFaceDetected();

public slots:
    void work();
    void cleanup();
};

#endif /* FACEPIPELINE_H */

Listing: FacePipeline.h

The constructor, destructor, and cleanup() methods are empty. The method work() calls LoopFrames() and starts UtilPipeline.

void FacePipeline::work()
{
    if (!LoopFrames()) wprintf_s(L"Failed to initialize or stream data");
};

Listing: FacePipeline.cpp – work()

The method OnNewFrame() is called by UtilPipeline for every acquired frame. It queries the face analyzer module of the Intel Perceptual Computing SDK, counts the number of detected faces, and emits the appropriate signals.

bool FacePipeline::OnNewFrame()
{
    // query the face detector
    PXCFaceAnalysis* faceAnalyzer = QueryFace();
    // loop all faces
    int faces = 0;
    for (int fidx = 0; ; fidx++)
    {
        pxcUID fid = 0;
        pxcU64 timeStamp = 0;
        pxcStatus sts = faceAnalyzer->QueryFace(fidx, &fid, &timeStamp);
        if (sts < PXC_STATUS_NO_ERROR) // no more faces
            break;
        else
            faces++;
    };
    if (faces > 0)
        emit faceDetected();
    else
        emit noFaceDetected();

    return true;
};

Listing: FacePipeline.cpp – OnNewFrame()

Respective slots for the face detector are declared in the application's main control thread:

class MainWindowCtrl : public QObject
{
    Q_OBJECT

public slots:
    void faceDetected();
    void noFaceDetected();

Listing: MainWindowCtrl.h – declaration of face detector slots.

The implementation of the face detector slots updates the robot's sleep/awake state, its mood, and its program state. When no face is detected, a timer is launched that puts the robot to sleep unless the robot is carrying out a task. This keeps the methods simple to implement.

void MainWindowCtrl::faceDetected()
{
    // only transition to the next step, when the program is at START
    if (state == START)
    {
        awake = AWAKE;
        mood = HAPPY;
        state = FACE_DETECTED;
        stateChange();
    };
};

void MainWindowCtrl::noFaceDetected()
{
    if ((state != START) && (state != RUNNING))
    {
        startOrContinueAwakeTimeout();
        if (awakeTimeout)
        {
            awake = ASLEEP;
            mood = HAPPY;
            state = START;
            stateChange();
        };
    };
};

Listing: MainWindowCtrl.cpp – face detector slot implementation.
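The helper startOrContinueAwakeTimeout() and the flag awakeTimeout are not listed in the article. A hedged sketch of how they could be realized with Qt's QElapsedTimer is shown below; the member m_awakeTimer and the 10-second grace period are assumptions:

#include <QElapsedTimer>

// Hypothetical sketch: keep the robot awake for a grace period after the last detected face.
// Assumed members of MainWindowCtrl: QElapsedTimer m_awakeTimer; bool awakeTimeout;
void MainWindowCtrl::startOrContinueAwakeTimeout()
{
    const qint64 kAwakeGraceMs = 10000;   // assumed 10 s without a face before falling asleep

    if (!m_awakeTimer.isValid())
        m_awakeTimer.start();             // first frame without a face: start counting

    awakeTimeout = m_awakeTimer.hasExpired(kAwakeGraceMs);
}

In this reading, the face-detected path would presumably reset the timer (for example via m_awakeTimer.invalidate()) so the countdown starts over the next time no face is seen; that detail is not shown in the article.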

Similar to the gesture recognizer, the main application creates the FacePipeline object, moves it into a Qt thread to run concurrently, and connects the face detector signals to the appropriate slots of the main control thread.

// create the face pipeline worker thread
facePipeline = new FacePipeline;
facePipelineThread = new QThread(this);
// connect the signals from the thread to the worker
connect(facePipelineThread, SIGNAL(started()), facePipeline,
        SLOT(work()));
connect(facePipelineThread, SIGNAL(finished()), facePipeline,
        SLOT(cleanup()));
facePipeline->moveToThread(facePipelineThread);
// Start event loop and emit Thread->started()
facePipelineThread->start();
// connect events from face pipeline to mainWindowCtrl
connect(facePipeline, SIGNAL(faceDetected()), mainWindowCtrl,
        SLOT(faceDetected()));
connect(facePipeline, SIGNAL(noFaceDetected()),
        mainWindowCtrl, SLOT(noFaceDetected()));

Listing: Application.cpp – face detector setup.

5. Results

Based on our observations at recent exhibitions in the U.S. and Europe, including Mobile World Congress, Maker Faire, CeBIT, the California Academy of Sciences, Robot Block Party, and the Game Developers Conference, people are ready and excited to try interacting with a robot. Google's official plunge into the world of artificial intelligence and robotics has inspired the general public to look deeper and pay attention to the future of robotics.


Figure 7: Rover at Mobile World Congress surrounded by a group of people.

Fear and apprehension have been replaced by curiosity and enthusiasm. Controlling a machine has predominantly been done through dedicated hardware, unintuitive control panels, and workstations. That boundary is dissolving now, as humans can communicate with machines through natural, instinctual interactions thanks to advancing developments that allow localization and mapping as well as gesture and facial recognition. Visitors are astounded when they see they can control an autonomous mobile robot through hand gestures and facial expressions using the Ultrabook, the Intel Perceptual Computing SDK, and the Creative Interactive Gesture Camera. We have encountered these responses across a very wide spectrum of people: young and old, men and women, domestic and international.

6. Outlook

Unlike many consumer robots on the market today, Rover is capable of mapping out its environment without any external hardware like a remote control. It can independently localize specific rooms in a home, such as the kitchen, bathroom, and bedroom. If you're at the office and need to check on a sick child at home, you can simply command Rover to go to a specific room in your house without manually navigating it. Resembling a human, this robot has short- and long-term memory. Its long-term memory is stored in the form of a map that allows it to move independently. It can recognize and subsequently maneuver around furniture, corners, and other architectural boundaries. Its short-term memory is capable of recognizing an object that unpredictably darts in front of the robot, prompting it to stop until the 3D camera no longer detects any obstacles in its path. We are looking forward to sharing further details about robot localization, mapping, and path-planning in future articles.
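As a rough illustration only, and not the authors' implementation, the stop-on-obstacle behavior described above can be as simple as counting depth pixels closer than a safety threshold in each frame; the QVGA resolution matches the camera's depth sensor, while the threshold and pixel count below are made-up values:

#include <cstdint>

// Hypothetical sketch: decide whether to stop based on a single depth frame.
// depthMillimeters points to a QVGA (320x240) depth image in millimeters.
bool obstacleAhead(const uint16_t *depthMillimeters)
{
    const int kWidth = 320, kHeight = 240;
    const uint16_t kSafetyDistanceMm = 400;  // assumed safety distance
    const int kMinBlockedPixels = 500;       // assumed noise-rejection threshold

    int blocked = 0;
    for (int i = 0; i < kWidth * kHeight; ++i)
        if (depthMillimeters[i] > 0 && depthMillimeters[i] < kSafetyDistanceMm)
            ++blocked;

    return blocked >= kMinBlockedPixels;     // stop until the path is clear again
}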

We see that the potential for widespread use and adoption of Perceptual Computing technology is vast. Professions and industries that embody the "human touch", from healthcare to hospitality, could reap the most benefits from Perceptual Computing technology. Essentially, as human beings we all seek to understand and be understood, and the best technologies are those that make life easier, more efficient, or enhanced in an impactful way. Simultaneous localization and mapping, gesture recognition, and facial recognition all working together blur the lines between humanity and machines, bringing us closer to the robots that may inhabit our realities and imaginations.

7. About the Authors



Figure 8: Devy and Martin Wojtczyk with Rover.

Devy Tan-Wojtczyk is co-founder of Cubotix. She brings over 10 years of business consulting experience with clients from UCLA, GE, Vodafone, Blue Cross of California, Roche, Cooking.com, and the New York City Department for the Aging. She holds a BA in International Development Studies from UCLA and an MSW with a focus on Aging from Columbia University. For fun one weekend she led a newly formed cross-functional team consisting of an idea generator, two developers, and a designer in business and marketing efforts at the 48-hour HP Intel Social Good Hackathon, which resulted in a cash award in recognition of technology, innovation, and social impact. Devy was also competitively selected to attend Y Combinator's first ever Female Founders Conference.

Martin Wojtczyk is an award-winning software engineer and technology enthusiast. With his wife Devy he founded Cubotix, a DIY community creating smart and affordable service robots for everybody. He graduated in computer science and earned his PhD (Dr. rer. nat.) in robotics from the Technical University of Munich (TUM) in Germany after years of research in the R&D department of Bayer HealthCare in Berkeley. Speaking engagements include Google DevFest West, Mobile World Congress, Maker Faire, and many others in the international software engineering and robotics community. In the past 10 years he developed the full software stack for several commercial autonomous mobile service robots. He has won several awards in global programming competitions, was recently featured on Makezine.com, and was recognized as an Intel Software Innovator.

For more such Windows resources and tools from Intel, please visit the Intel® Developer Zone.

Source: https://software.intel.com/en-us/articles/rover-a-lego-self-driving-car
