Artificial Intelligence in Plain English

New AI, ML and Data Science articles every day. Follow to join our 3.5M+ monthly readers.


ReAct Agent with DeepSeek on Vertex AI


DeepSeek: A New Contender in the LLM Arena

Look, regardless of what you think of DeepSeek, it’s in the game now and it’s a practical option worth evaluating. To make an informed decision, evaluate DeepSeek alongside other LLMs in the context of your specific requirements, taking advantage of its availability on Vertex AI.

In this post (and the corresponding GitHub repo), I will walk you through deploying DeepSeek models on Vertex AI endpoints and building a ReAct agent with LangChain so you can evaluate its performance.

Building a Custom LLM Wrapper

To bridge the gap between DeepSeek on Vertex AI and the LangChain framework, we begin by creating a custom LLM wrapper:

from typing import Any, Dict, List, Optional

from langchain_core.language_models.llms import LLM
from pydantic import Field


class CustomLLM(LLM):
    model_name: str = Field(default="vertex-ai")
    generation_config: Dict[str, Any] = Field(default_factory=dict)

    @property
    def _llm_type(self) -> str:
        return "custom-vertex-ai"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prediction_request = {
            "instances": [
                {
                    "@requestFormat": "textGeneration",
                    "prompt": prompt,
                    "maxOutputTokens": self.generation_config.get("max_output_tokens", 2048),
                    "temperature": self.generation_config.get("temperature", 0.8),
                    "candidateCount": self.generation_config.get("candidate_count", 1),
                }
            ]
        }
        # `endpoint` is the deployed Vertex AI endpoint (an aiplatform.Endpoint)
        # created during model deployment; see the repo for the setup.
        response = endpoint.predict(instances=prediction_request["instances"])
        return response.predictions[0] if response.predictions else ""

This CustomLLM class encapsulates the interaction with the Vertex AI endpoint, allowing us to configure parameters like max_output_tokens, temperature, and candidate_count.
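Because the request payload is plain data, you can sanity-check it without touching the endpoint. Here is a minimal sketch (the helper name build_prediction_request is mine, not from the repo) that mirrors the payload _call builds, showing the defaults kicking in when generation_config omits a key:

```python
from typing import Any, Dict


def build_prediction_request(prompt: str, generation_config: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the payload CustomLLM._call sends to the Vertex AI endpoint."""
    return {
        "instances": [
            {
                "@requestFormat": "textGeneration",
                "prompt": prompt,
                # Each parameter falls back to the same default used in _call.
                "maxOutputTokens": generation_config.get("max_output_tokens", 2048),
                "temperature": generation_config.get("temperature", 0.8),
                "candidateCount": generation_config.get("candidate_count", 1),
            }
        ]
    }


# With an empty config, all the defaults apply.
request = build_prediction_request("Is Paris the capital of France?", {})
```

This is handy for unit-testing the wrapper's request shape before paying for real predictions.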

Implementing the ReAct Agent Executor

Next, we create a ReActAgentExecutor to orchestrate the agent's behavior:


import time
from typing import Dict, Union

from langchain import hub
from langchain.agents import AgentExecutor, Tool, create_react_agent
from langchain_community.utilities import GoogleSearchAPIWrapper


class ReActAgentExecutor:
    """
    A class to run the ReAct agent with specified configurations and tools.
    """

    def __init__(
        self,
        model: str,
        generation_config: Dict,
        max_iterations: int,
        max_execution_time: int,
        google_api_key: str = GOOGLE_API_KEY,
        cse_id: str = CSE_ID,
    ):
        self.model = model
        self.generation_config = generation_config
        self.max_iterations = max_iterations
        self.max_execution_time = max_execution_time
        self.google_api_key = google_api_key
        self.cse_id = cse_id
        self.llm = None
        self.tools = None
        self.agent = None
        self.agent_executor = None
        self.token_callback = None

        self._setup_llm()
        self._setup_tools()
        self._setup_agent()

    def _setup_llm(self):
        """Initializes the custom LLM."""
        self.llm = CustomLLM(
            model_name=self.model, generation_config=self.generation_config
        )

    def _setup_tools(self):
        """Sets up the tools for the agent."""
        search = GoogleSearchAPIWrapper(
            google_api_key=self.google_api_key, google_cse_id=self.cse_id
        )
        self.tools = [
            Tool(
                name="Google Search",
                func=search.run,
                description="Useful for finding information on current events, comparisons, or diverse perspectives.",
            ),
        ]

    def _setup_agent(self):
        """Sets up the ReAct agent and executor."""
        prompt = hub.pull("hwchase17/react")
        system_instruction = "Once you are done finding the answer, only return Yes or No"
        prompt.template = system_instruction + "\n" + prompt.template

        self.agent = create_react_agent(self.llm, self.tools, prompt)
        self.token_callback = TokenCountingCallbackHandler(self.model)
        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=self.tools,
            verbose=False,
            handle_parsing_errors=True,
            max_iterations=self.max_iterations,
            max_execution_time=self.max_execution_time,
            callbacks=[self.token_callback],
        )

    def run(self, input_data: Union[Dict, str]) -> Dict:
        """
        Runs the agent with the given input data.
        """
        if isinstance(input_data, str):
            input_data = {"input": input_data}

        start_time = time.time()
        try:
            result = self.agent_executor.invoke(input_data)
            result["total_token"] = self.token_callback.total_token
            self.token_callback.reset()
        except Exception as e:
            print(f"An error occurred: {e}")
            result = {"error": str(e)}
        end_time = time.time()
        result["wall_time"] = end_time - start_time

        return result
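The bookkeeping inside run — normalizing a bare string into {"input": ...} and attaching wall-clock time — is worth seeing in isolation. Here is a sketch under my own names (timed_invoke is hypothetical, not part of the repo), with a stub standing in for the agent executor so it runs without Vertex AI credentials:

```python
import time
from typing import Callable, Dict, Union


def timed_invoke(executor_invoke: Callable[[Dict], Dict], input_data: Union[Dict, str]) -> Dict:
    """Normalize the input and record wall time, mirroring ReActAgentExecutor.run."""
    if isinstance(input_data, str):
        # A bare question becomes the dict shape AgentExecutor expects.
        input_data = {"input": input_data}

    start = time.time()
    try:
        result = executor_invoke(input_data)
    except Exception as e:
        # Failures still return a dict so downstream evaluation code
        # can treat every run uniformly.
        result = {"error": str(e)}
    result["wall_time"] = time.time() - start
    return result


# Stub in place of agent_executor.invoke, just for illustration.
stub = lambda data: {"input": data["input"], "output": "Yes"}
result = timed_invoke(stub, "Is the sky blue?")
```

Returning a dict even on failure is what lets the evaluation loop aggregate wall_time and errors across many questions without special-casing.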

Token Counting Callback Handler

from langchain_core.callbacks import BaseCallbackHandler


class TokenCountingCallbackHandler(BaseCallbackHandler):
    """Callback handler for counting tokens used by the language model."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.total_token = 0
        # The accumulation itself happens in the LLM callbacks
        # (e.g. on_llm_end); see the repo for the full implementation.

    def reset(self):
        """Reset the counter for the next chain run."""
        self.total_token = 0

This callback handler provides insight into token consumption during agent execution, which is valuable for cost optimization.
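The accumulate-then-reset cycle is the part run depends on. A stripped-down stand-in (SimpleTokenCounter is my name, not from the repo, and it drops the LangChain base class so it runs anywhere) makes the contract explicit:

```python
class SimpleTokenCounter:
    """Minimal stand-in for TokenCountingCallbackHandler, showing the
    accumulate-then-reset cycle that ReActAgentExecutor.run relies on."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.total_token = 0

    def add(self, n: int) -> None:
        # In the real handler the increment happens inside an LLM callback
        # (e.g. on_llm_end), fed by the model's token usage metadata.
        self.total_token += n

    def reset(self) -> None:
        """Clear the counter between chain runs, as run() does."""
        self.total_token = 0


counter = SimpleTokenCounter("deepseek")
counter.add(120)  # tokens from one LLM call
counter.add(80)   # tokens from a second call in the same chain
total_before_reset = counter.total_token
counter.reset()
```

Resetting after each invoke is what keeps per-question token counts from bleeding into the next evaluation run.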

Putting It All Together

By combining DeepSeek models, Vertex AI, and the ReAct agent framework, we can build powerful applications capable of tackling complex tasks. This walkthrough provides a foundation for further exploration and experimentation.

For the full implementation, including how to deploy DeepSeek models on Vertex AI, check out this GitHub repo.



Written by Amir imani

GenAI @google | Teaching ML @Pratt | Minted @Columbia_dsi |
