ReAct Agent with DeepSeek on Vertex AI
DeepSeek: A New Contender in the LLM Arena
Look, regardless of what you think of DeepSeek, it's in the game now and it's a practical option worth evaluating. To make an informed decision, compare DeepSeek with other LLMs in the context of your specific requirements, taking advantage of its availability on Vertex AI.
In this post (and the corresponding GitHub repo), I will walk you through deploying DeepSeek models on Vertex AI endpoints and building a ReAct agent with LangChain so you can evaluate its performance.
Building a Custom LLM Wrapper
To bridge the gap between DeepSeek on Vertex AI and the LangChain framework, we begin by creating a custom LLM wrapper:
```python
from typing import Any, Dict, List, Optional

from langchain_core.language_models.llms import LLM
from pydantic import Field


class CustomLLM(LLM):
    model_name: str = Field(default="vertex-ai")
    generation_config: Dict[str, Any] = Field(default_factory=dict)

    @property
    def _llm_type(self) -> str:
        return "custom-vertex-ai"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prediction_request = {
            "instances": [
                {
                    "@requestFormat": "textGeneration",
                    "prompt": prompt,
                    "maxOutputTokens": self.generation_config.get("max_output_tokens", 2048),
                    "temperature": self.generation_config.get("temperature", 0.8),
                    "candidateCount": self.generation_config.get("candidate_count", 1),
                }
            ]
        }
        # `endpoint` is the aiplatform.Endpoint created when deploying the
        # DeepSeek model (see the repo for the deployment steps).
        response = endpoint.predict(instances=prediction_request["instances"])
        return response.predictions[0] if response.predictions else ""
```
This CustomLLM class encapsulates the interaction with the Vertex AI endpoint, allowing us to configure parameters like max_output_tokens, temperature, and candidate_count.
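To make the request shape concrete, here is a minimal, self-contained sketch of how generation_config values flow into the payload. The helper name build_prediction_request is hypothetical; the wrapper above inlines this logic in _call:

```python
from typing import Any, Dict


def build_prediction_request(prompt: str, generation_config: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble the predict payload, falling back to defaults for unset keys."""
    return {
        "instances": [
            {
                "@requestFormat": "textGeneration",
                "prompt": prompt,
                "maxOutputTokens": generation_config.get("max_output_tokens", 2048),
                "temperature": generation_config.get("temperature", 0.8),
                "candidateCount": generation_config.get("candidate_count", 1),
            }
        ]
    }


request = build_prediction_request("Is the sky blue?", {"temperature": 0.2})
print(request["instances"][0]["temperature"])      # → 0.2 (overridden)
print(request["instances"][0]["maxOutputTokens"])  # → 2048 (default)
```

Any key missing from generation_config silently falls back to its default, so a partial config is always safe to pass.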
Implementing the ReAct Agent Executor
Next, we create a ReActAgentExecutor to orchestrate the agent's behavior:
```python
import time
from typing import Dict, Union

from langchain import hub
from langchain.agents import AgentExecutor, Tool, create_react_agent
from langchain_google_community import GoogleSearchAPIWrapper


class ReActAgentExecutor:
    """
    A class to run the ReAct agent with specified configurations and tools.
    """

    def __init__(
        self,
        model: str,
        generation_config: Dict,
        max_iterations: int,
        max_execution_time: int,
        google_api_key: str = GOOGLE_API_KEY,  # module-level credentials, e.g. loaded from env vars
        cse_id: str = CSE_ID,
    ):
        self.model = model
        self.generation_config = generation_config
        self.max_iterations = max_iterations
        self.max_execution_time = max_execution_time
        self.google_api_key = google_api_key
        self.cse_id = cse_id
        self.llm = None
        self.tools = None
        self.agent = None
        self.agent_executor = None
        self.token_callback = None
        self._setup_llm()
        self._setup_tools()
        self._setup_agent()

    def _setup_llm(self):
        """Initializes the custom LLM."""
        self.llm = CustomLLM(model_name=self.model, generation_config=self.generation_config)

    def _setup_tools(self):
        """Sets up the tools for the agent."""
        search = GoogleSearchAPIWrapper(
            google_api_key=self.google_api_key, google_cse_id=self.cse_id
        )
        self.tools = [
            Tool(
                name="Google Search",
                func=search.run,
                description="Useful for finding information on current events, comparisons, or diverse perspectives.",
            ),
        ]

    def _setup_agent(self):
        """Sets up the ReAct agent and executor."""
        prompt = hub.pull("hwchase17/react")
        system_instruction = "Once you are done finding the answer, only return Yes or No"
        prompt.template = system_instruction + "\n" + prompt.template
        self.agent = create_react_agent(self.llm, self.tools, prompt)
        self.token_callback = TokenCountingCallbackHandler(self.model)
        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=self.tools,
            verbose=False,
            handle_parsing_errors=True,
            max_iterations=self.max_iterations,
            max_execution_time=self.max_execution_time,
            callbacks=[self.token_callback],
        )

    def run(self, input_data: Union[Dict, str]) -> Dict:
        """
        Runs the agent with the given input data.
        """
        if isinstance(input_data, str):
            input_data = {"input": input_data}
        start_time = time.time()
        try:
            result = self.agent_executor.invoke(input_data)
            result["total_token"] = self.token_callback.total_token
            self.token_callback.reset()
        except Exception as e:
            print(f"An error occurred: {e}")
            result = {"error": str(e)}
        end_time = time.time()
        result["wall_time"] = end_time - start_time
        return result
```
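One detail in _setup_agent worth calling out: the system instruction is prepended to the hub-pulled ReAct prompt by plain string concatenation on its template attribute. Here is a self-contained sketch of the mechanics, using a hypothetical stand-in class and an abbreviated template rather than the real hub.pull("hwchase17/react") output:

```python
class SimplePrompt:
    """Hypothetical stand-in for a LangChain PromptTemplate; only the
    .template attribute matters for this prepend trick."""

    def __init__(self, template: str):
        self.template = template


# Abbreviated stand-in for the hub-pulled ReAct template.
prompt = SimplePrompt(
    "Answer the following question.\nQuestion: {input}\nThought: {agent_scratchpad}"
)

system_instruction = "Once you are done finding the answer, only return Yes or No"
prompt.template = system_instruction + "\n" + prompt.template

print(prompt.template.splitlines()[0])
# → Once you are done finding the answer, only return Yes or No
```

Because the prepended text introduces no new {variables}, the template's input variables are unchanged; the same mutation works on a real PromptTemplate as long as that holds.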
Token Counting Callback Handler
```python
from langchain_core.callbacks import BaseCallbackHandler


class TokenCountingCallbackHandler(BaseCallbackHandler):
    """Callback handler for counting tokens used by the language model."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.total_token = 0

    def on_llm_end(self, response, **kwargs) -> None:
        # Accumulate usage when the LLM reports it (token_usage follows the
        # common LangChain llm_output convention).
        usage = (response.llm_output or {}).get("token_usage", {})
        self.total_token += usage.get("total_tokens", 0)

    def reset(self):
        """Reset the counters for the next chain run."""
        self.total_token = 0
```
This callback handler provides insight into token consumption during agent execution, which is valuable for cost optimization.
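Note that the custom wrapper returns bare strings with no usage metadata, so the endpoint may never surface a token_usage entry to the callback. One fallback (an assumption on my part, not part of the original code) is to approximate counts from the generated text, ideally with a real tokenizer such as tiktoken, or crudely by whitespace as sketched here:

```python
class ApproxTokenCounter:
    """Hypothetical fallback counter for endpoints that return bare strings.
    A whitespace split is a rough stand-in for a real tokenizer."""

    def __init__(self):
        self.total_token = 0

    def add_text(self, text: str) -> None:
        # Crude approximation: one "token" per whitespace-separated word.
        self.total_token += len(text.split())

    def reset(self) -> None:
        self.total_token = 0


counter = ApproxTokenCounter()
counter.add_text("Thought: I should search for this.")
counter.add_text("Final Answer: Yes")
print(counter.total_token)  # → 9
```

Whitespace counts typically underestimate real subword-tokenizer counts, so treat this as a lower bound when estimating cost.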
Putting It All Together
By combining DeepSeek models, Vertex AI, and the ReAct agent framework, we can build powerful applications capable of tackling complex tasks. This walkthrough provides a foundation for further exploration and experimentation.
For the full implementation, including how to deploy DeepSeek models on Vertex AI, check the accompanying GitHub repo.