first draft module overview and module / class documentation
2pk03 committed Sep 22, 2023
1 parent a84e91e commit 65bc0b3
Showing 21 changed files with 790 additions and 0 deletions.
@@ -0,0 +1,43 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Class Definition
The `ProcessFeeder` class feeds input data to a Python process for execution within the Wayang framework. It serializes the UDF and the input data and writes them to the socket connected to the Python process.

### Attributes:
1. **socket (Socket)**: Socket for communication with the Python process.
2. **udf (PythonUDF<Input, Output>)**: The Python user-defined function to be executed by the process.
3. **serializedUDF (PythonCode)**: Serialized representation of the Python UDF.
4. **input (Iterable<Input>)**: Iterable over the input data for the Python UDF.

### Constructor:
- `ProcessFeeder(Socket socket, PythonUDF<Input, Output> udf, PythonCode serializedUDF, Iterable<Input> input)`:
  - Initializes the feeder with socket, UDF, serialized UDF, and input data.
  - Throws a `WayangException` if input is null.

### Methods:
1. **send()**:
   - Sends the serialized UDF and the input data to the Python process over the socket.
   - Writes the `END_OF_DATA_SECTION` value to indicate the end of the data.

2. **writeUDF(PythonCode serializedUDF, DataOutputStream dataOut)**:
   - Writes the serialized UDF to the provided data output stream.

3. **writeIteratorToStream(Iterator<Input> iter, DataOutputStream dataOut)**:
   - Writes each item from the iterator to the data output stream.
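
The order of writes described for `send()` can be sketched as follows. This is a minimal illustration, not the actual `ProcessFeeder` code: the class name `FeederSketch`, the length-prefixed framing, the UTF-8 string encoding, and the marker value `-1` are all assumptions, and a byte buffer stands in for the socket's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

class FeederSketch {
    // Hypothetical marker value; the real END_OF_DATA_SECTION constant is not given here.
    static final int END_OF_DATA_SECTION = -1;

    // Writes the serialized UDF as a length-prefixed byte block (assumed framing).
    static void writeUDF(byte[] serializedUDF, DataOutputStream dataOut) throws IOException {
        dataOut.writeInt(serializedUDF.length);
        dataOut.write(serializedUDF);
    }

    // Writes each item as a length-prefixed UTF-8 string (assumed encoding).
    static void writeIteratorToStream(Iterator<String> iter, DataOutputStream dataOut) throws IOException {
        while (iter.hasNext()) {
            byte[] bytes = iter.next().getBytes(StandardCharsets.UTF_8);
            dataOut.writeInt(bytes.length);
            dataOut.write(bytes);
        }
    }

    // Mirrors the documented order of send(): UDF first, then the data, then the end marker.
    static byte[] send(byte[] serializedUDF, Iterable<String> input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(); // stands in for socket.getOutputStream()
        DataOutputStream dataOut = new DataOutputStream(buffer);
        writeUDF(serializedUDF, dataOut);
        writeIteratorToStream(input.iterator(), dataOut);
        dataOut.writeInt(END_OF_DATA_SECTION);
        dataOut.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = send(new byte[] {1, 2, 3}, List.of("a", "bc"));
        System.out.println(wire.length + " bytes written");
    }
}
```

With a length prefix on every item, the reader can tell item boundaries apart from the end marker, as long as no real length equals the marker value.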

---

@@ -0,0 +1,35 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Class Definition
The `ProcessReceiver` class is responsible for receiving data from a Python process within the Wayang framework. It manages socket communication for data reception and provides methods to retrieve and print the received data.

### Attributes:
1. **iterator (ReaderIterator<Output>)**: Iterator over the data being read from the Python process.

### Constructor:
- `ProcessReceiver(Socket socket)`:
  - Sets up a data input stream attached to the socket's input stream and initializes the `iterator` attribute.

### Methods:
1. **getIterable()**:
   - Returns an iterable that uses the `iterator` attribute.

2. **print()**:
   - Prints the received data to the console.
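
One way to return an iterable backed by a single stored iterator is shown below. It is only a sketch: the class `ReceiverSketch` is hypothetical and takes an iterator directly, whereas the documented constructor takes a `Socket`.

```java
import java.util.Iterator;
import java.util.List;

class ReceiverSketch<T> {
    private final Iterator<T> iterator;

    // Assumption: the iterator is passed in; the documented class builds it from a socket.
    ReceiverSketch(Iterator<T> iterator) {
        this.iterator = iterator;
    }

    // Iterable has a single abstract method, so a lambda returning the iterator suffices.
    Iterable<T> getIterable() {
        return () -> iterator;
    }

    // Drains whatever is left in the iterator to standard output.
    void print() {
        iterator.forEachRemaining(System.out::println);
    }

    public static void main(String[] args) {
        ReceiverSketch<String> receiver = new ReceiverSketch<>(List.of("x", "y").iterator());
        for (String s : receiver.getIterable()) {
            System.out.println(s);
        }
    }
}
```

An iterable built this way is single-pass: a second traversal, or a call to `print()` after iterating, sees an already drained iterator. If the real class follows the same shape, callers should consume the result only once.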

---

@@ -0,0 +1,35 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Class Definition
The `PythonProcessCaller` class manages the execution of Python code within a separate process in the Wayang framework. It sets up socket communication for interaction with the Python process and provides methods to manage the process lifecycle.

### Attributes:
1. **process (Thread)**: Thread that launches the Python worker process.
2. **socket (Socket)**: Socket for communication with the Python process.
3. **serverSocket (ServerSocket)**: Server socket to listen for connections from the Python process.
4. **ready (boolean)**: Indicates whether the Python process is ready.
5. **configuration (Configuration)**: Configuration settings loaded from `wayang-api-python-defaults.properties`.

### Constructor:
- `PythonProcessCaller(PythonCode serializedUDF)`:
  - Initializes the process caller with a serialized Python UDF.
  - Creates a configuration and loads it from a properties file (`wayang-api-python-defaults.properties`).
  - Initializes a server socket bound to the loopback address (127.0.0.1) and a dynamic port.
  - Launches a separate thread that starts a Python worker process with the required environment and expects the worker to connect back via the server socket.
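
The connect-back handshake described for the constructor can be illustrated with plain `java.net` classes. In this sketch the class `CallerSketch`, the five-second timeout, and the single "ready" byte are assumptions, and a Java thread stands in for the Python worker process.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class CallerSketch {
    // Returns true if the stand-in worker connected back and sent the assumed "ready" byte.
    static boolean startAndAwaitWorker() throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        // Port 0 asks the OS for a free port; backlog 1 because a single worker is expected.
        try (ServerSocket serverSocket = new ServerSocket(0, 1, loopback)) {
            int port = serverSocket.getLocalPort();
            serverSocket.setSoTimeout(5_000); // hypothetical timeout so accept() cannot block forever

            // Stand-in for the thread that would launch the Python worker process.
            Thread worker = new Thread(() -> {
                try (Socket s = new Socket(loopback, port)) {
                    s.getOutputStream().write(1); // hypothetical "ready" byte
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            worker.start();

            try (Socket socket = serverSocket.accept()) {
                return socket.getInputStream().read() == 1;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("worker ready: " + startAndAwaitWorker());
    }
}
```

Binding to the loopback address keeps the port unreachable from other hosts, and letting the OS choose the port avoids collisions when several workers run on the same machine.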

---

@@ -0,0 +1,40 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Class Definition
The `PythonWorkerManager` class manages the execution of Python UDFs within the Wayang framework. It starts a Python worker process, feeds it the input data, and returns the results it receives.

### Attributes:
1. **udf (PythonUDF<Input, Output>)**: The Python user-defined function that this worker manager is set to execute.
2. **serializedUDF (PythonCode)**: Holds a serialized representation of the Python UDF.
3. **inputIterator (Iterable<Input>)**: Represents an iterable over the input data for the Python UDF.

### Constructor:
- `PythonWorkerManager(PythonUDF<Input, Output> udf, PythonCode serializedUDF, Iterable<Input> input)`:
  - Initializes the worker manager with a Python UDF, its serialized version, and input data.
  - Assigns the provided parameters to the respective class attributes.

### Methods:
1. **execute()**:
   - Creates an instance of the `PythonProcessCaller` class with the serialized UDF.
   - If the worker (Python process) is ready:
     - Sets up a `ProcessFeeder` to feed input data to the Python process in a separate thread.
     - Sets up a `ProcessReceiver` to receive the results from the Python process.
     - Returns an iterable over the received data.
   - If the worker is not ready, throws a `WayangException`.
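
The control flow of `execute()` might look roughly like the sketch below. Everything here is a stand-in chosen for illustration: `WorkerManagerSketch` is hypothetical, a pipe replaces the socket, the "worker" simply echoes its input, `IllegalStateException` substitutes for `WayangException`, and the marker value `-1` is assumed (so input values must not equal it).

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class WorkerManagerSketch {
    static final int END_OF_DATA_SECTION = -1; // hypothetical marker value

    static List<Integer> execute(boolean workerReady, List<Integer> input) throws IOException {
        if (!workerReady) {
            // The documented class throws WayangException; a standard exception stands in here.
            throw new IllegalStateException("Python worker is not ready");
        }
        // A pipe replaces the socket pair, so the "worker" acts as an identity function.
        PipedOutputStream toWorker = new PipedOutputStream();
        PipedInputStream fromWorker = new PipedInputStream(toWorker);

        // Feeder role: write on a separate thread so that reading can start immediately.
        Thread feeder = new Thread(() -> {
            try (DataOutputStream out = new DataOutputStream(toWorker)) {
                for (int value : input) {
                    out.writeInt(value);
                }
                out.writeInt(END_OF_DATA_SECTION);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        feeder.start();

        // Receiver role: read until the end marker arrives.
        List<Integer> results = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(fromWorker)) {
            for (int value = in.readInt(); value != END_OF_DATA_SECTION; value = in.readInt()) {
                results.add(value);
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(execute(true, List.of(1, 2, 3)));
    }
}
```

Feeding on a separate thread matters because both directions use bounded buffers: if a single thread wrote all input before reading any output, a large input could fill the buffer and block indefinitely.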

---

@@ -0,0 +1,42 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Class Definition
The `ReaderIterator` class provides iterator-based access to objects read from a data input stream. It decodes data received from the Python process one object at a time, in a streaming manner.

### Attributes:
1. **nextObj (Output)**: The next object to be returned by the iterator.
2. **eos (boolean)**: Flag indicating the end of the data stream.
3. **fst (boolean)**: Purpose not documented; the name suggests a flag marking the first read, but this is unconfirmed.
4. **stream (DataInputStream)**: The input stream from which data is read.

### Constructor:
- `ReaderIterator(DataInputStream stream)`:
  - Initializes the iterator with the data input stream.

### Methods:
1. **read()**:
   - Reads and decodes an object from the stream.
   - Handles end of stream and relevant exceptions.

2. **hasNext()**:
   - Checks if there's another object to read.

3. **next()**:
   - Returns the next object or throws an exception if there isn't one.
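
A look-ahead iterator over a `DataInputStream` can be sketched as follows. The wire format is an assumption (length-prefixed UTF-8 strings ending with a hypothetical marker value of `-1`), `ReaderIteratorSketch` and its `frame` helper are made up for the example, and the `fst` attribute is left out because its purpose is unknown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

class ReaderIteratorSketch implements Iterator<String> {
    static final int END_OF_DATA_SECTION = -1; // hypothetical marker value

    private final DataInputStream stream;
    private String nextObj; // look-ahead slot, null when nothing is buffered
    private boolean eos;    // set once the marker or the end of the stream is seen

    ReaderIteratorSketch(DataInputStream stream) {
        this.stream = stream;
    }

    // Reads one length-prefixed UTF-8 string (assumed framing); returns null at the end.
    private String read() {
        try {
            int length = stream.readInt();
            if (length == END_OF_DATA_SECTION) {
                eos = true;
                return null;
            }
            byte[] bytes = new byte[length];
            stream.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (EOFException e) {
            eos = true; // the peer closed the stream without sending the marker
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (nextObj == null && !eos) {
            nextObj = read();
        }
        return nextObj != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = nextObj;
        nextObj = null;
        return result;
    }

    // Helper for the example: frames the given strings the way read() expects them.
    static DataInputStream frame(String... items) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (String item : items) {
                byte[] bytes = item.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
            out.writeInt(END_OF_DATA_SECTION);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
    }
}
```

Because `hasNext()` buffers at most one object, calling it repeatedly is safe, and nothing is read from the stream until a caller asks for it.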

---

@@ -0,0 +1,40 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
## Executor - wayang-api-python

### PythonWorkerManager Class
- Manages the execution of Python UDFs within the Wayang framework.
- Starts Python worker processes, feeds them input data, and returns the received results.

### PythonProcessCaller Class
- Manages the execution of Python code within a separate process.
- Sets up socket communication for interaction with the Python process and provides methods to manage the process lifecycle.

### ProcessFeeder Class
- Feeds input data to a Python process for execution.
- Ensures proper serialization of data and manages socket communication for data transmission.

### ProcessReceiver Class
- Handles receipt of data from the Python process.
- Reads data over a socket connection and provides an iterator for accessing the received data.

### ReaderIterator Class
- An iterator over a data stream, specifically designed to read serialized data objects.
- Facilitates reading data sent from the Python process to the Java environment.

---

@@ -0,0 +1,36 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
## Function - wayang-api-python

### PythonFunctionWrapper Class
- Acts as a Java wrapper for Python functions.
- Implements `FunctionDescriptor.SerializableFunction<Iterable<Input>, Iterable<Output>>`.
- Attributes:
  - `PythonUDF<Input, Output> myUDF`: Stores the Python UDF the wrapper class handles.
  - `PythonCode serializedUDF`: Stores a serialized version of the Python UDF.

### PythonUDF Interface
- Represents Python functions to be used with the Wayang framework.
- Extends `FunctionDescriptor.SerializableFunction<Iterable<Input>, Iterable<Output>>`.
- Does not define any additional methods.

### PythonCode Class
- Represents serialized Python code snippets or blocks.
- Implements the `Serializable` interface so that instances can be stored or transmitted.
- Attribute `byte[] code`: Stores the serialized Python code.

---
32 changes: 32 additions & 0 deletions documentation/wayang-api/wayang-api-python/function/python_code.md
@@ -0,0 +1,32 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Class Definition
The `PythonCode` class represents Python code in serialized form within the Wayang Java framework. Because it implements the `Serializable` interface, its instances can be serialized and deserialized, which allows the code to be transferred between the Java and Python environments.

### Attributes:
1. **code (byte[])**: Serialized form of the Python code.

### Constructor:
- `PythonCode(byte[] code)`:
  - Initializes the object with the serialized Python code.

### Methods:
1. **toByteArray()**:
   - Returns the serialized Python code.
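
A class of this shape could be written as in the sketch below, which also round-trips an instance through Java serialization. The name `PythonCodeSketch` and the `roundTrip` helper are hypothetical, and how the real bytes are produced on the Python side is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PythonCodeSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final byte[] code; // opaque payload; Java never interprets these bytes

    PythonCodeSketch(byte[] code) {
        this.code = code;
    }

    byte[] toByteArray() {
        return code;
    }

    // Hypothetical helper: serializes and deserializes an instance to show it survives transport.
    static PythonCodeSketch roundTrip(PythonCodeSketch original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (PythonCodeSketch) in.readObject();
        }
    }
}
```

Keeping the payload as a plain `byte[]` means the Java side needs no knowledge of the Python serialization format; it only stores and forwards the bytes.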

---

@@ -0,0 +1,33 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Class Definition
The `PythonFunctionWrapper` class serves as a bridge between Wayang's Java runtime and Python UDFs. It exposes a Python function as a Java `SerializableFunction` so that it can be used wherever Wayang expects a Java function.

### Attributes:
1. **myUDF (PythonUDF<Input, Output>)**: Represents the Python user-defined function.
2. **serializedUDF (PythonCode)**: Serialized form of the Python UDF.

### Constructor:
- `PythonFunctionWrapper(PythonUDF<Input, Output> myUDF, PythonCode serializedUDF)`:
  - Initializes the wrapper with a Python UDF and its serialized form.

### Methods:
1. **apply(Iterable<Input> input)**:
   - Executes the Python UDF on the input data and returns the results.
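
The relationship between the wrapper, the UDF interface, and the function type can be sketched with stand-in types. Here `SerializableFunction` is assumed to be a serializable `java.util.function.Function`, a `byte[]` replaces `PythonCode`, and `apply` simply delegates in-process, since how the real method runs the UDF is not described in this file.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class WrapperSketch {
    // Stand-in for FunctionDescriptor.SerializableFunction (assumed shape).
    interface SerializableFunction<I, O> extends Function<I, O>, Serializable {}

    // Stand-in for PythonUDF: iterable in, iterable out, no additional methods.
    interface PythonUDF<I, O> extends SerializableFunction<Iterable<I>, Iterable<O>> {}

    static class PythonFunctionWrapper<I, O> implements SerializableFunction<Iterable<I>, Iterable<O>> {
        private final PythonUDF<I, O> myUDF;
        private final byte[] serializedUDF; // the documented type is PythonCode

        PythonFunctionWrapper(PythonUDF<I, O> myUDF, byte[] serializedUDF) {
            this.myUDF = myUDF;
            this.serializedUDF = serializedUDF;
        }

        @Override
        public Iterable<O> apply(Iterable<I> input) {
            // Assumption: delegate directly; the real class may run the UDF in a Python worker.
            return myUDF.apply(input);
        }
    }

    public static void main(String[] args) {
        // A Java lambda plays the role of the Python function for this example.
        PythonUDF<Integer, Integer> doubler = in -> {
            List<Integer> out = new ArrayList<>();
            for (Integer i : in) {
                out.add(i * 2);
            }
            return out;
        };
        PythonFunctionWrapper<Integer, Integer> wrapper = new PythonFunctionWrapper<>(doubler, new byte[0]);
        System.out.println(wrapper.apply(List.of(1, 2)));
    }
}
```

Because the wrapper has the same function type as any other serializable Java function, code that accepts such a function does not need to know that Python is involved.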

---

28 changes: 28 additions & 0 deletions documentation/wayang-api/wayang-api-python/function/python_udf.md
@@ -0,0 +1,28 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Interface Definition
The `PythonUDF<Input, Output>` interface represents Python user-defined functions (UDFs) within the Wayang framework. A `PythonUDF` takes an iterable of inputs and produces an iterable of outputs, which lets Python functions plug into the Wayang Java framework.

### Inherits:
- `FunctionDescriptor.SerializableFunction<Iterable<Input>, Iterable<Output>>`: Represents serializable functions that operate on iterables.

### Key Points:
- The `PythonUDF` interface fixes the function signature for implementing classes: `Iterable<Input>` in, `Iterable<Output>` out.
- It is the type through which `PythonFunctionWrapper` and the executor classes (`PythonWorkerManager`, `ProcessFeeder`) refer to a Python UDF.

---
